The system will attempt to detect Axt, Fasta, Fastqsolexa, Gff, Gff3, Html, Lav, Maf, Tabular, Wiggle, Bed and Interval (Bed with headers) formats. If your file is not detected properly as one of the known formats, it most likely means that it has some format problems (e.g., different number of columns on different rows). You can still coerce the system to set your data to the format you think it should be. You can also upload compressed files, which will automatically be decompressed.
A binary sequence file in 'ab1' format with a '.ab1' file extension. You must manually select this 'File Format' when uploading the file.
blastz pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.
A binary file compressed in the BGZF format with a '.bam' file extension.
Tab delimited format (tabular)
Does not require header line
Contains 3 required fields:
May contain 9 additional optional BED fields:
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512 chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters:
>sequence1 atgcgtttgcgtgc gtcggtttcgttgc >sequence2 tttcgtgcgtatag tggcgcggtga
FastqSolexa is the Illumina (Solexa) variant of the Fastq format, which stores sequences and quality scores in a single file:
@seq1 GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT +seq1 hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh @seq2 GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG +seq2 hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO
@seq1 GAATTGATCAGGACATAGGACAACTGTAGGCACCAT +seq1 40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4 @seq2 GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG +seq2 40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9
GFF lines have nine required fields that must be tab-separated.
The GFF3 format addresses the most common extensions to GFF, while preserving backward compatibility with previous formats.
Interval (Genomic Intervals)
Tab delimited format (tabular)
File must start with definition line in the following format (columns may be in any order).:
#CHROM START END STRAND
CHROM - The name of the chromosome (e.g. chr3, chrY, chr2_random) or contig (e.g. ctgY1).
START - The starting position of the feature in the chromosome or contig. The first base in a chromosome is numbered 0.
END - The ending position of the feature in the chromosome or contig. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
STRAND - Defines the strand - either '+' or '-'.
#CHROM START END STRAND NAME COMMENT chr1 10 100 + exon myExon chrX 1000 10050 - gene myGene
Lav is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav..
TBA and multiz multiple alignment format. The first line of a .maf file begins with ##maf. This word is followed by white-space-separated "variable=value" pairs. There should be no white space surrounding the "=".
A binary sequence file in 'scf' format with a '.scf' file extension. You must manually select this 'File Format' when uploading the file.
A binary file in 'Standard Flowgram Format' with a '.sff' file extension.
Tabular (tab delimited)
Any data in tab delimited format (tabular)
The wiggle format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track.
Other text type
Any text file