FASTA format files
The FASTA file format is a widely used format for specifying biosequence information. FASTA format files are ordinary text files with special rules about how to specify sequences and their identities. eArray uses FASTA format files to upload nucleic acid sequence information, such as target sequences for generating probes with Simple Tiling and Gene Expression (GE) probe design. FASTA is also available as a file format when you download probe lists.
FASTA sequence data file guidelines:
A sequence in FASTA format begins with a single description line, followed by one or more lines of sequence data. More than one sequence can be specified in a single FASTA format file.
The description line is distinguished from the sequence data by a greater-than (">") symbol as the first character in the line. In general, include only the sequence identifier in the description line. If you include other annotation in the description line, it must not exceed 255 characters (including spaces).
eArray interprets the sequence ID for a given FASTA record as the character string that occurs after the ">" symbol and before the first space on the description line. The sequence ID must not exceed 64 characters (including spaces).
The associated sequences must be represented in an abbreviated version of the IUB/IUPAC nucleic acid code. All sequence data must contain only the uppercase characters A, T, C, and G. eArray masks all other characters out of the sequence.
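The guidelines above can be checked before upload with a short script. The following Python sketch is illustrative only and is not part of eArray: the names sequence_id and check_record are hypothetical, and the 255-character limit is assumed here to apply to the whole description line, including the ">" symbol. The sketch only reports unexpected characters; it does not reproduce how eArray masks them.

```python
import re

MAX_ID_LENGTH = 64            # sequence ID limit stated in the guidelines
MAX_DESCRIPTION_LENGTH = 255  # assumption: applies to the whole description line


def sequence_id(description_line):
    """Return the text after '>' and before the first space."""
    if not description_line.startswith(">"):
        raise ValueError("description line must start with '>'")
    return description_line[1:].split(" ", 1)[0]


def check_record(description_line, sequence_lines):
    """Return a list of problems found in one record (empty if none)."""
    if not description_line.startswith(">"):
        return ["description line does not start with '>'"]

    problems = []
    if len(description_line) > MAX_DESCRIPTION_LENGTH:
        problems.append("description line exceeds 255 characters")

    seq_id = sequence_id(description_line)
    if not seq_id:
        problems.append("sequence ID is empty")
    if len(seq_id) > MAX_ID_LENGTH:
        problems.append("sequence ID exceeds 64 characters")

    # Join the sequence lines and collect anything that is not A, T, C, or G.
    sequence = "".join(line.strip() for line in sequence_lines)
    unexpected = sorted(set(re.findall(r"[^ATCG]", sequence)))
    if unexpected:
        problems.append(
            "characters other than A, T, C, G: " + "".join(unexpected)
        )
    return problems
```

A record that follows the guidelines yields an empty list, for example check_record(">NM_012515", ["TGTGGATC"]).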
Example of two FASTA-formatted sequences in a file:
>NM_012514 Rattus norvegicus breast cancer 1 (Brca1), mRNA
CGCTGGTGCAACTCGAAGACCTATCTCCTTCCCGGGGGGGCTTCTCCGGCATTTAGGCCT
CGGCGTTTGGAAGTACGGAGGTTTTTCTCGGAAGAAAGTTCACTGGAAGTGGAAGAAATG
GATTTATCTGCTGTTCGAATTCAAGAAGTACAAAATGTCCTTCATGCTATGCAGAAAATC
TTGGAGTGTCCAATCTGTTTGGAACTGATCAAAGAACCGGTTTCCACACAGTGCGACCAC
ATATTTTGCAAATTTTGTATGCTGAAACTCCTTAACCAGAAGAAAGGACCTTCCCAGTGT
CCTTTGTGTAAGAATGAGATAACCAAAAGGAGCCTACAAGGAAGTGCAAGG
>NM_012515
TGTGGATCTTTCCAGAACAGCAGTTGCAATCACTATGTCTCAATCCTGGGTACCCGCCGT
GGGCCTCACTCTGGTGCCCAGCCTGGGGGGCTTCATGGGAGCCTACTTTGTGCGTGGTGA
GGGCCTCCGCTGGTATGCTAGCTTGCAGAAACCCTCCTGGCATCCGCCTCGCTGGACACT
CGCTCCCATCTGGGGCACACTGTATTCGGCCATGGGGTATGGCTCCTACATAATCTGGAA
AGAGCTGGGAGGTTTCACAGAGGAGGCTATGGTTCCCTTGGGTCTCTACACTGGTCAGCT
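To show how a file with more than one record is read, the following Python sketch splits FASTA text into records and separates the sequence ID from the rest of the description line. It is a minimal illustration, not eArray's own parser; the function name parse_fasta and the shortened sequences in the usage note are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (sequence_id, description, sequence) tuples."""
    records = []
    header = None   # description line of the current record, without '>'
    chunks = []     # sequence lines collected for the current record

    def finish():
        # The ID is everything before the first space; the rest is annotation.
        seq_id, _, description = header.partition(" ")
        records.append((seq_id, description, "".join(chunks)))

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue                 # skip blank lines
        if line.startswith(">"):
            if header is not None:
                finish()             # close the previous record
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)      # sequence data for the current record

    if header is not None:
        finish()                     # close the last record
    return records
```

Called on text that contains the two description lines from the example above (with sequences shortened for illustration), parse_fasta returns one tuple per record, with the IDs NM_012514 and NM_012515. The second record has an empty description because its description line contains only the sequence identifier.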
See also
Create a microarray design from target transcripts (wizard)