FASTA format files

The FASTA file format is a widely used format for specifying biosequence information. FASTA format files are ordinary text files with special rules about how to specify sequences and their identities. eArray uses FASTA format files to upload nucleic acid sequence information, such as target sequences for generating probes with Simple Tiling and Gene Expression (GE) probe design. FASTA is also available as a file format when you download probe lists.

FASTA sequence data file guidelines:

A sequence in FASTA format begins with a single description line, followed by one or more lines of sequence data. More than one sequence can be specified in a single FASTA format file.
The description line is distinguished from the sequence data by a greater-than (">") symbol as the first character in the line. In general, include only the sequence identifier in the description line. If you include other annotation in the description line, it must not exceed 255 characters (including spaces).
eArray interprets the sequence ID for a given FASTA record as the character string that occurs after the ">" symbol and before the first space on the description line. The sequence ID must not exceed 64 characters (including spaces).
The associated sequences must be represented in an abbreviated version of the IUB/IUPAC nucleic acid code: all sequence data must contain only the uppercase characters A, T, C, and G. eArray masks out all other characters in the sequence.
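
The guidelines above can be checked before a file is uploaded. The following Python sketch is illustrative only and is not part of eArray; the function name check_record and the constant names are hypothetical. It assumes the 255-character limit applies to the text that follows the ">" symbol, which the guidelines do not state explicitly.

```python
import re

# Limits taken from the guidelines above.
# Assumption: the description limit is measured on the text after ">".
MAX_DESCRIPTION_LEN = 255
MAX_ID_LEN = 64

# Only the uppercase characters A, T, C, and G are accepted.
VALID_SEQUENCE = re.compile(r"[ATCG]+")


def check_record(description_line, sequence):
    """Return a list of problems found in one FASTA record (empty if none)."""
    if not description_line.startswith(">"):
        return ["description line must start with '>'"]

    problems = []
    text = description_line[1:].rstrip("\r\n")

    # The sequence ID is the string after ">" and before the first space.
    seq_id = text.split(" ", 1)[0]
    if not seq_id:
        problems.append("missing sequence ID")
    if len(seq_id) > MAX_ID_LEN:
        problems.append("sequence ID exceeds 64 characters")
    if len(text) > MAX_DESCRIPTION_LEN:
        problems.append("description line exceeds 255 characters")

    # Any other character (lowercase, N, gaps, whitespace) is reported here.
    if not VALID_SEQUENCE.fullmatch(sequence):
        problems.append("sequence must be non-empty and contain only A, T, C, G")
    return problems


if __name__ == "__main__":
    print(check_record(">NM_012515", "TGTGGATCTTTCCAGAACAGC"))
    print(check_record(">NM_012515", "TGTGGATCNNNNCAGAACAGC"))
```

The check reports characters that eArray would otherwise mask, so that they can be reviewed before upload rather than silently dropped from the probe design.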

Example of two FASTA-formatted sequences in a file:

>NM_012514 Rattus norvegicus breast cancer 1 (Brca1), mRNA
CGCTGGTGCAACTCGAAGACCTATCTCCTTCCCGGGGGGGCTTCTCCGGCATTTAGGCCT
CGGCGTTTGGAAGTACGGAGGTTTTTCTCGGAAGAAAGTTCACTGGAAGTGGAAGAAATG
GATTTATCTGCTGTTCGAATTCAAGAAGTACAAAATGTCCTTCATGCTATGCAGAAAATC
TTGGAGTGTCCAATCTGTTTGGAACTGATCAAAGAACCGGTTTCCACACAGTGCGACCAC
ATATTTTGCAAATTTTGTATGCTGAAACTCCTTAACCAGAAGAAAGGACCTTCCCAGTGT
CCTTTGTGTAAGAATGAGATAACCAAAAGGAGCCTACAAGGAAGTGCAAGG
>NM_012515
TGTGGATCTTTCCAGAACAGCAGTTGCAATCACTATGTCTCAATCCTGGGTACCCGCCGT
GGGCCTCACTCTGGTGCCCAGCCTGGGGGGCTTCATGGGAGCCTACTTTGTGCGTGGTGA
GGGCCTCCGCTGGTATGCTAGCTTGCAGAAACCCTCCTGGCATCCGCCTCGCTGGACACT
CGCTCCCATCTGGGGCACACTGTATTCGGCCATGGGGTATGGCTCCTACATAATCTGGAA
AGAGCTGGGAGGTTTCACAGAGGAGGCTATGGTTCCCTTGGGTCTCTACACTGGTCAGCT
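
To illustrate how a multi-record file like this one breaks down into sequence IDs, annotation, and sequence data, here is a minimal Python sketch. It is not eArray's own parser; the function name parse_fasta and the variable sample are hypothetical, and the sample holds only the first two sequence lines of each record shown above. The sketch skips blank lines and joins the sequence lines of each record into one string, which is an assumption about how such input should be handled.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (sequence_id, annotation, sequence) tuples."""
    records = []
    header = None   # description line of the current record, without ">"
    chunks = []     # sequence lines collected for the current record

    def finish():
        # The ID is the string before the first space; the rest is annotation.
        seq_id, _, annotation = header.partition(" ")
        records.append((seq_id, annotation, "".join(chunks)))

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            if header is not None:
                finish()
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)

    if header is not None:
        finish()
    return records


# Shortened copy of the example above (first two sequence lines per record).
sample = """>NM_012514 Rattus norvegicus breast cancer 1 (Brca1), mRNA
CGCTGGTGCAACTCGAAGACCTATCTCCTTCCCGGGGGGGCTTCTCCGGCATTTAGGCCT
CGGCGTTTGGAAGTACGGAGGTTTTTCTCGGAAGAAAGTTCACTGGAAGTGGAAGAAATG
>NM_012515
TGTGGATCTTTCCAGAACAGCAGTTGCAATCACTATGTCTCAATCCTGGGTACCCGCCGT
GGGCCTCACTCTGGTGCCCAGCCTGGGGGGCTTCATGGGAGCCTACTTTGTGCGTGGTGA
"""

if __name__ == "__main__":
    for seq_id, annotation, sequence in parse_fasta(sample):
        print(seq_id, len(sequence), annotation)
```

In the first record the sequence ID is NM_012514 and the remainder of the description line is treated as annotation; in the second record the description line holds only the sequence ID, as the guidelines recommend.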