Probe file formats and requirements for uploading
Probe data files must follow certain format rules so eArray can interpret them correctly.
General format of data within a file
Specific requirements for individual types of data
Some possible causes of upload errors
Note:
• Probe uploads are not available for microRNA probes.
• You cannot upload probes as certain specialized probe types, such as SNP, Exon, or lincRNA probes.
General format of data within a file
eArray supports these file types for probe uploads:
Microsoft Excel files (*.xls) – Note: If you use Microsoft Excel 2007 to create the file, save the file as an Excel 97-2003 workbook. This saves the file in the required *.xls format.
Tab-delimited text files (*.tdt or *.txt) – Place tabs only between fields (columns) in a record (row), and end each record with a newline character.
eArray supports these file formats:
Complete – Seven columns:
ProbeID
Sequence
TargetID
Accessions
GeneSymbols
Description
ChromosomalLocation
Minimal – Two columns:
ProbeID
Sequence
In uploaded files, eArray:
Accepts columns in any order – You label columns as part of the upload process.
Accepts extra columns – When you label columns during the upload process, be sure to label any extra columns as Ignore.
Accepts, but does not interpret, column headings – Be sure to mark My uploaded file contains "Column Headings" when you label columns during the upload process.
Does not accept double or single quotation marks, angle brackets, or forward or backward slashes.
Ignores blank lines.
Expects all entries within a row to be separated by tabs, even if an entry is blank.
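As a rough illustration of the rules above, the following Python sketch writes a Complete-format, tab-delimited file. The function name write_probe_file, the column order, and the UTF-8 encoding are assumptions made for this example. The character check only mirrors the list above and is not eArray's own validation.

```python
# Column order for the Complete format. eArray accepts any order because
# columns are labeled during upload, so this order is only a convention.
COLUMNS = ["ProbeID", "Sequence", "TargetID", "Accessions",
           "GeneSymbols", "Description", "ChromosomalLocation"]

# Characters that the format rules say are not accepted.
FORBIDDEN = set("\"'<>/\\")


def write_probe_file(path, records, header=True):
    """Write records (dicts keyed by column name) as tab-delimited text."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        if header:
            handle.write("\t".join(COLUMNS) + "\n")
        for record in records:
            # Missing annotations become empty strings, so every row still
            # has a tab between each pair of adjacent columns.
            fields = [str(record.get(name, "")) for name in COLUMNS]
            for value in fields:
                if FORBIDDEN.intersection(value) or "\t" in value or "\n" in value:
                    raise ValueError(f"unsupported character in {value!r}")
            handle.write("\t".join(fields) + "\n")
```

If you write a header row this way, remember to mark the column headings option when you label columns during the upload process.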
Note:
• Upload no more than 200,000 probes at a time to eArray. Probes from very large uploaded files may not appear in your account for an extended period of time.
• Certain specialized Agilent probes cannot be uploaded, including microRNA and SNP probes.
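One way to stay under the 200,000-probe limit is to split a large probe set into several upload files. The sketch below shows the splitting step only. The helper name batches is hypothetical, and how each batch is then written to a file is left open.

```python
from itertools import islice

MAX_PROBES_PER_UPLOAD = 200_000  # limit stated in the note above


def batches(records, size=MAX_PROBES_PER_UPLOAD):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each yielded list can then be saved as its own upload file.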
Specific requirements for individual types of data
ProbeID – A unique identifier for the probe sequence, containing up to 15 characters. ProbeID cannot be blank.
Sequence – The base sequence of the probe, in 5' to 3' orientation. The sequence must be from 20 to 60 nucleotides in length, and must contain only the uppercase characters A, C, G, and T. Sequence cannot be blank.
TargetID – Also referred to as the primary accession, TargetID uniquely identifies the sequence that most exemplifies the target transcript. Only one annotation value is allowed, and it can include or omit the source designation. For example, both ref|AK075564 and AK075564 are acceptable. TargetID can be blank.
Accessions – Unique identifier(s) that refer to a nucleotide sequence that is a target for the associated probe and/or a protein sequence that is a product of the target. Accessions are represented in a <source>|<ID> pair format. <source> is the symbol of the database from which the accession was derived and <ID> is the unique accession identifier. For example, ref|NM_015752 is a <source>|<ID> pair where ref (NCBI RefSeq) is the source and NM_015752 is the unique identifier for that source. The Accessions field can contain multiple <source>|<ID> pairs, delimited by pipe "|" characters. For example, gi|7657630|ref|NM_015752 is an allowable accession that gives both an NCBI GenInfo identifier (gi) and a RefSeq identifier (ref) for the same probe sequence. Accessions can be blank.
GeneSymbols – A unique abbreviation for a gene name. GeneSymbols can be blank.
Description – A description of a phenotype, gene product, or its function. Description can be blank.
ChromosomalLocation – The chromosome number and the location of the sequence on the chromosome, expressed in the following example notation: chr19:11392326-11391822. Only one ChromosomalLocation is allowed. It can include or omit the source, and it can be blank.
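The field requirements above can be checked before upload with a short script. The following Python sketch is one possible pre-check, not eArray's own validation. The function name check_probe is hypothetical, and the regular expressions for Accessions and ChromosomalLocation are assumptions inferred from the examples in this topic.

```python
import re

# Sequence: 20 to 60 uppercase A, C, G, or T characters.
SEQUENCE_RE = re.compile(r"[ACGT]{20,60}")
# One or more <source>|<ID> pairs joined by pipes, for example
# gi|7657630|ref|NM_015752. The characters allowed inside <source>
# and <ID> are an assumption.
PAIR = r"[A-Za-z]+\|[A-Za-z0-9_.]+"
ACCESSIONS_RE = re.compile(rf"{PAIR}(\|{PAIR})*")
# Pattern inferred from the example chr19:11392326-11391822 (assumption).
LOCATION_RE = re.compile(r"chr[0-9A-Za-z]+:\d+-\d+")


def check_probe(probe_id, sequence, accessions="", location=""):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if not probe_id:
        problems.append("ProbeID is blank")
    elif len(probe_id) > 15:
        problems.append("ProbeID is longer than 15 characters")
    if not SEQUENCE_RE.fullmatch(sequence):
        problems.append("Sequence must be 20-60 characters of A, C, G, T")
    if accessions and not ACCESSIONS_RE.fullmatch(accessions):
        problems.append("Accessions must be <source>|<ID> pairs")
    if location and not LOCATION_RE.fullmatch(location):
        problems.append("ChromosomalLocation does not match chrN:start-end")
    return problems
```

A record that passes these checks can still be rejected by eArray for other reasons, such as a ProbeID that already exists in the system.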
Some possible causes of upload errors
Your file contains two probes with the same ProbeID, but different sequences, and you select Remove replicate probes from upload.
Your file contains two probes with the same ProbeID and the same sequence, and you have not selected Remove replicate probes from upload.
You are not the owner of an existing probe whose annotation would be overwritten by a probe in your upload file.
A probe from your uploaded file has the same ProbeID as one that already exists in eArray, but it does not have the same sequence, species, or application type as the one it is overwriting.
One or more entries in your uploaded file do not have the correct format.
A system error occurs during the upload process.
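The first two causes listed above involve repeated ProbeIDs within one file, which can be detected before upload. The sketch below groups records by ProbeID and separates true replicates (same ID, same sequence) from conflicts (same ID, different sequences). The function name find_probe_id_conflicts is hypothetical, and the script cannot see probes that already exist in eArray.

```python
from collections import defaultdict


def find_probe_id_conflicts(records):
    """Classify repeated ProbeIDs in (probe_id, sequence) pairs.

    Returns two sorted lists: IDs repeated with identical sequences
    (replicates) and IDs repeated with different sequences (conflicts).
    """
    sequences = defaultdict(list)
    for probe_id, sequence in records:
        sequences[probe_id].append(sequence)
    replicates, conflicts = [], []
    for probe_id, seqs in sequences.items():
        if len(seqs) < 2:
            continue  # ProbeID appears only once
        if len(set(seqs)) == 1:
            replicates.append(probe_id)
        else:
            conflicts.append(probe_id)
    return sorted(replicates), sorted(conflicts)
```

IDs in the second list need to be renamed or corrected in the file. IDs in the first list are the ones affected by the Remove replicate probes from upload option.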