Overview of Gene Expression (GE) Probe Design

The GE Probe Design process creates Expression type probes based on target sequences that you specify. To specify target sequences, you upload either a sequence data file in FASTA format or a file that contains GenBank accessions. In addition, GE Probe Design uses transcriptome data as a similarity database to eliminate potential probe sequences that would have significant cross-hybridization with targets other than the one of interest.

eArray can guide you through the process of creating a microarray with GE Probe Design. See Create a microarray design from target transcripts (Wizard).

Note: GE Probe Design is not available to guest users, or within a collaboration.

This topic answers the following questions about the GE Probe Design process:

What are the main steps in the GE Probe Design process?

What file format do I need to use for my sequence data?

What methods of GE Probe Design are available to me?

What options do I have for specifying transcriptome data?

What happens after I submit my GE Probe Design job to Agilent?

How long will it take for my design job to be completed?

What can I do with the GE Probe Design results?

How does GE Probe Design handle highly homologous sequences?

In addition, some of the eArray FAQs contain answers to specific questions relevant to GE Probe Design.

What are the main steps in the GE Probe Design process?

The GE probe design process has three main steps:

1. Create a target sequence data file in FASTA format or a TDT file of GenBank accessions. These are the target sequences that eArray uses to create your probes. You upload this file to eArray when you set up your GE Probe Design job.
2. Set up a GE Probe Design job. In this step you select a probe design methodology, specify the detailed parameters that eArray uses to process your sequence data into usable probes, and upload the target file and, optionally, a transcriptome file. You then submit the design job to Agilent for processing.
3. Retrieve the results. When the probe design process is complete, Agilent sends you an e-mail to let you know that your design results are available. You can then view the probe design results, create a probe group based on them, and download them.

What file format do I need to use for my sequence data?

Target sequence data must be in a single FASTA format file. If you also have custom transcriptome data that you want the probe design process to use as a similarity database, this data must be in another single FASTA format file.

Important: Not all aspects of the FASTA format are supported by eArray. Be sure to read FASTA format files.

You can also specify target sequences as GenBank accessions. Format the accessions as a list with one accession per line, that is, with a new line (return) character after each entry, and save the list as a plain text file. eArray resolves the accessions to actual sequence data before it creates probes.

Before you upload any of these files, you must first compress them into *.zip format files.
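The file preparation described above can be scripted. The following Python sketch is only an illustration, not an Agilent tool: the function names, file names, and the 60-character line wrap are assumptions, and the accession strings in the usage example are placeholders, not real GenBank accessions. It writes a FASTA file, writes a list of accessions with one entry per line, and compresses a single file into a .zip archive. Check FASTA format files for the header and character restrictions that eArray enforces, which this sketch does not validate.

```python
import tempfile
import zipfile
from pathlib import Path


def write_fasta(path, records):
    """Write (identifier, sequence) pairs as FASTA, wrapping sequence lines at 60 characters."""
    with open(path, "w", newline="\n") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n")
            for start in range(0, len(sequence), 60):
                handle.write(sequence[start:start + 60] + "\n")


def write_accession_list(path, accessions):
    """Write accessions as plain text, one accession per line."""
    with open(path, "w", newline="\n") as handle:
        for accession in accessions:
            handle.write(accession + "\n")


def zip_single_file(source, archive):
    """Compress one file into a .zip archive, storing it without its directory path."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(source, arcname=Path(source).name)


if __name__ == "__main__":
    # Work in a temporary directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as workdir:
        fasta_path = Path(workdir) / "targets.fasta"
        write_fasta(fasta_path, [("target_1", "ACGT" * 40)])
        zip_single_file(fasta_path, Path(workdir) / "targets.zip")

        # Placeholder strings; substitute your own GenBank accessions.
        list_path = Path(workdir) / "accessions.txt"
        write_accession_list(list_path, ["ACCESSION_1", "ACCESSION_2"])
        zip_single_file(list_path, Path(workdir) / "accessions.zip")
```

Each archive here holds exactly one file, which matches the requirement that target data be in a single file. Whether eArray accepts archives that contain more than one file is not stated in this topic.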

What methods of GE Probe Design are available to me?

The two available probe design methodologies are:

Base composition methodology – The resulting probes adhere as closely as possible to the base composition profile that provides optimal performance on the Agilent platform, and they are all of equal length. This is the standard method, and it works best with Agilent protocols and with most eukaryotic organisms.
Tm matching methodology – The resulting probes have melting temperatures (Tm) as similar as possible to a value you specify. Probes can be all of equal length, or probes can be trimmed to increase compliance with the desired Tm. Tm matching methodology can work well when you design probes for prokaryotic organisms.
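To make the idea of trimming a probe toward a desired Tm concrete, here is a minimal Python sketch. It does not reproduce eArray's algorithm, which this topic does not describe. The Tm estimate uses a common GC-content approximation (64.9 + 41 × (G + C − 16.4) / N), and the function names, the choice to trim from the end of the sequence, and the minimum length parameter are all assumptions made for illustration.

```python
def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(seq):
    """Rough Tm in degrees C from the GC-content approximation (an assumed formula)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def trim_to_tm(seq, target_tm, min_length):
    """Return the prefix of seq, at least min_length long, whose estimated Tm is
    closest to target_tm. On a tie, the longer candidate is kept."""
    best = seq
    best_gap = abs(estimate_tm(seq) - target_tm)
    # Try progressively shorter prefixes (assumption: bases are removed from the end).
    for length in range(len(seq) - 1, min_length - 1, -1):
        candidate = seq[:length]
        gap = abs(estimate_tm(candidate) - target_tm)
        if gap < best_gap:
            best, best_gap = candidate, gap
    return best


if __name__ == "__main__":
    probe = "G" * 30 + "A" * 30
    trimmed = trim_to_tm(probe, target_tm=76.0, min_length=40)
    print(len(probe), round(estimate_tm(probe), 2))
    print(len(trimmed), round(estimate_tm(trimmed), 2))
```

With equal-length probes, only candidate selection can move Tm toward the target. Allowing trimming adds probe length as a second adjustment, which is why it can increase compliance with the desired Tm.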

What options do I have for specifying transcriptome data?

GE Probe Design uses transcriptome data as a similarity database to eliminate potential probe sequences that would have significant cross-hybridization with targets other than the one of interest. The better characterized the transcriptome, the better eArray will be able to design probes that are specific for each of the transcripts in your targets.

When you set up a GE Probe Design job, you have three options for this similarity database:

Agilent transcriptome – Use one of Agilent's available species transcriptome databases. Select this option if it is available for your species of interest, as these databases have been specifically constructed for use in GE Probe Design.
Uploaded transcriptome file – Upload a FASTA format file of transcriptome sequence data.
Target sequence file – Use your uploaded target sequence file as a similarity database. This option works for uploaded target files that contain either actual sequence data, or GenBank accessions.

What happens after I submit my GE Probe Design job to Agilent?

After you submit your design job to Agilent, your design specifications and sequence data are placed in a queue along with other users' design jobs. You can then check the status of your job in the Search Results pane of the GE Probe Design page, or in the Pending Jobs pane of your individual workspace home page. eArray sends you an e-mail when the process is finished.

How long will it take for my design job to be completed?

You may need to wait a day or more for your GE Probe Design results to become available, depending on the size of your sequence data files and the number of design jobs ahead of yours in the queue.

What can I do with the GE Probe Design results?

You can view and download GE Probe Design results, and create a probe group from them.

Note: eArray does not commit the probes created by GE Probe Design to the main eArray database until you create a probe group from the results. Until then, you cannot search or browse the newly designed probes.

How does GE Probe Design handle highly homologous sequences?

Before designing probes, eArray first clusters target sequences and transcriptome sequences. It treats highly homologous sequences as a single entity when it evaluates cross-hybridization problems. Two sequences are clustered together if they are more than 95% identical over more than 95% of the length of both sequences.
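The clustering rule above can be expressed as a short Python sketch. This is an illustration under stated assumptions, not eArray's implementation: it assumes that pairwise identity and coverage values are already available from an alignment tool of your choice, and it assumes that clusters are formed transitively (if A clusters with B and B with C, all three share a cluster), which this topic does not confirm. The function names and the input tuple layout are hypothetical.

```python
def should_cluster(identity, coverage_a, coverage_b, threshold=0.95):
    """Apply the rule: over 95% identical over 95% of both sequence lengths."""
    return identity > threshold and coverage_a > threshold and coverage_b > threshold


def cluster_sequences(names, pairwise_hits, threshold=0.95):
    """Group sequence names into clusters with a union-find structure.

    pairwise_hits is an iterable of (name_a, name_b, identity, coverage_a, coverage_b),
    with identity and coverage expressed as fractions between 0 and 1.
    Transitive merging is an assumption of this sketch.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, cov_a, cov_b in pairwise_hits:
        if should_cluster(identity, cov_a, cov_b, threshold):
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    hits = [
        ("A", "B", 0.99, 0.98, 0.97),  # meets both criteria
        ("B", "C", 0.96, 0.96, 0.99),  # meets both criteria
        ("C", "D", 0.99, 0.90, 0.99),  # coverage of C is too low
    ]
    print(cluster_sequences(["A", "B", "C", "D"], hits))
```

In this example, A, B, and C end up in one cluster and D stays on its own, because the C and D alignment covers only 90% of one of the sequences.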

Highly homologous sequences can simply be a result of errors during the preparation of input sequence files. Also, it is usually not possible to find distinctive sequences long enough to place probes for highly homologous sequences. By clustering these kinds of sequences, eArray ignores file processing errors and, for all targets, selects high-quality probes that do not cross-hybridize to other sequences.

If eArray designs a probe that binds to more than one of the target or transcriptome sequences within a cluster, it does not report or consider any potential cross-hybridization (X-hyb) problems among those sequences. Thus, if you notice that a probe binds to more than one target, but eArray does not report this, those targets may be members of the same cluster. After eArray completes a GE Probe Design job that you submitted, you can download a file that lists the clusters that eArray generated during the design process. See Download GE probe design results.