Set up a GE Probe Design job

The Gene Expression (GE) Probe Design process creates probes based on target sequence data or GenBank accessions that you upload. For details about the process, see Overview of Gene Expression (GE) Probe Design.

You can also use a wizard to guide you through the process of creating a microarray with GE Probe Design.

Before you set up a job

You must be an eArray registered user. See How do I register.
You must have either a FASTA format file that contains your target sequence data, or a text file that contains a list of GenBank accessions. For GenBank accessions, list one accession number per line, separated by newline (return) characters. The file names must not contain any spaces. Create a compressed (*.zip format) version of this file for upload purposes.
If an Agilent-constructed species transcriptome database is not available for your species of interest, prepare a FASTA format file that contains custom transcriptome data to use as a similarity database in the probe design process. Create a compressed (*.zip format) version of this file for upload purposes.
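
The file preparation described above can be scripted. The following is a minimal Python sketch, not an Agilent tool: the function names and the accession values in the usage note are hypothetical placeholders, and it assumes only what is stated above (one accession per line, no spaces in file names, a *.zip version for upload).

```python
import zipfile
from pathlib import Path


def _require_no_spaces(path):
    """Raise ValueError if the file name (not the folder) contains a space."""
    if " " in path.name:
        raise ValueError(f"File name must not contain spaces: {path.name!r}")


def write_accession_list(accessions, path):
    """Write one GenBank accession per line to a plain text file."""
    path = Path(path)
    _require_no_spaces(path)
    # Drop blank entries and surrounding whitespace before writing.
    cleaned = [a.strip() for a in accessions if a.strip()]
    path.write_text("\n".join(cleaned) + "\n", encoding="ascii")
    return path


def zip_for_upload(path):
    """Create a .zip archive next to the file, containing only that file."""
    path = Path(path)
    _require_no_spaces(path)
    archive = path.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps the folder path out of the archive.
        zf.write(path, arcname=path.name)
    return archive
```

For example, `zip_for_upload(write_accession_list(["AB000001", "AB000002"], "targets.txt"))` writes targets.txt and targets.zip in the current folder. The same `zip_for_upload` helper can be applied to a FASTA target file or a custom transcriptome file.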

To set parameters and upload file(s) for a GE Probe Design job

Click the Workspace tab. GE Probe Design is not available within a collaboration.
Set the application type to Expression, if needed. GE Probe Design creates only Expression type probes.
Click Probe > GE Probe Design.

The GE Probe Design page appears.

At the top of the page, select one of these methods:
- Base composition methodology – Creates probes that are all the same length, based on an empirically-determined optimal base composition profile. It is the standard GE probe design methodology. This method works best with Agilent protocols, and with most eukaryotic organisms.
- Tm-matching methodology – Creates probes so they all have similar predicted Tm. This can be a good option for prokaryotic targets.
Set the following parameters in the Probe Details pane. Some of these parameters are available only for the Tm-matching methodology.

Parameter	Instructions/Details
Design Job Name	Type a name that will allow you to later identify this specific GE probe design job.
Probe Length	Type the maximum length for the generated probes. The allowable length is from 25 to 60 bases. Agilent has found that a probe length of 60 bases provides the optimal balance between sensitivity and specificity for most applications on the Agilent in situ microarray platform.
Probes per target	Select from 1 to 10 probes per target. This is the maximum number of probes the probe design process returns for each uploaded target sequence. If the target sequences are of poor quality (for example, if they contain repetitive and/or vector sequences), the probe design process can return fewer probes than you specify. Because of the length and high quality of the generated probes, Agilent recommends that you create one probe per target sequence. However, if you design multiple probes per target sequence, you can select the best of those probes after a validation process.
Masking	eArray always uses both of these options in the probe design process, and they cannot be disabled. Vector – Identifies and ignores contaminant segments during probe design. Target sequences can contain contaminant segments that are not actually found in the sample under study. These segments are often artifacts from cloning vectors (e.g., plasmid, phage, BAC, YAC) used in cloning and amplification processes. Repeat – Identifies and ignores repetitive sequences within your target sequences during probe design. The genome of any given organism contains interspersed repeats and low-complexity DNA sequences. These sequences, which are unique at a species level, are replicated many times throughout the genome, and are found in the transcriptome as well. Repetitive regions are poor candidates for unique probes.
Probe Orientation	Select one of these options: Sense – Produces probes in the sense or "coding strand" orientation, similar in sequence to the mRNA targets. Use this option if the sample preparation methodology yields cDNA or cRNA molecules. Antisense – Select this option if you want probes in antisense or "template" orientation, complementary in sequence to the mRNA targets. This is the best option if your samples are directly labeled RNA.
Design Options	Select your preferred design process. These options are only meaningful if you select to design more than one probe per target sequence. Best Probe Methodology – The probe design process favors production of the highest quality probes, rather than even coverage of each target sequence. Best Distribution Methodology – The probe design process favors even coverage of each target sequence, rather than production of the highest quality probes.
Design with 3' bias	Mark this option if you want probes derived mainly from the first 1,000 bp from the 3' end of each of your target sequences. Select this option if you use an Agilent (or other) labeling protocol that uses linear amplification. Linear amplification generates sequences that are shorter than the initial template because the polymerase reaction attenuates. As a result, most of the labeled product represents only the first 1,000 bp from the 3' end of each target sequence, so probes should represent this region.
Allow Probes to be Trimmed	(Available only for Tm-Matching GE Probe Design jobs) Mark this option to allow bases to be removed from candidate probes to increase compliance with the Preferred Probe Tm. eArray will not trim probes to shorter than 45 bases. In concept, a shorter probe has less complementary sequence available, which can reduce its specificity, or infringe on its ability to form a stable duplex with the desired target. However, the risk of this occurring to a significant extent is very low.
Preferred Probe Tm	(Available only for Tm-Matching GE Probe Design jobs) Type the target Tm for the probe design process. The Tm is the temperature at which a probe and its target sequence exist as a 50:50 mixture of duplex and single-stranded forms. Select a probe Tm based on two factors: (1) the mean and standard deviation of the Tm of all potential probes that could be generated for the target transcriptome, and (2) the hybridization temperature identified in the hybridization protocol. In practice, the target Tm should be approximately 20°C higher than the hybridization temperature. For example, if the hybridization temperature is 60°C, then the target probe Tm should be 80°C.
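
The limits in the table above can be checked before you fill in the page. The following Python sketch is illustrative only: the function names and the methodology labels "base_composition" and "tm_matching" are hypothetical, and eArray performs its own validation. The numeric limits and the 20°C rule of thumb come from the table.

```python
def preferred_probe_tm(hybridization_temp_c, offset_c=20.0):
    """Rule of thumb from the table: target Tm is about 20 °C above the
    hybridization temperature. The offset is adjustable by the caller."""
    return hybridization_temp_c + offset_c


def check_parameters(probe_length, probes_per_target, methodology,
                     allow_trimming=False):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not 25 <= probe_length <= 60:
        problems.append("Probe Length must be from 25 to 60 bases")
    if not 1 <= probes_per_target <= 10:
        problems.append("Probes per target must be from 1 to 10")
    # Hypothetical labels for the two methods offered at the top of the page.
    if methodology not in ("base_composition", "tm_matching"):
        problems.append("Unknown methodology: " + repr(methodology))
    if allow_trimming and methodology != "tm_matching":
        problems.append(
            "Allow Probes to be Trimmed is available only for Tm-matching jobs"
        )
    return problems


if __name__ == "__main__":
    # The worked example from the table: 60 °C hybridization gives 80 °C.
    print(preferred_probe_tm(60))
    print(check_parameters(60, 1, "base_composition"))
```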

Specify the following under Target File Details. The target file contains the sequences or GenBank accessions from which eArray generates probes.

- Species – Select the desired species. Species is mandatory.
- Select one of these upload options:

Upload in FASTA format – Click Browse. Select the desired FASTA format target sequence file, then click Open. The name of your file appears in the box. eArray requires a compressed (*.zip format) version of the file.

Upload as GenBank Accessions – Click Browse. Select the desired file of GenBank Accessions, then click Open. The name of the file appears in the box. eArray resolves the GenBank Accessions in the file to actual sequence data before starting the GE Probe Design process. eArray requires a compressed (*.zip format) version of the file.
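
Because eArray validates the uploaded file format at submission, a quick local check of a FASTA file can save a failed upload. The following Python sketch is an assumption-laden pre-check, not the validation that eArray performs: the function name is hypothetical, and the accepted alphabet (IUPAC nucleotide codes) is an assumption that you may need to adjust.

```python
def check_fasta(text, alphabet="ACGTUNRYSWKMBDHV"):
    """Return (record_count, problems) for FASTA-formatted text.

    A record is a header line starting with '>' followed by one or more
    sequence lines. Blank lines are ignored.
    """
    allowed = set(alphabet)
    problems = []
    records = 0
    seq_len = None  # None until the first header or sequence line is seen
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_len == 0:
                problems.append(f"line {number}: previous record has no sequence")
            if len(line) == 1:
                problems.append(f"line {number}: empty header")
            records += 1
            seq_len = 0
        else:
            if seq_len is None:
                problems.append(f"line {number}: sequence data before first header")
                seq_len = 0
            bad = set(line.upper()) - allowed
            if bad:
                problems.append(
                    f"line {number}: unexpected characters {''.join(sorted(bad))}"
                )
            seq_len += len(line)
    if records == 0:
        problems.append("no FASTA records found")
    elif seq_len == 0:
        problems.append("last record has no sequence")
    return records, problems
```

To check a file, read it as text and pass the contents to `check_fasta`, then compress the file for upload only if the returned problem list is empty.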

Under Transcriptome Details, select the type of transcriptome data that you want the probe design process to use. The probe design process uses transcriptome data as a similarity database to eliminate potential probe sequences that would have significant cross-hybridization with targets other than the one of interest.

Use Target File as Transcriptome – Uses the file you specified in Upload Target File as the transcriptome similarity database. Select this option if you are designing a "whole transcriptome" array for an organism not represented within the Agilent transcriptome set, and the target file represents most or all transcripts within the target transcriptome. This option works for uploaded target files containing either actual sequence data, or GenBank accessions.

Select Agilent-provided Transcriptome (by Species) – Uses one of Agilent's available species transcriptome databases. If a transcriptome is available for your species of interest, select this option. These databases have been specifically constructed for use in GE Probe Design. Select the desired species from the list.

Note: The species you select here overrides the species you chose previously.

Upload Transcriptome File – Uses a FASTA format transcriptome file that you provide as the similarity database. To use this option, click Browse. Select the desired transcriptome file, then click Open. The location of the file appears in the box. eArray requires a compressed (*.zip format) version of the file.

Click Submit.

eArray validates your uploaded file(s) for format, and submits your probe design job to Agilent for processing. A message informs you that you will receive an e-mail when your probe design job is finished.
Click Exit.

A search results pane appears at the bottom of the page. Use this pane to monitor the status of the job. When the job has a status of Completed, you can view, download, or delete the results, or create a new probe group with the results.

GE Probe Design is a computation-intensive process. It can take a day or more for your design job to finish, depending on the size of your files and the number of other users' jobs ahead of yours in the queue.