Searching for probes

The basic elements of a microarray are probes, and eArray provides many ways for you to define the probes that you want. One of the main ways is to search for them. In eArray, you can retrieve probes of interest with a variety of search methods. In addition, a probe search provides an easy way to construct a probe group based upon your own criteria. You then use one or more probe groups to create a microarray design. See Why create probe groups.

You can customize eArray to pre-set certain search parameters. You can also customize the content and column order of search results. See Set user preferences.

This help topic contains the following sections:

eArray probe search tools

Choosing a search tool

Additional guidance from Agilent

eArray probe search tools

You can search the probes in the eArray probe database in several ways:

Type of Search	What Is	How To	Example
Search Your Probes	What is	How to	Examples
GO Search (Expression application type only)	What is	How to	Example
HD Probe Search (CGH and ChIP application types only)	What is	How to	Example
SNP Probe Search (CGH application type only)	What is	How to	Example
Exon Probe Search (Expression application type only)	What is	How to	Example

Note: The eArray tools for the SureSelect Target Enrichment and SureSelect RNA Enrichment application types are described in a separate section of this help system. See SureSelect Target Enrichment libraries and RNA enrichment libraries.

Choosing a search tool

For probes that are already designed, the search methodology that you use to find probes depends upon your level of familiarity with the target sequences, and the specific kind of study that you want to do. For example, if you want to do a differential gene expression study, but you do not know specific transcript or gene identifiers, you can use GO Search to return probes for genes based upon biological process or molecular function.

You could also use Search Your Probes to take a similar, but less structured approach. When you enter one or more keywords as search criteria, you can search probe annotation, which often describes a target gene's function. A search can return all probes whose annotation contains the word or phrase you typed. If you know the specific targets for which you want probes, you can also use Search Your Probes to advantage. You can upload lists of GenBank IDs or gene symbols, and return all probes that match the criteria. This allows you to be very specific about your search criteria.

Each probe search tool has distinct advantages, as described in the table below.

Search tool	Comments
Search Your Probes	This tool gives you many options to retrieve the probes that you want. You can type a single search term, and the search can return all probes that contain the term in any of their annotation. This is a good search methodology to use when you first explore the content of the database, and want to see what types of probes exist in the system. For example, you can type the term kinase, and the search returns all probes that have this word in their annotation, including probes with annotation such as protein kinase C, delta, and hexokinase 3. You can also select a specific type of annotation or accession, and enter one or many search terms of that type. This is most useful when you want to use identifiers such as probe IDs, GenBank IDs, or gene symbols. You can simply upload a list, and the search returns all probes that match.
GO Search	A Gene Ontology (GO) Search is a good way to identify probes for genes and gene products that are associated with biological processes, molecular functions, and/or cellular components that you want to investigate. You enter a standard GO term, and the search retrieves all of the probes that are associated with the term. eArray can also help you find relevant GO terms to use in the search. This type of search is available for standard Expression probes.
HD Probe Search	You can search the HD CGH and ChIP databases to return probes within specified genomic regions, at a very high density. You can then use these probes to create a microarray that has higher resolution than is seen for catalog array offerings for these applications. To do an HD search, you define the desired genomic regions, and the density or number of probes that you want.
SNP Probe Search	This specialized search is the only way to retrieve Agilent SNP probes, which are probes that are designed specifically for Agilent CGH+SNP microarrays. You can use CGH+SNP microarrays to deduce the genotypes of SNP sites, calculate allele-specific SNP copy numbers, and find regions of loss of heterozygosity (LOH). This search is available in the CGH application type. For more information, see CGH+SNP Microarrays.
Exon Probe Search	This specialized search is the only way to retrieve Agilent Exon probes, which are probes that are designed specifically for Gene Expression Exon microarrays. You can use these microarrays to study differential splicing between tissues, between disease and non-disease states (e.g., cancerous vs. non-cancerous), and between different forms of the same disease. This search is available in the Expression application type. For more information, see Exon microarrays.
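
The keyword behavior described for Search Your Probes in the table above can be pictured as a simple substring match. The Python sketch below is only an illustration: the probe IDs are hypothetical, the function name `keyword_search` is not part of eArray, and case-insensitive matching is an assumption.

```python
def keyword_search(annotations, term):
    """Return probe IDs whose annotation contains the term as a substring."""
    needle = term.lower()  # assumption: matching ignores case
    return [pid for pid, text in annotations.items() if needle in text.lower()]


# Hypothetical probe IDs mapped to annotation text.
annotations = {
    "probe_1": "protein kinase C, delta",
    "probe_2": "hexokinase 3",
    "probe_3": "tumor protein p53",
}

print(keyword_search(annotations, "kinase"))
```

Because the match is on any part of the annotation, the term kinase also returns the hexokinase probe, which is why a single keyword is useful for exploring the database but less precise than a list of identifiers.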

Additional guidance from Agilent

How do I create a complex CGH search that yields probes at different resolutions for different genomic regions and also with different filtering criteria for different regions?

This must be done through separate, iterative searches. Each search creates a different Probe Group, and you then design the microarray by combining the Probe Groups. Once all of the Probe Groups are created, use the array calculator on the Microarray tab to determine which array format best fits the given number of probes. Utilities on the Probe Group page let you compare Probe Groups and remove probes that are duplicated across them.
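
The bookkeeping behind combining Probe Groups can be pictured with a short Python sketch. It only illustrates the idea of merging several groups, dropping probes that occur in more than one group, and counting the total that the array format must hold. The group names and probe IDs are hypothetical, and eArray performs these steps through its own utilities, not through code like this.

```python
def combine_probe_groups(groups):
    """Merge probe groups in order, keeping the first occurrence of each probe ID."""
    seen = set()
    combined = []
    for probe_ids in groups.values():
        for pid in probe_ids:
            if pid not in seen:
                seen.add(pid)
                combined.append(pid)
    return combined


# Hypothetical Probe Groups from two separate searches; "p3" appears in both.
groups = {
    "region1_high_resolution": ["p1", "p2", "p3"],
    "region2_low_resolution": ["p3", "p4"],
}

combined = combine_probe_groups(groups)
print(len(combined), combined)
```

The final count is the number you would compare against the capacity of each candidate array format.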

For the CGH application, what are the advantages and disadvantages of using the ‘Genomic Tiling’ function (under the “Probe” tab) compared to using Agilent HD-CGH Database Probes?

Overall, probes generated using the ‘Genomic Tiling’ function will perform more poorly than probes found in the Agilent HD-CGH Probe Database. Agilent very strongly recommends using the Agilent HD-CGH Probe Database and not the ‘Genomic Tiling’ option. Only for regions where there are not enough HD probes available in the database should ‘Genomic Tiling’ be considered.

All HD probes in the database (except for probes in regions in which no optimal-Tm probes exist) have been Tm matched and have a predicted performance score (based on Tm, GC content, hairpin ΔG, sequence complexity, and metrics to measure homology with the rest of the reference genome). The eArray pair-wise reduction algorithm will pick the best HD probes based on the user-selected average HD probe spacing per interval or the total number of HD probes.

Additionally, the HD probes have passed a Tm filter during design and are annotated so that you can choose between different similarity filtering options (non-unique probe filter, perfect match filter, or similarity score filter). If catalog probes are present in the search results, they can be preferentially selected. In contrast, eArray’s ‘Genomic Tiling’ feature picks probes at a fixed spacing, with no opportunity to balance Tm or to select probes by predicted performance. Probes created this way should perform no better than probes picked at random. The only options to improve probe performance are probe trimming and skipping of repeat-masked regions.

See the figure below for an example of CGH data from Agilent HD-CGH probes compared to ‘Genomic Tiling’ probes. The median log2 ratios of the HD-CGH probes are closer to the expected value of -1 with a smaller spread when compared to the ‘Genomic Tiling’ probes.

When designing CGH microarrays, how can I avoid GC-rich, high-Tm or repeat regions?

For probes selected using the Agilent HD-CGH Probe Database:

All HD probes in the database have been designed to avoid repeat-masked regions, are Tm matched (except for probes in regions in which no optimal-Tm probes exist), and have a predicted performance score (based on Tm, GC content, hairpin ΔG, sequence complexity, and metrics to measure homology with the rest of the reference genome). The eArray pair-wise reduction algorithm will pick the best HD probes based on the user-selected average HD probe spacing per interval or the total number of HD probes. Additionally, HD probes can be filtered using a Tm filter or a similarity filter (non-unique probe filter, perfect match filter, or similarity score filter), and catalog probes can be preferentially selected.

For probes selected using ‘Genomic Tiling’:

Probes in GC-rich, high-Tm, and repeat regions can be very problematic. The only options in ‘Genomic Tiling’ are probe trimming and skipping of repeat-masked regions. The new probe set can be further improved by filtering out high-Tm or GC-rich probes. Tm and GC% can be obtained from eArray by using the Score Custom Probes utility under the Probe tab. A probe search using the passing probe IDs can then be used to create a new Probe Group.
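
As an illustration of this filtering step, the Python sketch below keeps only probes at or below chosen Tm and GC% limits. The input rows, the field names (`probe_id`, `sequence`, `tm`), and the thresholds are assumptions made for this example; they are not the actual column names or recommended cutoffs from the Score Custom Probes output. GC% is computed from the sequence here only so that the example is self-contained.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a probe sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passing_probe_ids(probes, max_tm, max_gc):
    """Return IDs of probes whose Tm and GC% are at or below the limits."""
    return [
        p["probe_id"]
        for p in probes
        if p["tm"] <= max_tm and gc_percent(p["sequence"]) <= max_gc
    ]


# Hypothetical probes: IDs, sequences, and Tm values are made up for illustration.
probes = [
    {"probe_id": "probe_A", "sequence": "ATGCATGCATATTA", "tm": 78.0},
    {"probe_id": "probe_B", "sequence": "GCGCGGCCGCGGCC", "tm": 92.5},
]

# Thresholds are placeholders, not Agilent recommendations.
print(passing_probe_ids(probes, max_tm=85.0, max_gc=60.0))
```

The resulting list of IDs is what you would supply to a probe search to build the new Probe Group.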

In the CGH HD Probe Search, which ‘Similarity Filter’ will work for my design? What are the consequences of using or not using the filters?

When ‘No Filter’ is selected any probe may be selected, regardless of similarity to other genomic sites. Keep in mind that data from “non-unique” probes will be harder to interpret, and it can be beneficial to limit the maximum number of perfect genomic hits by using the Non-Unique Probe Filter option.

When the Perfect Match Filter is selected (or when the maximum number of perfect genomic hits is set to 1 in the Non-Unique Probe Filter), probes with more than one perfect match to the genome are excluded. As a result, it is not possible to find probes in Segmental Duplication or Pseudo-Autosomal Regions (PAR).

The Similarity Score Filter is the most stringent filter. It excludes probes with significant similarity to other sites in the genome, so there will be genomic regions where no probes can be found.
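
The effect of the hit-count limit can be pictured with a small Python sketch. It is a simplified model of the behavior described above, not eArray's implementation: the probe records and the `perfect_hits` field are hypothetical, and the Similarity Score Filter is not modeled because its scoring is not described here.

```python
def non_unique_probe_filter(probes, max_perfect_hits=None):
    """Keep probes with at most max_perfect_hits perfect genomic matches.

    max_perfect_hits=None models 'No Filter'; a value of 1 models the
    Perfect Match Filter.
    """
    if max_perfect_hits is None:
        return list(probes)
    return [p for p in probes if p["perfect_hits"] <= max_perfect_hits]


# Hypothetical probes with made-up perfect-hit counts.
probes = [
    {"probe_id": "unique_probe", "perfect_hits": 1},
    {"probe_id": "duplicated_region_probe", "perfect_hits": 2},
    {"probe_id": "repeat_like_probe", "perfect_hits": 5},
]

for limit in (None, 2, 1):
    kept = [p["probe_id"] for p in non_unique_probe_filter(probes, limit)]
    print(limit, kept)
```

With a limit of 1, the probe in the duplicated region is dropped, which mirrors why no probes are returned for segmental duplications under the Perfect Match Filter.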

In the CGH HD Probe Search, how do I target exons only, not just genes?

There are two ways to do this:

Use Include Regions and select Exonic. This limits the probes returned to those marked as overlapping exons. It works when you search by genomic intervals or by gene annotations (transcript and gene identifiers). Be aware that exons are typically more GC-rich than the rest of the genome, so these probes will, in general, have lower probe scores. See the FAQ “What is a CGH probe score?” for more information. With this “Exonic” approach, it is not possible to select the nearest probes in introns instead.

Identify the coordinates of the exons outside of eArray and use them as Genomic Intervals in which to select probes. With the “Extended Interval Boundary” option in eArray, you can then extend the Genomic Intervals beyond the exon boundaries by a selected number of base pairs. The instructions below describe how to identify exon coordinates in UCSC.

In the Table Browser utility of the UCSC genome browser, make sure that the proper species and genome build are selected from the drop-down lists. It is important that the build matches the one used in eArray. For “track”, select the desired gene definition track; Agilent recommends “CCDS”, “RefSeq Genes”, or “UCSC genes”. Follow the UCSC instructions on how to restrict the search. Options include filtering on genomic regions or on track identifiers (names/accessions). Make sure that the output contains chrom, exonStarts, and exonEnds. This gives you all of the exon coordinates for the regions you defined.

To input these locations into an eArray HD Probe Search, split each line into pieces (one for each exon) and add 1 to each start coordinate (the start coordinates in the output are 0-based, whereas eArray expects 1-based coordinates; the end coordinates need no adjustment). The orientation (i.e., strand information) does not matter in an eArray probe search. For example, the following line defining three exons (chrom; exon count; exon starts; exon ends):

chr1; 3; 2450045, 2451144, 2451409; 2451048, 2451310, 2451544

can be converted to the format needed in eArray:

chr1:2450046-2451048
chr1:2451145-2451310
chr1:2451410-2451544
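
For more than a handful of exons, the conversion can be scripted. The following Python sketch assumes the semicolon-separated layout of the example line above (chrom; exon count; exon starts; exon ends). The function name `ucsc_exons_to_intervals` is illustrative and is not part of eArray or UCSC. Actual Table Browser output is normally tab-separated and its coordinate lists can end with a trailing comma, so adapt the field splitting to your file.

```python
def ucsc_exons_to_intervals(line):
    """Convert one 'chrom; exonCount; exonStarts; exonEnds' line into
    1-based 'chrom:start-end' strings, one per exon."""
    chrom, count, starts, ends = [field.strip() for field in line.split(";")]
    # Skip empty items so that lists with a trailing comma are tolerated.
    start_list = [int(s) for s in starts.split(",") if s.strip()]
    end_list = [int(e) for e in ends.split(",") if e.strip()]
    if not (len(start_list) == len(end_list) == int(count)):
        raise ValueError("exon count does not match the coordinate lists")
    # Starts are 0-based, so add 1; ends are used unchanged.
    return [f"{chrom}:{s + 1}-{e}" for s, e in zip(start_list, end_list)]


line = "chr1; 3; 2450045, 2451144, 2451409; 2451048, 2451310, 2451544"
for interval in ucsc_exons_to_intervals(line):
    print(interval)
```

Run on the example line, this prints the same three intervals listed above, one per line, ready to paste into the search.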