Custom microarray design guidance

Agilent offers the following application-specific guidance to help you create optimal microarray designs:

SureSelect Capture Array design guidance

Additional guidance can also be found in the eArray Frequently Asked Questions (FAQs).

Expression array design guidance

Agilent recommends designing and including the following types of probe sets when generating custom Gene Expression microarray designs:

Agilent’s negative control probes for optimal background-subtraction with the Agilent Feature Extraction software. Agilent negative control probes are included in Agilent’s QC grid. If you want to use customized negative control probes, we recommend that you designate them as non-control probes for the purposes of Agilent’s Feature Extraction software.
Replicated non-control probes for use in the Multiplicative Detrending step of the Agilent Feature Extraction software. The Multiplicative Detrending step detects and corrects for trends in array uniformity and uses replicated non-control probes as a default. A minimum of 15 probes should be replicated 5-10 times on each microarray design. If replicated probes are not used, the default settings should be adjusted in the Feature Extraction protocol.
A probe set representing non-differentially expressed genes, for accurate normalization of microarray experiments in which typical normalization assumptions about differential expression are not met because of a relatively low probe count or a strong bias in differential expression. These probes should span the full range of signal intensity for optimal normalization. Agilent recommends that a minimum of 1% of the non-control probes be part of this list for each custom design. If prior knowledge of non-differentially expressed genes is unavailable, we recommend that these probes be selected to randomly cover the dynamic range of the experiment. For these custom Gene Expression designs, microarray data should be normalized using data from these normalization probes. For custom “whole genome” type arrays, inclusion of a normalization gene list is generally not necessary.
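The replication and normalization recommendations above can be checked before a design is submitted. The following Python sketch is illustrative only and is not part of eArray or Feature Extraction: the function name, the input layout (one probe ID per non-control feature) and the choice to count the 1% against unique non-control probes are all assumptions.

```python
from collections import Counter
import math


def check_expression_design(probe_ids, normalization_ids,
                            min_replicated=15, min_copies=5,
                            min_norm_percent=1):
    """Check a custom expression design against the guidance above.

    probe_ids: one entry per non-control feature; a probe that is
        replicated N times appears N times (hypothetical input layout).
    normalization_ids: probe IDs in the non-differentially expressed list.
    """
    copies = Counter(probe_ids)
    # Probes replicated at least min_copies times (guidance: 5-10 times).
    replicated = [p for p, n in copies.items() if n >= min_copies]
    unique_probes = len(copies)
    # Only count normalization probes that are actually on the design.
    norm_count = len(set(normalization_ids) & set(copies))
    # Assumption: the 1% is taken over unique non-control probes.
    norm_needed = math.ceil(unique_probes * min_norm_percent / 100)
    return {
        "replicated_probes": len(replicated),
        "replication_ok": len(replicated) >= min_replicated,
        "normalization_probes": norm_count,
        "normalization_needed": norm_needed,
        "normalization_ok": norm_count >= norm_needed,
    }
```

A design that fails the replication check can still be used, but, as noted above, the Multiplicative Detrending defaults in the Feature Extraction protocol would then need to be adjusted.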

The process for generation and use of a DyeNorm Gene List for two-color microarray data analysis is described in the Feature Extraction Software User Guide. The non-differentially expressed gene list can also be used for one-color microarray data normalization in downstream applications such as GeneSpring GX, as described in the GeneSpring GX Software User Manual.

Additional guidance can be found in the eArray Frequently Asked Questions (FAQs) for custom Gene Expression microarrays.

CGH array design guidance

Agilent’s High-Definition Comparative Genomic Hybridization (HD-CGH) database provides users the flexibility to create custom microarray designs for analysis of genome regions of interest to them at the resolution of their choosing. Agilent recommends including the following types of probe sets when generating custom CGH microarray designs:

Normalization control probes that represent non-aberrant/non-variant regions for the purpose of accurate normalization of the data using Agilent’s Feature Extraction software – A probe normalization group needs to be used only when the assumption of dye normalization fails. The assumption is that the overall signal intensities between the two channels are the same. When you design a whole genome array, you do not need to include a specific normalization probe group, since there will be a sufficient number of non-aberrant regions for proper normalization. For other situations, Agilent provides a control probe group, or you can select your own. Use this Agilent or user-defined normalization probe group in Feature Extraction (FE) to properly normalize the array. The probes in a dye normalization probe group must occupy at least 1% of the total number of features on the array after filtering. If you do not use an Agilent or user-specified normalization probe group, and the assumptions of dye normalization are not met within reason, FE will not be able to correct all of the systematic bias. This will alter copy number estimates. For proper normalization, specify the normalization probe group in the Agilent Feature Extraction software.
Replicate probes for determining the Reproducibility QC metric in Agilent’s Feature Extraction and DNA Analytics software – The Reproducibility metric is calculated as the median %CV of the background-subtracted signal for these replicate probes after outlier rejection. If you choose not to use the Agilent replicate probe group, it is important to include your own set of replicate probes for calculating reproducibility. For user-defined replicate probe groups, Agilent recommends a minimum of 300 probes replicated five times.
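To make the Reproducibility metric concrete, the sketch below computes a median %CV from replicate signals in Python. It is an illustration under stated assumptions, not the Feature Extraction implementation: the function name and input layout are hypothetical, outlier rejection is assumed to have been done already, and the sample standard deviation is an arbitrary choice because the exact estimator is not stated here.

```python
from statistics import mean, median, stdev


def median_percent_cv(replicate_signals):
    """Median %CV across replicate probe groups.

    replicate_signals: dict mapping probe ID to a list of
        background-subtracted signals, one per replicate feature,
        with outlier features already removed (assumed).
    """
    cvs = []
    for signals in replicate_signals.values():
        if len(signals) < 2:
            continue  # a CV needs at least two replicates
        m = mean(signals)
        if m <= 0:
            continue  # %CV is not meaningful for non-positive means
        cvs.append(100.0 * stdev(signals) / m)
    if not cvs:
        raise ValueError("no probe has enough usable replicate signals")
    return median(cvs)
```

Lower values indicate more reproducible replicate measurements across the array.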

Additional guidance can be found in the eArray Frequently Asked Questions (FAQs) for custom CGH microarrays.

SureSelect Capture Array design guidance

Agilent recommends that you take the following guidelines under consideration when designing your array-based DNA capture microarrays:

Use as tight a probe spacing as your application allows. Agilent recommends a probe spacing of 3 bp between probe starts, because Agilent has done the most extensive testing at this spacing. A 3-bp spacing enables targeting of 700 kb to 800 kb of the genome, depending on the number of independent regions that you target. Agilent has also obtained successful capture results through limited testing of larger probe spacings (including 15 bp and 20 bp), but has observed lower average read depths. In these limited tests, the drop in read depth has been a little better than expected, on average only 75% of the expected drop. Depending on your application and sequencing throughput, this lower read depth may not be adequate.
Pad (extend) your intervals by 100 bp to 200 bp on either side. Although large numbers of reads are typically observed at interval endpoints, Agilent has observed that optimal capture depths are achieved 100 bp to 200 bp inside the interval boundaries. We therefore recommend that you extend your intervals by 100 bp to 200 bp, depending on your needs. For example, if you target exons, do not use exon endpoints directly. Instead, use a set of coordinates that start 100 bp before the start of each exon and end 100 bp after the end of each exon. If you use the Genomic Tiling tool in eArray to select probes, you can use it to extend your intervals.
Be aware of duplicated and "repeat" regions. Agilent does not check all probes to see if they target unique regions of the genome. The only exceptions are probes that contain known repeat regions (as identified by Repeat Masker), which are omitted from the design if you use the Genomic Tiling tool in eArray to design probes. Note that probes that target duplicate regions can produce spurious results or they can produce reads that are too ambiguous to map with your sequencing software. In practice, such probes are relatively rare. To an extent, this issue is a natural consequence of the duplication that occurs in complex genomes.
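The padding and spacing guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not the Genomic Tiling tool: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, the 60-base probe length is an assumption, and padded intervals are not clipped to chromosome ends.

```python
def pad_and_merge(intervals, pad=100):
    """Extend each (chrom, start, end) interval by `pad` bp on both
    sides, then merge intervals that overlap after padding."""
    padded = sorted((c, max(0, s - pad), e + pad) for c, s, e in intervals)
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous interval on the same chromosome.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def probe_starts(interval, probe_length=60, spacing=3):
    """Start coordinates of probes tiled every `spacing` bp so that
    each probe lies fully inside the interval."""
    _chrom, start, end = interval
    return list(range(start, end - probe_length + 1, spacing))
```

For example, two exons at chr1:1000-1200 and chr1:1350-1500 padded by 100 bp merge into a single interval, chr1:900-1600. Counting the probe starts over all padded intervals gives a rough estimate of how many features a design needs at a given spacing.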

For more information about design considerations for array-based DNA capture, please consult Agilent’s application note Complementing Next Generation Sequencing Technologies: Capture and Release Assay Using Agilent DNA Microarrays (#5989-8700EN).

In addition, you may find the Agilent tutorial on SureSelect Capture Arrays useful. To view this tutorial, go to the eArray Login page. Under Additional Information, click SureSelect Capture Array Tutorial with Wizard.

microRNA array design guidance

The Agilent microRNA microarray solution provides a robust and sensitive method for detection of microRNAs from total RNA. Agilent has introduced Human, Mouse and Rat catalog arrays, which have been designed and empirically tested to provide sensitive and specific measurements of all microRNAs from the Sanger miRBase database for these species.

eArray enables the design of custom microRNA arrays: researchers may design microarrays measuring the microRNAs of their choosing from the Sanger miRBase database. The design principles used in the design of our catalog arrays, outlined below, are also applied for these custom arrays. This approach reduces uncertainty around the design of custom arrays while continuing to provide the most sensitive and robust assay to meet the needs of researchers. eArray allows the flexibility for researchers to study the microRNAs of their choice on the 8x15K format.

Agilent microRNA array design principles

Before designing a custom microRNA array, it is useful to understand some of the underlying principles of the Agilent microRNA platform:

Probe design and labeling methods that are linked. The mature microRNAs are labeled via the ligation of a Cy3 conjugated pCp molecule to the 3’ end of the microRNA. This labeling reaction introduces an additional “C” base to the 3’ end of all of the labeled RNA molecules. During probe design, we take advantage of this “C” base, by adding an additional “G” to the 5’ end of the active probe sequence. The addition of this G:C base pair to the probe: microRNA interaction helps stabilize the interaction, and provides some additional selectivity to labeled mature microRNAs.
Multiple probes and probe replicates for each microRNA. Each microRNA represented on an Agilent microRNA array is measured by multiple probes. In addition, each probe sequence is replicated multiple times. This replication allows for both improved robustness, as outlier features are removed during data summarization in Feature Extraction, and improved sensitivity, as the presence of the probe replicates helps drive the hybridization reaction towards equilibrium.
Robust data summarization. The data summarization procedures used in the Agilent Feature Extraction software allow for the summarization of the multiple probes and probe replicates into a robust measurement for each microRNA. This measurement, the “TotalGeneSignal,” is found in both of the Feature Extraction output files: the full text and “GeneView” files. Details as to how the data summarization is completed can be found in the Feature Extraction Reference Guide.
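The link between labeling and probe design described above can be illustrated with a short Python sketch. It shows only the base-pairing logic: the labeled RNA carries an extra 3’ “C”, so its reverse complement begins with a 5’ “G”. The function names and the toy input sequence are hypothetical, and actual Agilent probes may differ in length or contain elements not described here.

```python
# RNA base -> complementary DNA base (probes are assumed to be DNA).
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def labeled_rna(mirna):
    """Mature microRNA (5'->3') after pCp ligation adds a 3' C."""
    return mirna.upper() + "C"


def active_probe(mirna):
    """Reverse complement (5'->3', DNA) of the labeled microRNA.

    The leading G pairs with the C added during labeling.
    """
    target = labeled_rna(mirna)
    return "".join(_COMPLEMENT[base] for base in reversed(target))
```

Because the extra G:C pair can form only with RNA that has been labeled at its 3’ end, this design favors labeled mature microRNAs, as described above.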

Guidance for design of microRNA custom microarrays

Agilent has predesigned probes to all the mature microRNAs for all species* in the most recent Sanger miRBase release. Both the identification of the appropriate probe sequences and the methods implemented for custom microRNA microarray design are intended to provide a robust, sensitive, and specific measurement of your microRNAs of interest. All designs are based on the 8x15K format. In any microRNA design, customers can create “sets” to accommodate designs requiring more than 15,000 features.

[*NOTE: Although Agilent has designed probes for all species in the Sanger miRBase database, users should be aware that the Agilent labeling protocol requires an accessible hydroxyl group on the 3’ end of the microRNA for ligation of a dye-conjugated pCp molecule. microRNAs from some species (mostly plants) have a 3’ modification which may interfere with the Agilent labeling method. Researchers interested in studying 3’ modified microRNAs likely will need to use alternative labeling methods and should plan to conduct experiments to optimize the assay conditions.]

General Design Guidance

Probe search: Searching for probes in eArray is based on the premise that each microRNA is represented by multiple probes. Probes are therefore returned in groups, based on the microRNA to which they are designed. Searches for probes can be performed against multiple miRBase builds, but probes for a microRNA present in the latest build will be returned only when including that build in the search. See below for more information on multiple miRBase build designs.
Microarray Formats: At the present time, only the 8x15K format is enabled in eArray, because microRNA probe and assay design, including probe replicates, RNA input requirements, and labeling reaction conditions are based on that format. For users wishing to measure more microRNAs than can be accommodated on this format, a “Microarray Set” can be created. More information on creating microarray sets can be found below.
Microarray Layout: Selected probes will be laid out based on the design principles outlined above. Users have the option of representing each microRNA by 16 or 20 features. Twenty features will result in slightly more robust data, while 16 features will allow for the inclusion of more microRNAs per array. All probes will be randomly distributed on the array which results in probe replicates being spread across the array yielding the most robust downstream data. Any space not used for the selected probes will be filled by a structural control and considered “blank.” These “blank” features will be ignored in the downstream analysis.
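The trade-off between 16 and 20 features per microRNA can be estimated with simple arithmetic, sketched below in Python. The function name is hypothetical, and the figure of 15,000 usable features per array is an assumption for illustration: the number actually available for selected probes is lower, because part of each array is occupied by control features.

```python
import math


def layout_summary(n_mirnas, features_per_mirna=20, usable_features=15000):
    """Estimate array capacity for a custom microRNA design.

    usable_features is a hypothetical per-array budget, not an
    Agilent specification.
    """
    if features_per_mirna not in (16, 20):
        raise ValueError("eArray offers 16 or 20 features per microRNA")
    per_array = usable_features // features_per_mirna
    arrays_needed = math.ceil(n_mirnas / per_array)
    used = n_mirnas * features_per_mirna
    return {
        "mirnas_per_array": per_array,
        "arrays_needed": arrays_needed,  # more than 1 means a Microarray Set
        "blank_features": arrays_needed * usable_features - used,
    }
```

Under these assumptions, 800 microRNAs fit on one array at 16 features each but need a two-array Microarray Set at 20 features each.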

Guidance for specific cases

Creating an array from multiple miRBase build designs

The eArray probe database contains probes designed to the current Sanger miRBase release. It also contains probes to microRNAs that were present in selected former database builds that may have been removed or changed in the intervening periods. For the earliest Sanger miRBase builds, eArray only contains probes for selected species (9.1, human only; 10.1 and 11.0, human, mouse and rat).

It is possible to design an array containing probes to microRNAs from multiple database builds. The systematic ID given to the “old-build specific” microRNAs will be appended with the database version (e.g., hsa-miR-139_v9.1). Note that not all microRNA sequence changes will result in new probes being designed (e.g., addition of one base to the 5’ end). If the probe sequences did not change, the microRNA is considered unchanged.

Because of the data processing steps in Feature Extraction, each microRNA sequence being measured must have a unique systematic name to enable proper TotalGeneSignal calculation. Because some microRNAs may be present in multiple database builds with the same name but different sequences, the names from the previous builds must be altered.

microRNA sequences from previous builds that are unchanged in the current build are searchable in the most recent build only. microRNAs with name changes between builds are searchable in both builds, but the primary annotation is based on the newest build.
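The naming convention described above (for example, hsa-miR-139_v9.1) can be sketched as follows in Python. The function name, the record layout, the build number "12.0" and the placeholder sequences are all hypothetical; this is not eArray’s implementation.

```python
def assign_systematic_names(records, current_build):
    """Give every measured sequence a unique systematic name.

    records: iterable of (name, build, sequence) tuples.
    Entries from an older build get a "_v<build>" suffix; entries
    from the current build keep their miRBase name.
    """
    names = {}
    for name, build, sequence in records:
        sys_name = name if build == current_build else f"{name}_v{build}"
        # The same systematic name must never map to two sequences.
        if names.get(sys_name, sequence) != sequence:
            raise ValueError(f"name collision for {sys_name}")
        names[sys_name] = sequence
    return names
```

For instance, a microRNA named hsa-miR-139 whose sequence differs between build 9.1 and the current build yields two entries, hsa-miR-139_v9.1 and hsa-miR-139, so each sequence can be summarized separately.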

Creating multi-species arrays – It is possible to design a multi-species array using eArray. Due to the similarity of certain cross-species microRNAs and the design of the microRNA assay, the layout of such designs needs to be done very carefully. This system is designed to allow you as much flexibility as possible to design multi-species arrays while maintaining the Agilent microRNA microarray design principles.

Primary species must be identified. When microRNAs are selected for a given design from multiple species and those microRNAs are identical, probes to only one of those microRNAs will be incorporated in the design. A species priority order will be used to determine which probes will be incorporated in the design, with the primary species getting the first priority. Users must choose the primary species when creating a microarray. The priority of the remaining species follows Agilent’s pre-defined species priority order.

Probes for all microRNAs of the first priority species will be incorporated in the design.
Probes for all microRNAs for the 2nd priority species that are not already measured by the existing probes are added to the design.
Probes for the microRNAs for the remaining species are added to the design according to the species priority order.

Annotation considerations. Probes will be annotated on the array in line with the aforementioned priority order.
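The priority rules above amount to keeping, for each distinct microRNA sequence, the entry from the highest-priority species. A minimal Python sketch follows. The function name, record layout, species codes and placeholder sequences are hypothetical, and sequence identity is used as the test for “identical” microRNAs, which is an assumption.

```python
def select_multispecies(mirnas, primary, priority_order):
    """Pick one microRNA per distinct sequence by species priority.

    mirnas: list of (species, name, sequence) tuples.
    primary: user-chosen primary species (gets first priority).
    priority_order: pre-defined order for the remaining species;
        every species in `mirnas` must appear in it or be primary.
    Returns a dict mapping sequence -> (species, name) used for
    both probe selection and annotation.
    """
    order = [primary] + [s for s in priority_order if s != primary]
    rank = {species: i for i, species in enumerate(order)}
    chosen = {}
    # Visit higher-priority species first; later duplicates are skipped.
    for species, name, sequence in sorted(mirnas, key=lambda m: rank[m[0]]):
        chosen.setdefault(sequence, (species, name))
    return chosen
```

With human as the primary species, a sequence shared by a human and a mouse microRNA is represented and annotated by the human entry; choosing mouse as the primary species reverses that.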

Creating a “Microarray Set” – For users who wish to measure more microRNAs than can be accommodated on one design, eArray can generate a microRNA “Microarray Set.” In this case, probes for the selected microRNAs are distributed across as many arrays as are needed to include them all. Microarray designs that are part of sets cannot be split into individual designs within eArray.

Species priority list

Agilent has established the following species priority list for multi-species microRNA array designs. For details, see Creating multi-species arrays, above.

Priority Order	Species Name	Common Name
1	Homo sapiens	human
2	Mus musculus	mouse
3	Rattus norvegicus	rat
4	Pan troglodytes	chimp
5	Macaca mulatta	rhesus monkey
6	Gallus gallus	chicken
7	Oryza sativa	rice
8	Ornithorhynchus anatinus	platypus
9	Physcomitrella patens	moss
10	Populus trichocarpa	poplar
11	Arabidopsis thaliana	arabidopsis
12	Danio rerio	zebrafish
13	Canis familiaris	dog
14	Caenorhabditis elegans	worm
15	Xenopus tropicalis	frog
16	Drosophila melanogaster	fruit fly
17	Vitis vinifera	grape
18	Bos taurus	cow
19	Monodelphis domestica	opossum
20	Fugu rubripes	fugu (Japanese pufferfish)
21	Tetraodon nigroviridis	pufferfish
22	Zea mays	corn
23	Caenorhabditis briggsae	worm
24	Pan paniscus	bonobo
25	Gorilla gorilla	gorilla
26	Pongo pygmaeus	orangutan
27	Drosophila erecta	fruit fly
28	Chlamydomonas reinhardtii	alga
29	Drosophila ananassae	fruit fly
30	Drosophila sechellia	fruit fly
31	Sorghum bicolor	sorghum
32	Drosophila yakuba	fruit fly
33	Drosophila virilis	fruit fly
34	Glycine max	soybean
35	Macaca nemestrina	pig-tailed macaque
36	Drosophila pseudoobscura	fruit fly
37	Drosophila grimshawi	fruit fly
38	Drosophila mojavensis	fruit fly
39	Drosophila willistoni	fruit fly
40	Drosophila persimilis	fruit fly
41	Drosophila simulans	fruit fly
42	Selaginella moellendorffii	spikemoss
43	Oikopleura dioica	tunicate
44	Sus scrofa	boar
45	Schmidtea mediterranea	flatworm
46	Ateles geoffroyi	spider monkey
47	Bombyx mori	silkworm
48	Anopheles gambiae	mosquito
49	Apis mellifera	honey bee
50	Brassica napus	rapeseed
51	Lagothrix lagotricha	brown woolly monkey
52	Tribolium castaneum	beetle
53	Saguinus labiatus	tamarin
54	Pinus taeda	pine
55	Ciona intestinalis	tunicate
56	Triticum aestivum	wheat
57	Epstein Barr	human virus
58	Medicago truncatula	clover
59	Solanum lycopersicum	tomato
60	Ciona savignyi	tunicate
61	Mouse cytomegalovirus	mouse virus
62	Rhesus lymphocryptovirus	rhesus virus
63	Mareks disease	chicken virus
64	Mareks disease	chicken virus
65	Saccharum officinarum	sugarcane
66	Kaposi sarcoma-associated	human virus
67	Lemur catta	ring-tailed lemur
68	Human cytomegalovirus	human virus
69	Gossypium hirsutum	cotton
70	Mouse gammaherpesvirus	mouse virus
71	Symphalangus syndactylus	gibbon
72	Pygathrix bieti	black snub-nosed monkey
73	Herpes Simplex	human virus
74	Rhesus monkey	rhesus monkey
75	Xenopus laevis	frog
76	Human immunodeficiency	human virus
77	Ovis aries	sheep
78	BK polyomavirus	human virus
79	Dictyostelium discoideum	amoeba
80	Gossypium raimondii	cotton
81	JC polyomavirus	human virus
82	Simian virus	monkey virus
83	Brassica oleracea	wild mustard
84	Brassica rapa	cabbages
85	Cricetulus griseus	Chinese hamster
86	Carica papaya	papaya
87	Gossypium herbaceum	cotton