Agilent offers the following application-specific guidance to help you create optimal microarray designs:
Expression array design guidance
CGH array design guidance
SureSelect Capture Array design guidance
microRNA array design guidance
Additional guidance can also be found in the eArray Frequently Asked Questions (FAQs).
Expression array design guidance
Agilent recommends designing and including the following types of probe sets when generating custom Gene Expression microarray designs:
Agilent’s negative control probes for optimal background subtraction with the Agilent Feature Extraction software. Agilent negative control probes are included in Agilent’s QC grid. If you want to use customized negative control probes, we recommend that you designate them as non-control probes for the purposes of Agilent’s Feature Extraction software.
Replicated non-control probes for use in the Multiplicative Detrending step of the Agilent Feature Extraction software. The Multiplicative Detrending step detects and corrects for trends in array uniformity and uses replicated non-control probes as a default. A minimum of 15 probes should be replicated 5-10 times on each microarray design. If replicated probes are not used, the default settings should be adjusted in the Feature Extraction protocol.
Probe set representing non-differentially expressed genes for use in accurate normalization of microarray experiments where typical normalization assumptions about differential expression are not met because of a relatively low probe count or a strong bias in differential expression. These probes should span the full range of signal intensity for optimal normalization. Agilent recommends that a minimum of 1% of the non-control probes be part of this list for each custom design. If prior knowledge of non-differentially expressed genes is unavailable, we recommend that these probes be selected to randomly cover the dynamic range of the experiment. For these custom Gene Expression designs, microarray data should be normalized using data from these normalization probes. For custom “whole genome” type arrays, inclusion of a normalization gene list is generally not necessary.
The process for generation and use of a DyeNorm Gene List for two-color microarray data analysis is described in the Feature Extraction Software User Guide. The non-differentially expressed gene list can also be used for one-color microarray data normalization in downstream applications such as GeneSpring GX, as described in the GeneSpring GX Software User Manual.
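One way to "randomly cover the dynamic range" when no list of non-differentially expressed genes is available is stratified random sampling: rank candidate probes by signal, cut the ranking into bins, and draw from each bin. The following is a minimal Python sketch of that idea, not a feature of eArray or Feature Extraction. The function name, the use of pilot-experiment intensities as input, and the number of bins are assumptions made for illustration; only the 1% minimum comes from the guidance above.

```python
import math
import random


def pick_normalization_probes(intensities, fraction=0.01, n_bins=10, seed=0):
    """Pick probe IDs that cover the intensity range.

    intensities: dict of probe ID -> signal from pilot data (assumed input).
    fraction: share of non-control probes to select (guidance: at least 1%).
    n_bins: number of intensity strata (an arbitrary illustrative choice).
    """
    rng = random.Random(seed)
    n_target = max(1, math.ceil(fraction * len(intensities)))
    # Rank probes by signal, then cut the ranking into equal-sized bins
    ranked = sorted(intensities, key=lambda p: intensities[p])
    n_bins = min(n_bins, n_target)
    bins = [ranked[i * len(ranked) // n_bins:(i + 1) * len(ranked) // n_bins]
            for i in range(n_bins)]
    chosen = []
    for i, members in enumerate(bins):
        # Spread the target count over the bins as evenly as possible
        quota = n_target // n_bins + (1 if i < n_target % n_bins else 0)
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen
```

For a design with 2,000 non-control probes, this returns 20 probe IDs, two from each of ten intensity strata. The resulting list would still need to be supplied to the normalization step as described in the relevant user guide.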
Additional guidance can be found in the eArray Frequently Asked Questions (FAQs) for custom Gene Expression microarrays.
CGH array design guidance
Agilent’s High-Definition Comparative Genomic Hybridization (HD-CGH) database provides users the flexibility to create custom microarray designs for analysis of genome regions of interest to them at the resolution of their choosing. Agilent recommends including the following types of probe sets when generating custom CGH microarray designs:
Normalization control probes that represent non-aberrant/non-variant regions for the purpose of accurate normalization of the data using Agilent’s Feature Extraction software – A probe normalization group needs to be used only when the assumption of dye normalization fails. The assumption is that the overall signal intensities between the two channels are the same. When you design a whole genome array, you do not need to include a specific normalization probe group, since there will be a sufficient number of non-aberrant regions for proper normalization. For other situations, Agilent provides a control probe group, or you can select your own. Use this Agilent or user-defined normalization probe group in Feature Extraction (FE) to properly normalize the array. The probes in a dye normalization probe group must occupy at least 1% of the total number of features on the array after filtering. If you do not use an Agilent or user-specified normalization probe group, and the assumptions of dye normalization are not met within reason, FE will not be able to correct all of the systematic bias. This will alter copy number estimates. For proper normalization, specify the normalization probe group in the Agilent Feature Extraction software.
Replicate probes for determining the Reproducibility QC metric in Agilent’s Feature Extraction and DNA Analytics software – The Reproducibility metric is calculated as the Median %CV of background-subtracted signal for these replicate probes after outlier rejection. If you choose not to use the Agilent replicate probe group, it is important to include your own set of replicate probes so that reproducibility can still be calculated. For user-defined replicate probe groups, Agilent recommends a minimum of 300 probes replicated five times.
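To make the Median %CV definition concrete, the sketch below computes a coefficient of variation for each replicated probe and takes the median across probes. It is an illustration only: the function name and input layout are assumptions, the sample standard deviation is used, and the outlier rejection performed by Feature Extraction is assumed to have been applied already because its details are not described here.

```python
import statistics


def median_percent_cv(replicate_signals):
    """Median %CV across replicated probes.

    replicate_signals: dict of probe ID -> list of background-subtracted
    signals, one value per replicate feature (outliers already removed).
    Raises statistics.StatisticsError if no probe yields a usable CV.
    """
    cvs = []
    for signals in replicate_signals.values():
        if len(signals) < 2:
            continue  # a CV needs at least two replicates
        mean = statistics.mean(signals)
        if mean <= 0:
            continue  # %CV is not meaningful for a non-positive mean signal
        cvs.append(100.0 * statistics.stdev(signals) / mean)
    return statistics.median(cvs)
```

With replicate signals of `[100, 100, 100]`, `[90, 100, 110]`, and `[80, 100, 120]`, the per-probe CVs are 0%, 10%, and 20%, so the median is 10%.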
Additional guidance can be found in the eArray Frequently Asked Questions (FAQs) for custom CGH microarrays.
SureSelect Capture Array design guidance
Agilent recommends that you take the following guidelines under consideration when designing your array-based DNA capture microarrays:
Use as tight a probe spacing as your application allows. Agilent recommends a probe spacing of 3 bp between probe starts, because Agilent has done the most extensive testing at this spacing. A 3-bp spacing enables targeting of 700 kb to 800 kb of the genome, depending on the number of independent regions that you target. Agilent has also obtained successful capture results through limited testing of larger probe spacings (including 15 bp and 20 bp), but has observed lower average read depths. In these limited tests, the drop in read depth was somewhat smaller than expected, averaging only 75% of the expected drop. Depending on your application and sequencing throughput, this lower read depth may not be adequate.
Pad (extend) your intervals by 100 bp to 200 bp on either side. Although large numbers of reads are typically observed at interval endpoints, Agilent has observed that optimal capture depths are achieved 100 bp to 200 bp inside the interval boundaries. We therefore recommend that you extend your intervals by 100 bp to 200 bp, depending on your needs. For example, if you target exons, do not use exon endpoints directly. Instead, use a set of coordinates that starts 100 bp before the start of each exon and ends 100 bp after the end of each exon. If you use the Genomic Tiling tool in eArray to select probes, you can use it to extend your intervals.
Be aware of duplicated and "repeat" regions. Agilent does not check all probes to see if they target unique regions of the genome. The only exceptions are probes that contain known repeat regions (as identified by RepeatMasker), which are omitted from the design if you use the Genomic Tiling tool in eArray to design probes. Note that probes that target duplicated regions can produce spurious results, or reads that are too ambiguous to map with your sequencing software. In practice, such probes are relatively rare. To an extent, this issue is a natural consequence of the duplication that occurs in complex genomes.
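The padding and spacing guidelines above can be sketched in a few lines of Python. This is a rough illustration, not the eArray Genomic Tiling tool: the function names are hypothetical, coordinates are assumed to be 0-based, half-open intervals on a single chromosome, and the 60-base probe length is an assumed value used only to decide which probe starts fit inside an interval.

```python
def pad_and_merge(intervals, pad=100):
    """Extend each (start, end) interval by `pad` bp per side and merge overlaps."""
    padded = sorted((max(0, s - pad), e + pad) for s, e in intervals)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def probe_starts(interval, probe_len=60, spacing=3):
    """Start coordinates of probes tiled every `spacing` bp that fit in the interval."""
    start, end = interval
    return list(range(start, end - probe_len + 1, spacing))
```

For two exons at (1000, 1200) and (1350, 1500), padding by 100 bp makes the intervals overlap, so they merge into (900, 1600), which holds 214 probe starts at 3-bp spacing under the assumed probe length. Merging matters because padded neighbors would otherwise be tiled twice.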
For more information about design considerations for array-based DNA capture, please consult Agilent’s application note Complementing Next Generation Sequencing Technologies: Capture and Release Assay Using Agilent DNA Microarrays (#5989-8700EN).
In addition, you may find the Agilent tutorial on SureSelect Capture Arrays useful. To view this tutorial, go to the eArray Login page. Under Additional Information, click SureSelect Capture Array Tutorial with Wizard.
microRNA array design guidance
The Agilent microRNA microarray solution provides a robust and sensitive method for detection of microRNAs from total RNA. Agilent has introduced Human, Mouse and Rat catalog arrays, which have been designed and empirically tested to provide sensitive and specific measurements of all microRNAs from the Sanger miRBase database for these species.
eArray enables the design of custom microRNA arrays: researchers may design microarrays measuring the microRNAs of their choosing from the Sanger miRBase database. The design principles used in the design of our catalog arrays, outlined below, are also applied for these custom arrays. This approach reduces uncertainty around the design of custom arrays while continuing to provide the most sensitive and robust assay to meet the needs of researchers. eArray allows the flexibility for researchers to study the microRNAs of their choice on the 8x15K format.
Agilent microRNA array design principles
Before designing a custom microRNA array, it is useful to understand some of the underlying principles of the Agilent microRNA platform:
Probe design and labeling methods that are linked. The mature microRNAs are labeled via the ligation of a Cy3-conjugated pCp molecule to the 3’ end of the microRNA. This labeling reaction introduces an additional “C” base to the 3’ end of all of the labeled RNA molecules. During probe design, we take advantage of this “C” base by adding an additional “G” to the 5’ end of the active probe sequence. The addition of this G:C base pair to the probe:microRNA interaction helps stabilize the interaction and provides some additional selectivity for labeled mature microRNAs.
Multiple probes and probe replicates for each microRNA. Each microRNA represented on an Agilent microRNA array is measured by multiple probes. In addition, each probe sequence is replicated multiple times. This replication allows for both improved robustness, as outlier features are removed during data summarization in Feature Extraction, and improved sensitivity, as the presence of the probe replicates helps drive the hybridization reaction towards equilibrium.
Robust data summarization. The data summarization procedures used in the Agilent Feature Extraction software allow for the summarization of the multiple probes and probe replicates into a robust measurement for each microRNA. This measurement, the “TotalGeneSignal,” is found in both of the Feature Extraction output files: the full text and “GeneView” files. Details as to how the data summarization is completed can be found in the Feature Extraction Reference Guide.
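The link between labeling and probe design described in the first principle can be illustrated with a simplified sketch. It builds the reverse complement of a mature microRNA as DNA and prepends the extra G that pairs with the C added during labeling. The function name is hypothetical, and actual Agilent probe sequences are supplied by eArray and may include design elements that are not described here.

```python
# DNA complement for each RNA base of the target microRNA
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def sketch_probe(mirna_rna):
    """Return a simplified DNA probe (5'->3') for a mature microRNA (5'->3', RNA).

    The leading G pairs with the C that pCp labeling adds to the microRNA 3' end.
    """
    target = mirna_rna.upper()
    reverse_complement = "".join(COMPLEMENT[base] for base in reversed(target))
    return "G" + reverse_complement
```

For the toy sequence `UAGC`, the reverse complement is `GCTA`, so the sketch returns `GGCTA`. An unlabeled microRNA lacks the 3' C and cannot form the extra G:C pair, which is the source of the added selectivity for labeled molecules.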
Guidance for design of microRNA custom microarrays
Agilent has predesigned probes to all the mature microRNAs for all species* in the most recent Sanger miRBase release. Both the identification of the appropriate probe sequences and the methods implemented for custom microRNA microarray design are designed to provide a robust, sensitive, and specific measurement of your microRNAs of interest. All designs are based on the 8x15K format: in any microRNA design, customers have the ability to create “sets” to accommodate designs requiring more than 15,000 features.
[*NOTE: Although Agilent has designed probes for all species in the Sanger miRBase database, users should be aware that the Agilent labeling protocol requires an accessible hydroxyl group on the 3’ end of the microRNA for ligation of a dye-conjugated pCp molecule. microRNAs from some species (mostly plants) have a 3’ modification which may interfere with the Agilent labeling method. Researchers interested in studying 3’ modified microRNAs likely will need to use alternative labeling methods and should plan to conduct experiments to optimize the assay conditions.]
General Design Guidance
Probe search: Searching for probes in eArray is based on the premise that each microRNA is represented by multiple probes. Probes are therefore returned in groups, based on the microRNA to which they are designed. Searches for probes can be performed against multiple miRBase builds, but probes for a microRNA present in the latest build will be returned only when including that build in the search. See below for more information on multiple miRBase build designs.
Microarray Formats: At the present time, only the 8x15K format is enabled in eArray, because microRNA probe and assay design, including probe replicates, RNA input requirements, and labeling reaction conditions are based on that format. For users wishing to measure more microRNAs than can be accommodated on this format, a “Microarray Set” can be created. More information on creating microarray sets can be found below.
Microarray Layout: Selected probes will be laid out based on the design principles outlined above. Users have the option of representing each microRNA by 16 or 20 features. Twenty features will result in slightly more robust data, while 16 features will allow for the inclusion of more microRNAs per array. All probes will be randomly distributed on the array, so that probe replicates are spread across the array and yield the most robust downstream data. Any space not used for the selected probes will be filled by a structural control and considered “blank.” These “blank” features will be ignored in the downstream analysis.
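The layout behavior described above (a fixed number of features per microRNA, random placement, and "blank" filler) can be sketched as follows. This is an assumption-laden illustration rather than eArray's layout algorithm: the function name is hypothetical, `capacity` stands for however many features are available to user probes on one array (a value not given here), and replicates are divided evenly among a microRNA's probes by simple cycling.

```python
import random


def lay_out_features(mirna_probes, capacity, features_per_mirna=20, seed=0):
    """Return a shuffled list of feature labels, padded with "blank" entries.

    mirna_probes: dict of microRNA name -> list of probe IDs (assumed input).
    capacity: number of features available for user probes (assumed value).
    """
    features = []
    for probes in mirna_probes.values():
        # Cycle through the probes so each microRNA gets the requested feature count
        for i in range(features_per_mirna):
            features.append(probes[i % len(probes)])
    if len(features) > capacity:
        raise ValueError("selection does not fit on one array")
    # Fill unused space with placeholders that downstream analysis would ignore
    features.extend(["blank"] * (capacity - len(features)))
    random.Random(seed).shuffle(features)
    return features
```

With two microRNAs at 16 features each and a capacity of 50, the result has 32 probe features and 18 blanks in random order.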
Guidance for specific cases
Creating an array from multiple miRBase build designs
The eArray probe database contains probes designed to the current Sanger miRBase release. It also contains probes to microRNAs that were present in selected former database builds that may have been removed or changed in the intervening periods. For the earliest Sanger miRBase builds, eArray only contains probes for selected species (9.1, human only; 10.1 and 11.0, human, mouse and rat).
It is possible to design an array containing probes to microRNAs from multiple database builds. The systematic ID given to the “old-build specific” microRNAs will be appended with the database version (e.g., hsa-miR-139_v9.1). Note that not all microRNA sequence changes will result in new probes being designed (e.g., addition of one base to the 5’ end). If the probe sequences did not change, the microRNA is considered unchanged.
Due to the data processing steps in Feature Extraction, each microRNA sequence being measured must have a unique systematic name to enable proper TotalGeneSignal calculation. Because some microRNAs may be present in multiple database builds with the same name but different sequences, the names from previous builds must be altered.
microRNA sequences from previous builds that are unchanged in the current build are searchable in the most recent build only. microRNAs with name changes between builds are searchable in both builds, but the primary annotation is based on the newest build.
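The naming rule for multi-build designs can be sketched as below. This is a simplified illustration, not eArray's implementation: the function name and input layout are hypothetical, and microRNA sequences are compared directly, whereas the text above states that a microRNA counts as changed only when its probe sequences change.

```python
def systematic_names(builds, current):
    """Map (build, name) -> unique systematic ID.

    builds: dict of build version -> dict of microRNA name -> sequence (assumed input).
    current: version string of the newest build.
    """
    ids = {}
    for version, mirnas in builds.items():
        for name, sequence in mirnas.items():
            if version == current:
                ids[(version, name)] = name
            elif builds[current].get(name) != sequence:
                # Old-build entry that was removed or changed: tag it with its build
                ids[(version, name)] = f"{name}_v{version}"
            # Unchanged entries are represented by the current-build name only
    return ids
```

If hsa-miR-139 has a different sequence in build 9.1 than in the newest build, the old entry becomes `hsa-miR-139_v9.1` while the current entry keeps the plain name, so every measured sequence ends up with a distinct systematic name.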
Creating multi-species arrays – It is possible to design a multi-species array using eArray. Due to the similarity of certain cross-species microRNAs and the design of the microRNA assay, the layout of such designs needs to be done very carefully. This system is designed to allow you as much flexibility as possible to design multi-species arrays while maintaining the Agilent microRNA microarray design principles.
Primary species must be identified. When microRNAs are selected for a given design from multiple species and those microRNAs are identical, probes to only one of those microRNAs will be incorporated in the design. A species priority order will be used to determine which probes will be incorporated in the design, with the primary species getting the first priority. Users must choose the primary species when creating a microarray. The priority of the remaining species follows Agilent’s pre-defined species priority order.
Probes for all microRNAs of the first priority species will be incorporated in the design.
Probes for all microRNAs for the 2nd priority species that are not already measured by the existing probes are added to the design.
Probes for the microRNAs for the remaining species are added to the design according to the species priority order.
Annotation considerations. Probes will be annotated on the array in line with the aforementioned priority order.
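The priority rules above amount to a first-come selection over species in priority order. The sketch below illustrates that logic under stated assumptions: the function name and input layout are hypothetical, "identical" is taken to mean identical mature sequence, and identical sequences within a single species are also collapsed, which is a simplification.

```python
def select_by_priority(selection, primary, priority_order):
    """Keep one entry per distinct microRNA sequence, favoring higher-priority species.

    selection: dict of species -> dict of microRNA name -> sequence (assumed input).
    primary: species chosen by the user as first priority.
    priority_order: list of species names in the predefined priority order.
    """
    # The user's primary species goes first; the rest keep their predefined order
    order = [primary] + [s for s in priority_order if s != primary]
    seen = set()
    kept = []
    for species in order:
        for name, sequence in selection.get(species, {}).items():
            if sequence in seen:
                continue  # already measured by probes from a higher-priority species
            seen.add(sequence)
            kept.append((species, name))
    return kept
```

The returned (species, name) pairs also indicate which species annotation each retained microRNA would carry, in line with the annotation rule above.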
Creating a “Microarray Set” – For users who wish to measure more microRNAs than can be accommodated on one design, eArray has the ability to generate a microRNA “Microarray Set.” In this instance, probes for the selected microRNAs will be distributed across the number of arrays needed to include all of them. Microarray designs that are part of sets cannot be split into individual designs within eArray.
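A rough way to estimate how many designs a Microarray Set will need is to divide the selected microRNAs into groups that fit on one array. The sketch below is illustrative only: the function name is hypothetical, `capacity` is an assumed count of features available to user probes on one array, and eArray may distribute microRNAs across a set differently.

```python
import math


def split_into_set(mirnas, capacity, features_per_mirna=20):
    """Split a list of microRNA names into per-array groups for a Microarray Set."""
    per_array = capacity // features_per_mirna
    if per_array < 1:
        raise ValueError("capacity is smaller than one microRNA's features")
    n_arrays = math.ceil(len(mirnas) / per_array)
    return [mirnas[i * per_array:(i + 1) * per_array] for i in range(n_arrays)]
```

For example, 1,000 microRNAs at 20 features each with an assumed capacity of 12,000 features would need two designs, holding 600 and 400 microRNAs. Choosing 16 features per microRNA raises the per-array limit to 750 under the same assumption.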
Agilent has established the following species priority list for multi-species microRNA array designs. For details, see Creating multi-species arrays, above.
Priority Order  Species Name                Common Name
1               Homo sapiens                human
2               Mus musculus                mouse
3               Rattus norvegicus           rat
4               Pan troglodytes             chimp
5               Macaca mulatta              rhesus monkey
6               Gallus gallus               chicken
7               Oryza sativa                rice
8               Ornithorhynchus anatinus    platypus
9               Physcomitrella patens       moss
10              Populus trichocarpa         poplar
11              Arabidopsis thaliana        arabidopsis
12              Danio rerio                 zebrafish
13              Canis familiaris            dog
14              Caenorhabditis elegans      worm
15              Xenopus tropicalis          frog
16              Drosophila melanogaster     fruit fly
17              Vitis vinifera              grape
18              Bos taurus                  cow
19              Monodelphis domestica       opossum
20              Fugu rubripes               fugu (Japanese pufferfish)
21              Tetraodon nigroviridis      pufferfish
22              Zea mays                    corn
23              Caenorhabditis briggsae
24              Pan paniscus                bonobo
25              Gorilla gorilla             gorilla
26              Pongo pygmaeus              orangutan
27              Drosophila erecta
28              Chlamydomonas reinhardtii   alga
29              Drosophila ananassae
30              Drosophila sechellia
31              Sorghum bicolor             sorghum
32              Drosophila yakuba
33              Drosophila virilis
34              Glycine max                 soybean
35              Macaca nemestrina           pig-tailed macaque
36              Drosophila pseudoobscura
37              Drosophila grimshawi
38              Drosophila mojavensis
39              Drosophila willistoni
40              Drosophila persimilis
41              Drosophila simulans
42              Selaginella moellendorffii  spikemoss
43              Oikopleura dioica           tunicate
44              Sus scrofa                  boar
45              Schmidtea mediterranea      flatworm
46              Ateles geoffroyi            spider monkey
47              Bombyx mori                 silkworm
48              Anopheles gambiae           mosquito
49              Apis mellifera              honey bee
50              Brassica napus              rapeseed
51              Lagothrix lagotricha        brown woolly monkey
52              Tribolium castaneum         beetle
53              Saguinus labiatus           tamarin
54              Pinus taeda                 pine
55              Ciona intestinalis
56              Triticum aestivum           wheat
57              Epstein Barr                human virus
58              Medicago truncatula         clover
59              Solanum lycopersicum        tomato
60              Ciona savignyi
61              Mouse cytomegalovirus       mouse virus
62              Rhesus lymphocryptovirus    rhesus virus
63              Mareks disease              chicken virus
64
65              Saccharum officinarum       sugarcane
66              Kaposi sarcoma-associated
67              Lemur catta                 ring-tailed lemur
68              Human cytomegalovirus
69              Gossypium hirsutum          cotton
70              Mouse gammaherpesvirus
71              Symphalangus syndactylus    gibbon
72              Pygathrix bieti             black snub-nosed monkey
73              Herpes Simplex
74              Rhesus monkey
75              Xenopus laevis
76              Human immunodeficiency
77              Ovis aries                  sheep
78              BK polyomavirus
79              Dictyostelium discoideum    amoeba
80              Gossypium raimondii
81              JC polyomavirus
82              Simian virus                monkey virus
83              Brassica oleracea           wild mustard
84              Brassica rapa               cabbages
85              Cricetulus griseus          Chinese hamster
86              Carica papaya               papaya
87              Gossypium herbaceum