Get custom microarray design guidance
Agilent offers the following application-specific guidance to help you create optimal microarray designs:
Expression array design guidance
SureSelect Capture Array design guidance
microRNA array design guidance
Additional guidance can also be found in the eArray Frequently Asked Questions (FAQs).
Agilent recommends designing and including the following types of probe sets when generating custom Gene Expression microarray designs:
Agilent’s negative control probes, for optimal background subtraction with the Agilent Feature Extraction software. Agilent negative control probes are included in Agilent’s QC grid. If you want to use customized negative control probes, we recommend designating them as non-control probes for the purposes of Agilent’s Feature Extraction software.
Replicated non-control probes, for use in the Multiplicative Detrending step of the Agilent Feature Extraction software. The Multiplicative Detrending step detects and corrects trends in array uniformity and, by default, uses replicated non-control probes to do so. At least 15 probes should each be replicated 5 to 10 times on every microarray design. If replicated probes are not used, adjust the default settings in the Feature Extraction protocol.
A probe set representing non-differentially expressed genes, for accurate normalization of microarray experiments in which the usual normalization assumptions about differential expression are not met because of a relatively low probe count or a strong bias in differential expression. These probes should span the full range of signal intensity for optimal normalization. Agilent recommends that at least 1% of the non-control probes in each custom design be part of this list. If prior knowledge of non-differentially expressed genes is unavailable, we recommend selecting these probes at random so that they cover the dynamic range of the experiment. For these custom Gene Expression designs, microarray data should be normalized using data from these control probes. For custom “whole genome” type arrays, a normalization gene list is generally not necessary.
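The items above contain two numeric rules of thumb. The first is at least 15 non-control probes, each replicated 5 to 10 times. The second is a normalization list covering at least 1% of the non-control probes. Both can be checked before a design is submitted. The following is a minimal Python sketch, not an eArray or Feature Extraction function. It assumes the design is available as a flat list of probe IDs with one entry per feature. It also applies the 1% rule to unique non-control probe IDs, which is our interpretation of the guidance.

```python
from collections import Counter


def check_expression_design(feature_probe_ids, control_ids, norm_list_ids):
    """Hypothetical check of a custom Gene Expression design.

    feature_probe_ids: one probe ID per feature (replicates repeat the ID).
    control_ids: probe IDs to treat as control probes (excluded from counts).
    norm_list_ids: probe IDs in the normalization (non-DE) gene list.
    """
    # Count how many features each non-control probe occupies.
    counts = Counter(p for p in feature_probe_ids if p not in control_ids)
    # Probes replicated 5-10 times, as used by Multiplicative Detrending.
    replicated = [p for p, n in counts.items() if 5 <= n <= 10]
    # Assumption: the 1% rule is applied to unique non-control probe IDs.
    n_noncontrol = len(counts)
    n_norm = sum(1 for p in set(norm_list_ids) if p in counts)
    fraction = n_norm / n_noncontrol if n_noncontrol else 0.0
    return {
        "replicated_probes": len(replicated),
        "detrending_ok": len(replicated) >= 15,
        "norm_fraction": fraction,
        "normalization_ok": n_noncontrol > 0 and fraction >= 0.01,
    }
```

For a design with 20 probes replicated 6 times and 1,000 single-copy probes, a normalization list of 11 probes passes the 1% rule (11 of 1,020 unique non-control probes), while a list of 1 probe does not.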
The process for generation and use of a DyeNorm Gene List for two-color microarray data analysis is described in the Feature Extraction Software User Guide. The non-differentially expressed gene list can also be used for one-color microarray data normalization in downstream applications such as GeneSpring GX, as described in the GeneSpring GX Software User Manual.
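When no prior list of non-differentially expressed genes is available, the guidance above suggests choosing normalization probes at random so that they cover the dynamic range. One way to do this, sketched below in Python, is stratified random sampling, which works in three steps:

1. Rank the non-control probes by a typical signal value. Mean signal from pilot arrays is one option; it is an assumption here.
2. Split the ranked probes into equal-sized intensity bins.
3. Draw at random from every bin until at least 1% of the probes are chosen.

The function name, the bin count, and the input format are hypothetical.

```python
import math
import random


def pick_normalization_probes(signals, fraction=0.01, n_bins=10, seed=0):
    """Hypothetical stratified random pick across the signal-intensity range.

    signals: dict mapping non-control probe ID -> typical signal value.
    Returns a sorted list of at least ceil(fraction * len(signals)) IDs
    (capped at the number of probes).
    """
    if not signals:
        return []
    rng = random.Random(seed)
    ranked = sorted(signals, key=signals.get)  # lowest to highest signal
    target = max(1, math.ceil(fraction * len(ranked)))
    # Equal-sized intensity bins (quantiles) over the ranked probes.
    n = len(ranked)
    bins = [ranked[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]
    bins = [b for b in bins if b]
    # Draw the same number of probes at random from every bin.
    per_bin = math.ceil(target / len(bins))
    chosen = []
    for b in bins:
        chosen.extend(rng.sample(b, min(per_bin, len(b))))
    return sorted(chosen)
```

Sampling per bin, rather than from the whole list, keeps low- and high-intensity probes represented even when most probes have similar signals.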
Additional guidance can be found in the eArray Frequently Asked Questions (FAQs) for custom Gene Expression microarrays.
Agilent’s High-Definition Comparative Genomic Hybridization (HD-CGH) database provides users the flexibility to create custom microarray designs for analysis of genome regions of interest to them at the resolution of their choosing. Agilent recommends including the following types of probe sets when generating custom CGH microarray designs:
Normalization control probes that represent non-aberrant/non-variant regions, for accurate normalization of the data using Agilent’s Feature Extraction software – A probe normalization group needs to be used only when the assumption of dye normalization fails. Dye normalization assumes that the overall signal intensities of the two channels are the same. When you design a whole genome array, you do not need to include a specific normalization probe group, because the array will contain enough non-aberrant regions for proper normalization. For other designs, Agilent provides a control probe group, or you can select your own. Use this Agilent or user-defined normalization probe group in Feature Extraction (FE) to normalize the array properly. The probes in a dye normalization probe group must occupy at least 1% of the total number of features on the array after filtering. If you do not use an Agilent or user-specified normalization probe group, and the assumptions of dye normalization are not reasonably met, FE cannot correct all of the systematic bias, and copy number estimates will be distorted. For proper normalization, specify the normalization probe group in the Agilent Feature Extraction software.
Replicate probes for determining the Reproducibility QC metric in Agilent’s Feature Extraction and DNA Analytics software – The Reproducibility metric is calculated as the median %CV of the background-subtracted signal of these replicate probes after outlier rejection. If you choose not to use Agilent’s replicate probe group, it is important to include your own set of replicate probes for calculating reproducibility. For user-defined replicate probe groups, Agilent recommends a minimum of 300 probes replicated five times.
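The Reproducibility metric described above can be approximated from exported feature data, for example to sanity-check a user-defined replicate probe group. The Python sketch below computes the %CV of the background-subtracted signal for each replicate probe and reports the median. It assumes that outlier flags are already available for each feature; Feature Extraction's own outlier rejection is not reproduced. Its result is therefore only an approximation of the value the Agilent software reports, and the function name is hypothetical.

```python
import statistics


def reproducibility_median_cv(replicates):
    """Hypothetical median %CV of background-subtracted signal.

    replicates: dict probe ID -> list of (signal, is_outlier) per feature.
    Outlier flags are assumed to come from upstream feature QC.
    """
    cvs = []
    for features in replicates.values():
        kept = [signal for signal, is_outlier in features if not is_outlier]
        if len(kept) < 2:
            continue  # a CV needs at least two replicate features
        mean = statistics.mean(kept)
        if mean <= 0:
            continue  # %CV is not meaningful for a non-positive mean
        cvs.append(100.0 * statistics.stdev(kept) / mean)
    return statistics.median(cvs) if cvs else float("nan")
```

Taking the median across probes, rather than the mean, keeps a few poorly behaved probes from dominating the metric.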
Additional guidance can be found in the eArray Frequently Asked Questions (FAQs) for custom CGH microarrays.
Agilent recommends that you take the following guidelines under consideration when designing your array-based DNA capture microarrays:
Use as tight a probe spacing as your application allows. Agilent recommends a spacing of 3 bp between probe starts, because Agilent has done the most extensive testing with 3-bp spacing. This spacing enables targeting of 700 kb to 800 kb of the genome, depending on the number of independent regions that you target. Agilent has also obtained successful capture results in limited testing of larger probe spacings (including 15 bp and 20 bp), but observed lower average read depths. In these limited tests, the drop in read depth was somewhat smaller than expected: on average, only 75% of the expected drop. Depending on your application and sequencing throughput, this lower read depth may not be adequate for your needs.
Pad (extend) your intervals by 100 bp to 200 bp on either side. Although large numbers of reads are typically observed at interval endpoints, Agilent has observed that optimal capture depths are achieved 100 bp to 200 bp inside the interval boundaries. We therefore recommend that you extend your intervals by 100 bp to 200 bp, depending on your needs. For example, if you target exons, do not use the exon endpoints directly. Instead, use a set of coordinates that start 100 bp before the start of each exon and end 100 bp after the end of each exon. If you use the Genomic Tiling tool in eArray to select probes, you can use it to extend your intervals.
Be aware of duplicated and "repeat" regions. Agilent does not check all probes to see whether they target unique regions of the genome. The only exceptions are probes that contain known repeat regions (as identified by RepeatMasker), which are omitted from the design if you use the Genomic Tiling tool in eArray to design probes. Probes that target duplicated regions can produce spurious results, or reads that are too ambiguous to map with your sequencing software. In practice, such probes are relatively rare; to an extent, they are a natural consequence of the duplication that occurs in complex genomes.
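The padding and spacing guidelines above translate directly into coordinate arithmetic. The Python sketch below does three things:

1. Pads (chromosome, start, end) intervals by 100 bp on each side.
2. Merges padded intervals that overlap.
3. Lists probe start positions every 3 bp.

Several choices here are assumptions made for illustration: 0-based half-open coordinates, the merge step, and the 60-bp probe length. The function names are hypothetical. In practice the Genomic Tiling tool in eArray performs interval extension and probe selection, including omission of RepeatMasker repeats, which this sketch does not model.

```python
def pad_intervals(intervals, pad=100, chrom_sizes=None):
    """Extend each (chrom, start, end) interval by `pad` bp on both sides.

    Assumes 0-based, half-open coordinates. Padded intervals are clamped to
    the chromosome (if sizes are given) and overlapping ones are merged.
    """
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - pad)
        new_end = end + pad
        if chrom_sizes and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        padded.append((chrom, new_start, new_end))
    padded.sort()
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous padded interval: extend it instead.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def tile_probe_starts(interval, spacing=3, probe_len=60):
    """Probe starts every `spacing` bp; each probe (assumed `probe_len` bp)
    lies entirely inside the interval."""
    chrom, start, end = interval
    return [(chrom, s) for s in range(start, end - probe_len + 1, spacing)]
```

At 3-bp spacing, the probe count is roughly the padded interval length divided by 3. This is why tight spacing consumes array features quickly and limits the total amount of sequence that can be targeted.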
For more information about design considerations for array-based DNA capture, please consult Agilent’s application note Complementing Next Generation Sequencing Technologies: Capture and Release Assay Using Agilent DNA Microarrays (#5989-8700EN).
In addition, you may find the Agilent tutorial on SureSelect Capture Arrays useful. To view this tutorial, go to the eArray Login page. Under Additional Information, click SureSelect Capture Array Tutorial with Wizard.
The Agilent microRNA microarray solution provides a robust and sensitive method for detection of microRNAs from total RNA. Agilent has introduced Human, Mouse and Rat catalog arrays, which have been designed and empirically tested to provide sensitive and specific measurements of all microRNAs from the Sanger miRBase database for these species.
eArray enables the design of custom microRNA arrays: researchers may design microarrays that measure the microRNAs of their choosing from the Sanger miRBase database. The design principles used for our catalog arrays, outlined below, are also applied to these custom arrays. This approach reduces uncertainty around the design of custom arrays while continuing to provide the most sensitive and robust assay to meet the needs of researchers. eArray gives researchers the flexibility to study the microRNAs of their choice in the 8x15K format.
Agilent microRNA array design principles
Before designing a custom microRNA array, it is useful to understand some of the underlying principles of the Agilent microRNA platform:
Linked probe design and labeling methods. The mature microRNAs are labeled by ligating a Cy3-conjugated pCp molecule to the 3’ end of the microRNA. This labeling reaction adds an extra “C” base to the 3’ end of every labeled RNA molecule. During probe design, we take advantage of this “C” base by adding an extra “G” to the 5’ end of the active probe sequence. The additional G:C base pair stabilizes the probe:microRNA interaction and provides some additional selectivity for labeled mature microRNAs.
Multiple probes and probe replicates for each microRNA. Each microRNA represented on an Agilent microRNA array is measured by multiple probes. In addition, each probe sequence is replicated multiple times. This replication allows for both improved robustness, as outlier features are removed during data summarization in Feature Extraction, and improved sensitivity, as the presence of the probe replicates helps drive the hybridization reaction towards equilibrium.
Robust data summarization. The data summarization procedures used in the Agilent Feature Extraction software combine the multiple probes and probe replicates into a robust measurement for each microRNA. This measurement, the “TotalGeneSignal,” is found in both Feature Extraction output files: the full text file and the “GeneView” file. Details of how the data summarization is performed can be found in the Feature Extraction Reference Guide.
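The G:C design principle in the first item above can be illustrated with a short Python sketch. The probe is written as the DNA reverse complement of the labeled microRNA, so the 3’ C added by pCp ligation pairs with a G at the probe’s 5’ end. This is a simplified, hypothetical illustration of that one principle only. It is not Agilent’s probe design algorithm, and real probe sequences may include additional design features that are not modeled here.

```python
def reverse_complement_dna(seq):
    """Reverse complement of an RNA (or DNA) sequence, written as DNA."""
    pairs = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def active_probe_sequence(mature_mirna):
    """Hypothetical sketch: probe complementary to the pCp-labeled microRNA.

    Labeling adds a C to the microRNA's 3' end; because the probe is
    antiparallel, the complementary G ends up at the probe's 5' end.
    """
    labeled = mature_mirna.upper() + "C"  # 3' C added by pCp ligation
    return reverse_complement_dna(labeled)
```

For the mature let-7a sequence UGAGGUAGUAGGUUGUAUAGUU, the sketch returns GAACTATACAACCTACTACCTCA: the plain complement AACTATACAACCTACTACCTCA with the extra 5’ G.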
Guidance for design of microRNA custom microarrays
Agilent has predesigned probes to all the mature microRNAs for all species* in the most recent Sanger miRBase release. Both the identification of the appropriate probe sequences and the methods implemented for custom microRNA microarray design are intended to provide a robust, sensitive, and specific measurement of your microRNAs of interest. All designs are based on the 8x15K format; for any microRNA design, customers can create “sets” to accommodate designs requiring more than 15,000 features.
[*NOTE: Although Agilent has designed probes for all species in the Sanger miRBase database, users should be aware that the Agilent labeling protocol requires an accessible hydroxyl group on the 3’ end of the microRNA for ligation of a dye-conjugated pCp molecule. microRNAs from some species (mostly plants) have a 3’ modification that may interfere with the Agilent labeling method. Researchers interested in studying 3’-modified microRNAs will likely need to use alternative labeling methods and should plan to conduct experiments to optimize the assay conditions.]
General Design Guidance
Probe search: Searching for probes in eArray is based on the premise that each microRNA is represented by multiple probes. Probes are therefore returned in groups, according to the microRNA for which they were designed. Searches can be performed against multiple miRBase builds, but probes for a microRNA present in the latest build are returned only when that build is included in the search. See below for more information on multiple miRBase build designs.
Microarray Formats: At the present time, only the 8x15K format is enabled in eArray, because microRNA probe and assay design, including probe replicates, RNA input requirements, and labeling reaction conditions are based on that format. For users wishing to measure more microRNAs than can be accommodated on this format, a “Microarray Set” can be created. More information on creating microarray sets can be found below.
Microarray Layout: Selected probes will be laid out based on the design principles outlined above. Users can represent each microRNA by either 16 or 20 features. Twenty features result in slightly more robust data, while 16 features allow more microRNAs per array. All probes are randomly distributed on the array, so probe replicates are spread across the array, which yields the most robust downstream data. Any space not used for the selected probes is filled by a structural control and considered “blank.” These “blank” features are ignored in downstream analysis.
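The layout rules above can be sketched in Python as follows:

1. Each selected microRNA receives 16 or 20 features.
2. Those features are cycled over the microRNA’s probes.
3. All features are shuffled across the available positions.
4. The remaining positions are marked as blank.

Several parts of this are assumptions for illustration: the number of features available for probes, the round-robin assignment of features to probes, and the function names. eArray performs the actual layout.

```python
import random


def max_mirnas(usable_features, features_per_mirna):
    """How many microRNAs fit on one design at 16 or 20 features each."""
    return usable_features // features_per_mirna


def layout_array(mirna_probes, usable_features, features_per_mirna=20, seed=0):
    """Hypothetical random layout of the selected microRNAs.

    mirna_probes: dict microRNA ID -> list of its probe IDs.
    Returns one (microRNA ID, probe ID) pair per position; unused
    positions are ("BLANK", None).
    """
    if features_per_mirna not in (16, 20):
        raise ValueError("features_per_mirna must be 16 or 20")
    needed = features_per_mirna * len(mirna_probes)
    if needed > usable_features:
        raise ValueError(f"need {needed} features, only {usable_features} available")
    features = []
    for mirna, probes in mirna_probes.items():
        # Assumption: replicates are assigned to the probes round-robin.
        for i in range(features_per_mirna):
            features.append((mirna, probes[i % len(probes)]))
    # Fill the remaining positions with blanks, then shuffle everything so
    # replicates of each probe are scattered across the array.
    features += [("BLANK", None)] * (usable_features - needed)
    random.Random(seed).shuffle(features)
    return features
```

With 15,000 usable features, for example, `max_mirnas` gives 750 microRNAs at 20 features each and 937 at 16. This shows the trade-off between robustness and capacity described above. The 15,000 figure is only the nominal format size, used here as an example; the number of features actually available for probes is not stated here.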
Guidance for specific cases
Creating an array from multiple miRBase build designs
The eArray probe database contains probes designed to the current Sanger miRBase release. It also contains probes to microRNAs that were present in selected former database builds that may have been removed or changed in the intervening periods. For the earliest Sanger miRBase builds, eArray only contains probes for selected species (9.1, human only; 10.1 and 11.0, human, mouse and rat).
It is possible to design an array containing probes to microRNAs from multiple database builds. The systematic ID given to the “old-build specific” microRNAs will be appended with the database version (e.g., hsa-miR-139_v9.1). Note that not all microRNA sequence changes will result in new probes being designed (e.g., addition of one base to the 5’ end). If the probe sequences did not change, the microRNA is considered unchanged.
The data processing steps in Feature Extraction require each measured microRNA sequence to have a unique systematic name, so that TotalGeneSignal is calculated properly.
Since some microRNAs are present in multiple database builds with the same name but different sequences, the names from previous builds must be altered.
microRNA sequences from previous builds that are unchanged in the current build are searchable in the most recent build only. microRNAs with name changes between builds are searchable in both builds, but the primary annotation is based on the newest build.
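The renaming rule above can be sketched as a small Python function that gives every distinct probe sequence exactly one systematic name. Names from the current build are kept. A probe sequence that appears only in an older build receives the name from that build with a _v<version> suffix, as in hsa-miR-139_v9.1. Comparing probe sequences rather than microRNA sequences follows the note that a microRNA whose probe sequences did not change is considered unchanged. The data structures and the sequences in the example are toy placeholders, not miRBase data, and the sketch is not eArray’s implementation.

```python
def assign_systematic_names(current, older_builds):
    """Hypothetical naming of probes drawn from several miRBase builds.

    current: dict microRNA name -> probe sequence for the current build.
    older_builds: list of (version, dict name -> probe sequence), newest
    first; a sequence shared by several old builds is named after the
    first build listed.
    """
    names = dict(current)
    measured = set(current.values())  # probe sequences already on the design
    for version, mirnas in older_builds:
        for name, probe in mirnas.items():
            if probe in measured:
                continue  # unchanged (or renamed) microRNA: keep current name
            measured.add(probe)
            names[f"{name}_v{version}"] = probe
    return names
```

Keying on the probe sequence also means a microRNA that was only renamed between builds does not receive a duplicate probe.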
Creating multi-species arrays – It is possible to design a multi-species array using eArray. Because certain microRNAs are similar across species, and because of the design of the microRNA assay, the layout of such designs must be done very carefully. This system is designed to give you as much flexibility as possible in designing multi-species arrays while maintaining the Agilent microRNA microarray design principles.
Primary species must be identified. When microRNAs are selected for a given design from multiple species and those microRNAs are identical, probes to only one of those microRNAs will be incorporated in the design. A species priority order determines which probes are incorporated, with the primary species getting first priority. Users must choose the primary species when creating a microarray. The remaining species are prioritized according to Agilent’s predefined species priority order.
Probes for all microRNAs of the first priority species will be incorporated in the design.
Probes for all microRNAs for the 2nd priority species that are not already measured by the existing probes are added to the design.
Probes for the microRNAs for the remaining species are added to the design according to the species priority order.
Annotation considerations. Probes will be annotated on the array in line with the aforementioned priority order.
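The priority rules above amount to a greedy merge. Microarrays are built species by species: the chosen primary species comes first, and the rest follow the species priority order. A microRNA is skipped when an identical mature sequence has already been added, so each probe is annotated with the highest-priority species that contains it. The Python sketch below implements this under two assumptions: “identical” means an exact sequence match, and species missing from the priority list go last. The function name and the toy microRNA names in the test are hypothetical.

```python
def merge_species(selected, priority_order, primary):
    """Hypothetical species-priority merge for a multi-species design.

    selected: dict species -> dict microRNA name -> mature sequence.
    priority_order: species names in the predefined priority order.
    Returns dict microRNA name -> species for the microRNAs that get probes.
    """
    order = [primary] + [s for s in priority_order if s != primary]
    # Assumption: species missing from the priority list come last.
    order += [s for s in selected if s not in order]
    design = {}
    measured = set()  # mature sequences already represented on the design
    for species in order:
        for name, seq in selected.get(species, {}).items():
            if seq in measured:
                continue  # identical microRNA already measured
            measured.add(seq)
            design[name] = species
    return design
```

Changing only the primary species changes which species’ name annotates a shared microRNA. The set of measured sequences stays the same.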
Creating a “Microarray Set” – For users who wish to measure more microRNAs than can be accommodated on one design, eArray can generate a microRNA “Microarray Set.” In this case, probes for the selected microRNAs are distributed across as many arrays as needed to include all of them. Microarray designs that are part of a set cannot be split into individual designs within eArray.
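The number of arrays in a Microarray Set follows from simple arithmetic. Divide the number of selected microRNAs by the number one design can hold (usable features divided by features per microRNA), then round up. The Python sketch below splits a list of microRNAs into consecutive groups of that size. How eArray actually distributes probes across the designs in a set is not described here, so the contiguous split and the function name are assumptions.

```python
import math


def split_into_set(mirnas, per_array):
    """Hypothetical split of selected microRNAs across a Microarray Set.

    per_array: number of microRNAs one design can hold.
    """
    if per_array < 1:
        raise ValueError("per_array must be at least 1")
    n_arrays = math.ceil(len(mirnas) / per_array)
    return [mirnas[i * per_array:(i + 1) * per_array] for i in range(n_arrays)]
```

For example, 1,600 microRNAs at 750 per design need three arrays, holding 750, 750, and 100 microRNAs.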
Agilent has established the following species priority list for multi-species microRNA array designs. For details, see Creating multi-species arrays, above.
Priority Order | Species Name | Common Name
1 | Homo sapiens | human
2 | Mus musculus | mouse
3 | Rattus norvegicus | rat
4 | Pan troglodytes | chimp
5 | Macaca mulatta | rhesus monkey
6 | Gallus gallus | chicken
7 | Oryza sativa | rice
8 | Ornithorhynchus anatinus | platypus
9 | Physcomitrella patens | moss
10 | Populus trichocarpa | poplar
11 | Arabidopsis thaliana | arabidopsis
12 | Danio rerio | zebrafish
13 | Canis familiaris | dog
14 | Caenorhabditis elegans | worm
15 | Xenopus tropicalis | frog
16 | Drosophila melanogaster | fruit fly
17 | Vitis vinifera | grape
18 | Bos taurus | cow
19 | Monodelphis domestica | opossum
20 | Fugu rubripes | fugu (Japanese pufferfish)
21 | Tetraodon nigroviridis | pufferfish
22 | Zea mays | corn
23 | Caenorhabditis briggsae | worm
24 | Pan paniscus | bonobo
25 | Gorilla gorilla | gorilla
26 | Pongo pygmaeus | orangutan
27 | Drosophila erecta | fruit fly
28 | Chlamydomonas reinhardtii | alga
29 | Drosophila ananassae | fruit fly
30 | Drosophila sechellia | fruit fly
31 | Sorghum bicolor | sorghum
32 | Drosophila yakuba | fruit fly
33 | Drosophila virilis | fruit fly
34 | Glycine max | soybean
35 | Macaca nemestrina | pig-tailed macaque
36 | Drosophila pseudoobscura | fruit fly
37 | Drosophila grimshawi | fruit fly
38 | Drosophila mojavensis | fruit fly
39 | Drosophila willistoni | fruit fly
40 | Drosophila persimilis | fruit fly
41 | Drosophila simulans | fruit fly
42 | Selaginella moellendorffii | spikemoss
43 | Oikopleura dioica | tunicate
44 | Sus scrofa | boar
45 | Schmidtea mediterranea | flatworm
46 | Ateles geoffroyi | spider monkey
47 | Bombyx mori | silkworm
48 | Anopheles gambiae | mosquito
49 | Apis mellifera | honey bee
50 | Brassica napus | rapeseed
51 | Lagothrix lagotricha | brown woolly monkey
52 | Tribolium castaneum | beetle
53 | Saguinus labiatus | tamarin
54 | Pinus taeda | pine
55 | Ciona intestinalis | tunicate
56 | Triticum aestivum | wheat
57 | Epstein-Barr | human virus
58 | Medicago truncatula | clover
59 | Solanum lycopersicum | tomato
60 | Ciona savignyi | tunicate
61 | Mouse cytomegalovirus | mouse virus
62 | Rhesus lymphocryptovirus | rhesus virus
63 | Mareks disease | chicken virus
64 | Mareks disease | chicken virus
65 | Saccharum officinarum | sugarcane
66 | Kaposi sarcoma-associated | human virus
67 | Lemur catta | ring-tailed lemur
68 | Human cytomegalovirus | human virus
69 | Gossypium hirsutum | cotton
70 | Mouse gammaherpesvirus | mouse virus
71 | Symphalangus syndactylus | gibbon
72 | Pygathrix bieti | black snub-nosed monkey
73 | Herpes Simplex | human virus
74 | Rhesus monkey | rhesus monkey
75 | Xenopus laevis | frog
76 | Human immunodeficiency | human virus
77 | Ovis aries | sheep
78 | BK polyomavirus | human virus
79 | Dictyostelium discoideum | amoeba
80 | Gossypium raimondii | cotton
81 | JC polyomavirus | human virus
82 | Simian virus | monkey virus
83 | Brassica oleracea | wild mustard
84 | Brassica rapa | cabbages
85 | Cricetulus griseus | Chinese hamster
86 | Carica papaya | papaya
87 | Gossypium herbaceum | cotton