Glossary of terms

AllTracks BED File | Amplicon | Analyzable Region | Application | BED | Boosting | Catalog Design | CGH | CGH+SNP | CH3 | ChIP | Collaboration | Coverage | Custom Design | Density | Design | Design ID | Filler Probes | Guide RNA (gRNA) | Guidegroup | Masking | Minimum Sequencing | Normalization Probegroup | Normalization Probes | Orphan | Platform | Price Tier | Probe | Probe Size | # Probes | Probegroup | Replicate Probes | Replication | Sequenceable Region | SurePrint | Target | Target ID | Target Region | # Target Regions | Target Region Size | Wobble Probes | Workgroup

AllTracks BED File

SureDesign creates an AllTracks BED file for each design. This file contains, in one place, information about target regions, covered regions, missed regions, and amplicons (for HaloPlex designs) or probes (for SureSelect designs).

Amplicon

HaloPlex probes provide hybridization and ligation sites to circularize genomic fragments that have been cut by restriction enzymes; these fragments are then PCR-amplified. The length of the fragment varies, depending on the genomic spacing of the hybridizable end sequences of the probe. Sequencing coverage depends on the target being within the selected read-length of the end of one or more amplicons.

Analyzable Region

SureDesign calculates the analyzable region of a HaloPlex amplicon by ignoring the first 5 bases of the sequenceable region, since these bases are more likely to be subject to reference bias when detecting variants.

See Also: Sequenceable Region

Application

Within SureDesign, application is used to refer to the particular measurement assay that the design is intended for.

BED

BED files utilize a format defined by the UCSC Genome Browser: 
http://genome.ucsc.edu/FAQ/FAQformat.html#format1.

SureDesign uses 3- or 4-column BED files, containing chromosome, start, and stop columns, and an optional name column.

Note: BED format requires that start/stop locations be specified in zero-based, half-open format. To specify the first 1000 bases of chromosome 1, the correct coordinates are "chr1 0 1000 MyRegion". By comparison, browser coordinates are specified in 1-based, closed format, so the equivalent region is denoted "chr1:1-1000" in browser format.
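The relationship between the two coordinate systems can be sketched in a few lines of Python. This is an illustration only, not part of SureDesign; the function names bed_to_browser and browser_to_bed are hypothetical.

```python
def bed_to_browser(chrom, start, end):
    """Convert a zero-based, half-open BED interval to a 1-based, closed browser string."""
    if start < 0 or end <= start:
        raise ValueError("BED interval must satisfy 0 <= start < end")
    # Only the start shifts by one; the half-open end equals the closed end.
    return f"{chrom}:{start + 1}-{end}"


def browser_to_bed(region):
    """Convert a browser string such as 'chr1:1-1000' to a BED tuple (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    first, last = (int(x.replace(",", "")) for x in span.split("-"))
    if first < 1 or last < first:
        raise ValueError("browser coordinates must satisfy 1 <= first <= last")
    return chrom, first - 1, last


print(bed_to_browser("chr1", 0, 1000))  # chr1:1-1000
print(browser_to_bed("chr1:1-1000"))    # ('chr1', 0, 1000)
```

In both formats the region length is the same (1000 bases here); in BED it is simply end minus start.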

Boosting

High and low GC genomic regions tend to be under-represented in sequencing. The cause is complex, and many steps in the overall assay can contribute. To compensate, SureDesign can boost the concentration of high and low GC probes by replicating them during the SurePrint process.

Catalog Design

A catalog design is a SureSelect or HaloPlex design created by Agilent. These designs can be found in the Agilent Catalog tab on the Find Designs screen.

See Also: Design, Design ID, Custom Design

CGH

CGH designs can be for CGH microarrays or CGH+SNP microarrays. CGH microarrays contain probes for comparative genomic hybridization (CGH) analysis. CGH+SNP microarrays contain probes for CGH analysis and additional probes for loss of heterozygosity (LOH) analysis.

CGH+SNP

CGH+SNP microarrays contain probes for comparative genomic hybridization (CGH) analysis and additional probes for loss of heterozygosity (LOH) analysis.

CH3

CH3 microarrays (also called DNA methylation microarrays) contain probes for identifying methylated regions.

ChIP

ChIP designs can be for ChIP-on-chip (ChIP) microarrays or DNA methylation (CH3) microarrays. ChIP microarrays contain probes for identifying protein binding regions. CH3 microarrays contain probes for identifying methylated regions.

Collaboration

Your primary "home" within SureDesign is your workgroup. A collaboration is an additional workgroup. You can create as many collaborations as you wish, invite others to join, and create designs within those collaborations. Collaborations allow you to share specific designs with specific users outside of your workgroup. You can maintain a collaboration indefinitely, or dissolve it completely once it has served its purpose.

Coverage

The coverage of a design is the percentage of nucleotides in the target regions that are expected to be captured by one or more probes in the design.

·        For SureSelect DNA and RNA designs, a target nucleotide is considered to be covered if at least one probe comes within 50 bases of the nucleotide in either direction. For older designs (specifically, SureSelect catalog designs created prior to 2016 and SureSelect custom designs created prior to July 2020) target nucleotides are considered covered if a probe comes within 100 bases in either direction.

·        For HaloPlex designs, a target nucleotide is considered to be covered if the analyzable region of at least one amplicon overlaps with the nucleotide.

·        For microarray designs, a target nucleotide is considered to be covered if at least one probe overlaps the nucleotide or if the nucleotide is within the analyzable region of at least one probe.

The covered region is included as a track in the AllTracks BED File.
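As a rough illustration of the SureSelect rule, the following Python sketch computes the coverage percentage for a set of target regions and probes on a single chromosome. It assumes zero-based, half-open intervals and interprets "within 50 bases" as extending each probe footprint by 50 bases on both sides; SureDesign's actual calculation may differ in detail. The function name coverage_percent and the example coordinates are hypothetical.

```python
def coverage_percent(targets, probes, flank=50):
    """Percentage of target bases lying within `flank` bases of at least one probe.

    targets, probes: lists of (start, end) intervals, zero-based and half-open,
    all on the same chromosome (an assumption made to keep the sketch short).
    """
    covered = set()
    for start, end in probes:
        # Extend the probe footprint by `flank` bases in both directions.
        covered.update(range(max(0, start - flank), end + flank))

    target_bases = set()
    for start, end in targets:
        target_bases.update(range(start, end))

    if not target_bases:
        return 0.0
    return 100.0 * len(target_bases & covered) / len(target_bases)


targets = [(1000, 1400)]   # one 400-base target region
probes = [(1100, 1220)]    # one 120-base probe

print(coverage_percent(targets, probes))             # 55.0
print(coverage_percent(targets, probes, flank=100))  # 80.0 (rule for older designs)
```

The set-based approach is only practical for small examples; interval arithmetic would be more appropriate for whole designs.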

Custom Design

A custom design is a SureSelect or HaloPlex design that you create in your workgroup or a collaboration. Each design has a Design Name (which you assign) and a Design ID (which Agilent assigns).

Density

The density of a design is the number of probes per nucleotide in the target regions. For example, when you create a design with a density of 5x, SureDesign tries to select probes with enough overlap for each nucleotide in the target regions to be covered by five unique probes.

Design

A design is the blueprint for a SureSelect or HaloPlex library. It includes all the probes that will be manufactured together and shipped to you when you place an order.

Design ID

The Design ID is the unique number assigned to a design by Agilent.

For SureSelect, the Design ID is one of the following:

·        A 7-digit number.

·        A 7-digit number preceded by the letter S or C.

For HaloPlex, the Design ID is a 5-digit number, a dash, and a 10-digit number.
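The formats described above can be checked with simple regular expressions. The sketch below is an illustration only: the function name classify_design_id is hypothetical, the example IDs are made up, and it assumes that the S or C prefix is uppercase.

```python
import re

# Patterns inferred from the format descriptions above (an assumption, not an Agilent API).
SURESELECT_ID = re.compile(r"[SC]?[0-9]{7}")
HALOPLEX_ID = re.compile(r"[0-9]{5}-[0-9]{10}")


def classify_design_id(design_id):
    """Return 'SureSelect', 'HaloPlex', or None if the ID matches neither format."""
    if SURESELECT_ID.fullmatch(design_id):
        return "SureSelect"
    if HALOPLEX_ID.fullmatch(design_id):
        return "HaloPlex"
    return None


# Made-up IDs, used only to exercise the patterns.
for example in ["0123456", "S0123456", "12345-1234567890", "X123"]:
    print(example, classify_design_id(example))
```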

Filler Probes

Filler probes are probes added to the design to fill the microarray in integral multiples.

Guide RNA (gRNA)

A guide RNA is an RNA species used to initiate site-directed cleavage in CRISPR/Cas9 systems. The SureGuide design wizard aids in the creation of gRNA sequences for specific targets of interest.

Guidegroup

A guidegroup is a set of gRNA sequences. A SureGuide design consists of one or more guidegroups. Individual guidegroups cannot be manufactured or ordered, but they can be included, either alone or with other guidegroups, in a design.

·        When creating a design with the standard SureGuide wizard, SureDesign automatically creates a single guidegroup from the gRNAs included in the final design.

·        When creating a SureGuide design with the advanced wizard, you explicitly create and include guidegroups to make up the design.

Masking

Capture of repetitive elements in the genome can lead to inefficient sequencing, with many reads either unmappable or off-target. Masking refers to the process of finding and hiding repetitive elements in the target regions so that SureDesign does not select probes for those elements.

When creating a SureSelect design, you can have SureDesign apply masking to your target regions. Excessive masking, however, can lead to a lack of coverage within target regions that you wish to capture. SureDesign offers various masking stringencies. The best stringency level for your design depends on your need for high coverage versus your need for high sequencing efficiency.

HaloPlex probes only hybridize to the genome within a short segment at each end of the probe. When selecting probes, SureDesign rejects probes in which both ends overlap with repetitive elements.

Minimum Sequencing

The Minimum Sequencing value, provided in a design's downloadable PDF report, is the recommended minimum sequencing depth. The value is designed to obtain 200x average coverage of the regions covered by PCR amplicons (HaloPlex) or hybrid capture probes (SureSelect). This depth is normally sufficient to cover 90% or more of the target regions with 20 reads or more. An average coverage of 100x will typically only cover 80% of the target regions with 20 reads or more. If 80% coverage is sufficient for your sequencing experiments, you can use a sequencing depth that is half of the Minimum Sequencing value.

The exact amount of sequencing needed for your experiments can only be determined empirically, and it may be higher or lower than the Minimum Sequencing value in the report depending on the genomic region that is amplified.

Normalization Probegroup

A normalization probegroup is a set of microarray probes that are expected to be copy number neutral and can be used by Agilent's Feature Extraction software for normalization of signal data. Each CGH and CGH+SNP microarray design format has a default Agilent normalization probegroup, consisting of normalization probes, that is automatically included in the design. The Agilent normalization probegroups are biased away from probes in the Database of Genomic Variants (DGV), but do not exclude all DGV regions. If you want to exclude frequent aberration regions in common cancers, Agilent suggests creating your own normalization probegroup.

Note that the Feature Extraction software does not use the normalization probegroup by default. For information on how to use the normalization probegroup for normalization in Feature Extraction, refer to the Feature Extraction User Guide.

Normalization Probes

Normalization probes are microarray probes that are expected to be copy number neutral and can be used by Agilent's Feature Extraction software for signal normalization. These probes cover 'backbone' regions of the genome. Include normalization probes in your CGH or CGH+SNP design when most probes in the design cover aberrant regions (for example, in a design in which half of the probes are on the X-chromosome). In a design in which the probes cover regions throughout the genome, it is not necessary to include specific normalization probes since there should be a sufficient number of non-aberrant regions for proper normalization. If you choose to include normalization probes, the minimum number of normalization probes is one percent of the design (a requirement of Agilent's Feature Extraction software) and the recommended number is at least several hundred probes.

See Also: Normalization Probegroup

Orphan

When SureDesign selects probes for a SureSelect design, long target regions are covered by multiple probes that abut or overlap (depending on the density). Very short, isolated target regions, however, may only be covered by a single probe, which is referred to as an orphan probe. For a SureSelect design, a probe is considered to be an orphan if it does not overlap with other probes in the design and it is more than 100 bp away from its nearest neighbor probe on either side.

Additionally, since the SureSelect assay captures randomly fragmented DNA, probes also capture DNA upstream and downstream of their genomic footprint. This combination of effects (orphan probe coverage and the capture of upstream and downstream sequences) can result in under-sequencing of short, isolated target regions. To compensate, SureDesign automatically replicates orphan probes during the probe selection process.
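The orphan criterion for SureSelect probes can be sketched as follows. This is an illustrative Python example under stated assumptions, not SureDesign's implementation: probes are zero-based, half-open (start, end) intervals on a single chromosome, and the names gap_between and find_orphans are hypothetical.

```python
def gap_between(a, b):
    """Bases separating two half-open intervals: 0 if they abut, negative if they overlap."""
    return max(a[0], b[0]) - min(a[1], b[1])


def find_orphans(probes, min_gap=100):
    """Return probes that overlap no other probe and lie more than `min_gap` bp from all others."""
    orphans = []
    for i, probe in enumerate(probes):
        others = probes[:i] + probes[i + 1:]
        # A probe with no neighbors at all is also reported as an orphan.
        if all(gap_between(probe, other) > min_gap for other in others):
            orphans.append(probe)
    return orphans


probes = [
    (1000, 1120), (1100, 1220),   # overlapping pair
    (1500, 1620), (1700, 1820),   # 80 bp apart, so not orphans
    (5000, 5120),                 # isolated probe
]
print(find_orphans(probes))  # [(5000, 5120)]
```

The pairwise comparison is quadratic in the number of probes, which is acceptable for an illustration but not for a full design.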

Platform

In SureDesign, platform is used to refer to the sequencing technology.

For SureSelect designs, the platform assignment is not critical to the design; the components of the kit that vary between platforms are ordered separately.

For HaloPlex designs, the sequencing primers are synthesized into the probes, so each design is platform-specific and is ordered with its kit under a single part number.

Price Tier

Price tier refers to the Agilent-assigned pricing category for a design. For HaloPlex designs, Agilent sets the price tier based on the total size of the target regions of interest and the total number of probes in the design. For smaller SureSelect designs, Agilent sets the price tier based on the total capture size of the design. For larger SureSelect designs, Agilent sets the price tier based on the total number of probes in the design.

Probe

A probe is an oligonucleotide of a particular sequence manufactured by Agilent using the SurePrint synthesis process. Agilent manufactures SureSelect probes by converting DNA to biotinylated RNA. Agilent manufactures HaloPlex probes by amplifying the original DNA.

SureSelect probes capture genomic DNA fragments that contain the complementary sequence. Since the fragmentation process used in a SureSelect assay is random, the captured fragments can extend past the end of the probe, leading to sequencing of regions outside of the original target regions and outside of the regions covered by the probes for those targets.

HaloPlex probes provide ligation sites to circularize restriction-digested genomic fragments. These fragments are then PCR-amplified. The fragment length varies depending on the genomic spacing of the hybridizable end sequences of the probe. Sequencing coverage depends on the target being within the selected read-length of the end of one or more amplicons.

Probe Size

For SureSelect designs, Probe Size refers to the total size of the genomic footprint of all the probes. Nucleotides that are covered by multiple probes count only once towards the Probe Size. Since the SureSelect protocol uses randomly sheared genomic DNA, the Probe Size is only an approximation of the size of the sequenceable region.

For HaloPlex designs, Probe Size refers to the size of the sequenceable region.

# Probes 

# Probes refers to the total number of probes in a design. When reporting this number, SureDesign includes replicated probes.

Probegroup

A probegroup is a set of probes, each with a defined replication factor. A design consists of one or more probegroups. Individual probegroups cannot be manufactured or ordered, but they can be included, either alone or with other probegroups, in a design.

·        When creating a design with the standard wizards, SureDesign automatically creates a single probegroup from the probes generated in the probe selection job.

·        When creating a design with the advanced wizards, you explicitly create and include probegroups to make up the design.

Replicate Probes

SureDesign uses replicate probes to enable Agilent's Feature Extraction software to calculate the Reproducibility QC metric. This metric, which helps confirm uniform hybridization and signal intensity across the surface of the microarray, is set to the median percent CV (coefficient of variation) of background-subtracted signal for these replicate probes after outlier rejection. High scores for the Reproducibility metric usually indicate that the hybridization volume was too low or that the oven stopped rotating during the hybridization. Include at least 300 replicate probes, each replicated five times, because Feature Extraction requires a replication level of at least three after rejection of feature non-uniformity outliers (FNUOL). The default Agilent replicate probegroup for the 1 x 244K and 1 x 1M CGH microarrays contains 1000 probes and should be replicated five times. The 1000 probes are a random selection from catalog arrays.
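To make the "median percent CV" idea concrete, here is a minimal Python sketch. It is not Feature Extraction's algorithm: it assumes the signals are already background-subtracted and outlier-filtered, uses the sample standard deviation, and skips probes with fewer than three remaining replicates. The function name median_percent_cv and the signal values are hypothetical.

```python
from statistics import mean, median, stdev


def median_percent_cv(signals_by_probe, min_replicates=3):
    """Median, across replicate probes, of 100 * standard deviation / mean signal.

    signals_by_probe maps a probe name to its background-subtracted replicate
    signals (assumed here to have passed outlier rejection already).
    """
    cvs = []
    for signals in signals_by_probe.values():
        if len(signals) < min_replicates:
            continue  # too few replicates left to estimate a CV
        average = mean(signals)
        if average <= 0:
            continue  # CV is not meaningful for non-positive mean signal
        cvs.append(100.0 * stdev(signals) / average)
    return median(cvs) if cvs else None


# Made-up signal values for two replicate probes.
example = {
    "probe_A": [100, 100, 100, 100, 100],  # CV of 0%
    "probe_B": [90, 100, 110],             # CV of 10%
}
print(median_percent_cv(example))
```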

Replication

In SureDesign, replication refers to the replication of probes within a probegroup, the replication of probegroups within a design, or the replication of a design across the available features of a SurePrint slide during manufacturing.

SureDesign may replicate probes when performing GC boosting. If you upload probes into SureDesign, you can specify the replication number for each probe.  

When creating a SureSelect design with the advanced wizard, you can replicate probegroups within the design.

When you place an order for a design, Agilent automatically replicates your entire design to fill all available features on the array during the SurePrint synthesis. You do not need to apply replication in the design to accomplish this replication process.

Sequenceable Region

The sequenceable region for a design is the portion of the genome capable of being sequenced when the design is used in a target enrichment capture.

For HaloPlex designs, the sequenceable region of an individual HaloPlex amplicon is the portion of the genome capable of being sequenced when that amplicon is used in a HaloPlex assay. The amplicon's sequenceable region depends on the platform and read length.

·        For the Illumina platform, the sequenceable region of an amplicon is the read length from each end of the amplicon. For example, a 400-bp amplicon sequenced on a MiSeq instrument with 150 PE sequencing would have two sequenceable regions of 150 bp separated by a 100-bp gap. Since HaloPlex designs typically cover each nucleotide with multiple amplicons, sequencing from an alternative amplicon can typically cover the gap.

·        For the Ion Torrent platform, the portion of the amplicon that will be sequenced by the standard flow cycle is considered sequenceable.

SureDesign calculates the analyzable region of a HaloPlex amplicon by ignoring the first 5 bases of the sequenceable region, since these bases are more likely to be subject to reference bias when detecting variants.
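The Illumina example above can be reproduced with a short Python sketch. It is an illustration under stated assumptions: coordinates are zero-based and half-open relative to the amplicon start, reads from the two ends are merged into one region when they meet or overlap, and the 5 ignored bases are taken from the start of each read (that is, from each end of the amplicon). The function names are hypothetical.

```python
def illumina_sequenceable(amplicon_start, amplicon_end, read_length):
    """Sequenceable regions of one amplicon for paired-end reads of `read_length` bases."""
    length = amplicon_end - amplicon_start
    if length <= 2 * read_length:
        # The two reads meet or overlap, so the whole amplicon is sequenceable.
        return [(amplicon_start, amplicon_end)]
    return [
        (amplicon_start, amplicon_start + read_length),
        (amplicon_end - read_length, amplicon_end),
    ]


def analyzable(regions, skip=5):
    """Drop the first `skip` bases of each read (assumed to be the outer amplicon ends)."""
    trimmed = list(regions)
    first_start, first_end = trimmed[0]
    trimmed[0] = (first_start + skip, first_end)
    last_start, last_end = trimmed[-1]
    trimmed[-1] = (last_start, last_end - skip)
    return trimmed


regions = illumina_sequenceable(0, 400, 150)
print(regions)              # [(0, 150), (250, 400)], leaving a 100-bp gap
print(analyzable(regions))  # [(5, 150), (250, 395)]
```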

For SureSelect designs, the size of the sequenceable region is approximately equal to the Probe Size.

SurePrint

Agilent synthesizes oligonucleotides using its proprietary SurePrint process. Oligos are synthesized in parallel on a planar glass support, then cleaved and amplified to create SureSelect or HaloPlex libraries.

Target

Target refers to the genomic database entries that SureDesign retrieves based on the user-provided Target IDs.

The term target can also refer to the genomic region, or set of regions with a common target ID, that you enter into the Define Targets step of the SureSelect or HaloPlex wizard.

Target ID

A Target ID is a gene name, gene symbol, or transcript ID that is listed in one or more genome annotation databases.

Target ID can also refer to the ID attached to a genomic region, or a set of genomic regions, that you enter into the Define Targets step of the SureSelect or HaloPlex wizard.

Target Region

A target region is a contiguous genomic interval within a target. Depending on the regions of interest defined for a target, a target may be divided into many target regions. The target regions are the input to the probe selection process.

SureDesign derives the target regions by:

  1. searching the specified gene databases for the Target IDs and extracting the targets

  2. extracting the specified regions of interest (exons, UTRs, entire genomic sequence, with or without extension)

  3. merging overlapped or adjacent regions to form contiguous genomic intervals
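The merging step (step 3) can be sketched in Python as shown below. This is a generic interval-merging example, not SureDesign's own code; it assumes zero-based, half-open (chromosome, start, end) tuples, so two regions are adjacent when one ends exactly where the next begins. The function name merge_regions and the coordinates are hypothetical.

```python
def merge_regions(regions):
    """Merge overlapping or adjacent (chrom, start, end) intervals into contiguous ones."""
    merged = []
    # Sorting groups regions by chromosome name, then by start position.
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or abuts the previous region: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),   # overlaps the first region
    ("chr1", 300, 350),   # adjacent to the merged region
    ("chr1", 500, 600),
    ("chr2", 100, 200),
]
target_regions = merge_regions(regions)
print(target_regions)
# [('chr1', 100, 350), ('chr1', 500, 600), ('chr2', 100, 200)]

print(len(target_regions))                                    # 3
print(sum(end - start for _, start, end in target_regions))   # 450
```

The last two values correspond to the count of merged regions and the size of their combined genomic footprint.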

# Target Regions

# Target Regions refers to the number of target regions in a design.

Target Region Size

The target region size is the size of the genomic footprint of the target regions. 

Wobble Probes

If a known SNP occurs within the hybridization site at either end of a HaloPlex probe, SureDesign may include wobble probes with the alternate allele in the design. The downloadable Amplicons track shows wobble probes as duplicated regions, though they are not in fact exact duplicates.

Workgroup

A workgroup consists of one or more SureDesign accounts that have access to the same set of folders. As a SureDesign user, you are a member of a workgroup. All designs in the workgroup can be seen by all members of the workgroup.

See Also: Collaboration