The functionality described in this topic is available only when you mark Show Advanced Options.


HaloPlex probegroup wizard:

Add Content

Tile Genes or Regions


When you choose to add content using the tiling method, you first define the targets that you want to capture (Screen 1), then review the targets to make sure that SureDesign successfully recognized all of them (Screen 2), and finally assign the selection parameters (Screen 3).

When you finish making your selections, you submit the probegroup to SureDesign, and the program's algorithms select the probes. Agilent SureDesign sends you an e-mail when your probe selection job is complete and the results are available for you to review.

Screen 1 - Targets, Databases, and Regions of Interest

Complete the fields and selections in this window to define the targets that you want to capture.

Targets

In the Targets text area, enter the identifiers for the targets using either of the following approaches:

·        Type or paste the target identifiers directly into the text area. List one identifier per line.

·        Click Upload to browse to a text file (*.txt) that lists the target identifiers (one identifier per line).

The permitted identifiers are:

·        For target genes:

Gene name - enter the gene name (not case-sensitive) as it appears in one or more of the selected databases; example: brca1

Transcript ID - enter the transcript ID (not case-sensitive) as it appears in one or more of the selected databases; examples: NM_007294, OTTHUMT00000348798, or ENST00000357654

Gene ID - enter the numerical NCBI gene ID; example: 672

SNP ID - enter the dbSNP ID; example: rs35282626

·        For target genomic intervals:

Genomic coordinates - enter the chromosome number and range of nucleotides using the UCSC browser format or BED format.

You can add a text string (with no spaces) after a target genomic interval to serve as the target ID (e.g. chr1:1-100 geneX). If you enter multiple target genomic intervals with the same target ID (e.g. chr1:1-100 geneX and chr1:201-300 geneX), SureDesign treats the intervals as different regions within the same gene.
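The identifier rules above can be illustrated with a short Python sketch. It is not SureDesign's parser: the function names, the regular expression, and the list of transcript prefixes are assumptions chosen for illustration. It classifies each line of the Targets text area, converts BED intervals (0-based start, exclusive end) to the 1-based, inclusive coordinates used by the UCSC browser format, and groups intervals that share a target ID.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical recognition rules for illustration; not SureDesign's actual logic.
UCSC_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$", re.IGNORECASE)
TRANSCRIPT_PREFIXES = ("NM_", "NR_", "XM_", "XR_", "ENST", "OTTHUMT")


@dataclass
class Target:
    kind: str                    # "interval", "snp", "gene_id", "transcript", or "gene_name"
    target_id: str               # label used to group regions of the same target
    chrom: Optional[str] = None
    start: Optional[int] = None  # 1-based, inclusive
    end: Optional[int] = None    # 1-based, inclusive


def parse_target_line(line: str) -> Target:
    fields = line.split()
    if not fields:
        raise ValueError("empty target line")
    first = fields[0]

    # UCSC browser format, e.g. "chr1:1-100", with an optional target ID after it.
    m = UCSC_RE.match(first)
    if m:
        start = int(m.group(2).replace(",", ""))
        end = int(m.group(3).replace(",", ""))
        label = fields[1] if len(fields) > 1 else first
        return Target("interval", label, m.group(1), start, end)

    # BED format: chrom, 0-based start, exclusive end, optional name column.
    if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
        chrom, start, end = first, int(fields[1]) + 1, int(fields[2])
        label = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        return Target("interval", label, chrom, start, end)

    # Non-coordinate identifiers are compared case-insensitively.
    upper = first.upper()
    if re.fullmatch(r"RS\d+", upper):
        return Target("snp", first)
    if first.isdigit():
        return Target("gene_id", first)
    if upper.startswith(TRANSCRIPT_PREFIXES):
        return Target("transcript", first)
    return Target("gene_name", first)


def group_intervals(lines: List[str]) -> Dict[str, List[Tuple[str, int, int]]]:
    """Intervals that share a target ID are treated as regions of one target."""
    groups: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        t = parse_target_line(line)
        if t.kind == "interval":
            groups[t.target_id].append((t.chrom, t.start, t.end))
    return dict(groups)
```

For example, chr1:1-100 geneX and chr1:201-300 geneX end up as two regions of the single target geneX, while brca1 is classified as a gene name and left for database lookup.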

Databases

Below the Databases heading, mark the genome databases that you want SureDesign to use to obtain genomic coordinate information for your specified targets. You can hover the cursor over a database name to see the date on which Agilent most recently downloaded data from the database. For H. sapiens, the available database sources are:

RefSeq - US National Center for Biotechnology Information (NCBI)

Ensembl - European Bioinformatics Institute and the Wellcome Trust Sanger Institute

CCDS - Consensus Coding Sequence project (CCDS) of the US National Center for Biotechnology Information (NCBI)

Gencode - US National Human Genome Research Institute (NHGRI) and the Wellcome Trust Sanger Institute

VEGA - Vertebrate Genome Annotation project of the Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute

SNP - dbSNP database from the National Institutes of Health (NIH)

CytoBand - CytoBand file from the UCSC Genome Browser


 NOTE  If you mark multiple databases and select Coding Exons or Coding Exons + UTRs as the regions of interest (see below), SureDesign may find exon information for a target gene in more than one database. In these cases, the program considers a sequence to be coding if any of the selected databases identifies it as coding, and it considers a sequence to be translated if any of the selected databases identifies it as translated.
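The "any database" rule in the NOTE amounts to taking the union of the coding intervals reported by the selected databases. The sketch below illustrates that idea under assumed inputs (hypothetical per-database lists of 1-based, inclusive intervals for one gene); it is not how SureDesign stores or combines annotation data internally.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def union_coding(per_db: Dict[str, List[Interval]]) -> List[Interval]:
    """A base counts as coding if ANY selected database calls it coding."""
    intervals = sorted(iv for ivs in per_db.values() for iv in ivs)
    merged: List[Interval] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With RefSeq reporting 100-200 and 500-600, Ensembl 150-250, and CCDS 601-650, the combined coding sequence is 100-250 and 500-650.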

Regions of Interest

Specify the regions within the target genes that you want to capture, using the options below the Regions of Interest heading:

·        Coding Exons - Select this option to include probes for the translated regions of the target genes.

·        Coding Exons + UTRs - Select this option to include probes for the translated regions and the 5' and/or 3' untranslated regions of the target genes. Mark the adjacent check boxes to indicate which untranslated regions you want to include in the target regions:

·        Mark 5' UTR to include 5' untranslated regions.

·        Mark 3' UTR to include 3' untranslated regions.

·        Entire Transcribed Region - Select this option to include probes for the entire genomic sequence (exons, introns, and UTRs) of your target genes.

 NOTE  For target genomic intervals (i.e. targets entered as genomic coordinates), SureDesign always includes the entire genomic sequence when selecting probes for the design, regardless of your selection for the Regions of Interest.
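The three Regions of Interest options can be pictured as different cuts of one transcript model. In this sketch the Transcript class, its fields, and the mode strings are hypothetical; it assumes 1-based, inclusive coordinates and that the 5' UTR lies upstream of the coding sequence in the transcript's own orientation, which places it to the right of the CDS on the minus strand. Targets entered as genomic coordinates would skip this step entirely, as the NOTE says.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


@dataclass
class Transcript:  # hypothetical, simplified transcript model
    strand: str            # "+" or "-"
    exons: List[Interval]
    cds_start: int         # leftmost translated base (genomic)
    cds_end: int           # rightmost translated base (genomic)


def clip(iv: Interval, lo: int, hi: int) -> Optional[Interval]:
    """Return the part of iv inside [lo, hi], or None if there is none."""
    s, e = max(iv[0], lo), min(iv[1], hi)
    return (s, e) if s <= e else None


def regions_of_interest(tx: Transcript, mode: str,
                        utr5: bool = False, utr3: bool = False) -> List[Interval]:
    if mode == "entire":
        # Entire Transcribed Region: exons, introns, and UTRs as one span.
        return [(min(s for s, _ in tx.exons), max(e for _, e in tx.exons))]
    out: List[Interval] = []
    for exon in sorted(tx.exons):
        coding = clip(exon, tx.cds_start, tx.cds_end)
        if coding:
            out.append(coding)
        if mode == "coding+utr":
            left = clip(exon, exon[0], tx.cds_start - 1)   # part left of the CDS
            right = clip(exon, tx.cds_end + 1, exon[1])    # part right of the CDS
            # On the minus strand the 5' UTR lies to the right of the CDS.
            want_left = utr5 if tx.strand == "+" else utr3
            want_right = utr3 if tx.strand == "+" else utr5
            if left and want_left:
                out.append(left)
            if right and want_right:
                out.append(right)
    return sorted(out)
```

Any mode other than "entire" or "coding+utr" (for example "coding") returns only the translated parts of the exons.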

Include Flanking Bases

In the 3' and 5' drop-down lists, select how many base pairs of flanking sequence (on the 3' and 5' ends, respectively) you want SureDesign to include on each exon/UTR when selecting the probes for a target gene.

 NOTE  SureDesign does not include flanking bases for targets entered as genomic coordinates.
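Adding flanking bases can be sketched as padding each exon/UTR interval. The function name and parameters are hypothetical, and the sketch assumes that "5'" and "3'" are interpreted in the transcript's orientation (so the two paddings swap on the minus strand); the text above does not say how SureDesign handles strand. As the NOTE states, targets entered as genomic coordinates are returned unchanged.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def add_flanks(regions: List[Interval], strand: str, flank5: int, flank3: int,
               from_coordinates: bool = False) -> List[Interval]:
    if from_coordinates:
        # No flanking bases for targets entered as genomic coordinates.
        return list(regions)
    # Assumption: 5'/3' follow transcript orientation, so they swap on "-".
    left, right = (flank5, flank3) if strand == "+" else (flank3, flank5)
    # Clamp at position 1 so padding never runs off the start of the chromosome.
    return [(max(1, s - left), e + right) for s, e in regions]
```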

Allow Synonyms

When this check box is marked, SureDesign compares the gene names you entered into the Targets area to a table of synonyms, and may use the synonym names to map the genes to a genomic location. For example, if you entered HER2 as a target, SureDesign would identify HER2 as a product of the gene ERBB2, and use ERBB2 to map the genomic location.

In cases in which the gene name for your target is also a synonym for another gene, SureDesign treats both genes as targets when Allow Synonyms is marked. For example, if you entered DSP as a target, SureDesign would identify your target as the official gene name for desmoplakin, but it would also identify it as a synonym for the gene encoding dentin sialophosphoprotein. Consequently, the program would map the genomic location to two completely different genes, and in the next step of the wizard (Screen 2), you would see both genomic locations listed for the target.

If you leave the Allow Synonyms check box marked, Agilent recommends that you carefully review the individual target regions identified by SureDesign. To do that, in the next step of the wizard (Screen 2), click View targets in UCSC to see the target regions in the UCSC Genome Browser.

When the Allow Synonyms check box is cleared, SureDesign maps your targets to genomic locations using only the entered gene names.

To fully control how SureDesign maps your targets to a genomic location, enter your targets using transcript IDs, gene IDs, or SNP IDs instead of gene names. Alternatively, after you advance to the next step of the wizard, click Download to download the Regions BED file and then edit the genomic locations listed in the file so that they accurately match those of your targets. You can then go back to the previous step of the wizard and paste the genomic locations into the Targets input area.
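The effect of the Allow Synonyms check box can be illustrated with two toy lookup tables. The tables and the resolve function below are hypothetical and contain only the HER2 and DSP examples from the text; they stand in for SureDesign's own synonym table, which is not shown here.

```python
from typing import Dict, List

# Hypothetical lookup tables holding only the examples discussed above.
OFFICIAL = {"ERBB2", "DSP", "DSPP", "BRCA1"}
SYNONYMS: Dict[str, List[str]] = {"HER2": ["ERBB2"], "DSP": ["DSPP"]}


def resolve(name: str, allow_synonyms: bool) -> List[str]:
    """Return every gene a target name maps to under the chosen setting."""
    key = name.upper()  # gene names are not case-sensitive
    hits = [key] if key in OFFICIAL else []
    if allow_synonyms:
        # A name can be an official symbol AND a synonym for another gene.
        for gene in SYNONYMS.get(key, []):
            if gene not in hits:
                hits.append(gene)
    return hits
```

With allow_synonyms=True, "dsp" resolves to both DSP and DSPP, mirroring the two genomic locations you would see on Screen 2; with the check box cleared, "HER2" resolves to nothing because it is not an entered official gene name.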


Click Next to continue to Screen 2.

Screen 2 - Target Summary and Target Details

This screen lets you confirm that SureDesign successfully recognized all of the target identifiers that you entered on the previous screen. Review the Target Summary and Target Details before you click Next.

Target Summary

Near the top of the wizard window is a target summary with two bullet points that indicate:

·        1st bullet point: The number of target identifiers (target IDs) that SureDesign was able to resolve to a genomic location, and the total number of continuous genomic regions that make up those targets. (If any of the target IDs mapped to more than one genomic location, the number of targets is greater than the number of target IDs. See SureDesign gene finder for more information on how SureDesign maps target IDs to targets.)

·        2nd bullet point: The number of target identifiers (target IDs) that SureDesign was not able to find in any of the databases you selected on the previous screen.
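The two Target Summary counts can be reproduced from a mapping of each target ID to the regions found for it. This is a sketch with a hypothetical input structure; it assumes that overlapping or adjacent regions within one target are merged before being counted as continuous genomic regions, which the text implies but does not spell out.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 1-based, inclusive


def merge(regions: List[Region]) -> List[Region]:
    """Collapse overlapping or adjacent regions into continuous ones."""
    merged: List[Region] = []
    for chrom, s, e in sorted(regions):
        if merged and merged[-1][0] == chrom and s <= merged[-1][2] + 1:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], e))
        else:
            merged.append((chrom, s, e))
    return merged


def target_summary(resolved: Dict[str, List[Region]]) -> Tuple[int, int, List[str]]:
    """Return (target IDs found, continuous regions, target IDs not found)."""
    found = {tid: merge(r) for tid, r in resolved.items() if r}
    not_found = sorted(tid for tid, r in resolved.items() if not r)
    n_regions = sum(len(r) for r in found.values())
    return len(found), n_regions, not_found
```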

If SureDesign did not accurately identify all of your target regions

Target Details

The Target Details table lists the following information for each of the target identifiers that SureDesign was able to locate:

·        Target ID - The target ID is the gene name, transcript ID, gene ID, SNP ID, or genomic coordinates (or the text string you added after them) that you used to define the target.

·        # Regions - The # Regions column lists the number of target regions within the target.

·        Base Pairs - The Base Pairs column lists the total number of base pairs within the regions defined by the target identifier.

·        Position - The Position column lists the genomic coordinates identified for the target.

 NOTE  To perform a careful review of the individual regions, click View targets in UCSC to open the UCSC Genome Browser and see the genomic locations of the target regions identified by SureDesign.
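The Base Pairs value for a target follows directly from its regions. The sketch below assumes 1-based, inclusive, non-overlapping regions and, as a purely illustrative choice, reports Position as the span from the first to the last region on a single chromosome; the function name is hypothetical and the exact format SureDesign uses for the Position column may differ.

```python
from typing import Dict, List, Tuple, Union

Region = Tuple[str, int, int]  # (chrom, start, end), 1-based, inclusive, non-overlapping


def target_details(target_id: str, regions: List[Region]) -> Dict[str, Union[str, int]]:
    # Inclusive coordinates: a region from start to end spans end - start + 1 bases.
    base_pairs = sum(end - start + 1 for _, start, end in regions)
    chroms = sorted({c for c, _, _ in regions})
    if len(chroms) == 1:
        first = min(s for _, s, _ in regions)
        last = max(e for _, _, e in regions)
        position = f"{chroms[0]}:{first}-{last}"
    else:
        position = "multiple chromosomes"
    return {"Target ID": target_id, "# Regions": len(regions),
            "Base Pairs": base_pairs, "Position": position}
```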


Click Next to advance to Screen 3.

Screen 3 - Selection Parameters

On this screen, set the parameters described below. When you are finished making your selections, submit the probegroup to SureDesign to begin the probe selection process.

Stringency

The Stringency parameter determines the number of "off-target" genome matches permitted for the probes in the probegroup. The options in the drop-down list are described below.

 NOTE  The default selection is Maximize Specificity. Changing to one of the less stringent options (Balanced or Maximize Coverage) may increase coverage, but can significantly increase the capture of genomic regions that are not part of the target regions. To compensate, you may need to increase sequencing in order to achieve your desired read-depth.

·        Maximize Specificity - With this option, SureDesign selects only probes in which at least one of the two arms uniquely matches the intended target. The other arm is permitted to have multiple matches in the genome sequence. If your target regions include UTRs, introns, or other non-coding regions, Agilent strongly recommends the Maximize Specificity option.

·        Balanced - When the Balanced option is selected, SureDesign first tries to select probes in which at least one arm uniquely matches the intended target. Then, for any target regions that are still uncovered, it selects probes in which one of the arms has no more than two matches in the genome sequence (the other arm is permitted to have more than two matches). Probes in which neither arm is unique are more likely to match sequences outside of the target regions, which could decrease the specificity of the probegroup. This option is commonly used when a homologous pseudogene causes the Maximize Specificity option to remove probes from a target gene.

·        Maximize Coverage - This is the least stringent option for the Stringency parameter. When this option is selected, SureDesign first tries to select probes in which at least one arm uniquely matches the intended target. Then, for any target regions that are still uncovered, it selects probes in which one of the arms may have up to 5 matches in the genome sequence (the other arm is permitted to have more than 5 matches). Agilent does not recommend the Maximize Coverage option unless the probegroup will be used to capture pseudogenes or for other experiments in which target specificity is not a concern.
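The two-pass logic behind the Stringency options can be summarized in code. This sketch simplifies coverage to whole target regions rather than individual bases, and the Probe record, the limit table, and the function name are hypothetical; the thresholds themselves (a unique arm in the first pass, then at most 2 or at most 5 matches for the better arm) are taken from the option descriptions above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Probe:  # hypothetical candidate-probe record
    name: str
    region: str      # target region the probe would cover
    arm1_hits: int   # genome matches for arm 1 (1 means unique)
    arm2_hits: int   # genome matches for arm 2


# Second-pass limit on the better arm's match count; None means no second pass.
SECOND_PASS_LIMIT: Dict[str, Optional[int]] = {
    "Maximize Specificity": None,
    "Balanced": 2,
    "Maximize Coverage": 5,
}


def select_probes(candidates: List[Probe], stringency: str) -> List[Probe]:
    def best_arm(p: Probe) -> int:
        # Match count of the more specific of the two arms.
        return min(p.arm1_hits, p.arm2_hits)

    # Pass 1 (all options): at least one arm matches the genome uniquely.
    chosen = [p for p in candidates if best_arm(p) == 1]

    limit = SECOND_PASS_LIMIT[stringency]
    if limit is not None:
        covered = {p.region for p in chosen}
        # Pass 2: fill only regions still uncovered, relaxing the better arm's limit.
        chosen += [p for p in candidates
                   if p.region not in covered and best_arm(p) <= limit]
    return chosen
```

Because the relaxed probes are only added for regions that the first pass left uncovered, the less stringent options never replace a probe with a unique arm.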

Optimize for Fragmented Samples

Mark this check box if you want SureDesign to select probes that are optimized for enriching targets from fragmented DNA samples, such as those isolated from FFPE tissues. To optimize the capture of fragmented DNA, SureDesign selects a greater number of probes that create amplicons between 50 and 100 bp in length. The program also selects probes to cover both strands of the target regions, which further increases the total number of probes in the design. Because of the increased number of probes, the price tier assigned to a design containing the probegroup may be higher than that of a design of the same size that was not optimized for fragmented samples. Because reads from amplicons that are shorter than the read length contain adaptor sequence and/or terminate early, additional reads are required to achieve your desired read-depth.

 NOTE  If you do not mark the Optimize for Fragmented Samples check box, SureDesign only selects probes for short amplicons if they are needed for design coverage. If you do mark the check box, the program selects probes for short amplicons even if longer amplicons already cover the target region.
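The rule in the NOTE can be sketched for a single target region. The Candidate record and the function name are hypothetical; the sketch ignores the both-strands requirement and amplicons shorter than 50 bp, and it interprets "needed for design coverage" as "no longer amplicon covers the region".

```python
from dataclasses import dataclass
from typing import List

SHORT_MIN, SHORT_MAX = 50, 100  # short-amplicon range named in the text (bp)


@dataclass
class Candidate:  # hypothetical candidate probe for one target region
    name: str
    amplicon_len: int  # bp


def pick_for_region(cands: List[Candidate], optimize_fragmented: bool) -> List[Candidate]:
    short = [c for c in cands if SHORT_MIN <= c.amplicon_len <= SHORT_MAX]
    longer = [c for c in cands if c.amplicon_len > SHORT_MAX]
    if optimize_fragmented:
        # Short amplicons are taken even when longer ones already cover the region.
        return longer + short
    # Otherwise short amplicons are used only when nothing longer covers the region.
    return longer if longer else short
```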


To submit the probegroup for probe selection:

  1. Click Begin Probe Selection.

    A message box opens indicating the e-mail address where Agilent will contact you when the probe selection job is complete. If desired, you can enter additional e-mail addresses into the provided field.

  2. Click OK in the notification message to submit the probegroup to SureDesign.

    Your submission is placed in the SureDesign job queue to await probe selection.

    The wizard takes you to the Finalize step. Click Close to close the wizard.