When selecting the probes for a SureSelect target enrichment design, SureDesign can exclude probes that cover repetitive sequences within the target region. The program uses publicly available masking tools to find repetitive sequences and employs those tools based on the masking stringency level that you select in the design wizard.
Depending on the genome species and masking stringency level you select, SureDesign uses one or more of the following masking tools to determine if a sequence is repetitive.
RepeatMasker – SureDesign obtains these masked sequences directly from UCSC (http://hgdownload.cse.ucsc.edu/downloads.html). They are generated using RepeatMasker and Tandem Repeats Finder (with a period of 12 or less).
WindowMasker – To generate these sequences, SureDesign obtains the unmasked sequences from UCSC and then masks repetitive regions using the WindowMasker tool from NCBI with its default parameters.
Uniqueness 35 track – SureDesign uses this Duke Uniqueness track to find 35mer sequences that occur more than 5 times in the tiling interval.
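The "more than 5 times" rule for 35mers can be illustrated with a short sketch. It is not how SureDesign works internally: SureDesign reads the precomputed Duke Uniqueness track, whereas this example counts k-mers directly in a sequence string. The function name `repeated_kmers` and its parameters are hypothetical, and the toy example uses k=4 only so the result can be checked by hand.

```python
from collections import Counter


def repeated_kmers(sequence, k=35, max_occurrences=5):
    """Return the set of k-mers that occur more than max_occurrences times.

    Hypothetical helper for illustration; it counts overlapping k-mers
    on one strand only and ignores case.
    """
    seq = sequence.upper()
    # Slide a window of length k across the sequence and tally each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n > max_occurrences}


# Toy input: a 4-base unit repeated 7 times, followed by unrelated bases.
toy = "ACGT" * 7 + "TTGACA"
flagged = repeated_kmers(toy, k=4, max_occurrences=5)
print(sorted(flagged))
```

With the default arguments (k=35, max_occurrences=5) the same function applies the threshold described above to whatever sequence is passed in.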
The SureSelect design wizard offers four options for the level of stringency for repetitive sequence masking: Most Stringent, Moderately Stringent, Least Stringent, and No Masking. The default masking selection is Moderately Stringent Masking.
When you select No Masking, SureDesign does not mask any sequences and creates probes across the entire target region.
When you select a stringency of Most, Moderate or Least, SureDesign masks sequences based on one or more of the masking tools described above. Because different species may have different masked sequence sets available, the criteria for the stringency options are dependent on the species you specify.
For the H. sapiens genome, the stringency criteria are:
Least Stringent Masking – A sequence must be masked by RepeatMasker, WindowMasker, and the Duke Uniqueness 35 track in order to be masked by SureDesign. Because a sequence must be identified as repetitive by all 3 masking tools, this option results in the least stringent masking of your tiling interval.
Moderately Stringent Masking – A sequence must be masked by RepeatMasker and WindowMasker in order to be masked by SureDesign. It does not need to be in the Duke Uniqueness 35 sequence set.
Most Stringent Masking – A sequence must be masked by RepeatMasker in order to be masked by SureDesign. It does not need to be in the WindowMasker or Duke Uniqueness 35 sequence sets.
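The three H. sapiens criteria amount to taking intersections of the masked regions reported by each tool. The sketch below models each tool's output as a set of base positions, which is an assumption made for illustration; the function `masked_positions` and the toy coordinates are hypothetical and are not part of SureDesign.

```python
def masked_positions(stringency, repeatmasker, windowmasker, uniqueness35):
    """Return the positions masked under the H. sapiens criteria.

    Each tool argument is a set of base positions flagged by that tool
    (an illustrative representation, not SureDesign's own data model).
    """
    if stringency == "none":
        return set()                       # No Masking: nothing is masked
    if stringency == "most":
        return set(repeatmasker)           # RepeatMasker alone is sufficient
    if stringency == "moderate":
        return repeatmasker & windowmasker # both tools must agree
    if stringency == "least":
        return repeatmasker & windowmasker & uniqueness35  # all three must agree
    raise ValueError(f"unknown stringency: {stringency!r}")


# Toy masks over a hypothetical tiling interval.
rm = set(range(10, 30))
wm = set(range(20, 40))
u35 = set(range(25, 50))

for level in ("none", "least", "moderate", "most"):
    print(level, len(masked_positions(level, rm, wm, u35)))
```

For these toy masks the number of masked positions grows from 0 (none) to 5 (least), 10 (moderate), and 20 (most), because each step up in stringency requires agreement from fewer tools.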
For non-human genomes, such as mouse (M. musculus) and rat (R. norvegicus), the Least Stringent option and the Moderately Stringent option use the same criteria because the Duke Uniqueness 35 track is not available. Some genomes also do not have a RepeatMasker sequence set available (e.g. Arabidopsis thaliana). For these species, all 3 stringency options use the same criteria. Consult the table below for a complete list of the criteria for each stringency level by species.
| Species | Least stringent | Moderately stringent | Most stringent |
|---|---|---|---|
| A. thaliana | n/a | n/a | WindowMasker |
| B. taurus | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| C. elegans | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| C. familiaris | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| C. jacchus | n/a | n/a | WindowMasker |
| D. melanogaster | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| D. rerio | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| G. gallus | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| H. sapiens | WindowMasker, RepeatMasker, Uniqueness 35 | WindowMasker, RepeatMasker | RepeatMasker |
| M. mulatta | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| M. musculus | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| O. latipes | WindowMasker | WindowMasker | WindowMasker |
| O. sativa | n/a | n/a | WindowMasker |
| R. norvegicus | n/a | WindowMasker, RepeatMasker | RepeatMasker |
| S. cerevisiae | n/a | n/a | WindowMasker |
| S. pombe | n/a | n/a | WindowMasker |
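For scripted lookups, the table can be transcribed into a small data structure. The following is a minimal sketch under stated assumptions: the names `CRITERIA`, `LEVELS`, and `required_tools` are hypothetical, and an "n/a" cell is stored as `None` without interpreting what SureDesign does in that case.

```python
WM, RM, U35 = "WindowMasker", "RepeatMasker", "Uniqueness 35"

# Each entry is (least, moderate, most); None stands for an "n/a" cell.
WM_ONLY = (None, None, frozenset({WM}))
RM_AND_WM = (None, frozenset({WM, RM}), frozenset({RM}))

CRITERIA = {
    "A. thaliana": WM_ONLY,
    "B. taurus": RM_AND_WM,
    "C. elegans": RM_AND_WM,
    "C. familiaris": RM_AND_WM,
    "C. jacchus": WM_ONLY,
    "D. melanogaster": RM_AND_WM,
    "D. rerio": RM_AND_WM,
    "G. gallus": RM_AND_WM,
    "H. sapiens": (frozenset({WM, RM, U35}), frozenset({WM, RM}), frozenset({RM})),
    "M. mulatta": RM_AND_WM,
    "M. musculus": RM_AND_WM,
    "O. latipes": (frozenset({WM}),) * 3,
    "O. sativa": WM_ONLY,
    "R. norvegicus": RM_AND_WM,
    "S. cerevisiae": WM_ONLY,
    "S. pombe": WM_ONLY,
}

LEVELS = {"least": 0, "moderate": 1, "most": 2}


def required_tools(species, stringency):
    """Return the tools that must all flag a sequence, or None for "n/a"."""
    return CRITERIA[species][LEVELS[stringency]]


print(sorted(required_tools("H. sapiens", "least")))
print(required_tools("M. musculus", "least"))
```

Storing each cell as a set of tool names keeps the "must be masked by all listed tools" reading explicit, so the result can be fed into an intersection over per-tool masks.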