Although many other features have been added since 2003, one of the main purposes of SeqState remains automated primer design. For this, the alignment (NEXUS format; non-interleaved; see sample files) is screened for aligned internal primers, which may also be loaded from and saved to resource files. External primers (upstream and downstream of the alignment) can be specified as well. In each sequence, SeqState searches for stretches of missing data ("?"), which may contain, start with, or end with indel gap characters ("-") resulting from the insertion of gaps in multiple sequences simultaneously. For each region of missing data, all primers are evaluated in terms of distance to the region and fit to the matching part of the sequence to be completed. Degenerate primers and ambiguity codes in the template sequence are interpreted correctly, and mismatches in the head region receive particular attention.
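The search for stretches of missing data described above can be sketched in a few lines of Python. This is only an illustration, not SeqState's own code: the function name missing_regions and the choice of 0-based, end-exclusive coordinates are assumptions made for the example.

```python
import re

# Maximal runs made up of "?" and "-" only. A run without any "?" is a
# pure indel gap and is not treated as missing data.
RUN = re.compile(r"[?-]+")

def missing_regions(seq):
    """Return (start, end) pairs (0-based, end exclusive) for every stretch
    of "?" that may contain, start with, or end with "-" characters."""
    return [(m.start(), m.end()) for m in RUN.finditer(seq) if "?" in m.group()]

row = "ACGT--??-??ACG---TTG?-"
print(missing_regions(row))  # [(4, 11), (20, 22)]
```

The run "---" at columns 14-16 is skipped because it contains no "?", whereas "--??-??" is reported as a single region.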
For each section of absent data, the best primers are provided in the output. If no suitable primers are found, SeqState screens the sequence adjacent to the missing nucleotides (subtracting "-" and jumping over ambiguities) and suggests new primers for synthesis. To select the best primers, the program evaluates nucleotide composition of head and tail region, maximum length of primer dimer complements, primer specificity and annealing temperature, as well as the percentage of other sequences in the alignment to which the primer fits. SeqState allows specification and saving of user assumptions (e.g., the lengths of primer reads which strongly depend on the sequencer used).
The results are exported as a directly printable list and as a table that can be imported into other programs (e.g., Microsoft Excel). They are also printed to the screen and can be copied from there into any online primer order form.
In addition, SeqState can calculate diverse characteristics of manually entered and/or currently loaded primers (e.g., Tm, fit to loaded sequences, primer dimers) and primer pairs (e.g., longest primer dimer complements and Tm differences between the two).
SeqState also supports character sets as understood by PAUP (Swofford, 1998) and calculates sequence statistics for the whole matrix and/or such character sets, including sequence length range, sequence divergence range, transition/transversion ratios, variability measures, and nucleotide composition. These are formatted as a table (on the screen and saved to a file) ready to be used in publications.
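As an illustration of one of these statistics, the following Python sketch counts transitions and transversions between two aligned sequences. It is a plain pairwise count written for this example; how SeqState aggregates such counts over a whole matrix or character set is not described here, and the function name ts_tv_counts is an assumption.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")

def ts_tv_counts(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences.
    Columns with gaps, missing data, or ambiguity codes are skipped."""
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv

print(ts_tv_counts("ACGTAC", "GCGTCA"))  # (1, 2)
```

The ratio is then ts / tv, provided at least one transversion was observed.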
Finally, SeqState supports a number of published indel coding schemes. For details, please refer to:
Once SeqState is running, its use should be mostly self-evident from the menu bar and its items.
The first thing you will usually do is load a data file from the "File" menu. You will be presented with a file dialog of the kind you know from other programs on your computer.
Currently, it is safest to use plain, non-interleaved NEXUS files as input, such as those generated by PAUP's export command with format=nexus. Additional blocks after the Data block may currently confuse SeqState.
Charset commands should be restricted to contiguous ranges, e.g., "charset one= 23-67 78 95-102;". Charsets using intervals such as "1-.\3" are not supported yet. Taxsets will be supported soon, but the current version does not handle them properly, so it is better to omit the taxset command.
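To check in advance whether a charset command stays within the supported form, a small parser along the following lines may help. It is a sketch written for this documentation, not part of SeqState; the function name parse_charset is hypothetical, and positions are kept 1-based and inclusive as in NEXUS.

```python
import re

def parse_charset(command):
    """Parse 'charset name = 23-67 78 95-102;' into (name, [(start, end), ...]).
    Positions stay 1-based and inclusive, as written in the NEXUS file."""
    m = re.match(r"\s*charset\s+(\S+?)\s*=\s*(.*?)\s*;?\s*$", command, re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a charset command: {command!r}")
    name, body = m.group(1), m.group(2)
    ranges = []
    for token in body.split():
        # Stepped intervals such as 1-.\3 are rejected, since they are not supported.
        if "\\" in token or "." in token:
            raise ValueError(f"unsupported interval syntax: {token!r}")
        start, _, end = token.partition("-")
        ranges.append((int(start), int(end or start)))
    return name, ranges

print(parse_charset("charset one= 23-67 78 95-102;"))
# ('one', [(23, 67), (78, 78), (95, 102)])
```

A single position such as 78 is returned as the range (78, 78).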
Second, you might want to check the global settings (e.g., assumed primer read lengths) from the Primers menu. Use "choose/design primers" from the same menu to have SeqState analyse the data. Statistics are available from the Statistics menu; bootstrapping for standard errors can be adjusted via the Settings submenu.
Indel coding (a variety of simple or more complex schemes) can be requested from the IndelCoder menu. Just choose your preferred coding scheme to have SeqState write a NEXUS file with coded indels that is ready to be executed in PAUP.
In the case of difficulties or questions, don't hesitate to contact me. Also, bug reports and comments are always highly appreciated.
Prof. Dr. Kai Müller
Research group for Evolution and Biodiversity of Plants
Institute for Evolution and Biodiversity
Westphalian Wilhelms-University, Münster, Germany
Hüfferstrasse 1
48149 Münster
Germany
E-mail: kaimueller
uni-muenster.de
http://bioinfweb.info/People/Mueller
If SeqState was of any help to you, I would appreciate it being cited as follows:
Müller K: SeqState - primer design and sequence statistics for phylogenetic DNA data sets. Applied Bioinformatics 2005, 4:65-69
For each region of missing data, all primers are evaluated in terms of distance to the region and fit to the matching part of the sequence to be completed.
If no suitable primers are found among those supplied in the alignment (i.e., each of them is (1) too far away (the user-provided distance is exceeded), and/or (2) has >=3 mismatches, and/or (3) has >=1 mismatch in the head), SeqState screens the sequence adjacent to the missing nucleotides (ignoring "-") and suggests new primers for synthesis. The criteria are:
The potential primers found in a range of -100 bp and +100 bp (of the target sequence, not matrix positions) from the beginning and end of the gap, respectively, are sorted by
At most the 10 best primers are listed.
For further evaluation by the user, the percentage of sequences in the alignment to which the primer fits perfectly is provided (plus the percentage of "valid" sequences in brackets, i.e., those sequences that don't have ???? in the respective region). Since the primary purpose of SeqState is to provide a reliable suggestion for well-working sequencing primers able to fill gaps of missing data, matching the primer to as many other sequences as possible is not a priority when sorting the best primers. However, if the primer is to be used for more than one taxon, the user should choose to synthesize the primer with the highest fit percentage out of the 10 suggested.
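The two percentages can be illustrated with a short Python sketch. It rests on assumptions that are not taken from SeqState itself: two IUPAC codes are treated as matching when their base sets overlap, the primer sites are passed in already extracted and in primer orientation, and the names mismatches and fit_percentages are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(primer, site):
    """Count positions whose base sets do not overlap (assumed matching rule).
    Characters outside the IUPAC table, such as "-" or "?", count as mismatches."""
    return sum(
        not (set(IUPAC.get(p, "")) & set(IUPAC.get(t, "")))
        for p, t in zip(primer.upper(), site.upper())
    )

def fit_percentages(primer, sites):
    """sites holds one template string per sequence, same length as the primer.
    Returns (% of sequences matched perfectly, % of sequences without "?")."""
    valid = [s for s in sites if "?" not in s]
    perfect = [s for s in valid if mismatches(primer, s) == 0]
    total = len(sites)
    return 100.0 * len(perfect) / total, 100.0 * len(valid) / total

print(fit_percentages("ACRT", ["ACGT", "ACAT", "ACCT", "A??T"]))  # (50.0, 75.0)
```

Here the degenerate primer ACRT fits two of four sequences perfectly, and three of the four sequences are valid in the sense used above.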
Note: F primers are evaluated first (sequence upstream of the gap), R primers later, and both are added to a list. Therefore, F primers come first in a list of primers of identical quality (according to the above sorting criteria). If more than 10 primers are found, this would be unsatisfactory, since there is no reason why F primers should be privileged. Therefore, the list is shuffled prior to sorting, guaranteeing a more homogeneous distribution of F and R primers. This, however, introduces a certain level of randomness into the procedure, and explains why the lists provided by SeqState may sometimes not be completely identical in two subsequent analyses with identical settings and input files. Still, the 10 primers suggested are the best; there may be other, equally good primers that are not output, and what is included in the top 10 might therefore vary. Since you won't synthesize them all anyway, but should have enough suggestions to choose from, I considered this randomness to be of no significance for the purpose of the program. Advanced features that allow you to navigate through ALL primers found and extend the distance from the gap are currently implemented, and so is a refined scoring procedure for the primers.
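The effect of shuffling before sorting can be reproduced with a minimal Python sketch. It is not SeqState's code: the function top_primers, the (name, penalty) tuples, and the single numeric penalty standing in for the sorting criteria are all assumptions made for the example.

```python
import random

def top_primers(candidates, key, limit=10, rng=random):
    """Shuffle, then sort by `key` (lower is better) and keep the best `limit`.
    Python's sort is stable, so equally good primers keep their shuffled
    order instead of the original F-before-R order."""
    pool = list(candidates)
    rng.shuffle(pool)
    pool.sort(key=key)
    return pool[:limit]

# Eight F and eight R primers of identical quality, plus one clearly worse primer.
candidates = [(f"F{i}", 0) for i in range(8)] + [(f"R{i}", 0) for i in range(8)]
candidates.append(("F_bad", 5))

for name, penalty in top_primers(candidates, key=lambda c: c[1]):
    print(name, penalty)
```

Repeated runs list ten primers with penalty 0 each time, but which of the sixteen equally good ones appear will usually differ, which is the run-to-run variation described above.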
Tm is estimated according to the formula Tm = 69.3 + 0.41 * GC[%] - 650 / primer_length.
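As a worked example, the formula translates directly into Python. The function name estimate_tm is hypothetical, and counting only unambiguous G and C towards GC[%] is an assumption, since the treatment of degenerate bases in this calculation is not stated here.

```python
def estimate_tm(primer):
    """Tm = 69.3 + 0.41 * GC[%] - 650 / primer_length.
    Only plain G and C are counted as GC (an assumption of this sketch)."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must not be empty")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 69.3 + 0.41 * gc_percent - 650.0 / len(seq)

# A 20-mer with 10 G/C bases: 69.3 + 0.41 * 50 - 650 / 20 = 57.3
print(round(estimate_tm("ACGTACGTACGTACGTACGT"), 1))  # 57.3
```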