ALL is a high-speed sequence alignment tool for large datasets, supporting both pairwise alignment and Multiple Sequence Alignment (MSA). ALL aligns both protein and nucleotide sequences, and the sequence type is recognized automatically. Any set of printable characters can be used, apart from the special and reserved characters, and the set can contain up to 31 characters. The special character '-' in a query sequence never matches another character and is shown as '_' in the sequence alignment output. The characters '~', '@', '&' and '%' are reserved and should not be used in sequences.
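The character rules above can be checked before submitting a sequence. The following is a minimal sketch, not part of ALL itself: the function name `check_sequence` is hypothetical, and it assumes that the 31-character limit applies to the distinct characters of a single sequence and that '-' and the reserved characters do not count toward it.

```python
# Characters the help text lists as reserved, and the stated alphabet limit.
RESERVED = set("~@&%")
SPECIAL = "-"        # allowed, but never matches another character
MAX_ALPHABET = 31


def check_sequence(seq: str) -> list[str]:
    """Return a list of problems found in one sequence string (empty if none)."""
    problems = []
    chars = set(seq)

    reserved_used = sorted(chars & RESERVED)
    if reserved_used:
        problems.append("reserved characters used: " + " ".join(reserved_used))

    if any(not c.isprintable() for c in chars):
        problems.append("non-printable characters present")

    # Assumption: '-' and the reserved characters do not count toward the limit.
    alphabet = chars - RESERVED - {SPECIAL}
    if len(alphabet) > MAX_ALPHABET:
        problems.append(f"{len(alphabet)} distinct characters (limit {MAX_ALPHABET})")

    return problems


print(check_sequence("ACGT-ACGT"))  # no problems expected
print(check_sequence("AC@GT%"))     # reports the reserved characters
```

An empty list means the sequence passes these checks; it does not guarantee that ALL will accept the input.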
One or more FASTA input lines can be used in any or all sequence text boxes or upload files, for both pairwise alignment and MSA. For example, in pairwise alignment, the first sequence might contain 100 lines of FASTA input, perhaps portions of several bacterial genomes. The second sequence might contain one large FASTA sequence line, for example a well-known bacterial chromosome. In this case the 100 lines of FASTA input would be compared to the large FASTA sequence.
With the pairwise and MSA web forms, the ALL alignment tool supports up to 32MB of sequence characters per text box or upload file. The algorithm has been tested with a pairwise comparison of 1.6GB x 1.6GB. The layout of the web forms is similar to that of the EMBL-EBI web pages.
For MSA alignment, select the "MSA alignment" link. For pairwise alignment, select the "Pair alignment" link. The "MSA alignment" form is submitted with one sequence. The "Pair alignment" form is submitted with two sequences.
The FASTA title of each sequence element must be unique. For "Pair alignment" the title must be unique across both sequences. Please refer to the FASTA input sequences in the "FASTA Sequence Input Examples" section below.
Within each sequence text box or upload file, enter a free text list of FASTA sequence lines. The free text list contains a block of characters representing several DNA/RNA or protein sequences. FASTA is the only accepted sequence format, and partially formatted sequences are not accepted. Pasting data directly from a word processor is not recommended, because the word processor may add hidden or control characters that cause unpredictable results. Please refer to the FASTA input sequences in the "FASTA Sequence Input Examples" section below.
The MSA form has one sequence text box or upload file. The pairwise form has two sequence text boxes or upload files.
MinMatchSize is the minimum size of a returned sequence alignment: every returned alignment is at least MinMatchSize characters long. A smaller MinMatchSize returns more sequence alignments and takes more time to complete.
With a value of "Automatic", the ALL algorithm determines MinMatchSize based mainly on the number of characters in the search sequence. A specific MinMatchSize can also be chosen from the drop-down list. Depending on the size of the query sequence, the algorithm may not allow a MinMatchSize that is too small.
MaxMismatchRatio is the approximate maximum edit distance of a returned sequence alignment, expressed as a fraction of the alignment length. For example, if MaxMismatchRatio=1/4 (25%) and the returned alignment is 44 characters long, then the edit distance will generally be less than 11, since 11/44 = 1/4. A smaller MaxMismatchRatio returns fewer sequence alignments and takes less time to complete.
With a value of "Automatic", the ALL algorithm will generally set MaxMismatchRatio=1/3.
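The arithmetic behind MaxMismatchRatio can be sketched as follows. This only illustrates the ratio described above and does not reproduce how ALL applies it internally. The function name `edit_distance_bound` is hypothetical, and because the documentation calls the limit approximate, the result should be read as a guide rather than a strict cutoff.

```python
from fractions import Fraction


def edit_distance_bound(alignment_length: int, max_mismatch_ratio: Fraction) -> Fraction:
    """Approximate upper bound on edit distance for an alignment of the given length."""
    return alignment_length * max_mismatch_ratio


# The example from the text: a 44-character alignment with MaxMismatchRatio=1/4.
print(edit_distance_bound(44, Fraction(1, 4)))  # 11

# The same alignment length with the usual "Automatic" ratio of 1/3.
print(float(edit_distance_bound(44, Fraction(1, 3))))
```

Using `Fraction` keeps ratios such as 1/3 exact instead of rounding them to a floating-point value.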
|1||SRS||EMBOSS alignment format. This shows the sequence ID name, the sequence position, the sequence and the sequence position for each line. Alignments are returned in sets. For example, for alignments of (A,B),(B,C),(C,D), returned alignments will generally be in one set of (A,B,C,D). SRSPair, CSV and CSVIndex output formats can be submitted for the same alignment request to cross-reference the best match for a given sequence within a set.|
|2||SRSPair||EMBOSS alignment format. This shows the sequence ID name, the sequence position, the sequence and the sequence position for each line. Alignments are returned in pairs. For example, for alignments of (A,B),(B,C),(C,D), returned alignments will generally be in six pairs of (A,B),(A,C),(A,D),(B,C),(B,D),(C,D).|
|3||CSV||Each comma-separated values line contains a single sequence alignment. Each line contains the sequence ID names, the sequence positions, the edit distance and the alignment sequences.|
|4||CSVIndex||CSVIndex is the same as CSV, but does not include a print out of the alignment sequences. Print outs can be created using the sequence positions and the associated submitted query sequences. This is a good choice when the expected number of sequence alignments is very large.|
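The relationship between the SRS sets and the SRSPair pairs in the table above can be illustrated with a short sketch. It assumes that alignments sharing a sequence are merged transitively into one set, which matches the (A,B),(B,C),(C,D) example but is not confirmed as ALL's exact rule. The function names are hypothetical.

```python
from itertools import combinations


def group_into_sets(pairs):
    """Merge pairwise alignments that share a member into sets (assumed behavior)."""
    sets = []
    for a, b in pairs:
        # Find every existing set that already contains one of the two members.
        touching = [s for s in sets if a in s or b in s]
        merged = {a, b}.union(*touching)
        sets = [s for s in sets if s not in touching] + [merged]
    return sets


def pairs_from_set(members):
    """List every unordered pair within one set, in sorted order."""
    return list(combinations(sorted(members), 2))


alignment_sets = group_into_sets([("A", "B"), ("B", "C"), ("C", "D")])
print(alignment_sets)                     # one set containing A, B, C and D
print(pairs_from_set(alignment_sets[0]))  # the six pairs listed for SRSPair
```

A set of n sequences expands to n(n-1)/2 pairs, which is why the four-member set in the example yields six pairs.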
The FASTA examples contain a list of DNA/RNA or protein sequences.
Lines starting with # are ignored. For example:
#This is a comment line, e.g. E. coli bacteria
#This line is also ignored
An example DNA sequence element in the list might be:
The line starting with '>' is the title of this sequence element. The DNA sequence is broken into two lines (CATT... and GCA...). One line or any number of lines is accepted.
Each FASTA sequence in the list must contain a title line starting with '>', followed by the DNA/protein sequence.
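The input rules in this section (comment lines, title lines, multi-line sequences, unique titles) can be sketched as a small parser. This is an illustration under stated assumptions, not ALL's own parser: the function name `parse_fasta` is hypothetical, blank lines are assumed to be skipped, and the sequences in the example are made-up placeholder data.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse a free-text FASTA list into {title: sequence}."""
    records = {}
    title = None
    for raw in text.splitlines():
        line = raw.strip()
        # Comment lines start with '#'; skipping blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            title = line[1:].strip()
            if title in records:
                raise ValueError(f"duplicate FASTA title: {title!r}")
            records[title] = ""
        elif title is None:
            raise ValueError("sequence data found before any '>' title line")
        else:
            # A sequence may span any number of lines; join them in order.
            records[title] += line
    return records


# Placeholder input, invented for illustration only.
example = """#This is a comment line
>example_1
ACGTAC
GTTGCA
>example_2
TTGACA
"""
print(parse_fasta(example))
```

For "Pair alignment", the titles must be unique across both inputs, so the keys returned for the two text boxes should not overlap.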
For questions, requests, thoughts or issues, contact us.