For DNA, RNA and protein molecules up to 16MB, ALL aligns all sequences of size K or greater. Similar alignments are grouped together for analysis. Sequence patterns can be filtered out of the comparison by replacing each character of the pattern with the special character '-'.
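As a rough illustration of the filtering step described above, the following Python sketch replaces every occurrence of a pattern with '-' before a sequence is submitted. The function name `mask_pattern` is hypothetical and not part of ALL; it only prepares the input text.

```python
def mask_pattern(sequence: str, pattern: str, mask_char: str = "-") -> str:
    """Return `sequence` with every occurrence of `pattern` replaced by mask characters.

    Overlapping occurrences are masked too, and the sequence length is
    preserved so that positions in the alignment output stay meaningful.
    """
    if not pattern:
        return sequence
    chars = list(sequence)
    start = sequence.find(pattern)
    while start != -1:
        for i in range(start, start + len(pattern)):
            chars[i] = mask_char
        # Continue one position further so overlapping matches are found.
        start = sequence.find(pattern, start + 1)
    return "".join(chars)


# Mask the pattern CTCTC in the quick-start Sequence 1.
print(mask_pattern("ACTAAGGCTCTCTACCCCTCTCAGAGA", "CTCTC"))
# ACTAAGG-----TACCC-----AGAGA
```

Keeping the masked sequence the same length as the original means reported positions still refer to the unmasked input.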
For a quick start, follow the sequence alignment examples in sections "Pairwise Alignment Execution" and "MSA Alignment Execution". Sections "Sequence Options and Features" and "How to Use this Tool" give a more complete description of the pairwise and MSA capabilities and options.
Copy and paste the two-line FASTA sequence below into the 'Step 1' text box of the 'Pairwise Sequence Alignment' tab
>Sequence 1
ACTAAGGCTCTCTACCCCTCTCAGAGA
Copy and paste the two-line FASTA sequence below into the 'Step 2' text box of the 'Pairwise Sequence Alignment' tab
>Sequence 2
ACTAAGGCTCCTAACCCCCTTTTCTCAGA
Select the green 'Submit' button
All alignments between the two sequences are grouped and returned:
Sequence_1 0 ACTAAGGCTCTCTA-CCCC-TCT-CAGAGA 26
Sequence_2 0 ACTAAGGCTC-CTAACCCCCTTTTCTCAGA 28
Sequence_1 19 CTCAGAG 25
Sequence_2 1 CTAAG-G 6
Sequence_1 14 CCCCTCTC 21
Sequence_2 7 CTCCTAAC 14
Sequence_1 1 C-TAAGGCTCTCTAC 14
Sequence_2 9 CCTAACCCCCTTTTC 23
The first alignment is a global alignment of Sequence_1 characters 0 to 26 and Sequence_2 characters 0 to 28. The second alignment shows us that Sequence_1 characters 19-25 align with Sequence_2 characters 1-6. The third and fourth alignments show additional sequence alignments.
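To make the position columns concrete, here is a minimal Python sketch that parses one output row and checks that the number of non-gap characters matches the reported range. It assumes that positions are 0-based and inclusive and that fields are separated by whitespace, which is consistent with the example output above but is not stated by the tool. The names `AlignedRow`, `parse_row` and `is_consistent` are hypothetical.

```python
from typing import NamedTuple


class AlignedRow(NamedTuple):
    name: str      # sequence title as printed in the output
    start: int     # first aligned position (assumed 0-based, inclusive)
    aligned: str   # aligned characters, with '-' marking gaps
    end: int       # last aligned position (assumed inclusive)


def parse_row(line: str) -> AlignedRow:
    """Split a row of the form 'name start aligned end' into its fields."""
    name, start, aligned, end = line.split()
    return AlignedRow(name, int(start), aligned, int(end))


def is_consistent(row: AlignedRow) -> bool:
    """True if the non-gap character count equals end - start + 1."""
    ungapped = row.aligned.replace("-", "")
    return len(ungapped) == row.end - row.start + 1


for line in (
    "Sequence_1 19 CTCAGAG 25",
    "Sequence_2 1 CTAAG-G 6",
):
    row = parse_row(line)
    print(row.name, row.start, row.end, is_consistent(row))
```

Both rows of the second alignment pass the check: seven characters for positions 19-25 and six non-gap characters for positions 1-6.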
Under 'Download FASTA sequence input examples' at the bottom of this page, select 'Pairwise Sequence Molecules' and unzip the DNA molecules.
Open ALLAlign in a new window
Select the 'Pairwise Sequence Alignment' tab
Within the Step 1 section, select the 'Choose File' button
Select the 'human-alpha-globin.fasta' file
Within the Step 2 section, select the 'Choose File' button
Select the 'rabbit-alpha-globin.fasta' file
Select the green 'Submit' button
All alignments between the Human Alpha Globin and Rabbit Alpha Globin are returned starting with:
HUMAN_sequence_from_alpha-globin_gene_cluster 8463 CCCAGGGCCTCTGGGACCTCC-TGGT-GC 8489
Rabbit_sequence_from_alpha-globin_gene_cluster 57 CCCAGG-CCTCTGGCATCTCCCTCGTCGC 84
HUMAN_sequence_from_alpha-globin_gene_cluster 12714 CCCAGGGCCTCTGGGACCTCC-TGGT-GC 12740
HUMAN_sequence_from_alpha-globin_gene_cluster 5870 ATCTCTGCAG-GTGCCCAGGCCAA-GGCAT-TCCCT 5902
Rabbit_sequence_from_alpha-globin_gene_cluster 44 AGCT-TGCTGTGTGCCCAGGCCTCTGGCATCTCCCT 78
HUMAN_sequence_from_alpha-globin_gene_cluster 12669 GAACTCACTGTGTGCCCAG-CC-CTG--AGCTCCC 12699
Rabbit_sequence_from_alpha-globin_gene_cluster 43 GAGCTTGCTGTGTGCCCAGGCCTCTGGCATCTCCC 77
HUMAN_sequence_from_alpha-globin_gene_cluster 8418 GAACTCACTGTGTGCCCAG-CC-CTG--AGCTCCC 8448
HUMAN_sequence_from_alpha-globin_gene_cluster 987 CACCCTGTGACACTGGGTCCCACTTTCTCT 1016
Rabbit_sequence_from_alpha-globin_gene_cluster 93 CACCCTG-GA-ACTGG--CCC-CTGTC-CT 116
...
Alignments are grouped by similarity. The first group shows that HUMAN_sequence_from_alpha-globin_gene_cluster characters 8463-8489 and 12714-12740 align with Rabbit_sequence_from_alpha-globin_gene_cluster characters 57-84. Subsequent groups show additional alignments.
Copy and paste the three FASTA sequences below (6 lines) into the 'Step 1' text box of the 'MSA Sequence Alignment' tab
>Sequence 1
GCCCAGTAGCTTCCCAATATGAGAGCATCAATTGTAGATCGGGCC
>Sequence_2
TCTATAAGATTCCGCATGCGTTACTTATAAGATGTCTCAACGG
>Sequence_3
TAGAGATTAATTGCCACTGCCAAAATTCTG
Within 'Step 2', under 'MinMatchSize', select 12
Within 'Step 2', under 'MaxMismatchRatio', select 1/5
Select the green 'Align' button
All alignments between the three sequences are grouped and returned:
Sequence_2 6 AGATTCCGCA-TGC 18
Sequence_3 8 A-ATTGC-CACTGC 19
Sequence_1 6 TA-GCTTCC-CAAT 17
Sequence_2 4 TAAGATTCCGCA-T 16
Sequence_2 25 TATAAGATGTC-TCAA 39
Sequence_1 17 TATGAGA-G-CATCAA 30
Sequence_3 15 A-CTGCCAAA-AT 25
Sequence_1 7 AGCTTCCCAATAT 19
Sequence_3 1 AGAG-ATTAATTGCCA 15
Sequence_1 21 AGAGCATCAATTGT-A 35
Sequence_2 22 ACTTATAAGAT-G 33
Sequence_1 29 AATTGTA-GATCG 40
Under 'Download FASTA sequence input examples' at the bottom of this page, select 'MSA Sequence Molecules' and unzip the DNA molecules.
Open ALLAlign in a new window
Select the 'MSA Sequence Alignment' tab
Within the Step 1 section, select the 'Choose File' button
Select the 'p1_split_ecoli_DH1_illumina.sections.fasta' file
Select the green 'Align' button
All alignments between the three E. coli sections (ecoli1_1, ecoli1_2, ecoli1_3) are grouped and returned, starting with:
ecoli1_3 2316 CGTTGCTGATGAATATCTTCTGCTCACATA 2345
ecoli1_1 55 CGTTGC-GC-GAATTTCT-CTGC-CAAA-A 79
ecoli1_3 1837 GCCCGACAA-ATCATTGACG-CAAT 1859
ecoli1_1 576 GCC-GTCAACAGCAGTGATGACAAT 599
ecoli1_3 802 C-CGC-GT-AAGCAG-AG-G-TGAC 820
ecoli1_1 573 CTCGCCGTCAA-CAGCAGTGATGAC 596
ecoli1_1 70 TCTGCC--A-AAATCAGTCA-G-A 88
ecoli1_3 1834 TCTGCCCGACAAATCATTGACGCA 1857
ecoli1_1 74 CCAAAATCA-G-TCAGATCGCGACA-A-TT 99
ecoli1_2 1709 CCAAAATAACGATCA--TCACGACATAATT 1736
...
ALL is a high-speed, large-dataset sequence alignment tool for Pairwise and Multiple Sequence Alignment (MSA). ALL processes both protein and nucleotide sequence alignments, and the type of sequence is recognized automatically. Any printable character set can be used, except for the special and reserved characters, and the set can contain up to 31 characters. If a query sequence contains the special character '-', that character will never match another character and will show as '_' in the sequence alignment output. The characters '~', '@', '&' and '%' are reserved and should not be used in sequences.
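The character rules above can be checked before submitting a sequence. The Python sketch below is only an illustration: the name `check_sequence` is hypothetical, and it assumes that the 31-character limit applies to the distinct characters of a single sequence, that the special character '-' does not count toward that limit, and that upper and lower case are distinct. None of these details are stated above.

```python
RESERVED_CHARS = set("~@&%")  # reserved characters listed above
MASK_CHAR = "-"               # special character that never matches
MAX_ALPHABET_SIZE = 31        # "up to 31 characters in the set"


def check_sequence(sequence: str) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    reserved_found = sorted(set(sequence) & RESERVED_CHARS)
    if reserved_found:
        problems.append("reserved characters used: " + " ".join(reserved_found))
    # Assumption: the mask character is not counted as part of the alphabet.
    alphabet = set(sequence) - {MASK_CHAR}
    if len(alphabet) > MAX_ALPHABET_SIZE:
        problems.append(
            f"{len(alphabet)} distinct characters, limit is {MAX_ALPHABET_SIZE}"
        )
    return problems


print(check_sequence("ACTAAGG-----TACCC"))  # []
print(check_sequence("ACT@AGG"))            # ['reserved characters used: @']
```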
One or more FASTA input lines can be used in any or all sequence text boxes or upload files for both Pairwise alignment and MSA. For example, in pairwise alignment, the first sequence might contain 100 lines of FASTA input, perhaps portions of several bacterial genomes. The second sequence might contain one large FASTA sequence line, for example a well-known bacterial chromosome. In this case the 100 lines of FASTA input would be compared to the large FASTA sequence.
With the pairwise and MSA web forms, the ALL alignment tool supports up to 16MB of sequence characters per text box or upload file. The layout of the web forms is similar to that of the EMBL-EBI web pages.
For MSA alignment, select the “MSA alignment” link. For pairwise alignment, select the “Pair alignment” link. The "MSA alignment" form is submitted with one sequence. The "Pair alignment" form is submitted with two sequences.
The FASTA title of each sequence element must be unique. For "Pair alignment" the title must be unique across both sequences. Please refer to the FASTA input sequences in the "FASTA Sequence Input Examples" section below.
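A quick way to check the uniqueness rule before submitting is sketched below in Python. The helper names `fasta_titles` and `duplicate_titles` are hypothetical, and titles are compared exactly as typed; whether ALL normalizes titles before comparing them is not stated here, so treat this as a conservative pre-check only.

```python
from collections import Counter


def fasta_titles(text: str) -> list:
    """Return the titles (text after '>') of all FASTA elements in `text`."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def duplicate_titles(*inputs: str) -> list:
    """Return the sorted titles that occur more than once across all inputs."""
    counts = Counter(title for text in inputs for title in fasta_titles(text))
    return sorted(title for title, n in counts.items() if n > 1)


# For pairwise alignment, pass both text boxes so titles are checked across them.
step1 = ">Sequence 1\nACTAAGGCTCTCTACCCCTCTCAGAGA\n"
step2 = ">Sequence 2\nACTAAGGCTCCTAACCCCCTTTTCTCAGA\n"
print(duplicate_titles(step1, step2))  # []
```

For MSA, pass the single text box or file contents as the only argument.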
Within each sequence text box or upload file, enter a free text list of FASTA sequence lines. The free text list contains a block of characters representing several DNA/RNA or protein sequences. FASTA is the accepted sequence format; partially formatted sequences are not accepted. Using data directly from word processors is not recommended, because the word processor may add hidden or control characters that cause unpredictable results. Please refer to the FASTA input sequences in the "FASTA Sequence Input Examples" section below.
The MSA form has one sequence text box or upload file. The pairwise form has two sequence text boxes or upload files.
MinMatchSize is the minimum size for a returned sequence alignment. The returned sequence alignment will be greater than or equal to the MinMatchSize. A smaller MinMatchSize will return more sequence alignments and take more time to complete.
For a value of "Automatic" the ALL algorithm will determine the value of MinMatchSize based mainly on the number of characters in the search sequence. A specific MinMatchSize can be chosen from the drop down list. The algorithm may not allow too small a MinMatchSize based on the size of the query sequence.
MaxMismatchRatio is the approximate maximum edit distance, expressed as a fraction of the alignment length, for a returned sequence alignment. For example, if MaxMismatchRatio=1/4 (25%) and the returned alignment is 44 characters long, then the edit distance will generally be less than 11, since 11/44 = 1/4. A smaller MaxMismatchRatio will return fewer sequence alignments and take less time to complete.
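The arithmetic in the example above can be reproduced with exact fractions. This Python sketch only restates the calculation; the function name `edit_distance_bound` is hypothetical, and because the ratio is described as approximate, the result should be read as a guide rather than a strict limit.

```python
from fractions import Fraction


def edit_distance_bound(alignment_length: int, max_mismatch_ratio: Fraction) -> Fraction:
    """Approximate upper bound on edit distance for an alignment of the given length."""
    return alignment_length * max_mismatch_ratio


ratio = Fraction(1, 4)
print(edit_distance_bound(44, ratio))  # 11
print(float(ratio) * 100)              # 25.0
```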
For a value of "Automatic" the ALL algorithm will generally set MaxMismatchRatio=1/3.
# | Value | Description |
---|---|---|
1 | SRS | EMBOSS alignment format. This shows the sequence ID name, the sequence position, the sequence and the sequence position for each line. Alignments are returned in sets. For example, for alignments of (A,B),(B,C),(C,D), returned alignments will generally be in one set of (A,B,C,D). SRSPair, CSV and CSVIndex output formats can be submitted for the same alignment request to cross-reference the best match for a given sequence within a set. |
2 | SRSPair | EMBOSS alignment format. This shows the sequence ID name, the sequence position, the sequence and the sequence position for each line. Alignments are returned in pairs. For example, for alignments of (A,B),(B,C),(C,D), returned alignments will generally be in six pairs of (A,B),(A,C),(A,D),(B,C),(B,D),(C,D). |
3 | CSV | Each comma-separated values line contains a single sequence alignment. Each line contains the sequence ID names, the sequence positions, the edit distance and the alignment sequences. |
4 | CSVIndex | CSVIndex is the same as CSV, but does not include a print out of the alignment sequences. Print outs can be created using the sequence positions and the associated submitted query sequences. This is a good choice when the expected number of sequence alignments is very large. |
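For the CSVIndex format, the aligned segments can be recovered from the reported positions and the submitted sequences. The Python sketch below assumes 0-based, inclusive positions, which is consistent with the quick-start output on this page. The exact CSVIndex column layout is not described here, so the hypothetical function `extract_segment` takes the name and positions as plain arguments instead of parsing a CSV line. Note that positions alone give the ungapped segment, not the placement of gaps.

```python
from typing import Mapping


def extract_segment(sequences: Mapping[str, str], name: str, start: int, end: int) -> str:
    """Return the characters of `sequences[name]` from `start` to `end`, inclusive."""
    return sequences[name][start:end + 1]


# The two quick-start pairwise sequences, keyed by the names used in the output.
queries = {
    "Sequence_1": "ACTAAGGCTCTCTACCCCTCTCAGAGA",
    "Sequence_2": "ACTAAGGCTCCTAACCCCCTTTTCTCAGA",
}

print(extract_segment(queries, "Sequence_1", 19, 25))  # CTCAGAG
print(extract_segment(queries, "Sequence_2", 1, 6))    # CTAAGG
```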
The FASTA examples contain a list of DNA/RNA or Protein sequences.
Lines starting with # are ignored. For example,
#This is a comment line, e.g. E. coli bacteria
#This line is also ignored
An example DNA sequence element in the list might be:
>Escherichia coli,DH1
CATTATCGACTTTTGTTCGAGTGGAGTCCGCCGTGTCACTTTCGCTTTGGCAGCAGTGTCTTGCCCGATT
GCAGGATGAGTTACCAGCCACAGAATTCAGTATGTGGATACGCCCATTGCAGGCGGAACTGAGCGATAAC
The line starting with '>' is the title of this sequence element.
The DNA sequence is broken into two lines (CATT... and GCA...). One line or any number of lines is accepted.
Each FASTA sequence in the list must contain a title line starting with '>', followed by the DNA/protein sequence.
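The input rules above (comment lines starting with '#', a '>' title line, and a sequence split over any number of lines) can be sketched as a small Python reader. This is an illustration of the described format, not ALL's own parser; the name `parse_fasta` is hypothetical, and skipping blank lines is an assumption.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (title, sequence) tuples.

    Lines starting with '#' are ignored, and sequence lines that follow
    a title are joined into one string.
    """
    records = []
    title = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines are skipped like comments
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            chunks = []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' title line")
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


example = """#This is a comment line, e.g. E. coli bacteria
>Escherichia coli,DH1
CATTATCGACTTTTGTTCGAGTGGAGTCCGCCGTGTCACTTTCGCTTTGGCAGCAGTGTCTTGCCCGATT
GCAGGATGAGTTACCAGCCACAGAATTCAGTATGTGGATACGCCCATTGCAGGCGGAACTGAGCGATAAC
"""
for title, sequence in parse_fasta(example):
    print(title, len(sequence))
```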
For questions, requests, thoughts or issues, contact us.