ALL All Local Alignment

Introduction

Overview

For DNA, RNA and protein molecules up to 32MB, ALL aligns all sequences of size K or greater. Similar alignments are grouped together for analysis. A sequence pattern can be filtered out of the comparison by replacing each character of the pattern with the special character '-'.
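As a minimal sketch of the filtering step described above, the snippet below replaces every character of a chosen pattern with '-' before the sequence is submitted. The helper name mask_pattern is hypothetical and not part of ALL; it only prepares the input text, and it masks non-overlapping occurrences only.

```python
def mask_pattern(sequence: str, pattern: str) -> str:
    """Replace each character of every (non-overlapping) occurrence of
    `pattern` with '-', keeping the sequence length unchanged."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    return sequence.replace(pattern, "-" * len(pattern))


# Hypothetical example: mask a run of five T characters.
masked = mask_pattern("ACGTTTTTACGT", "TTTTT")
print(masked)  # ACG-----ACGT
```

Because the length is preserved, the positions reported in the alignment output still refer to the original, unmasked sequence.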

For a quick start, follow the sequence alignment examples in sections "Pairwise Alignment Execution" and "MSA Alignment Execution". Sections "Sequence Options and Features" and "How to Use this Tool" give a more complete description of the pairwise and MSA capabilities and options.

Pairwise Alignment Execution

Pairwise DNA alignment

Copy and paste the two-line FASTA sequence below into tab 'Pairwise Sequence Alignment', 'Step 1' text box
>Sequence 1
ACTAAGGCTCTCTACCCCTCTCAGAGA

Copy and paste the two-line FASTA sequence below into tab 'Pairwise Sequence Alignment', 'Step 2' text box
>Sequence 2
ACTAAGGCTCCTAACCCCCTTTTCTCAGA

Select green 'Submit' button

All alignments between the two sequences are grouped and returned:

Sequence_1 0 ACTAAGGCTCTCTA-CCCC-TCT-CAGAGA 26
Sequence_2 0 ACTAAGGCTC-CTAACCCCCTTTTCTCAGA 28

Sequence_1 19 CTCAGAG 25
Sequence_2  1 CTAAG-G 6

Sequence_1 14 CCCCTCTC 21
Sequence_2  7 CTCCTAAC 14

Sequence_1 1 C-TAAGGCTCTCTAC 14
Sequence_2 9 CCTAACCCCCTTTTC 23

The first alignment is a global alignment of Sequence_1 characters 0 to 26 and Sequence_2 characters 0 to 28. The second alignment shows us that Sequence_1 characters 19-25 align with Sequence_2 characters 1-6. The third and fourth alignments show additional sequence alignments.
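The positions in this output appear to be 0-based and inclusive: removing the gap characters from an aligned row gives exactly the characters of the input sequence between the start and end positions. The sketch below checks that reading against the second alignment above. The names AlignedRow, parse_row and matches_source are hypothetical helpers for illustration, not part of ALL.

```python
from typing import NamedTuple


class AlignedRow(NamedTuple):
    name: str
    start: int
    aligned: str
    end: int


def parse_row(line: str) -> AlignedRow:
    """Split one output row: name, start position, aligned text, end position."""
    name, start, aligned, end = line.split()
    return AlignedRow(name, int(start), aligned, int(end))


def matches_source(row: AlignedRow, source: str) -> bool:
    """True if the ungapped row equals source[start..end], end inclusive."""
    ungapped = row.aligned.replace("-", "")
    return source[row.start : row.end + 1] == ungapped


SEQ1 = "ACTAAGGCTCTCTACCCCTCTCAGAGA"
SEQ2 = "ACTAAGGCTCCTAACCCCCTTTTCTCAGA"

row1 = parse_row("Sequence_1 19 CTCAGAG 25")
row2 = parse_row("Sequence_2  1 CTAAG-G 6")
print(matches_source(row1, SEQ1), matches_source(row2, SEQ2))  # True True
```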

Human Alpha Globin to Rabbit Alpha Globin alignment

Beneath 'Download FASTA sequence input examples' at the bottom of this page, select 'Pairwise Sequence Molecules' and unzip the DNA molecules.

Open ALLAlign in a new window

Select 'Pairwise Sequence Alignment' tab
Within Step 1 section, Select 'Choose File' button
Select 'human-alpha-globin.fasta' file
Within Step 2 section, Select 'Choose File' button
Select 'rabbit-alpha-globin.fasta' file
Select green 'Submit' button

All alignments between the Human Alpha Globin and Rabbit Alpha Globin are returned starting with:

HUMAN_sequence_from_alpha-globin_gene_cluster  8463 CCCAGGGCCTCTGGGACCTCC-TGGT-GC 8489
Rabbit_sequence_from_alpha-globin_gene_cluster   57 CCCAGG-CCTCTGGCATCTCCCTCGTCGC 84
HUMAN_sequence_from_alpha-globin_gene_cluster 12714 CCCAGGGCCTCTGGGACCTCC-TGGT-GC 12740

HUMAN_sequence_from_alpha-globin_gene_cluster  5870 ATCTCTGCAG-GTGCCCAGGCCAA-GGCAT-TCCCT 5902
Rabbit_sequence_from_alpha-globin_gene_cluster   44 AGCT-TGCTGTGTGCCCAGGCCTCTGGCATCTCCCT 78
HUMAN_sequence_from_alpha-globin_gene_cluster 12669 GAACTCACTGTGTGCCCAG-CC-CTG--AGCTCCC 12699
Rabbit_sequence_from_alpha-globin_gene_cluster   43 GAGCTTGCTGTGTGCCCAGGCCTCTGGCATCTCCC 77
HUMAN_sequence_from_alpha-globin_gene_cluster  8418 GAACTCACTGTGTGCCCAG-CC-CTG--AGCTCCC 8448

HUMAN_sequence_from_alpha-globin_gene_cluster   987 CACCCTGTGACACTGGGTCCCACTTTCTCT 1016
Rabbit_sequence_from_alpha-globin_gene_cluster   93 CACCCTG-GA-ACTGG--CCC-CTGTC-CT 116
...

Alignments are grouped by similarity. The first group shows that HUMAN_sequence_from_alpha-globin_gene_cluster characters 8463-8489 and 12714-12740 align with Rabbit_sequence_from_alpha-globin_gene_cluster characters 57-84. Subsequent groups show additional alignments.

MSA Alignment Execution

MSA DNA alignment

Copy and paste the below three FASTA sequences (6 lines) into tab 'MSA Sequence Alignment', 'Step 1' text box
>Sequence 1
GCCCAGTAGCTTCCCAATATGAGAGCATCAATTGTAGATCGGGCC
>Sequence_2
TCTATAAGATTCCGCATGCGTTACTTATAAGATGTCTCAACGG
>Sequence_3
TAGAGATTAATTGCCACTGCCAAAATTCTG

Within 'Step 2', under 'MinMatchSize', select 12
Within 'Step 2', under 'MaxMismatchRatio', select 1/5
Select green 'Align' button

All alignments between the three sequences are grouped and returned:

Sequence_2  6  AGATTCCGCA-TGC 18
Sequence_3  8  A-ATTGC-CACTGC 19
Sequence_1  6   TA-GCTTCC-CAAT 17
Sequence_2  4   TAAGATTCCGCA-T 16
Sequence_2 25 TATAAGATGTC-TCAA 39
Sequence_1 17 TATGAGA-G-CATCAA 30
Sequence_3 15    A-CTGCCAAA-AT 25
Sequence_1  7    AGCTTCCCAATAT 19

Sequence_3  1 AGAG-ATTAATTGCCA 15
Sequence_1 21 AGAGCATCAATTGT-A 35

Sequence_2 22 ACTTATAAGAT-G 33
Sequence_1 29 AATTGTA-GATCG 40

MSA alignment between three E. coli sections

Beneath 'Download FASTA sequence input examples' at the bottom of this page, select 'MSA Sequence Molecules' and unzip the DNA molecules.

Open ALLAlign in a new window

Select 'MSA Sequence Alignment' tab
Within Step 1 section, Select 'Choose File' button
Select 'p1_split_ecoli_DH1_illumina.sections.fasta' file
Select green 'Align' button

All alignments between the three E. coli sections (ecoli1_1, ecoli1_2, ecoli1_3) are grouped and returned, starting with:

ecoli1_3 2316 CGTTGCTGATGAATATCTTCTGCTCACATA 2345
ecoli1_1   55 CGTTGC-GC-GAATTTCT-CTGC-CAAA-A 79

ecoli1_3 1837    GCCCGACAA-ATCATTGACG-CAAT 1859
ecoli1_1  576    GCC-GTCAACAGCAGTGATGACAAT 599
ecoli1_3  802 C-CGC-GT-AAGCAG-AG-G-TGAC 820
ecoli1_1  573 CTCGCCGTCAA-CAGCAGTGATGAC 596
ecoli1_1   70 TCTGCC--A-AAATCAGTCA-G-A 88
ecoli1_3 1834 TCTGCCCGACAAATCATTGACGCA 1857

ecoli1_1   74 CCAAAATCA-G-TCAGATCGCGACA-A-TT 99
ecoli1_2 1709 CCAAAATAACGATCA--TCACGACATAATT 1736
...

Sequence Options and Features

Options and Features

ALL is a high-speed sequence alignment tool for large datasets, supporting both Pairwise and Multiple Sequence Alignment (MSA). ALL processes both protein and nucleotide sequences, and the type of sequence is recognized automatically. Any printable character set of up to 31 characters can be used, except for special and reserved characters. If a query sequence contains the special character '-', that character never matches another character and is shown as '_' in the sequence alignment output. The characters '~', '@', '&' and '%' are reserved and should not be used in sequences.
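The character rules above can be checked before submitting a sequence. The following is a minimal sketch, not ALL's own validation. The function name check_sequence is hypothetical, and two details are assumptions: that the 31-character limit applies per submitted sequence, and that the special character '-' does not count toward it.

```python
RESERVED = set("~@&%")   # reserved characters named in the text
MAX_ALPHABET = 31        # maximum character set size named in the text


def check_sequence(sequence: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    reserved_used = sorted(RESERVED & set(sequence))
    if reserved_used:
        problems.append(f"reserved characters used: {reserved_used}")
    if any(not ch.isprintable() for ch in sequence):
        problems.append("non-printable characters present")
    # Assumption: '-' is excluded when counting the character set.
    alphabet = set(sequence) - {"-"}
    if len(alphabet) > MAX_ALPHABET:
        problems.append(f"{len(alphabet)} distinct characters, limit is {MAX_ALPHABET}")
    return problems


print(check_sequence("ACGT-ACGT"))  # []
print(check_sequence("ACG%T"))
```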

One or more FASTA input lines can be used in any or all sequence text boxes or upload files, for both Pairwise alignment and MSA. For example, in pairwise alignment, the first sequence might contain 100 lines of FASTA input, perhaps portions of several bacterial genomes. The second sequence might contain one large FASTA sequence line, for example a well-known bacterial chromosome. In this case the 100 lines of FASTA input would be compared to the large FASTA sequence.

With the pairwise and MSA web forms, the ALL alignment tool supports up to 32MB of sequence characters per text box or upload file. The layout of the web forms is similar to that of EMBL-EBI web pages.

How to use this tool

For MSA alignment, select the "MSA alignment" link. For pairwise alignment, select the "Pair alignment" link. The "MSA alignment" form is submitted with one sequence input. The "Pair alignment" form is submitted with two sequence inputs.

The FASTA title of each sequence element must be unique. For "Pair alignment", titles must be unique across both sequence inputs. For sample input, refer to the "FASTA sequence input examples" section below.

Step 1 - Input Sequences

Within each sequence text box or upload file, enter a free text list of FASTA sequence lines. The free text list contains a block of characters representing several DNA/RNA or protein sequences. FASTA is the accepted sequence format. Partially formatted sequences are not accepted. Pasting data directly from a word processor is not recommended, because the word processor may add hidden or control characters that cause unpredictable results. For sample input, refer to the "FASTA sequence input examples" section below.

The MSA form has one sequence text box or upload file. The pairwise form has two sequence text boxes or upload files.

Step 2 - Input Options
MinMatchSize
Default value is: Automatic

MinMatchSize is the minimum size of a returned sequence alignment: every returned alignment has a size greater than or equal to MinMatchSize. A smaller MinMatchSize returns more sequence alignments and takes more time to complete.

For a value of "Automatic", the ALL algorithm determines MinMatchSize based mainly on the number of characters in the query sequence. A specific MinMatchSize can be chosen from the drop-down list. The algorithm may not allow a MinMatchSize that is too small for the size of the query sequence.


MaxMismatchRatio
Default value is: Automatic

MaxMismatchRatio is the approximate maximum edit distance, expressed as a fraction of the alignment length, for a returned sequence alignment. For example, if MaxMismatchRatio=1/4 (25%) and the returned alignment is 44 characters long, then the edit distance will generally be less than 11, since 11/44 = 1/4. A smaller MaxMismatchRatio returns fewer sequence alignments and takes less time to complete.
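The arithmetic in this example can be reproduced with exact fractions. The sketch below also counts the differing columns of the first pairwise alignment from the quick-start example. Treating the number of differing columns (mismatches plus gaps) as the edit distance is an assumption made for illustration; ALL's own edit distance calculation is not documented here, and the function names are hypothetical.

```python
from fractions import Fraction


def max_edits(alignment_length: int, ratio: Fraction) -> Fraction:
    """Edit distance bound implied by `ratio` for an alignment of this length."""
    return alignment_length * ratio


def column_mismatches(row_a: str, row_b: str) -> int:
    """Count alignment columns whose two characters differ (gaps included)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(a != b for a, b in zip(row_a, row_b))


# Worked example from the text: 44 characters at a ratio of 1/4.
print(max_edits(44, Fraction(1, 4)))  # 11

# First alignment of the pairwise quick-start example, checked against 1/3.
ROW_1 = "ACTAAGGCTCTCTA-CCCC-TCT-CAGAGA"
ROW_2 = "ACTAAGGCTC-CTAACCCCCTTTTCTCAGA"
edits = column_mismatches(ROW_1, ROW_2)
print(edits, edits <= max_edits(len(ROW_1), Fraction(1, 3)))
```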

For a value of "Automatic" the ALL algorithm will generally set MaxMismatchRatio=1/3.


OutputFormat
Default value is: SRS
#  Value  Description
1  SRS  EMBOSS alignment format. Each line shows the sequence ID name, the start position, the aligned sequence and the end position. Alignments are returned in sets. For example, for alignments of (A,B),(B,C),(C,D), returned alignments will generally be in one set of (A,B,C,D). The SRSPair, CSV and CSVIndex output formats can be submitted for the same alignment request to cross-reference the best match for a given sequence within a set.
2  SRSPair  EMBOSS alignment format. Each line shows the sequence ID name, the start position, the aligned sequence and the end position. Alignments are returned in pairs. For example, for alignments of (A,B),(B,C),(C,D), returned alignments will generally be in six pairs of (A,B),(A,C),(A,D),(B,C),(B,D),(C,D).
3  CSV  Each comma-separated values line contains a single sequence alignment. Each line contains the sequence ID names, the sequence positions, the edit distance and the alignment sequences.
4  CSVIndex  The same as CSV, but without a printout of the alignment sequences. Printouts can be created from the sequence positions and the associated submitted query sequences. This is a good choice when the expected number of sequence alignments is very large.
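The relationship between sets (SRS) and pairs (SRSPair) in the (A,B),(B,C),(C,D) example can be sketched as follows. This only illustrates the counting in that example and is not ALL's grouping logic, which the text describes as what "generally" happens. The function names are hypothetical.

```python
from itertools import combinations


def group_into_sets(pairs):
    """Merge pairwise alignments that share a member into disjoint sets."""
    sets = []
    for a, b in pairs:
        touching = [s for s in sets if a in s or b in s]
        merged = {a, b}.union(*touching)
        sets = [s for s in sets if s not in touching] + [merged]
    return sets


def expand_to_pairs(group):
    """List every unordered pair of members within one set."""
    return list(combinations(sorted(group), 2))


groups = group_into_sets([("A", "B"), ("B", "C"), ("C", "D")])
print(groups)                           # one set holding A, B, C and D
print(expand_to_pairs(groups[0]))       # the six pairs listed for SRSPair
print(len(expand_to_pairs(groups[0])))  # 6
```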
Step 3 - Submit Query
The query request is submitted to the ALL alignment algorithm.

FASTA sequence input examples

FASTA Sequence Description and Download

The FASTA examples contain a list of DNA/RNA or Protein sequences. Lines starting with # are ignored. For example,
#This is a comment line, e.g. E. coli bacteria
#This line is also ignored

An example DNA sequence element in the list might be:
>Escherichia coli,DH1
CATTATCGACTTTTGTTCGAGTGGAGTCCGCCGTGTCACTTTCGCTTTGGCAGCAGTGTCTTGCCCGATT
GCAGGATGAGTTACCAGCCACAGAATTCAGTATGTGGATACGCCCATTGCAGGCGGAACTGAGCGATAAC
The line starting with '>' is the title of this sequence element. The DNA sequence is broken into two lines (CATT... and GCA...). One line or any number of lines is accepted.
Each FASTA sequence in the list must contain a title line starting with '>', followed by the DNA/protein sequence.
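The rules above (comment lines starting with #, a title line starting with '>', sequence data on one or more lines, unique titles) can be sketched as a small reader. This is an illustrative parser under those stated rules, not the one ALL uses. The function name parse_fasta is hypothetical, and this sketch rejects untitled single-line input.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA title to its sequence, joining multi-line sequences."""
    records: dict[str, str] = {}
    title = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if line.startswith(">"):
            title = line[1:].strip()
            if title in records:
                raise ValueError(f"duplicate FASTA title: {title!r}")
            records[title] = ""
        elif title is None:
            raise ValueError("sequence data found before any '>' title line")
        else:
            records[title] += line
    return records


EXAMPLE = """#This is a comment line, e.g. E. coli bacteria
>Escherichia coli,DH1
CATTATCGACTTTTGTTCGAGTGGAGTCCGCCGTGTCACTTTCGCTTTGGCAGCAGTGTCTTGCCCGATT
GCAGGATGAGTTACCAGCCACAGAATTCAGTATGTGGATACGCCCATTGCAGGCGGAACTGAGCGATAAC
"""
records = parse_fasta(EXAMPLE)
print(list(records))  # ['Escherichia coli,DH1']
```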

Download FASTA sequence input examples:

Contact

Contact us

For questions, requests, thoughts or issues contact us.

Copyright © Ed Wachtel 2019