Understanding ClustalW: A Guide to Multiple Sequence Alignment
In bioinformatics, multiple sequence alignment (MSA) is a key technique for aligning three or more biological sequences, whether nucleotide or protein. With MSA, biologists can study conserved sequence patterns shared across organisms and infer their evolutionary and ancestral relationships.[i] One of the most commonly used tools for this purpose is ClustalW.
ClustalW:
The original Clustal program for multiple sequence alignment was introduced in 1988 by Desmond G. Higgins and Paul Sharp,[ii] and ClustalW, its weighted successor, followed in 1994.[iv] The name combines two parts: “Clustal” reflects the algorithm's clustering of sequences, while “W” refers to the weights it applies to sequences during alignment to improve accuracy.
Attributes of ClustalW:
Progressive Alignment: A defining feature of ClustalW is that it builds the alignment progressively. It starts with the most similar pairs of sequences and then successively adds the remaining sequences to the alignment. This heuristic usually gives a good approximation of the optimal alignment, although it is not guaranteed to find it.[iii]
Weighting of sequences: ClustalW applies sequence weighting to avoid bias towards any group of sequences. It assigns different weights to sequences based on their evolutionary relationships, so that closely related sequences do not dominate the alignment.[iv]
Tailored Parameters: The tool allows users to adjust several parameters, such as gap penalties and substitution matrices, to obtain more biologically meaningful alignments.[v]
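As a rough illustration of the tailored parameters described above, the sketch below assembles a ClustalW command line in Python without running it. It assumes a ClustalW 2.x executable named clustalw2 is installed and uses its -INFILE, -ALIGN, -TYPE, -MATRIX, -GAPOPEN, -GAPEXT and -OUTFILE options. The helper name build_clustalw_command and the file names are hypothetical, and the parameter values are illustrative rather than recommended settings.

```python
def build_clustalw_command(infile, outfile, gap_open=10.0, gap_ext=0.2,
                           matrix="BLOSUM", executable="clustalw2"):
    """Return the argument list for a protein multiple alignment run.

    Hypothetical helper: it only builds the command, it does not run it.
    For DNA input, ClustalW uses -TYPE=DNA and -DNAMATRIX instead of -MATRIX.
    """
    return [
        executable,                 # assumed executable name
        f"-INFILE={infile}",        # sequences to align
        "-ALIGN",                   # request a full multiple alignment
        "-TYPE=PROTEIN",
        f"-MATRIX={matrix}",        # substitution matrix series
        f"-GAPOPEN={gap_open}",     # gap opening penalty
        f"-GAPEXT={gap_ext}",       # gap extension penalty
        f"-OUTFILE={outfile}",      # where the alignment is written
    ]


# Example with hypothetical file names and a stricter gap opening penalty.
cmd = build_clustalw_command("proteins.fasta", "proteins.aln", gap_open=12.0)
print(" ".join(cmd))
```

If ClustalW is installed, the list could be passed to subprocess.run(cmd, check=True) from the Python standard library to perform the alignment.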
ClustalW operation:
Insert input: Users provide the sequences they wish to align. ClustalW recognises several input formats, including FASTA (Pearson), EMBL/Swiss-Prot and NBRF/PIR.
Pairwise alignment: First, ClustalW aligns every pair of sequences using dynamic programming and uses the resulting scores to build a distance matrix covering all sequences.[vi]
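The two steps above can be sketched in a few lines of Python. This is a simplified illustration, not ClustalW's actual implementation: it assumes a basic Needleman-Wunsch global alignment with a linear gap penalty and made-up scoring values, and it defines distance as one minus the fraction of identical residues over ungapped columns. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """Distance = 1 - fraction of identical residues over ungapped columns."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


def distance_matrix(records):
    """Compute all pairwise distances as a symmetric {(name, name): d} map."""
    names = list(records)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        dist[(p, q)] = dist[(q, p)] = distance(records[p], records[q])
    return names, dist


# Toy input (made-up sequences, for illustration only).
fasta = ">seq1\nACGTACGT\n>seq2\nACGTACGA\n>seq3\nTTGTACCA\n"
names, dist = distance_matrix(parse_fasta(fasta))
for p in names:
    print(p, [round(dist[(p, q)], 3) for q in names])
```

In this sketch the most similar pair has the smallest distance, which is the information a progressive method needs to decide which sequences to align first.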
Applications of ClustalW:
ClustalW is used in many areas of biological research.
Evolutionary analysis: Aligning multiple sequences from different species can reveal their evolutionary relationships and assist in constructing phylogenetic trees.
Functional interpretation: Conserved regions identified by aligning sequences help in annotating a sequence.
Comparison of genomes: ClustalW helps compare genomic sequences from a variety of species, which offers insight into evolution and genetic diversity.
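To make the idea of conserved regions concrete, the short Python sketch below scans an existing alignment for columns in which every sequence carries the same residue. The function name conserved_columns and the three aligned sequences are hypothetical and invented purely for illustration; a real analysis would read the alignment produced by ClustalW and would usually allow for conservative substitutions as well.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns with one identical, non-gap residue."""
    rows = list(alignment.values())
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {row[i] for row in rows}
        # Fully conserved: a single residue type and no gap character.
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Made-up toy alignment; "-" marks a gap.
alignment = {
    "seqA": "MKT-AYIAK",
    "seqB": "MKTQAYLAK",
    "seqC": "MRT-AYIGK",
}
print(conserved_columns(alignment))  # [0, 2, 4, 5, 8]
```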
ClustalW Benefits:
ClustalW is a user-friendly bioinformatics tool: its customizable parameters make it versatile, and its progressive alignment approach supports accurate multiple sequence alignment.
Limitations:
Like any tool, ClustalW has limitations. It is susceptible to errors in the initial pairwise alignments, which can propagate through the progressive alignment process.[vii] It is also computationally intensive and time-consuming for larger datasets.
Alternatives to ClustalW:
MUSCLE[viii] (Multiple Sequence Comparison by Log-Expectation), MAFFT[ix] (Multiple Alignment using Fast Fourier Transform) and Clustal Omega[x] can serve as alternatives to ClustalW, each suited to different alignment tasks.
Conclusion:
ClustalW remains a cornerstone tool in bioinformatics for performing multiple sequence alignment. Its ability to generate accurate and meaningful alignments, coupled with robust and customizable features, makes it valuable for researchers in biological fields. As bioinformatics advances, tools like ClustalW will remain important for solving the puzzles of biological sequences.
References:
[i]Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G., & Thompson, J. D. (2003). Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research, 31(13), 3497–3500. doi:10.1093/nar/gkg500
[ii]Higgins, D. G., & Sharp, P. M. (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73(1), 237–244. doi:10.1016/0378-1119(88)90330-7
[iii]Joseph, A. P., Srinivasan, N., & de Brevern, A. G. (2012). Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies. Biochimie, 94(9), 2025–2034. doi:10.1016/j.biochi.2012.05.028
[iv]Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22(22), 4673–4680. doi:10.1093/nar/22.22.4673
[v]Thompson, J. D. (1995). Introducing variable gap penalties to sequence alignment in linear space. Computer Applications in the Biosciences : CABIOS, 11(2), 181–186. doi:10.1093/bioinformatics/11.2.181
[vi]Li, K.-B. (2003). ClustalW-MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics (Oxford, England), 19(12), 1585–1586. doi:10.1093/bioinformatics/btg192
[vii]Hu, J., Li, B., & Kihara, D. (2005). Limitations and potentials of current motif discovery algorithms. Nucleic Acids Research, 33(15), 4899–4913. doi:10.1093/nar/gki791
[viii]Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with improved accuracy and speed. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004), Stanford, CA, USA. doi:10.1109/csb.2004.1332560
[ix]Katoh, K., Misawa, K., Kuma, K.-I., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059–3066. doi:10.1093/nar/gkf436
[x]Sievers, F., & Higgins, D. G. (2014). Clustal Omega. In Current Protocols in Bioinformatics (p. 3.13.1-3.13.16). doi:10.1002/0471250953.bi0313s48