Suppose your four friends all make pasta, each using a slightly different technique, and you want to find out whose techniques are most alike.
Alice always cooks penne for ten minutes before adding tomato sauce.
Bob also uses penne and tomato sauce, but he cooks the pasta for eight minutes.
Charlie uses a different kind of pasta, cooks it for twelve minutes, and adds tomato sauce.
Diana uses the same kind of pasta as Charlie and also cooks it for twelve minutes, but she uses Alfredo sauce rather than tomato sauce.
To compare their approaches, you can assign a "distance" to each pair of friends based on how much their techniques differ:
Bob and Alice: They have different boiling times, but they both use tomato sauce and penne. The distance between them is small.
Alice and Charlie: They both use tomato sauce, but they differ in cooking time and in the kind of pasta. The distance between them is a little larger.
Charlie and Diana: Their distance is also small, because they use different sauces but the same kind of pasta and the same cooking time.
Bob and Diana: They differ in the kind of pasta, the boiling time, and the sauce, so their distance is the largest.
Now, if you were to draw a diagram showing how similar their approaches are, you would group Alice and Bob together, group Charlie and Diana together, and place the two pairs farther apart.
In phylogenetic analysis, scientists do something similar with genetic data. Instead of pasta recipes, they compare DNA sequences to see how closely related different species are, building a "tree" that shows these relationships based on the distances between them, just like your pasta diagram.
Understanding Distance-Based Methods in Phylogenetic Analysis:
Phylogenetic analysis is the study of evolutionary relationships among species, organisms, or genes with the help of a phylogenetic tree. There are two main classes of methods for constructing these trees:
- Distance-based
- Character-based
This article explains the distance-based methods of phylogenetic analysis: their types, approaches, advantages, and limitations.
Distance-Based Method:
Among the different methods of phylogenetic analysis, the distance-based method is a computationally efficient and easy-to-use approach. It estimates the evolutionary distances between sequences and uses them to construct a tree of relationships.
The distance-based method first calculates the pairwise distance between every two sequences (just like the pairs of friends in the pasta example), and then builds a phylogenetic tree from these distances. Here, the "distance" is a numerical measure of the evolutionary divergence between two sequences. It can be based on various quantities, e.g. the number of substitutions per site in DNA sequences or the number of amino acid changes in protein sequences.
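To make the idea of a pairwise distance concrete, here is a minimal sketch in Python. It assumes two already-aligned DNA sequences of equal length with no gaps. The sequences `seq_1` and `seq_2` and the function names are made up for illustration. It computes the proportion of differing sites (the p-distance) and then applies the Jukes-Cantor correction, one common way of turning that proportion into an estimate of substitutions per site.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical toy sequences: they differ at 2 of 10 sites.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"

p = p_distance(seq_1, seq_2)
d = jukes_cantor(p)
print(f"p-distance = {p:.2f}, Jukes-Cantor distance = {d:.4f}")
```

The corrected distance is slightly larger than the raw proportion, because the correction allows for sites that may have changed more than once.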
Distance Matrix:
The distances between all pairs of sequences are stored in a table known as the distance matrix. The phylogenetic tree is built from this matrix so that more similar sequences (those with smaller distances between them) are placed closer together and more dissimilar sequences are placed further apart.
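As a rough illustration, a distance matrix for a handful of sequences could be assembled as below. The four sequences are invented for the example, and the simple proportion of differing sites is used as the distance. Real analyses would normally use a substitution model instead.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(seqs):
    """Build a symmetric matrix (dict of dicts) of pairwise distances."""
    names = list(seqs)
    matrix = {name: {other: 0.0 for other in names} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(seqs[a], seqs[b])
        matrix[a][b] = d
        matrix[b][a] = d  # the distance is the same in both directions
    return names, matrix


# Hypothetical aligned sequences, 8 sites each.
sequences = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "ACGAACTA",
    "seq4": "TCGAACTA",
}

names, matrix = distance_matrix(sequences)
print("      " + "  ".join(f"{n:>5}" for n in names))
for a in names:
    print(f"{a:>5} " + "  ".join(f"{matrix[a][b]:5.3f}" for b in names))
```

The diagonal is zero (every sequence is identical to itself) and the matrix is symmetric, so only the values on one side of the diagonal carry information.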
What are Distance-Based Methods?
While several algorithms fall under the category of distance-based approaches, the NJ (Neighbor-Joining) method and UPGMA (Unweighted Pair Group Method with Arithmetic Mean) are two of the best known:
Neighbor-Joining Method:
This method constructs phylogenetic trees by trying to minimize the total branch length of the tree (the total amount of evolutionary change or divergence).
It starts from the distances between all pairs of sequences. At each step, the algorithm joins the pair of sequences whose union gives the smallest estimated total branch length, replaces the pair with a new internal node, and repeats until the tree is complete.
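For readers who want the formulas, the pair selection is commonly written with a criterion usually called Q. The block below states it in LaTeX, together with the usual update rules. The symbols (n for the current number of nodes, d for the distances, u for the new internal node, δ for branch lengths) are notation chosen here for illustration.

```latex
% Criterion: join the pair (i, j) with the smallest Q(i, j)
Q(i,j) = (n-2)\,d(i,j) - \sum_{k=1}^{n} d(i,k) - \sum_{k=1}^{n} d(j,k)

% Branch lengths from the joined pair (f, g) to the new node u
\delta(f,u) = \tfrac{1}{2}\,d(f,g)
            + \frac{1}{2(n-2)}\left[\sum_{k=1}^{n} d(f,k) - \sum_{k=1}^{n} d(g,k)\right],
\qquad
\delta(g,u) = d(f,g) - \delta(f,u)

% Distance from the new node u to every remaining node k
d(u,k) = \tfrac{1}{2}\left[d(f,k) + d(g,k) - d(f,g)\right]
```

The two sums in Q are what distinguish NJ from simply picking the closest pair. A pair is favoured when its members are close to each other and, at the same time, far from everything else.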
Understanding with an example:
Imagine you want to understand how close a group of cities is to each other, based on the distances between them. You have the following cities: A, B, C, and D.
You measure the distance between each pair of cities:
- A to B: 50 miles
- A to C: 40 miles
- A to D: 70 miles
- B to C: 30 miles
- B to D: 60 miles
- C to D: 20 miles
These distances can be thought of as similar to evolutionary distances between species in phylogenetic analysis.
Using the NJ method, you would proceed as follows:
- Create a distance matrix that holds the distance between every pair of cities.
- Pick the pair to join. NJ does not simply take the smallest raw distance. It adjusts each distance by how far the two cities are, on average, from all the others. Here that points to C and D, which are also the closest pair.
- Join C and D, treat them as a single group (CD), and calculate the distance from this new group to the remaining cities (A and B).
- Repeat until all cities are joined into a single "tree".
The tree represents the relationships between the cities based on distance, just as the NJ method builds a phylogenetic tree that shows evolutionary relationships between species.
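The city example can be sketched in a few lines of Python. This is a simplified illustration that only tracks which groups are joined (the topology) and leaves out branch lengths. The function names are invented for the example, and breaking ties by the smaller raw distance is an arbitrary choice made for this sketch, not a fixed part of the NJ method.

```python
from itertools import combinations

# Pairwise distances between the four cities (miles), as listed above.
pairs = {
    ("A", "B"): 50, ("A", "C"): 40, ("A", "D"): 70,
    ("B", "C"): 30, ("B", "D"): 60, ("C", "D"): 20,
}


def build_matrix(pairs):
    """Turn the pair list into a symmetric dict-of-dicts matrix."""
    matrix = {}
    for (a, b), dist in pairs.items():
        matrix.setdefault(a, {})[b] = float(dist)
        matrix.setdefault(b, {})[a] = float(dist)
    return matrix


def nj_join_order(matrix):
    """Return the joins chosen by neighbor-joining and the final two nodes."""
    d = {a: dict(row) for a, row in matrix.items()}
    nodes = sorted(d)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        def score(pair):
            f, g = pair
            q = (n - 2) * d[f][g] - totals[f] - totals[g]
            return (q, d[f][g])  # tie on Q -> prefer the smaller raw distance

        f, g = min(combinations(nodes, 2), key=score)
        new = f"({f},{g})"
        d[new] = {}
        for k in nodes:
            if k not in (f, g):
                # Distance from the new internal node to each remaining node.
                d[new][k] = d[k][new] = 0.5 * (d[f][k] + d[g][k] - d[f][g])
        nodes = [k for k in nodes if k not in (f, g)] + [new]
        joins.append((f, g))
    return joins, nodes


joins, remaining = nj_join_order(build_matrix(pairs))
print("joins:", joins)
print("last two nodes, connected by the final branch:", remaining)
```

With these distances, the first join is C with D. The adjusted score for that pair ties with the score for A and B, which is expected: in an unrooted tree of four cities, grouping C with D describes the same tree as grouping A with B.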
Tools using NJ methods:
Several bioinformatics tools implement the Neighbor-Joining (NJ) method for constructing phylogenetic trees, such as:
- MEGA (Molecular Evolutionary Genetics Analysis)
- PHYLIP (Phylogeny Inference Package)
- Clustal Omega
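- Geneious
RAxML (Randomized Axelerated Maximum Likelihood) and IQ-TREE are sometimes listed alongside these tools, but they infer trees by maximum likelihood rather than by NJ. IQ-TREE uses a BIONJ tree only as one of its starting trees.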
Advantages of NJ Method:
The NJ method is fast and suitable for large datasets. It does not assume a constant rate of evolution, which makes it more flexible in handling real-world data. It is widely used in molecular biology, epidemiology, and evolutionary biology to infer evolutionary relationships.
UPGMA (Unweighted Pair Group Method with Arithmetic Mean):
UPGMA is a simple hierarchical clustering technique used in phylogenetic research to produce a dendrogram, or rooted tree. It builds trees on the assumption that all lineages evolve at the same constant rate, known as the "molecular clock". If rates of evolution differ notably between lineages, this assumption does not hold and the resulting tree may be inaccurate.
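One way to see what the molecular clock assumption demands of the data: if every lineage changed at the same rate, the distances would be "ultrametric", meaning that in any group of three sequences the two largest pairwise distances are equal. The sketch below checks this condition. The taxa P, Q, and R and their distances are hypothetical values chosen for illustration.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances match."""
    names = sorted({name for pair in dist for name in pair})

    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted([d(a, b), d(a, c), d(b, c)])
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical distances: P and Q are close, R is equally far from both.
clock_like = {("P", "Q"): 2.0, ("P", "R"): 6.0, ("Q", "R"): 6.0}

# Hypothetical distances: Q has changed faster than P since they split.
uneven_rates = {("P", "Q"): 2.0, ("P", "R"): 6.0, ("Q", "R"): 8.0}

print(is_ultrametric(clock_like))    # True
print(is_ultrametric(uneven_rates))  # False
```

Real data is rarely exactly ultrametric, so in practice the question is how far the distances depart from this condition, not whether they meet it perfectly.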
UPGMA groups species or sequences according to their pairwise distances. It starts with the closest pair of sequences and gradually merges clusters into bigger ones, taking the distance between two clusters to be the arithmetic mean of the distances between their members.
Like the NJ approach, it takes these pairwise distances from a distance matrix.
UPGMA always produces a rooted tree, in which the root represents the most recent common ancestor of all the species or sequences.
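A minimal UPGMA sketch is shown below. For concreteness it reuses the four-city distances from the Neighbor-Joining example earlier in this article. The function name and the nested-parentheses labels are choices made for this illustration, and each cluster is placed at a height of half the distance at which it was merged, following the constant-rate assumption.

```python
from itertools import combinations


def upgma(names, pairs):
    """Cluster with UPGMA; return the tree label and the list of merges."""
    d = {a: {} for a in names}
    for (a, b), dist in pairs.items():
        d[a][b] = d[b][a] = float(dist)
    clusters = {name: 1 for name in names}  # label -> number of leaves
    merges = []
    while len(clusters) > 1:
        labels = list(clusters)
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(labels, 2), key=lambda p: d[p[0]][p[1]])
        height = d[a][b] / 2  # both members sit equally far below the new node
        new = f"({a},{b})"
        size = clusters[a] + clusters[b]
        d[new] = {}
        for k in labels:
            if k not in (a, b):
                # Size-weighted mean = mean over all pairs of member leaves.
                avg = (clusters[a] * d[a][k] + clusters[b] * d[b][k]) / size
                d[new][k] = d[k][new] = avg
        del clusters[a], clusters[b]
        clusters[new] = size
        merges.append((a, b, height))
    return next(iter(clusters)), merges


pairs = {
    ("A", "B"): 50, ("A", "C"): 40, ("A", "D"): 70,
    ("B", "C"): 30, ("B", "D"): 60, ("C", "D"): 20,
}

tree, merges = upgma(["A", "B", "C", "D"], pairs)
print(tree)  # (A,(B,(C,D)))
for left, right, height in merges:
    print(f"merge {left} + {right} at height {height:.2f}")
```

Unlike the NJ result, this tree has a root: A splits off first, then B, and C and D are the last to separate.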
Understanding with an example:
Let’s say you have four friends—Alice, Bob, Charlie, and Diana—and you want to know how similar their daily routines are. You decide to compare how much time each of them spends on three activities: sleeping, working, and leisure.
Here’s what you find:
- Alice: Sleeps 8 hours, works 8 hours, and has 8 hours of leisure.
- Bob: Sleeps 7 hours, works 9 hours, and has 8 hours of leisure.
- Charlie: Sleeps 6 hours, works 10 hours, and has 8 hours of leisure.
- Diana: Sleeps 8 hours, works 8 hours, and has 8 hours of leisure.
Now, you want to create a "tree" that shows how similar their routines are.
The procedure is: find the closest pair (Alice and Diana), recalculate the distance from this pair to the others, group the next closest, and repeat. Alice and Diana are grouped first because their routines are identical. Bob is grouped next with Alice and Diana because his routine is similar but not identical. He is in fact just as close to Charlie, so this step is a tie that has to be broken one way or the other. Charlie is grouped last, showing that his routine is the most different from Alice's and Diana's. In the same way, UPGMA groups the most similar sequences first, then gradually merges them into larger clusters, assuming that all sequences change at the same rate over time.
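To put numbers on this example, the sketch below measures the distance between two routines as the total number of hours by which they differ across the three activities. That measure is an assumption made here for illustration; the example itself does not fix a particular formula.

```python
from itertools import combinations

# Hours spent on (sleep, work, leisure), as listed above.
routines = {
    "Alice": (8, 8, 8),
    "Bob": (7, 9, 8),
    "Charlie": (6, 10, 8),
    "Diana": (8, 8, 8),
}


def routine_distance(r1, r2):
    """Total absolute difference in hours across the three activities."""
    return sum(abs(x - y) for x, y in zip(r1, r2))


distances = {
    (a, b): routine_distance(routines[a], routines[b])
    for a, b in combinations(routines, 2)
}

for (a, b), dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a:>7} - {b:<7}: {dist} hours")
```

Alice and Diana come out at distance 0, so they are joined first. Bob is 2 hours away from both of them and also 2 hours away from Charlie, while Charlie is 4 hours away from Alice and Diana.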
Tools using UPGMA method:
Several bioinformatics tools offer the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method for constructing phylogenetic trees, such as:
- MEGA
- PHYLIP
- Clustal
- DendroUPGMA
Advantages of UPGMA method:
UPGMA is simple and fast, and it works well for basic datasets with reasonably uniform evolutionary rates, such as some molecular clock studies, where the assumption of a constant rate of evolution is plausible. For more complex datasets with varying rates of evolution, other approaches such as Neighbor-Joining or Maximum Likelihood are preferable.
Final Thoughts:
Distance-based methods like UPGMA and Neighbor-Joining are essential tools for constructing phylogenetic trees and understanding evolutionary relationships. UPGMA is simple and assumes a constant rate of evolution, while Neighbor-Joining is more flexible and accommodates varying rates. By choosing the method that fits your data, you can trace the evolutionary history of species more reliably and gain valuable insights into the interconnectedness of life.
References:
- Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, Volume 4, Issue 4, Jul 1987, Pages 406–425, https://doi.org/10.1093/oxfordjournals.molbev.a040454
- Felsenstein, J. (1989). "PHYLIP - Phylogeny Inference Package (Version 3.2)." Cladistics, 5(2), 164-166.
- Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018 Jun 1;35(6):1547-1549. doi: 10.1093/molbev/msy096. PMID: 29722887; PMCID: PMC5967553.
- Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994 Nov 11;22(22):4673-80. doi: 10.1093/nar/22.22.4673. PMID: 7984417; PMCID: PMC308517.
- Gascuel, O. (1997). "BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data." Molecular Biology and Evolution, 14(7), 685-695.
Further Reading:
- "Phylogenetics: Theory and Practice of Phylogenetic Systematics" by E. O. Wiley and Bruce S. Lieberman.
- "Molecular Evolution: A Phylogenetic Approach" by Roderick D. M. Page and Edward C. Holmes.
- "Bioinformatics and Molecular Evolution" by Paul G. Higgs and Teresa K. Attwood.
- "The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing" edited by Philippe Lemey, Marco Salemi, and Anne-Mieke Vandamme.
- "Inferring Phylogenies" by Joseph Felsenstein.