L5narr PDF - Nucleotide Substitution Models
Summary
This presentation covers different nucleotide substitution models, including Jukes-Cantor, Felsenstein, Kimura, and Hasegawa models, analysis methods (UPGMA, neighbor-joining, maximum parsimony, etc.) and phylogenetic tree construction methods for evolutionary biology. It explores various approaches to inferring phylogenetic trees and making species decisions from the data.
Full Transcript
L5: How different nucleotide substitution models relate to each other

Jukes-Cantor (JC) model
– Equal base frequencies: πA = πC = πG = πT
– All substitutions equally likely: α = β

Felsenstein (F81) model
– Unequal base frequencies: πA ≠ πC ≠ πG ≠ πT
– All substitutions equally likely: α = β

Kimura 2 parameter (K2P) model
– Equal base frequencies: πA = πC = πG = πT
– Transitions (α) and transversions (β) have different substitution rates: α ≠ β

Hasegawa et al. (HKY85) model
– Unequal base frequencies: πA ≠ πC ≠ πG ≠ πT
– Transitions (α) and transversions (β) have different substitution rates: α ≠ β

General reversible (REV) model
– Unequal base frequencies: πA ≠ πC ≠ πG ≠ πT
– All six pairs of substitutions have different rates

Relationships between the models
– JC to F81: allow base frequencies to vary
– JC to K2P: allow for transition/transversion bias
– F81 to HKY85: allow for transition/transversion bias
– K2P to HKY85: allow base frequencies to vary
– HKY85 to REV: allow all six pairs of substitutions to have different rates

ANALYSIS METHODS (applied to the ALIGNMENT)
– Phenetic methods: UPGMA, Neighbour-joining
– Phylogenetic/cladistic methods: Maximum parsimony, Maximum likelihood, Bayesian inference

Methods to infer phylogenetic trees: what information is used?
– Phylogenetic methods result in an inferred tree, which may be different from the true tree.
– Character state methods use discrete characters: nucleotide or amino acid characters, morphological characters, ...
– Distance matrix methods use the pairwise dissimilarity of each pair of OTUs. They are only suitable if such a distance can be computed: nucleotide divergence, amino acid divergence, differences in allele frequency.

Methods to infer phylogenetic trees: how to construct a tree?
– Tree evaluation methods examine all possible trees. The best tree is retained. Very slow!
– Stepwise clustering methods reconstruct a single tree based on pairwise comparison. Very fast.
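To give a feeling for why examining all possible trees is "very slow", the sketch below counts the strictly bifurcating tree topologies for a given number of taxa. It uses the standard counting formulas, (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n taxa; the function names are illustrative and not taken from the slides.

```python
def odd_double_factorial(k):
    """Return k * (k - 2) * ... * 3 * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted, strictly bifurcating topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa):
    """Number of rooted, strictly bifurcating topologies: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

With 10 taxa there are already more than two million unrooted topologies, which is why exhaustive tree evaluation quickly becomes impractical and stepwise clustering is so much faster.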
Strategies of phylogenetic analysis methods

                   Tree evaluation         Step-wise clustering
Character state    Parsimony (MP)
                   Likelihood (ML)
                   Bayesian likelihood
Distance matrix                            UPGMA
                                           Neighbour-joining (NJ)

Maximum parsimony (MP)
– Aims to find the tree topology that can be explained with the smallest number of character changes (mutations).
– MP assumes that the most parsimonious, or simplest, explanation is also the most likely one in evolutionary terms (principle of Occam's Razor).
– Evaluates possible tree topologies by inferring the minimum number of character changes required to explain all nodes of the tree at every sequence position.
– The tree with the smallest number of changes is chosen as the best one.

[Figure: Maximum Parsimony Tree (cladogram) of the LNA, LNU and L5V/LSV samples together with Radix and Lymnaea reference sequences; values such as 100 and 61 are shown at the nodes]
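The parsimony criterion can be sketched in a few lines. The example below counts the minimum number of changes per site with Fitch's algorithm, a standard method that the slides do not name, and compares the three possible topologies for four sequences. The tree encoding (nested tuples), the function names and the choice of alignment (the four 9-site sequences from the informative-sites table in this transcript) are assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree   : a leaf name (str) or a 2-tuple of subtrees (illustrative encoding)
    states : dict mapping leaf name -> character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # Children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Toy alignment: four sequences, nine sites.
ALIGNMENT = {
    "seq1": "AAGAGTGCA",
    "seq2": "AGCCGTGCG",
    "seq3": "AGATATCCA",
    "seq4": "AGAGATCCG",
}

# The three possible unrooted topologies for four taxa (rooted arbitrarily).
TOPOLOGIES = {
    "(1,2),(3,4)": (("seq1", "seq2"), ("seq3", "seq4")),
    "(1,3),(2,4)": (("seq1", "seq3"), ("seq2", "seq4")),
    "(1,4),(2,3)": (("seq1", "seq4"), ("seq2", "seq3")),
}

SCORES = {name: parsimony_score(tree, ALIGNMENT) for name, tree in TOPOLOGIES.items()}
BEST = min(SCORES, key=SCORES.get)

if __name__ == "__main__":
    print(SCORES)  # changes required by each topology
    print(BEST)    # topology with the fewest changes
```

For this toy data the topology grouping seq1 with seq2 needs 10 changes, against 11 and 12 for the alternatives, so it would be retained as the most parsimonious tree. Real analyses search far larger tree spaces and usually rely on heuristics rather than exhaustive enumeration.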
Informative sites

Site     1 2 3 4 5 6 7 8 9
seq 1    A A G A G T G C A
seq 2    A G C C G T G C G
seq 3    A G A T A T C C A
seq 4    A G A G A T C C G

* * * *
A site is phylogenetically informative when there are at least two different kinds of characters, each represented at least two times.

Consensus trees

[Figure: three trees (Tree 1, Tree 2, Tree 3) for taxa A to E, with their strict consensus tree and their majority-rule consensus tree; node values 67, 100 and 67 are shown]

Advantages and disadvantages of parsimony
Advantages:
– based on shared derived characters (principle of cladistics)
– evaluates different tree topologies
– maintains all sequence information
– allows reconstruction of ancestral character states
Disadvantages:
– can be slow for larger datasets
– only uses informative sites
– no correction for multiple hits possible
– biased in case of unequal rates of evolution

Evolutionary distance

Computation of evolutionary distances

Sequences:
1    U C A A G U C A G G U U C G A
2    U C C A G U U A G A C U C G A
3    U U C A A U C A G G C C C G A

Dissimilarity:
       1        2
2    0.266
3    0.333    0.333

Convert dissimilarity to evolutionary distance by correcting for multiple events per site according to a certain model of evolution.

Evolutionary distance:
       1        2
2    0.328
3    0.44     0.44

Infer tree topology on the basis of the evolutionary distances.

Neighbour-joining (NJ)
– Constructs internal nodes by joining nearest neighbours
– Branch lengths take into account the distance from a taxon to all other taxa
– Branch length is proportional to genetic distance
– Allows different evolutionary rates in different branches
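The conversion from dissimilarity to evolutionary distance can be reproduced with a short sketch. The slides do not say which model was used for the correction; the values in the table appear consistent with the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) applied to the truncated dissimilarities 0.266 and 0.333, so that model is assumed here. Function names are illustrative.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_distance(p):
    """Jukes-Cantor correction for multiple substitutions per site.

    Assumption: the slides' correction is Jukes-Cantor; they only say
    "a certain model of evolution".
    """
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The three RNA sequences from the slide, written without spaces.
SEQUENCES = {
    1: "UCAAGUCAGGUUCGA",
    2: "UCCAGUUAGACUCGA",
    3: "UUCAAUCAGGCCCGA",
}

if __name__ == "__main__":
    for i, j in ((1, 2), (1, 3), (2, 3)):
        p = p_distance(SEQUENCES[i], SEQUENCES[j])
        d = jukes_cantor_distance(p)
        print(f"{i} vs {j}: dissimilarity {p:.3f}, corrected distance {d:.3f}")
```

Sequences 1 and 2 differ at 4 of 15 sites (4/15, shown as 0.266 on the slide) and the other two pairs at 5 of 15 sites (0.333). The corrected distances are always larger than the raw dissimilarities because some sites are assumed to have changed more than once.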
[Figure: Neighbour-joining tree of the LNA, LNU and L5V/LSV samples together with Radix, Lymnaea and Stagnicola reference sequences; values are shown at the nodes; scale bar: 0.05 substitutions per site]

Advantages and disadvantages of NJ
Advantages:
– allows the use of an explicit model of evolution
– very fast
– allows unequal rates of evolution
Disadvantages:
– only produces one best tree
– reduces all sequence information into a single distance value
– estimation of ancestral character states is not possible
– dependent on the evolutionary model used (preferably this model should be estimated from the data)

Making species decisions from trees
– Applicable species concepts: phylogenetic, genetic.
– Phylogenetic species: all members of the species belong to a single monophyletic clade with high bootstrap support.
– Genetic species: all members of the species should form part of an exclusive cluster. Different species should be separated by a genetic distance of 2–10 % (cytochrome b); the threshold differs between genes.
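The monophyly part of the phylogenetic species criterion can be illustrated with a minimal sketch: a group is monophyletic on a rooted tree when some subtree contains exactly the members of the group and nothing else. The nested-tuple tree encoding, the function names and the sample labels below are hypothetical, and the check ignores bootstrap support, which would need to be assessed separately.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (leaf = str, node = tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, members):
    """True if some subtree of the rooted tree contains exactly `members`."""
    members = frozenset(members)
    if leaf_set(tree) == members:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, members) for child in tree)


# Hypothetical rooted tree with samples from three putative species a, b, c.
TOY_TREE = (("a1", "a2"), (("b1", "b2"), "c1"))

if __name__ == "__main__":
    print(is_monophyletic(TOY_TREE, {"a1", "a2"}))  # samples a1, a2 form a clade
    print(is_monophyletic(TOY_TREE, {"a1", "b1"}))  # a1 and b1 do not
```

The sketch recomputes leaf sets at every node, which is fine for small trees; for large trees a single traversal that caches the leaf sets would be preferable.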