Phylogenetics Final Powerpoint Slides PDF
Dr. Oliver Manlik
Summary
This document presents PowerPoint slides covering molecular evolution and the construction of phylogenetic trees from morphological and molecular characters. It explains clades (monophyletic groups) and the difference between homologous and analogous traits. It emphasizes the structure of phylogenetic trees and the differences between methods for assessing evolutionary relationships.
Full Transcript
Molecular Evolution, Chapters 10 & 11: Alignment & Phylogeny
Dr. Oliver Manlik, Biology Department
09/22/10

Phylogeny tells us about the evolutionary relationships of taxa (groups of organisms): how closely or distantly they are related to one another, and whether they share a recent or a distant common ancestor.

We learn about phylogenies by constructing 'phylogenetic trees'. Phylogenetic trees can be based on:
Morphological characters
Molecular characters: variation in DNA sequences, variation in peptide sequences

Let's construct a phylogeny based on morphological characters:
1.) Vertebrae: Do they have vertebrae (backbone)?
2.) Chitin: Do they have chitin (exoskeleton of insects)?
3.) Feathers: Do they have feathers?
4.) Fur: Do they have fur (hair)?
5.) Milk: Do their mothers nurse young with milk?
6.) Hoof: Do they have hooves?

Character table (see handout):

            Feather  Hair  Milk  Eggs  Pouch  Placenta  Hoof
Cormorant   Yes      No    No    Yes   No     No        No
Koala       No       Yes   Yes   No    Yes    No        No
Echidna     No       Yes   Yes   Yes   No     No        No
Cat         No       Yes   Yes   No    No     Yes       No
Elephant    No       Yes   Yes   No    No     Yes       Yes

Pairwise differences (number of characters in which two taxa differ):

            Cormorant  Koala  Echidna  Cat  Elephant
Cormorant   0          5      3        5    6
Koala                  0      2        2    3
Echidna                       0        2    3
Cat                                    0    1

[Tree figure: leaves Cormorant, Echidna, Koala, Cat, Elephant; node values 1, 2, 2.5 and 4.75. Group labels: Birds (Cormorant); Mammals: Monotremes (Echidna), Marsupials (Koala), Placental Mammals (Cat, Elephant).]

This phylogeny is NOT accurate. It is only based on 7 morphological characters!

PHYLOGENETIC TREE (based on morphological characters)
OUTGROUP: Taxon (or group) used to root the tree
INGROUP: Group that includes the taxa to be compared
BRANCH
NODE: Represents a common ancestor
[Tree figure: Cormorant, Echidna, Koala, Cat, Elephant]

CLADE = Monophyletic Group
[Tree figure: the clades among Cormorant, Echidna, Koala, Cat and Elephant are highlighted; each node represents a common ancestor.]

Phylogenetic tree showing the evolutionary relationships of mammals. How many major clades?

Clade or not?
Monophyletic group = clade: includes the most recent common ancestor of a group of organisms, and all of its descendants
Paraphyletic: includes the most recent common ancestor, but NOT all of its descendants
Polyphyletic: does NOT include the common ancestor of ALL members of the taxon

Group types
A monophyletic group, or clade (blue, red), is a group of organisms that consists of a last common ancestor and all of its descendants.
A group is paraphyletic (green) if it consists of the group's last common ancestor and all descendants of that ancestor excluding a few (typically only one or two) monophyletic subgroups.
A polyphyletic group (grey) is characterized by convergent features or habits of scientific interest (for example, wings). The features by which a polyphyletic group is differentiated from others are not inherited from a common ancestor.

Phylogeny of seabirds (1676), based on various characteristics

Ernst Haeckel's 'Evolutionary Tree' (1876) showing 3 monophyletic groups ('clades'):
Plantae (Plants)
Protista (Protists)
Animalia (Animals)

Monophyletic groups are based on common ancestry, so it is important to distinguish between characters that tell us about ancestry (= homologous traits) and characters that do NOT tell us about ancestry or about how taxa diverged (became different) (= analogous traits).

Analogous Traits
Example: Wings of different species with different ancestry (Butterfly, Pterodactyl, Bird, Bat)
Analogous traits = Traits that evolved independently, NOT based on descent (common ancestry)

Homologous Traits
Example: Pentadactyl limbs; common ancestor (Whale, Bat, Cat)
Homologous traits = Traits that are similar by descent (due to common ancestors) and did not evolve independently

DNA (and peptide) sequences can show Homology!
One advantage of studying phylogeny based on molecular characters is that DNA sequences or peptide sequences can show when taxa are grouped together by mistake on the basis of analogous similarities.

Example: Previously, honeybees and stingless bees were grouped together because of:
Morphological similarities
Both have hives with a large number of non-reproductive workers (which store pollen and raise the queen's offspring)
However, based on DNA sequences, they were re-classified: Honeybees are more closely related to solitary (non-social) orchid bees than to stingless bees. So the observed similarities (e.g. social hives) are probably analogous traits.

DNA sequence alignment of 4 species (ingroup) and 1 outgroup

1: ATTGCTATTACGGGA
2: ATAGTTATTACGCGT
3: ATAGTTATTACGCCT
4: ATAGCTATTACGGGA
5: CTGCTTCTAAGACTA

Estimate pairwise differences to 'infer' (make an estimate about) phylogeny. Simplest pairwise difference = p-distance.

Matrix to complete:

     1   2   3   4   5
1    0
2        0
3            0
4                0
5                    0

See completed matrix and tree in the notes I provided during lecture and uploaded on Blackboard!
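The pairwise comparison described above can be sketched in a few lines. This is a minimal illustration, assuming the p-distance is the number of differing aligned sites divided by the alignment length, and that the five sequences contain no gaps or ambiguous bases. The function names are hypothetical and not taken from the lecture notes or from any particular software package.

```python
from itertools import combinations

# Aligned sequences copied from the slide (1-4: ingroup, 5: outgroup).
ALIGNMENT = {
    "1": "ATTGCTATTACGGGA",
    "2": "ATAGTTATTACGCGT",
    "3": "ATAGTTATTACGCCT",
    "4": "ATAGCTATTACGGGA",
    "5": "CTGCTTCTAAGACTA",
}


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites: differences / alignment length."""
    return count_differences(a, b) / len(a)


def pairwise_p_distances(alignment: dict) -> dict:
    """Map each unordered pair of labels to its p-distance."""
    return {
        (i, j): p_distance(alignment[i], alignment[j])
        for i, j in combinations(alignment, 2)
    }


if __name__ == "__main__":
    for (i, j), p in pairwise_p_distances(ALIGNMENT).items():
        diffs = count_differences(ALIGNMENT[i], ALIGNMENT[j])
        print(f"{i} vs {j}: {diffs} differences, p = {p:.3f}")
```

Running the script prints one line per pair, which can be checked against the completed matrix from the lecture notes.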
p-distance: p = nd/n
nd = Number of pairwise differences between nucleotides
n = Total number of nucleotides in the sequence
(Applied to the alignment of sequences 1 to 5 and the 5 x 5 matrix.)

DNA Sequence Alignment of 8 mammal species: 4 marsupials & 4 placental mammals
Placental mammals: Shrew, European mole, Dog, Domestic Cat
Marsupials: Marsupial mole, Dunnart, Quoll, Thylacine

Use Excel Sheet Provided!
Similarity matrix (rows and columns): Thylacine, Quoll, Dunnart, Marsupial mole, Dog, Cat, European mole, Shrew

DNA sequence similarities: Based on 125 DNA sequences
Grey: Percentage similarity of each species compared to the thylacine ('Tasmanian Tiger'; extinct since 1936)
Blue: Percentage similarity between each species

Monotremes: Mammals that lay eggs!
Platypus (left), echidna (right)

Remember: Multiple-hit mutations can lead to the same DNA sequence in descendants as in the ancestor.
[Figure: a lineage with 2 'hits' (mutations) next to a lineage with 1 'hit' (mutation)]

Maximum Parsimony: The assumption in phylogeny that the explanation requiring the smallest number of character changes is most likely to be correct. In other words, 1 hit (mutation) or 0 hits (mutations) is more likely than 2 hits that result in the same character (e.g. DNA sequence) as the ancestor. Based on maximum parsimony, sequences (characters) that are identical to each other at a given point in the sequence are assumed not to have diverged (changed).
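To make "smallest number of character changes" concrete, the sketch below counts the minimum number of changes a given tree requires, using Fitch's counting procedure. The slides do not name a specific algorithm, so this choice is an assumption. The two candidate topologies `TREE_A` and `TREE_B` are hypothetical examples built from the five sequences of the earlier alignment slide, with sequence 5 as the outgroup.

```python
# Aligned sequences from the earlier alignment slide (5 is the outgroup).
ALIGNMENT = {
    "1": "ATTGCTATTACGGGA",
    "2": "ATAGTTATTACGCGT",
    "3": "ATAGTTATTACGCCT",
    "4": "ATAGCTATTACGGGA",
    "5": "CTGCTTCTAAGACTA",
}

# Hypothetical candidate topologies, written as nested pairs of leaf labels.
TREE_A = ((("1", "4"), ("2", "3")), "5")
TREE_B = ((("1", "2"), ("3", "4")), "5")


def fitch_site(tree, site):
    """Return (candidate ancestral states, minimum changes) for one site.

    `tree` is a leaf label (str) or a pair (left, right) of subtrees;
    `site` maps each leaf label to its nucleotide at this position.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_states, left_cost = fitch_site(tree[0], site)
    right_states, right_cost = fitch_site(tree[1], site)
    shared = left_states & right_states
    if shared:
        # The children can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # The children cannot agree: at least one change on this pair of branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment positions."""
    length = len(next(iter(alignment.values())))
    total = 0
    for pos in range(length):
        site = {name: seq[pos] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


if __name__ == "__main__":
    for name, tree in (("TREE_A", TREE_A), ("TREE_B", TREE_B)):
        print(name, parsimony_score(tree, ALIGNMENT))
```

Under maximum parsimony, the topology with the lower score is preferred. This sketch only scores the trees it is given and does not search over all possible trees.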
Substitution Models for Phylogenetics:
Many of the mutations in the genome are substitutions, and phylogenetic reconstructions (trees) make assumptions based on the likelihood of those mutations. This is based on:
a) mutation rates for the gene/region
b) likelihood of certain nucleotides (base pairs) being substituted by another
Scientists have proposed different models based on this. These Substitution Models are also called Models of DNA Sequence Evolution.

Substitution model 1: Jukes-Cantor Model: JC69
Thomas H. Jukes, Charles R. Cantor
[Figure: substitution diagram among A, G, C, T]
JC69 makes the following assumptions:
Mutation rates for all 4 nucleotides (A, T, G, C) are the same (α); does NOT account for differences between transitions and transversions
All nucleotides/base pairs occur in equal proportions (50% GC; 50% AT)
Jukes TH, Cantor CR (1969). Evolution of Protein Molecules. New York: Academic Press. pp. 21–132.

Substitution model 2: Kimura Model: K80 (K2P = Kimura 2-Parameter Model)
Motoo Kimura
[Figure: substitution diagram among A, G, C, T; κ = transition/transversion substitution rate]
K80 (K2P) makes the following assumptions:
Transitions (purine-purine; pyrimidine-pyrimidine) are more likely than transversions (purine-pyrimidine), so there are 2 different substitution rates
All nucleotides/base pairs occur in equal proportions (50% GC; 50% AT)
Kimura M (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution. 16 (2): 111–20.

Substitution model 3: Kimura Model: K81 (K3P = Kimura 3-Parameter Model)
Motoo Kimura
[Figure: substitution diagram among A, G, C, T]
K81 (K3P) makes the following assumptions:
There are 3 different substitution rates (α, β, γ): 1 for transitions (α) and 2 for transversions (β, γ)
All nucleotides/base pairs occur in equal proportions (50% GC; 50% AT)
Kimura M (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences of the United States of America. 78 (1): 454–8.

Assumption for JC69, K2P and K3P: All nucleotides/base pairs occur in equal proportions (50% GC; 50% AT)
Wheat: GC = 56%
Humans: GC = 41%
Thale Cress (Arabidopsis thaliana): GC = 36%
Human Chromosome 19: GC = 48%
GC content differs across species and across regions in the genome/genes!

Substitution model 4: Felsenstein Model: F81
Joseph Felsenstein
[Figure: substitution diagram among A, G, C, T]
F81 makes the following assumptions:
Mutation rates for all 4 nucleotides (A, T, G, C) are the same; does NOT account for differences between transitions and transversions
Nucleotides/base pairs do NOT occur in equal proportions
Felsenstein J (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution. 17 (6): 368–76.

Substitution model 5: Hasegawa-Kishino-Yano Model: HKY85
Masami Hasegawa, Hirohisa Kishino, Taka-Aki Yano
[Figure: substitution diagram among A, G, C, T]
HKY85 makes the following assumptions:
Transitions (purine-purine; pyrimidine-pyrimidine) are more likely than transversions (purine-pyrimidine), so there are 2 different substitution rates
Nucleotides/base pairs do NOT occur in equal proportions
Hasegawa M, Kishino H, Yano T (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution. 22 (2): 160–74.

Substitution model 6: Tamura-Nei Model: TN93
Koichiro Tamura, Masatoshi Nei
TN93 makes the following assumptions:
Transitions (purine-purine; pyrimidine-pyrimidine) are more likely than transversions (purine-pyrimidine), but the two kinds of transition can also have different rates (G↔A is NOT the same as T↔C), so there are 3 different substitution rates
Nucleotides/base pairs do NOT occur in equal proportions
Tamura K, Nei M (May 1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution. 10 (3): 512–26. doi:10.1093/oxfordjournals.molbev.a040023.
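As an illustration of how the two simplest models above are used in practice, the sketch below converts observed differences between two aligned sequences into estimated evolutionary distances. It uses the standard closed-form corrections for JC69, d = -(3/4) ln(1 - 4p/3), and for K2P, d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)). Here p is the p-distance, and P and Q are the proportions of sites showing a transition or a transversion. These formulas are not shown on the slides, and the function names are hypothetical. The code assumes gap-free sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}  # C and T are the pyrimidines


def site_proportions(a: str, b: str) -> tuple:
    """Return (P, Q): proportions of sites with a transition / a transversion."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    n = len(a)
    return transitions / n, transversions / n


def jc69_distance(p: float) -> float:
    """JC69 correction of a p-distance: d = -(3/4) ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(P: float, Q: float) -> float:
    """K2P correction: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("K2P distance is undefined for these proportions")
    return -0.5 * math.log(a * math.sqrt(b))


if __name__ == "__main__":
    seq1 = "ATTGCTATTACGGGA"  # sequence 1 from the alignment slide
    seq2 = "ATAGTTATTACGCGT"  # sequence 2 from the alignment slide
    P, Q = site_proportions(seq1, seq2)
    print("p-distance:", P + Q)
    print("JC69 distance:", jc69_distance(P + Q))
    print("K2P distance:", k2p_distance(P, Q))
```

Both corrected distances come out larger than the raw p-distance, because the models account for sites that may have been hit more than once.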
Substitution model 7: Generalized Time-Reversible Model: GTR = Tavaré Model (T86)
Simon Tavaré
GTR makes the following assumptions:
All substitutions have different rates!
Nucleotides/base pairs do NOT occur in equal proportions
Tavaré S (1986). Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences. Lectures on Mathematics in the Life Sciences. 17: 57–86.

Models of DNA Sequence Evolution (Substitution Models)

So which substitution model is best?
That also depends on the data (sequences)!
The software MEGA allows us to perform 'model tests' to pick the best substitution model based on the sequence alignment: the model with the lowest Akaike Information Criterion (AIC) or lowest Bayesian Information Criterion (BIC) is chosen.
https://www.megasoftware.net/

Rooted versus non-rooted trees
The root defines a unique evolutionary path towards each descendant ('leaf'). It represents the last common ancestor (i.e. the most recent one) of all descendants shown in the tree.
Non-rooted trees are not, properly speaking, phylogenetic, since they have no temporal direction: they do not indicate the type of relationship (ancestor, descendant, cousin, …) between nodes.
[Figure: a rooted tree and a non-rooted tree with nodes A to I; the root is marked on the rooted tree.]

Rooted versus non-rooted trees
Unweighted Pair-Group Method with Arithmetic mean: UPGMA
A UPGMA-based tree is rooted.
[Figure: UPGMA tree of fungi]
Each branch is labeled with a bootstrap value in this example, which provides a sort of statistical support for the existence of specific branches. Trees with 'bootstrap values' are generated many times, often 100 or 500 or even 1000 times. This method then makes one consensus tree based on the 100 (or 500 or 1000) trees, and the bootstrap values are the percentages of those trees that agree with one another for a particular clade.
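A minimal sketch of the UPGMA clustering step named above, applied to the morphological character table from the earlier cormorant, koala, echidna, cat and elephant slides. The slides do not say which method produced that earlier tree, so running UPGMA on this data is an assumption made for illustration. The function names are hypothetical. The sketch reports the average distance at each merge and does not compute branch lengths or bootstrap values.

```python
from itertools import combinations

# Character order: Feather, Hair, Milk, Eggs, Pouch, Placenta, Hoof
# (1 = yes, 0 = no), copied from the character table on the earlier slides.
CHARACTERS = {
    "Cormorant": (1, 0, 0, 1, 0, 0, 0),
    "Koala":     (0, 1, 1, 0, 1, 0, 0),
    "Echidna":   (0, 1, 1, 1, 0, 0, 0),
    "Cat":       (0, 1, 1, 0, 0, 1, 0),
    "Elephant":  (0, 1, 1, 0, 0, 1, 1),
}


def distance_table(characters: dict) -> dict:
    """Map each pair of taxa to the number of characters in which they differ."""
    return {
        frozenset([a, b]): sum(x != y for x, y in zip(characters[a], characters[b]))
        for a, b in combinations(characters, 2)
    }


def upgma(distances: dict, taxa: list) -> list:
    """Return the UPGMA merge steps as (cluster, cluster, average distance)."""
    clusters = [frozenset([t]) for t in taxa]

    def average(c1, c2):
        # Unweighted arithmetic mean over all pairs of taxa across two clusters.
        total = sum(distances[frozenset([a, b])] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((tuple(sorted(c1)), tuple(sorted(c2)), average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


if __name__ == "__main__":
    table = distance_table(CHARACTERS)
    for left, right, dist in upgma(table, list(CHARACTERS)):
        print(f"join {left} + {right} at distance {dist}")
```

The merge distances printed by this sketch can be compared with the node values shown on the earlier morphological tree slide.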
http://www.slimsuite.unsw.edu.au/teaching/upgma/
https://www.youtube.com/watch?v=09eD4A_HxVQ

Rooted versus non-rooted trees
Example of a Neighbor-Joining (NJ) method-derived tree
[Figure: NJ tree of mammals, with a scale]
The Neighbor-Joining method generates an unrooted tree with metric/additive distances. The scale shows distances. Although the tree is unrooted, it is shown in a directional layout that looks like a rooted tree.

Maximum Likelihood (ML)
Trees can be rooted or non-rooted.
[Figure: ML tree comparing marine mammals (Yuan et al. 2021)]
Maximum Likelihood (ML) is a statistical technique for estimating probability distributions to assign probabilities to particular possible phylogenetic trees. Maximum likelihood requires a substitution model (e.g. K2P, K3P, HKY85, etc.) to assess the probability of particular mutations. As with UPGMA trees, the ML trees can be sampled many times, producing consensus trees that show bootstrap values.
https://www.youtube.com/watch?v=xDKUIegYpWM

Rooted versus non-rooted trees
[Figure: rooted and non-rooted tree with nodes A to I]
Both rooted and non-rooted trees can be presented in this left-to-right "growth" format. In GENEIOUS and in MEGA, one actually cannot discriminate between rooted and non-rooted trees visually. Whether a tree has a root or not can be learned from the method used for the tree construction. Some methods work in such a way that their trees are inherently rooted (e.g. UPGMA).

Molecular clock hypothesis
The rate of evolution can be inferred from the number of changes accumulated over time in DNA/protein: the more changes, the longer the time since the separation from the last common ancestor.
https://www.youtube.com/watch?v=rMSVwWYXIjg

Molecular clock hypothesis
Early protein studies showed an approximately constant rate of evolution. This observation was the basis for a hypothesis: The molecular clock = evolution of DNA has a constant rate over time and across lineages.
That is, the evolutionary distance between species A and B corresponds to 2x the (Earth) time it took them to diverge (= to become different) from the nearest common ancestor, because both lineages have been accumulating changes since the split. Specifically, the times for A and B are equal.
This hypothesis is very attractive, because it can potentially help to:
resolve the history of species;
clarify the timing of evolutionary events;
establish more accurate relationships between taxa.

Molecular clock hypothesis
https://www.practicallyscience.com/the-maximal-rate-of-evolution/
https://www.practicallyscience.com/model-organisms-and-dnas-molecular-clock/

Molecular clock hypothesis
Later it became clear that the concept is too simplistic. In reality the following is observed:
Complex genomes evolve (mutate) slower than simple genomes.
Coding sequences evolve (mutate) slower than non-coding sequences.
Functional DNA regions evolve (mutate) slower than non-functional ones.
Housekeeping sequences such as rRNA evolve (mutate) slower than protein-coding sequences.
Mitochondrial DNA (mtDNA) sequences evolve (mutate) faster (about 100-fold faster) than nuclear DNA sequences.
Thus, there is no universal molecular clock. Still, the concept is very useful because:
If the appropriate dataset is chosen, it is possible to study both long-term and short-term evolutionary processes, and establish accurate dating of

Molecular clock hypothesis
Is it possible to relate the molecular time to the geological time?
To calibrate the molecular clock, one needs to study:
Divergence between lineages according to fossil records
Major geological events that cause separation of populations:
- Continental drift
- Formation of islands and lakes
Unfortunately, fossil dating of divergence times is often inaccurate and not possible for all lineages. Thus, absolute rates cannot be