LO8 Molecular Phylogeny and Evolution PDF
Bicol University
Joseph Martin Q. Paet
Summary
These notes cover the topic of Molecular Phylogeny and Evolution. They describe the theory of evolution, the biological process of heredity, and Darwinian Evolution. The notes also discuss molecular phylogeny, the study of evolutionary relationships among organisms or molecules using molecular biology techniques.
Full Transcript
Bio16 Computational Biology
Molecular Phylogeny and Evolution
Prepared by: Joseph Martin Q. Paet, Biology Department, College of Science, Bicol University

Evolution

Evolution is the theory that groups of organisms change over time so that descendants differ structurally and functionally from their ancestors. It is also the biological process by which organisms inherit the morphological and physiological features that define a species.

Heredity is usually conservative, and yet the structure and function of bodies change over the course of generations. Darwinian Evolution explains how this happens:
  Perpetual Change = the world is not constant
  Common Descent = all organisms share a common ancestor
  Multiplication of Species = new species arise when populations split, e.g., through geographic isolation
  Gradualism = change accumulates slowly, in small steps
  Natural Selection = variants best fitted for survival live on and multiply

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Molecular Phylogeny

Goal = deduce the correct trees for all species of life
True Tree vs Inferred Tree = actual vs hypothesized historical events
[Figure: Neo-Darwinian tree vs reticulated tree]

Sources: Pevsner (2015); Doolittle, W. F. (1999). Phylogenetic classification and the Universal Tree. Science, 284(5423), 2124–2128. https://doi.org/10.1126/science.284.5423.2124

Molecular Phylogeny

Molecular Clock Hypothesis = for any given gene (or protein), the rate of molecular evolution is approximately constant over time
Caveats:
  the rate of molecular evolution varies among different organisms
  the clock varies among different genes/proteins
  the clock is only applicable when the gene in question retains its function over evolutionary time

Source: Pevsner (2015).
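The molecular clock hypothesis above can be turned into a small worked calculation. The sketch below assumes the commonly used relation r = K / (2T), where K is the number of substitutions per site separating two sequences and T is the time since they diverged; the factor of 2 reflects that both lineages accumulate changes. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def substitution_rate(k, t_years):
    """Rate r (substitutions per site per year), assuming r = K / (2T)."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t_years)


def divergence_time(k, rate):
    """Time since divergence (years), assuming the same clock: T = K / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


# Hypothetical calibration pair: K = 0.20 substitutions/site, split 100 million years ago.
rate = substitution_rate(k=0.20, t_years=100e6)

# Hypothetical second pair for the same gene: K = 0.05 substitutions/site.
t = divergence_time(k=0.05, rate=rate)

print(f"rate = {rate:.2e} substitutions/site/year")
print(f"estimated divergence = {t / 1e6:.1f} million years ago")
```

The estimate is only as good as the clock assumption: per the caveats above, a rate calibrated on one gene or one group of organisms may not transfer to another.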
Molecular Phylogeny

Molecular phylogeny = the study of the evolutionary relationships among organisms or molecules using the techniques of molecular biology

A Phylogenetic Tree is a graph composed of branches (edges) and nodes. It carries two kinds of information:
  Topology = defines the relationships of the proteins (or other objects) that are represented in the tree (e.g., where the common ancestor is)
  Branch Lengths = reflect the degree of relatedness of the objects in the tree

[Figure: rooted tree of Sequences A to F, labeled with external node, node, branch, clade, root, taxon (OTU), and scale]

Source: Pevsner (2015).

Molecular Phylogeny (continued)

Branch = connects two nodes
  ❑ Defines the topology of the tree
  ❑ Branch length = represents the number of changes that occurred; drawn to scale in a phylogram, not in a cladogram
Node = intersection or terminating point of two or more branches
  ❑ Represents a taxon
  ❑ External node = extant taxon; operational taxonomic unit (OTU); sister external nodes can be swapped (rotated about their shared internal node) without changing the topology
  ❑ Internal node = ancestral taxon
Clade = group of all the taxa derived from a common ancestor, plus the common ancestor itself
Root = represents the most recent common ancestor of all the sequences; placed by outgroup or midpoint rooting

Source: Pevsner (2015).

Types of Trees

Both kinds of tree depict evolution by the splitting of an ancestral lineage.

SPECIES TREES
  Speciation = the process by which two new species are created from a single ancestral species through reproductive isolation
  Internal Node = speciation event
  Uses the molecular clock
  Uses a variety of concatenated genes/proteins

GENE/PROTEIN TREES
  Divergence = the process by which two new genes/proteins diverge from an ancestral sequence
  Internal Node = divergence event
  Gene divergence may predate speciation = overestimated branch length
  May have a different topology from the species tree

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.).
John Wiley & Sons Inc.

Types of Trees

[Figures: trees of life from Hug et al. (2016) and Zhu et al. (2019)]
Hug, L. A., et al. (2016). A new view of the Tree of Life. Nature Microbiology, 1(5). https://doi.org/10.1038/nmicrobiol.2016.48
Zhu, Q., et al. (2019). Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-13443-4

Five Stages of Phylogenetic Analysis

  1. Sequence Acquisition
  2. Multiple Sequence Alignments
  3. Models of DNA and AA Substitution
  4. Tree-Building Methods
  5. Evaluating Trees

Source: Pevsner (2015).

Stage 1: Sequence Acquisition

Source: Pevsner (2015).

Stage 2: Multiple Sequence Alignments

Major Considerations:
  Confirm that all sequences are homologous
  Check how the MSA treats distantly related sequences
  Restrict the analysis to portions of the sequence that are available for all taxa
  Decide how to treat the gaps in the MSA
  Inspect the MSA against your sequence metadata

Source: Pevsner (2015).

Stage 3: Models of DNA and AA Substitution

  Hamming Distance → number of sites at which two aligned sequences differ; difference = divergence
  Poisson Correction → corrects the Hamming distance for multiple substitutions at the same site, assuming equal substitution rates across sites
  Jukes-Cantor One-Parameter Model → corrects the Hamming distance assuming every nucleotide substitution is equally probable
  Kimura Two-Parameter Model → extends Jukes-Cantor with separate probabilities for transition and transversion mutations
  Tamura's Model → further adjusts for GC content
  Gamma (Γ) model → accounts for unequal substitution rates across sites
  Many more! = evaluate which model is best to use

Source: Pevsner (2015).
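The first few substitution models listed under Stage 3 have simple closed forms, so they can be compared side by side. The following is a minimal sketch, not the method of any particular package: it uses the standard textbook formulas d = -ln(1 - p) for the Poisson correction, d = -(3/4) ln(1 - 4p/3) for Jukes-Cantor, and d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)) for Kimura two-parameter, where p is the proportion of differing sites and P and Q are the proportions of transitions and transversions. The function names, the gap handling (columns with "-" are skipped), and the two example sequences are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def aligned_pairs(seq1, seq2):
    """Column pairs of two aligned sequences, skipping columns with a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return pairs


def p_distance(seq1, seq2):
    """Hamming distance divided by the number of compared sites."""
    pairs = aligned_pairs(seq1, seq2)
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_correction(p):
    """Poisson-corrected distance, typically applied to protein sequences."""
    return -math.log(1.0 - p)


def jukes_cantor(p):
    """Jukes-Cantor distance for nucleotides; undefined once p >= 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance from transition/transversion proportions."""
    pairs = aligned_pairs(seq1, seq2)
    transitions = sum(
        a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
        for a, b in pairs
    )
    differences = sum(a != b for a, b in pairs)
    big_p = transitions / len(pairs)
    big_q = (differences - transitions) / len(pairs)
    return -0.5 * math.log((1 - 2 * big_p - big_q) * math.sqrt(1 - 2 * big_q))


# Hypothetical aligned sequences differing by two transitions (A/G and C/T).
s1 = "ACGTACGTAC"
s2 = "ACGTGCGTAT"
p = p_distance(s1, s2)
print("p-distance:", p)
print("Jukes-Cantor:", round(jukes_cantor(p), 4))
print("Kimura 2P:", round(kimura_2p(s1, s2), 4))
```

Each corrected distance is larger than the raw proportion of differences, because the models estimate substitutions that occurred but are no longer visible in the alignment.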
Stage 4: Tree-Building Methods

Distance-Based Methods
  begin the construction of a tree by calculating the pairwise distances between molecular sequences → distance ~ branch length
  Computationally fast = suited to a large number of sequences
  E.g., Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor-Joining (NJ)
  UPGMA = simple and fast but overly simplified (generally less accurate than NJ); assumes a constant molecular clock
  NJ = does not assume a constant clock
  Workflow (figure): distance matrix → combine the closest pair → connect to nascent trees → done
  NJ (figure): OTUs are first arranged into a star-like tree; neighbors = pairs of OTUs connected by a single node

Source: Pevsner (2015).

Stage 4: Tree-Building Methods

Maximum Parsimony
  the best tree is the one with the shortest total branch length = fewer changes means a more likely explanation
  assumes that all taxa evolve at the same rate and that all characters contribute the same amount of information
  Susceptible to long-branch attraction = rapidly evolving taxa can be grouped together as an artifact
  Workflow (figure): identify informative sites → construct trees → select the shortest tree

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
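The distance-based workflow described above (start from a distance matrix, join the closest pair, update the distances, repeat) can be sketched with UPGMA. This is a minimal illustration under stated assumptions, not the implementation used by any phylogenetics package: the function name `upgma`, the input layout (a dict keyed by pairs of OTU names), and the example distances are all hypothetical. The output is written in Newick notation with branch lengths.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA and return a Newick string.

    names: list of OTU labels.
    dist:  dict mapping a pair (a, b) of labels to their distance.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for name in names}        # number of OTUs in each cluster
    height = {name: 0.0 for name in names}    # node height (tips sit at 0)
    active = list(names)

    while len(active) > 1:
        # Step 1: find the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        # Step 2: join them at half their distance (constant-clock assumption).
        h = d[frozenset((a, b))] / 2.0
        new = f"({a}:{h - height[a]:.3f},{b}:{h - height[b]:.3f})"
        size[new] = size[a] + size[b]
        height[new] = h
        # Step 3: distance to every other cluster is the size-weighted mean.
        active = [c for c in active if c not in (a, b)]
        for c in active:
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        active.append(new)

    return active[0] + ";"


# Hypothetical clock-like distances among four OTUs.
names = ["A", "B", "C", "D"]
dist = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
    ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
print(upgma(names, dist))
# prints (D:5.000,(C:3.000,(A:1.000,B:1.000):2.000):2.000);
```

Because every join is placed at half the inter-cluster distance, all tips end up equidistant from the root, which is the constant molecular clock assumption noted for UPGMA. Neighbor-joining drops that assumption and is not shown here.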
Stage 4: Tree-Building Methods

Maximum Likelihood
  designed to determine the tree topology and branch lengths that have the greatest likelihood of producing the observed dataset
  provides a statistical model for evolutionary change that varies across branches
  E.g., the TREE-PUZZLE program reduces the problem into quartets, performs quartet puzzling (which gives an estimate of support), then generates a consensus tree

Source: Pevsner (2015).

Stage 4: Tree-Building Methods

Bayesian Methods
  a statistical approach to modeling uncertainty in complex models
  includes the specification of prior information and uses a computational method to estimate the posterior probability distribution
  MSA with a prior model → MCMC to generate the posterior probability distribution → tree with support values

Source: Pevsner (2015).

Stage 5: Evaluating Trees

Assess accuracy = consistency, efficiency, and robustness
Bootstrap analysis = most common; describes the robustness of the tree topology
  ❑ MSA as input → an artificial dataset of the same length is created by sampling MSA columns at random with replacement → a tree is generated from the resampled dataset → the process is repeated to produce many bootstrap replicates → the bootstrap trees are compared to the original, inferred tree
  ❑ values above 70% are sometimes considered to support the clade designations (roughly the p < 0.05 level)
Maximum likelihood reports the tree with the greatest likelihood
Bayesian inference typically reports the most probable tree

Source: Pevsner (2015).

References:
Doolittle, W. F. (1999). Phylogenetic classification and the Universal Tree. Science, 284(5423), 2124–2128. https://doi.org/10.1126/science.284.5423.2124
Hug, L. A., et al. (2016). A new view of the Tree of Life. Nature Microbiology, 1(5).
https://doi.org/10.1038/nmicrobiol.2016.48
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
Zhu, Q., et al. (2019). Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-13443-4
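Supplementary sketch for Stage 5: the bootstrap procedure (resample alignment columns with replacement, rebuild the tree, count how often a clade reappears) can be illustrated in a few lines. Everything named here is an assumption made for illustration. In particular, `closest_pair_clade` is a toy stand-in for a real tree-building method: it only reports the single pair of sequences with the fewest mismatches, so the result is the support for that grouping rather than for a full tree.

```python
import random
from itertools import combinations


def resample_columns(msa, rng):
    """Pseudo-alignment of the same length, columns drawn with replacement.

    msa: dict mapping sequence name to aligned sequence (equal lengths).
    """
    length = len(next(iter(msa.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in msa.items()}


def closest_pair_clade(msa):
    """Toy 'tree builder': the pair with the fewest mismatches, as one clade."""
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(msa[a], msa[b]))
    return {frozenset(min(combinations(sorted(msa), 2), key=mismatches))}


def bootstrap_support(msa, clade, build_clades=closest_pair_clade,
                      n_reps=100, seed=0):
    """Fraction of bootstrap replicates in which `clade` is recovered."""
    rng = random.Random(seed)
    clade = frozenset(clade)
    hits = sum(
        clade in build_clades(resample_columns(msa, rng))
        for _ in range(n_reps)
    )
    return hits / n_reps


# Hypothetical four-sequence alignment.
msa = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGTACGA",
    "C": "ACGAACTTACCA",
    "D": "TCGAACTTAGCA",
}
support = bootstrap_support(msa, {"A", "B"}, n_reps=200, seed=1)
print(f"bootstrap support for (A,B): {support:.0%}")
```

In practice `build_clades` would be replaced by a function that runs a full tree-building method on each replicate and returns every clade in the resulting tree.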