Lecture 11a - Phylogenetics PDF
Summary
This document provides a lecture on phylogenetics, covering learning objectives, the basics of phylogenetics, assumptions of cladistics and molecular methods, definitions, rooted and unrooted phylogenetic trees, and tree building and evaluation techniques. The lecture includes examples, software, and relevant diagrams.
Full Transcript
Lecture 11a - Phylogenetics
BIO4BI3 - Bioinformatics

Learning Objectives
What are the elements of a phylogenetic tree
Understand the assumptions we make when we interpret a phylogenetic tree
Understand the difference between rooted and unrooted trees
Be able to define and understand the relationship among the terms homolog, ortholog, paralog, and xenolog – Will be on the exam
Describe the steps in building a phylogenetic tree

Phylogenetics
Evolutionary theory is the basis for all of the assumptions we make from BLAST, multiple protein alignments, and all of the matrix and sequence distance measurements we have seen
Phylogenetics is necessary for identifying gene families, inferring gene function, annotating genes, and understanding the origins of disease and the nature of polymorphisms
Sometimes the best BLAST match is not the most closely related sequence
It can be easy to misinterpret phylogenetic data

The Basics of Phylogenetics
Phylogenetics is the study of evolutionary relationships
Phylogenetic analysis is the means of inferring/estimating these relationships
Phylogenetic relationships are most often represented by “trees”
Sometimes called cladistics
A clade is a group of individuals that share a common evolutionary ancestor
Members of a clade are more closely related to each other than to members of other groups

Assumptions of Cladistics
1. Any group of organisms is related by descent from a common ancestor
2. There is a bifurcating pattern of cladogenesis
3. Change in characteristics occurs in lineages over time

Phylogenetic Trees
Elements
1. Clades – a group of organisms/genes that includes the most recent common ancestor (MRCA) of all its members and all of the descendants of that MRCA
2. Taxa (singular: taxon) – any named group of organisms
3. Branches – branch length may or may not correlate with evolutionary distance (depending on the method used), but branches always represent divergence
4. Node – a bifurcating branch point

Phylogenetic Tree Example
Organ CL, Schweitzer MH, Zheng W, Freimark LM, Cantley LC, and Asara JM. 2008.
Molecular Phylogenetics of Mastodon and Tyrannosaurus rex. Science 320: 499

Assumptions of Molecular Phylogenetic Methods
1. The sequence is correct
2. The sequences are homologous – all descended from a common ancestral sequence
3. Each aligned position in an MSA is homologous across every sequence in the alignment
4. Each of the multiple sequences in an alignment has a common phylogenetic history
5. The sampling of taxa is adequate to resolve the issue
6. Sequence variation among the samples is representative of the broader group
7. The sequence variability in the sample contains enough signal to resolve the issue

More Assumptions
The following assumptions do not hold for all analyses
1. The sequences evolved according to a single stochastic process
2. All positions in the sequence evolved according to the same stochastic process
3. Each position in the sequence evolved independently

Some Definitions
Homologs – sequences that have common ancestral origins but may not have common activity. Sequences sharing some arbitrary level of similarity in alignments are often called homologous. Note: Homology refers to shared ancestry only and is not quantifiable. Similarity is a quantifiable measure
Orthologs – homologs created through speciation. They tend to have similar function
Paralogs – homologs created through gene duplication. They tend to have differing function
Xenologs – homologs arising from horizontal gene transfer. Generally the function is similar

Rooted Phylogenetic Trees
In molecular phylogenetics, a rooted tree and an unrooted tree represent two different ways of visualizing evolutionary relationships among species or sequences:

Rooted Tree
A rooted tree has a specific root node that represents the common ancestor of all the sequences or species in the tree.
The root defines the direction of evolution, indicating which nodes or branches are more ancestral and which are more derived.
With a rooted tree, you can infer time or evolutionary distance from the root toward the tips, often showing how species have diverged from a common ancestor.
Rooted trees are typically constructed by including an outgroup, a sequence or species known to be more distantly related to the rest of the group. The outgroup helps define the root, effectively setting a baseline for evolutionary comparisons.
[Figure: Rooted Tree of Protein Domain]

Unrooted Phylogenetic Trees
Unrooted Tree
An unrooted tree does not have a root and thus does not specify the direction of evolution or any common ancestor.
It only represents the relationship among sequences based on similarity or distance, showing which sequences are more closely related to each other without indicating a time scale or ancestry.
Unrooted trees are often generated when evolutionary direction isn’t known or when you want to emphasize pairwise relationships rather than evolutionary history.
These trees are frequently used in situations where a specific root cannot be confidently identified, or in cases where only the topology (branching order) of relationships is of interest.
[Figure: Unrooted Tree of a Gene Family]

Rooted vs Unrooted Trees
Key Differences
1. Root Presence: Rooted trees have a root indicating the common ancestor, while unrooted trees lack this.
2. Evolutionary Direction: Rooted trees imply a direction of evolution, while unrooted trees do not.
3. Interpretation of Relationships: Rooted trees can show ancestry and timelines, while unrooted trees only depict relationships in terms of similarity or distance.

How To Construct a Tree
There are four steps in performing phylogenetic analysis with DNA/protein data
1. Construct a multiple sequence alignment
2. Determine the substitution model
3. Tree building
4. Tree evaluation

Alignment
Start with a multiple sequence alignment of the genes.
We assume that aligned positions (sites) have a common ancestral relationship, whether they hold a base, an amino acid, or a gap.
The base, residue, or gap occupying an aligned position is its ‘character state’.
Sites believed to be homologous where a change has occurred are referred to as ‘informative sites’
Choose some software to perform the alignment (ClustalW/ClustalX/T-Coffee/MALIGN)
We often choose a subset of the alignment to develop our tree

Alignment
The alignment step is very important and forms the dataset from which all further analysis comes.
It is not uncommon to edit the alignment: deleting ambiguous regions, inserting gaps, deleting gaps.
The goal is an alignment that best represents the evolutionary process

Phylogenetic Data Model
Models of substitution rates between bases
Parsimony – meaning thrifty. Our goal is to find a tree that represents the evolutionary relationship among sequences in the simplest way (maximum parsimony). Note that the simplest explanation is not always the correct one
We use weighted matrices to score a distance
Character weighted matrix (CWM) – simple matrix, e.g. transitions = 1 and transversions = 2 (MP)
Simple substitution matrix – developed with more complex maths than CWM (ML and distance phylogenetics). Captures rate of change
Pairwise sequence comparison matrix – derived from multiple sequence alignments, e.g. PAM, BLOSUM

Tree Building
There are many different methods to construct phylogenetic trees
They fall into two broad categories
Distance based – use the amount of dissimilarity between two aligned sequences to build the tree. The character data are disregarded when constructing the tree. Common methods include unweighted pair group method with arithmetic mean (UPGMA) and neighbor joining (NJ)
Character based – methods have little in common other than using character data at all steps in the analysis. Examples include Maximum Parsimony (MP) and Maximum Likelihood (ML)

Tree Evaluation
How do we know how correct our tree is?
Randomized Trees (Skewness Test) – the distribution of MP tree lengths from random data is symmetrical, whereas datasets with phylogenetic signal have a skewed distribution
Bootstrapping – from the dataset, create many new datasets by resampling with replacement. How many times is a particular branch recreated in these new datasets? The number of replicates in which the branch is observed, divided by the number of replicates generated, gives the bootstrap value

Phylogenetic Software
PHYLIP
PAUP
FASTDNAml
PUZZLE
MacCLADE
MOLPHY
Clustal
Treeview
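
Worked sketch: distance-based tree building and bootstrapping
The following is a minimal Python sketch of the distance-based tree building and bootstrapping ideas from the Tree Building and Tree Evaluation slides. It is an illustration under stated assumptions, not the method used by any of the packages listed above. It assumes a simple p-distance (proportion of differing sites) in place of a fitted substitution model, uses UPGMA as the tree builder, and resamples alignment columns with replacement. The function names (p_distance, upgma_clades, bootstrap_support) and the toy alignment are hypothetical and made up for this example.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites that differ, ignoring sites with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma_clades(alignment):
    """Cluster sequences with UPGMA; return every clade as a frozenset of names."""
    names = list(alignment)
    leaf_dist = {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(names, 2)
    }

    def cluster_dist(c1, c2):
        # UPGMA: arithmetic mean of all leaf-to-leaf distances between clusters
        total = sum(leaf_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    clades = set()
    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties go to the first pair found)
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades


def bootstrap_support(alignment, n_replicates=200, seed=0):
    """Fraction of bootstrap replicates that recover each clade of the original tree."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    reference = upgma_clades(alignment)
    counts = {clade: 0 for clade in reference}
    for _ in range(n_replicates):
        # Resample alignment columns with replacement
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {
            name: "".join(seq[i] for i in cols) for name, seq in alignment.items()
        }
        replicate = upgma_clades(resampled)
        for clade in reference:
            if clade in replicate:
                counts[clade] += 1
    return {clade: counts[clade] / n_replicates for clade in reference}


# Toy alignment (made-up sequences, for illustration only)
toy = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGA",
    "C": "TTGTACCTACGAACGTTCGT",
    "D": "TTGTACCTACGAACGTTCGA",
}

if __name__ == "__main__":
    support = bootstrap_support(toy)
    for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(clade), value)
```

Each printed value is the proportion of replicates in which that clade reappears, which is the bootstrap value described above. The clade containing all sequences is recovered in every replicate by construction, so only the internal clades are informative. A real analysis would use a proper substitution model and an established package rather than this sketch.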