Lecture 9: Phylogenetics I
UWI Cave Hill
Dr. A.T Alleyne
Summary
This lecture covers the principles of phylogenetics and bioinformatics, focusing on molecular evolution and the molecular clock hypothesis. It discusses the importance of molecular sequence data and phylogenetic tree analysis in biology.
Full Transcript
LECTURE 9
PHYLOGENETICS I
BIOC 3265 - PRINCIPLES OF BIOINFORMATICS
Dr. A.T Alleyne - UWI Cave Hill

LEARNING OUTCOMES
At the end of this lecture you should be able to:
1. Define the term phylogeny
2. Explain the importance of the molecular clock hypothesis
3. Describe some of the basic assumptions in phylogenetics
4. Describe all parts of a phylogenetic tree
5. Analyze a given phylogenetic tree

MOLECULAR PHYLOGENY
Phylogeny is the inference of evolutionary relationships.
Traditional phylogeny relied on the comparison of morphological or structural features between organisms (classical taxonomy).
In molecular phylogeny, molecular sequence data are used for phylogenetic analyses.

MOLECULAR EVOLUTION AND PHYLOGENY
Molecular evolution is the study of changes in genes and proteins throughout different branches of the tree of life.
Phylogeny is the inference of evolutionary relationships.
All branches of the modern biological sciences make use of molecular sequence data, which are also used for phylogenetic analyses.
In early work by Dayhoff and others, globin proteins (e.g. haemoglobin) were sequenced and used as a basis for several hypotheses in molecular phylogeny.

MOLECULAR CLOCK HYPOTHESIS
Proposed in 1962 and 1963 by Zuckerkandl, Pauling and Margoliash, it states: “For every given protein (or gene), the rate of molecular evolution is approximately constant in all evolutionary lineages” (Pevsner 2009).
Protein sequences are assumed to evolve at constant rates, hence they can be used to estimate the times at which sequences diverged.
Their studies were conducted using human globins with known amino acid composition.
Homologous proteins will have correlated rates based on the time of divergence from a common ancestor.

The rate of amino acid substitution is constant for each protein.
A similar experiment was conducted with three proteins, using palaeontological data, by Dickerson in 1971 (Journal of Molecular Biology 57: 1-15).
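
The constant-rate idea above can be turned into a rough divergence-time estimate. The following is a minimal sketch, not part of the lecture: it assumes a Poisson correction for multiple substitutions at the same site, and the function names, the observed difference (20%) and the rate (1e-9 substitutions per site per year) are hypothetical values chosen only for illustration.

```python
import math


def poisson_corrected_distance(p):
    """Convert the observed fraction of differing sites (p) into the
    expected number of substitutions per site, K = -ln(1 - p).
    Assumption: substitutions follow a Poisson process."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in the range [0, 1)")
    return -math.log(1.0 - p)


def divergence_time(k, rate_per_site_per_year):
    """Estimate time since divergence as T = K / (2r).
    The factor 2 is there because both lineages accumulate
    substitutions independently after they split."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: 20% of aligned sites differ, and the protein
# is assumed to change at 1e-9 substitutions per site per year.
p_observed = 0.20
rate = 1.0e-9

k = poisson_corrected_distance(p_observed)
t_years = divergence_time(k, rate)
print(f"K = {k:.4f} substitutions per site")
print(f"Estimated divergence: {t_years / 1e6:.1f} million years ago")
```

With these made-up inputs the estimate comes out at roughly 112 million years. The result is only as good as the assumption that the rate is constant in both lineages.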
[Figure: Dickerson (1971), taken from Pevsner 2009]

MOLECULAR CLOCK EVIDENCE: HAEMOGLOBIN CONSISTENCY
If protein sequences evolve at constant rates, they can be used to estimate the times at which sequences diverged.
This allows rates of gene or molecular evolution to be calculated.
Their estimates suggested that gene duplications from [...] to [...] occurred 44 MYA, and [...] were derived from a common ancestor 260 MYA.

AMINO ACID SUBSTITUTION RATES
Protein                  Rate of amino acid substitution per site per 10^9 years (PAM)
Growth hormone           3.7
Lactalbumin              2.7
Carbonic anhydrase C     1.6
Lysozyme                 0.98
Myoglobin                0.89
Cytochrome c             0.22
Ubiquitin                0.10

LIMITATIONS OF THE HYPOTHESIS
Nucleotide sequences evolve at different rates at different positions within a codon and for different substitution types, e.g. substitutions between pyrimidines and purines.
Rates also vary among organisms, e.g. viruses.
The clock varies across different species or genes, and across different parts of individual species, due to selection pressure and generation time.
Morphological evolution is not always observed to proceed at a steady rate.
Molecular evolution may occur on a time scale independent of morphological evolution.

LIMITATIONS OF THE HYPOTHESIS
Kimura's (1968) neutral theory: the main cause of evolutionary change is random drift of mutant alleles that are selectively neutral.
Disadvantage/problem: evolutionary rate also depends on several factors, e.g. metabolic rates, generation times, population sizes, mutation rates and selection pressure.
The clock only works if gene or protein function is maintained over evolutionary time.

TAJIMA'S RELATIVE RATE TEST
Tajima in 1993 introduced a test of the molecular clock: the relative rate test.
It calculates the relative rates of evolution of two proteins (A and B) or protein families.
It adds a third protein (C) as an outgroup for comparison with the pair.
It uses a common ancestor O to determine rates of change.
It uses a null hypothesis (statistical testing) to accept or reject the clock.

TAJIMA RELATIVE RATE TEST
Performs a chi-square (χ²) test to determine whether those rates are comparable (null hypothesis), or whether we can reject the null at a significance level of p < 0.05.
Measures the rates AO and BO, where O is the common ancestor.

THE TAJIMA ALGORITHM
Considers 3 sequences (1, 2 and 3), with 3 as the outgroup.
Considers the alignment of all three sequences.
At a given site, sequences 1, 2 and 3 have nucleotides i, j and k; n_ijk is the number of such sites.
Uses the molecular clock, which implies that sequences 1 and 2 have evolved at an equal rate since their common ancestor, regardless of the substitution model.
Example: under the clock, E(n_ijk) = E(n_jik). If instead E(n_ijk) ≠ E(n_jik), the molecular clock (MC) hypothesis is rejected.

USES OF EVOLUTIONARY THEORY
Important for:
Gene family identification
Gene discovery (gene function)
Origins of genetic diseases
Epidemiology of pathogenic diseases
Polymorphism characterization
Evolutionary theories of life

Types of evolutionary tree diagrams: branching shapes or topology

TREE TOPOLOGY
Topology: the relationships among the subjects of the tree (e.g. proteins, genes or organisms), such as common ancestry or homology.
Branch lengths: reflect the degree of these relationships. They represent the number of changes that occurred between nodes, and may be shown as a distance scale on the tree.
A phylogenetic tree or dendrogram is a graph consisting of branches and nodes. The nodes represent taxonomic units, and a branch connects two nodes.

TREE NOMENCLATURE
Clade or cluster: a monophyletic group or taxon derived from a common ancestor or node.
Node: an internal node is a bifurcating branch point in a tree. An external node represents an OTU.
Taxon: any named group of organisms (not necessarily arranged in a cluster).
Branch: branches represent the relationships between taxa.
Root: the common ancestor of all units in a rooted tree.
It is the oldest point in time on the tree.

THE COMMON ANCESTOR
If two sequences are homologous to each other, it is assumed they were derived from a common ancestor.
A common ancestor suggests a similar function.
Molecular phylogeny can only generate an inferred tree from data.
Inferred trees are hypothetical versions of actual evolutionary events or true trees.

Examples of clades
[Figure: examples of clades. Lindblad-Toh et al., Nature 438: 803 (2005), fig. 10]

OPERATIONAL TAXONOMIC UNITS (OTU)
Operational taxonomic units (OTUs) are the external taxa or units represented in a tree.
They are found at the terminal nodes.
[Figure: tree with nodes labelled A to I and numbered branch lengths; terminal nodes marked "operational taxonomic unit (OTU)"; axis: time]

Outgroup
An outgroup allows the tree root to be placed correctly. It represents an organism that is similar to, but not from the same branch as, the rest of the tree.

Rooted trees: the common ancestor is shown as a node from which all other nodes are derived.
Unrooted trees: branching relationships between nodes are shown by the way they are connected to each other, but the position of the common ancestor is not.
An unrooted tree may be rooted by selecting an edge and redrawing the tree as a rooted tree in relation to this branch.

PHYLOGENETIC TREES
Cladogram: branch lengths are equal and do not reflect precise evolutionary time scales. Relationships are inferred.
Phylogram or phenogram: different branch lengths and time scales reflect distance from a common ancestor or similarity relationships.
A branched tree with a given scale is also called a dendrogram.

FENG AND DOOLITTLE (1987)
Feng and Doolittle used the Needleman-Wunsch algorithm “to achieve the multiple alignment of a set of protein sequences and to construct an evolutionary tree depicting their relationship”.
They assumed the sequences shared a common ancestor, and constructed trees from different matrices derived directly from an MSA.
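
To make "matrices derived directly from an MSA" concrete, here is a minimal sketch of building a pairwise distance matrix from aligned sequences. It uses the simple p-distance (fraction of differing sites, skipping gapped columns) as a stand-in; this is not the distance measure Feng and Doolittle used, and the sequence names, the toy alignment and the function names are all hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return a symmetric nested dict of pairwise p-distances.
    `alignment` maps a sequence name to its aligned sequence."""
    names = list(alignment)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist


# Toy alignment (made up for illustration): each column is a character,
# each residue in the column is that character's state.
toy_alignment = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTTC",
    "seqC": "ACGAAC-TTC",
    "seqD": "TCGAACGATC",
}

dist = distance_matrix(toy_alignment)
for name, row in dist.items():
    print(name, " ".join(f"{row[other]:.3f}" for other in toy_alignment))
```

A matrix like this is the usual input to distance-based tree-building methods. Note that the matrix can be computed from any alignment, including a poor one, so the quality of the resulting tree depends on the quality of the MSA.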
MSA AND PHYLOGENETICS
The MSA is the fundamental basis of a phylogenetic tree.
The tree is a model of the alignment.
A tree can still be generated from a misalignment.
Each column is a character in a phylogenetic algorithm.
Each residue or unit in the column represents the state of the character.

BASIC ASSUMPTIONS IN TREE ANALYSIS
1. The molecular sequence is correct and originates from a specific source
2. The sequences are homologous
3. Each position in the alignment is homologous with every other in that alignment
4. The sampling of taxa is adequate to resolve the query of interest and is representative of the broader group
5. Each position in the sequence sample evolved independently

SPECIES OR GENE TREES
DNA, RNA or proteins may be used in constructing trees.
Speciation usually occurs when a species becomes reproductively isolated.
In a species tree, each internal node represents a speciation event.
Genes (and proteins) may duplicate or otherwise evolve before or after any given speciation event.
The topologies of a gene (or protein) based tree and a species tree may differ from each other.
Species tree: divergence among multiple homologous genes. Nodes represent a speciation event.
Gene tree: divergence within a single gene. Protein families or gene families are used.

HOMOLOGY AND SIMILARITY
Homology: sequences share a common ancestor. It does not necessarily refer to levels of similarity.
Similarity: independent of historical data.
A quantifiable term that measures the degree of relatedness or difference.

ORTHOLOGS
Homology produced through speciation.
Genes are derived from a common ancestor through divergence.
Orthologous genes or proteins have similar functions.
In identification, the gene phylogeny matches the organism's general phylogeny.
Genomic variation usually occurs after speciation.

ORTHOLOG USES
As markers for homologous chromosomal regions for comparative mapping
Phylogenetic footprinting
Operon prediction

PARALOGS
Homology derived by gene duplication.
Genes are derived from a common ancestor through duplication followed by divergence.
These genes may have different functions from the common ancestor or from each other.
In identification, the gene phylogeny does not follow the organism's general phylogeny.
Paralogs are not generally used for phylogeny.

PARALOG USES
Identification of paralogs is a prerequisite for studying processes of gene duplication.

XENOLOGS
Homology resulting from horizontal gene transfer between two organisms.
This is usually difficult to ascertain.
It may be determined from the %G+C ratio.
Gene functions in the two organisms are usually similar.
In identification, the gene phylogeny does not match the organism's general phylogeny when other genes do.

Similar function but separate evolutionary origin.

REFERENCES
Baxevanis, A. D. and Ouellette, B. F. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. 3rd ed. Wiley.
Pevsner, J. (2009) Bioinformatics and Functional Genomics. Wiley.
Tajima, F. (1993) Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135: 599-607.