Phylogenetics1 PDF
Summary
This document covers phylogenetics, a field studying the evolutionary relationships among lineages. It discusses various concepts like phylogenetic trees, distance methods, parsimony, and maximum likelihood, as well as important software programs for phylogenetic analyses.
Full Transcript
Phylogenetics

What is phylogenetics?
Study of branching patterns of descent among lineages
Lineages
– Populations
– Species
– Molecules
Shift between population genetics and phylogenetics is often the species boundary
– Distantly related populations also show patterning
– Patterning across geography

What is phylogenetics?
Goal: Determine and describe the evolutionary relationships among lineages
– Order of events
– Timing of events
Visualization: Phylogenetic trees
– Graph
– No cycles

Phylogenetic trees
Nodes
– Terminal
– Internal
– Degree
Branches
Topology

Phylogenetic trees
Rooted or unrooted
– Rooted: Precisely 1 internal node of degree 2
  Node that represents the common ancestor of all taxa
– Unrooted: All internal nodes with degree 3+
Stephan Steigele
Binary: All speciation events produce two lineages from one
Cladogram: Topology only
Phylogram: Topology with edge lengths representing time or distance
Ultrametric: Rooted tree with time-based edge lengths (all leaves equidistant from root)

Phylogenetic trees
Clade: Group of ancestral and descendant lineages
Monophyly: All of the descendants of a unique common ancestor
Polyphyly: Descendants include lineages from multiple ancestors
Paraphyly: One or more monophyletic subgroups are left apart from all other descendants of a unique common ancestor

Figure 17.16 The transition from genetic polyphyly to paraphyly to monophyly in speciation

Conceptual framework
In each lineage,
new mutations are fixed independently
Each subsequent mutation is placed on the previously fixed sequence
Given enough time, each lineage goes through a unique sequence of (nested) fixation events
Sequential fixation generates a readable history of sequence similarity and differences
How to read this sequence?
– Given extant taxa, how to reconstruct history?
– Use nested shared similarity to infer history

Building phylogenetic trees
Distance: Estimate a distance matrix from the data, then generate a tree that represents these distances
Parsimony: Attempts to find a tree that minimizes the number of changes given the data
Maximum likelihood & Bayesian: Model-based approach to find the most likely tree given the data

Distance
Compute a distance matrix given a set of biological data
– UPGMA
– Neighbor joining
Compute a tree that most resembles this distance matrix

Distance
Unweighted pair group method using arithmetic averages (UPGMA)
Given a set of taxa and a distance matrix, UPGMA produces a rooted tree with edge lengths
Clusters taxa, then merges clusters
Assembled outside in

Distance
Neighbor-Joining (NJ)
Saitou and Nei 1987
Given a distance matrix, produces an unrooted phylogenetic tree with edge lengths
Repeatedly pairs neighboring taxa
Determines which nodes are neighbors based only on the distance matrix

Parsimony
The preferred evolutionary tree is the one that requires “the minimum net amount of evolution” (Edwards and Cavalli-Sforza, 1963)
Each taxon is described by a set of characters
Each character can be in one of a finite number of states
Steps = changes in character states
Goal: Find the tree that explains the distribution of character states across taxa with the fewest steps
Teresa Przytycka

Parsimony
Binary characters
Multistate characters
Ordered changes
Reversible changes

Parsimony
Fitch
Wagner
Dollo
Camin-Sokal
Perfect

Maximum likelihood
Given a set of biological data and a probabilistic model
of evolution, find the tree that has the highest probability of generating the data
– Multiple sequence alignment
– Nucleotide substitution model

Tree search methods
Exhaustive search (exact)
Branch and bound (exact)
Heuristic (approximate)

Exhaustive
Evaluate lengths of every possible tree
Number of taxa    Number of trees
3                 1
5                 15
10                2,027,025
20                ~10^20
50                ~10^74

Branch and Bound
Hendy and Penny 1982
Much faster, still guaranteed to find the best tree
Determine an upper bound on the length of the shortest tree
Follow a predictable search path through the possible tree topologies
Abandon any fork of the search tree where the upper bound is exceeded before the last taxon is added

Heuristic
Search trees by swapping
Very fast
Find a starting tree
– Stepwise addition
– Star decomposition
Rearrange the tree to find better trees
– Nearest neighbor interchange (NNI)
– Subtree pruning and regrafting (SPR)
– Tree bisection and reconnection (TBR)

Heuristic
Bootstrap
– Randomly resample the data with replacement
– Rebuild tree
– What fraction of the bootstrap samples show support for a particular node?
Felsenstein; Efron et al. 1996
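
The three bootstrap steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are aligned sequences of equal length, distances are simple proportions of differing sites (p-distance), and the "rebuild tree" step uses UPGMA clustering as a stand-in for whichever tree-building method is actually in use. The function names (`p_distance`, `upgma_clades`, `bootstrap_support`) and the toy alignment are hypothetical and do not come from the slides or from any of the software packages mentioned.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clades(seqs):
    """Cluster taxa by UPGMA; return every clade (set of taxon names) formed by a merge."""
    clusters = [frozenset([name]) for name in seqs]
    # Distances between the initial single-taxon clusters.
    dist = {
        frozenset([x, y]): p_distance(seqs[next(iter(x))], seqs[next(iter(y))])
        for x, y in combinations(clusters, 2)
    }
    clades = set()
    while len(clusters) > 1:
        # Merge the two closest clusters.
        x, y = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        merged = x | y
        clusters = [c for c in clusters if c != x and c != y]
        for c in clusters:
            # Size-weighted mean = arithmetic average over all member pairs.
            dist[frozenset([merged, c])] = (
                len(x) * dist[frozenset([x, c])] + len(y) * dist[frozenset([y, c])]
            ) / len(merged)
        clusters.append(merged)
        clades.add(merged)
    return clades


def bootstrap_support(seqs, clade, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `clade` appears in the rebuilt tree."""
    rng = random.Random(seed)
    n_sites = len(next(iter(seqs.values())))
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        # Rebuild the tree and check whether the clade is still recovered.
        if target in upgma_clades(resampled):
            hits += 1
    return hits / replicates


# Toy alignment (made up for illustration only).
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTAT",
    "taxonC": "ACGAACCTTT",
    "taxonD": "TCGAACCTTG",
}
print(bootstrap_support(alignment, {"taxonA", "taxonB"}))
```

The value returned is the quantity the last bullet asks about: the fraction of resampled data sets whose tree contains the node in question. Swapping `upgma_clades` for a different tree builder would leave the resampling logic unchanged.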
Jackknife
– Randomly subset data
– Rebuild tree
– What fraction of jackknife samples show support for a particular node?
– Does excluding certain characters have a major effect on the tree?

Populations of North American Bears

Software: PHYLIP
Joe Felsenstein (1980)
Over 29,000 registered users
Parsimony, distance matrix, ML
DNA, RNA, protein, restriction sites, discrete characters, continuous characters, allele frequencies, distance matrices
Freely available
http://evolution.gs.washington.edu/phylip.html
Webservers available as well

Software: MEGA
Molecular Evolutionary Genetics Analysis
Sudhir Kumar
Parsimony, distance matrix, ML
– NJ
– UPGMA
– Minimum evolution
Molecular data (nucleic acid, protein sequences)
Bootstrapping, consensus trees
Data editing
Sequence alignment (with ClustalW)
http://www.megasoftware.net

Software: EMBOSS
European Molecular Biology Open Software Suite
Peter Rice, Alan Bleasby, Jon Ison
General sequence analysis
Phylogeny (PHYLIP)
Alignment (CLUSTAL)
All sequence formats
Many alignment formats
http://emboss.sourceforge.net/what/