Lecture 3: Phenetics 1 PDF
Document Details
UKZN
2022
David Maddison
Summary
This document is a lecture presentation on phenetics, covering phylogenetic reconstruction methods, the types of analyses used in systematics, and phenetic methodology. It sets out the founding principles of phenetics and shows how a data matrix is turned into a similarity matrix and then into a phenogram by clustering. The presentation also introduces different clustering methods and Euclidean Distance.
Full Transcript
Lecture 3: Phenetics 1

Not the real thing????
Used with the kind permission of Dr. David Maddison, author and copyright holder - http://david.bembidion.org/tshirts.html

Phylogenetic Reconstruction Methods
  Distance-matrix Methods
  Discrete Data Methods
  Model-based Methods
  Maximum Parsimony OR Cladistics
  Maximum Likelihood
  Bayesian Inference

Types of Analyses Used in Systematics
Three approaches to analysis can be taken:
1. Distance or Algorithmic Methods/Phenetics
   common methods of analysis include Nearest Neighbour and UPGMA (these produce only one ‘correct’ tree)
2. Parsimony Method/Cladistics
   involves Maximum Parsimony (these can produce numerous possibly ‘correct’ or shortest trees)
3. Model-based Methods
   common methods of analysis include Maximum Likelihood and Bayesian Analysis

Unless stated otherwise, all photographs and illustrations are from Wikipedia Creative Commons, GNU licence or in Public Domain

Phenetics: distance-matrix method
also called Numerical Taxonomy or Taxometrics
origin from the word phenotype
phenetics has origins in Biological Statistics/Biometry
based on grouping of taxa using overall similarity
utilizes clustering methods/analyses
starting unit is the OTU (= Operational Taxonomic Unit)
founding principles laid down by Sokal & Sneath (1963 & 1973)
Robert Sokal (1926-2012) - http://www.pandasthumb.org/archives/2012/04/robert-r-sokal.html
Peter Sneath (1923-2011) - http://www2.le.ac.uk/ebulletin/people/bereavements/2010-2019/2011/09/nparticle.2011-09-15.4186255082

Phenetics Founding Principles
1. more characters = more data = more reliable results
2. every character treated as equally important = equal weight
3. overall similarity between any two OTUs is calculated by comparing them character by character
4. distinct groups (= phenons) are recognized because of correlated character sets

Founding principles cont.
5. phylogenetic conclusions can be made from the phenogram (??)
6. it turns taxonomy into an objective science
7.
classifications are based on phenetic similarity

Phenetics Methodology
Type of Data:
minimum of 60 characters required to get statistically accurate results & avoid unintentional weighting
characters must be all or nothing i.e. present or absent (binary) e.g.?
characters treated as +/- OR 0/1
multistate characters (e.g. leaf type) can be converted into binary (e.g. leaf longer than broad [= 0] versus leaf broader than long [= 1])

[Figure: Reniform leaf. Sources: http://www.hgtv.com/landscaping/a-look-at-leaves/index.html and http://cite.nwmissouri.edu/nworc/files/Agriculture/LeafID_Shape/Reniform2.jpg]

Data analysis
Presence & absence of characters is used to prepare a data matrix

         Leaves     Stems       Flowers     Fruit      Seeds
         Present    Succulent   Solitary    a berry    with coma
OTU 1    Present    Present     Absent      Present    Present
OTU 2    Present    Present     Absent      Present    Present
OTU 3    Present    Absent      Absent      Present    Absent
OTU 4    Absent     Absent      Present     Present    Present
OTU 5    Absent     Absent      Absent      Absent     Present

         Leaves     Stems       Flowers     Fruit      Seeds
         Present    Succulent   Solitary    a berry    with coma
OTU 1    1          1           0           1          1
OTU 2    1          1           0           1          1
OTU 3    1          0           0           1          0
OTU 4    0          0           1           1          1
OTU 5    0          0           0           0          1

Similarity co-efficient calculations

Character   1  2  3  4  5  6  7  8  9  10
OTU α       1  0  0  0  1  1  0  0  1  0
OTU β       0  0  0  0  1  0  0  1  1  0

                 OTU α
                 1        0
OTU β    1       a        b        a+b
         0       c        d        c+d
                 a+c      b+d      p

Adapted from Dunn & Everitt (1982)

For the two OTUs above:

                 OTU α
                 1         0
OTU β    1       2 (=a)    1 (=b)    3
         0       2 (=c)    5 (=d)    7
                 4         6         10

Jaccard co-efficient = a / (a + b + c) = 2 / (2 + 1 + 2) = 2/5 = 0.4

Data matrix to similarity matrix
Data matrix: the binary OTU 1 to OTU 5 matrix shown under Data analysis
Not using the J co-efficient in this exercise !!!
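The Jaccard calculation for OTU α and OTU β can be reproduced with a short sketch. This is only an illustration: the function names `contingency_counts` and `jaccard` are made up for this example, and the cell labels follow the 2 x 2 table in the lecture (rows = OTU β, columns = OTU α). It assumes that at least one character is present in at least one of the two OTUs, otherwise the denominator would be zero.

```python
def contingency_counts(alpha, beta):
    """Count the four cells of the 2 x 2 table for two binary OTUs.

    Layout as in the lecture table (rows = beta, columns = alpha):
    a = both 1, b = beta 1 / alpha 0, c = beta 0 / alpha 1, d = both 0.
    """
    pairs = list(zip(alpha, beta))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 0 and y == 1)
    c = sum(1 for x, y in pairs if x == 1 and y == 0)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    return a, b, c, d


def jaccard(alpha, beta):
    """Jaccard co-efficient a / (a + b + c); shared absences (d) are ignored."""
    a, b, c, _ = contingency_counts(alpha, beta)
    return a / (a + b + c)


# Character states for the ten characters, copied from the lecture.
otu_alpha = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
otu_beta = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]

print(contingency_counts(otu_alpha, otu_beta))
print(jaccard(otu_alpha, otu_beta))
```

The counts come out as a = 2, b = 1, c = 2, d = 5, giving 2/5 = 0.4 as in the worked example. Note that d does not enter the formula, so characters absent from both OTUs do not add to their similarity.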
each OTU is compared with every other OTU character by character
a similarity (or dissimilarity) co-efficient calculated for each pair of OTUs
this is known as a similarity/dissimilarity matrix or cluster matrix

         OTU 1   OTU 2   OTU 3   OTU 4   OTU 5
OTU 1    1       0.7     0.6     0.5     0.1
OTU 2            1       0.5     0.3     0.4
OTU 3                    1       0.8     0.2
OTU 4                            1       0.9
OTU 5                                    1

the similarity coefficient is the % of homologous characters between the OTUs
e.g. OTU1 when compared with itself has 100% homologous characters i.e. 100% similarity (= 1)
however, when compared with OTU2 it shares only 70% of the same characters i.e. 70% similar (= 0.7)

Clustering
tables (data or similarity) good to express and understand similarity ????
but can use clustering methods to make it more effective (move from tables to tree-like figures)

Nearest Neighbour (Neighbour Joining) (= Single linkage clustering)
a simple algorithm
takes the two most similar OTUs (based on their similarity co-efficient), and links these together
these two OTUs are connected by this single linkage

[Phenogram, similarity scale 1.0 to 0.0: OTU 5 and OTU 4 linked]

Next…..???
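The first linking step can be sketched in a few lines: scan the similarity matrix and pick the pair of OTUs with the highest co-efficient. This is a minimal illustration only; the dictionary `similarity` and the function `most_similar_pair` are names invented for this example, and the values are the upper triangle of the OTU 1 to OTU 5 matrix from the lecture.

```python
# Upper triangle of the lecture's OTU 1-5 similarity matrix,
# keyed by (OTU, OTU) pairs; the diagonal (self-similarity = 1) is left out.
similarity = {
    (1, 2): 0.7, (1, 3): 0.6, (1, 4): 0.5, (1, 5): 0.1,
    (2, 3): 0.5, (2, 4): 0.3, (2, 5): 0.4,
    (3, 4): 0.8, (3, 5): 0.2,
    (4, 5): 0.9,
}


def most_similar_pair(sim):
    """Return the pair of OTUs with the highest similarity co-efficient."""
    return max(sim, key=sim.get)


pair = most_similar_pair(similarity)
print("first link: OTU", pair[0], "and OTU", pair[1], "at", similarity[pair])
```

With these values the pair returned is OTU 4 and OTU 5, which is the first link drawn in the phenogram.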
most similar OTU is then linked to these two first OTUs

[Phenogram, similarity scale 1.0 to 0.0: OTU 3 linked to OTUs 5 and 4]

…..and so on until all the OTUs have been linked in a downward manner - highest similarity to lowest similarity
this process is said to be heuristic = learning by discovering things for yourself and from your own experience, rather than by being told or directed

[Phenogram, similarity scale 1.0 to 0.0: OTUs in the order 5, 4, 3, 1, 2, all linked]

[Phenogram redrawn with similarity scale 1.0 to 0.4: OTUs in the order 5, 4, 3, 1, 2]
once all OTUs are linked, do not consider other similarity values i.e. single linkage of highest s values

Other Clustering Methods
Unweighted average linkage clustering or Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
also known as Average linkage between groups - e.g.
in SPSS
initially links OTUs with highest similarity
makes this cluster an “OTU”

Unweighted average linkage clustering

OTU    A     B     C     D     E
A      1     0.7   0.4   0.8   0.4
B            1     0.5   0.5   0.3
C                  1     0.4   0.6
D                        1     0.4
E                              1

[Phenogram, similarity scale 1.0 to 0.0: OTUs A and D linked]

in unweighted average linkage clustering the similarity matrix is then recalculated treating the new cluster as a new OTU
the algorithm then finds the next most similar cluster
joins the pair to form a bigger, higher level cluster (= new OTU)

OTU    A+D   B     C     E
A+D    1     0.6   0.4   0.4
B            1     0.5   0.3
C                  1     0.6
E                        1

[Phenogram, similarity scale 1.0 to 0.0: OTUs in the order A, D, B, C, E]

this pairing and recalculation continues until all clusters are joined
the distance between any two clusters is taken to be the average of the distances between all pairs of OTUs, one from each cluster

OTU      A+D+B   C+E
A+D+B    1       0.4
C+E              1

[Phenogram, similarity scale 1.0 to 0.0: OTUs in the order A, D, B, C, E, all linked]

these methods result in distance trees because they measure the distance of OTUs from one another based on their overall similarity
the most widely used measure of such ‘distance’ is Euclidean Distance (also called taxonomic distance)
Euclidean distance is easy to measure if one has two points A & B
it is simply the distance between these
in phenetics A & B would be two characters
as the number of points (= characters) increases the calculation of distance becomes increasingly more complicated

Conclusion (Part 1)
Principles Methodology Analysis Similarity Clustering Analysis Presentation
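The unweighted average linkage (UPGMA) steps worked through in the lecture can be checked with a small sketch. It is an illustration under stated assumptions, not the SPSS procedure: the names `SIM`, `otu_similarity`, `cluster_similarity` and `upgma` are invented here, the values are the lecture's A to E similarity matrix, and ties between equally similar pairs are simply resolved in favour of the first pair found.

```python
from itertools import combinations

# Upper triangle of the lecture's A-E similarity matrix.
SIM = {
    ("A", "B"): 0.7, ("A", "C"): 0.4, ("A", "D"): 0.8, ("A", "E"): 0.4,
    ("B", "C"): 0.5, ("B", "D"): 0.5, ("B", "E"): 0.3,
    ("C", "D"): 0.4, ("C", "E"): 0.6,
    ("D", "E"): 0.4,
}


def otu_similarity(x, y):
    """Look up the similarity of two single OTUs, in either order."""
    return SIM[(x, y)] if (x, y) in SIM else SIM[(y, x)]


def cluster_similarity(c1, c2):
    """Average similarity over all OTU pairs, one OTU from each cluster."""
    values = [otu_similarity(x, y) for x in c1 for y in c2]
    return sum(values) / len(values)


def upgma(otus):
    """Return the joins in order as (cluster1, cluster2, similarity)."""
    clusters = [(o,) for o in otus]  # every OTU starts as its own cluster
    joins = []
    while len(clusters) > 1:
        # Most similar pair of clusters; ties go to the first pair found.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: cluster_similarity(*pair))
        joins.append((c1, c2, round(cluster_similarity(c1, c2), 2)))
        # Replace the two clusters by their union (the new "OTU").
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return joins


for c1, c2, s in upgma("ABCDE"):
    print("+".join(c1), "joins", "+".join(c2), "at similarity", s)
```

Run on the lecture's matrix, the join similarities come out as 0.8, 0.6, 0.6 and 0.4: A and D join first, B joins A+D and C joins E at 0.6, and the two resulting clusters meet at 0.4, in line with the recalculated matrices in the lecture. Averaging over the original OTU pairs, rather than over previously averaged values, is what makes the method "unweighted".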