Hierarchical Clustering PDF

Summary

This presentation explains hierarchical clustering: how it orders the rows and columns of a heatmap, how to read the accompanying dendrogram, and how distance metrics such as the Euclidean and Manhattan distances define similarity. It works through a four-gene example and discusses different ways to compare clusters (centroid, single-linkage, and complete-linkage).

Full Transcript

Hierarchical Clustering

What is a heatmap?

The columns represent different samples. The rows represent measurements from different genes. Red usually stands for high expression of a gene. Blue/purple usually stands for low expression of a gene.

Hierarchical clustering orders the rows and/or the columns based on similarity. This makes it easy to see correlations in the data. In the clustered heatmap, the annotations mark samples that express the same genes and genes that behave the same.

Heatmap with hierarchical clustering

[Figure: the heatmap without hierarchical clustering, next to the heatmap with hierarchical clustering.]

Heatmaps often come with dendrograms.

A simple example

Conceptually:

1) Figure out which gene is most similar to gene 1. Gene 1 and gene 2 are different. Gene 1 and gene 3 are similar. Gene 1 and gene 4 are also similar. Gene 1 is most similar to gene 3.
2) Figure out which gene is most similar to gene 2 (then gene 3, gene 4). Gene 2 is most similar to gene 4.
3) Of the different combinations, figure out which two genes are the most similar. Merge them into a cluster. Gene 1 and gene 3 are more similar than any other combination.
4) Go back to step 1, but now treat the cluster like it is a single gene.

Conceptually, with cluster 1 in place of the merged genes:

1) Figure out which gene is most similar to cluster 1. Cluster 1 is most similar to gene 4.
2) Figure out which gene is most similar to gene 2 (then gene 4). Gene 2 is most similar to gene 4. But notice that we also compare gene 2 to cluster 1.
3) Of the different combinations, figure out which two genes are the most similar. Merge them into a cluster.
Gene 2 and gene 4 are the most similar combination.
4) Go back to step 1.

Hierarchical clustering is usually accompanied by a “dendrogram”. It indicates both the similarity and the order in which the clusters were formed.

- Cluster 1 was formed first and is most similar. It has the shortest branch.
- Cluster 2 was second and is the second most similar. It has the second shortest branch.
- Cluster 3, which contains all of the genes, was formed last. It has the longest branch.

A simple example in DETAIL

Conceptually: 1) Figure out which gene is most like gene 1. We have to define what “most similar” means.
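The conceptual steps of the simple example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenter's implementation: the values for gene 3 and gene 4 are hypothetical (chosen so the merges happen in the order the slides describe), a cluster is "treated like a single gene" by averaging its members, and similarity is measured with `math.dist`, the straight-line distance between two points.

```python
import math
from itertools import combinations


def centroid(points):
    """Average a cluster's members so it can be treated like a single gene."""
    return tuple(sum(values) / len(values) for values in zip(*points))


def agglomerate(genes, distance=math.dist):
    """Repeatedly merge the two most similar clusters; return the merge order."""
    # Start with every gene in its own one-member cluster.
    clusters = {name: [point] for name, point in genes.items()}
    merges = []
    while len(clusters) > 1:
        # Steps 1-3: compare every combination and keep the most similar pair.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: distance(centroid(clusters[pair[0]]),
                                      centroid(clusters[pair[1]])),
        )
        merges.append((a, b))
        # Step 4: the merged cluster now takes part like a single gene.
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Each gene is (value in sample 1, value in sample 2).
genes = {
    "gene1": (1.6, 0.5),    # values from the transcript
    "gene2": (-0.5, -1.9),  # values from the transcript
    "gene3": (1.5, 0.3),    # hypothetical
    "gene4": (0.2, -1.0),   # hypothetical
}
merge_order = agglomerate(genes)
print(merge_order)
```

With these made-up values, gene 1 and gene 3 merge first, gene 2 and gene 4 merge second, and the two clusters merge last, which is the same order the dendrogram encodes in its branch lengths.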
Euclidean distance between gene 1 and gene 2:

$\sqrt{(\text{difference in sample 1})^2 + (\text{difference in sample 2})^2}$

$= \sqrt{(1.6 - (-0.5))^2 + (0.5 - (-1.9))^2}$

$= \sqrt{2.1^2 + 2.4^2}$

The distance between gene 1 and gene 2 in sample 1 is 2.1. The distance between gene 1 and gene 2 in sample 2 is 2.4. These are the two legs of a right triangle, and the hypotenuse is the total “distance” between genes 1 and 2.

The Pythagorean theorem says that the hypotenuse $= \sqrt{x^2 + y^2} = \sqrt{2.1^2 + 2.4^2} \approx 3.2$

When we have more samples, we just extend the equation:

$\sqrt{(\text{difference in sample 1})^2 + (\text{difference in sample 2})^2 + (\text{difference in sample 3})^2 + \dots}$

Distance metrics

Euclidean distance is just one method. There are lots more, including:

- The Manhattan distance

The Manhattan distance is just the sum of the absolute values of the differences:

$|\text{difference in sample 1}| + |\text{difference in sample 2}| + |\text{difference in sample …}|$
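As a quick check of the arithmetic, both distances can be computed for gene 1 and gene 2 using the values from the slides. This is a minimal sketch; the function names `euclidean` and `manhattan` are chosen here for illustration.

```python
import math

# Each gene is (value in sample 1, value in sample 2), taken from the transcript.
gene1 = (1.6, 0.5)
gene2 = (-0.5, -1.9)


def euclidean(a, b):
    """Square root of the summed squared differences, one term per sample."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute differences, one term per sample."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(round(euclidean(gene1, gene2), 1))  # 3.2
print(round(manhattan(gene1, gene2), 1))  # 4.5
```

Both functions accept any number of samples, so extending the equation to more samples only means passing longer tuples.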
Total length = Manhattan distance.

Differences using Euclidean and Manhattan

[Figure: one heatmap using “Euclidean” distance, one using “Manhattan” distance.]

But the choice is arbitrary. There’s no biological or physical reason to choose one and not the other. Pick the one that gives you more insight into your data.

Comparing clusters

Do you remember how we merged gene 1 and 3 into cluster 1 and compared it to the other genes? There are different ways to compare clusters, too.

Different Ways To Compare Clusters

Imagine our data was spread out on an X-Y plane.
We can compare that point to:

1) The average of each cluster (called the “centroid”)
2) The closest point in each cluster (called “single-linkage”)
3) The furthest point in each cluster (called “complete-linkage”)
4) Etc.

[Figure: three heatmaps. One compares the furthest points in the clusters, one compares the average points in the clusters, and one compares the closest points in the clusters.]
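The three comparisons listed above can be sketched as small functions. The point and the two clusters below are hypothetical values picked so that the methods disagree, and `math.dist` (Euclidean distance) is assumed as the underlying metric.

```python
import math


def centroid_distance(point, cluster):
    """Distance from the point to the average of the cluster."""
    center = tuple(sum(values) / len(values) for values in zip(*cluster))
    return math.dist(point, center)


def single_linkage(point, cluster):
    """Distance from the point to the closest member of the cluster."""
    return min(math.dist(point, member) for member in cluster)


def complete_linkage(point, cluster):
    """Distance from the point to the furthest member of the cluster."""
    return max(math.dist(point, member) for member in cluster)


# Hypothetical data on an X-Y plane.
point = (0.0, 0.0)
clusters = {
    "A": [(1.0, 0.0), (5.0, 0.0)],  # one near member, one far member
    "B": [(2.0, 0.0), (3.0, 0.0)],  # compact cluster
}

nearest_by_method = {}
for method in (centroid_distance, single_linkage, complete_linkage):
    nearest = min(clusters, key=lambda name: method(point, clusters[name]))
    nearest_by_method[method.__name__] = nearest
    print(method.__name__, "->", nearest)
```

Here single-linkage picks cluster A while the centroid and complete-linkage comparisons pick cluster B, which is why heatmaps built with different comparison methods can end up ordered differently.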
