Comparative Genomics 1 and 2 PDF
Uploaded by SeasonedLagrange3473
Summary
This document provides a comprehensive overview of comparative genomics, exploring various aspects such as comparisons across different distances, analyses of synteny, collinearity, and phenotypic convergence. It also covers comparative genomics of prokaryotes, the analysis of Neandertal and modern human genomes, and studies of grass genomes.
Comparative Genomics 1 and 2
============================

Table of Contents {#table-of-contents.TOCHeading}
=================

- [Comparative Genomics 1 and 2](#comparative-genomics-1-and-2)
- [What does comparative genomics entail?](#what-does-comparative-genomics-entail)
- [Comparisons across different distances:](#comparisons-across-different-distances)
- [1. Long phylogenetic distances](#long-phylogenetic-distances)
- [2. Moderate distances](#moderate-distances)
- [3. Short distances](#short-distances)
- [What is being compared?](#what-is-being-compared)
- [Synteny](#synteny)
- [Conserved Synteny](#conserved-synteny)
- [Collinearity](#collinearity)
- [Significance of synteny, conserved synteny and collinearity](#significance-of-synteny-conserved-synteny-and-collinearity)
- [Factors affecting synteny, conserved synteny and collinearity:](#factors-affecting-synteny-conserved-synteny-and-collinearity)
- [Phenotypic convergence](#phenotypic-convergence)
- [How does it occur?](#how-does-it-occur)
- [Evolutionary significance of Convergent Recruitment](#evolutionary-significance-of-convergent-recruitment)
- [Convergent recruitment -- PEPC genes in C4 Monocot lineages](#convergent-recruitment-pepc-genes-in-c4-monocot-lineages)
- [What was found:](#what-was-found)
- [Phenotypic convergence:](#phenotypic-convergence-1)
- [Comparative genomics of Prokaryotes:](#comparative-genomics-of-prokaryotes)
- [In the context of Influence of HGT and IGD on protein family expansion](#in-the-context-of-influence-of-hgt-and-igd-on-protein-family-expansion)
- [The study:](#the-study)
- [Neandertal and Modern Human genomes](#neandertal-and-modern-human-genomes)
- [Study comparison of SNPs between Neandertals and present day humans:](#study-comparison-of-snps-between-neandertals-and-present-day-humans)
- [Grass Genomes](#grass-genomes)
- [1. Brachypodium distachyon (Brachy)](#brachypodium-distachyon-brachy)
- [The Brachy Genome compared to other Grass genomes -- Colinearity](#the-brachy-genome-compared-to-other-grass-genomes-colinearity)
- [Model of Grass chromosome evolution](#model-of-grass-chromosome-evolution)

What does comparative genomics entail?
--------------------------------------

- It is the study of the similarities and differences between genomes.
- In the genomes of contemporary related organisms, we see the conservation of:
    - Sequences coding for proteins and functional RNAs inherited from a last common ancestor.
    - Sequences controlling the regulation of genes that have similar patterns of expression.
- Divergence between species is seen in the sequences coding for proteins and functional RNAs, and in the regulatory regions, that are responsible for the differences between them.
- Conservation of sequence implies conservation of function.
- Orthologues and paralogues must be recognised in order to make meaningful comparisons.

Comparisons across different distances:
---------------------------------------

### Long phylogenetic distances

- 1 billion years since separation (A-D in the photo).
- This comparison gives information on the types and numbers of genes in functional categories, e.g. genes involved in coding for proteins and in metabolism.
- It shows little conservation of gene order and regulatory sequences.

### Moderate distances

- 70-10 million years since separation (A-C in the photo).
- Both functional and non-functional DNA are found in conserved regions.
- Functional sequences will have changed less than non-functional DNA.
- This reflects purifying (negative) selection: the removal of deleterious mutations.

### Short distances

- 5 million years since separation (A-B in the photo).
- This comparison gives information about which sequences are responsible for making organisms unique.
- Differences are due to positive (Darwinian) selection, i.e. the retention of mutations that benefit an organism.

What is being compared?
-----------------------

1. Repeat regions (transposable elements, microsatellites)
2. Markers (SNPs)
3. Non-protein-coding regions (RNA-only genes, gene deserts)
4. Duplications (whole-genome, segmental, gene, gene families)
5. Base composition, e.g. %GC content (overall, coding regions and non-coding regions)
6. Gene number in functional categories
7. Favourite genes
8. Chromosome rearrangements
9. Gene order

Synteny
-------

- Genes or genetic elements located on the same chromosome.
- They may or may not be linked (i.e. they need not be inherited together as a linked cluster of genes).

### Conserved Synteny {#conserved-synteny.ListParagraph}

- Conservation of synteny of orthologous genes between two or more different organisms.
- Its extent is inversely related to the time since divergence from the ancestral locus.
- The more time that has passed since two species diverged, the less likely it is that gene order will be conserved.

![](media/image2.png)

Collinearity
------------

- Conservation of gene (or marker) order along a chromosomal segment in different species.
- Today the term is often used interchangeably with synteny.

Significance of synteny, conserved synteny and collinearity
-----------------------------------------------------------

- The genomes of some grasses show high conservation of synteny and collinearity.
- Knowing the genome sequence of a species with a small genome helps with mapping and isolating genes coding for desirable traits in related species with larger genomes.
- In medicine, loci with medical or phenotypic consequences can be recognised through their linkage to a cluster of syntenic loci.

### Factors affecting synteny, conserved synteny and collinearity: {#factors-affecting-synteny-conserved-synteny-and-collinearity.ListParagraph}

- Gene loss, multiple rounds of gene duplication and chromosomal rearrangements (fusions, splits, inversions, reciprocal translocations).
- These events mask sequences that have been derived from a common ancestral sequence.

Phenotypic convergence
----------------------

- The independent evolution of similar or identical traits in distantly related species due to similar selective pressures, for example:
    1. Eyes (although humans and octopuses are distantly related, the anatomy of the human and octopus eye is fairly similar)
    2. Echolocation in dolphins and bats
    3. Protein properties
    4. Biochemical pathways
- Through phenotypic convergence we can identify the determinants leading to the independent origins of adaptive traits.
- Comparing these determinants (evolutionary enablers) gives information on the role of the genomic (and environmental) background in limiting or furthering an adaptive innovation.

### How does it occur? {#how-does-it-occur.ListParagraph}

- Alteration of different loci: changes in different enzymes can lead to similar phenotypes.
- Alteration of homologous genes in different taxonomic groups (convergent recruitment).
- A new function then usually evolves by modification of pre-existing genes, but two criteria need to be met:
    1. Loss of the ancestral function must have no deleterious effect.
    2. The expression profiles of the genes and the kinetics of the proteins they encode must be suitable for the new function.

Evolutionary significance of Convergent Recruitment
---------------------------------------------------

- The presence of genes able to evolve a new function enhances the chances that a given group of organisms can evolve a new trait, but only a few genes (evolutionary enablers) have the potential to produce a specific phenotypic change.
- The absence of these genes in other groups of organisms can hinder the acquisition of a new trait.

Convergent recruitment -- PEPC genes in C4 Monocot lineages
-----------------------------------------------------------

- Evolution of phosphoenolpyruvate carboxylase (PEPC) in groups of grasses and sedges.
- PEPC enzymes catalyse the first reaction of C4 photosynthesis.
- In the photo, the red and blue triangles mark the PEPC gene lineages important in C4 grasses and sedges.
- This tells us that there are other PEPC orthologues and homologues that did not give rise to the C4 pathway.

### What was found: {#what-was-found.ListParagraph}

Grasses: the same lineage of PEPC gene (out of 6 lineages) was independently recruited more than 8 times during the evolution of C4 grasses.

Sedges: the same lineage of PEPC gene (distinct from the one used by grasses) was recruited more than 5 times during the evolution of C4 sedges.

- This suggests that something in those genes predisposed grasses and sedges to evolve C4 photosynthesis (evolutionary enablers).
- The repeated use of the same lineage within each group indicates that, although the C4 origins followed separate evolutionary paths, they faced similar selective pressures that favoured the development of a C4 pathway.

Phenotypic convergence:
-----------------------

- When phenotypic convergence results from convergent recruitment, identical substitutions may be seen in the recruited genes in different lineages of organisms.
- E.g. the C4 PEPC genes of grasses and sedges in the photo.
- This is likely because the effects of the resulting amino acid substitutions will be comparable.
- However, only a limited set of substitutions allows the emergence of a functional or optimised protein (e.g. a substitution that destroys the active site of the enzyme renders it non-functional).
- Note: convergent substitutions do not explain all convergent phenotypes; non-convergent (divergent) substitutions can also play a role in these adaptive changes.

Comparative genomics of Prokaryotes:
------------------------------------

- Smaller genomes are found in organisms living in restricted environments, such as Nanoarchaeum equitans, which lives attached to another archaeon.
- Larger genomes are found in organisms living in complex habitats, e.g. Bradyrhizobium japonicum, a soil bacterium that forms a symbiotic relationship with plants (nitrogen-fixing root nodules).
- The reason is that a constant environment may allow survival with fewer genes (resulting in a smaller genome), whereas in changeable habitats such as soil many genes are required for survival even though they may not be used all the time (resulting in a larger genome).

### In the context of Influence of HGT and IGD on protein family expansion {#in-the-context-of-influence-of-hgt-and-igd-on-protein-family-expansion.ListParagraph}

![](media/image4.png)

- Prokaryotes show rapid adaptation (which is why we see antibiotic resistance), and they very easily take up DNA from their environment.
- One reason for this adaptation is the acquisition of genes by horizontal gene transfer (HGT), which increases genome size.
- Another reason is intrachromosomal gene duplication (IGD), which also increases genome size.

### The study: {#the-study.ListParagraph}

- The aim was to determine how much of the increase in genome size is due to IGD and how much to HGT.
- It involved analysing 110 genomes representing eight distinct clades of prokaryotes, with different genome sizes (small, average, large).
- They searched for paralogues (homologues acquired through IGD), which are identical in sequence to the endogenous gene and tandemly arranged (next to each other).
- Xenologues (homologues acquired through HGT) are randomly arranged and show sequence variability relative to the endogenous gene.
- Contrary to expectation, they found that 88-98% of expansions are due to HGT in both small and large genomes, and that larger genomes contained the most xenologues.
- The codon adaptation index (CAI) is a way of estimating gene expression levels.
- They found that paralogues have higher expression levels than xenologues.
- Xenologues had higher expression levels than singletons (which are involved in basic genetic mechanisms, so there is little need for them to be highly expressed).

Neandertal and Modern Human genomes
-----------------------------------

- The two lineages diverged 270,000-400,000 years ago; the estimate differs depending on who you ask.
- One reason for sequencing the Neandertal genome was to record mutations that have become fixed or risen to high frequency in modern humans over the last several hundred thousand years.
- Another was to identify genes that have been affected by positive selection since Neandertals and modern humans diverged.
- Most of the DNA used in the 2010 study was extracted from the bones of three females from Vindija Cave (Croatia); other samples came from bones found in Spain, Germany and Russia.
- They ran numerous controls and checks to exclude contamination by microbial DNA from the cave and by modern human DNA.
- They used next-generation sequencing technologies.
- They then assembled the sequence using the chimpanzee and modern human (Venter) reference genomes.
- They compared the sequence with:
    - Chimp
    - Venter
    - South African (San)
    - West African (Yoruba)
    - Papua New Guinean (PNG)
    - Chinese (Han)
    - Western European (French)
- One third of the genome has still not been sequenced, because the DNA is not of high enough quantity and/or quality.
- The genomes are 99.84% identical:
    - 78 nucleotide substitutions in protein-coding genes in 300,000 years.
    - Modern humans carry the derived state of these genes, whereas Neandertals carry the ancestral (chimp-like) state.
    - The genes include some coding for proteins involved in skin physiology; it is not clear what effect these changes have at the phenotypic level.
- 20 genomic regions showing positive selection in modern human genomes were also identified: 5 regions contain no protein-coding genes and 15 contain 1-12 genes, including genes involved in metabolism, cognitive function and skeletal morphology.

### Study comparison of SNPs between Neandertals and present day humans: {#study-comparison-of-snps-between-neandertals-and-present-day-humans.ListParagraph}

- The present-day humans were 2 European Americans, 2 East Asians and 4 West Africans, alongside diverse modern humans and chimpanzees.
- They found that Neandertals share more SNPs with Europeans and East Asians than with sub-Saharan Africans. This suggests that gene flow from Neandertals into modern humans occurred after modern humans left Africa but before they spread across Eurasia.
- 1-4% of modern Eurasian genomes is derived from Neandertal genomes.

Grass Genomes
-------------

Grasses provide the bulk of human nutrition.

- They are a sustainable energy source (biofuels).
- They are feed for animals.
- But consumption is outstripping supply.
- Three subfamilies contain the major food, fodder and fuel grass species.

![](media/image6.png)

- The ancestor of all grasses underwent whole-genome duplication (WGD).
### Brachypodium distachyon (Brachy)

- A representative of the subfamily containing barley and wheat.
- It has a relatively small genome for this subfamily.
- Its chromosome number is half that of the others, making it very easy to work with.
- Most LTR retrotransposons are located in the pericentromeric regions and at conserved syntenic breakpoints, as also seen in other grass genomes.
- DNA transposons are more widely distributed, with the majority associated with gene-rich regions, as also seen in other grass genomes.
- Comparative analyses of the sequenced grass genomes show that retrotransposon content scales with genome size across all grass genomes.
- DNA transposon content is not correlated with genome size across grass genomes.
- 77-84% of the gene families found in rice, sorghum and Brachy are shared, which reflects a relatively recent common origin.
- There are various levels of lineage-specific genes:
    - Genes for which no orthologue can be found in related species are called singletons.
- Six major duplications of chromosomal regions, covering 93% of the genome, originated from the ancient WGD event that occurred before the grass subfamilies diverged.

![](media/image8.png)

The Brachy Genome compared to other Grass genomes -- Colinearity
----------------------------------------------------------------

- Collinearity is particularly noticeable in the euchromatic (gene-rich) regions of grass genomes.
- The long arm of Brachy chromosome 5 (Bd5) shows a high degree of collinearity with the corresponding chromosomes in sorghum (Sb6) and rice (Os4).
- The short arm shows less collinearity. It has roughly half the gene density of the rest of the chromosome and a high retrotransposon content. Unlike other parts of the genome, it accumulates retrotransposons through their replication and a lack of recombination.
- The presence of this retrotransposon-rich region suggests that these repeats have been maintained for 50-70 million years.
- The hypothesis is that the ancestral chromosome reached a tipping point when its high retrotransposon content began to harm its genes. This could have shaped the evolutionary path of this chromosome by reducing gene density and potentially limiting recombination.

Model of Grass chromosome evolution
-----------------------------------

Present-day grass chromosomes have evolved from those of the common ancestor through:

- WGD
- Ancestral chromosome translocations
- Ancestral chromosome fusions
- Lineage-specific nested insertions
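The collinearity comparisons described above (for example, the long arm of Bd5 against Sb6 and Os4) come down to comparing gene order. Rearrangements such as translocations, fusions and insertions are what break that order. The following is a minimal Python sketch that rests on three assumptions:

- Each chromosomal segment has already been reduced to an ordered list of gene IDs.
- Orthologues share the same ID.
- Each ID occurs at most once per segment.

The gene IDs are hypothetical placeholders, not real annotations. Under these assumptions, the largest set of orthologues kept in the same relative order is a longest common subsequence (LCS) of the two gene orders. This is a deliberately simplified stand-in. Dedicated synteny tools use anchor-chaining methods that also handle duplicated and inverted blocks.

```python
from typing import List, Sequence


def collinear_genes(seg_a: Sequence[str], seg_b: Sequence[str]) -> List[str]:
    """Return the largest set of genes found in the same relative order in both
    segments, i.e. a longest common subsequence (LCS) of the two gene orders.

    Assumes orthologues share an ID and each ID occurs at most once per segment.
    Inverted segments are counted as breaks, not detected as reversed blocks.
    """
    n, m = len(seg_a), len(seg_b)
    # dp[i][j] holds the LCS length of seg_a[i:] and seg_b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if seg_a[i] == seg_b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one optimal collinear set.
    genes: List[str] = []
    i = j = 0
    while i < n and j < m:
        if seg_a[i] == seg_b[j]:
            genes.append(seg_a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return genes


def collinearity_fraction(seg_a: Sequence[str], seg_b: Sequence[str]) -> float:
    """Fraction of shared (orthologous) genes that remain in collinear order."""
    shared = set(seg_a) & set(seg_b)
    if not shared:
        return 0.0
    return len(collinear_genes(seg_a, seg_b)) / len(shared)


# Hypothetical gene orders. Relative to species A, species B has lost g3,
# gained a lineage-specific gene x1 and carries a small inversion (g6, g5).
species_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
species_b = ["g1", "g2", "x1", "g4", "g6", "g5", "g7"]
print(collinear_genes(species_a, species_b))  # ['g1', 'g2', 'g4', 'g6', 'g7']
print(round(collinearity_fraction(species_a, species_b), 2))  # 0.83
```

In this toy example only the inversion lowers the score: 5 of the 6 shared genes stay collinear. The lost gene g3 and the inserted gene x1 reduce the number of shared genes instead, so a fuller measure would report them separately.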