Comparative Genomics I & II PDF
The University of Western Australia
Martha Ludwig
Genomics (GENE3370)
Comparative Genomics I & II
Prof Martha Ludwig

Suggested Reading:
- Christin PA et al. (2010) Causes and evolutionary significance of genetic convergence. Trends in Genetics 26: 400-405
- Gibbons A (2010) Close encounters of the pre-historic kind. Science 328: 680-684
- Green RE et al. (2010) A draft sequence of the Neandertal genome. Science 328: 710-722
- Hardison RC (2003) Comparative Genomics. PLoS Biology 1: 156-160
- Paterson et al. (2010) Insights from the comparison of plant genome sequences. Annual Review of Plant Biology 61: 349-372
- Treangen TJ & Rocha EPC (2011) Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genetics 7: e1001284

Learning Outcomes
At the end of this lecture, you should be able to explain:
- divergence and conservation in the context of genome structure, function and evolution
- the information obtained from different levels of phylogenetic comparisons
- negative/purifying selection
- positive/Darwinian selection
- the types of genetic elements examined in comparative genomics and the information that can be gained from each
- synteny, conserved synteny, collinearity, evolutionary/phenotypic convergence, and convergent recruitment, and their significance in the context of genome structure, function and evolution

Comparative Genomics – It’s All About Similarities and Differences
In the genomes of contemporary related organisms, we see conservation of:
- sequences coding for proteins and functional RNAs from a last common ancestor
- sequences controlling the regulation of genes that have similar patterns of expression
Divergence is seen between sequences that code for proteins, functional RNAs and regulatory regions responsible for differences between species
Conservation of sequence implies conservation of function
Recognition of orthologues and paralogues is needed to make meaningful comparisons
A page from Darwin’s notebook with what’s thought to be the
first phylogenetic tree http://www.amnh.org/exhibitions/darwin/idea/treelg.php

Comparative Genomics – It’s All About Similarities and Differences AND the Questions You Ask
Comparisons across long phylogenetic distances, e.g. 1 billion years since separation:
- give information on types and numbers of genes in functional categories
  - Note: e.g. you can see genes involved in coding for proteins and metabolism
- show little conservation of gene order and regulatory sequences
Comparisons across moderate distances, e.g. 70-100 million years since separation, show:
- functional and non-functional DNA is found in conserved regions
- functional sequences will have changed less than non-functional DNA
- purifying (aka negative) selection: removal of deleterious mutations
Hardison (03) PLoS Biology 1: 156-160

Comparisons across short distances, e.g. 5 million years since separation, give:
- information about what sequences are responsible for making organisms unique
- differences due to positive selection (aka Darwinian selection): retention of mutations that benefit an organism
Hardison (03) PLoS Biology 1: 156-160

Comparative Genomics – What is Compared?
- Repeat regions: transposable elements, microsatellites
- Markers: SNPs
- Non-protein coding regions: RNA-only genes, gene deserts
- Duplications: whole genome, segmental, gene, gene families
- Base composition, e.g. %GC content: overall, coding regions and non-coding regions
- Gene number in functional categories
- Favourite genes
- Chromosome rearrangements
- Gene order

Synteny and Conserved Synteny
Note: the figure shows the syntenic relationship between human and mouse.
Synteny
- genes or genetic elements located on the same chromosome
- may or may not be linked
  - Note: they do not need to be inherited as a linked cluster of genes.
Conserved synteny
- conservation of synteny of orthologous genes between two or more different organisms
- extent is inversely proportional to length of time since divergence from the ancestral locus
Lewis et al. (02) Genome Biology 3: research0082.1

Collinearity
Collinearity
- conservation of gene (or marker) order along a chromosomal segment in different species
- Note: in much present-day usage, synteny has the same meaning as collinearity.
- Note: we can even see if the transcription is in the same direction by looking at the arrows.
A-E = genes or markers; X-Z = species; coloured boxes = coding regions
Keller & Feuillet (00) Trends in Plant Science 5: 246-251

Synteny, Conserved Synteny and Collinearity
Significance
- in genomes of some grasses, see high conservation of synteny and collinearity
  - knowing the genome sequence of a species with a small genome facilitates mapping and isolating genes coding for desirable traits from species with larger genomes
  - Note: because of collinearity and synteny, this gives us an idea of where our gene of interest is.
- in medicine, loci with medical or phenotypic consequences can be recognised because of linkage to a cluster of syntenic loci
Factors affecting synteny, conserved synteny and collinearity
- gene loss, multiple rounds of gene duplications, chromosomal rearrangements (fusions, splits, inversions, reciprocal translocations) mask sequences that have been derived from a common ancestral sequence
https://www.genoscope.cns.fr/agc/website/spip.php?article588

Evolutionary Convergence
Note: although we are distantly related to octopus, we have similar eyes.
Phenotypic convergence
- the independent evolution of similar or identical traits in distantly related species due to selective pressures, for example:
  - eyes
  - echolocation in dolphins and bats
  - particular protein properties
  - biochemical pathways
Can identify the determinants leading to the independent origins of adaptive traits
- Note: what is it that allows phenotypic convergence?
- comparing determinants gives information on the genomic (and environmental) background in limiting or furthering an adaptive innovation: “evolutionary enablers”
Ogura et al. (04) Genome Research 14: 1555-1561; http://en.wikibooks.org/wiki/File:Chiroptera_echolocation.svg; http://www.dolphins-world.com/Dolphin_Echolocation.html; Liljas & Laurberg (00) EMBO reports 1: 16-17

Evolutionary Convergence – How Does It Happen?
Note: at a molecular level.
Phenotypic convergence may result from:
- alterations of different loci: shows that changes in different enzymes can lead to similar phenotypes
- alteration of homologous genes from different taxonomic groups: convergent recruitment
New functions usually evolve by the modification of pre-existing genes; two criteria need to be met:
1) no deleterious effect through loss of ancestral function
2) expression profiles of the genes and kinetics of the proteins they encode must be suitable (Note: for the new function)
At least 60 independent origins of the C4 photosynthetic pathway have occurred in the flowering plants – an excellent example of convergent evolution

Evolutionary Significance of Convergent Recruitment
- The presence of genes able to evolve a new function enhances the chances that a given group of organisms can evolve a new trait
  - but only a few genes have the potential to make a specific phenotypic change: “evolutionary enablers”
- The absence of these genes in other groups of organisms can hinder the acquisition of a new trait
http://mcat-review.org/evolution.php

Convergent Recruitment – e.g.
PEPC Genes in C4 Monocot Lineages
Evolution of phosphoenolpyruvate carboxylase (PEPC) gene in groups of grasses (g) and sedges (s)
- PEPC = enzyme catalysing first reaction of C4 photosynthesis
- letters and numbers after g and s = PEPC gene lineages
- red and blue triangles = PEPC gene lineages important in C4 grasses and sedges
  - Note: this tells us that other orthologues and homologues did not give rise to the C4 pathway.
Found:
- grasses – same lineage of PEPC gene (out of 6 lineages) used > 8 times during evolution of C4 grasses (see next slide)
- sedges – same lineage of PEPC gene used (distinct from that used by grasses) > 5 times during evolution of C4 sedges (see next slide)
Predisposition of these gene lineages to evolve novel adaptive function
- Note: there was something in those genes that predisposed the grasses and sedges to evolve C4 (an evolutionary enabler).
Christin et al. (10) Trends in Genetics 26: 400-405

When there is phenotypic convergence due to convergent recruitment:
- may see identical substitutions in the recruited genes in different lineages of organisms
  - e.g. C4 PEPC genes in grasses (b) and sedges (c)
  - Note: protein sequences shown
- likely to occur because
  - effects of the resulting amino acid substitutions will be comparable
  - limitations to substitutions that can occur allow emergence of functional, optimised protein
    - Note: e.g. substitutions that destroy the active site.
Note: convergent substitutions do not explain all the convergent phenotypes seen – non-convergent (= divergent) substitutions also play a role in these adaptive changes
Christin et al. (10) Trends in Genetics 26: 400-405

Comparative Genomics of Prokaryotes
Smaller genomes found in organisms living in restricted environments – e.g. Nanoarchaeum equitans lives inside another archaeon
Larger genomes in organisms living in complex habitats – e.g.
Bradyrhizobium japonicum, soil bacterium that forms symbiotic relationship with plants (nitrogen-fixing root nodules)
Why….?
- Constant environments may allow survival with fewer genes
- Changeable habitats, e.g. soils – many genes would be required for survival even though they all may not be used all the time
Pierce (08) Genetics: A Conceptual Approach

Comparative Genomics of Prokaryotes – e.g. Influence of HGT and IGD on Protein Family Expansion
Prokaryotes show rapid adaptation
- Note: it is why we see antibiotic resistance, because they very easily take DNA from their environment.
- in part due to acquisition of genes by horizontal gene transfer (HGT): increases genome size
- in part due to intrachromosomal gene duplication (IGD): increases genome size
Images: textbookofbacteriology.net; staff.jccc.net

110 genomes representing eight distinct clades of prokaryotes compared
- different genome sizes – small, average, large
Searched for:
- paralogues – homologues acquired through IGD – identical in sequence to endogenous gene and tandemly arranged
- xenologues – homologues acquired through HGT – randomly arranged and showing sequence variability to endogenous gene
Treangen & Rocha (11) PLoS Genetics 7(1): e1001284

Found:
- 88% - 98% of expansions due to HGT
  - both small and large genomes
  - larger genomes contained the most xenologues
  - contrary to previous thinking
- Note: the codon adaptation index is a way that gene expression is estimated/measured.
- paralogues have higher expression levels than xenologues
- xenologues had higher expression levels than singletons
  - Note: basic genetic mechanisms, so there was no need for them to be highly expressed.
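
The paralogue/xenologue distinction above rests on two observable properties: sequence identity to the endogenous gene and whether the copies sit in tandem. The following is a minimal sketch of that idea only, not the method used by Treangen & Rocha. The `Homologue` record, the gene labels, the example values and both cutoffs (`identity_cutoff`, `tandem_window`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Homologue:
    name: str          # hypothetical gene label
    identity: float    # sequence identity to the endogenous gene, 0.0-1.0
    genes_apart: int   # number of genes lying between this copy and the endogenous gene


def classify(h, identity_cutoff=0.99, tandem_window=1):
    """Label a homologue using the two slide criteria (cutoffs are assumptions)."""
    is_near_identical = h.identity >= identity_cutoff
    is_tandem = h.genes_apart <= tandem_window
    # Near-identical AND tandemly arranged -> consistent with duplication (IGD)
    if is_near_identical and is_tandem:
        return "paralogue"
    # Otherwise (dispersed and/or divergent) -> treated here as transfer (HGT)
    return "xenologue"


def hgt_fraction(homologues):
    """Fraction of the expanded copies that are labelled as xenologues."""
    labels = [classify(h) for h in homologues]
    return labels.count("xenologue") / len(labels)


# Made-up protein family with four extra copies
family = [
    Homologue("copy1", 1.00, 0),
    Homologue("copy2", 0.72, 850),
    Homologue("copy3", 0.65, 2300),
    Homologue("copy4", 0.80, 40),
]

print([classify(h) for h in family])
print(hgt_fraction(family))
```

This toy rule labels anything that is not both near-identical and tandem as a xenologue, which is a deliberate simplification: an old duplicate that has since diverged or moved would be mislabelled.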
Treangen & Rocha (11) PLoS Genetics 7(1): e1001284

Neandertal and Modern Human Genomes
Neandertals and the ancestors of modern humans diverged 270,000 to 400,000 years ago
- Note: there are different estimates of when.
- Neandertals = sister group to modern humans
Why sequence the Neandertal genome?
- record changes (mutations) that have become fixed or risen to high frequency in humans in the last several hundred thousand years
- identify genes that have been affected by positive selection since Neandertals and humans diverged
Neandertals and modern humans coexisted in Europe 30,000-45,000 years ago
Gibbons (10) Science 328: 680-684

Majority of DNA used in the 2010 study extracted from bones of three females
- Vindija cave (Croatia)
Other samples from bones of Neandertals in
- Spain
- Germany
- Russia
Green et al. (10) Science 328: 710-722
More recently other ancient human genomes have been sequenced: see Gibbons (2014) Science 343: 1417

Ran numerous controls and checks to exclude contamination
- microbial DNA
- modern human DNA
Note: NGS = next-generation sequencing.
Used NGS technologies
Assembled sequence using reference genomes
- chimpanzee
- modern human genome (Venter)
Compared sequence with
- chimp
- Venter
- South African (San)
- West African (Yoruba)
- Papua New Guinean
- Chinese (Han)
- Western European (French)
“clean caves”
http://en.wikipedia.org/wiki/File:Vindija_cave.jpg
Gibbons (10) Science 328: 680-684

Neandertal and Modern Human Genomes
One-third of genome not sequenced
- DNA not of high enough quantity and/or quality
Genomes are 99.84% identical
78 nucleotide substitutions in protein-coding genes in 300,000 years
- modern humans have derived state of these genes
- Neandertals have ancestral (chimp-like) state
- genes include those coding for proteins involved in skin physiology
- not clear what effect these changes have at the phenotype level
(Figure labels: Neandertal; early human)
Gibbons (10) Science 328: 680-684

Identified 20 genomic regions that showed strong positive selection in modern human genomes
- 5 regions contained no protein-coding genes
  - structural or regulatory genomic features?
- 15 regions contain 1-12 genes involved in
  - metabolism
  - cognitive function
  - skeletal morphology
    - Note: the last gene in the red box is a gene that codes for a transcription factor involved in osteoblasts (bone cell formation), which has a role in skeletal morphology.
(Figure labels: Neandertal; early human)
Gibbons (10) Science 328: 680-684

Compared SNPs between Neandertals and
- present-day humans: 2 European Americans, 2 East Asians, 4 West Africans
- diverse modern humans (French, Yoruba, Han, San, Papuan)
- chimps
Found: Neandertals share more SNPs with Europeans and East Asians than sub-Saharan Africans, suggesting….
- gene flow from Neandertals to modern humans after modern humans left Africa, but before migrating into Eurasia
- 1-4% of modern Eurasian genomes derived from Neandertal genomes

Grass Genomes – Why Study Them Using Comparative Genomics?
Grasses provide
- the bulk of human nutrition
- sustainable energy sources – biofuels
- Note: feed for animals.
BUT consumption is outstripping supply

Grass Genomes
Three subfamilies contain major food, fodder and fuel grass species
- ancestor of all underwent whole genome duplication (WGD; shown)
- lineage-specific WGDs also occurred (not shown)
Whole genome sequence available for at least one species in each subfamily
Brachypodium distachyon (Brachy)
- representative of subfamily containing barley and wheat
- relatively small genome for this subfamily
  - 1/10 the size of barley and wheat
- Note: short plant.
Modified from Paterson et al. (09) Plant Physiology 149: 125-131; http://www.brachypodium.org
numbers = million years ago

Brachy Genome Compared to Other Grass Genomes – Transposable Elements
Brachy
- genome size and gene number – similar to rice and sorghum
  - maize is larger due to lineage-specific WGD
- chromosome number – half that of others
  - Note: much easier to work with in the lab.
Brachy
- most LTR retrotransposons located in pericentromeric regions and conserved syntenic breaks
  - also seen in other grass genomes
- DNA transposons more widely distributed
  - majority associated with gene rich regions
  - also seen in other grass genomes
STA = gene introns and satellite tandem arrays; cLTRs = complete LTRs; sLTRs = solo LTRs; DNA-TEs = autonomous DNA transposons; MITEs = miniature inverted-repeat transposable elements; CDS = gene exons; triangles = syntenic break points
Vogel et al. (10) Nature 463, 763-768

From comparative analyses on sequenced grass genomes can conclude:
- retrotransposon content scales with genome size for all grass genomes
- DNA transposon content is not correlated with genome size for all grass genomes
Note: the location of transposable elements differs, but transposable elements also differ with respect to their content and the genomes of grasses.
(Figure panels: retrotransposon; DNA transposon)
Devos (10) Current Opinion in Plant Biology 13: 139-145

Brachy Genome Compared to Other Grass Genomes - Conservation of Gene Families
77% - 84% of gene families found in rice, sorghum and Brachy are shared
- reflects relatively recent common origin
Various levels of lineage-specific genes
- genes for which no orthologue can be found in related species (Note: = singleton)
- taxonomic levels = grass, grass subfamily (Pooid), Brachy
- obvious targets for functional analyses
- may be involved in distinguishing taxa
(Figure: Shared Gene Families)
Vogel et al. (10) Nature 463, 763-768

Brachy Genome Compared to Other Grass Genomes - Conservation of Synteny
Note: whether these ribbons connect within a chromosome or across chromosomes tells you where the duplications happened.
Six major duplications of chromosomal regions covering 92% of the genome
- originated from the ancient WGD event before grass families diverged
- creation of paralogues within a gene family
Conserved synteny between Brachy, rice, sorghum and wheat
- 59 blocks of collinear orthologous genes covering 99% of the Brachy genome
  - provide a framework for understanding grass genome evolution
  - aid the assembly of sequences from other related grasses
Vogel et al. (10) Nature 463, 763-768

Brachy Genome Compared to Other Grass Genomes - Collinearity
Collinearity “rule” in grass genomes, especially pronounced in euchromatic regions
- c.f. Brachy chromosome 5 (Bd5) long arm and syntenic chromosomes from sorghum (Sb6) and rice (Os4)
and the short arm….
- less collinearity
- ~half the density of genes as the rest of the genome (also for sorghum and rice chromosomes)
- high retrotransposon content (also for sorghum and rice chromosomes)
  - gaining retrotransposons, unlike other parts of the genome, through replication and lack of recombination
Indicates maintenance of these repeat regions for ~50-70 million years
Hypothesis: chromosome ancestral to Bd5 reached a “tipping point” when high retrotransposon content became deleterious to genes
Vogel et al. (10) Nature 463, 763-768

Model of Grass Chromosome Evolution
Present-day grass chromosomes have evolved from those of the common ancestor through
- WGD
- ancestral chromosome translocations
- ancestral chromosome fusions
- lineage-specific nested insertions
colours represent regions originating from ancestor with n = 5 and an n = 12 intermediate
Vogel et al. (10) Nature 463, 763-768
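
Collinearity was defined earlier as conservation of gene (or marker) order along a chromosomal segment in different species. One rough way to picture how collinear blocks might be found is sketched below, assuming each gene has a single orthologue and ignoring strand, inversions and small gaps. The function name `collinear_blocks`, the `min_len` parameter and the gene labels A-E and X are hypothetical; real synteny tools use more tolerant chaining methods.

```python
def collinear_blocks(order_x, order_y, min_len=2):
    """Return runs of genes that are adjacent and in the same order in both species.

    order_x, order_y: gene (orthologue) labels listed in chromosomal order.
    min_len: shortest run reported as a block (an illustrative choice).
    """
    # Position of every gene along the chromosome of species Y
    pos_y = {gene: i for i, gene in enumerate(order_y)}
    blocks, current = [], []

    def close_block():
        # Keep the current run only if it is long enough to count as a block
        if len(current) >= min_len:
            blocks.append(list(current))
        current.clear()

    for gene in order_x:
        if gene not in pos_y:
            # Gene loss (no orthologue in species Y) interrupts the block
            close_block()
        elif current and pos_y[gene] == pos_y[current[-1]] + 1:
            # Immediately follows the previous gene in species Y too
            current.append(gene)
        else:
            # Order broken (e.g. by a rearrangement): start a new run
            close_block()
            current.append(gene)
    close_block()
    return blocks


# Species X has genes A-E; in species Y the last two genes are swapped
print(collinear_blocks(list("ABCDE"), list("ABCED")))
# An extra gene X inserted in species Y splits one block into two
print(collinear_blocks(list("ABCD"), list("ABXCD")))
```

Because adjacency is required to be exact, a single inserted or lost gene breaks a block, which mirrors how gene loss, duplication and rearrangement can mask collinearity.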