Lecture 4 - Human Genetics - 2024 PDF
2024
Katerina Kiriakopulos
Summary
This lecture covers human genetic variation, focusing on positional cloning and personal genomes. It explores different types of genetic variations, including SNPs, INDELS, and microsatellites, and outlines a method for identifying disease-causing mutations using linkage analysis.
Full Transcript
Human Genetic Variation
Positional cloning to personal genomes
Katerina Kiriakopulos (TA), PhD Student, Maass Lab
September 17, 2024

Some Key Terms (www.genome.gov/glossary)
Locus: The place on a chromosome where a specific gene (or DNA sequence) is located.
Allele: One of the variant forms of a gene at a particular locus.
Polymorphism: A common variation (i.e. allele) in the sequence of DNA among individuals.
Haplotype: The genetic constitution of an individual chromosome. Contraction of the phrase “haploid genotype”. Can refer to only one locus or to an entire genome. Commonly refers to a set of alleles found to be associated on a single chromosome, or portion of a chromosome.
Phasing: Determining the physical linkage between alleles along a chromosome, i.e. which alleles sit together on the same haplotype.
Founder effect: Loss of genetic variation when a new colony is formed by a very small number of individuals.
(Figure labels: homologous chromosomes, locus, allele, haplotype; original population, migration, new colony.)

Outline
1. Types of genetic variation
2. Quiz
3. Human genetic diversity
4. Associating traits with genotypes
5. Personal genomes

Types of variation: Single Nucleotide Polymorphisms (SNPs)
a.k.a. SNV (single nucleotide “variant”). A SNP is an SNV that occurs at greater frequency in the population (remember the definition of polymorphism!). Every SNP is an SNV, but the reverse is not necessarily true.
~1/1,000 bases differ between two people (0.1% of aligned bases).

Philipp   TTGACGTCAGTGCCGTGAC
          ||||||||| |||||||||
Katerina  TTGACGTCACTGCCGTGAC

Most are probably silent, but many undoubtedly impact gene expression, splice variation, and protein properties.

Types of variation: Insertions & Deletions (INDELs)
An indel occurs every 6,000 bases on average between a typical person and the reference genome.

Philipp   TTGACGTC--TGCCGTGAC
          ||||||||  |||||||||
Katerina  TTGACGTCACTGCCGTGAC

Whether a variant is an insertion or a deletion depends on what you use as the reference.

Types of variation: Microsatellites
a.k.a. SSLPs – “Simple Sequence Length Polymorphisms”, STSs – “Sequence Tagged Sites”, or SSRs – “Simple Sequence Repeats”

Philipp   GGCGTGCATGCGTTTGACCTCCTCCTCCTCCTCCTCCTTTGACCTCCACTATG
Katerina  GGCGTGCATGCGTTTGACCTCCTCCTCCTTTGACCTCCACTATG
Andy      GGCGTGCATGCGTTTGACCTCCTCCTCCTCCTTTGACCTCCACTATG

Poll question: Why would microsatellites provide greater resolution compared to SNPs when looking at differences between people?
a) More possible alleles at a microsatellite than at an SNP, so you can more easily trace them through a family.
b) An SNP is higher resolution than a microsatellite because it is at a single base position.
c) A microsatellite includes multiple variants.
Note: You are better able to trace a microsatellite allele back through a family.

Hypothetical segregation in a pedigree: SNP vs. microsatellite (figure)
Adapted from Puliti et al. Pediatr. Nephrol. 2007

Types of variation: Structural Variants (SVs)
A variant that affects “a region of DNA approximately 1kb and larger in size” (ncbi.nlm.nih.gov)

Reference      A B C D
Insertion      A B C X D
Deletion       A B D
Duplication    A B C C D
Inversion      A B D C
Translocation  A S T D

Some SVs are Copy Number Variants (CNVs): variants that change the number of occurrences (copies) of a region of DNA in the genome. Deletions and duplications are CNVs.
A typical genome has 2,100 - 2,500 SVs, which affect ~20 Mb of the genome.

The first personal genome?
Individuals are quite different from the reference.
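The SNP and microsatellite comparisons on the slides above can be sketched in a few lines of Python. This is a minimal illustration and not part of the lecture: the function names `snv_positions` and `longest_repeat_run` are hypothetical, the sequences are copied from the slides, and CCT is assumed to be the repeat unit of the microsatellite.

```python
import re

# Aligned sequences copied from the SNP slide.
PHILIPP_SNP = "TTGACGTCAGTGCCGTGAC"
KATERINA_SNP = "TTGACGTCACTGCCGTGAC"

# Microsatellite alleles copied from the microsatellite slide.
MICROSATELLITE = {
    "Philipp": "GGCGTGCATGCGTTTGACCTCCTCCTCCTCCTCCTCCTTTGACCTCCACTATG",
    "Katerina": "GGCGTGCATGCGTTTGACCTCCTCCTCCTTTGACCTCCACTATG",
    "Andy": "GGCGTGCATGCGTTTGACCTCCTCCTCCTCCTTTGACCTCCACTATG",
}


def snv_positions(seq_a, seq_b):
    """Return (0-based index, base_a, base_b) for each single-base mismatch
    between two aligned sequences of equal length. Gap columns ('-') are
    skipped because they represent indels, not single nucleotide variants."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b and a != "-" and b != "-"
    ]


def longest_repeat_run(seq, unit):
    """Return the number of tandem copies in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


if __name__ == "__main__":
    print(snv_positions(PHILIPP_SNP, KATERINA_SNP))
    for name, seq in MICROSATELLITE.items():
        # Assumption: CCT is the repeat unit of this microsatellite.
        print(name, longest_repeat_run(seq, "CCT"))
```

A SNP site typically has only two alleles in a population, whereas each distinct repeat count at a microsatellite is its own allele, which is why the three people above can all be told apart at this one locus.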
Notes on the personal genome figure: The number of gains and losses in the genome evens out. Mutations in non-coding regions tend to have only minor effects.

Quiz
3 Minutes

Human Polymorphisms
What are these good for?
(1) They make people different from each other!
(2) Data about polymorphisms (allele frequencies, haplotypes) can be used to:
i. Provide insights into human history
ii. Associate genotype with phenotype

Allele frequencies are different between populations
Figure panels: “Both alleles occur in all populations, but with variation in the frequency” and “Derived allele is only seen in some populations”.
Annotations on the figure (in transcript order): this is less common; this is more common.
Human Genome Diversity Project Browser Data

Genetic diversity within populations
- Most genetic variation is within each population, not between populations.
- Most variants are found in populations on more than one continent.
- An average population from anywhere in the world includes 85% of all common human variation at autosomal loci.
Note: From an individual perspective we are different, but in terms of populations we are more or less the same.
1000 Genomes Project Consortium (2015) Nature 526:68–74
Owens K, King MC. Science (1999) 286:451-3

Genetic diversity within populations
Figure: variant sites per genome (millions), by population.
1000 Genomes Project Consortium (2015) Nature 526:68–74

Poll question: African populations are genetically more diverse than any other population. Why? How does this relate to the founder effect?
a) According to the out-of-Africa theory, modern humans originated in Africa and as humans migrated, variation was lost as new populations were settled.
b) The African populations are larger.
c) Particular alleles are favourable to certain areas, so variation was lost as people adapted to new environments.
Note: A subset of the population from Africa moved to a different continent -> the founder effect.
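The founder effect discussed above can be illustrated with a small sampling sketch. Everything here is an assumption made for illustration: the source population, its allele labels and counts, the colony size, and the helper names `allele_frequencies` and `found_colony` are hypothetical and do not come from the lecture or from real data.

```python
import random
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list with one entry per sampled chromosome."""
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: count / total for allele, count in counts.items()}


def found_colony(population, n_founders, seed=0):
    """Draw a small random subset of chromosomes (without replacement) to found a colony.
    A fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, n_founders)


# Hypothetical source population at one locus: four alleles, two of them rare.
source = ["A"] * 60 + ["B"] * 30 + ["C"] * 9 + ["D"] * 1
colony = found_colony(source, n_founders=6)

if __name__ == "__main__":
    print("source:", allele_frequencies(source))
    print("colony:", allele_frequencies(colony))
```

The colony can only carry alleles that its founders happened to bring, so it never has more distinct alleles than the source, and rare alleles are the most likely to be lost. Repeating this sampling along a chain of migrations is one way to picture why populations outside Africa tend to show fewer variant sites.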
Haplotypes
- Contraction of the phrase “haploid genotype”.
- Can refer to only one locus or to an entire genome.
- Commonly refers to a set of alleles found to be associated or inherited together on a single chromosome, or portion of a chromosome.
- The association between a mutation and a haplotype can only be disrupted by mutation or recombination.
(Figure labels: haplotype, locus, allele, homologous chromosomes.)

Haplotypes
Figure: maternal haplotypes and paternal haplotypes. Recombination during meiosis can generate new haplotypes.

Recombination Hotspots
Recombination preferentially occurs at certain sites in the genome (recombination hotspots).
PRDM9: a zinc-finger (ZnF) protein found to bind to a consensus motif at hotspots.
Note: This motif allows recombination to happen. Recombination can happen anywhere, but it occurs most often at these hotspots.
Hochwagen & Marais Current Biology (2010)

Linkage disequilibrium
Linkage disequilibrium (LD): “non-random association of alleles at two or more loci in a general population”*. Result of decreased recombination between alleles.
If you have two SNP loci with two alleles each:
SNP locus 1: A or G
SNP locus 2: C or T
4 possible haplotypes: AC, AT, GC and GT

Example 1. Haplotypes in a population:
Locus 1  Locus 2
A        T
A        T
A        T
A        T
G        C
G        C
G        C
G        C
Are the alleles G and C in LD in this population?
Note: Yes, because G and C are never separated here.

Example 2. Haplotypes in a population:
Locus 1  Locus 2
A        T
A        T
G        T
G        T
A        C
A        C
G        C
G        C
Are the alleles G and C in LD in this population?
Note: Here they are not, because G also pairs with T.

*Goode E.L. (2011) Linkage Disequilibrium. In: Schwab M. (eds) Encyclopedia of Cancer. Springer, Berlin, Heidelberg

The Structure of Haplotype Blocks in the Human Genome
Gabriel et al., Science 296:2225 (2002).
Main findings:
- Characterized haplotype patterns across 51 autosomal regions (spanning 13 Mb and containing almost 4000 SNPs) of the human genome in individuals from Africa, Europe, and Asia.
- “We show that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed”.
- i.e. there are recombination hotspots in the human genome; variants in between are linked more often than would be expected by chance (they are in linkage disequilibrium (LD)).

Haplotype Blocks
Figure (Gabriel et al. Science (2002): Supplemental Fig. 2A), 22 variants. Each square represents the allelic association between a pair of variants (e.g. variants “38” and “39”).
Legend: Strong LD - no historical recombination; Uninformative pair; Strong evidence for historical recombination.
Recombination “hotspots” lie between haplotype blocks.
Note: white squares -> recombination hotspots.
Possible haplotypes:
TGCAGTTCGCT TCCTC
TGCAGTACGCT TG ATGCT
CAGTCGCTAAC GA GTTAT

The structure of haplotype blocks in the human genome - Important takeaways
- Only the 7th SNP would have to be genotyped to differentiate between all the haplotypes in the 1st block. Note: this is because it is the defining feature of the block.
- The average size of a haplotype block is between 11 and 22 kb. Size varies.
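Two claims from the slides above can be checked numerically: whether G and C are in LD in the two example populations, and which SNP distinguishes the haplotypes of the first block. The sketch below uses the standard LD coefficient D = p(GC) - p(G)p(C) and the correlation measure r^2 = D^2 / (p(G)(1-p(G))p(C)(1-p(C))); these measures are not named on the slides, and the function names `ld_stats` and `distinguishing_positions` are hypothetical. The haplotypes are copied from the slides.

```python
def ld_stats(haplotypes, allele1, allele2):
    """Return (D, r2) for allele1 at locus 1 and allele2 at locus 2.
    `haplotypes` is a list of two-character strings, one per chromosome."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    p12 = sum(h == allele1 + allele2 for h in haplotypes) / n
    d = p12 - p1 * p2  # zero when the alleles associate at random
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


def distinguishing_positions(haplotypes):
    """Return 1-based SNP positions at which every haplotype carries a different allele,
    so genotyping that one SNP is enough to tell all the haplotypes apart."""
    return [
        i + 1
        for i, column in enumerate(zip(*haplotypes))
        if len(set(column)) == len(haplotypes)
    ]


# Example populations from the linkage disequilibrium slides.
example_1 = ["AT"] * 4 + ["GC"] * 4
example_2 = ["AT", "AT", "GT", "GT", "AC", "AC", "GC", "GC"]

# Haplotypes of the first block, copied from the haplotype block figure.
block_1 = ["TGCAGTTCGCT", "TGCAGTACGCT", "CAGTCGCTAAC"]

if __name__ == "__main__":
    print(ld_stats(example_1, "G", "C"))
    print(ld_stats(example_2, "G", "C"))
    print(distinguishing_positions(block_1))
```

In the first population G always travels with C, so r^2 is 1 (complete LD). In the second, all four haplotypes are equally frequent, so D and r^2 are 0. For the first block, position 7 is the only site where the three haplotypes all differ, which matches the takeaway about genotyping only the 7th SNP.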
The structure of haplotype blocks in the human genome - Important takeaways (continued)
Figure: possible haplotypes, with recombination “hotspots” between haplotype blocks.
- Higher genetic diversity is found among African populations (founder effect): smaller haplotype blocks with more possible haplotypes within the blocks.
- BUT variation is shared across populations: 51% of haplotypes are found in all populations, 72% are found in 2 of 3 populations.
- If we know all the haplotypes, we don’t have to genotype every SNP in a haplotype block, and can instead infer SNPs by genotyping a set of tag SNPs → “HapMap” project.
Gabriel et al. Science (2002)

Mapping Mendelian traits: Linkage analysis
Mendelian trait: characteristic caused by a single genetic locus; > 5,000 known heritable diseases/traits - see “Online Mendelian Inheritance in Man”: www.ncbi.nlm.nih.gov/omim
Figure labels: disease phenotype; disease linked allele; disease linked haplotype.
Notes: The haplotype encompasses the haplotype block. B2 is common to all diseased individuals.
Arch Neurol. 1999;56(6):667-672. doi:10.1001/archneur.56.6.667

Mapping Mendelian traits: Traditional method for cloning a disease gene
Step 1. Get some families and establish the inheritance pattern.
Step 2. Check karyotypes, and measure polymorphisms across the entire genome in all individuals from the families (genotype individuals).
Step 3. Determine which polymorphisms follow the disease. Note that the polymorphisms are not causing the disease, they are just linked. The LOD (log of odds) score is a statistical estimate of whether a genetic marker and a trait (or another genetic marker) are likely to be located near each other. (Figure labels: LOD score, variants, candidate region.)
Step 4. Start looking for mutations in genes in the area of the polymorphisms. (Figure: sequence from affected individuals, ACGGGTTCGCATCGCATGCAGCTCGCCGTAG, with the disease causing mutation marked.)

Linkage vs. Association (Genome Wide Association Studies)
Linkage: in a family, you know the “path” the chromosomes took. So you are identifying the part of a chromosome where the disease gene/mutation lies. Good for Mendelian disease.
Association: Doesn’t require relationships among people, just two groups of people: case and control. Assumes that common genetic variants cause phenotypes in a population (or that the common variants are linked to the causal variants). Good for common & polygenic disease.
Figure labels: Unaffected (Control); Affected (Case); Variants associated with trait.
Note: In the plot, height is the significance level and each dot is one SNP in the genome.

Asparagus anosmia GWAS results
This study used DNA microarrays (SNP chips) to genotype a large cohort of unrelated subjects. Using these genotypes and information provided by 23andMe customers they were able to find novel associations.
Fig. 2 and Fig. 8: Red/blue circles represent genotyped SNPs. Magenta/green squares represent “imputed” SNPs, i.e. SNPs that are inferred (not genotyped) based on known haplotypes (see haplotype block slides).
Figure 8. Bayes factors for genotyped and imputed SNPs for asparagus anosmia around OR2M7.

Caveats of GWAS
- Measuring common SNPs and frequent haplotypes: rare events are not well captured. Note: You need to increase your study population size in order to catch the rare events.
- Structural variation (CNVs, inversions, etc.) may NOT be captured.
- Difficult to pinpoint responsible genes.
- Validation is required using follow-up experiments (such as MPRAs!).
- Small proportion of risk explained:
Explanation #1: We don’t know all the genetic variants.
Explanation #2: There are complex interactions among the known variants. This point is quite technical and has to do with how you measure heritability - see “Phantom Heritability” papers, also future lectures on genetic interactions.
BUT it does give clues of the genes and pathways that underlie complex disease.
Manolio, Teri A., et al. "Finding the missing heritability of complex diseases." Nature 461.7265 (2009): 747-753.

We don’t know all the genetic variants…
Solution: Personal Genomes!
There are a very large number of less-frequent polymorphisms, and almost certainly a large pool of human genetic variation that will be inaccessible until we sequence everyone. Even then, it will be very difficult to know the phenotypic consequences, if any, of any of this variation!

Summary Questions
1. Haplotype blocks are regions within which recombination rates are highest.
a) True
b) False
2. What is TRUE about linkage and association studies:
a) Association studies can identify which specific gene in a haplotype block causes the phenotype.
b) Linkage studies do not require relationships between people.
c) GWAS can find variants associated with asparagus anosmia.
3. Individuals from which continental group have the greatest number of variant sites (i.e. most variation)?
a) East Asia
b) Africa
c) Europe
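The difference between linkage and association covered in this lecture can be made concrete with two small calculations. This is a sketch under stated assumptions, not the method used in any study cited above. The LOD score here is the simplified textbook form for phase-known, fully informative meioses: log10 of the likelihood at recombination fraction theta over the likelihood with no linkage (theta = 0.5). The association test is a plain Pearson chi-square on a 2x2 table of allele counts, without continuity correction. The function names and all counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for `recombinants` observed among `meioses` informative meioses,
    comparing linkage at recombination fraction `theta` with no linkage (0.5)."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    linked = theta ** recombinants * (1 - theta) ** (meioses - recombinants)
    unlinked = 0.5 ** meioses
    if linked == 0:
        return float("-inf")  # the data are impossible at this theta
    return math.log10(linked / unlinked)


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table of
    allele counts in cases versus controls."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Linkage (hypothetical family data): 10 meioses, none recombinant.
    print(lod_score(recombinants=0, meioses=10, theta=0.0))
    # Association (hypothetical counts): one SNP in unrelated cases and controls.
    print(allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60))
```

The linkage calculation needs to know which meioses were recombinant, which is only possible inside a pedigree, while the association test needs nothing but allele counts in two groups of unrelated people. A GWAS repeats a test of this kind at every genotyped SNP, which is one reason large cohorts and strict significance thresholds are needed.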