Quantitative Trait Locus (QTL) Analysis and Genome-Wide Association Studies (GWAS)
Summary
This presentation covers quantitative trait locus (QTL) analysis and genome-wide association studies (GWAS). It includes learning objectives, different approaches for detecting QTL, and the significance of sample size in analyses. The data provided focuses on genetics and biological concepts, potentially useful for an undergraduate-level biology course.
Full Transcript
Quantitative Trait Locus (QTL) Analysis and Genome-Wide Association Studies (GWAS)

Learning Objectives: QTL Analyses
Describe a Quantitative Trait and what makes it different from a Qualitative Trait.
Describe three approaches to detecting Quantitative Trait Loci and their strengths and weaknesses.
Describe an Important Statistical Issue with QTL Analyses.
Describe ways that Sample Size affects the success of a QTL Analysis.

Phenotypic Variation within Species
Many of these traits are Quantitative, rather than Qualitative.
Qualitative traits are also known as Discrete traits.
Quantitative (complex) trait: A trait determined by many genes, almost always interacting with environmental influences. Continuous variation, often following a normal distribution.

Sequence Variation
GTCGTTATCGTCGATATGGTAGGCTTAGCTAGGCTACCCGTCCTAC
GTCGTTAACGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCTTAC
GTCGTTAACGTCGATACGGTAGGCTTAGCTAAGCTACCCGTCCTAC
GTCATTAACGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCCTAC
GTCGTTAACGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCCTAC
GTCGTTAACGTCGATACGGTAGGTTTAGCTAGGCTACCCGTCCTAC
GTCGTTATCGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCCTAC
GTCGTTAACGTCGATACTGTAGGCTTAGCTAGGCTACCCATCCTAC
On average, any two humans differ at ~3 million SNPs!
What are the SNPs that are responsible for Variation in a Quantitative Trait?

[Diagram: GENOTYPE, PHENOTYPE, and the BLACK BOX of Quantitative Genetics.]

Quantitative Genetics
The genetic analysis of quantitative traits. Statistical descriptions of the relationship between genotype and phenotype.
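As a small illustration of the Sequence Variation slide, the eight aligned sequences can be scanned for columns where more than one base occurs. This is only a sketch: the function name `variable_sites` is an assumption made for this example, and treating every variable column as a SNP ignores sequencing error and allele-frequency cutoffs.

```python
# The eight aligned sequences from the Sequence Variation slide.
SEQUENCES = [
    "GTCGTTATCGTCGATATGGTAGGCTTAGCTAGGCTACCCGTCCTAC",
    "GTCGTTAACGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCTTAC",
    "GTCGTTAACGTCGATACGGTAGGCTTAGCTAAGCTACCCGTCCTAC",
    "GTCATTAACGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCCTAC",
    "GTCGTTAACGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCCTAC",
    "GTCGTTAACGTCGATACGGTAGGTTTAGCTAGGCTACCCGTCCTAC",
    "GTCGTTATCGTCGATACGGTAGGCTTAGCTAGGCTACCCGTCCTAC",
    "GTCGTTAACGTCGATACTGTAGGCTTAGCTAGGCTACCCATCCTAC",
]


def variable_sites(seqs):
    """Return 0-based alignment columns where more than one base is observed."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


# Print each variable column as a 1-based position plus the bases observed.
for pos in variable_sites(SEQUENCES):
    print(pos + 1, "".join(s[pos] for s in SEQUENCES))
```

Each printed column shows which individuals carry the less common base at that position.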
Much of the Variation in a Quantitative Trait can be Explained by Effects of Multiple Genes.

One locus (Aa x Aa): AA = 2, Aa = 1, aa = 0.
[Figure: Punnett square for Aa x Aa and a bar chart of Frequency against Phenotype (0, 1, 2).]

Two loci (AaBb x AaBb): AABB = 2, AABb = 1.5, AAbb = 1, AaBB = 1.5, AaBb = 1, Aabb = 0.5, aaBB = 1, aaBb = 0.5, aabb = 0.
[Figure: bar chart of Frequency against Phenotype (0, 0.5, 1, 1.5, 2).]

Three loci (AaBbCc x AaBbCc): AA, Aa, aa; BB, Bb, bb; CC, Cc, cc.
[Figure: bar chart of Frequency against Phenotype.]

Multiple genotypes can produce the same phenotype: AA Bb cc, AA bb Cc, Aa BB cc, Aa Bb Cc, Aa bb CC, aa BB Cc, aa Bb CC.

Environmental Influences also Cause Variation in a Quantitative Trait. The same genotype can produce multiple phenotypes.
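The bar charts on the multiple-gene slides can be approximated with a short calculation. The sketch below assumes unlinked loci, equal and purely additive allele effects, and the slides' scaling in which phenotypes run from 0 to 2; the function name `phenotype_distribution` is made up for this example.

```python
from math import comb


def phenotype_distribution(n_loci):
    """Phenotype -> expected frequency among offspring of two parents that are
    heterozygous at every locus (Aa x Aa, AaBb x AaBb, ...).

    Assumptions (not stated on the slides): loci are unlinked and every
    capital allele adds the same amount, scaled so phenotypes span 0 to 2.
    """
    n_alleles = 2 * n_loci
    dist = {}
    for k in range(n_alleles + 1):  # k = number of capital alleles inherited
        phenotype = round(k / n_loci, 2)
        dist[phenotype] = comb(n_alleles, k) / 2 ** n_alleles
    return dist


for loci in (1, 2, 3):
    print(loci, phenotype_distribution(loci))
```

As loci are added, the number of phenotype classes grows and the distribution moves toward the bell shape described for quantitative traits.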
The Motivation
[Figure: Frequency against Trait Value (-40 to 40) for Population 1 and Population 2.]

[Diagram: GENOTYPE, PHENOTYPE, and the BLACK BOX of Quantitative Genetics, labeled with Genetic Variance, Heritability, Genetic Correlation, R = h^2 S, and QTL.]

Quantitative Trait Locus (QTL)
A genetic locus associated with variation in a complex trait. QTL are identified through the statistical analysis of complex traits. These traits are typically affected by more than one locus and by the environment. (From: Doerge. 2002. Nature Reviews Genetics. 3:43-51)

Mouse Multiple Sclerosis Model
[Figure: frequency distributions for the P0 generation (Trait Value), the F1 generation (Growth Rate), and the F2 intercross (Growth Rate).]

Mouse Genome: 100+ Markers
Unlikely that a causative SNP will be genotyped as marker.
QTL Analysis relies on Linkage Disequilibrium between genotyped markers and unknown causative SNPs.
Linkage Disequilibrium is the tendency of alleles at nearby genes to be inherited as a group (a haplotype).
The genotype of the Causative SNP is inferred from the genotypes of neighboring markers.

Single Marker Analysis
We can attempt to associate phenotype with genotype by testing MARKERS (Loci) one at a time. Is there a significant difference in Mean Phenotype among MARKER genotypes?
Single Marker Analysis
Test for associations between Genotype and Phenotype using t-tests, ANOVA, or Linear Regression.
[Image: Tome of Statistical Sorcery]

Single Marker Analysis
y = b0 + b1x
[Figure: Trait value against Number of A alleles (0, 1, 2) for genotypes aa, Aa, and AA, with the Additive effect marked.]
Additive effect: change in trait caused by substituting an A allele for an a.

Single Marker Analysis
No Closed Form Estimate of QTL location.
Estimates of QTL effect and QTL location are confounded.
[Figure: two panels of Frequency against Trait Value. Left: MARKER, Close gene with small effect. Right: MARKER, Distant gene with large effect.]

Interval Mapping
Developed by Lander and Botstein (1989).
Uses the genetic map.
Estimates the likelihood of a QTL at intervals between markers in small increments.
Tests for a single QTL at each increment.
What about MULTIPLE QTL?

Composite Interval Mapping
Controls for the effect of other QTL by using other markers as cofactors in the model.
y = b0 + b1x1 + b2x2 + b3x3 + … + bkxk
Here x1 is the Genotype at the location of interest, and x2 … xk are Genotypes at other locations of possible QTL.
[Figure: two panels of Phenotype against Genotype (number of A alleles).]

Experimental Crosses
[Diagram: P0 parents crossed to give the F1, and F1 individuals crossed to give the F2 Intercross, with Interchromosomal and Intrachromosomal Recombination marked.]

F2 Intercross
Strengths: Greater combinations of alleles in offspring.
Weaknesses: Challenging to track recombination in two parents.

Backcross
Strengths: Easier to track recombination in one parent.
Weaknesses: Cannot detect loci if alleles from the backcross line are dominant.

Recombinant Inbred Lines (RIL01, RIL02, … RIL50)
Increases Recombination.

Limitation of all QTL approaches: Genetic variation is limited to what is present in the parents from the P0 Generation. Usually not an issue if the goal is to determine why those two populations are different.

Statistical Issues in QTL Analyses

Assessing Significance: Single Marker Tests
For a given test, if we set the probability of observing an association between Genotype and Phenotype by chance at α = 0.05:
That is, we have a 0.05 chance of the appearance of an association between Genotype and Phenotype when there actually isn't one.
Assessing Significance: Single Marker Tests
With α = 0.05 for a given test, the probability of NOT observing a spurious association is 1 - α = 0.95.

Assessing Significance: Single Marker Tests
If we perform 2 statistical tests, what is the probability of observing at least one spurious association?
There are 3 ways this could happen:
- The first marker has a spurious association: 0.05 * 0.95
- The second marker has a spurious association: 0.95 * 0.05
- Both markers have spurious associations: 0.05 * 0.05
P(at least 1) = 1 - P(no spurious associations)
P(no spurious associations at 2 tests) = (1 - α) * (1 - α)
P(at least 1) = 1 - (1 - α)^2 = 1 - 0.95^2 = 1 - 0.9025 = 0.0975
Here (1 - α)^2 is the probability that none of the markers have a spurious association when there is actually no association, and 1 - (1 - α)^2 is the probability of at least one spurious association in 2 tests when there is actually no association.

Assessing Significance: Single Marker Tests
If we perform 10 statistical tests, the probability of observing at least one spurious association is:
1 - (1 - α)^10 = 1 - 0.95^10 = 1 - 0.599 = 0.401
Again, (1 - α)^10 is the probability that none of the markers have a spurious association when there is actually no association.
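The arithmetic above generalizes to any number of tests. The sketch below assumes, as the slides implicitly do, that the tests are independent; the function name `familywise_error` is chosen for this example.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one spurious association across n_tests tests,
    assuming the tests are independent and there is truly no association."""
    return 1 - (1 - alpha) ** n_tests


for m in (1, 2, 10, 100):
    print(m, round(familywise_error(0.05, m), 4))
```

Markers that sit close together on a chromosome are correlated rather than independent, so in practice this formula is a rough guide and not an exact error rate.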
Assessing Significance: Single Marker Tests
For 100 Markers, the probability of observing at least one spurious association is:
1 - (1 - α)^100 = 1 - 0.95^100 = 0.994
Here 0.95^100 is the probability of no spurious associations.

Assessing Significance
Traditional statistics tables (e.g. t, F, χ^2) don't work for this type of analysis.
Instead we can make our own distributions from the data using PERMUTATION TESTS.

Assessing Significance: PERMUTATION TESTS
Original Data

Markers:          1  2  3  4  5  6  7  8  9  10 11 12   Phenotype
Individual 1      AA AA Aa Aa Aa Aa aa aa aa aa aa aa   20.5
Individual 2      aa aa aa aa aa aa aa aa aa Aa Aa Aa   685.1
Individual 3      AA AA AA AA AA Aa Aa Aa Aa aa aa aa   879.7
Individual 4      aa aa aa aa aa aa aa aa aa aa aa aa   362.0
Individual 5      aa aa aa Aa Aa Aa Aa AA AA AA AA AA   428.9
Individual 6      Aa Aa Aa Aa Aa Aa Aa Aa Aa Aa Aa Aa   28.5
Individual 7      aa aa aa Aa Aa Aa Aa Aa AA AA AA AA   434.0
Individual 8      AA Aa Aa Aa Aa aa aa aa aa aa aa aa   772.3
Individual 9      aa aa aa Aa Aa Aa AA AA AA AA AA AA   965.1
Individual 10     AA AA AA AA AA AA AA AA AA AA AA AA   212.6
Individual 11     AA AA AA Aa Aa Aa Aa Aa Aa Aa Aa Aa   850.5
:
Individual 3,000  aa aa aa Aa Aa Aa Aa AA AA AA AA AA   211.9

In each permutation the marker genotypes stay as above and only the Phenotype column is reassigned. Then calculate the test statistic.

Individual   Original   Permutation #1   Permutation #2   Permutation #10,00
1            20.5       965.1            212.6            772.3
2            685.1      434.0            850.5            965.1
3            879.7      20.5             434.0            211.9
4            362.0      212.6            28.5             362.0
5            428.9      211.9            428.9            685.1
6            28.5       850.5            879.7            212.6
7            434.0      772.3            362.0            879.7
8            772.3      28.5             965.1            850.5
9            965.1      879.7            20.5             28.5
10           212.6      362.0            685.1            434.0
11           850.5      428.9            211.9            20.5
:
3,000        211.9      685.1            772.3            428.9

Assessing Significance: PERMUTATION TESTS
[Figure: Frequency in the Permutations against Test statistic Value, with the Experiment-wise P = 0.05 threshold marked. The threshold gives the probability of at least one spurious association in the entire genome.]

Assessing Significance: PERMUTATION TESTS
Hold the genetic data constant, randomly assign phenotypes, calculate the test statistics.
Repeat (1,000 times or more).
We get the probability of getting a certain value for the test statistic at random.

Characteristics of Individual QTL
Additive effect of each QTL is estimated using the slope of the regression.

Body Size QTL in Rainbow Trout

QTL    Linkage Group   Position            LOD    a       r2
FL-1   R7b             22.7 (acgatg14)     2.69   1.77    0.159
FL-2   R16             0.01 (aagcct06)     4.51   1.43    0.171
FL-3   R17b            0.01 (agcaag315a)   5.8    -1.63   0.232

a = Additive effect; r2 = Proportion of Variation Explained.

[Figure: Trait value against Number of A alleles (0, 1, 2) for genotypes aa, Aa, and AA.]
Additive effect: change in trait caused by substituting an A for an a.

Characteristics of Individual QTL
Each QTL explains a certain proportion of the phenotypic variation in the trait.
R-squared (r2): proportion of the phenotypic variation explained by the locus.
[Figure: Trait value against Number of A alleles for aa, Aa, and AA; caption: "This QTL explains a relatively ..."]
[Figure: Trait value against Number of A alleles for aa, Aa, and AA.]
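The permutation procedure described above (hold genotypes constant, shuffle phenotypes, recompute the test statistic, repeat) can be sketched in a few lines. This is a minimal illustration and does not reproduce the analysis behind the slides. It treats the twelve rows shown on the Original Data slide as a toy data set, uses the largest per-marker r2 as the genome-wide test statistic (an assumption), and all function names are invented for this example.

```python
import random

CODE = {"AA": 2, "Aa": 1, "aa": 0}  # genotype -> number of A alleles
ROWS = [  # the twelve individuals displayed on the Original Data slide
    "AA AA Aa Aa Aa Aa aa aa aa aa aa aa", "aa aa aa aa aa aa aa aa aa Aa Aa Aa",
    "AA AA AA AA AA Aa Aa Aa Aa aa aa aa", "aa aa aa aa aa aa aa aa aa aa aa aa",
    "aa aa aa Aa Aa Aa Aa AA AA AA AA AA", "Aa Aa Aa Aa Aa Aa Aa Aa Aa Aa Aa Aa",
    "aa aa aa Aa Aa Aa Aa Aa AA AA AA AA", "AA Aa Aa Aa Aa aa aa aa aa aa aa aa",
    "aa aa aa Aa Aa Aa AA AA AA AA AA AA", "AA AA AA AA AA AA AA AA AA AA AA AA",
    "AA AA AA Aa Aa Aa Aa Aa Aa Aa Aa Aa", "aa aa aa Aa Aa Aa Aa AA AA AA AA AA",
]
GENOTYPES = [[CODE[g] for g in row.split()] for row in ROWS]
PHENOTYPES = [20.5, 685.1, 879.7, 362.0, 428.9, 28.5,
              434.0, 772.3, 965.1, 212.6, 850.5, 211.9]


def marker_r2(x, y):
    """Squared correlation between allele counts x and phenotypes y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def max_statistic(genotypes, phenotypes):
    """Largest single-marker r2 across all markers."""
    n_markers = len(genotypes[0])
    return max(marker_r2([row[m] for row in genotypes], phenotypes)
               for m in range(n_markers))


def permutation_threshold(genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wise threshold: the (1 - alpha) quantile of the maximum
    statistic over n_perm random reassignments of the phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # genotypes stay fixed
        null_max.append(max_statistic(genotypes, shuffled))
    null_max.sort()
    return null_max[min(n_perm - 1, int((1 - alpha) * n_perm))]


observed = max_statistic(GENOTYPES, PHENOTYPES)
threshold = permutation_threshold(GENOTYPES, PHENOTYPES)
print("observed max r2:", round(observed, 3), "threshold:", round(threshold, 3))
```

Taking the maximum over markers within each permutation is what makes the threshold experiment-wise: it describes the best statistic found anywhere in the genome when there is no real association.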
This QTL explains a lower %.

At each location, we test for real phenotypic differences between genotypes.
[Figure: Frequency against Trait Value for genotypes AA, Aa, and aa, beside Trait value against Number of A alleles with the Additive effect marked.]

Sample size is a major determinant of the power of a QTL Analysis!
[Figure: Sample Size (log10 scale; 100, 316.2, 1,000, 3,162, 10,000, 31,623) against Additive Effect of QTL standardized to its standard deviation, with curves for F2 Intercross and Backcross. Source: Mackay, et al. 2009. The genetics of quantitative traits: challenges and prospects. Nature Reviews Genetics]
[Figure: Sample Size (log10 scale; 100, 1,000, 10,000, 100,000, 1,000,000) against Additive Effect of QTL standardized to its standard deviation, with curves for Minor Allele Freq q = 0.1, q = 0.25, and q = 0.5. Source: Mackay, et al. 2009. The genetics of quantitative traits: challenges and prospects. Nature Reviews Genetics]

Sample size is a major determinant of the power of a QTL Analysis!
QTL with low effect may not be detected.
Not enough recombination events to differentiate closely linked loci.
In experiments with low sample size, Additive effects of detected QTL are inflated.

QTL Punchlines
We need a large sample size (hundreds to thousands of individuals).
We need to screen large numbers of markers (thousands to millions).

QTL Punchlines
Does not require mutagenesis; can detect natural variation affecting a trait.
Genetic variation is limited to what is present in the original parents.
Relies on Linkage Disequilibrium between genotyped markers & unknown causative SNPs.
Requires a Genetic (Linkage) Map for analysis.
Sensitivity is limited by the number of recombination events (Sample Size) and the Effect Size of the QTL.
Requires specialized statistical methods to minimize false positives.
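The earlier point that additive effects of detected QTL are inflated at low sample size can be illustrated with a small simulation. Everything here is an assumption made for the sketch: a single locus in an F2-like population, a true additive effect of 0.2 trait standard deviations, and a cutoff of |t| > 3 standing in for a stringent genome-wide threshold.

```python
import random

TRUE_EFFECT = 0.2  # hypothetical additive effect, in units of the noise SD


def simulate_experiment(n, effect, rng, noise_sd=1.0):
    """Simulate n F2-like individuals at one locus and return the estimated
    regression slope and its t statistic."""
    xs, ys = [], []
    for _ in range(n):
        x = (rng.random() < 0.5) + (rng.random() < 0.5)  # 0, 1 or 2 A alleles
        xs.append(x)
        ys.append(effect * x + rng.gauss(0.0, noise_sd))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        return 0.0, 0.0
    slope = sxy / sxx
    residual_var = (syy - slope * sxy) / (n - 2)
    return slope, slope / (residual_var / sxx) ** 0.5


def mean_detected_effect(n, effect, n_experiments=400, t_threshold=3.0, seed=1):
    """Average estimated effect among experiments that pass the threshold.
    Returns (mean or None, number of detections)."""
    rng = random.Random(seed)
    detected = []
    for _ in range(n_experiments):
        slope, t_stat = simulate_experiment(n, effect, rng)
        if abs(t_stat) > t_threshold:
            detected.append(slope)
    mean = sum(detected) / len(detected) if detected else None
    return mean, len(detected)


results = {n: mean_detected_effect(n, TRUE_EFFECT) for n in (50, 1000)}
for n, (mean, hits) in results.items():
    print("n =", n, "detections:", hits, "mean detected effect:", mean)
```

With the small sample, only experiments that happened to overestimate the effect clear the threshold, so the average among detected QTL sits well above the true value. With the large sample most experiments detect the locus and the average is close to the truth.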
Experimental Crosses

F2 Intercross
Strengths: Greater combinations of alleles in offspring.
Weaknesses: Challenging to track recombination in two parents.

Backcross
Strengths: Easier to track recombination in one parent.
Weaknesses: Cannot detect loci if alleles from the backcross line are dominant.

Recombinant Inbred Lines
Strengths: Increases Recombination. More precise location. RILs are more homozygous.
Weaknesses: Requires rearing space. Each RIL population: RIL01, RIL02, … RIL50.
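The backcross weakness listed above can be made concrete by enumerating offspring genotypes at a single locus. This is a sketch under simple Mendelian assumptions (one locus, complete dominance of A over a), and the function names are invented for the example.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Expected genotype frequencies at one locus, e.g. for 'Aa' x 'Aa'."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def phenotype_classes(genotype_freqs):
    """Collapse genotypes into phenotype classes, assuming A is fully dominant."""
    classes = Counter()
    for genotype, freq in genotype_freqs.items():
        classes["A-" if "A" in genotype else "aa"] += freq
    return dict(classes)


f2 = offspring_genotypes("Aa", "Aa")            # F1 x F1
backcross_to_AA = offspring_genotypes("Aa", "AA")  # F1 x dominant parent line
backcross_to_aa = offspring_genotypes("Aa", "aa")  # F1 x recessive parent line

for name, cross in [("F2", f2), ("BC to AA", backcross_to_AA),
                    ("BC to aa", backcross_to_aa)]:
    print(name, cross, phenotype_classes(cross))
```

When the F1 is backcrossed to the line carrying the dominant allele, every offspring falls in the same phenotype class, so the locus contributes no detectable trait differences. The F2 intercross and the backcross to the recessive line both still segregate two classes.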