Questions and Answers
What is the primary purpose of quality control in the experimental workflow?
Which technique is NOT mentioned as a method to collect genotypic data?
Which step involves using matched reference populations to estimate untyped genotypes?
What type of models can be used in genetic association tests?
What is one of the purposes of using biobanks or repositories in data collection?
Which strategy is implemented to correct for confounders during genetic association tests?
What is the outcome of the genetic association tests intended to inspect?
At which stage in the quality control process are bad single-nucleotide polymorphisms (SNPs) deleted?
What makes genetic associations difficult to interpret across different ancestries?
Which of the following is NOT a step in the experimental workflow of GWAS?
What should be carefully distinguished from genetic association?
Why may GWAS results be limited in their utility for drug development?
Linkage Disequilibrium (LD) is best described as what?
Which step of GWAS involves using haplotype phasing?
What is a common limitation of GWAS associated with different ancestries?
What is a key challenge in interpreting GWAS results?
What type of analytical tool is PLINK used for?
What is a key requirement for external replication of results in a GWAS?
Which of the following is a purpose of GWAS?
What type of analysis is included in post-GWAS analysis?
What are the two commonly used plot types for visualizing GWAS results?
Which of the following statements best reflects a limitation of GWAS?
What is the significance level associated with the P value of $5 \times 10^{-8}$ in genetic association studies?
Which method is NOT used in in silico analysis of GWAS?
Which publication discusses the benefits and limitations of GWAS?
In a meta-analysis for GWAS, what is the main advantage of combining results from multiple cohorts?
What is one outcome of using PLINK in genetic studies?
Which of the following components is involved in fine-mapping during post-GWAS analysis?
Which of the following resources provides a catalog of GWAS findings?
What does SNP stand for in the context of genetic studies?
In what year was the publication discussing the finding of missing heritability in complex diseases released?
What characterizes the approach of experimental workflow in meta-analysis?
What aspect does genetic correlation analysis in post-GWAS analysis focus on?
In the context of the Manhattan Plot for Schizophrenia, what is being compared?
Which group represents individuals with the disease in the study?
What percentage of the cases reported the genotype 'A A C'?
In this study, how many total individuals were assessed in both cases and controls?
What is being sought after through the analysis of the genomic regions listed in the Manhattan Plot?
Which genotype was most common in the controls with a percentage of 51%?
What type of plot displays the results of variations in allele frequency compared to disease presence?
What is the significance of identifying hundreds of genomic regions with significant association to the disease?
Which option best describes the role of controls in this genetic study?
What is the primary goal of imputation in genotype data processing?
Which step is NOT involved in the imputation process?
What is a potential consequence of not accounting for ancestry in GWAS?
How is ancestry typically considered in GWAS?
Why is it important to check for unusual minor allele frequencies during imputation?
What might be a result of matching cases and controls by ancestry in a GWAS?
Which of the following tools is NOT mentioned for imputation?
What is the role of the reference population panel in imputation?
Study Notes
Genome-Wide Association Studies (GWAS)
- GWAS are studies that investigate the association between genetic variants and phenotypes.
- They aim to identify differences in allele frequencies of genetic variants between individuals, focusing on those with similar ancestry but different traits.
- GWAS can analyze copy-number variants or sequence variations in a genome.
- The most common variants analyzed in GWAS are single nucleotide polymorphisms (SNPs).
- GWAS typically involve targeted genotyping of pre-selected variants using microarrays.
- Whole-exome sequencing (WES) and whole-genome sequencing (WGS) aim to capture all genetic variation and can also be used for GWAS, but the term often refers exclusively to studies of common variants.
GWAS in One Slide
- A Manhattan plot displays the significance level of association (-log10 P) for hundreds of genomic regions (loci) in relation to a disease.
- The plot highlights genomic regions exhibiting significant associations with the disease.
- The graph's x-axis indicates chromosomes, and the y-axis displays the significance level.
- Each point summarizes a test of whether allele frequency differs between cases (with the disease) and controls (without the disease).
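As a rough illustration of the y-axis described above, the sketch below converts p-values to the -log10 P scale and flags those below the conventional $5 \times 10^{-8}$ threshold. The function names and the three example p-values are hypothetical, chosen only for illustration.

```python
import math

# Conventional genome-wide significance level used in GWAS.
GENOME_WIDE_THRESHOLD = 5e-8


def manhattan_height(p_value: float) -> float:
    """Return the y-axis value (-log10 P) plotted for one variant."""
    return -math.log10(p_value)


def is_genome_wide_significant(p_value: float) -> bool:
    """True if the p-value falls below the genome-wide threshold."""
    return p_value < GENOME_WIDE_THRESHOLD


# Hypothetical p-values for three variants (illustrative only).
example = {"variant_a": 0.03, "variant_b": 1e-6, "variant_c": 2e-9}
for name, p in example.items():
    print(name, round(manhattan_height(p), 2), is_genome_wide_significant(p))
```

Smaller p-values map to taller points, which is why the strongest associations appear as peaks in the plot.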
Zoom In to a GWAS Locus
- This section offers a detailed look at a specific location (locus) on a chromosome involved in a disease, like schizophrenia.
- A magnified graphic display of the association significance (-log10 P) for each identified variant (SNP) is featured.
- Recombination rates are illustrated on a separate subplot, enabling researchers to assess the distance between genetic markers.
- Variants, such as rs6759676, are highlighted to show how an association's statistical significance might fluctuate due to various factors.
Outline (Slides 4 and 5)
- The segments cover: Introduction, Experimental Workflow of GWAS, Selecting Study Population, Genotyping, Data Processing and GWAS Results.
What is GWAS?
- Genome-wide association studies (GWAS) find relationships between genotypes and phenotypes.
- These studies look for variations in allele frequency among people with similar ancestries but different phenotypes.
- GWAS can assess copy-number or sequence variations, though single nucleotide polymorphisms (SNPs) are frequently used.
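One simple way to compare allele frequencies between cases and controls at a single SNP is a chi-square test on a 2x2 table of allele counts. The sketch below is a minimal illustration under that assumption; the notes do not say which test the course uses, and the function name and counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns the chi-square statistic and its p-value.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is more frequent in cases.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40,
                             control_alt=40, control_ref=60)
print(chi2, p)
```

A real GWAS repeats a test of this kind for every variant, usually within a regression framework that also adjusts for covariates.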
Difference between GWAS, WES, and WGS
- Genome-wide association studies (GWAS) primarily involve targeted genotyping of specific variants.
- Whole-exome sequencing (WES) and whole-genome sequencing (WGS) aim to capture all genetic variation.
- In essence, although WES and WGS data can also be used for association studies, the term GWAS is often applied specifically to studies focusing on common variants.
Common vs. Rare Genetic Variants
- Variants categorized as common or rare are specific to a population.
- Common variants have a minor allele frequency typically above 5%.
- Research usually requires a minor allele count of at least 100.
- Variants are often plotted by effect size on a trait or disease against allele frequency, which ranges from very rare to common.
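The minor allele frequency and minor allele count mentioned above can be computed directly from genotype counts at a biallelic variant. This is a minimal sketch; the function names and example counts are hypothetical, and the 5% cutoff is the one given in the notes.

```python
def minor_allele_count(n_hom_ref, n_het, n_hom_alt):
    """Count copies of the less frequent allele at a biallelic variant."""
    alt_count = n_het + 2 * n_hom_alt
    ref_count = 2 * n_hom_ref + n_het
    return min(alt_count, ref_count)


def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the less frequent allele among all observed alleles."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    return minor_allele_count(n_hom_ref, n_het, n_hom_alt) / total_alleles


def is_common(maf, threshold=0.05):
    """Common variants have a minor allele frequency above the threshold."""
    return maf > threshold


# Hypothetical genotype counts in a cohort of 1,000 individuals.
maf = minor_allele_frequency(n_hom_ref=810, n_het=180, n_hom_alt=10)
print(maf, is_common(maf))
```

Because the classification depends on the frequencies observed in a given cohort, the same variant can be common in one population and rare in another.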
Questions (Slide 9)
- Key differences between GWAS and WES/WGS are presented.
- Why do GWAS primarily focus on common variants rather than rare ones?
Statistics on GWAS Studies
- The number of GWAS studies is above 5,700.
- The number of traits (phenotypes) analyzed exceeds 3,300.
- The number of participants involved in GWAS studies is greater than 1,000,000.
- Hundreds of genomic loci and thousands of replicable SNPs are frequently identified.
Challenges in Interpreting the Associations
- Individual genetic variants typically confer very little risk.
- Genetic variants are often correlated with other variants and associated with other traits, which complicates estimating their individual effects.
- Drawing direct biological or causal inferences from the association can be highly complex.
Individual Variants Confer Very Little Risk
- Individual genetic variants often confer very small and independent risks for a complex trait or disease.
- Rare alleles that cause Mendelian diseases exhibit high effect sizes, though occurrences of these variants are extremely uncommon.
- Variants influencing common diseases show intermediate effect sizes and frequencies.
Variants Associated with Multiple Traits
- Several traits can be associated with the same region (locus) of a chromosome.
- This section uses illustrative plots to show that a single locus might be associated with multiple traits; in this case, different autoimmune and metabolic disorders.
Variants Correlated with Causal and Non-causal Variants
- Genetic variants are sometimes correlated with both causal and incidental variants at physically close distances due to linkage disequilibrium (LD), which can complicate a study's interpretation.
- Association can be mistaken for causality: the associated variant is often not the causal variant itself but merely correlated with it.
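Linkage disequilibrium between two variants is commonly summarized by the squared correlation $r^2$ between their alleles. The sketch below computes it from phased haplotypes; the input format (pairs of 0/1 alleles) and the function name are assumptions made for illustration.

```python
def ld_r_squared(haplotypes):
    """Squared correlation (r^2) between alleles at two biallelic sites.

    `haplotypes` is a list of (allele_at_site_a, allele_at_site_b) pairs,
    with each allele coded as 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D measures how far the joint frequency departs from independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denominator


# Hypothetical haplotypes: the two sites always carry the same allele.
perfect_ld = [(1, 1)] * 5 + [(0, 0)] * 5
print(ld_r_squared(perfect_ld))
```

When $r^2$ is close to 1, an association signal at one site cannot be distinguished statistically from a signal at the other, which is why the associated variant need not be the causal one.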
Another Challenge in Interpreting the Associations
- Genetic associations can vary across ancestries, which can confound analysis and interpretation.
- Direct comparisons between ancestries often reveal differences in genetic associations, so they require careful analysis.
- These differences complicate the identification of meaningful biological or causal pathways, some of which might be specific to one population.
Genetic Associations May Differ Across Ancestries
- A scatter plot of principal components shows individuals clustering by ancestry, illustrating the genetic differences that can make associations vary among ancestry groups.
- Principal component analysis (PCA) clusters populations by genetic similarity.
Questions (Slide 17)
- What are the four main challenges in interpreting GWAS results that complicate analysis?
Experimental Workflow of GWAS
- This section details the steps of a GWAS: data collection, genotyping, quality control, imputation, association testing, meta-analysis, replication, and post-GWAS analyses.
Experimental Workflow: Data Collection
- Data can be assembled from existing study cohorts or publicly available resources like biobanks.
Experimental Workflow: Genotyping of Each Individual
- Genotypes are determined using either microarrays to focus on common variants, or next-generation sequencing for complete genomes.
Experimental Workflow: Quality Control
- Quality control involves checking the accuracy of the wet-lab and dry-lab stages and identifying unusual patterns or outliers in population strata via principal component analysis.
Experimental Workflow: Imputation of Untyped Variants
- Missing genetic data is inferred using reference panels like the 1000 Genomes Project or TOPMed.
- Imputation is conducted by statistical methods.
Experimental Workflow: Genetic Association Test
- Tests are conducted using statistical models like linear or logistic regression to find links between a variant and phenotype while controlling for confounding factors.
Experimental Workflow: Meta-Analysis
- GWAS results from various independent studies are often combined.
- Standardized statistical pipelines are used to analyze the combined results, which yields a more comprehensive analysis and more generalizable findings.
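A common way to combine per-cohort results for one variant is an inverse-variance-weighted fixed-effect meta-analysis. The notes do not name a specific method, so the sketch below is an assumption about one standard choice; the function name and the example effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    Returns the pooled effect size, its standard error, and a two-sided
    p-value from a normal approximation.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1 / weight_sum)

    z = pooled_beta / pooled_se
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical effect sizes and standard errors from three cohorts.
beta, se, p = fixed_effect_meta([0.10, 0.12, 0.08], [0.05, 0.06, 0.04])
print(beta, se, p)
```

The pooled standard error is smaller than that of any single cohort, which is where the gain in statistical power from combining studies comes from.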
Experimental Workflow: Replication
- Replicating GWAS results in an independent cohort helps validate the findings and demonstrate their robustness.
- The independent cohort should be similar in ancestry to the discovery cohort and must not overlap with it.
Experimental Workflow: Post-GWAS Analysis
- In silico analysis of genome-wide association studies (GWAS) uses external resources for additional analysis, enabling fine-mapping of SNPs and exploring their biological functions and pathways.
Question (Slide 29)
- This question asks for an explanation of each step in the GWAS experimental process, suitable to be answered in a short discussion format between students.
Selecting Study Population
- GWAS typically involve very large sample sizes to detect reproducible genome-wide associations.
Selecting Study Population (Cont.)
- The large sample sizes needed for GWAS demand substantial resources, so many studies draw on public resources.
- Sample selection design often depends on the specific research question being explored.
Genotyping
- Microarray-based methods are commonly used for genotyping.
- Whole-genome sequencing (WGS) may become more common in the future as costs fall.
Data Processing: Input Files
- Input files often include anonymized individual IDs, family relationships, demographic parameters (e.g., sex), phenotype (e.g., disease status), covariate data, genotype calls for all analyzed variants, and data on genotyping batches.
Data Processing: Input Files (Cont.)
- Pedigree information and phenotype data (e.g., presence or absence of a specific disease) are essential components for GWAS.
- PLINK is a specialized tool for handling GWAS input and output files, designed with these file formats in mind.
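To make the pedigree and phenotype fields concrete, the sketch below parses one line of a PLINK `.fam` file, which has six whitespace-separated columns: family ID, individual ID, father ID, mother ID, sex, and phenotype. The `Sample` type and function name are hypothetical, and the sketch assumes the binary case/control phenotype coding (`1` = control, `2` = case) rather than a quantitative trait.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str            # "0" means the father is not in the dataset
    mother_id: str            # "0" means the mother is not in the dataset
    sex: Optional[str]        # None when the sex code is unknown
    is_case: Optional[bool]   # None when the phenotype is missing


SEX_CODES = {"1": "male", "2": "female"}
PHENOTYPE_CODES = {"1": False, "2": True}   # 1 = control, 2 = case


def parse_fam_line(line: str) -> Sample:
    """Parse one whitespace-separated line of a PLINK .fam file."""
    fid, iid, father, mother, sex, phenotype = line.split()
    return Sample(fid, iid, father, mother,
                  SEX_CODES.get(sex), PHENOTYPE_CODES.get(phenotype))


# Hypothetical record: a female case with no parents in the dataset.
print(parse_fam_line("FAM1 IND1 0 0 2 2"))
```

Any code outside the two dictionaries (for example `0` or `-9`) is mapped to `None`, so missing values stay explicit rather than being silently treated as controls.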
Data Processing: Quality Control
- Rare variants and variants with genotypes missing in a portion of the cohort are excluded from further analysis.
- Genotyping errors and inconsistencies in phenotype information are also removed to ensure accuracy.
- Checks such as comparing self-reported sex with sex inferred from genotypes are frequently used.
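The filters above can be sketched as two small checks: one on variants (missingness and minor allele frequency) and one on samples (sex concordance). The thresholds, function names, and data layout below are assumptions for illustration; actual cutoffs vary by study.

```python
def variant_passes_qc(genotypes, max_missing_rate=0.05, min_maf=0.01):
    """Keep a variant only if it is well genotyped and not too rare.

    `genotypes` holds one entry per individual: 0, 1, or 2 copies of the
    alternate allele, or None for a missing call. Thresholds are
    illustrative defaults, not recommendations.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate <= max_missing_rate and maf >= min_maf


def sex_mismatches(reported_sex, inferred_sex):
    """Return IDs whose self-reported sex differs from the genotype call."""
    return sorted(
        sample_id
        for sample_id, sex in reported_sex.items()
        if sample_id in inferred_sex and inferred_sex[sample_id] != sex
    )


# Hypothetical data for one variant and two samples.
print(variant_passes_qc([0, 1, 2, 1, 0, 1, 0, 1, 1, 0]))
print(sex_mismatches({"s1": "female", "s2": "male"},
                     {"s1": "female", "s2": "female"}))
```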
Data Processing: Imputation
- Imputation methods fill in missing genotype data using reference panels like 1000 Genomes Project or TOPMed.
- The process involves phasing and statistical inference of missing genotypes from surrounding known information in the dataset.
- Commonly used tools are provided.
Data Processing: Imputation (Cont.)
- Imputation involves several crucial steps: statistically phasing genotypes, selecting an appropriate reference panel, resolving issues in platforms, checking for unusual minor allele frequencies, and lastly, imputing missing genetic data and removing badly imputed data.
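The core idea of imputation, inferring untyped alleles from reference haplotypes that match at the typed sites, can be shown with a deliberately simplified toy. Real imputation tools use probabilistic models over many reference haplotypes; the nearest-haplotype rule below is a swapped-in simplification, and the function name and data are hypothetical.

```python
def impute_from_reference(target, reference_panel):
    """Fill untyped sites in a phased haplotype from the closest reference.

    `target` is a list of alleles (0 or 1), with None at untyped sites.
    `reference_panel` is a list of fully typed haplotypes of equal length.
    This toy copies alleles from the single best-matching reference
    haplotype; it is not how production imputation software works.
    """
    typed_sites = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(reference_haplotype):
        return sum(1 for i in typed_sites
                   if reference_haplotype[i] != target[i])

    best_match = min(reference_panel, key=mismatches)
    return [best_match[i] if allele is None else allele
            for i, allele in enumerate(target)]


# Hypothetical panel of three haplotypes over four sites.
panel = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(impute_from_reference([1, None, 0, None], panel))
```

The toy makes clear why the reference panel should match the study population: if no reference haplotype resembles the target at the typed sites, the copied alleles are unreliable.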
Question (Slide 43)
- This is an open-ended question about the process of imputation in GWAS analyses.
Data Processing: Ancestry Consideration
- Participants' ancestry must be considered in GWAS to avoid false positives from population stratification, which can arise when diverse populations are analyzed together.
- Relatedness among participants must be accounted for as well, in GWAS and in other genetic studies, especially when working with varied populations.
- Principal components analysis (PCA) of individuals across populations helps identify their ancestry.
Data Processing: Ancestry Consideration (cont.)
- An iterative process employing principal component analysis (PCA) is used to consider ancestry.
- PCA helps cluster genetically similar individuals.
- Clusters are used to identify outliers and compute principal components to use as covariates in later GWAS analyses.
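As a minimal sketch of how principal components are obtained from genotypes, the code below centers a small genotype matrix and finds the first component by power iteration. Production analyses use dedicated software and many more variants; the function name, the fixed start vector, and the toy data are assumptions for illustration.

```python
import math


def first_pc_scores(genotypes, iterations=50):
    """Score each individual on the first principal component.

    `genotypes` has one row per individual and one column per variant
    (0, 1, or 2 copies of an allele). Columns are mean-centered and the
    leading direction is found by power iteration. The sign of a principal
    component is arbitrary; here it follows from the start vector. The
    first variant must vary across individuals, otherwise the norm is 0.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(n_variants)]
    centered = [[row[j] - means[j] for j in range(n_variants)]
                for row in genotypes]

    direction = [1.0] + [0.0] * (n_variants - 1)
    for _ in range(iterations):
        scores = [sum(x * d for x, d in zip(row, direction))
                  for row in centered]
        direction = [sum(centered[i][j] * scores[i] for i in range(n))
                     for j in range(n_variants)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]

    return [sum(x * d for x, d in zip(row, direction)) for row in centered]


# Hypothetical genotypes: three individuals from each of two groups.
toy_genotypes = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],
    [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1],
]
print(first_pc_scores(toy_genotypes))
```

In this toy dataset the two groups fall on opposite sides of zero on the first component. Scores of this kind are what get passed to the association model as ancestry covariates.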
Data Processing: Testing for Association
- Linear models are used for continuous phenotypes (e.g., height, blood pressure).
- Logistic regression models are used for binary phenotypes (e.g., disease presence).
- Demographic factors and ancestry are explicitly accounted for as covariates to reduce the risk of confounding.
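For a continuous phenotype, the per-variant test amounts to regressing the phenotype on the allele count. The sketch below fits a single-predictor linear regression by least squares; it omits the covariates mentioned above for brevity and uses a normal approximation for the p-value, both of which are simplifying assumptions. The function name and data are hypothetical.

```python
import math


def linear_association(genotypes, phenotypes):
    """Regress a continuous phenotype on allele count (0, 1, or 2).

    Returns the effect size (slope), its standard error, and a two-sided
    p-value from a normal approximation, which is only reasonable for
    large samples. Covariates such as age, sex, and ancestry components
    are left out of this sketch.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))

    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    residual_ss = sum((y - intercept - beta * g) ** 2
                      for g, y in zip(genotypes, phenotypes))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)

    z = beta / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, standard_error, p_value


# Hypothetical data: the phenotype rises by about one unit per allele.
beta, se, p = linear_association([0, 0, 1, 1, 2, 2],
                                 [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(beta, se, p)
```

For a binary phenotype the same structure applies with a logistic model in place of the linear one.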
Data Processing: Testing for Association (cont.)
- Adjustment for confounding factors such as age, sex, and ancestry is usually built into the GWAS analysis because these factors are correlated with the phenotype and can otherwise distort the association.
- Linkage disequilibrium must also be taken into account to ensure accurate results: physically close variants are usually correlated, and analyses control for this.
Data Processing: Accounting for False Discovery
- When many genetic variants are tested, it is essential to correct for multiple testing to avoid spurious associations.
- The most frequently employed approach to accounting for multiple testing is to set a more stringent threshold using a Bonferroni correction, dividing the typical 0.05 threshold by the total number of tests. This helps limit false discoveries.
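The Bonferroni calculation is a single division. The sketch below assumes roughly one million independent tests, a conventional approximation that is not stated in the notes; with that assumption, the usual 0.05 threshold becomes $5 \times 10^{-8}$.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumed number of independent tests (a conventional approximation).
ASSUMED_INDEPENDENT_TESTS = 1_000_000

threshold = bonferroni_threshold(0.05, ASSUMED_INDEPENDENT_TESTS)
print(threshold)
```

Because nearby variants are correlated, the number of effectively independent tests is smaller than the number of variants tested, which is why a fixed threshold is often used in place of an exact count.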
GWAS Results: Summary Statistics
- GWAS summary statistics include association tests' p-values, effect sizes, and directions for reported traits or phenotypes of interest.
GWAS Results: Visualization
- Manhattan plots and quantile-quantile plots (QQ-plots) are used to represent GWAS data visually.
- Visualization tools enable visual inspection of possible spurious associations, patterns, or unusual genetic locations.
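A QQ-plot compares observed p-values with those expected if no variant were associated. The sketch below computes the plotted pairs on the -log10 scale; the plotting position $(i + 0.5)/n$ is one common convention and an assumption here, as are the function name and example values.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs for a QQ-plot.

    Under the null hypothesis p-values are uniform on (0, 1), so the
    i-th smallest of n p-values is expected near (i + 0.5) / n.
    """
    n = len(p_values)
    observed = sorted(p_values)
    points = []
    for i, p in enumerate(observed):
        expected_p = (i + 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical p-values; one is much smaller than expected by chance.
for expected, observed in qq_points([0.9, 0.4, 0.6, 0.001]):
    print(round(expected, 3), round(observed, 3))
```

Points that follow the diagonal are consistent with the null, a lift-off only at the far end suggests true associations, and a departure along the whole curve suggests a systematic problem such as uncorrected population stratification.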
Question (Slide 57)
- This question is about GWAS summary results resources and characteristics.
PLINK Analysis Toolset
- PLINK is a software suite used for whole-genome association analysis.
Description
This quiz focuses on the principles and methodologies related to genetic association testing and the quality control processes involved in the workflow. It covers topics such as genotypic data collection, the use of biobanks, and challenges in interpreting genetic associations across different ancestries. Test your understanding of these key concepts in genetics.