Genomics Research PDF

Genomic analysis
Identification, measurement, or comparison of genomic features
• e.g., DNA sequence, structural variation, gene expression, or regulatory and functional element annotation
Methods:
❑ High-throughput sequencing
❑ Microarray hybridization
❑ Bioinformatics

High-throughput sequencing (NGS)
❑ Huge amount of data (terabytes)
❑ Analysis is computationally intensive
❑ Requires dedicated IT infrastructure

Next generation sequencing (NGS)
• Based on massively parallel sequencing
❑ Sanger sequencing: 384 samples/batch; NGS: ~10^9 samples/batch!
• Requires complex computer algorithms to line up the reads
• Subject to error if a given region is not read multiple times (depth of sequencing)

Genome annotation
Process of attaching biological information to sequences
❑ Newly sequenced genomes include many genes about which little or nothing is known
2 main steps:
1. Identifying elements on the genome (structural annotation)
2. Attaching biological information to these elements (functional annotation)

Identifying genes is still a challenge, more than a decade after the completion of the Human Genome Project
The most recent human reference genome, which geneticists have used since 2013, still lacks 8% of the full sequence
https://www.nature.com/articles/d41586-022-00726-y

Challenges understanding genetic information
Genetic Information → Molecular Structure → Biochemical Function → Phenotype
• Genetic information is redundant
• Structural information is redundant
• Genes and proteins are meta-stable
• Single genes have multiple functions
• Genes are 1D but function depends on 3D structure
Also:
✓ Intron-exon variation
✓ Alternative splicing
✓ Strain variations (SNPs)
✓ Sequencing errors

Comparative genomics
• Comparison of complete genome sequences of different species
• Used to pinpoint regions of similarity and difference
• Can answer questions like:
❑ How has the organism evolved?
❑ What differentiates species?
❑ Which non-coding regions are important?
❑ Which genes are required for organisms to survive in a certain environment?

Types of polymorphisms
1. Single Nucleotide Polymorphism (SNP)
❑ Any single base substitution, e.g., from AAGGCT to ATGGCT
❑ Most abundant type of genetic variation in the human genome
2. Copy Number Variation (CNV)
❑ Segments of DNA that are found in different numbers of copies among individuals
❑ Substantial regions, not single nucleotides
❑ Analyzed via array CGH
❑ Diagram: segment arrangements A B C; A C; A B B B C

Exploring the human genome
Timeline: 2002, Sanger sequencing and targeted genotyping; 2008, genome-wide genotyping (GWAS); exome sequencing; genome sequencing

International HapMap Project
Aimed to define patterns of genetic variation across the human genome; tested the following populations:
❑ CEU: CEPH (Utah residents with ancestry from northern and western Europe) (30 trios)
❑ CHB: Han Chinese in Beijing, China (45 individuals)
❑ JPT: Japanese in Tokyo, Japan (45 individuals)
❑ YRI: Yoruba in Ibadan, Nigeria (30 trios)
Guides the efficient selection of SNPs to “tag” common variants
The HapMap was constructed in three steps:
1. SNPs are identified in DNA samples from multiple individuals
2. Adjacent SNPs that are inherited together are compiled into "haplotypes"
3. "Tag" SNPs within haplotypes are identified that uniquely identify those haplotypes

Genomics and human migration patterns
• Haplogroup = group of people sharing similar SNPs
• Different haplogroups are associated with different geographic locations, e.g., Africa, Asia, the Americas, Europe
• It is possible to trace migration routes by observing the branching points in an ancestral map containing all known haplogroups

1000 Genomes Project
• Whole genome sequencing
• Complete description of human genetic diversity in >1000 individuals from multiple populations
• Aims to extend and refine the HapMap catalog
• Goal: identify gene variants associated with disease susceptibility

2012: 100,000 Genomes Project
• UK NHS-based project
• Focus is on rare diseases, some common types of cancer, and infectious diseases

Personal genomics
• Process of deducing a person's entire genetic code
• Employs SNP analysis or partial or full genome sequencing
• 1st person to have a personal genome sequenced = James Watson
❑ $2 million, 2 months to finish
• Watson's genome was deposited in a public database

Human disease: a consequence of variation
Genetic variation is responsible for the adaptive changes that underlie evolution
Some changes improve the fitness of a species
Other changes are maladaptive
❑ May represent disease
Molecular perspective: mutation and variation
Medical perspective: pathological condition

Genome wide association studies (GWAS)
Goal: find connections between:
1. A heritable phenotype, e.g., height, type 1 diabetes, etc.
2. Whole-genome genotype
Specific goals are distinct:
❑ Make hypotheses for genotype-phenotype correlations
❑ Generate insights on the genetic architecture of the phenotype
➢ Many small genetic effects dispersed across the genome?
❑ Build statistical models to predict phenotype from genotype
“Show me your genome and I will tell you what diseases you will get”

Genome wide association studies (GWAS)
• Involves scanning markers (e.g., SNPs) across the genomes of many people to find genetic variations associated with a particular disease
• Associations identified can be used to develop better strategies to detect, treat, or prevent the disease
• SNP chip comparison of a control population with a disease population, e.g., compare SNPs in people who have high blood pressure with SNPs of people who do not

Using SNPs to track predisposition to disease
Tools:
• Databases that contain the reference human genome sequence
• Map of human genetic variation
• Technologies that can quickly and accurately analyze whole-genome samples for genetic variations that contribute to the onset of a disease
© Gibson & Muse, A Primer of Genome Science

GWAS methodology
• Collect phenotypic information from thousands of individuals
• Extract DNA; genotype at least 500,000 SNPs
• Label genotypes and detect association using software
➢ Chip-based microarray technology can assay millions of SNPs
• Analyze results; target identification

Genotyping chip
Affymetrix 100k chip set
❑ Entire genome with 100k SNPs (low density)
Affymetrix 500k chip (SNP array 5.0)
❑ Entire genome with 500k SNPs (high density)
Affymetrix 1M chip (SNP array 6.0)
❑ Entire genome with 1M SNPs (very high density)

GWAS Catalog
The NHGRI-EBI Catalog of published genome-wide association studies
Examples: breast cancer, rs7329174, Yang, 2q37.1, HBS1L, 6:16000000-25000000
https://www.ebi.ac.uk/gwas/

“Personalized medicine” or precision medicine
Genetic information regarding common disease can lead to improvements in
❑ diagnostics
❑ therapeutics
❑ preventive approaches
➢ Individualized approach to patients
➢ Can change patients’ behaviors in ways that lead to improved health

GWAS vs. study of “single gene” disorders
• Many genes, many SNPs
❑ ~25,000 genes, many can be candidates
❑ ~12,000,000 SNPs
• From large effects of single genes in rare, “single-gene” diseases to smaller effects of multiple genes in common, “complex” diseases
• An archive of data from genome-wide association studies on a variety of diseases and conditions can already be accessed through an NCBI website
❑ Database of Genotypes and Phenotypes (dbGaP), located at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap

Some GWAS findings do not explain heritability
• Height:
❑ From twin and family studies, about 80% of height variability is heritable
❑ Huge height GWAS (n > 40K) found SNPs explaining ~10% of height variability
• Diseases: schizophrenia, heart disease, cancers, ...
❑ Heritability: 30%-80%
❑ For none of these does GWAS explain more than 5%-10%
• Basically, for all complex traits investigated, a major gap remains!

Where is the missing heritability? Theories
1. Rare variants not covered by GWAS: every family has its own mutation, e.g., BRCA
2. Complex associations/epistasis: combinations of SNPs
❑ Problem: 10^6 SNPs give ~10^12 pairs
3. Lack of power: the effects are weak, so we need much more data
❑ Or statistical approaches that aggregate more smartly
4. Epigenetic effects: heritability is not in the genome at all

Pharmacogenomics
Branch of pharmacology that deals with the influence of genetic variation on drug response
❑ e.g., differential response of drug transporters, drug-metabolizing enzymes, drug receptors
Aims to predict which drugs will be most effective and safe for an individual based on genome sequence/expression profile
❑ Personalized treatment!
• Drugs don’t have the same efficacy in all patients
❑ A US study reports that 6.7% may have adverse drug reactions while 0.32% have fatal reactions
• SNPs → alter protein → decrease drug binding → drug inefficacy
❑ e.g., asthma patients have differential response to steroids due to SNPs in the GLCCI1 gene

6-MP = purine analog; interferes with growth of cancer cells
❑ Used for treating acute lymphoblastic leukemia
Thiopurine S-methyltransferase (TPMT) activity affects 6-MP drug efficacy
Eichelbaum et al., Annu. Rev. Med. 2006. 57:119-137

Cytochrome P450 enzymes CYP2A6, CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4 are responsible for metabolizing most clinically important drugs

Personalized genomic medicine essentially captures the idea that each person’s individual genome sequence will eventually be part of their own medical care
Diagram: sample collection; access to patient’s genome; testing (sequencing, gene chips)

Implications for biomedicine
Physicians will use genetic information to diagnose and treat disease
❑ Virtually all medical conditions have a genetic component
Faster drug development research (pharmacogenomics)

Personal genetics with 23andMe: risk alleles
Hemochromatosis = inherited condition that causes you to absorb too much iron from foods
• The most common genetic disease among Caucasians

Human genetic diversity as basis for identification
• Any two individuals differ in about 3 x 10^6 bases (0.1% of the genome)
• The total human population is now about 6 x 10^9 → no two people, save for identical twins, have exactly the same DNA sequence!
• An individual’s genetic profile can be used as a basis for precise identification
➢ DNA fingerprinting

How is DNA fingerprinting done?
• DNA can be obtained from blood, bone, hair, and other body tissues and products.
• Forensic scientists scan DNA regions (markers) that vary from person to person
➢ STRs (short tandem repeats)
➢ VNTRs (variable number tandem repeats)

DNA fingerprinting in forensics
• First developed in the mid-1980s
• DNA fingerprinting is now accepted in most courts in the United States and other countries
• The FBI uses a standard set of specific STR regions
• The odds that two individuals will have the same DNA profile are about one in one billion!
• In several instances, it has been used to exonerate or free persons convicted of crimes
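
One way to see how a figure on the order of one in a billion can arise from a set of STR markers is the product rule: if the markers are inherited independently, the chance that a random person matches at every marker is the product of the per-marker genotype frequencies. The sketch below is illustrative only. The function name random_match_probability and the frequency values are assumptions made for this example, not real population data or the FBI's actual procedure.

```python
from math import prod


def random_match_probability(genotype_freqs):
    """Multiply per-locus genotype frequencies (product rule).

    Assumes the loci are inherited independently. Each value is the
    fraction of the population sharing the observed genotype at one locus.
    """
    for freq in genotype_freqs:
        if not 0.0 < freq <= 1.0:
            raise ValueError("each frequency must be in (0, 1]")
    return prod(genotype_freqs)


# Hypothetical frequencies for 9 STR loci (illustrative values, not real data).
example_freqs = [0.1] * 9
rmp = random_match_probability(example_freqs)
print(f"random match probability: {rmp:.2e}")
```

With these made-up values, nine markers that each match one person in ten already bring the combined probability down to roughly the scale quoted above, which is why a profile built from several STR regions is far more discriminating than any single marker.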
