Genomics PDF
Summary
This document introduces genomics and its applications. It discusses genomics (DNA), transcriptomics (RNA), and proteomics (proteins), and includes details about genomic analysis methods such as high-throughput sequencing, microarray hybridization, and bioinformatics. It also covers next-generation sequencing (NGS) and presents examples of sequenced genomes.
Full Transcript
Genomics and its applications

Constant genotype, variable phenotype
Variable: morphology, physiology, behaviour, ecology

Genomic analysis
Identification, measurement or comparison of genomic features
• e.g., DNA sequence, structural variation, gene expression, or regulatory and functional element annotation
Methods:
❑ High-throughput sequencing
❑ Microarray hybridization
❑ Bioinformatics

High-throughput sequencing (NGS)
❑ Huge amount of data (terabytes)
❑ Analysis computationally intensive
❑ Dedicated IT infrastructure

Next generation sequencing (NGS)
• Based on massively parallel sequencing
❑ Sanger sequencing: 384 samples/batch; NGS: ~10^9 samples/batch!
• Requires complex computer algorithms to line up the reads
• Subject to error if a single region is not read multiple times (depth of sequencing)
Read more: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost

Next generation sequencing (NGS) techniques
Feature | 454 Sequencing | Illumina/Solexa | ABI SOLiD
Sequencing chemistry | Pyrosequencing | Polymerase-based sequence-by-synthesis | Ligation-based sequencing
Amplification approach | Emulsion PCR | Bridge amplification | Emulsion PCR
Paired end (PED) separation | 3 kb | 200-500 bp | 3 kb
Mb per run | 100 Mb | 1300 Mb | 3000 Mb
Time per PED run | <0.5 day | 4 days | 5 days
Read length (update) | 250-400 bp | 35, 75 and 100 bp | 35 and 50 bp
Cost per run | $8,438 USD | $8,950 USD | $17,447 USD
Cost per Mb | $84.39 USD | $5.97 USD | $5.81 USD

Sequenced genomes: mammals
http://www.genome.jp/kegg/catalog/org_list.html
https://www.yourgenome.org/facts/timeline-organisms-that-have-had-their-genomes-sequenced/
Homo sapiens (human)
Pan troglodytes (chimpanzee)
Pan paniscus (bonobo)
Gorilla gorilla gorilla (western lowland gorilla)
Pongo abelii (Sumatran orangutan)
Nomascus leucogenys (northern white-cheeked gibbon)
Macaca mulatta (rhesus monkey)
Macaca fascicularis (crab-eating macaque)
Callithrix jacchus (white-tufted-ear marmoset)
Mus musculus (mouse)
Rattus norvegicus (rat)
Cricetulus griseus (Chinese hamster)
Heterocephalus glaber (naked mole rat)
Tupaia chinensis (Chinese tree shrew)
Canis familiaris (dog)
Panthera tigris altaica (Amur tiger)
Bos taurus (cow)
Bos mutus (wild yak)
Pantholops hodgsonii (chiru)
Capra hircus (goat)
Ovis aries (sheep)
Sus scrofa (pig)
Camelus ferus (wild Bactrian camel)
Balaenoptera acutorostrata scammoni (minke whale)
Lipotes vexillifer (Yangtze River dolphin)
Equus caballus (horse)
Myotis brandtii (Brandt's bat)
Myotis davidii
Pteropus alecto (black flying fox)
Monodelphis domestica (opossum)
Sarcophilus harrisii (Tasmanian devil)
Ailuropoda melanoleuca (giant panda)
Ursus maritimus (polar bear)
Felis catus (domestic cat)

Fig. 1. Variation in taxonomic richness and genome availability, quality, and assembly size across kingdom Animalia in GenBank (as of 28 June 2021)
Toward a genome sequence for every animal: Where are we now? PNAS 2021 Vol. 118 No. 52 e2109019118
Genome availability for kingdom Animalia versus taxonomic descriptions and over time

MISSION
A deeper understanding and judicious application of advanced knowledge and emerging technologies in genomics and bioinformatics in health and medicine, agriculture, biodiversity, forensics and ethnicity, industry and the environment for the benefit of Filipinos and the rest of humanity.

VISION
A center of excellence in gene discovery and genomics research that effectively translates knowledge into applications beneficial to Philippine society.
GOALS
❑ Implement and promote a research program-driven agenda on identified priority areas of national need and of competitive advantage in order to achieve a leading position in the country, region, and in the world;
❑ Train the country's future scientists, researchers and experts in genomics and bioinformatics;
❑ Promote links between academic research, government and private industries for the development of genome-based applications;
❑ Provide access to state-of-the-art tools for genomic research and bioinformatics in order to strengthen the academic and research infrastructure of the country.

Genome annotation
Process of attaching biological information to sequences
❑ Newly sequenced genomes include many genes about which little or nothing is known
2 main steps:
1. Identifying elements on the genome (structural annotation)
2. Attaching biological information to these elements (functional annotation)

Identifying genes is still a challenge, more than a decade after the completion of the human genome project
The most recent human genome, which geneticists have used as a reference since 2013, still lacks 8% of the full sequence
https://www.nature.com/articles/d41586-022-00726-y

Challenges understanding genetic information
[Diagram labels: Genetic Information, Molecular Structure, Biochemical Function, Phenotype]
• Genetic information is redundant
• Structural information is redundant
• Genes and proteins are meta-stable
• Single genes have multiple functions
• Genes are 1D but function depends on 3D structure
Also:
✓ Intron-exon variation
✓ Alternative splicing
✓ Strain variations (SNPs)
✓ Sequencing errors

Comparative genomics
• Comparison of complete genome sequences of different species
• Used to pinpoint regions of similarity and difference
• Can answer questions like:
❑ How has the organism evolved?
❑ What differentiates species?
❑ Which non-coding regions are important?
❑ Which genes are required for organisms to survive in a certain environment?

Types of polymorphisms
1. Single Nucleotide Polymorphism (SNP)
❑ Any single base substitution, e.g., from AAGGCT to ATGGCT
❑ Most abundant type of genetic variation in the human genome
2. Copy Number Variation (CNV)
❑ Segments of DNA that are found in different numbers of copies among individuals
❑ Substantial regions, not single nucleotides
❑ Analyzed via array CGH
[Diagram: segment arrangements A B C, A C, and A B B B C]

When DNA sequences on a part of chromosome 7 from two random individuals are compared, two SNPs occur in about 2,200 nucleotides.

Exploring the human genome
2002: Sanger sequencing, targeted genotyping
2008: Genome-wide genotyping (GWAS)
Exome sequencing
Genome sequencing

International HapMap Project
Aimed to define patterns of genetic variation across the human genome; tested the following populations:
❑ CEU: CEPH (Utah residents with ancestry from northern and western Europe) (30 trios)
❑ CHB: Han Chinese in Beijing, China (45 individuals)
❑ JPT: Japanese in Tokyo, Japan (45 individuals)
❑ YRI: Yoruba in Ibadan, Nigeria (30 trios)
Guides the efficient selection of SNPs to “tag” common variants
The HapMap was constructed in three steps:
1. SNPs are identified in DNA samples from multiple individuals
2. Adjacent SNPs that are inherited together are compiled into "haplotypes"
3. "Tag" SNPs within haplotypes are identified that uniquely identify those haplotypes

Genomics and human migration patterns
• Haplogroup = group of people sharing similar SNPs
• Different haplogroups are associated with different geographic locations, e.g., Africa, Asia, the Americas, Europe
• Possible to trace migration routes by observing the branching points in an ancestral map containing all known haplogroups

1000 Genomes Project
• Whole genome sequencing
• Complete description of human genetic diversity in >1000 individuals from multiple populations
• Aims to extend and refine the HapMap catalog
• Goal: identify gene variants associated with disease susceptibility

2012: 100,000 Genomes Project
• UK NHS-based project
• Focus is on rare diseases, some common types of cancer, and infectious diseases
About the 100,000 Genomes Project

Personal genomics
• Process of deducing a person's entire genetic code
• Employs SNP analysis or partial or full genome sequencing
• 1st person to have personal genome sequenced = James Watson
❑ $2 million, 2 months to finish
• Watson's genome deposited in a public database

Human disease: a consequence of variation
Genetic variation is responsible for the adaptive changes that underlie evolution
Some changes improve the fitness of a species
Other changes are maladaptive
❑ May represent disease
Molecular perspective: mutation and variation
Medical perspective: pathological condition

Genome wide association studies (GWAS)
Goal: Find connections between:
1. A heritable phenotype, e.g., height, type-I diabetes, etc.
2. Whole-genome genotype
Specific goals are distinct:
❑ Make hypotheses for genotype-phenotype correlations
❑ Generate insights on genetic architecture of phenotype
➢ Many small genetic effects dispersed across genome?
❑ Build statistical models to predict phenotype from genotype
“Show me your genome and I will tell you what diseases you will get”

Genome wide association studies (GWAS)
• Involves scanning markers (e.g., SNPs) across genomes of many people to find genetic variations associated with a particular disease
• Associations identified can be used to develop better strategies to detect/treat/prevent the disease
[Figure labels: Control Population, Disease Population, SNP chip]
e.g., compare SNPs in people who have high blood pressure with SNPs of people who do not

Using SNPs to track predisposition to disease
Tools:
• Databases that contain the reference human genome sequence
• Map of human genetic variation
• Technologies that can quickly and accurately analyze whole-genome samples for genetic variations that contribute to the onset of a disease
SNPIA - protective SNP
SNPIG - disease-causing SNP
© Gibson & Muse, A Primer of Genome Science

GWAS methodology
• Collect phenotypic information from thousands of individuals
• Extract DNA; get genotype of at least 500,000 SNPs
• Label genotypes and detect association using software
➢ Chip-based microarray technology can assay millions of SNPs
• Analyze results; target identification

Genotyping chip
Affymetrix 100k chip set
❑ Entire genome with 100k SNPs (low density)
Affymetrix 500k chip (SNP array 5.0)
❑ Entire genome with 500k SNPs (high density)
Affymetrix 1M chip (SNP array 6.0)
❑ Entire genome with 1M SNPs (very high density)

GWAS Catalog
The NHGRI-EBI Catalog of published genome-wide association studies
Examples: breast cancer, rs7329174, Yang, 2q37.1, HBS1L, 6:16000000-25000000
https://www.ebi.ac.uk/gwas/

• Use of genetic information regarding common disease can lead to “personalized medicine”
➢ Improvements in diagnostic, therapeutic, and preventive approaches
➢ Individualized approach to patients
➢ Can change patients’ behaviors in ways that lead to improved health

Studies on the prediction of complex diseases using multiple genes
Disease | Genetic variants
Age-related macular degeneration | CFH Y402H, CFH rs1410996, LOC387715 A69S, C2-CFB
Coronary heart disease | UCP2 G(-866)A, APOE e2/3/4, LPL D9N, APOA4 T347S
Coronary heart disease | AGT T4072C, ACE I/D, AGTR1 A1166C, CYP11B2 C(344)T, ADD1 G614T, GNB3 C825T
Hypertriglyceridemia | APOA5 S19W, APOA5 T(-1131)C, APOE e2/3/4, GCKR rs780094, TRIB1 rs17321515, TBL2/MLXIPL rs17145738, GALNT2 rs4846914
MI after surgery | IL6 G572C, ICAM1 K469E, SELE G98T
Systemic lupus erythematosus | PXK rs6445975, HLA region rs3131379 and rs9275572, IRF5/TNPO3 rs12537284, KIAA1542 rs4963128, ITGAM rs9888739
Type 2 diabetes | KCNJ11 G23L, PPARG P12A, TCF7L2 rs7903146
Type 2 diabetes | GCK G(–30G)A, IL6 G(–174)C, TCF7L2 rs7903146
Type 2 diabetes | SNPs in TCF7L2, 2 in CDKN2A/2B, KCNJ11, PPARG, ADAM30/NOTCH2, IGF2BP2, FTO, CDKAL1, SLC30A8, TSPAN8//LGR5, CDC123, WFS1, TCF2, ADAMTS9, HHEX, THADA, JAZF1
Type 2 diabetes | SNPs in TCF7L2, 2 in CDKN2A/2B, KCNJ11, PPARG, ADAM30/NOTCH2, IGF2BP2, FTO, CDKAL1, SLC30A8, TSPAN8//LGR5, CDC123, WFS1, TCF2, ADAMTS9, HHEX, THADA, JAZF1

Potential of GWAS
Genetic information regarding common disease can lead to improvements in
❑ diagnostics
❑ therapeutics
❑ preventive approaches
“Personalized medicine” or precision medicine
➢ Individualized approach to patients
➢ Can change patients’ behaviors in ways that lead to improved health

23andMe FAQ: Genotyping vs. Sequencing

GWAS vs. study of “single gene” disorders
• Many genes, many SNPs
❑ ~25,000 genes, many can be candidates
❑ ~12,000,000 SNPs
• From large effects of single genes in rare, “single-gene” diseases to smaller effects of multiple genes in common, “complex” diseases
• An archive of data from genome-wide association studies on a variety of diseases and conditions can already be accessed through an NCBI Web site
❑ Database of Genotype and Phenotype (dbGaP) located at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap

Genetic spectrum of complex diseases
[Figure labels: Linkage, Sequencing, GWAS]

Some GWAS findings do not explain heritability
• Height:
❑ From twin and family studies, about 80% of height variability is heritable
❑ Huge height GWAS (n > 40K) found SNPs explaining ~10% of height variability
• Diseases: schizophrenia, heart disease, cancers, …
❑ Heritability: 30%-80%
❑ For none of these does GWAS explain more than 5%-10%
• Basically, for all complex traits investigated, a major gap remains!

Where is the missing heritability? Theories
1. Rare variants not covered by GWAS: every family has its own mutation, e.g., BRCA
2. Complex associations/epistasis: combinations of SNPs
❑ Problem: 10^6 SNPs is 10^12 pairs
3. Lack of power: the effects are weak, we need much more data
❑ Or statistical approaches that aggregate more smartly
4. Epigenetic effects: heritability is not in the genome at all

Pharmacogenomics
Branch of pharmacology which deals with the influence of genetic variation on drug response
❑ e.g., differential response of drug transporters, drug-metabolizing enzymes, drug receptors
Aims to predict which drugs will be most effective and safe for an individual based on genome sequence/expression profile
❑ personalized treatment!
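The pair count quoted under the epistasis theory above (about a million SNPs giving on the order of a trillion pairs) can be checked with a few lines of arithmetic. This is a minimal sketch: the round figure of a trillion corresponds to ordered pairs, while the number of distinct unordered pairs is about half of that. The 0.05 significance level and the Bonferroni-style division are illustrative assumptions, not something the slides specify.

```python
from math import comb

n_snps = 10**6                      # roughly the SNP count quoted in the slides

ordered_pairs = n_snps ** 2         # every (i, j) combination, including i == j
distinct_pairs = comb(n_snps, 2)    # unordered pairs of two different SNPs

# Illustrative assumption: testing every distinct pair at an overall 0.05 level
# with a Bonferroni-style correction gives this per-test threshold.
alpha = 0.05
per_test_threshold = alpha / distinct_pairs

print(f"{ordered_pairs:.1e} ordered pairs")
print(f"{distinct_pairs:,} distinct pairs")
print(f"per-test threshold ~ {per_test_threshold:.1e}")
```

Either way of counting makes the same point as the slide: an exhaustive pairwise search multiplies the number of tests by several hundred thousand, which ties the epistasis theory to the lack-of-power theory listed next to it.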
• Drugs don’t have the same efficacy in all patients
❑ A US study reports that 6.7% may have adverse drug reactions while 0.32% have fatal reactions
• SNPs → alter protein → decrease drug binding → drug inefficacy
❑ e.g., asthma patients have differential response to steroids due to SNPs in the GLCCI1 gene

Applications of pharmacogenomics in medical treatment

Pharmacogenetics: A Case Study
6-MP = purine analog; interferes with growth of cancer cells
❑ Used for treating acute lymphoblastic leukemia
Thiopurine S-methyltransferase (TPMT) activity affects 6-MP drug efficacy
Eichelbaum et al., Annu. Rev. Med. 2006. 57:119-137

Cytochrome P450 enzymes CYP2A6, CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4 are responsible for metabolizing most clinically important drugs

Effect of metabolic rate on drug dosage
© 2006 American Medical Association. All rights reserved

Personalized genomic medicine essentially captures the idea that each person’s individual genome sequence will eventually be part of their own medical care
[Figure labels: Sample Collection; Access to patient’s genome; Testing: Sequencing, Gene chips]

Implications for biomedicine
Physicians will use genetic information to diagnose and treat disease
❑ Virtually all medical conditions have a genetic component
Faster drug development research (pharmacogenomics)

Precision Medicine: some public initiatives

Personal Genomics companies
[Screenshot text] ANCESTRY FEATURES: Know your personal story, in a whole new way.
• Discover where in the world your DNA is from across 2000+ regions, in some cases down to the county level.

“My Genome, Myself: Seeking Clues in DNA”, A. Harmon, New York Times, Nov 17, 2007

Personal genetics with 23andMe: risk alleles
Hemochromatosis = inherited condition that causes you to absorb too much iron from foods
• The most common genetic disease among Caucasians

A journey through a Filipino genome
By MICHAEL PURUGGANAN, PhD, August 10, 2011
“Armed with data on Filipino genomes, we can find out what genetic disorders are common in our people and which are rare, develop new diagnoses, and maybe find some life-saving cures”
“My 22 chromosomes, showing genetic mixing from Asia and Europe.”
• 1 color: comes from either Europe or Asia
• 2 colors: mixture from both continents

https://newsinfo.inquirer.net/1689793/ghosts-that-shape-who-we-are-the-ph-genome-project
The Filipino Genome Research Project (FGRP) is a three-tiered, P173M project led by the NSRI’s DNA Analysis Laboratory and supported by DOST-PCHRD
Collecting saliva samples and other data (like physical traits and genealogy) from 2,870 adult volunteers from each of the country’s 17 regions and 24 ethnolinguistic groups
➢ From the Ibaloi of Benguet to the Jama Mapun of Tawi-Tawi

Personal genomics: issues
Full genome sequencing provides a large amount of information, e.g., carrier status for autosomal recessive disorders, genetic risk factors for complex adult-onset diseases
Possibility of getting unwanted information, e.g., may find out that one has the genetic variant for a largely untreatable or unpreventable disease, such as Alzheimer's
Possibility of genetic discrimination when applying for a job or trying to get health insurance

Personalized medicine
• Are patients ready for this?
• Are healthcare professionals skilled enough and ready to communicate about it with their patients?
• With personalized medicine, there may be more, or even different, information about the available treatment options for the patient and doctor to understand and discuss.
• Patients who find this difficult will need good support from their doctors.
Precision medicine: challenges
Cost: $500 for the lite version, $3,000 for the whole genome
Privacy: What if WikiLeaks releases it to the world? Can providers sell information to companies?
Insurance: The Genetic Information Nondiscrimination Act (GINA) protects against workplace and health insurance discrimination only
Anxiety: Will I learn something I don’t want to know? And will there be anything I can do about it?
Consent: How can I provide informed consent to something I don’t really understand? Can I trust my doctor to help me decide?

Human genetic diversity as basis for identification
• Any two individuals differ in about 3 x 10^6 bases (0.1% of the genome)
• The total human population is now about 6 x 10^9 → no two people, save for identical twins, have exactly the same DNA sequence!
• An individual’s genetic profile can be used as a basis for precise identification
➢ DNA fingerprinting

How is DNA fingerprinting done?
• DNA can be obtained from blood, bone, hair, and other body tissues and products.
• Forensic scientists scan DNA regions (markers) that vary from person to person
➢ STRs (short tandem repeats)
➢ VNTRs (variable number tandem repeats)

DNA fingerprinting in forensics
• First developed in the mid-1980s
• DNA fingerprinting is now accepted in most courts in the United States and other countries
• The FBI uses a standard set of specific STR regions
• The odds that two individuals will have the same DNA profile are about one in one billion!
• In several instances, it has been used to exonerate or free persons convicted of crimes
Adapted from Promega PowerPlex® Fusion System Technical Manual.
The STR loci and the chromosomal location used to confirm the identity of DNA profiles in the Promega Fusion System
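
The STR-based comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a forensic method: a profile is modelled as a mapping from locus name to a pair of repeat counts, and the rarity of a profile is estimated with the product rule, which multiplies per-locus genotype frequencies and assumes the loci are independent. The function names, the allele values and the genotype frequencies below are all hypothetical; the locus names are used only as labels.

```python
from math import prod

def profiles_match(p, q):
    """True when both profiles cover the same loci and carry the same
    unordered allele pair (repeat counts) at every locus."""
    return p.keys() == q.keys() and all(
        sorted(p[locus]) == sorted(q[locus]) for locus in p
    )

def random_match_probability(genotype_freqs):
    """Product rule: multiply per-locus genotype frequencies,
    assuming the loci are inherited independently."""
    return prod(genotype_freqs.values())

# Hypothetical profiles: locus -> (allele 1, allele 2) as repeat counts.
evidence  = {"D3S1358": (15, 17), "vWA": (16, 18), "FGA": (21, 24)}
suspect_a = {"D3S1358": (17, 15), "vWA": (16, 18), "FGA": (21, 24)}
suspect_b = {"D3S1358": (15, 17), "vWA": (14, 18), "FGA": (21, 24)}

# Hypothetical population frequencies of the evidence genotype at each locus.
genotype_freqs = {"D3S1358": 0.08, "vWA": 0.05, "FGA": 0.03}

rmp = random_match_probability(genotype_freqs)
print(profiles_match(evidence, suspect_a))   # allele order within a locus is ignored
print(profiles_match(evidence, suspect_b))   # differs at one locus
print(f"random match probability ~ {rmp:.1e} (about 1 in {1 / rmp:,.0f})")
```

With only three loci the estimate is far from the one-in-a-billion figure quoted in the slides; each additional independent locus multiplies in another small frequency, which is why a standard set of many STR regions is used.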