Summary

This lecture discusses genome variants, mutations, and population genetics.

Full Transcript

Genetics L9: Genome Gene Variants
● Explain why there is no single reference genome that applies to all humans.
A. Variants in the human genome are from population genetics over about 0.5 million years 1. Mutations over animal and human evolution 2. Interbreeding with extinct hominins 3. Human migrations (gene flow) 4. Bottleneck and founder effects on population size lead to genetic drift 5. Environmental selection for specific alleles
B. 1000 Genomes Project consortium analyzed genomic data to develop ethnic haplotypes for use as reference genomes 1. Sequenced genomes from regions across the globe to generate haplotypes (a) Haplotype: genotype inherited from one of the parents (b) Haplotype mostly algorithmically determined but imputed (predicted) in some cases via parent-offspring analysis (c) Homologous recombination is not random on a chromosome; it occurs at specific locations 2. Compared haplotypes to determine variants within and between different human populations 3. Typical variation from the reference genome is about 3 million sites (about 0.1% of the 3.2 billion bp), made up of: (a) Mostly (99.9%) SNPs (suggested nomenclature is SNV) and short indels (i) 1 SNV per 300 – 1000 bp relative to reference (b) 2,100 – 2,500 “structural variants” that are fewer in number but affect more bases (i) 1,000 large deletions (ii) 160 copy number variants (CNVs) (iii) 915 Alu (SINE) insertions (iv) 128 L1 (LINE) insertions (v) 51 SVA insertions [SINE-VNTR-Alus (SVA) are non-autonomous hominid-specific retrotransposons that are associated with disease] 4. Variant number per genome differs between continents and within continents (a) Europe less (i) Finland and Great Britain least (b) Africa more (i) Sierra Leone the greatest
C. “The tremendous variability of human genomes highlights the almost absurd notion of a single reference human genome sequence.” Feng et al., Annu. Rev. Genomics Hum. Genet. 2009. 10:451–81
D. 
More recent whole genome sequencing efforts have focused on more isolated populations for anthropologically informed genomic data 1. Observed many common yet private variants in specific populations 2. Some archaic (Neanderthal and Denisovan) haplotypes still exist in the population (a) Inter-b
E. Current human genome build is Genome Reference Consortium human build 38 (GRCh38) 1. Is from multiple people (20, with 1 contributing 70% of the sequence) but does not represent the most common variants/haplotypes within the “population” 2. Released 12/2013 with current update/patch in 3/19 3. Corrected many sequencing errors and added in synthetic centromeric sequences (a) Centromeres are highly repetitive, making short-read sequencing challenging 4. Includes alternate contigs for common complex variants such as the HLA locus of the adaptive immune system 5. Currently has >300 unresolved issues, mostly related to gaps caused by long sequence repeats 6. Telomere-to-telomere sequencing nearly complete via single-molecule/long-read sequencing
BRCA gene: protein in the DNA repair pathway (mechanisms include non-homologous end joining and homologous recombination); recognizes DNA damage and uses a repair mechanism to try to fix double-strand breaks
Muth: colorectal cancers
● Describe the different types of mutations (variants) that occur, how they are formed, and how they survive in a population.
A. Variations are classified based upon their DNA, RNA, and protein sequence and downstream effects on protein function 1. DNA sequence variants that affect the gene or transcription of the gene 2. mRNA variants resulting from alternative splicing or other processing 3. Protein variants resulting from single or multiple amino acid changes or changes in protein structure
B. DNA sequence variations have two major groups based on sequence length: 1. The single base pair level (SNV/SNP) that also includes insertions and deletions less than 100 bp 2. 
Structural variants at the submicroscopic level (karyotyping) of 100 to Phenomes
C. Single nucleotide level (point mutations) 1. Substitution 2. Insertion 3. Deletion
D. Greatest effect in exons, then introns, then intergenic regions 1. Exon effects of changing the amino acid code or creating a premature stop codon 2. Intron effects greatest at exon/intron junctions and the lariat site, as this modifies splicing (a) Mendel’s white-flower pea results from an intronic SNV at the 3’ splice site that alters splicing of an upstream transcription factor for the anthocyanin synthase gene (b) 8 nts added to an exon, shifting the reading frame and eventually resulting in a premature stop codon and reduced expression of the transcript (c) THM > the recessive mutation that gives white color is in an intron of a transcription factor that normally turns on expression of a pigment-producing enzyme
E. Single nucleotide substitution types 1. Transition (a) Purine <-> purine or pyrimidine <-> pyrimidine change (b) C <-> T, A <-> G (c) More likely than transversion (d) May be silent if in the wobble position 2. Transversions (a) Purine <-> pyrimidine change (b) A > C or T, G > C or T, C > A or G, T > A or G (c) More likely to have phenotypic effect
G. Protein variants are determined by DNA or mRNA sequence variants that result in changes in the amino acid sequence at a single amino acid or domain/exon level via alternative splicing 1. Amino acid sequence changes resulting from single base substitutions (a) Synonymous (silent?): different codon but codes for the same amino acid; the different codon may change protein synthesis rate and folding because of codon bias, or alter splicing, giving a disease (code within the code) (b) Missense: different codon that codes for a different amino acid - Conservative missense: similar amino acid R-group - Non-conservative missense: non-similar amino acid R-group (hydrophilic to hydrophobic, charge changes, cysteines) (c) Nonsense: codon changed to stop codon, typically giving truncated protein or no protein 2. 
Amino acid sequence changes from indels (single and higher bp) (a) 1 – 2 bp indel: frameshift of coding sequence giving a different coding sequence from the insertion site onward, typically gives premature stop codon (b) 3 bp in-frame indel: addition or loss of one amino acid at site (c) 3 bp in-frame indels expand/contract length of protein, generally modifying function (i) Can have long repeating segments inserted (example of Huntingtin and androgen receptor proteins' polyglutamine tract at N-terminus)
● Describe the different types of SNPs and CNVs found in human genomes and how they are classified by genetic authorities.
F. Small insertions and deletions (indels) 1. Phenotypic effect dependent upon location within genome and bp size of indel 2. Greatest potential effect in exons, then introns, then intergenic regions 3. Within exons (intragenic) dependent upon number of bases (a) One to two bp insertion or deletion in exon causes frameshift in sequence and subsequent change in reading frame (i) mRNA generally degraded by nonsense-mediated mRNA decay because of premature stop codon (ii) If translated, amino acid sequence is highly modified 3’ to the indel (b) Three bp indel out of frame has effects similar to 1-2 bp insertion (c) Three bp indel in frame either adds or removes an amino acid (i) May or may not influence protein phenotype 4. Within introns dependent upon location, with splice sites (donor and acceptor) most likely to have an effect 5. In 5’ or 3’ UTR affects translation and mRNA stability 6. Intergenic effects dependent upon location and size (a) Most likely to modulate transcriptional regulation, potentially in a graded fashion (i) May contribute to variable penetrance and/or expressivity in phenome between individuals (b) Small indels in structural regions may have minimal effects (c) Small indels in transcription factor binding sites may have major effects
F. Copy number variations (CNVs) are indels that have variable repeat number and bp lengths of 100 bp to 3 Mbp 1. 
Responsible for most human genome variation on a bp basis 2. Repeat can contain a complete gene or series of genes 3. Phenotypic effects dependent upon size, location, and number of copies (A) Insertion may disrupt existing gene or activate an inactivated gene (B) Duplication/insertion or deletion can change copy number of genes (i) Gene dose effects (C) Position effects by changing gene location within the chromatin landscape of the interphase nucleus and thus transcriptional regulation (D) Duplication/insertion or deletion may disrupt transcriptional regulatory sequences 4. Thought to be important in human evolution and variation between populations (a) Cognition associated with DUF1220 gene expansion (b) Amylase gene expansion in high-starch consuming populations 5. Responsible for or associated with many sporadic and Mendelian diseases (a) Hot area of data mining in relation to mental disorders
● Explain the different effects of variants on phenotype such as loss of function, gain of function, conditional loss of function, neomorphic, dominant negative, and synonymous.
Phenome of mutations may be directly related to amino acid change(s) with the following types: 1. Loss of function mutation: as stated, generally requires homozygous but can be heterozygote if gene dosage is not corrected (a) Null mutations > complete loss of activity (b) Hypomorph > reduced activity (c) Common for autosomal recessive diseases 2. Gain of function mutation: activates formerly inactive gene, increases activity of protein, or gives new function to a protein; resultant phenotype generally gene-dose dependent (a) Hypermorph > increased activity (b) Neomorph > new functionality of protein (c) Common for autosomal dominant and proto-oncogenes > oncogenes 3. Conditional loss or gain of function mutation: mutant phenotype is dependent upon environment (a) Environment-dependent cancer genes 4. 
Dominant negative mutation: mutant gene product (protein or RNA) that affects the wild-type gene product, typically negatively (for example through protein aggregation, protein dimers, or complex protein assemblies) - Toxic proteins such as Huntingtin and androgen receptor - Globin subunit variants for thalassemias and collagen subunit variants for osteogenesis imperfecta
● Describe how variants flanking the open reading frame in mature mRNA can cause disease.
J. UTR variants can affect translation efficiency, mRNA lifetime, and translation location and thus amount and location of protein 1. Main effect is the amount and location of the protein (Goldilocks principle at the table) (a) Too much or too little protein, especially for heteromeric (different subunit) proteins (i) Hemoglobin diseases (thalassemias) 2. 5’-UTR variants primarily impact translation rate (a) Variants in the Kozak sequence for mRNA scanning by the small subunit (b) Variants in the IRES sequence (5’ cap-independent translation) (c) Variants in the uORF (upstream open reading frame) (i) Initiate and terminate protein synthesis upstream of the main gene product (ii) Found in about 50% of coding genes (iii) Can increase or decrease the translation rate of the ORF 3. 3’-UTR variants impact polyadenylation, mRNA lifetime, and transport (a) Variants in the polyadenylation sequence that affect translation initiation and mRNA degradation (b) Variants that affect transport through the cell to sites of translation 4. Variants in the full-length transcript that can affect miRNA binding and hence translation rate and/or mRNA lifetime
Variant classification (via American College of Medical Genetics and Genomics and the Association for Molecular Pathology) has 5 classes 1. 
Pathogenic (a) Sub-classification of very strong, strong, moderate, and supporting based upon predicted and/or observed effects (i) Very strong > loss of protein function variants such as nonsense, frameshift, exon deletions, splice-site affects exons 1 and 2 2. Likely pathogenic 3. Benign (a) Sub-classification of stand alone, strong, or supporting based upon frequency in population, lack of effect of variant, and other criteria 4. Likely benign 5. Uncertain significance (gene of uncertain significance, GUS)
Variant coding 1. g for genomic sequence # based on current genome (GRCh37/hg19) build (chromosome #) and m for mitochondrial genome 2. c for coding sequence, # 1 for A of ATG 3. p for protein sequence 4. Example: g.117480025..117668665 (chr7), c.34G>C (p.Val12Leu)
Variant frequencies from population data sets for “normal” and data sets for disease states 1. Populations > 1000 Genomes project, Exome Aggregation Consortium, Exome Variant Server, dbSNP, dbVar 2. Disease > ClinVar, OMIM, Human Gene Mutation Database 3. Reference sequences > NCBI Genome, RefSeqGene, MitoMap
Software-based interrogation of sequence (imputation) relative to reference sequences for prediction of variant effects 1. Missense variants 2. Splice-site variants
A. Ancestry
B. Random mutations at the single nucleotide to chromosome level 1. Environment positively selects the good 2. Environment negatively selects the bad 3. Maintain the neutral – may be good or bad in a new environment 4. Mutation needs to be in germ line for propagation via gain, loss, or maintenance of the gene in the pool (a) Takes generations to change the level in the population
C. Error rates observed in organisms 1. E. coli – lactose fermentation genes – 1 x 10^-7/division 2. Neurospora crassa – inositol biosynthesis genes – 8 x 10^-8/asexual spore 3. Drosophila – eye color – 4 x 10^-5/gamete 4. Corn – kernel color – 2.2 x 10^-6/gamete 5. 
Human error rate varies with locus suggesting mutation hot spots in the genome (a) Human – neurofibromatosis – 1 x 10^-4/gamete (b) Human – Duchenne muscular dystrophy – 9.2 x 10^-5/gamete (c) Human – Huntington disease – 1 x 10^-6/gamete
D. Estimated mutation rates in humans for SNVs and CNVs 1. SNV mutation rate estimated at 2 x 10^-8/bp-generation (a) 3.2 x 10^9 bp/haploid genome x 2 x 10^-8/bp-generation = 64 SNVs/generation 2. CNV mutation rate estimated at 1 x 10^-6 to 10^-4/locus-generation (a) Locus dependent (b) 3.2 x 10^9 bp/haploid genome x 1 x 10^-5/bp-generation = 32,000 CNVs/generation
E. Somatic and gametic variations (mosaics) 1. Somatic variations limited to one organism generation but in progeny of mutant cell (a) Earlier development more potential for phenotypic effect (mosaics) (b) Estimated at 1/1,000,000 2. Germ-line (gametic) variations passed on through the generations (a) Major focus of human genetic disorders
F. Molecular mechanisms of variation at sub-chromosomal level 1. Environmentally induced (a) Chemical (b) Radiation 2. Spontaneous chemical reactions (time dependent) 3. DNA replication errors 4. Errors during meiotic recombination 5. Errors during DNA repair
G. Environmentally induced mutations 1. Chemicals cause improper base pairing or insertion/deletions (a) Base pair modifiers (i) Base analogues (ii) Deaminating reagents (iii) Alkylating agents (b) Insertion/deletion inducers (i) Intercalating agents 2. Radiation causes strand breaks and base cross-linking (a) Strand breakers (i) Ionizing radiation (ii) UV-radiation (b) Cross-linkers (i) UV-radiation 3. Why do surface epithelial cells (gut and skin) have such short life-times? (a) 1 – 4 weeks compared to 20+ years for neurons
H. Spontaneous chemical reactions result in modification similar to chemically-induced just at a much lower frequency 1. Tautomeric shifts (a) Keto to enol or imino isomers that change base-pairing (i) Results in single site transitions during DNA replication 2. 
Depurination (a) Purine base excised from nucleotide (i) Repair may be inefficient, resulting in base change 3. Deamination (a) Cytosine or 5-methylcytosine NH2 group removed, giving U or T (a transition) 4. Oxidation mediated by reactive oxygen species generated by aerobic catabolism and defense mechanisms (a) G to 8-oxo-7,8-dihydroguanine (8-oxoG) (i) Base-pairs with A, eventually giving a G to T transversion upon replication
I. DNA replication errors in both synthesis and repair mechanisms can result in SNVs 1. Substitutions via non-standard base pairing of protonated bases (a) Change in proton position alters H-bonding between bases (b) C:A and G:T (c) Results in one strand being normal and one having a substitution (d) Not readily detected and repaired prior to replication 2. SNV insertions and deletions resulting from strand slippage (a) Repeating sequence on template strand (b) Synthesized or template strand loops out, resulting in strand slippage (c) DNA pol continues after slippage, resulting in error (d) If newly synthesized strand loops out > newly synthesized strand has insertion (e) If template loops out > newly synthesized strand has deletion (f) Results in small indels
J. Errors during DNA replication can result in CNVs and chromosomal structural aberrations 1. Involves repeat sequences and can be recurrent or non-recurrent (a) Recurrent are at the same place in multiple non-related diseased individuals (b) Non-recurrent are at different places in multiple non-related diseased individuals 2. Strand slippage during DNA replication involving repeating sequences (a) Forward slippage gives deletion (b) Backward slippage gives duplication/expansion 3. 
CNV deletions and complex chromosomal structural aberrations can result from FoSTeS (Fork Stalling and Template Switching) (a) Lagging strand moves to a different replication fork of similar sequence and re-starts (b) Can occur multiple times during DNA replication, giving repeats along a chromosome (c) Can occur on another chromosome, giving complex structural rearrangements (chromosomal aberrations) 4. CNV deletions can occur via non-homologous end joining (a) Breaks and joining more likely in repeats 5. CNVs and chromosome structural aberrations can occur by homologous recombination during meiosis or DNA repair (a) Generally result from misalignment between homologs (i) Alignment errors more frequent in repetitive sequences (b) Results in unequal crossing over between homologs (i) One chromosome gets insertion (ii) One chromosome gets deletion (c) Can be small or large segments of DNA (d) Gamete result is 2 normal, 1 w/deletion and 1 w/duplication for meiosis 6. Homologous recombination can occur between non-homologous chromosomes (non-allelic homologous recombination) having homologous regions, resulting in translocations during meiosis or interphase DNA repair (a) Repeats of similar sequence along or on different chromosomes can synapse (i) Regions are homologous but not exactly in the regions flanking the synapse nor are they at alleles (b) Recombination can result in duplications, deletions, and inversions depending on the number, location, and type of the repeats as well as structural aberration
A. Monogenic (simple) disease-causing variants can be discovered from research into the biochemistry of the disease, the genetics of the disease, or a combination of both 1. 
Biochemical approach determines the mechanism of the disease through analysis of the pathways and proteins involved (a) Aberrant enzyme or protein in pathway results in disease (i) Sequence protein and determine amino acid mutations 2. Genetic approach uses mapping methods to find the locus (chromosome location), then refines analysis in silico to find potential genes or mechanism 3. Real world uses a combination to determine the molecular mechanisms of the disease to guide treatment (a) Greater knowledge of the molecular mechanism allows drug development and treatment strategies using biological knowledge instead of fishing around and statistics 4. Contemporary knowledge of the human genome and disease allows for rapid discovery of the genetics of disease (a) Refined whole genome sequence (b) Homologs found or engineered into model organisms associated with development and disease (c) Human population genetics and knowledge of disease variants (d) Discovery of the molecular basis of disease will accelerate by leveraging models and artificial intelligence
B. Methods based upon linkage of genes that do not follow Mendelian inheritance patterns 1. Classic example of sex-linked inheritance of various characters is related but not exactly the same (a) X chromosome and color blindness (opsins OPN1LW, OPN1MW, and OPN1SW) 2. Linked genes are on the same chromosome and linearly near each other 3. Dihybrid crosses do not result in 9:3:3:1 phenotypic ratio in the F2 offspring and can result in 3:1 if loci are adjacent (a) Behave as if a monohybrid cross (b) The more distant the loci on the chromosome, the closer to the 9:3:3:1 ratio 4. Linked genes do not assort independently most of the time (a) If both genes are adjacent, they behave as one (do not assort independently) and new combinations of characters in the offspring are not observed 5. 
Linked genes can assort independently as a result of crossover (a) The further apart on the chromosome, the more likely to assort independently (i) Non-crossover event gives non-recombinant gamete (ii) Crossover event gives recombinant gamete (assorted) 6. Maximum crossover probability is 0.5 since only one sister chromatid of each homolog is involved in crossover (a) Recombinant gamete probability is 0 to 0.5 (i) Adjacent genes – highly linked, have 0 recombinant frequency (ii) Genes at the ends of a chromosome have close to 0.5 recombinant frequency 7. In a large population or after several generations, linked genes become unlinked (a) Linkage reaches equilibrium (physically unlinking) 8. If genes are adjacent they do not reach equilibrium (do not become unlinked) (a) Linkage disequilibrium (remain physically linked) (b) The greater the linkage disequilibrium, the closer the loci
C. Mapping of genes along a chromosome is done via genetic or physical mapping 1. Genetic mapping is through breeding studies or analysis of extended families (a) Lower resolution (b) Units of Morgans, with 1 cM being a recombinant frequency of 0.01/generation, or about 1 Mbp of physical distance on a chromosome 2. Physical mapping is via hybridization, sequencing, and sequence alignment (a) Many organisms have chromosome genome maps with single bp resolution
D. Early genetic maps were developed using phenotypic features and inbreeding studies using model organisms 1. Also called genetic linkage analysis as it located the gene on a chromosome and its distance relative to other genes 2. Start with heterozygotes for characters and backcross with homozygous recessive for characters 3. Determine recombination frequency in offspring (a) Progeny with recombinant phenotypes (combination of characters different from parents) (b) Back-calculate frequency of non-recombinant and recombinant gametes 4. Direct relationship between recombinant frequency and distance between loci on the chromosome
E. 
More recent methods use molecular markers (quantitative trait loci/QTLs or SNPs) as one of the features (loci) to find disease loci 1. Determine which one is most closely linked to the trait or disease loci 2. Early QTLs were not physically mapped and were analyzed via statistical methods 3. With chromosome physical maps, SNPs are selected with spacing along the chromosomes (a) And other details for GWAS
F. Linkage analysis via pedigree analysis 1. Genealogical-based method to find disease loci in the genome 2. Human pedigrees are more of a challenge as you cannot do breeding studies nor is there a large number of offspring for statistical analysis 3. Use closely related populations and follow through two or more generations (a) Early phenotype-based studies leverage relatively closed small human populations (i) Various religious sects with founder-effect populations 4. Marker can be phenotype or genotype based (a) Genotype-based markers use well characterized sequences spaced along the chromosomes (i) Typically highly conserved and homozygous - Do not need to determine haplotypes for markers 5. LOD score is Log of ODds ratio score to estimate linkage (a) Simplest form: (i) LOD = log10 [probability of birth sequence with observed (given) linkage value / probability of birth sequence with no linkage] - Observed linkage = observed recombinant phenotypes - No linkage = probability w/independent assortment giving predicted recombinant phenotypes (b) Recombinant probability (theta) ranges from 0 to 0.5 in one meiosis for one gene and 0.25 for a pair of genes (i) 0 = linked/do not observe recombination between marker and disease (ii) 0.25 = unlinked/do observe recombination between markers and disease and observe non-parental phenotypes in offspring (c) Null hypothesis is that theta = 0.25 for dihybrid unlinked gene (d) Pedigree data gives the observed recombinant theta
G. 
Note that: “Several lessons emerged from studies of Mendelian disease genes: (i) The ‘candidate gene’ approach was woefully inadequate; most disease genes were completely unsuspected on the basis of previous knowledge, (ii) Disease-causing mutations often cause major changes in encoded proteins, (iii) Loci typically harbor many disease-causing alleles, mostly rare in the population, (iv) Mendelian diseases often revealed great complexity, such as locus heterogeneity, incomplete penetrance, and variable expressivity.”
H. Many discovered genes are used for genomic screening/personalized medicine 1. Geisinger’s MyCode study targets monogenic variant genes with known mechanisms of disease action
I) Current state of the art is trio whole exome sequencing or genome sequencing 1) Sequence genome of parents and offspring 2) Compare to reference genome and known disease loci 3) Used for complex disease gene discovery, de novo mutation detection, and rapid genetic diagnosis in infants with congenital diseases
● Explain how disease-causing variants are found in the human genome through linkage analysis and genome-wide association studies (GWAS).
GWAS
A. Population-genetics-based approach instead of genealogical approach, focused on polygenic diseases 1. Consider population as a large related family developed over 100s of generations
B. Rapidly accelerating area of knowledge primarily targeted at more complex, multi-genic diseases, multi-loci phenomes, and diseases with gene frequencies greater than 1% in the population 1. Type 2 diabetes, prostate cancer, IBD and other multi-genic diseases 2. Height, hair color, eye color, and many other phenomes
C. Looks for statistical association of potential alleles with SNPs (tag SNPs) via co-segregation in a population 1. SNPs near a disease-associated allele co-segregate with it and hence show linkage disequilibrium 2. 
Most use r^2 as a measure of linkage disequilibrium, with higher values meaning greater co-segregation and linkage disequilibrium (a) r^2 = (freqAB*freqab - freqAb*freqaB)^2 / (freqA*freqa*freqB*freqb), where freqAB, freqab, freqAb, and freqaB are haplotype frequencies and freqA, freqa, freqB, and freqb are the allele frequencies at the two loci
D. Linkage disequilibrium between two alleles decreases with: 1. Increasing generations (reproductive events/generations) 2. Increasing population size 3. Increasing population age 4. In theory, alleles should go from linkage disequilibrium to linkage equilibrium
E. SNPs highly selected and currently using population-specific reference genomes 1. Selected SNPs usually have only two different bases (alleles) instead of four 2. Facilitates haplotyping
F. Tag SNP may have direct or indirect association with disease 1. Direct are genotyped and involved in disease 2. Indirect are not genotyped but associated with disease or disease risk
G. GWAS (current) use haplotype sequences for analysis 1. Haplotype currently determined from “diplotype” using advanced algorithms and/or family generational data (a) Because many regions (used for tag SNP selection) are recombinant resistant (recombination “cold spots”?)
H. Use SNP microarray chips with 2,000 to 2,000,000 SNPs to detect variants 1. More are using whole genome sequencing but it is mostly cost prohibitive
I. Huge data sets, with bigger being better
J. Requires “accurate” phenotype from health records 1. Leverage electronic health records now and in the future
K. Phenotypes can be binary/categorical or scalar/quantitative 1. Quantitative gives more statistical power to analysis
L. Statistical analysis uses ANOVA or derivatives to assess co-segregation for single locus associations 1. Gets complicated very quickly depending on genetic model and many other factors
M. Because of the high number of comparisons, highly significant P-values of 5 x 10^-8 are required to suggest association 1. Reduces the risk of false positives but increases the risk of not detecting rare associations (false negatives)
N. 
Statistical analysis for multi-locus associations becomes very complex 1. Attempts to parse phenome contribution of each different locus
O. Statistically significant loci for further study are within 10 – 100 kb of the tag SNP, making the genome region size more amenable to further analysis compared to linkage analysis
P. Data presented in a Manhattan plot along one or all chromosomes 1. Scatter plot with X-axis being genomic location and Y-axis being SNP phenome co-segregation P-value (plotted as -log10 P) (a) Looks like the Manhattan skyline (b) Readily shows statistically significant loci of interest above the set-line
Q. Has generated knowledge about multi-genic diseases but has not yielded a lot of clinical utility 1. Individual genetic components have low effects (odds ratio below 1.5) 2. Combining all significant associations typically explains less than half of disease heritability 3. Additional information from normal patient genome typically does not change patient treatment/lifestyle suggestions
R. Has generated tons of data for follow-up on mechanisms of action of significant markers 1. Most significant markers are intergenic 2. Many intergenic markers are in cis regulatory gene elements (promoters, enhancers, TAD limits, inhibitors) (a) Modest changes in sequence may give graded effect on phenome (variable expressivity) (b) Epigenetic changes can affect (c) Cis element may modify a few genes or many 3. A lot of work to do for the best/hottest markers (a) Currently, only a few % of the major disease intergenic markers have been investigated for biological mechanisms
S. Continued GWAS will give even more hits with smaller and smaller effects (point of diminishing returns)
T. Need to add in additional omics to further refine 1. In progress for the epigenome, transcriptome, and proteome
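
The r^2 measure of linkage disequilibrium and the genome-wide significance threshold from the GWAS notes above can be sketched in a few lines of Python. This is a minimal illustration and not part of the lecture: the function names are hypothetical, r^2 follows the standard definition (the squared difference of haplotype-frequency products divided by the product of the four allele frequencies), and the threshold is treated as a Bonferroni correction of a 0.05 error rate over roughly one million independent tests, which is a conventional assumption rather than something stated in the notes.

```python
def ld_r_squared(f_AB, f_Ab, f_aB, f_ab):
    """r^2 between two biallelic loci from the four haplotype frequencies.

    Assumes the standard definition: D^2 / (pA * pa * pB * pb).
    """
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_A = f_AB + f_Ab
    p_B = f_AB + f_aB
    # D measures how far the haplotypes deviate from independent assortment.
    d = f_AB * f_ab - f_Ab * f_aB
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Tag SNP and disease allele always travel together.
    print(ld_r_squared(0.5, 0.0, 0.0, 0.5))
    # All four haplotypes equally common: no association.
    print(ld_r_squared(0.25, 0.25, 0.25, 0.25))
    # Assumed: 0.05 error rate spread over about one million independent tests.
    print(bonferroni_threshold(0.05, 1_000_000))
```

With perfectly co-segregating haplotypes (0.5, 0, 0, 0.5) the function gives r^2 = 1, and with four equally common haplotypes it gives 0, matching the idea that a tag SNP is only useful when it travels with the disease allele. Dividing 0.05 by one million tests lands on the 5 x 10^-8 cutoff quoted in the notes.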
