Module 3-Polymorphisms And Mutations In Human Diseases PDF
Document Details
Uploaded by AS
Weill Cornell Medical College
Summary
This document is a module on polymorphisms and mutations in human diseases. It covers various types of mutations, including somatic and germline mutations, and discusses their effects on genes and proteins. The document also touches on the topics of loss and gain of function mutations.
Full Transcript
2/24/24, 8:20 PM Module 3-Polymorphisms and Mutations in Human Diseases
https://www.softchalkcloud.com/lesson/files/UVsrhtS1g3nemI/lesson_print.html
Classes of Mutations
There are four main questions we need to answer about a mutation in order to categorize it correctly. The mutation categories that apply to each of these questions are listed below:
1. Where does the mutation occur, and how is it passed on to the progeny?
Somatic mutations
Germline mutations
2. How does the mutation change the DNA sequence?
Point mutations
Base substitutions (transition and transversion)
Single-base insertions and deletions (indels)
Expanding trinucleotide repeats (dynamic mutations)
3. How does the mutation affect the protein structure?
Missense mutations
Neutral mutations
Nonsense mutations
Frameshift mutations
Silent mutations
4. How does the mutation affect the protein function?
Loss of Function Mutations
Gain of Function Mutations
Dominant Negative Effect
Mini-Lecture 4 below provides the definitions for all the terms listed above. The topic of polymorphisms is also covered.
Mini-Lecture Slides
Mutation: A Definition
DNA is a highly stable molecule that is replicated with great accuracy, but changes in DNA structure and errors of replication do take place. A mutation is defined as an inherited change in genetic information; the descendants may be cells or organisms. The word 'mutation', however, is commonly used to describe either an event that produces a DNA sequence change or the change itself, even if that change has been inherited through many generations. In other words, the term mutation can describe the process or its product. Often, the product of the mutation is referred to as a "variant".
In the context of molecular diagnosis of human conditions, 'mutation' refers to a pathogenic variant, while non-pathogenic variants are termed polymorphisms (see the section below).
Somatic and Germline Mutations
In multicellular organisms, we can distinguish between two broad categories of mutations: somatic mutations and germline mutations. Somatic mutations arise in somatic tissues, which do not produce gametes (Figure 18.1).
Figure 18.1
When a somatic cell with a mutation divides (mitosis), the mutation is passed on to the daughter cells, leading to a population of genetically identical cells (a clone). The earlier in development a somatic mutation takes place, the larger the clone of cells that contain the mutation will be. Because of the huge number of cells present in a typical eukaryotic organism, somatic mutations are numerous. For example, there are about 10^14 cells in the human body. Typically, a mutation arises once in every million cell divisions, so hundreds of millions of somatic mutations must arise in each person. Many somatic mutations have no obvious effect on the phenotype of the organism because the function of the mutant cell is replaced by that of normal cells, or the mutant cell dies and is replaced by normal cells. However, cells with a somatic mutation that stimulates cell division can increase in number and spread; this type of mutation can give rise to cells with a selective advantage and is the basis for cancers.
Germline mutations arise in cells that ultimately produce gametes. A germline mutation can be passed to future generations, producing individual organisms that carry the mutation in all their somatic and germline cells. When we speak of mutations in multicellular organisms, we are usually talking about germline mutations.
Historically, mutations have been partitioned into those that affect a single gene, called gene mutations, and those that affect the number or structure of chromosomes, called chromosome mutations.
This distinction arose because chromosome mutations could be observed directly, by looking at chromosomes with a microscope, whereas gene mutations could be detected only by observing their phenotypic effects. Now, DNA sequencing allows direct observation of gene mutations, and chromosome mutations are distinguished from gene mutations somewhat arbitrarily on the basis of the size of the DNA lesion.
There are a number of ways to classify gene mutations. Some classification schemes are based on the nature of the phenotypic effect, and others focus on the molecular nature of the defect. Here, we will categorize mutations primarily on the basis of their molecular nature, but we will also encounter some terms that relate to the causes and the phenotypic effects of mutations.
Base Substitutions
The simplest type of gene mutation is a base substitution, the alteration of a single nucleotide in the DNA (Figure 18.2a). There are two types of base substitutions. In a transition, a purine is replaced by a different purine or, alternatively, a pyrimidine is replaced by a different pyrimidine (Figure 18.3). In a transversion, a purine is replaced by a pyrimidine, or a pyrimidine is replaced by a purine.
Figure 18.2
The number of possible transversions (see Figure 18.3) is twice the number of possible transitions, but transitions arise more frequently because transforming a purine into a different purine or a pyrimidine into a different pyrimidine is easier than transforming a purine into a pyrimidine, or vice versa.
Figure 18.3
Insertions and Deletions
Another class of gene mutations contains insertions and deletions (collectively called indels): the addition or removal, respectively, of one or more nucleotide pairs (Figure 18.2b and c).
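The transition/transversion distinction described under Base Substitutions can be sketched in a few lines of Python. This is only an illustration: the function names `classify_substitution` and `count_possible_substitutions` are hypothetical and do not come from the module. Enumerating every ordered pair of distinct bases reproduces the statement that possible transversions are twice as numerous as possible transitions.

```python
# Illustrative sketch; function names are hypothetical, not from the module.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_possible_substitutions() -> dict:
    """Count all ordered ref -> alt substitutions by type."""
    counts = {"transition": 0, "transversion": 0}
    for ref in "ACGT":
        for alt in "ACGT":
            if ref != alt:
                counts[classify_substitution(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitution("A", "G"))
    print(count_possible_substitutions())
```

The count covers what is possible, not what is frequent; as the text notes, transitions nonetheless arise more often in practice.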
Although base substitutions are often assumed to be the most common type of mutation, molecular analysis has revealed that insertions and deletions are often more frequent. Insertions and deletions within sequences that encode proteins may lead to frameshift mutations, which are changes in the reading frame of the gene. Frameshift mutations usually alter all amino acids encoded by nucleotides following the mutation, and so they generally have drastic effects on the phenotype. Some frameshifts also introduce premature stop codons, terminating protein synthesis early and resulting in a shortened (truncated) protein. Not all insertions and deletions lead to frameshifts, however; insertions and deletions consisting of any multiple of three nucleotides will leave the reading frame intact, although the addition or removal of one or more amino acids may still affect the phenotype. Indels not affecting the reading frame are called in-frame insertions and in-frame deletions.
Deletion and Duplication of Whole Genes
Some deletions and insertions can affect an entire gene. Insertions involving an entire gene are termed duplications. These kinds of variants are expected to decrease or increase the amount of gene product in proportion to the change in gene number. For most genes, copy number changes are abnormal and often pathogenic. A gene is called dosage-sensitive if a 50% decrease or increase in copy number (having 1 or 3 copies of a gene that is normally present in 2 copies) causes a phenotypic change. Duplications are less likely than deletions to be pathogenic.
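The reading-frame rule for indels described above (a net change that is a multiple of three nucleotides leaves the frame intact) can be expressed as a short check. This is a minimal sketch that assumes the variant lies entirely within protein-coding sequence and is given as reference and alternate allele strings; the function name `classify_indel` is hypothetical.

```python
# Illustrative sketch; assumes the indel lies entirely within coding sequence.
def classify_indel(ref: str, alt: str) -> str:
    """Classify an indel by its effect on the reading frame.

    ref and alt are the reference and alternate allele sequences.
    """
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("alleles have equal length; not an indel")
    kind = "insertion" if diff > 0 else "deletion"
    # A net length change divisible by 3 keeps the downstream codons in frame.
    frame = "in-frame" if abs(diff) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


if __name__ == "__main__":
    print(classify_indel("A", "ATG"))   # net +2 nucleotides
    print(classify_indel("ACTT", "A"))  # net -3 nucleotides
```

The check only considers length. An in-frame indel can still affect the phenotype, as the text notes, and the sketch does not test whether a frameshift creates a premature stop codon.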
Effects of Mutations on Proteins
At the most general level, we can distinguish a mutation on the basis of its phenotype compared with the wild-type phenotype. A mutation that alters the wild-type phenotype is called a forward mutation, whereas a reverse mutation (a reversion) changes a mutant phenotype back into the wild-type.
Another level at which mutations are classified is on the basis of their effects on the primary structure of proteins. Geneticists use specific terms to describe the effects of mutations on protein structure. A base substitution that results in a different amino acid in the protein is referred to as a missense mutation (Figure 18.6a). A nonsense mutation changes a sense codon (one that specifies an amino acid) into a nonsense codon (one that terminates translation), as shown in Figure 18.6b. If a nonsense mutation occurs early in the mRNA sequence, the protein will be truncated and usually nonfunctional.
Figure 18.6
Because of the redundancy of the genetic code, some different codons specify the same amino acid. A silent mutation changes a codon to a synonymous codon that specifies the same amino acid (Figure 18.6c), altering the DNA sequence without changing the amino acid sequence of the protein. Not all silent mutations, however, are truly silent: some do have phenotypic effects. For example, silent mutations may have phenotypic effects when different tRNAs are used for different synonymous codons. Because some isoaccepting tRNAs are more abundant than others, which of the synonymous codons is used may affect the rate of protein synthesis. The rate of protein synthesis can influence the phenotype by affecting the amount of protein present in the cell and, in a few cases, the folding of the protein. Other silent mutations can alter sequences near the exon-intron junctions that affect splicing.
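The missense, nonsense, and silent categories defined above follow mechanically from the standard genetic code, so they can be illustrated with a small lookup. The sketch below builds the standard codon table (DNA alphabet, stop codons written as `*`) and compares the amino acids encoded before and after a change. The function name `classify_codon_change` is hypothetical, and the "stop-loss" label is an extra category not discussed in this module.

```python
# Illustrative sketch using the standard genetic code (DNA codons, '*' = stop).
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon == alt_codon:
        raise ValueError("codons are identical; no mutation")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # synonymous codon, same amino acid
    if alt_aa == "*":
        return "nonsense"    # sense codon becomes a stop codon
    if ref_aa == "*":
        return "stop-loss"   # not covered in the text; included for completeness
    return "missense"        # a different amino acid is encoded


if __name__ == "__main__":
    for ref, alt in [("GAG", "GTG"), ("TAC", "TAA"), ("CTT", "CTC")]:
        print(ref, "->", alt, classify_codon_change(ref, alt))
```

Sequence alone cannot say whether a missense change is neutral, nor whether a silent change affects translation rate or splicing; those depend on the functional effects described in the text.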
A neutral mutation is a missense mutation that alters the amino acid sequence of the protein but does not significantly change its function. Neutral mutations occur when one amino acid is replaced by another that is chemically similar or when the affected amino acid has little influence on protein function. For example, neutral mutations occur in the genes that encode hemoglobin; although these mutations alter the amino acid sequence of hemoglobin, they do not affect its ability to transport oxygen.
Expanding Nucleotide Repeats
Mutations in which the number of copies of a set of nucleotides increases are called expanding nucleotide repeats. This type of mutation was first observed in 1991 in a gene called FMR-1, in which it causes fragile-X syndrome, the most common hereditary cause of intellectual disability. The disorder is so named because, in specially treated cells from persons having the condition, the tip of each long arm of the X chromosome is attached by only a slender-appearing part of the chromosome (Figure 18.4). The normal FMR-1 allele (not containing the mutation) has 60 or fewer copies of CGG but, in persons with fragile-X syndrome, the allele may harbor hundreds or even thousands of copies.
Figure 18.4
Expanding nucleotide repeats have been found in almost 30 human diseases, several of which are listed in Table 18.1. Most of these diseases are caused by the expansion of a set of three nucleotides (called a trinucleotide), most often CNG, where N can be any nucleotide. However, some diseases are caused by repeats of four, five, and even twelve nucleotides. The number of copies of the nucleotide repeat often correlates with the severity or age of onset of the disease.
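To make the idea of repeat copy number concrete, the sketch below counts the longest uninterrupted run of a repeat unit (CGG by default) in a DNA string. It is a toy illustration, not a diagnostic method: the function name `longest_repeat_run` is hypothetical, the sequences are synthetic, and the only threshold used is the figure of 60 copies quoted above for the normal FMR-1 allele.

```python
# Illustrative sketch; synthetic sequences, not a clinical assay.
import re


def longest_repeat_run(sequence: str, unit: str = "CGG") -> int:
    """Return the copy number of the longest uninterrupted tandem run of unit."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [
        len(match.group(0)) // len(unit)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


NORMAL_MAX_COPIES = 60  # upper bound for the normal allele, as quoted in the text

if __name__ == "__main__":
    short_allele = "AT" + "CGG" * 30 + "TTGA"
    expanded_allele = "AT" + "CGG" * 200 + "TTGA"
    for name, seq in [("short", short_allele), ("expanded", expanded_allele)]:
        copies = longest_repeat_run(seq)
        print(name, copies, "above normal range:", copies > NORMAL_MAX_COPIES)
```

Real repeat tracts may be interrupted by other bases, and long expansions are difficult to read with standard sequencing, so an actual analysis would need more than a string match.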
The number of copies of the repeat also correlates with the instability of nucleotide repeats: when more repeats are present, the probability of expansion to even more repeats increases. This association between the number of copies of nucleotide repeats, the severity of the disease, and the probability of expansion leads to a phenomenon known as anticipation, in which diseases caused by nucleotide-repeat expansions become more severe in each generation.
Table 18.1
Increases in the number of nucleotide repeats can produce disease symptoms in different ways. In several diseases (e.g., Huntington's disease), the nucleotide repeat expands within the coding part of a gene, producing a toxic protein that has extra glutamine residues (the amino acid encoded by CAG). In other diseases, the repeat is outside the coding region of a gene and affects its expression. In fragile-X syndrome, additional copies of the nucleotide repeat cause the DNA to become methylated, which turns off the transcription of an essential gene.
Some evidence suggests that the expansion of nucleotide repeats occurs in the course of DNA replication and appears to be related to the formation of hairpins and other special DNA structures that form in single-stranded DNA consisting of nucleotide repeats. Such structures may interfere with normal replication by causing strand slippage, misalignment of the sequences, or stalling of replication. One model of how repeat-containing hairpins might result in repeat expansion is shown in Figure 18.5.
Figure 18.5
Modified from: Pierce. Genetics: A Conceptual Approach. Macmillan Higher Education, 2017. [Macmillan].
Loss and Gain of Function Mutations
Loss-of-function mutations cause the complete or partial absence of normal protein function.
A loss-of-function mutation alters the structure of the protein so that the protein no longer works correctly, or the mutation can occur in regulatory regions that affect the transcription, translation, or splicing of the protein. Loss-of-function mutations are frequently recessive: an individual diploid organism must be homozygous for a loss-of-function mutation before the effects of the loss of the functional protein can be exhibited. The mutations that cause cystic fibrosis are loss-of-function mutations: these mutations produce a nonfunctional form of the cystic fibrosis transmembrane conductance regulator protein, which normally regulates the movement of chloride ions into and out of the cell.
In contrast, a gain-of-function mutation causes the cell to produce a protein or gene product whose function is not normally present. This could be an entirely new gene product or one produced in an inappropriate tissue or at an inappropriate time in development. For example, a mutation in a gene that encodes a receptor for a growth factor might cause the mutated receptor to stimulate growth all the time, even in the absence of the growth factor. Gain-of-function mutations are frequently dominant in their expression because a single copy of the mutation leads to the presence of a new gene product.
Still other types of mutations are conditional mutations, which are expressed only under certain conditions. For example, some conditional mutations affect the phenotype only at elevated temperatures. Others are lethal mutations, causing premature death.
Polymorphisms in the Human Genome
Alternative versions of the DNA sequence at a locus are called alleles.
For many genes, there is a single prevailing allele, usually present in more than half of the individuals in a population, that geneticists call the wild-type or common allele (this is sometimes referred to as the "normal" allele). However, because genetic variation is itself very much "normal," the existence of different alleles in "normal" individuals is commonplace. Thus one should avoid using "normal" to designate the most common allele. The other versions of the gene are variant (or mutant) alleles that differ from the wild-type allele because of the presence of a mutation, a permanent change in the nucleotide sequence or arrangement of DNA.
The frequency of different variants can vary widely in different populations around the globe. If there are two or more relatively common alleles (defined by convention as having an allele frequency > 1%) at a locus in a population, that locus is said to exhibit polymorphism (literally "many forms") in that population. Most variant alleles, however, are not frequent enough in a population to be considered polymorphisms; some are so rare as to be found only in individual families and are known as "private" alleles.
The DNA sequence of a given region of the genome is remarkably similar among chromosomes carried by many unrelated individuals from distinct populations. In fact, any randomly chosen segment of human DNA approximately 1000 bp in length contains, on average, only one base pair that is different between the two homologous chromosomes inherited from that individual's parents (assuming the parents are unrelated). However, across all human populations, many tens of millions of single nucleotide differences and over a million more complex variants have been identified and cataloged. This catalog is still incomplete: many populations around the globe have yet to be studied, and even in the populations that have been studied, the number of individuals examined is too small to reveal most variants with minor allele frequencies below 1% to 2%.
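The 1% convention for calling a locus polymorphic can be illustrated with a short allele-frequency calculation. The sketch below assumes diploid genotypes supplied as pairs of allele labels; the function names `allele_frequencies` and `is_polymorphic` and the sample data are hypothetical.

```python
# Illustrative sketch; genotype data below are made up for the example.
from collections import Counter


def allele_frequencies(genotypes):
    """Compute allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(frequencies, threshold=0.01):
    """A locus is polymorphic if at least two alleles exceed the frequency threshold."""
    common = [allele for allele, f in frequencies.items() if f > threshold]
    return len(common) >= 2


if __name__ == "__main__":
    # 100 individuals: 97 homozygous A/A and 3 heterozygous A/G (200 alleles in total).
    sample = [("A", "A")] * 97 + [("A", "G")] * 3
    freqs = allele_frequencies(sample)
    print(freqs, is_polymorphic(freqs))
```

As the text points out, the result depends on the population sampled, and small samples cannot reliably detect alleles near or below the 1% threshold.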
Whether a variant is formally considered a polymorphism or not depends on whether its frequency in a population exceeds 1% of the alleles in that population, and not on what kind of mutation caused it, how large a segment of the genome is involved, or whether it has a demonstrable effect on the individual. The location of a variant with respect to a gene also does not determine whether the variant is a polymorphism. Although most sequence polymorphisms are located between genes or within introns and are inconsequential to the functioning of any gene, others may be located in the coding sequence of genes themselves and result in different protein variants that may lead in turn to distinctive differences in human populations.
Polymorphisms are key elements for the study of human and medical genetics. The ability to distinguish different inherited forms of a gene or different segments of the genome provides critical tools for a wide array of applications, both in research and in clinical practice (see Box below).
In this section, we begin by exploring the nature of these genomic variants, ranging from the change of a single nucleotide to alterations of several hundred thousand nucleotides.
Inherited Variation and Polymorphism in DNA
DNA polymorphisms can be classified according to how the DNA sequence varies between the different alleles (see Table 4-2 below).
Single Nucleotide Polymorphisms
The simplest and most frequent of all polymorphisms are single nucleotide polymorphisms (SNPs). A locus characterized by a SNP usually has only two alleles, corresponding to the two different bases occupying that particular location in the genome (see Figure 4-1).
As mentioned previously, SNPs are common and are observed on average once every 1000 bp in the genome. However, the distribution of SNPs is uneven around the genome; many more SNPs are found in non-coding parts of the genome, in introns, and in sequences that are some distance from known genes. Nonetheless, there is still a significant number of SNPs that do occur in genes and other known functional elements in the genome. For the set of protein-coding genes, over 100,000 exonic SNPs have been documented to date. Approximately half of these do not alter the predicted amino acid sequence of the encoded protein and are thus termed synonymous, whereas the other half do alter the amino acid sequence and are said to be nonsynonymous. Other SNPs introduce or change a stop codon, and yet others alter a known splice site; such SNPs are candidates to have significant functional consequences.
Figure 4-1
The significance of the vast majority of SNPs is unknown and is the subject of ongoing research. The fact that SNPs are common does not mean that they are without effect on health or longevity. What it does mean is that any effect of common SNPs is likely to involve a relatively subtle altering of disease susceptibility rather than a direct cause of serious illness.
Insertion-Deletion Polymorphisms
The second class of polymorphism is the result of variations caused by insertion or deletion (or simply indels) of anywhere from a single base pair up to approximately 1000 bp, although larger indels have been documented as well. Over a million indels have been described, numbering in the hundreds of thousands in any one individual's genome. Approximately half of all indels are referred to as "simple" because they have only two alleles, that is, the presence or absence of the inserted or deleted segment (Figure 4-1).
Microsatellite Polymorphisms
Other indels, however, are multiallelic due to variable numbers of the segment of DNA that is inserted in tandem at a particular location, thereby constituting what is referred to as a microsatellite. Microsatellites consist of stretches of DNA composed of units of two, three, or four nucleotides, such as (TG)n, (CAA)n, or (AAAT)n, where n is the number of repeated units. Microsatellites are made up of one to a few dozen repeats at a particular site in the genome (see Figure 4-2). The different alleles in a microsatellite polymorphism are the result of differing numbers of repeated nucleotide units contained within any one microsatellite and are therefore sometimes also referred to as short tandem repeat (STR) polymorphisms.
Figure 4-2: Clockwise from upper right: The microsatellite locus has three alleles, with four, five, or six copies of a CAA trinucleotide repeat. The inversion polymorphism has two alleles corresponding to the two orientations (indicated by the arrows) of the genomic segment shown in green; such inversions can involve regions up to many megabases of DNA. Copy number variants involve deletion or duplication of hundreds of kilobase pairs to over a megabase of genomic DNA. In the example shown, allele 1 contains a single copy, whereas allele 2 contains three copies of the chromosomal segment containing the F and G genes; other possible alleles with zero, two, four, or more copies of F and G are not shown. The mobile element insertion polymorphism has two alleles, one with and one without insertion of an approximately 6 kb LINE repeated retroelement; the insertion of the mobile element changes the spacing between the two genes and may alter gene expression in the region.
A microsatellite locus often has many alleles (repeat lengths) that can be rapidly evaluated by standard laboratory procedures to distinguish different individuals and infer familial relationships (Figure 4-3). Many tens of thousands of microsatellite polymorphic loci are known throughout the human genome.
Figure 4-3: The different-sized alleles (numbered 1 to 7) correspond to fragments of genomic DNA containing different numbers of copies of a microsatellite repeat, and their relative lengths are determined by separating them by gel electrophoresis. The shortest allele (allele 1) migrates toward the bottom of the gel, whereas the longest allele (allele 7) remains closest to the top. Left, For this multiallelic microsatellite, each of the six unrelated individuals has two different alleles. Right, Within a family, the inheritance of alleles can be followed from each parent to each of the three children.
Variation in Individual Genomes
The most extensive current inventory of the amount and type of variation to be expected in any given genome comes from the direct analysis of individual diploid human genomes. The first of such genome sequences, that of a male individual, was reported in 2007. Now, tens of thousands of individual genomes have been sequenced, some as part of large international research consortia exploring human genetic diversity in health and disease, and others in the context of clinical sequencing to determine the underlying basis of a disorder in particular patients.
Individual human genomes typically carry 5 to 10 million SNPs, of which, depending in part on the population, as many as a quarter to a third are novel (see Box below). Within this variation lie variants with known, likely, or suspected clinical impact.
Based on studies to date, each genome carries 50 to 100 variants that have previously been implicated in known inherited conditions. In addition, each genome carries thousands of nonsynonymous SNPs in protein-coding genes around the genome, some of which would be predicted to alter protein function. Each genome also carries approximately 200 to 300 likely loss-of-function mutations, some of which are present in both alleles of genes in that individual. Within the clinical setting, this realization has important implications for the interpretation of genome sequence data from patients, particularly when trying to predict the impact of mutations in genes of currently unknown function.
An interesting and unanticipated aspect of individual genome sequencing is that the reference human genome assembly still lacks considerable amounts of undocumented and unannotated DNA that are discovered in literally every individual genome being sequenced. These "new" sequences are revealed only as additional genomes are sequenced. Thus the complete collection of all human genome sequences to be found in our current population of 7 billion individuals, estimated to be 20 to 40 Mb larger than the available reference assembly, still remains to be fully elucidated.
Disease-Causing Mutations
Most types of single-base variants described in this module are found to be the causative mutation in human diseases. Disease-causing mutations are typically heterogeneous among non-related individuals affected by the same condition. Different cases of a particular disorder will therefore usually be caused by distinct underlying mutations.
Table 4-4 below lists the different types of mutations found in human diseases and their prevalence.
Table 4-4
Genotype-Phenotype Correlation
Predicting the clinical manifestations associated with specific DNA variants is the ultimate goal of molecular pathology. However, the processes between a DNA sequence change and a patient's symptoms are too complex to allow precise correlations. Additionally, gene products function in concert with other molecules in the cell and in the context of the general biochemical environment of an individual. Nevertheless, some general relationships between the nature of a mutation and its effect on the gene's function can be observed. A summary of such effects is provided in Table 6.3 below.
Table 6.3