Basic Genetics PDF
Karlla Welch Brigatti
Summary
This document is an introduction to concepts of genetics and genomics. It discusses the human genome, its structure, and function. It also covers the flow of genetic information from DNA to RNA to protein, and introduces the concept of genetic variation.
CHAPTER 1
Introduction to concepts of genetics and genomics
Karlla Welch Brigatti
Clinic for Special Children, Strasburg, PA, United States

1.1 Introduction

Genetics is broadly considered as the study of biological inheritance and traits, whereas the totality of the genetic information of an organism is known as the genome. The large-scale study of the information contained in the genome is referred to as genomics. While genomic studies began with the sequencing of the whole genome of the bacterium Haemophilus influenzae in 1995, the completion of the first draft of the human genome reference sequence by the Human Genome Project (HGP) in 2003 (see Chapter 4: Genomic Sequencing of Rare Diseases) and the continued advances in molecular biology, biochemistry, biophysics, computational sciences, and biotechnology ushered in a new discipline of study, human genomics. The breathtaking advancements of human genomics in the last decades have allowed scientists to understand the human genome, and its variation, to a precise and unprecedented degree, further enabling the application of this knowledge to clinical genomics. Bioinformatics is the interdisciplinary field of biology and computer science that analyzes complex genomic information to interpret biological variant data and predict gene function or dysfunction.

1.2 The human genome: structure and function

With the exception of mature red blood cells and cornified hair cells, all cells in the human body contain a nucleus that houses the majority of the human genome (nuclear genome). A much smaller genome can also be found elsewhere in the cell within the mitochondria (mitochondrial genome), the organelles responsible for producing the energy needed for cell function. Disruptions, or pathogenic variation, to either genome can lead to human disease. The genetic information of the human genome is maintained as deoxyribonucleic acid (DNA), a double-stranded macromolecule bound together in stable form as a double helix.
Each DNA strand is made of a sugar and phosphate backbone coupled to a sequence of bases in one of four versions: adenine (A), cytosine (C), guanine (G), and thymine (T). These bases pair with one another across the two strands by hydrogen bonds in a prescribed Watson-Crick complementary base-pairing fashion: guanine on one strand pairs with cytosine on the other, as do adenine and thymine. This strict pairing makes the sequence of one strand the reverse complement of the sequence on the opposite strand. Approximately six billion base pairs make up the diploid nuclear genome, which consists of two sets of chromosomes and genes and is consequently double the three billion base pairs that comprise the haploid reference human genome sequence (see Chapter 4: Genomic Sequencing of Rare Diseases).

Genomics of Rare Diseases. DOI: https://doi.org/10.1016/B978-0-12-820140-4.00009-0 © 2021 Elsevier Inc. All rights reserved.

Long stretches of DNA are organized, supported, and packaged into 46 rod-shaped structures called chromosomes within the nucleus of the cell, arranged in 23 homologous pairs of matching DNA sequence. Twenty-two of these pairs are numbered from 1 to 22 according to size, that is, the amount of DNA and the fraction of the genome each contains, with chromosome 1 being the largest and chromosomes 21 and 22 the smallest. These 22 pairs of chromosomes are called autosomes and are the same in males and females. The 23rd pair makes up the sex chromosomes: two X chromosomes are found in females, while an X and Y chromosome pair is found in biologically male individuals. The study of chromosomes, their structure, and their inheritance is known as cytogenetics. The chromosomal complement in a given cell is the karyotype, which also refers to the photographic arrangement of the magnified chromosome pairs after specific preparation and under certain staining conditions.
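The base-pairing rule described earlier in this section (A with T, G with C) can be illustrated with a minimal Python sketch that derives one strand from the other. The function name and the example sequence are hypothetical, chosen only for illustration, and the input is assumed to contain only the four DNA bases.

```python
# Watson-Crick pairing as stated in the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the sequence of the opposite strand.

    The input is walked backwards because the two strands are read in
    opposite directions; each base is then swapped for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    example = "ATGGCC"  # hypothetical sequence, for illustration only
    print(reverse_complement(example))  # GGCCAT
```

Applying the function twice returns the original sequence, which is one way to see that each strand fully determines the other.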
Karyotyping and its role in rare disease discovery and diagnosis are explored further in Chapter 2, Karyotyping as the First Genomic Approach. The genome is characterized by long stretches of noncoding DNA sequences interspersed with smaller sequence content (coding DNA) that makes up genes. Genes contain the instructions to direct the production of proteins or ribonucleic acid (RNA) products necessary for cells to perform their given function (Fig. 1.1). The human genome contains about 25,000 genes. Genes vary in length, but they share similar characteristics that relate to their function and help differentiate them from the surrounding noncoding sequence, which allows them to be computationally annotated and mapped on the genome. Structurally, genes are composed of different regions and recognizable sequence features. Exons are the regions of a gene that determine the amino acid sequence of its protein product. Conversely, introns are regions of noncoding sequence separating exons from one another that are eliminated from the mature messenger RNA (mRNA) after transcription. In addition, a sequence of noncoding DNA known as the promoter can be found adjacent to the beginning of a gene (classically defined as the 5′ end) and acts as the region where certain regulatory proteins will bind sequence elements or motifs to enhance gene expression or silence it altogether. Alterations to the canonical DNA sequence, in the form of mutations or variants, in any of these structural elements of genes can disrupt normal function and expression of the gene, leading to human disease. The majority of variants currently associated with genetic conditions are found in the exons, which make up only about 1% of the haploid human genome and are maintained or constrained by selection and evolution; the aggregate sum of all exons is known as the exome.
Following the HGP, the development of massively parallel sequencing technologies, also known as next-generation sequencing (NGS), enabled the rapid sequencing of millions of short DNA fragments in parallel, significantly reducing the cost of sequencing individual human genomes. The main applications of NGS in rare disease have focused on sequencing the protein-coding fraction of the genome through whole-exome sequencing (WES) together with whole-genome sequencing (WGS), which involves sequencing the totality of the DNA in the human genome, including the nonprotein-coding regions; these technologies are explored in depth in Chapter 4, Genomic Sequencing of Rare Diseases. Both techniques are commonly used in genetics research and clinical genetics settings to identify rare variants and investigate their potential contribution to the clinical presentation of a disorder, or its genetic etiology, thereby rendering a molecular diagnosis.

FIGURE 1.1 Gene expression through transcription and translation, simplified in a hypothetical gene containing four exons and three introns. During transcription in the nucleus, the DNA sequence of a gene is used as a template to produce a pre-mRNA transcript that includes introns and exons. The four bases of DNA are shown in exon 2. In mature mRNA, the introns are spliced out, such that the coding sequence is continuous. The mRNA moves to the cytoplasm for translation, where ribosomes attach to the mRNA template and protein synthesis occurs. tRNA molecules carrying specific amino acids bind to the mRNA as determined by the sequence of mRNA codons, groups of three mRNA bases that correspond to one of 20 amino acids or three stop codons. Peptide bonds form between successive amino acids in the growing chain until a stop codon is reached and the polypeptide is released. The polypeptide chain undergoes folding and posttranslational modifications to become a functional protein. DNA, Deoxyribonucleic acid; mRNA, messenger RNA; tRNA, transfer RNA.

The flow of genetic information from DNA to RNA to protein product is known as the “central dogma” of molecular biology, and it can be predicted by scientists thanks to the elucidation and understanding of the genetic code, which establishes the rules of translation, via the three-base sequence or triplet code, from DNA sequence to amino acid composition of proteins. RNA is the intermediary that conveys the genetic information stored in DNA to the cellular machinery that produces bioactive molecules in the form of proteins or noncoding RNA. RNA is similar to DNA, except that the sugar backbone is ribose, the thymine base is replaced by uracil (U), and RNA is single-stranded instead of double-stranded like the DNA double helix. When the product of a particular gene is needed, that portion of DNA containing the gene will unwind, and through a process known as transcription, a single strand of complementary RNA is generated, and the intronic sequences are spliced out to produce a mature mRNA. Transcription takes place in the nucleus, where the DNA resides. Then, the mRNA moves from the nucleus to the intracellular cytoplasm, where organelles known as ribosomes utilize that mRNA template for protein synthesis through a process known as translation. During translation, the ribosome moves along the mRNA strand and pairs it with a second type of RNA known as transfer RNA (tRNA), which delivers specific amino acids as determined by three consecutive mRNA bases (known as codons); each codon encodes one of 20 possible amino acids (Table 1.1). The genetic code is said to be “degenerate” in that most amino acids are encoded by more than one codon.
The standard start codon for translation of a gene is “AUG,” which encodes the amino acid methionine (Met or M), and establishes the reading frame for the ribosome to follow, adding corresponding amino acids in a polypeptide chain. The translation complex halts the process of protein production once it reaches a stop codon (encoded by one of the three codons UAA, UAG, or UGA), and the completed polypeptide is released from the ribosome for posttranslational modification. This process is illustrated in Fig. 1.1. As previously mentioned, in addition to the genome housed in the nucleus of cells (nuclear genome), human cells also contain another smaller genome that resides within the energy-producing organelles of the cells, the mitochondria. The mitochondrial genome (mtDNA) is made up of a little over 16,000 DNA bases arranged in a circle that contains two related promoter sequences, one for each strand, which are transcribed in their entirety. All cells contain multiple mitochondria, each of which has several copies of the mitochondrial genome. The 37 genes encoded by the mtDNA are specific to the structure and function of the mitochondria themselves, which are integral to the production of cellular energy. Unlike the nuclear genome, the mitochondrial genome is inherited only through the maternal line, as the sperm cell contributes no mitochondria during conception. A change in the mtDNA that alters the production of proteins necessary to meet the energy requirements of the cell can cause mitochondrial disease, often affecting the organs with high energy requirements, such as the brain, heart, eyes, and skeletal muscles. Nuclear genes also contribute to mitochondrial function, so mitochondrial disorders can result from alterations in nuclear or mitochondrial genes. Mitochondrial disorders are examined in depth in Chapter 7, X-linked and Mitochondrial Disorders.
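The steps of transcription and translation described earlier in this section can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: it assumes the input is the already-spliced coding strand of a gene (so transcription reduces to replacing T with U), it starts at the first AUG it finds, and the function names and example sequence are hypothetical. The codon table is the standard genetic code written with one-letter amino acid symbols, with `*` marking the three stop codons.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a spliced coding-strand sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")  # the start codon sets the reading frame
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "GGATGGCCTGGTAAGC"  # hypothetical sequence, for illustration only
    mrna = transcribe(dna)
    print(mrna)             # GGAUGGCCUGGUAAGC
    print(translate(mrna))  # MAW
```

The table also makes the degeneracy of the code visible: all four codons beginning with GC map to alanine (A), while methionine (M) is encoded only by AUG.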
Replication of nuclear DNA precedes the division of somatic cells in a process known as mitosis, in which two genetically identical daughter cells are produced from the original parent cell, and which maintains the diploid (46) chromosomal content. Meiosis is the biological process of germ cell production. It is specific to the cells of the reproductive system and results in four haploid gametes (23 chromosomes each) that are genetically distinct from each other and from the parent cell, due to the process of meiotic recombination. During fertilization, the egg and sperm join together and the full chromosomal complement is restored. The biological significance of these two important processes is ensuring the constancy of genetic information from one generation to the next and promoting genetic diversity.

Table 1.1 The genetic code determines the translation of the DNA sequence encoded in genes into the corresponding sequence of amino acids to produce proteins.

During the replication of DNA, whether before mitosis or meiosis, changes in the DNA can occur, known as mutations. Several cellular proofreading and repair processes exist to ensure the integrity of the nuclear genome and fidelity of the code, though changes can sometimes escape detection. Certain exposures, such as ionizing radiation, can also increase the rate of mutation. Additionally, other biological factors in humans may contribute to an increased rate of mutation in offspring, such as advanced maternal age for chromosome aneuploidies like trisomy 21, commonly known as Down syndrome (see Chapter 2: Karyotyping as the First Genomic Approach), and advanced paternal age for single gene defects (see Chapter 6: Dominant and Sporadic De Novo Disorders).

1.3 Genetic variation

DNA sequence variation is a constant feature of both germ and somatic cells and can occur on a scale varying from single DNA nucleotide changes to deletions or duplications of entire chromosomes. The genetic information and variation encoded in the DNA combined with environmental influences determine individual characteristics and susceptibility to disease, and together make up the clinical characteristics or traits known as phenotype. The effect of DNA variation on gene expression and ultimately phenotype often depends on where the change occurs; for example, changes that occur in genes can ultimately alter proteins, whereas DNA alterations in the noncoding regions of the genome tend to have no obvious or strong effect on cellular function. Such silent or subtle changes are generally considered to be benign polymorphisms. Some of these benign polymorphisms can also occur in coding sequences of genes, but if they do not confer a deleterious effect on the cell, they can be passed on through generations, contributing to common variation in the human population and to traits such as hair, skin, or eye color. Sequence variants that change one nucleotide in the DNA sequence and that differ between individuals or even humans and other species are known as single-nucleotide polymorphisms (SNPs) and occur quite frequently across the human genome (see Chapter 4: Genomic Sequencing of Rare Diseases). SNPs have been extensively studied in association with disease and drug response. When certain DNA changes occur in introns, exons, promoters, or span entire genes or chromosomes, they can abolish or alter the normal function of the encoded proteins and consequently exert a profound phenotypic effect.
These changes are often referred to as deleterious variants or mutations, depending on whether they have been observed in other individuals in the population or they occurred as a new event in a given person, respectively. Single base-pair mutations (also referred to as single or simple nucleotide variants, SNVs) within the exon can alter the coding sequence, as illustrated in Fig. 1.2. Synonymous variants, also called silent mutations, occur when the single base pair substitution maintains the same amino acid, due to the degenerate nature of the genetic code, and consequently do not alter the final protein product. Nonsynonymous or missense mutations cause a codon change from one amino acid to another. A missense mutation may not exert a strong phenotypic effect if the new amino acid shares similar physicochemical properties with the original conserved amino acid at that position or occurs at a nonessential site along the protein. Other missense mutations ultimately alter the protein configuration or enzymatic function and may introduce novel properties that exert a deleterious effect, or change an important property, rendering the protein inefficient or nonfunctional. Nonsense mutations result from substitutions that introduce a premature stop codon in the mRNA sequence. If the nonsense mutation occurs early in the transcribed mRNA, the cell can identify the abnormal location of the stop codon and dispose of the defective transcript through a mechanism known as nonsense-mediated decay (NMD), which effectively leads to the destruction of the mRNA and the absence of a protein product, resulting in a loss-of-function (LoF). Conversely, if the nonsense mutation occurs later toward the end of the transcript, the mRNA can escape NMD and go on to be translated into a truncated and nonfunctional protein product.
In some instances, this truncated form of the protein, although unable to perform normal biological functions, can act as a toxic protein product that interferes with other proteins it may interact with, causing disease through a dominant negative effect (see Chapter 6: Dominant and Sporadic De Novo Disorders). Frameshift mutations are caused by the insertion or deletion (indels) of a number of nucleotides, from one to a few, that is not divisible by three. This disrupts the reading frame such that all ensuing sequence is read incorrectly and the improper amino acids are incorporated during translation from the location where the indel occurred. A frameshift mutation may also introduce a premature stop codon, resulting in a nonfunctional and truncated protein product or leading to degradation of the mRNA through NMD.

FIGURE 1.2 Types of mutations that can occur to the reference sequence of nucleotide base pairs and their effect on the resulting protein.

When nucleotides are inserted or deleted in the DNA sequence by multiples of three, the reading frame is conserved, although amino acids may be added or missing from the final protein product; these mutations are known as nonframeshifting or in-frame mutations. The addition or deletion of in-frame amino acids can sometimes occur in regions of the protein that are important for proper function or affect amino acids essential for enzymatic or catalytic functions. Lastly, splice site mutations occur at the junctions between exons and introns and may cause exons to be removed or intronic sequence to remain in the mature mRNA, altering the amino acid sequence and exerting a functional effect on the gene product.
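The categories of coding variants described above follow mechanically from the genetic code, which a short Python sketch can make concrete. It labels a single-codon substitution as synonymous, missense, or nonsense, and an indel as frameshift or in-frame from its length alone. The function names are hypothetical, the reference codon is assumed to encode an amino acid rather than a stop, and the sketch says nothing about clinical significance, NMD, or splice site variants.

```python
from itertools import product

# Standard genetic code over DNA codons (T, C, A, G order); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution by comparing the amino acids of the two codons."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == "*":
        # Assumption of this sketch: the reference codon is not a stop codon.
        raise ValueError("reference codon must encode an amino acid")
    if alt_aa == ref_aa:
        return "synonymous"  # same amino acid, thanks to the degenerate code
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # a different amino acid


def classify_indel(n_bases: int) -> str:
    """Indels in multiples of three keep the reading frame; others shift it."""
    if n_bases <= 0:
        raise ValueError("indel length must be positive")
    return "in-frame" if n_bases % 3 == 0 else "frameshift"


if __name__ == "__main__":
    print(classify_substitution("GGA", "GGC"))  # synonymous (Gly to Gly)
    print(classify_substitution("GGA", "AGA"))  # missense (Gly to Arg)
    print(classify_substitution("TGG", "TGA"))  # nonsense (Trp to stop)
    print(classify_indel(1), classify_indel(3))  # frameshift in-frame
```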
Copy number variants (CNVs) are a class of structural variation (SV), meaning variation that modifies the architecture of the genome, involving alterations in the number of copies of specific regions of DNA, which can either be deleted or duplicated (see Chapter 3: Genomic Disorders in the Genomics Era). These involve large stretches of DNA varying from thousands of base pairs to segments or entire chromosomes (chromosomal aneuploidy; see Chapter 2: Karyotyping as the First Genomic Approach). Some large CNVs do not have any impact on gene function, while other small ones can exert a strong effect by removing sections of a coding gene or altering the expression or dosage of a given gene. While large CNVs can be evident on a karyotype, changes smaller than 3 to 5 Mb are below the resolution of chromosome studies; therefore the most common and precise technique in use for identifying submicroscopic CNVs is chromosomal microarray analysis (CMA). As discussed in depth in Chapter 3, Genomic Disorders in the Genomics Era, CMA will not identify balanced rearrangements of genetic material, such as balanced translocations, where different chromosomal segments can be joined together. Intrachromosomal submicroscopic inversions, although copy number neutral, can alter the normal expression of genes or disrupt those that occur at the breakpoint of the genomic rearrangement. Even if a balanced translocation maintains the full genetic complement, it can lead to abnormalities in copy number during meiosis and introduce CNVs in the gametes. Implementation of genomic sequencing technologies is allowing better detection and characterization of CNVs and SV in human genomes.

1.4 Nomenclature in human genetics and genomics

The Human Genome Variation Society (www.hgvs.org) maintains the standards for consistent nomenclature for the description of sequence variations and gene names.
Human genes are named using symbols designated by the HUGO Gene Nomenclature Committee (HGNC) and are generally capitalized and italicized in print (e.g., SMN1 is the name for the survival of motor neuron 1 gene), while the protein product of the gene uses a nonitalicized symbol (e.g., SMN1 is the survival of motor neuron 1 protein). Various symbols and abbreviations are used to refer to designated variants or changes to the reference sequence and their impact on different molecules. References to particular molecules generally use the RefSeq database maintained by the National Center for Biotechnology Information. A table of common abbreviations and nomenclature conventions is found in Table 1.2 below.

Table 1.2 Common abbreviations used for nomenclature of genetic variants.

Abbreviation | Interpretation | Example
“g.” | Refers to a genomic reference sequence. Followed by genomic coordinates including chromosome and position of a given nucleotide or variant. If multiple assemblies exist, the version of the reference assembly should be included. | NC_000009.12:g.114195977G>C
“c.” | Refers to a coding DNA reference sequence. Generally given in reference to a particular transcript isoform. Followed by the position within the coding sequence and the nucleotide or variant. | NM_032888(COL27A1):c.2089G>C
“p.” | Refers to a protein reference sequence. Generally given in reference to a particular protein isoform. Followed by the position within the protein sequence and the amino acid change using the 3-letter code. | NP_116277(COL27A1):p.Gly697Arg

A commonly used resource in human genetics is the Online Mendelian Inheritance in Man (OMIM) database (www.omim.org). OMIM is a continuously updated comprehensive compendium of human genes and phenotypes with a presumed genetic basis.
It focuses on the relationships between phenotype and genotype and documents established gene-disease associations of so-called Mendelian disorders, based on literature review and curation. Throughout this book, we refer to many different genetic disorders by name and also by acronyms and provide their designated six-digit identification number or MIM number. The reader can then look up such disorders of interest in OMIM using these unique identifiers to learn more about their clinical features and associated information. The # symbol prior to the MIM numbers of genetic disorders referenced throughout indicates that the molecular basis or gene affected in that disorder has been identified and documented in the scientific literature. From a clinical perspective, variant classification is primarily based on the predicted impact of the variant on disease expression. A framework of criteria for designating individual variants as disease-causing (pathogenic) or as not resulting in disease expression (benign) has been established by the American College of Medical Genetics and Genomics (ACMG), based on factors such as predicted functional effect, population frequency, evolutionary conservation, segregation of the allele among related individuals, or laboratory studies. As these factors may not all be known or may even conflict with one another, variants can also be classified as “likely pathogenic,” “likely benign,” or simply as a “variant of uncertain significance.” These designations can vary between laboratories and often change based on new evidence to support one assertion over another. The open-access ClinVar database (www.ncbi.nlm.nih.gov/clinvar/) aggregates and maintains these interpretations from variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and supporting data utilized in that designation.
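Returning to the variant nomenclature summarized in Table 1.2, the structure of a variant description (reference sequence, optional gene symbol, level prefix, position, and change) lends itself to a small parsing sketch in Python. The examples use the Table 1.2 variants written in the standard substitution form with ">". This is only an illustration under narrow assumptions: it handles simple substitutions at the "g.", "c.", and "p." levels, the regular expressions and the function name `parse_variant` are hypothetical, and the full nomenclature covers many variant types that this sketch rejects.

```python
import re

# Outer shape: ACCESSION(optional GENE):level.change
DESCRIPTION = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+(?:\.\d+)?)"  # e.g. NC_000009.12 or NM_032888
    r"(?:\((?P<gene>[^)]+)\))?"                # optional gene symbol
    r":(?P<level>[gcp])\.(?P<change>.+)$"      # g., c., or p. plus the change
)
# Nucleotide substitution, e.g. 2089G>C
NUCLEOTIDE_SUB = re.compile(r"^(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")
# Protein substitution with 3-letter amino acid codes, e.g. Gly697Arg
PROTEIN_SUB = re.compile(
    r"^(?P<ref>[A-Z][a-z]{2})(?P<position>\d+)(?P<alt>[A-Z][a-z]{2})$"
)


def parse_variant(description: str) -> dict:
    """Split a simple substitution description into its named parts."""
    outer = DESCRIPTION.match(description.replace(" ", ""))
    if outer is None:
        raise ValueError(f"unrecognized description: {description!r}")
    pattern = PROTEIN_SUB if outer["level"] == "p" else NUCLEOTIDE_SUB
    inner = pattern.match(outer["change"])
    if inner is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    parts = outer.groupdict()
    del parts["change"]  # replaced by the parsed position, ref, and alt
    parts.update(inner.groupdict())
    parts["position"] = int(parts["position"])
    return parts


if __name__ == "__main__":
    for text in (
        "NC_000009.12:g.114195977G>C",
        "NM_032888(COL27A1):c.2089G>C",
        "NP_116277(COL27A1):p.Gly697Arg",
    ):
        print(parse_variant(text))
```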

1.5 Mendelian patterns of inheritance

The discipline of medical genetics traditionally has focused on chromosomal syndromes and Mendelian disorders, the latter group caused by single gene mutations exhibiting a significant effect on gene function and responsible for the phenotype of the condition. These mutations are rare in the general population due to selection against them, as they often impact health and potentially reduce the ability to reproduce. In fact, over 80% of Mendelian disorders described to date have a pediatric-onset phenotype, though Mendelian disorders that manifest in adulthood are increasingly appreciated in the clinic. Whether onset is pediatric or adult can sometimes relate to the severity of the variant allele or mutation and its manifestation over time. “Mendelian disorders” were named as such because they segregate in families in a way that follows Gregor Mendel’s laws of segregation and independent assortment. The law of segregation observes that each parent passes one of two versions of any given gene (known as alleles) to their offspring. The law of independent assortment follows that any two separate genes are passed independently of one another to offspring. Put simply, for each gene in the diploid genome, one of the two copies (alleles) is generally inherited from each parent. Exceptions to this rule are genes found on the sex chromosomes and those physically close to one another on a given chromosome, which are often transmitted together (a phenomenon known as genetic linkage). A genotype is a particular combination of alleles for a particular gene or locus. In Mendelian genetics, the trait or disease in question is transmitted through specific modes of inheritance depending on the expression of the phenotype and the location of the gene.
These Mendelian inheritance patterns include autosomal recessive, autosomal dominant, X-linked recessive, and X-linked dominant, and they are the subjects of Chapters 5, 6, and 7 in this book. One of the main tools in medical genetics to help clarify the mode of inheritance for a potential Mendelian disorder is a three-generation family tree or pedigree. This includes information about the index patient (or proband), immediate and extended relatives, both living and deceased, as well as infertility or pregnancy loss, adoption, ethnic background, or consanguinity. Examples of the symbols used in constructing a pedigree are found in Fig. 1.3. Additionally, certain patterns observed through constructing family pedigrees can help distinguish the different inheritance patterns. In general, autosomal conditions affect males and females equally, though some may be sex-limited due to other genetic and nongenetic factors. Autosomal dominant inheritance can be observed to follow so-called “vertical transmission,” where affected individuals appear in every generation, whereas autosomal recessive disorders show “horizontal transmission,” where affected individuals typically appear among siblings within a generation and may skip generations. X-linked recessive disorders are classically recognized by the presence of multiple affected males in the pedigree born to unaffected carrier females. In autosomal recessive conditions, the phenotype is expressed only in individuals who harbor mutations in both their maternal and paternal alleles (i.e., biallelic pathogenic variation); they are said to be homozygous when both alleles represent the same mutation in the gene and compound heterozygous when each allele of the gene has a different mutation. In general, autosomal recessive conditions arise from LoF or hypomorphic mutations, where gene function is eliminated or reduced, respectively, rather than altered function as in dominant disorders described below.
Heterozygous carriers are generally asymptomatic because they have one allele that functions appropriately, compensating for the decreased or absent function of the mutant one. When both parents are heterozygous for a recessive condition, each conception they have together carries a 25% risk of being affected by the genetic condition, and clinically normal siblings of an affected individual have a 67% (two in three) chance of being carriers like their parents. For this reason, affected individuals appear to “skip generations” in the pedigree. Recessive conditions are explored in more detail in Chapter 5, Recessive Diseases and Founder Genetics, especially as they relate to founder populations, where a shared genetic background in genetic isolates enriches for recessive alleles within regions of homozygosity in the genome, which manifests in a higher prevalence of population-specific recessive disorders. One of the characteristics of autosomal dominant disorders is the evidence of phenotype in successive generations, affecting both males and females. In dominant conditions, a single copy (heterozygous, also referred to as monoallelic at the genetic locus) of the disease-associated mutation or variant is sufficient to cause the disease, even when the second allele is normal. Homozygotes for the dominant phenotype may be more severely affected than heterozygotes, or a homozygous state for a dominant mutation may be incompatible with life altogether. As heterozygous individuals for dominant diseases have a 50% chance of passing on the mutant allele, each conception in a couple in which one member is heterozygous for a dominant disease mutation has a 50% chance of being affected.
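The recurrence risks quoted above (25% affected for two carrier parents, a two-in-three carrier chance for unaffected siblings, and 50% for a dominant condition) follow directly from the law of segregation and can be checked with a small Punnett-square sketch in Python. The allele letters and the function name are hypothetical conventions for this example; it assumes a single autosomal locus, full penetrance, and equal transmission of each parental allele.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Return genotype probabilities for one conception.

    Each parent is written as two allele letters (e.g. "Aa") and passes on
    either allele with equal probability, as in a Punnett square.
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    # Recessive condition: "a" is the mutant allele, both parents are carriers.
    recessive = offspring_genotypes("Aa", "Aa")
    affected = recessive["aa"]
    carrier_if_unaffected = recessive["Aa"] / (1 - affected)
    print(affected)               # 1/4
    print(carrier_if_unaffected)  # 2/3

    # Dominant condition: "D" is the mutant allele, one parent is heterozygous.
    dominant = offspring_genotypes("Dd", "dd")
    print(dominant["Dd"])         # 1/2
```

The two-in-three figure arises because the affected genotype is excluded once a sibling is known to be clinically normal, leaving three equally likely outcomes, two of which are carriers.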
Autosomal dominant conditions can be the result of gain-of-function (GoF) mutations in genes, altering the canonical function of the encoded protein; haploinsufficiency, in which the LoF of one allele and the associated decrease in protein production is enough to cause the disease phenotype; or dominant negative effects, where the mutant protein “poisons” the cellular machinery. In pedigrees with no evident family history, dominant disorders often arise as sporadic conditions associated with de novo mutation in a germ cell (egg or sperm) of one of the parents or in the fertilized egg itself early in embryogenesis. In such cases, the recurrence risk for future affected offspring to couples without the constitutional mutation is low but difficult to assess; while the potential for independent spontaneous mutations in the same gene is on the order of 1:10,000 to 1:1,000,000, it is currently challenging to predict or determine what proportion of the unaffected parents’ gametes may contain the germline mutation. Dominant and sporadic disorders are the focus of Chapter 6, Dominant and Sporadic De Novo Disorders.

FIGURE 1.3 Pedigree symbols and conventions and examples of pedigrees demonstrating the different types of Mendelian inheritance patterns. These examples assume conditions that are fully penetrant and do not affect reproductive fitness.

In X-linked recessive disorders, the phenotype is more obvious in males due to being hemizygous for the gene on their single X chromosome inherited from their mothers, as they inherit the Y chromosome from their fathers. If a gene on the X chromosome has a mutation, those males will be affected. A female who has one mutant allele and one unaffected allele (heterozygous) may be only mildly affected (a manifesting heterozygote) or completely asymptomatic (carrier).
However, she has a 50% chance to pass the mutant allele to any offspring. Each male offspring has a 50% chance of being affected and each female offspring has a 50% probability of being a carrier like her mother. Males who inherit the mutation will pass it to all their daughters and none of their sons. In contrast, X-linked dominant disorders are often lethal or very severe in males, and so appear predominantly in affected females. While those mutant alleles can be passed to successive generations, X-linked dominant disorders are often associated with reduced reproductive fitness due to the severity of their phenotypes. X-linked disorders are discussed in detail in Chapter 7, X-linked and Mitochondrial Disorders.

Other forms of inheritance do not follow canonical Mendelian inheritance patterns. As mentioned earlier, multiple copies of the mitochondrial genome are transmitted from the mother to her offspring through the egg, and this creates distinctive features for mitochondrial disease inheritance. Mitochondrial replication does not adhere to the tightly controlled segregation of chromosomes that takes place in the nuclear genome, but randomly distributes mtDNA copies between two daughter cells during cell division through a process known as replicative segregation. When a mutation is present in the mtDNA, replicative segregation causes differential distribution of the mutation-carrying mtDNA in daughter cells. Heteroplasmy is observed when a proportion of the mitochondria in a cell have the mutation but others do not, whereas a daughter cell with all normal or all mutant mtDNA is said to be homoplasmic. These properties contribute to the variable expressivity and reduced penetrance characteristic of these disorders.
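The random drift produced by replicative segregation can be illustrated with a deliberately simplified simulation. Everything in this sketch is an assumption made for illustration rather than a model taken from the chapter: the names `divide` and `follow_lineage` are hypothetical, the cell is given a fixed 100 mtDNA copies, every copy is duplicated exactly once before division, and a single daughter lineage is followed.

```python
import random


def divide(mutant, copies, rng):
    """Simulate one cell division and return the daughter's mutant copy count.

    Assumption: every mtDNA copy is duplicated once, then a random half of
    the doubled pool is handed to the daughter cell being followed.
    """
    pool = [True] * (2 * mutant) + [False] * (2 * (copies - mutant))
    return sum(rng.sample(pool, copies))


def follow_lineage(mutant, copies, divisions, seed=0):
    """Track the mutant mtDNA count along one lineage of daughter cells."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    history = [mutant]
    for _ in range(divisions):
        mutant = divide(mutant, copies, rng)
        history.append(mutant)
    return history


# A heteroplasmic cell starting with 50 mutant copies out of 100.
history = follow_lineage(mutant=50, copies=100, divisions=20)
print([count / 100 for count in history])  # mutant fraction after each division
```

Running the simulation with different seeds gives different trajectories from the same starting cell, which mirrors why the mutant load transmitted to any one egg is hard to predict. A lineage that reaches 0 or 100 mutant copies stays there in this model, corresponding to homoplasmy.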
Heteroplasmy and replicative segregation also complicate recurrence risk estimation for offspring of heteroplasmic females, as the risk can vary considerably depending on the percentage of mutant mitochondria passed through replicative segregation, something that is impossible to determine. Mitochondrial inheritance and its associated considerations are also discussed in Chapter 7, X-linked and Mitochondrial Disorders.

1.6 Other modes of inheritance

Beyond the classical inheritance patterns of Mendelian disorders, rare genetic disorders that follow alternative modes of inheritance or special conditions have been described in the literature. As genomic sequencing continues to be applied to the study of rare diseases, our understanding of classical and nonclassical modes of inheritance for genetic disorders expands.

Repetitive DNA sequences constitute 30% of the human genome, including regions of short trinucleotide repeats near promoters and exons of protein-coding genes. In general, the number of these repeats varies somewhat from person to person and has no effect on gene expression. However, in a small number of genes, the triplet repeat number can expand during meiosis to become unstable and disrupt gene function. The larger the repeat, the more unstable it is and the more likely it is to expand into the pathogenic range as it is passed from one generation to the next. There are now more than 40 known triplet repeat expansion disorders, including Fragile X (MIM #300624), myotonic dystrophy (MIM #160900), and Friedreich ataxia (MIM #229300). Triplet repeat expansion disorders share certain characteristics: they tend to be progressive and neurodegenerative in nature and exhibit genetic anticipation, a phenomenon in which the condition becomes more severe and earlier in onset with successive generations, depending in some cases on the parent of origin.
In pedigrees, triplet repeat expansion disorders will typically be observed to segregate dominantly, either autosomal or X-linked, depending on the chromosome containing the gene involved in the disease.

Some disorders can be the result of mutations occurring in the embryo during early development, resulting in only a fraction of the cells in the individual having the genetic defect, a phenomenon known as mosaicism. Mosaicism is discussed in great detail in Chapter 8, Mosaicism in Rare Disease.

About 1% of nuclear genes follow the conventions of Mendelian inheritance, but their expression depends on the parent of origin of the inherited allele, a phenomenon known as genomic imprinting. In genes that undergo imprinting, methyl groups are chemically attached (methylated) to specific segments of DNA in a sex-specific way during gametogenesis. This methylation silences the expression of different genes in the maternal and paternal gametes. An individual normally has one active and one inactive copy of each imprinted gene, but imbalances in that process, through mutation or other mechanisms as discussed below, that lead to two active or two inactive copies of the imprinted gene will result in abnormalities in growth and development. The classic examples of imprinting defects are Angelman syndrome (AS, MIM #105830) and Prader-Willi syndrome (PWS, MIM #176270). These two conditions are characterized by distinctly different phenotypes but are associated with one another by the same imprinted region on chromosome 15. AS is paternally imprinted, such that the genes in the imprinted region inherited from the mother are the only expressed ones; the paternal equivalent is silenced through methylation.
When the maternal UBE3A allele in this imprinted region of the chromosome is nonfunctional, due to either a gross deletion (70% of cases) or a point mutation (11% of cases) of the UBE3A gene, any children who inherit the maternal mutant UBE3A will have AS, as they are left without a functional copy of that gene. Uniparental disomy, the inheritance of two copies of a chromosome from a single parent, or abnormalities in the imprinting center itself may also underlie the condition. Conversely, PWS is maternally imprinted, so the expression of the genes in the imprinted region of the maternal chromosome 15 is repressed. While most cases of PWS are due to a deletion of the region in the paternal chromosome (70%–80% of cases), uniparental disomy of the maternal chromosome 15 generally accounts for the remaining ones. Indeed, PWS and AS display very different inheritance patterns in a pedigree. Uniparental disomy is further discussed in Chapter 2, Karyotyping as the First Genomic Approach.

Certain rare disorders have been characterized that manifest in individuals harboring pathogenic variants as two distinct alleles from different genes, such that their disease phenotype is driven by heterozygous alleles in two different but related genes; this is known as digenic inheritance. Unaffected parents have a 25% chance for each of their offspring to be affected, and while children of affected individuals also have a 25% risk of the condition, the inheritance of this condition will appear dominant within a pedigree. Certain types of retinitis pigmentosa, Usher syndrome, and Bardet-Biedl syndrome have been described in the medical literature with this more complex form of inheritance.

Finally, genome-wide interrogation using agnostic NGS approaches such as WES or WGS has identified a growing number of complex genetic phenotypes that arise from two or more separate conditions within one person.
Except in cases of linkage disequilibrium, these conditions segregate independently of one another. Recent estimates suggest that these occur in approximately 5% of individuals, the likelihood of which increases with consanguinity. Dual diagnoses and multilocus pathogenic variation and inheritance are the focus of Chapter 9, Multilocus Inheritance and Variable Disease Expressivity in Rare Disease.

1.7 Considerations of Mendelian disorders and genetic inheritance

While a disease phenotype is often obvious, some additional variables and phenomena are worth considering that may influence phenotypic differences among affected individuals sharing the same disease genotype. Penetrance of a gene mutation or variant refers to whether individuals with a given genotype present the corresponding phenotype. When the disease manifests in all individuals with the disease genotype, it is said that there is complete penetrance; an example of this is achondroplasia (MIM #100800). Penetrance may be age-dependent, such that the phenotype in a person with the mutation will be observed only after a certain age, as in Huntington disease (MIM #143100). If some individuals with the disease-associated genotype do not develop features of the disorder (i.e., remain asymptomatic), the condition is said to have incomplete (or reduced) penetrance. Sometimes penetrance is expressed as the fraction, or probability, of individuals with the disease-associated genotype who will exhibit the phenotype as well; one example is retinoblastoma (MIM #180200), in which about 10% of patients who harbor the heterozygous RB1 disease allele do not develop intraocular tumors. Additional genetic or environmental factors may influence penetrance of genetic disorders.
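When penetrance is expressed as a probability, it can be combined with the chance of inheriting the disease allele to estimate the chance that a child will actually show the phenotype. The arithmetic below is a hedged illustration only: the function name `offspring_risk` is hypothetical, the 90% penetrance is simply the complement of the roughly 10% nonpenetrant figure quoted for retinoblastoma, and the 50% transmission assumes one heterozygous parent with a dominantly inherited allele.

```python
from fractions import Fraction


def offspring_risk(p_transmit, penetrance):
    """Probability that a child both inherits the allele and shows the phenotype."""
    return p_transmit * penetrance


# Assumed inputs: 1/2 transmission from a heterozygous parent, ~9/10 penetrance.
risk = offspring_risk(Fraction(1, 2), Fraction(9, 10))
print(risk, float(risk))  # 9/20 0.45
```

Under these assumptions the chance that a given child is affected is lower than the 50% chance of inheriting the allele, which is one reason reduced penetrance can make a dominant condition appear to skip a generation.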
A term related to, and often confused with, penetrance is variable expressivity, which refers to the range of phenotypic features (signs and symptoms of the disease) and differences in severity of the disorder among individuals who share the same disease-associated genotype. Some affected individuals may exhibit only a few features of a given condition, while other affected patients have more or different features of the condition. A condition with variable expressivity is Waardenburg syndrome type 4A (MIM #277580) due to biallelic EDNRB mutations, where about 50% of homozygotes have aganglionic megacolon and some combination of hearing loss, bicolor irides (heterochromia), and white forelock.

In correlating phenotype and genotype, it is worth bearing in mind that genetic disorders with similar phenotypes can be due to myriad genotypes at different loci in the genome, a phenomenon known as locus or genetic heterogeneity. One example is nonsyndromic hearing loss, for which mutations in over 60 different genes have been found to underlie the phenotype. Furthermore, different mutations in the same gene can cause the same disease, referred to as allelic heterogeneity, while different diseases can converge on the same genetic locus, referred to as allelic affinity. As an example, LoF mutations in the RYR1 gene cause central core disease (MIM #117000), while GoF variants in the same gene are associated with susceptibility to malignant hyperthermia (MIM #145600).

1.8 Conclusion

Scientific and technical advances of the recent past are enriching and expanding our understanding of the human genome and the role of genetic variants in health and development, informing the practice of medicine in an unprecedented way. Rare genetic disorders offer insights into the discovery of fundamental and novel biological processes.
As these findings increasingly inform medical care, every healthcare provider, scientist, and individual interested in traits that might be specifically observed within families should have a framework for understanding the basic concepts underlying human genetics and genomics.

Further reading

Balci T, Hartley T, Xi Y, Dyment D, Beaulieu C, Bernier F, et al. Debunking Occam’s razor: diagnosing multiple genetic diseases in families by whole-exome sequencing. Clin Genet 2017;92(3):281–9.
Bamshad M, Nickerson D, Chong J. Mendelian gene discovery: fast and furious with no end in sight. Am J Hum Genet 2019;105(3):448–55.
Claussnitzer M, Cho J, Collins R, Cox N, Dermitzakis E, Hurles M, et al. A brief history of human disease genetics. Nature 2020;577(7789):179–89.
Gonzaga-Jauregui C, Lupski JR, Gibbs RA. Human genome sequencing in health and disease. Annu Rev Med 2012;63:35–61.
Haendel M, Vasilevsky N, Unni D, Bologa C, Harris N, Rehm H, et al. How many rare diseases are there? Nat Rev Drug Discov 2020;19(2):77–8.
Lee CE, Singleton KS, Wallin M, Faundez V. Rare genetic diseases: nature’s experiments on human development. iScience 2020;23(5):101123.
Lupski JR. Clinical genomics: from a truly personal genome viewpoint. Hum Genet 2016;135(6):591–601.
Nussbaum R, McInnes R, Willard H. Thompson & Thompson genetics in medicine. 8th edition. Elsevier; 2017.