Document Details

Uploaded by EndorsedEveningPrimrose7980

Università degli Studi di Pavia

2024

Tags

chromothripsis cancer genomics mutation signatures oncogenes

Summary

This document discusses chromothripsis, a catastrophic genomic event that produces extensive rearrangements in chromosomes, and its role in cancer development. It explores mechanisms such as micronuclei formation and breakage-fusion-bridge cycles, as well as the functional consequences of chromothripsis-mediated genomic changes. Furthermore, the document touches upon kataegis, clusters of focal hypermutations. It also briefly mentions challenges in understanding mutational signatures and their impact on therapeutic approaches.

Full Transcript

CHROMOTHRIPSIS & MUTATIONAL SIGNATURES IN CANCER
Lesson 13, 11/11/2024

Focal Mutation Hotspots
• Chromothripsis
• Kataegis
• Long Tracts of Mutations Surround BIR-Mediated SVs

Karyotypes of Cancer Cells
• The karyotypes of cancer cells are often remarkably complex, littered not only with mutations but also with small- and large-scale changes in both chromosome number and architecture.
• Copy number alterations in the form of whole-chromosome or segmental aneuploidy are present in most tumors, yet their role as a cause or consequence of cancer development remains under debate.
• Structural aberrations and gross rearrangements alter the linear organization of chromosomes and, in some instances, can directly drive tumorigenesis.
• Example: the Philadelphia chromosome in chronic myelogenous leukemia involves a translocation between chromosomes 9 and 22, generating an oncogenic gene fusion product that is effectively targeted by clinical therapeutics.
• Aneuploidy is a cancer hallmark.

Chromothripsis: A New Phenomenon of Genome Instability
Unknown before the next-generation sequencing era, chromothripsis is a phenomenon of genome instability in which a presumably single catastrophic event generates extensive genomic rearrangements of one or a few chromosomes. The name comes from the Greek "thripsis", meaning "shattering".
Discovered in 2011 by Campbell (Cancer Genome Project, Wellcome Trust Sanger Institute).
Chromothripsis involves complex clusters of rearrangements, including duplications, deletions, and inversions, occurring in a single chromosome-shattering event.
The first documented event showed simultaneous deletion of tumor suppressors and duplication of oncogenes.
Conventional karyotyping or FISH does not have the resolution needed to fully characterize such complex events, though it can detect some of them.
• Double minute (DM) chromosomes are circular, replication-competent extrachromosomal DNA elements that amplify multiple gene copies, driving high expression levels.
• Chromothripsis is present in 30% to 50% of human cancers.
• It is highly prevalent in specific cancers, including liposarcomas, osteosarcomas, and glioblastomas (over 50% of cases).
• Chromothripsis is linked with poor prognosis.

Genomic Features of Chromothripsis Suggest That Most Rearrangements Occur in a Single Catastrophic Event
• Example of a sequence of progressive rearrangements disrupting a model chromosome. The chromosomal configuration after each rearrangement is shown, together with the copy number and rearrangement plot that would result.
• Example of how a chromosomal catastrophe might break the chromosome into many pieces that are then stitched back together haphazardly.

Definition:
• Chromothripsis is a catastrophic event in which one or a few chromosomes are shattered and stitched back together in random order, producing a derivative chromosome with complex rearrangements within a few cell cycles.
• Chromosome mis-segregation during cell division frequently produces small nuclear structures called micronuclei, which are prone to irreversible nuclear envelope disruption during interphase.
• Micronucleated chromosomes accumulate extensive DNA damage and are susceptible to shattering during the next mitosis, generating multiple distinct DNA fragments. Chromosome fragments are reassembled by DNA double-strand break repair to form a derivative chromosome.

THE MICRONUCLEI
  o At the exit of mitosis, nuclear lamins and pore complexes redeposit around newly segregated chromosomal masses, forming the cell nucleus.
• Chromosome segregation errors during mitosis can lead to the formation of micronuclei, small nucleus-like structures located outside the primary nucleus.
• Micronuclei are a consequence of improper kinetochore-microtubule attachments and segregation errors during mitosis.
• Micronuclei are a unique source of genomic instability and DNA damage.
• Early evidence (1968) suggested that chromosomes in micronuclei undergo pulverization in mitosis after failing to complete DNA replication prior to mitotic entry.
• Micronuclear chromosomes acquire DNA damage and exhibit delayed replication kinetics compared to the main nucleus.

MECHANISMS AND FUNCTIONAL CONSEQUENCES OF CHROMOTHRIPSIS
Mechanisms:
Micronuclei formation is a precursor to chromothripsis and results from telomere fusion, chromosomal bridges, unrepaired DSBs, and nuclear envelope collapse. Breakage and improper repair of chromosomal bridges can lead to chromothripsis and are associated with the cytoplasmic exonuclease TREX1.
Functional Consequences:
• Identifying oncogenic structural variants resulting from chromothripsis is challenging.
• Micronuclei harboring amplified oncogenes may contribute to positive selection in tumorigenesis.
• Micronuclei-based model systems are used to understand the functional consequences of chromothripsis.

Causes of Chromothripsis
Chromothripsis may arise in any cell, including somatic cells, germline cells, zygotes, and blastomeres of preimplantation embryos, thus determining the fate of an affected organ or of the whole organism.
It is induced by exogenous and/or endogenous factors that trigger chromosome shattering and sequential reassembly of fragments through:
  o Micronuclei formation
  o Breakage-fusion-bridge cycles
  o Aberrant epigenetic regulation
  o Abortive apoptosis
  o Other, as yet unknown, mechanisms

HOW DOES CHROMOTHRIPSIS CONTRIBUTE TO CANCER DEVELOPMENT?
Chromothripsis can result in both the loss of DNA segments and the formation of de novo rearrangements, contributing to cancer development.
Culprits Include:
• Disruption of tumor suppressor genes.
• Formation of oncogenic fusion products.
• Rearrangements between normally distant loci, which may juxtapose active regulatory elements (e.g., a promoter) adjacent to an otherwise repressed oncogene.
Cancer Genome Sequencing Has Documented:
• Tumor suppressor loss.
• Gene fusion events.
• Perturbed regulatory elements associated with chromothripsis in human malignancies.
Other Associations:
• Chromothripsis and gene amplification, often manifested as extrachromosomal DNA elements (double minute chromosomes, DMs), leading to:
  o High expression of oncogenes
  o Resistance to therapy
• Loss of the TP53 tumor suppressor gene, critical for halting cell cycle progression in response to DNA damage, is often a prerequisite for chromothripsis.
• Bypassing the p53 checkpoint allows cells to tolerate damage from chromothripsis.
• Most non-transformed cells experiencing chromothripsis do not survive; rare exceptions undergo clonal selection and expansion toward cancer.
• A combination of these rearrangements produces a hallmark mutation signature for chromothripsis and fuels cancer development and tumor evolution through selective processes.

CASE EXAMPLES OF CHROMOTHRIPSIS-DRIVEN GENOMIC CHANGES IN DISEASE
• Pancreatic Cancer Acceleration
Chromothripsis, often accompanied by genome duplication (polyploidy), is found in two-thirds of pancreatic cancer cases. It simultaneously inactivates key driver genes (CDKN2A, SMAD4, TP53) and triggers further genomic complexity, including amplification of the mutated KRAS allele.
• Supratentorial Ependymomas
Whole-genome sequencing of supratentorial ependymomas, a type of brain and spinal cord tumor, revealed chromothripsis affecting chromosome 11 in all cases. Complex rearrangements fused the oncogenic RELA gene with C11orf95, highlighting the oncogenic potential of chromothripsis-generated gene fusions.
• These cases demonstrate the significant impact of chromothripsis on disease progression and its potential as a driver of oncogenic changes and therapeutic opportunities.

Kataegis: clusters of focal (localized) hypermutation, preferentially favoring cytosine substitutions
• Tumors exhibit clusters of point mutations known as mutation showers or kataegis.
• These mutations often cluster at TpC dinucleotides adjacent to somatic rearrangements, at a density that significantly exceeds the background mutation rate.
APOBEC Enzymes and Kataegis:
• Overexpression of APOBEC enzymes is linked to kataegis formation and cancer progression.
• APOBEC enzymes mutate TpC dinucleotides, suggesting a role in the formation of kataegic foci.
Mechanistic Links:
• Kataegic foci are commonly found at sites of DNA double-strand breaks (DSBs), DNA repair, and structural variation.
• Structural variation and kataegis can co-occur focally in human cancer, suggesting mechanistic links.

In Summary
• Mutation Hotspots Are Sequence-Dependent:
  o Mutation hotspots depend on DNA sequence and structure.
  o Mutations are subject to selection and can vary based on cellular processes.
• Diverse Mechanisms of Mutation Formation:
  o Mutations can result from various cellular processes, including DNA replication, repair, meiotic recombination, and immunoglobulin specification.
  o Some mutations are passenger mutations from multimutational events.
• Challenges in Understanding Mutational Signatures:
  o Mechanistic relationships of certain mutational signatures found in cancers remain unclear.
• Impact on Therapeutic Approaches:
  o Investigation of mutation hotspots in cancer has led to the development of novel therapeutic approaches.
• Future Insights:
  o Advancements in technology and large-scale sequencing studies will provide further insights into genetic variation hotspots, their frequency in diverse tissues and populations, and their contributions to diseases.

Impact of Mutations in Cancer Genes on Clinical Outcomes
• Mutations in known cancer genes, as opposed to genes that are not cancer drivers, significantly alter the function of genes and proteins, impacting cellular processes.
• These mutations in oncogenes are associated with variations in:
  o Patient survival.
  o Clinical outcomes.
  o Characteristics of metastatic or recurrent tumors.
• Such mutations play a critical role in predicting tumor responsiveness to anti-cancer drugs.
• Understanding the frequency and impact of mutations in cancer genes across different tissue types is crucial for advancing cancer treatment and prognosis strategies.

Causes and Challenges in Understanding Somatic Mutations in Cancer
• DNA molecules in our cells are targeted by various mutagenic processes acting in both germ and somatic cells.
• These processes contribute to species evolution and, particularly in somatic cells, to age-related diseases and cancer.
• Cancer genomes accumulate many somatic mutations due to endogenous and exogenous factors such as:
  o Normal DNA damage and repair.
  o DNA maintenance aberrations.
  o Carcinogenic exposures.
• Most mutations are harmless but serve as indicators of mutational processes, each creating characteristic mutational patterns or signatures in the genome.
• Different mutational processes generate unique combinations of mutation types, termed "mutational signatures".
• Understanding these mutational signatures is fundamental for comprehending tumorigenesis and cancer evolution.
• Each mutational process may involve components of DNA damage or modification, DNA repair, and DNA replication (normal or abnormal), creating distinct mutational signatures.
In certain cancers, a substantial proportion of mutations is caused by known exposures (e.g., tobacco in lung cancer, UV light in skin cancer) or abnormalities in DNA maintenance (e.g., defective mismatch repair in colorectal cancer). However, the understanding of mutational processes across most cancer types is still remarkably limited.
Traditional studies of mutational signatures, often focused on frequently mutated genes like TP53, were limited by their focus on 'driver' mutations and by the difficulty of separating composite signatures produced by multiple mutational processes.
IMPLICATIONS FOR PERSONALIZED CANCER THERAPY
• Identifying the mutagenic processes behind mutational signatures aids in developing personalized cancer therapies.
• Example: patients with homologous recombination deficiency (HRD) benefit from PARP inhibitor therapy, and HRD leaves a characteristic mutational signature in the genome.
• The presence of specific mutational signatures can guide the use of targeted therapies like PARP inhibitors.
• Many signature etiologies remain unknown, necessitating ongoing research to link signatures with potential causes.
• There is growing recognition that mutation patterns are context-specific, leading to research focused on understanding this context dependence.

Advancements in mutational signature analysis
• Recent advancements in sequencing technology have addressed past limitations, allowing the identification of thousands of somatic mutations in a single cancer sample.
• This technological progress enables the deciphering of mutational signatures, even when multiple mutational processes are operative.
• Mutational signatures potentially include base substitutions, small insertions and deletions (indels), genome rearrangements, and chromosome copy-number changes.
• An algorithm developed to extract mutational signatures from somatic mutation catalogues has revealed both novel and known signatures in breast cancer genome sequences.
• The method has been applied to global sequencing initiatives, providing a comprehensive survey of mutational signatures and processes across a wide spectrum of human cancers.

Conceptual workflow of somatic mutational signatures identification
• Diverse mutagenesis processes shape the somatic landscape of tumors.
• Deciphering the underlying patterns of cancer mutations allows uncovering relationships between these recurrent patterns of mutations and inferring possible causal mutational processes.
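The extraction step can be illustrated with a toy example. The lecture does not name the algorithm, so the sketch below assumes non-negative matrix factorization (NMF) with multiplicative updates, a common way to frame the problem: a catalogue matrix V (mutation types × samples) is approximated as W × H, where W holds the signatures and H the exposure of each sample to each signature. The two signature profiles, the exposures, and all function names are hypothetical, and real catalogues use 96 mutation types, not 4.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Sum of squared differences between V and the product W x H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (types x samples) into W (types x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # Multiplicative update: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Hypothetical toy data: two made-up signatures over four mutation types,
# mixed in five samples with made-up exposures (mutation counts).
SIG_A = [0.7, 0.1, 0.1, 0.1]
SIG_B = [0.1, 0.1, 0.1, 0.7]
EXPOSURES = [(100, 0), (0, 100), (50, 50), (80, 20), (10, 90)]
V = [[SIG_A[i] * a + SIG_B[i] * b for a, b in EXPOSURES] for i in range(4)]

W, H = nmf(V, k=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what lets the columns of W be read as mutation-type profiles and the rows of H as per-sample activities.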
Characteristics and analysis of mutational signatures
• Mutational signatures are inferred computationally from mutation counts across a set of individuals.
• Computational methods are used to infer signature activities in individual genomes and their dependencies on the sequence and genomic context.
• To provide insights into the etiology of endogenous signatures, researchers use computational and experimental methods to link mutational signatures to dysregulated genes and pathways.
• Mutational signatures can be linked to cancer/genome evolution and genetic diversity in a population.
All cancers are caused by somatic mutations, but the biological processes behind these mutations are not fully understood.
• Analysis of 4,938,362 mutations from 7,042 cancers revealed over 20 distinct mutational signatures.
• Some signatures, like those attributed to the APOBEC family of cytidine deaminases, are common across many cancer types, while others are specific to a single cancer class.
• Associations are found with patient age at diagnosis, known mutagenic exposures, and defects in DNA maintenance, but many signatures have unknown origins.
• The phenomenon of kataegis, hypermutation localized to small genomic regions, is observed in various cancers.
• This diversity in mutational processes offers insights into cancer etiology, prevention, and therapy.

Compilation and analysis of mutational catalogues in cancer
• A comprehensive dataset of 4,938,362 somatic substitutions and small insertions/deletions (indels) was compiled from 7,042 primary cancers of 30 different classes, including both whole genome and exome sequences.
• Normal DNA from the same individuals was sequenced to confirm the somatic origin of these variants.
• The prevalence of somatic mutations varied significantly across and within cancer classes, ranging from approximately 0.001 per megabase (Mb) to over 400 per Mb.
• Certain childhood cancers exhibited the fewest mutations, while cancers associated with chronic mutagenic exposures (e.g., lung cancer from tobacco smoking, malignant melanoma from UV light exposure) showed the highest prevalence.

Defining mutational signatures in cancer
• Mutational signatures are defined by various classes of mutations, such as substitutions, indels, and rearrangements, and by specific mutation characteristics (e.g., sequence context, transcriptional strand).
• For base substitutions, there are six substitution classes when mutations are referred to the pyrimidine of the base pair; considering the bases immediately 5′ and 3′ of the mutated base (4 × 4 contexts) gives 96 possible mutation types in the classification.
• This classification is particularly effective in distinguishing mutational signatures that cause the same substitutions but in different contexts.
• Analysis of 30 cancer types revealed 21 distinct validated mutational signatures, exhibiting substantial diversity.
• Some signatures show remarkable specificity (e.g., signature 10), while others have a more uniform distribution of mutation types (e.g., signature 3).
• Different signatures are characterized by the predominance of certain mutation types, like C>T, C>A, T>C, and T>G substitutions.

Validated mutational signatures found in human cancer
• Each signature is displayed according to the 96-substitution classification, defined by the substitution class and the sequence context immediately 3′ and 5′ to the mutated base.
• The probability bars for the six types of substitutions are displayed in different colors.
• The mutation types are on the horizontal axes, whereas the vertical axes depict the percentage of mutations attributed to a specific mutation type.
• All mutational signatures are displayed based on the trinucleotide frequency of the human genome.
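The count of 96 follows from 6 substitution classes × 4 possible 5′ bases × 4 possible 3′ bases. The sketch below enumerates them and shows how a substitution observed at a purine is folded onto its pyrimidine-centred equivalent by reverse complementing. The bracket labels such as `T[C>T]A` and the function names are illustrative choices, not something defined in the lecture.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, always referred to the pyrimidine (C or T).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def mutation_types():
    """Enumerate every substitution class in every 5'/3' context."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]

def classify(trinucleotide, alt):
    """Label a substitution of the middle base of a trinucleotide.

    If the reference base is a purine (A or G), the trinucleotide and the
    alternative base are reverse complemented, so the label always has
    C or T as the mutated base.
    """
    if trinucleotide[1] in "AG":
        trinucleotide = trinucleotide.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    five, ref, three = trinucleotide
    return f"{five}[{ref}>{alt}]{three}"

print(len(mutation_types()))   # 96
print(classify("TCA", "T"))    # T[C>T]A
print(classify("TGA", "A"))    # T[C>T]A (same event seen from the other strand)
```

Keeping track of which strand the mutated pyrimidine lies on relative to transcription, rather than folding both strands together, doubles the number of categories.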
Diversity and impact of mutational signatures across cancer types
• Signatures 1A and 1B, characterized by C>T substitutions at NpCpG trinucleotides, were observed in 25 of 30 cancer classes and are likely related to spontaneous deamination of 5-methylcytosine.
• Signature 2, present in 16 of 30 cancer types, is primarily characterized by C>T and C>G mutations at TpCpN trinucleotides, possibly due to the activity of the APOBEC family of cytidine deaminases.
• Most cancer classes exhibit at least two mutational signatures, with some cancers, such as those of the liver, uterus, and stomach, showing up to six, indicating a complex repertoire of mutational processes.
• Individual cancer genomes often exhibit multiple mutational signatures, with the contribution of each signature varying significantly among cancer samples within the same class.

Mutational Signatures and age of cancer diagnosis
• An analysis was conducted across various cancer types to identify correlations between the age at diagnosis and the number of mutations attributed to each mutational signature.
• Signature 1A/B showed a strong positive correlation with age at diagnosis in the majority of both childhood and adult cancer types.
• This pattern suggests that a significant proportion of signature 1A/B mutations in cancer genomes are acquired over the patient's lifetime at a relatively constant rate across different individuals, likely in normal somatic tissues.
• The lack of a consistent correlation between age and other mutational signatures implies that these mutations may occur at varying rates in different people, potentially due to differing carcinogen exposures, or may arise after the initiation of neoplastic changes.

Transcriptional strand bias in mutational signatures
• The efficiency of DNA damage and maintenance processes can vary between the transcribed and untranscribed strands of genes, a phenomenon influenced by transcription-coupled nucleotide excision repair (NER).
• Substitution mutational signatures were re-extracted, incorporating information on the transcriptional strand for each mutation, leading to 192 mutation subclasses (the 96 types on each of the two strands).
• Several signatures displayed significant transcriptional strand bias, with differences in mutation prevalence between transcribed and untranscribed strands.
• Signature 4, commonly found in lung and liver cancers often caused by tobacco smoking, shows a transcriptional strand bias for C>A mutations. This is likely a result of bulky DNA adducts from tobacco smoke and their removal by transcription-coupled NER.
• Signature 7, mainly observed in malignant melanoma, exhibits more C>T mutations on the untranscribed strands, consistent with damage from UV exposure and repair by transcription-coupled NER.

Mutational Signatures involving insertions and deletions
• Mutational signatures were re-extracted to include indels at short nucleotide repeats and indels with overlapping microhomology at breakpoint junctions, in addition to the 96 substitution types.
• Signature 6, characterized by C>T mutations at NpCpG and distinct from signature 1A/B, is associated with large numbers of substitutions and small indels (often termed 'microsatellite instability') in several cancer types. This pattern is indicative of cancers with defective DNA mismatch repair.
• Signature 15, found in lung and stomach cancers, also contributes a large number of substitutions and small indels but differs in which mutation types are prominent.
• Signature 3, associated with larger deletions with overlapping microhomology, was strongly linked to BRCA1 and BRCA2 mutations, particularly in breast, ovarian, and pancreatic cancers.

Linking cancer aetiology to mutational signatures
• Each mutational signature represents the cumulative effect of one or more DNA damage and/or maintenance mechanisms.
• Signature 1A/B is likely due to deamination of 5-methylcytosine, a common endogenous process.
• Signature 7, associated with UV-light-induced mutations, is observed in malignant melanoma and head and neck squamous carcinoma.
• Signature 4, found in cancers related to tobacco smoking, exemplifies the impact of tobacco carcinogens.
• Signature 11, linked to alkylating agents like temozolomide, is found in treated malignant melanomas and glioblastoma multiforme.
• Signatures 2 and 13 are attributed to the AID/APOBEC family of cytidine deaminases.
• The exact underlying processes or etiologies for many mutational signatures remain unknown, highlighting the complexity of cancer genomics and the need for further research.

Localized hypermutation and Kataegis in cancer genomes
• Tumors exhibit clusters of point mutations known as mutation showers or kataegis.
• Kataegis, a phenomenon of localized substitution hypermutation, is characterized by clusters of C>T and/or C>G mutations, predominantly at TpCpN trinucleotides, occurring on the same DNA strand.
• An underlying role of APOBEC family enzymes is proposed for both kataegis and signatures 2 and 13.
• In a study of 507 whole-cancer genome mutation catalogues, kataegis was observed in various cancers, including breast, pancreas, lung, and liver cancers, medulloblastomas, CLL, B-cell lymphomas, and acute lymphoblastic leukemia.
• Kataegic foci, ranging from small clusters to major foci of hypermutation, are often linked with genomic rearrangements.

Overview of Mutational Signature Analysis in PCAWG Consortium
• The analysis used 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences, covering a broad range of cancer types.
• A total of 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution, and 17 small insertion-and-deletion signatures were identified.
• The extensive size of the dataset allowed for the discovery of new signatures, the differentiation of overlapping signatures, and the decomposition of signatures into components indicating distinct DNA damage, repair, and replication mechanisms.
• Signatures were associated with various exogenous or endogenous exposures and defective DNA-maintenance processes.
• Despite these findings, many signatures still have unknown causes.
• This analysis offers a systematic view of the mutational processes contributing to human cancer development.

Overview of Doublet-Base Substitution (DBS) Signatures in cancer
• Tandem doublet to sextuplet base substitutions were observed at about 1% the prevalence of single-base substitutions (SBSs).
• The number of DBSs in most cancer genomes was higher than expected by chance, suggesting common mutagenic events causing neighboring base substitutions.
• Eleven DBS signatures were extracted, including three previously reported.
• Signature DBS1, characterized by CC>TT mutations, was prevalent in malignant melanomas and is a known consequence of UV light-induced DNA damage.
• DBS2, predominantly CC>AA mutations, was common in lung and head and neck cancers, often related to tobacco smoking, indicating guanine damage.

Specific DBS signatures and their associations
• DBS11, primarily CC>TT mutations, was associated with multiple cancer types and APOBEC activity, suggesting a role for APOBEC in generating DBS11.
• DBS3, DBS7, DBS8, and DBS10 showed high mutation counts in rare colorectal, stomach, and esophageal cancers, some linked to defective DNA mismatch repair or polymerase epsilon exonuclease domain mutations.
• DBS5 was associated with cancers exposed to platinum chemotherapy and correlated with SBS31 and SBS35 signatures.
• The diverse nature of DBS signatures indicates various underlying mutational processes, from environmental exposures to inherent cellular mechanisms.
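As with single-base substitutions, doublet substitutions such as CC>TT and GG>AA describe the same event read from opposite strands, so they are counted as one class. The sketch below enumerates all doublet changes in which both bases differ and merges each with its reverse complement. The function names are hypothetical, and the representative kept for each pair (the alphabetically smaller one) is an arbitrary choice that may not match the labels used in published catalogues.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def dbs_classes():
    """Strand-collapsed doublet-base substitution classes as (ref, alt) pairs."""
    doublets = ["".join(p) for p in product(BASES, repeat=2)]
    classes = set()
    for ref, alt in product(doublets, repeat=2):
        # A doublet substitution changes both adjacent bases.
        if ref[0] == alt[0] or ref[1] == alt[1]:
            continue
        forward = (ref, alt)
        reverse = (revcomp(ref), revcomp(alt))
        # Keep one representative per strand-equivalent pair.
        classes.add(min(forward, reverse))
    return classes

classes = dbs_classes()
print(len(classes))               # 78
print(("CC", "TT") in classes)    # True: the UV-associated DBS1 change
print(("GG", "AA") in classes)    # False: folded into CC>TT
```

The 144 raw (reference, alternative) combinations collapse to 78 classes because 12 of them are their own reverse complement and the remaining 132 pair up.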
Characteristics of Small Insertion-and-Deletion (Indel) Mutational Signatures
• Indels typically occurred at about 10% the frequency of base substitutions, with substantial variation in number across different cancer genomes.
• The study identified 17 indel mutational signatures, with variations in the number of deletions and insertions across cancer types.
• Signature ID1 predominantly involved insertions of thymine, and ID2 deletions of thymine, at long thymine mononucleotide repeats. These were common in colorectal, stomach, endometrial, and esophageal cancers and in diffuse large B cell lymphoma.
• ID3, characterized by deletions of cytosine at short mononucleotide repeats, was prevalent in lung and head and neck cancers associated with tobacco smoking.
• Signature ID13 was mainly found in malignant melanomas, marked by deletions of thymine at thymine–thymine dinucleotides, and correlated with UV light-induced damage.
• A small fraction of cancers exhibited very high numbers of ID1 and ID2 mutations, often accompanied by SBS signatures associated with DNA mismatch repair deficiency.

Constraints, limitations, and future directions in Mutational Signature analysis
• The study acknowledges important constraints and limitations in the analytical frameworks used for characterizing mutational signatures.
• Signatures are mathematical approximations and are potentially influenced by the chosen analytical approach and other factors.
• Development of more refined methods for deciphering and attributing mutational signatures is essential, ideally incorporating experimental systems where causes are known.
• Despite potential limitations, the analysis captured a substantial proportion of naturally occurring mutational signatures in human cancer, forming a foundation for further research.
• Future research directions include exploring the aetiologies of geographical and temporal differences in cancer incidence, understanding mutational processes in healthy tissues and non-neoplastic diseases, clinical applications, and mechanistic insights into carcinogenesis.

Mutational signatures are markers of drug sensitivity of cancer cells
• Cancer therapeutics often target DNA synthesis or repair.
• Mutational signatures make useful markers of drug sensitivity.
• There are robust associations between various mutational signatures and drug activity across cancer cell lines.
• Signatures of prior exposures to DNA-damaging agents, including chemotherapy, tend to associate with drug resistance.
• Signatures of deficiencies in DNA repair tend to predict sensitivity toward particular therapeutics.

Sex-biases in mutational signatures
• Analysis covered 47 validated SBS, 11 DBS, and 17 ID signatures, assessing both the presence of signatures and the percentage of mutations attributed to each signature between sexes.
• Three signatures (SBS1, SBS17a, SBS17b) showed significant sex differences at the pan-cancer level.
  o SBS1 was more common in female samples, whereas SBS17a and SBS17b were more frequent in male samples.
• In hepatocellular cancer, four ID signatures (ID3, ID8, ID11, ID1) exhibited sex biases.
  o ID3 and ID8, associated with tobacco smoke and double-stranded break repair respectively, were more common in males.
  o ID1 mutations were more prevalent in female samples.
• In B-cell non-Hodgkin lymphoma, significant differences were observed in SBS17a and SBS17b signatures between sexes.
• These insights into sex-biased mutational signatures across various cancers underscore the importance of considering sex as a factor in understanding tumor aetiology and developing personalized treatment strategies.
INTRODUCTION TO CANCER EPIGENOMICS
Lesson 14, 15/11/2024

Epigenetics is the study of modifications made to DNA and associated factors that:
• Do not change the DNA sequence itself
• Are maintained during cell division
• Cause stable changes in gene expression
The collection of all epigenetic changes in a genome is called an epigenome.
The term "epigenetics" was originally coined by Conrad Waddington to describe the dynamic interactions between the environment and the genome that bring about the characteristic traits of an organism, defined as the phenotype (Waddington, 1942).
Epigenetic alterations are defined as nonpermanent and potentially heritable changes that regulate gene expression without alterations to the DNA sequence.
Epigenetic modifications are considered to be dynamic and reversible: they are established by modification enzymes (writers), interpreted by modification-specific binding proteins (readers), and removed by enzymes (erasers).
Epigenetic mechanisms that control changes in gene expression levels can be divided into three major groups:
• DNA methylation
• Histone and chromatin modifications
• Noncoding RNAs (ncRNAs)

Nonmutational epigenetic reprogramming
• Definition: Nonmutational epigenetic regulation involves changes in gene expression without DNA mutations.
• Analogous Mechanism: As in embryonic development and long-term memory, epigenetic changes can drive cancer progression.
• Role in Cancer: Specific epigenetic alterations in cancer, such as DNA methylation or histone modification, lead to changes in gene expression and promote cancer development and progression.

Multiple layers of epigenetic regulations in cancer
Alterations in epigenetic modifications in cancer regulate various cellular responses, including cell proliferation, apoptosis, invasion, and senescence. Through DNA methylation, histone modification, chromatin remodeling, and noncoding RNA regulation, epigenetics plays an important role in tumorigenesis.

Multiple layers of gene expression regulation
Chromosomes are regulated by their locations, or territories, in the nucleus, both relative to one another and to the nuclear lamina. Long-range interactions are further regulated by TADs within and across chromosomes. At the epigenetic level, gene expression is regulated by reversible modifications of histones within nucleosomes, including methylation, phosphorylation, acetylation, ubiquitination, and sumoylation.

An additional layer of gene expression regulation: RNA modifications
• Some RNA modifications are reversible.
• More than 150 structurally distinct modification types have been identified across all types of RNA.
• These modifications are associated with various biological processes and human diseases.
• The common RNA modifications include N6 methylation of adenosine (m6A), N1 methylation of adenosine (m1A), N7-methylguanosine (m7G), 5-methylcytosine (m5C), 2′-O-methylation (2′-O-Me or Nm), pseudouridine (5-ribosyl uracil or Ψ), and adenosine-to-inosine RNA editing (A-to-I editing).
• Among these RNA modifications, m6A is the most abundant form in eukaryotic cells. m6A is abundant in the liver, kidney, and brain.
• m6A levels are relatively high in protein-coding regions (CDS) and untranslated regions (UTRs).
• The m6A modification has been implicated in the activation of multiple signaling pathways associated with lung cancer.
Note: different isoforms can be downregulated in normal tissue but upregulated in cancer.

DNA methylation
• Addition of a methyl group to the fifth carbon of the cytosine ring to form 5-methylcytosine, mostly at CpG dinucleotides.
• The symmetrical presence of CpG methylation marks on both DNA strands allows the post-replicative maintenance of DNA methylation patterns and is therefore a key feature of epigenetic regulation.
• DNA methylation is a crucial molecular mechanism in cell differentiation and function.
• Uneven distribution:
o Most CpGs in the genome are sparsely distributed (70-80% of those are methylated).
o CpG islands (CGIs), clusters of CpGs that are typically unmethylated, mark active housekeeping genes.

In mammals, 5-methylcytosine (5mC) is the major form of DNA modification, and it has important roles in development and disease. About 70–80% of the CpG sites in the mammalian genome are modified by 5mC. The major functions of 5mC include mediating genomic imprinting and X-chromosome inactivation, repressing transposable elements, and regulating transcription. 5mC is both chemically and genetically stable. Chemically, the methyl group is connected to the 5-position of the cytosine base through a stable carbon–carbon bond, creating a barrier for direct removal of the methyl group.

DNA methylation: function (transcriptional regulation)
• Generally a repressive mark
• Reduced DNA binding of many proteins
• Binding site for methyl-binding proteins (MBD domain-containing proteins, MeCP2)
• Condensed chromatin structure

DNA methyltransferases
• The DNA methyltransferases (DNMTs) are a conserved family of cytosine methylases with a key role in epigenetic regulation.
• The human genome encodes five DNMTs: DNMT1, DNMT2, DNMT3A, DNMT3B, and DNMT3L.
• DNMT1, DNMT3A, and DNMT3B are canonical cytosine-5 DNMTs that catalyze the addition of methylation marks to genomic DNA.
• DNMT2 and DNMT3L are non-canonical family members, as they do not possess catalytic DNMT activity.
• DNMTs are important for normal mammalian development, and DNMT mutations are associated with several human diseases, including cancer.
All DNMTs share the conserved catalytic domain (shown in red). The small domain within the catalytic domain of DNMT2 (shown in black) represents the unique CFT motif that distinguishes DNMT2 from other DNMTs.
Regulation of DNMT activity
• In mammals, DNA methylation is tightly regulated through a variety of molecular interactions and modifications that control DNMT activity.
• Regulation of DNMTs by molecular interactions: for example, the DNMT1-interacting protein UHRF1, an E3 ubiquitin-protein ligase that binds to DNA, is essential for maintenance methylation in vivo.
• Regulation of DNMTs by post-translational modifications: DNMT1 can be demethylated at lysine 1094 by the lysine demethylase LSD1 and methylated at lysine 142 by the lysine methyltransferase SET7.
• Regulation of DNMTs by alternative splicing: for example, the DNMT3B gene is expressed as many different isoforms.
• Regulation of DNMTs through gene loss and duplication: changes in DNMT gene copy number have a role in adapting DNMT activity to species-specific requirements.

Genetically, once 5mC has been established by the de novo DNA methyltransferases DNMT3A and DNMT3B, it is maintained by the maintenance methyltransferase DNMT1, which recognizes hemi-methylated CpG.

Eukaryotic DNA methylation
Methylation patterns are:
1. Erased in pre-implantation embryos
2. Established in early development (DNMT3A or DNMT3B)
3. Maintained throughout the remainder of development (DNMT1)

Genomic location of DNA methylation
The commonly accepted definition of the CpG island-centric landscape places a CpG-rich "island" in the center. Flanking the island are CpG shores, which extend for 2 kb on each flank. CpG shelves extend a further 2 kb beyond the shores, and any region more than 4 kb from the island falls in the "open sea".
Methylation patterns of CGIs may vary between tissues: some genes are not expressed in the same way in different tissues.

Passive vs active DNA demethylation
• Passive DNA demethylation is a replication-dependent and non-catalytic event.
• DNA replication is uncoupled from maintenance methylation.
• Active DNA demethylation is a catalytic event.
• 5mC is replaced with C through iterative oxidation steps of 5mC; this can also be coupled to replication.
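The island-centric landscape described in these notes (island, 2 kb shores, a further 2 kb of shelves, open sea beyond 4 kb) can be turned into a small classifier. The following is only a sketch: the function name `cpg_context`, the 0-based half-open coordinates, and the single-island simplification are illustrative assumptions, not part of the lecture material or of any annotation package.

```python
def cpg_context(position, island_start, island_end, flank=2000):
    """Classify a position relative to ONE CpG island.

    Assumptions (hypothetical, for illustration only):
    - coordinates are 0-based and the island is [island_start, island_end)
    - flank is the shore width in bp; shelves span the next `flank` bp
    """
    if island_start <= position < island_end:
        return "island"
    # Distance in bp from the position to the nearest island base.
    if position < island_start:
        distance = island_start - position
    else:
        distance = position - island_end + 1
    if distance <= flank:
        return "shore"       # within 2 kb of the island
    if distance <= 2 * flank:
        return "shelf"       # between 2 kb and 4 kb from the island
    return "open sea"        # more than 4 kb from the island


if __name__ == "__main__":
    # Hypothetical island spanning bases 10,000 to 10,999.
    for pos in (10_500, 9_000, 7_000, 1_000):
        print(pos, cpg_context(pos, 10_000, 11_000))
```

A real annotation would have to handle many islands per chromosome and overlapping flanks; the sketch only illustrates the distance thresholds.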
Active DNA demethylation
DNA methylation in the form of 5-methylcytosine (5mC) can be actively reversed to unmodified cytosine (C) through the activity of TET enzymes. TET enzymes need α-ketoglutarate and oxygen to work.

CYTOSINE MODIFICATIONS
DNA methylation in the form of 5-methylcytosine (5mC) is actively reversed through TET dioxygenase-mediated oxidation of 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). This is followed by replication-dependent dilution or thymine DNA glycosylase (TDG)-dependent base excision repair (BER).

Crosstalk of methylation and DNA repair

TEN ELEVEN TRANSLOCATION (TET) FAMILY
TET proteins were named after the rare ten-eleven translocation associated with myeloid and lymphoid malignancies, which fuses the N-terminal region of the mixed lineage leukemia (MLL) gene (encoded on chromosome 11) to the C-terminal catalytic domain of TET1 (encoded on chromosome 10).
TET proteins are Fe(II)- and 2-oxoglutarate-dependent dioxygenases and depend on three cofactors for their activity: divalent iron (Fe II), α-ketoglutarate (αKG, also called 2-oxoglutarate or 2-OG), and oxygen.
The C-terminal core catalytic domain shared by all TET enzymes consists of the double-stranded β-helix (DSBH) domain, a cysteine-rich (Cys) domain, and binding sites for the Fe(II) and 2-OG cofactors. The DSBH domain contains a large low-complexity region of unknown function. TET1 and TET3 have an N-terminal CXXC domain that can bind directly to DNA and facilitate recruitment to genomic target sites.

Factors influencing TET enzymatic activity: crosstalk between metabolism and epigenetics
IDH1 and IDH2 mutations lead to production of 2-hydroxyglutarate, which competes with α-ketoglutarate and inhibits TET enzymes; the result is DNA hypermethylation (e.g., in glioblastoma).

Catalytic and non-catalytic roles of TETs involved in transcriptional repression
TET2 is involved in regulating inflammation-related genes. The non-catalytic role is not well understood, but it appears to act in the opposite direction to the catalytic role.
CATALYTIC ROLE OF TDG
NON-CATALYTIC ROLES OF TDG

Active DNA demethylation occurs in various biological contexts during embryogenesis:
o Pre-implantation and precursor germ cell (PGC) development
o Embryonic stem cell (ESC) maintenance and differentiation
o Neuronal functions
Aberrant demethylation is observed in cancer. Recent studies have also revealed the involvement of TET and active DNA demethylation in genomic instability and DNA damage repair, with alteration of the epigenome, the transcriptome, and epigenome integrity.

Active DNA demethylation during preimplantation and embryo development
Shortly after fertilization, mouse and human zygotes undergo extensive epigenetic reprogramming, including global DNA demethylation of both the paternal and maternal genomes.
Demethylation of the paternal genome mainly occurs through a combination of passive dilution of 5mC and TET3-mediated active modification–passive dilution (AM–PD) of 5hmC, 5fC, and 5caC.
Demethylation of the maternal genome mainly occurs through passive dilution.

Roles of active DNA demethylation: example of TET1
In this example the female is wild type and the male is knockout. The progeny are genotypically identical (heterozygous), but their phenotypes all differ, so the epigenetic defect is what causes the phenotype. The phenotypes include embryonic lethality, postnatal growth defects, and alterations at the placental level.
The function of these genes was discovered through studies carried out in the setting of development.

Histone modification
Histones are a family of small, positively charged proteins termed H1, H2A, H2B, H3, and H4 (Van Holde, 1988). DNA is negatively charged, due to the phosphate groups in its phosphate-sugar backbone, so histones bind DNA very tightly.
Histone variants: H2A.X, CENP-A
• Post-translational modifications of amino acids occur primarily on the histone tails.
• The main histone tail modifications are:
o Histone acetylation
o Histone methylation

Histone modifications: nomenclature

Additional histone modifications
Serine phosphorylation: H3S10, H3S28
Arginine mono-, di- and tri-methylation: H3R2, H3R17, H3R3
Threonine phosphorylation: H3T3, H3T6, H3T11
Lysine ubiquitylation: H2AK119, H2BK120
Histone modifications are reversible, like all epigenetic modifications.

Histone acetylation and deacetylation

The epigenetic tools: the histone writers, readers and erasers
Writers: enzymes that add histone modifications
Erasers: enzymes that remove histone modifications
Readers: proteins that bind histone modifications and alter gene activity and protein expression

"THE HISTONE CODE"
Modifications of histone proteins, together with the DNA sequence, determine the transcriptional status. Histone modifications are introduced by specific enzymes (HATs, HDACs, etc.).
Examples:
o Methylation of lysine 4 and 14, and phosphorylation of serine 10 of H3: ACTIVATION
o Methylation of lysine 9 of H3: INACTIVATION
Histone modification status correlates with transcriptional activity:
o Gene activation is associated with H3-K9 acetylation.
o Gene silencing is associated with H3-K9 methylation.

Mechanism of chromatin regulation
Actively transcribed genes are found in open euchromatin and are associated with histone acetylation (H3/H4Kac) and trimethylation of H3 lysine 4 (H3K4me3) at promoters, and with trimethylation of H3 lysine 36 (H3K36me3) over the gene body. Silenced genes are associated with densely packed heterochromatin marked by DNA methylation (5mC) and H3 lysine 9 trimethylation (H3K9me3), or lie in silenced Polycomb domains marked by trimethylation of H3 lysine 27 (H3K27me3).
Each of the four standard histones can be simultaneously modified at multiple different sites with multiple different modifications.
For example, H3 contains 19 lysines (K) known to be methylated, and each can be un-, mono-, di- or trimethylated. If the modifications are independent, this allows a potential 4^19 ≈ 2.75 × 10^11 (roughly 280 billion) different lysine methylation patterns.
Histone modifications can recruit protein complexes that regulate the chromatin state and gene activity.

Crosstalk between modified histones

Histone modifications summary
Covalent and reversible modifications, usually on histone tails.
Modifying and de-modifying enzymes:
• Redundancy: a single position can be modified by multiple different enzymes.
• Specificity: histone-modifying enzymes can target only one or many positions.
Can act intrinsically (only on single nucleosomes) or extrinsically (affecting nucleosome/nucleosome interactions).
Can recruit other proteins to DNA:
• Via specific domains, e.g. Bromo (ac), Chromo/PHD (me), 14-3-3 (ph)
Participate in the regulation of many processes:
• Transcription, DNA repair, chromatin assembly, silencing, heterochromatin formation

DECIPHERING THE METHYLOME: METHODOLOGIES & APPLICATIONS IN CANCER group work 18/11/2024 (lesson 15)

DNA Methylation
• A methyl group is added to the 5th carbon of cytosine rings.
• Occurs at cytosine residues within CpG sites.
• Mediated by DNA methyltransferases (DNMTs).
• Epigenetic control mechanism that modulates gene activity (usually repressive).
• Cellular processes regulated by DNA methylation:
o Genomic imprinting.
o X-chromosome inactivation.
o Repressing transposable elements.
o Regulating transcription.
• Abnormal methylation can result in the development of oncogenic properties.

TYPES OF CYTOSINE MODIFICATIONS

TECHNIQUES TO STUDY METHYLATION
• DNA methylation arrays: EPIC array.
• NGS-based, short-read sequencing: WGBS, RRBS.
• NGS-based methods: MeDIP-seq, MRE-seq.
• Long-read sequencing: PacBio, Oxford Nanopore.

BISULFITE CONVERSION
• Technique used to study methylation.
• Converts unmethylated cytosines to uracil; methylated cytosines remain unchanged.
• After sequencing, unmethylated cytosines are read as thymines.
• Reduces sequence complexity in unmethylated regions (all C's replaced by T's).
• Makes mapping bisulfite-converted reads to the reference genome more challenging.

EPIC ARRAY
• Developed by Illumina for DNA methylation analysis.
• Focuses on CpG-rich regions; can detect the methylation status of over 930,000 CpG positions.
Regions covered:
• CpG islands: high density of CpG sites.
• Non-CpG methylated sites: methylated cytosines outside CpG.
• ENCODE open chromatin & enhancers: accessible chromatin and regulatory enhancers.
• DNase hypersensitivity sites: regions of open chromatin.
In summary: CpG regions, non-CpG methylation sites, various enhancers, open chromatin regions, and cancer-relevant markers.

PROTOCOL OVERVIEW
• Bisulfite conversion: convert unmethylated cytosines to uracil by treating DNA with sodium bisulfite.
• Amplification of bisulfite-treated DNA, fragmentation into smaller pieces, and application of the fragments to the EPIC array (they hybridise to specific probes that correspond to CpG sites).
• Single-base extension reaction: a labelled nucleotide is added to identify whether the cytosine at the CpG site is methylated.
o The EPIC array uses specific probes to detect whether a fragment is methylated.
o Each probe hybridises (binds) to a specific genomic region near a target CpG site.
o Reaction: adds a nucleotide to the probe depending on the methylation state.
o 2 types of chemistry, Infinium I and Infinium II, giving a wider range of coverage across the genome:
• Infinium I: 2 probes per CpG site. One probe binds the DNA if the CpG is methylated; the other binds the DNA if the CpG is unmethylated.
• Infinium II: 1 probe per CpG site (can detect both methylated and unmethylated states with a single probe).
• Imaging & detection: use Illumina iScan/HiScan systems to measure signal intensity and calculate beta values:
o 1 = fully methylated CpG site.
o 0 = fully unmethylated CpG site.
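The beta value mentioned above is the fraction of total probe signal that comes from the methylated channel. A minimal sketch follows, assuming the commonly used form beta = M / (M + U + offset) with a small stabilising offset of 100. The offset, the function name `beta_value`, and the example intensities are assumptions for illustration and are not taken from these notes.

```python
def beta_value(methylated, unmethylated, offset=100):
    """Return the methylation beta value for one CpG probe.

    methylated / unmethylated: signal intensities of the two channels.
    offset: stabilising constant (assumed default of 100) that avoids
            division by zero when both intensities are very low.
    """
    # Negative intensities can appear after background correction;
    # clamp them to zero before taking the ratio.
    m = max(methylated, 0)
    u = max(unmethylated, 0)
    return m / (m + u + offset)


if __name__ == "__main__":
    # Hypothetical intensities for three probes.
    print(beta_value(9900, 0))      # close to 1: almost fully methylated
    print(beta_value(0, 5000))      # 0.0: fully unmethylated
    print(beta_value(4950, 4950))   # about 0.5: half of the signal methylated
```

Because of the offset, the value approaches but never reaches exactly 1, which is why real data sets report "fully methylated" sites as betas slightly below 1.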
• Data analysis: software such as Illumina GenomeStudio performs normalization, quality assessment, and data visualization.
o Tables: CpG locus lists, methylation levels.

BIOINFORMATIC ANALYSIS
Data analysis & quality control using GenomeStudio:
• calculates methylation levels (beta values)
• performs normalisation
• analyses differential methylation levels between groups
• detects cytosine methylation at single-base resolution
• identifies methylation signatures across the entire genome
• calculates methylation levels and visualizes CpG island information

WHOLE-GENOME BISULFITE SEQUENCING (WGBS)
• Protocol used to identify methylated cytosines in genomic DNA.
o Genomic regions covered: CpG islands, shores, shelves, promoter regions, gene bodies, enhancer regions, non-coding regions.
• DNA is treated with sodium bisulfite, then sequenced.
• Single-base resolution of methylated cytosines.
• Covers the whole genome.

PROTOCOL
• DNA purification & sonication: isolate DNA from the sample and fragment it into smaller pieces using sonication, making the DNA suitable for sequencing.
• End repair: repair fragmented DNA to make it compatible with adapter ligation. A-tailing: add adenosine bases to the 3' ends. Adapter ligation: attach methylated adapters to both ends of the fragments.
• Size selection: fragment length is adjusted for sequencing (200-400 bp). The desired fragment length can be cut out of an agarose gel or selected using magnetic beads.
• Bisulfite conversion: convert unmethylated cytosines to uracil by treating DNA with sodium bisulfite.
• PCR: amplification of bisulfite-converted DNA.
• Sequencing of the resulting library: single-end or paired-end sequencing. Paired-end reduces the error rate and improves sensitivity.

BIOINFORMATIC ANALYSIS
1. Data processing & quality control: remove low-quality reads.
a. Tools: Trimmomatic.
2. Adapter trimming & alignment:
a. Mapping tools: Bismark, BS-Seeker2 (these take C -> T conversions into account).
3. Methylation quantification/calling: the alignment is assessed and methylated CpG sites are identified.
a. Tools: Bismark, MethylDackel.
4. Differential methylation analysis: assess methylation level changes across samples.
a. Tools: metilene, Defiant, DSS, methylKit.
5. Visualisation & interpretation:
a. Tools: R, Python.

REDUCED REPRESENTATION BISULFITE SEQUENCING (RRBS)
• Used to study genome-wide DNA methylation at single-nucleotide resolution.
• Focuses on CpG-rich regions.
o Genomic regions covered: CpG islands, shores, promoters, gene bodies, enhancer regions.
• Combines restriction enzyme digestion of DNA with bisulfite sequencing.
General steps
• DNA digestion with a methylation-insensitive restriction enzyme.
• Similar steps to WGBS: end repair, A-tailing, adapter ligation, fragment purification, bisulfite conversion, PCR, sequencing.
• Bioinformatics tools similar to WGBS.

METHYLATION-INSENSITIVE ENZYME
1. DNA digestion with a methylation-insensitive restriction enzyme is the first step of RRBS.
2. The restriction endonuclease generates fragments for sequencing.
3. The MspI restriction enzyme is used: it cleaves DNA at CCGG sites regardless of the methylation state of the internal CpG.
4. This enriches regions high in CpG sites.
5. Fragments are size-selected to focus on CpG-rich areas.
6. Fragments are amplified via PCR and sequenced.

MEDIP-SEQ
1. Methylated DNA immunoprecipitation sequencing.
2. Immunocapturing technique for detecting methylated DNA.
3. Unbiased detection.
4. Focuses on CpG-rich regions.
5. Genomic regions covered: CpG islands, promoters, gene bodies, enhancers.
6. Can be combined with microarrays or NGS to provide genome-wide methylation profiles.
General steps
1. Random shearing of genomic DNA by sonication.
2. Immunoprecipitation with a methylated-cytosine antibody that selectively targets 5mC.
3. Sequencing using NGS.

PROTOCOL
• DNA extraction & fragmentation via sonication: fragments of 300–1000 bp. Must be performed in siliconised tubes to prevent non-specific binding of proteins to tube walls.
• Denaturation: denature fragmented DNA to create ssDNA.
• Immunoprecipitation: performed to enrich for methylated DNA regions.
o A portion of the DNA is set aside and not immunoprecipitated; this serves as the control and reference against which methylation enrichment is compared.
o Immunoprecipitation is performed using an anti-5mC antibody to bind methylated DNA.
o Denaturation is done prior to immunoprecipitation because the anti-5mC antibody only binds to single-stranded DNA (ssDNA).
o Fragmented DNA is mixed with the antibody.
o Methylated DNA fragments are captured using magnetic beads.
o Washing is performed to remove unbound DNA and select only for methylated DNA.
• Library preparation: end repair, A-tailing, adapter ligation, size selection, PCR amplification.
• Sequencing: NGS technologies are used to sequence the library, e.g. Illumina (high-throughput, genome-wide methylation analysis).
• Bioinformatics analysis & data interpretation:
o BWA/Bowtie for alignment.
o Reads are filtered to remove errors.
o Quality control checks using MeDIP-specific QC (MEDIPS).
o The number of reads mapped at each genomic site is used to assess methylation levels.

MRE-SEQ
Overview
1. Methylation-sensitive Restriction Enzyme Sequencing.
2. Enzyme-based technique for detecting hypomethylated regions.
3. Focuses on unmethylated regions.
• Genomic regions covered: CpG islands, gene promoters, regulatory regions.
• Complements MeDIP-seq for full methylation profiling.
General steps
• Genomic DNA is digested using methylation-sensitive restriction enzymes.
• Enzymes cut at unmethylated recognition sites but leave methylated DNA intact.
• Digested fragments are prepared for sequencing with NGS.
• Reads are mapped to the reference genome to identify unmethylated regions.

PROTOCOL
• DNA extraction: isolate genomic DNA from the sample.
• Digestion with methylation-sensitive enzymes: treat DNA with methylation-sensitive restriction enzymes that selectively cut unmethylated regions.
• Fragment size selection: filter the digested DNA to obtain fragments of suitable sizes for sequencing.
• Library preparation: the digested DNA fragments are processed into sequencing libraries, including the addition of adapters for next-generation sequencing (NGS).
• Sequencing: libraries are sequenced using platforms like Illumina to generate short reads.

BIOINFORMATIC ANALYSIS
1. BWA/Bowtie2: align reads to the reference genome.
2. Bismark: detect unmethylated regions.
3. EdgeR/DESeq2: identify differential methylation across conditions.

MeDIP-seq identifies methylated regions, while MRE-seq reveals unmethylated areas. By integrating these two methods, we get a fuller picture of the methylation landscape.

BIOINFORMATIC ANALYSIS
• Data preprocessing & alignment: align large sequence datasets from MeDIP-seq and MRE-seq to a reference genome using tools like Bowtie2 or BWA.
• Methylation analysis:
o Identify methylated regions (MeDIP-seq) using MEDIPS.
o Identify unmethylated regions (MRE-seq) using Bismark.
• Data integration: combine MeDIP-seq and MRE-seq data with MethPipe or methylKit to build a unified methylation map.
• Differential methylation analysis: use DSS or EdgeR to detect methylation changes across samples, such as between healthy and cancerous tissues.

OUTPUTS OF MEDIP-SEQ AND MRE-SEQ COMBINATION
1. Comprehensive methylation map: provides a complete, detailed view of highly and weakly methylated regions across the genome.
2. Methylated & unmethylated states: identifies methylated (MeDIP-seq) and unmethylated (MRE-seq) regions, revealing methylation strength, especially in regulatory regions like promoters.
3. Differential methylation mapping: highlights methylation changes across conditions (e.g., disease vs. control), offering insights into DNA methylation's role in disease.

EMERGING TECHNOLOGIES

TET-ASSISTED BISULFITE SEQUENCING (TAB-SEQ)
1. TET-Assisted Bisulfite Sequencing.
2. Differentiates between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC).
3. Provides single-base resolution of 5hmC distribution across the genome.
4. High levels of 5hmC are found in neural and stem tissues.
5. Requires a combination of enzymatic treatment and bisulfite sequencing.
General steps
• Protect 5hmC using a selective glucosylation reaction.
• Oxidize 5mC to 5fC/5caC using TET enzymes.
• Perform bisulfite treatment: converts unmethylated cytosines to uracils and leaves protected 5hmC intact.
• Sequence with NGS to distinguish 5mC, 5hmC, and unmethylated cytosines.

BASIS OF TAB-SEQ
TET proteins, through a series of enzymatic reactions, catalyze the oxidation of 5-methylcytosine (5mC). Through these reactions they produce various derivatives, including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). The unique feature of TAB-seq, however, lies in its ability to specifically detect and quantify 5hmC at single-base resolution.

WORKFLOW
β-GT AND MTET1 TREATMENT
1. Glucosylation protection: adds glucose to 5hmC to protect it. The procedure uses β-glucosyltransferase (βGT) to convert 5hmC to 5-glucosylmethylcytosine (5gmC).
2. Glucosylated DNA is treated with mTet1 (mouse TET1) to oxidize 5mC to 5caC. This enzymatic combination enables the differentiation between 5hmC and 5mC, allowing specific identification of hydroxymethylated cytosines in sequencing analyses.
3. Bisulfite treatment: converts unmodified cytosines and 5caC to uracil, while glucosylated 5hmC is still read as cytosine.
4. Sequencing: provides a pattern that reveals 5hmC positions.

BIOINFORMATIC ANALYSIS OF TAB-SEQ DATA
• Data alignment:
o Sequencing reads are aligned to the reference genome using tools like BWA or Bowtie2.
o Goal: map the exact locations of modifications (5hmC) in the genome.
• Cytosine modification differentiation:
o Tools like Bismark or BSMAP distinguish:
• 5hmC (protected)
• Converted 5mC (5caC)
• Unmethylated cytosines
• Quantification and distribution:
o Analyze the quantity and distribution of 5hmC across the genome.
o Map patterns to genomic elements (e.g., gene bodies, regulatory regions).
o Visualize regions enriched in 5hmC.
• Differential analysis:
o Compare multiple samples (e.g., tissues, conditions, disease states).
o Tools like EdgeR or methylKit identify significant differences in 5hmC levels.
o Understand the biological relevance of 5hmC changes.

OUTPUTS OF TAB-SEQ
1. 5hmC-specific methylation map: generates a high-resolution, genome-wide map of 5hmC, distinguishing it from 5mC and identifying regions of potential DNA demethylation or gene activation.
2. Functional insights into gene regulation: highlights 5hmC enrichment in gene bodies, enhancers, and regulatory elements, indicating roles in transcriptional regulation and gene activation. This is valuable for studies on development, differentiation, and cancer.
3. Differential methylation in biological contexts: identifies differences in 5hmC patterns between samples, uncovering disease-related changes and associations with environmental factors or active transcription in specific genes.

CHEMICAL-ASSISTED BISULFITE SEQUENCING (CAB-SEQ)
Chemical-Assisted Bisulfite Sequencing targets 5-carboxylcytosine (5caC), a key intermediate in DNA demethylation. 5caC is primarily present in regulatory regions and developmentally regulated genes in tissues or cells with active DNA demethylation.
General steps
Chemical labeling: label 5caC using EDC to form an amide bond with a primary amine.
Biotin tagging: attach a biotin tag to labeled 5caC for enrichment.
Bisulfite treatment: labeled 5caC is protected from deamination, while unmethylated cytosines are converted to uracils.
Sequencing: perform NGS to map 5caC across the genome.
Data analysis: identify 5caC-enriched regions and compare across conditions.

WORKFLOW
1. Chemical labeling: 1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride (EDC) serves as a chemical catalyst. EDC facilitates the formation of an amide bond between the carboxyl group of 5-carboxylcytosine (5caC) and a primary amine group.
2. Biotin tag integration in DNA modification detection:
• Purpose: biotin tagging targets 5-carboxylcytosine (5caC) in DNA, enabling efficient enrichment and detection of modified fragments.
• Enrichment: biotin binds strongly to streptavidin, allowing selective capture of 5caC-containing fragments using streptavidin-coated beads.
• Detection: biotin-tagged DNA can be identified with streptavidin-conjugated fluorescent dyes, aiding quantification during analysis.
• Cleavage: a disulfide bond allows biotin to be cleaved from DNA using DTT treatment, enabling flexible recovery of enriched DNA.
3. PCR amplification: perform PCR under standard conditions to amplify and generate labeled DNA fragments.
4. Sanger sequencing: use Sanger sequencing on labeled DNA to distinguish 5caC from thymine (T) with single-base resolution.
5. High-throughput sequencing: integrate the chemical method with high-throughput sequencing for comprehensive, genome-wide detection of 5caC.

BIOINFORMATICS WORKFLOW OF CAB-SEQ
1. Alignment and mapping
a. Sequence reads are aligned to the reference genome using BWA or Bowtie2. This allows precise mapping of cytosine modifications across the genome.
2. Identification of 5caC sites
a. Unconverted cytosines are identified as 5caC using tools like Bismark or BSMAP.
b. Protected 5caC remains cytosine during bisulfite treatment, allowing specific detection.
3. Quantification and distribution analysis
a. Quantifies 5caC levels and visualizes their distribution across genomic regions. Analyzes enrichment in gene bodies, enhancers, or regulatory elements.
4. Differential analysis
a. For comparative studies, tools like methylKit or EdgeR identify changes in 5caC levels across conditions.
b. Helps uncover differences in 5caC patterns in disease states or developmental stages.

OUTPUTS OF CAB-SEQ
5caC-specific methylation map:
• CAB-seq produces a comprehensive, genome-wide map of 5caC distribution, distinct from other cytosine modifications.
• This map is essential for identifying sites of active DNA demethylation.
Insights into DNA demethylation processes:
• 5caC serves as an intermediate in DNA demethylation, marking regions of active demethylation.
• Mapping 5caC helps uncover active regulatory elements and the dynamics of gene activation/repression.
Differential mapping across biological contexts:
• CAB-seq reveals changes in 5caC levels between different biological conditions or tissues.
• Useful for understanding tissue-specific gene regulation and developmental changes.

LONG-READ SEQUENCING
Allows direct detection of DNA methylation (e.g., 5mC, 5hmC) without chemical treatments.
Provides long reads spanning kilobases, enabling analysis of complex regions and haplotypes.
Suitable for epigenome-wide studies, resolving repetitive sequences, and structural variation.
General steps
DNA preparation: extract high-molecular-weight DNA to maintain long fragments.
Library construction: prepare libraries with ligation or circular templates.
Real-time sequencing.
Methylation analysis: analyze raw signals to detect modifications like 5mC and 5hmC.
Data analysis: align long reads to the genome to map epigenetic patterns.

BIOINFORMATICS WORKFLOW
1. Data processing and base calling
o PacBio: SMRT Analysis detects methylation from changes in polymerase speed.
o Nanopore: Guppy or Bonito converts signals to sequences, directly identifying methylation.
2. Alignment and error correction
o Tools like Minimap2, Medaka (Nanopore), and Canu (PacBio) correct errors and map methylation accurately, even in complex regions.
3. Methylation calling and analysis
o Nanopolish (Nanopore) and SMRT Link (PacBio) identify methylation patterns, offering insights into gene regulation.
4. De novo assembly for methylation mapping
o Long reads enable genome assembly with methylation data, which is valuable for non-model organisms.

ADVANTAGES OF USING LONG-READ SEQUENCING
1. Resolution of complex regions.
2. More accurate mapping.
3. Preservation of epigenetic context.
4. Simultaneous detection of epigenetic modifications.
5. Haplotype-specific analysis.
6. Speed and simplicity.
7. Applicability to non-reference genomes.

APPLICATIONS OF METHYLOME ANALYSIS IN CANCER RESEARCH

BIOMARKERS
Biomarker analysis is the study of biological indicators (biomarkers) that can give us insights into normal biological processes, diseases, or responses to treatments. Biomarkers are measurable characteristics, such as molecules, genes, or physiological traits, that can indicate a specific state in the body.

Traditional Biomarkers And Their Epigenetic Counterparts: A Side-By-Side Exploration
Biomarker: a clinical tool for early diagnosis, prognosis, and monitoring of disease evolution that enables clinical decision-making.
Epigenetic biomarker: any epigenetic mark or altered epigenetic mechanism that is stable and reproducible during sample processing and can be measured in body fluids or primary types of tissue preparations.

Mapping The Epigenome: How Epigenetic Biomarkers Inform Precision Healthcare
Cytosine methylations are among the earliest events in carcinogenesis. However, the human genome contains an estimated 28 million CpG sites.

What Limits Epigenetic Biomarkers In Precision Medicine?
1. Degradation upon storage.
2. Low sensitivity of DNA methylation assays.
3. DNA not suitable for further analysis.
DNA is stable for 4 weeks at -80°C.

EPIGENETIC BIOMARKERS: HOW QUANTIFICATION ANALYSIS DRIVES PRECISION MEDICINE
Overview
Quantification of epigenetic biomarkers makes it possible to understand the stages of the disease.
Technologies Methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA). a. PCR -> automated sequencer. b. Quantification -> undigested sample signals & undigested control. c. Raw data analysis: free software/excel-based in-house program. Pyrosequencing. SCREENING AND EARLY DIAGNOSISScreening analysis is a process used to quickly and broadly assess the presence, level, or quality of specific components within a large sample set. Early diagnosis 149 is the identification of a disease or condition at its initial stages, often before symptoms become severe or noticeable. COLORECTAL CANCER SCREENING: A MODEL FOR EARLY DETECTION AND PREVENTION PROSTATE CANCER o Second most diagnosed cancer in men. o Differential incidence: o African American: more frequent. o European American: less frequent. o Epigenetic differences: o GSTP1, RASSF1, and APC. Which test would you use?  Several methylation markers: GSTP1, CDKN2A, DnMt3B, SCGB3A1, HIF3A.  Illumina Infinium Human Methylation 27 (HM27) and Human Methylation 450 (HM450) BeadChip: analyze gene body and intergenic regions.  Hypermethylation differences: Hypermethylation patterns differ between African American (AA) and European (EUR) populations.  Epigenome-wide DNA methylation analysis: Conducted on 76 AA prostate cancer (PCa) patients  Findings on methylation differences:  Metastatic-lethal PCa vs. no recurrence.  Regional vs. local pathological stage.  Higher vs. lower tumor aggressiveness. BREAST CANCER 1. Second-highest cancer-related mortality in women. 2. Medical imaging detection: a. Ultrasonic testing, X-ray imaging, computed tomography, magnetic resonance imaging, positron emission tomography. 3. Challenges with imaging: a. Harm to patients when using contrast agents and high-energy rays. b. Lags behind tumor progression. 4. Serum antigen protein markers: a. Cancer antigen 15-3, carcinoembryonic antigen, and cancer antigen 125. b. Underevaluate tumor heterogeneity. 5. 
Tissue biopsy: Expensive. Invasive. False-negative rate in small tumors. 150 Which method would you use? ctDNA-WGBS methods can be used for whole-genome base-level resolution detection of 5- methylcytosine. According to hormone and epidermal growth factor receptor expression status: o Triple negative breast cancer o Luminal A o Luminal B o Human HER2 enriched Three databases: Cancer subtype and negative control. Differentially methylated CpG sites. LUNG CANCER  Small-cell lung cancer is an aggressive malignancy.  Limited tissue availability.  2-year survival rate.  Distinct subgroups of SCLC:  ASCL1, NEUROD1, POU2F3  ctDNA.  Selected 7 methylation sites highly and linearly correlated to ctDNA fraction.  Distinguish between ASCL1 and NEUROD1 driven tumors.  EMT. EPITHELIAL OVARIAN CANCER According to the American Cancer Society, a total of 19,710 new cases and 13,270 deaths have been recorded in 2023. Non-specific or no symptoms during its early stages. Diagnostic methods: o Pelvic examination, Transvaginal ultrasonography, MRI and PET scans. o Limited sensitivity and specificity. Serum biomarkers: o CA125 and HE4: sensitivity 50-55% and specificity 90%. o CA125 is found to be elevated in benign conditions and in other non-ovarian malignancies during pregnancy. Hypermethylation at the tumor suppressor gene promoters. 151 PRIMARY GLIOBLASTOMA  Aberrant DNA hypomethylation.  Activation of the Cancer-germline gene MAGEA1.  Other recurrently hypomethylated genes: o SOHLH2, SSX2, SSX4B, SSX8, SSX9, and PAGE5.  Cryptic promoters.  5 newly diagnosed Glioblastomas (GBM).  Alternative promoters in gene bodies could be one consequence of GBM hypomethylation.  Genes affected: TNXB, PAX7, and SIGLEC11. KIDNEY CANCER TET: Converts 5mC to 5-hydroxymethylcytosine (5hmC). TET-mutated tumors are expected to accumulate 5mC compared with normal tissues. The relationship among TET mutations, 5hmC, and 5mC levels in tumorigenesis remains obscure. 
Renal cell carcinoma (RCC) serves as a model of solid tumor; it displays TET2 mutations in approximately 6% of patients.

OPPORTUNITIES AND OBSTACLES IN GENOMIC RESEARCH

mRNA EXPRESSION ANALYSIS - TUTORING LESSON 3
Federico Manai
19/11/2024, Lesson 16

mRNA Expression Analysis: A Brief Recap…
• Normal mRNA expression levels and sequencing of normal tissues are currently not available.
• mRNA expression levels are reported as Z-scores, a dimensionless measure used to compare data from different samples or experiments.
• In RNA-seq analysis, Z-scores are used to compare expression levels between samples.
• Z-scores for mRNA expression are usually computed using all tumor samples as the reference pool.

The Z-score
• The Z-score, or standard score, is the number of standard deviations by which a data point differs from the mean. To calculate the Z-score, subtract the mean from the data point and divide the difference by the standard deviation.
• The Z-score of a gene is calculated by comparing its expression level in a given sample to the distribution of that gene's expression across all samples.
• A Z-score of zero indicates that the gene's expression level is the same as the mean expression level across all samples.
• A positive Z-score indicates that the gene is expressed at a higher level than the mean, and a negative Z-score indicates that the gene is expressed at a lower level than the mean.

How to choose normalization methods (TPM/RPKM/FPKM)?
The abundance of transcripts is measured digitally by read counts. To eliminate technical biases in sequenced data, such as sequencing depth (deeper sequencing produces more read counts for a gene) and gene length (a longer gene produces more read counts at the same sequencing depth), normalization of gene expression measurements is required.

RPKM (Reads Per Kilobase per Million mapped reads) was designed for single-end RNA-seq, where every read corresponds to a single sequenced fragment.
FPKM (Fragments Per Kilobase per Million mapped fragments) is very similar to RPKM. We divide the number of fragments of a gene by the total sequencing depth, and the ratio is divided by the gene length. Note that, strictly speaking, the gene length mentioned above represents the total length of exons from one gene. The difference between RPKM and FPKM is that F stands for fragments and R stands for reads. In paired-end (PE) sequencing, each fragment yields two reads: FPKM counts a fragment once when both of its reads map to the same transcript, whereas RPKM counts every read that maps to the transcript. In single-end (SE) sequencing, the results calculated by FPKM and RPKM are the same.

FPKM and RPKM ultimately normalize the abundance of transcripts from different samples (or the same sample under different conditions) to a standard that allows quantitative comparison, by dividing by both L (transcript length) and N (total number of reads or fragments).

TPM (Transcripts Per Million) is very much like FPKM and RPKM; the only difference is the order of operations: TPM normalizes for gene length first and for sequencing depth second. The effect of this difference, however, is profound. With TPM, the sum of all TPMs is the same in each sample, which makes it very convenient to compare the proportion of reads mapped to a gene in each sample. TPM is therefore the more accurate statistic for comparing gene expression across samples. Within any one sample, however, FPKM and TPM values differ only by a sample-specific scaling factor, so many people still use FPKM or RPKM to compare expression values of the same gene across samples.
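The order-of-operations difference described above can be made concrete with a small sketch. This is an illustration only, not the implementation of any particular RNA-seq tool: the function names `rpkm` and `tpm`, the gene lengths, and the read counts are all hypothetical, and passing fragment counts instead of read counts to `rpkm` would give FPKM.

```python
def rpkm(counts, lengths_bp):
    """RPKM (FPKM if `counts` holds fragments): scale by depth, then by length."""
    per_million = sum(counts) / 1e6  # sequencing-depth scaling factor
    return [c / per_million / (length / 1e3)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM: scale by length first (reads per kilobase), then by depth."""
    rpk = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    per_million = sum(rpk) / 1e6  # depth factor computed on length-normalized counts
    return [r / per_million for r in rpk]


# Hypothetical toy data: three genes measured in two samples that differ
# in sequencing depth and in composition.
lengths_bp = [2000, 4000, 1000]
sample_a = [100, 300, 50]
sample_b = [400, 200, 900]

for name, counts in (("A", sample_a), ("B", sample_b)):
    r = rpkm(counts, lengths_bp)
    t = tpm(counts, lengths_bp)
    # The TPM column sums to one million in every sample; the RPKM sum does not.
    print(name, "sum RPKM =", round(sum(r)), "| sum TPM =", round(sum(t)))
```

In this sketch the TPM values of each sample always add up to one million, while the RPKM totals depend on the sample, which is why TPM values can be read as comparable proportions across samples.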
As with any high-throughput sequencing technology, the analytical method is critical for interpreting the data, and the RNA-seq analysis process is always evolving. Therefore, the appropriate method should be selected based on the aims of the research.

mRNA expression analysis: co-expression
• When two variables vary together, statisticians say that there is a lot of covariation or correlation.
• The correlation coefficient, r, quantifies the direction and magnitude of correlation.
• Correlation is used when you have measured both variables (often X and Y), and is not appropriate if one of the variables is manipulated or controlled as part of the experiment.
• Values of the two variables are almost always real numbers (not integers, not categories, not counts).
• The correlation analysis reports the value of the correlation coefficient. It does not create a regression line. If you want a best-fit line, choose linear regression. Note that correlation and linear regression are not the same.
• Correlation computes a correlation coefficient and its confidence interval. Its value ranges from -1 (perfect inverse relationship; as the value of one variable goes up, the value of the other goes down) to 1 (perfect positive relationship; as the value of one variable goes up, so does the value of the other). A correlation coefficient of zero means that there is no correlation at all between the values of the two variables.
• Correlation analysis also reports a P value that can be used to test the null hypothesis that the data were sampled from a population where there is no correlation between the two variables (in other words, the null hypothesis is that r = 0).
• The difference between Pearson and Spearman correlation is that the confidence interval and P value from Pearson's can only be interpreted if you assume that values from both variables are sampled from populations with a Gaussian distribution. Spearman correlation does not make this assumption.
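As a minimal sketch of the two coefficients discussed above, the following code computes Pearson's r directly and Spearman's coefficient as Pearson's r applied to ranks (tied values share their average rank). The function names and the expression values are hypothetical, chosen only for illustration; confidence intervals and P values are not computed here and would normally come from a statistics package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks from low to high; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical expression values of two genes across five samples:
# gene_y rises monotonically, but not linearly, with gene_x.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [v ** 3 for v in gene_x]

print("Pearson r   :", round(pearson_r(gene_x, gene_y), 3))
print("Spearman rho:", round(spearman_rho(gene_x, gene_y), 3))
```

Because Spearman's coefficient depends only on the ordering of the values, it reaches its maximum for any strictly increasing relationship, whereas Pearson's r stays below 1 when the relationship is not linear.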
• Quartile analysis can sometimes help to investigate possible correlations.
• Percentiles are useful for giving the relative standing of an individual in a group. Percentiles are essentially normalized ranks. The 80th percentile is a value where you'll find 80% of the values lower and 20% of the values higher. Percentiles are expressed in the same units as the data.
• The median is the 50th percentile. Half the values are higher; half are lower. Rank the values from low to high. If there are an odd number of points, the median is the one in the middle. If there are an even number of points, the median is the average of the two middle values.
• Quartiles divide the data into four groups, each containing an equal number of values. The groups are separated by the 25th, 50th, and 75th percentiles, also called the first, second, and third quartiles. Comparing the highest group (Q4, values above the 75th percentile) with the lowest group (Q1, values below the 25th percentile) can help in the identification of correlations.

NGS-BASED METHODOLOGICAL APPROACHES FOR CHROMATIN FUNCTIONAL ANALYSIS
Groupwork 4
25/11/2024, Lesson 17

CHROMATIN STRUCTURE
Chromatin is the complex of DNA and proteins found in eukaryotic cells. It actively regulates gene expression through histone modifications that impact DNA accessibility and gene regulation.
Organized in:
Nucleosomes:
o DNA wrapped around a core of histone proteins (H2A, H2B, H3, H4).
o Linker DNA with histone H1
Euchromatin
Heterochromatin

Epigenetic regulation of gene expression dictates how genes are turned on or off without altering the DNA sequence itself:
• Chromatin Modifications: Chemical modifications, such as DNA methylation and histone modifications, act as epigenetic marks. These marks either recruit or repel proteins that regulate transcription, thereby influencing gene activity.
• Differential Accessibility: The chromatin state determines whether transcription factors and other regulatory proteins can access DNA.
Open chromatin promotes gene expression, while closed chromatin silences genes.
• Architecture: The 3D organization of the chromatin within the nucleus affects interactions between enhancers, promoters, and other regulatory elements, influencing gene expression patterns.

NGS role in Chromatin studies
NGS enables mapping of chromatin states and epigenetic modifications at genome-wide scales.
Provides deep insights into how chromatin structure and epigenetics affect gene expression.
Crucial for understanding gene regulation mechanisms and developing therapeutic strategies.

Type of analysis
NGS techniques for chromatin modification analysis
NGS techniques for chromatin accessibility analysis
NGS techniques for chromatin architecture analysis

NGS techniques for chromatin MODIFICATION analysis

5 Core Histone Marks
Used in ChIP-seq analysis to map key chromatin states and gene regulatory elements in transcription.
H3K4me1 & H3K27ac, associated with enhancer regions.
o Activate transcription
H3K4me3, associated with promoter regions.
o Activates transcription
H3K36me3, associated with transcribed regions in gene bodies.
o Activates transcription
H3K27me3, associated with repressed regions.
o Represses transcription
H3K9me3, associated with heterochromatin.
o Represses transcription

CHIP-SEQ
Chromatin immunoprecipitation followed by high-throughput short-read sequencing is a key technique in epigenomics, allowing for:
Mapping histone modifications
Identifying transcription factor binding sites (TFBS)
Creating chromatin state maps
This technique allows researchers to study how chromatin structure influences cell identity, development, gene expression, and disease.

Workflow
• Crosslinking:
   • Formaldehyde fixes targeted proteins to DNA
• Fragmentation: Sonication or restriction enzymes
• Immunoprecipitation:
o Incubated with specific antibodies coupled to magnetic beads to help with isolation.
o Unbound proteins are washed away
• Reverse Cross-linkage:
o Protein is released from DNA with extensive heat
o DNA is purified
• Library Prep:
o Adapters are ligated
o PCR amplification
• NGS: Illumina

Computational analysis
• Read Mapping & Alignment:
o Bowtie or BWA
o Redundant reads are filtered
• Peak Calling:
o MACS, SPP, PeakSeq identify genomic regions enriched for the targeted protein.
o Analyzes gene regulation and epigenetic changes
• Chromatin-State Annotation: Classifies genomic regions based on chromatin features
• Motif Analysis: Examines sequence patterns within identified peaks or epigenomic regions to predict TFBS.
o De novo motif discovery
o Motif scanning
• Visualization:
o IGV or UCSC Genome Browser

CHIP-EXO
CHROMATIN IMMUNOPRECIPITATION (CHIP) & EXONUCLEASE DIGESTION (EXO)
Very similar to ChIP-seq but adds a λ exonuclease step to obtain smaller DNA fragments (unlike ChIP-seq, which yields large fragments) for short-read sequencing.
• Precisely maps TFBS and histone modifications at single-base resolution.
o More accurate than ChIP-seq
o Useful for overlapping binding sites or areas of weak interaction.

Workflow
Cross-linking:
o Formaldehyde fixes targeted proteins to DNA
Fragmentation:
o Sonication or restriction enzymes
Immunoprecipitation:
o Incubated with specific antibodies coupled to magnetic beads to help with isolation.
o Unbound proteins are washed away
Exonuclease Treatment:
o λ exonuclease digests any DNA that is not adjacent to the protein binding site
Reverse Cross-linkage:
o Protein is released from DNA with extensive heat
o DNA is purified
Library Prep:
o Adapters are ligated
o PCR amplification
NGS: Illumina

Computational analysis
Read Mapping & Alignment: Bowtie or BWA
Peak Calling: MACS, SPP, & PeakSeq
Chromatin-State Annotation: ChromHMM & Segway
Motif Analysis: MEME & FIMO
Visualization: IGV or UCSC Genome Browser

CUT&RUN
CLEAVAGE UNDER TARGETS & RELEASE USING NUCLEASE
More sensitive than ChIP-seq, requires less starting material, and reduces background noise, making it ideal for precise mapping of regulatory elements.
Short-read sequencing
Protein is kept in its native state (no cross-linking)

Workflow
Cell Preparation:
o Cells are immobilized on magnetic beads coated with ConA
o Permeabilized with detergents (Triton X-100)
Primary Antibody Incubation:
o Binds to chromatin-bound protein of interest
MNase Incubation:
o Binds to targeted antibody
Fragmentation:
o Ca²⁺ activates MNase, which cleaves adjacent chromatin
DNA Purification:
o The remaining DNA is washed
Library Prep:
o Adapter ligation
o PCR amplification
NGS: Illumina

Computational analysis
Read Mapping & Alignment: Bowtie or BWA
Peak Calling: MACS, SPP, & PeakSeq
Chromatin-State Annotation: ChromHMM & Segway
Motif Analysis: MEME & FIMO
Visualization: IGV or UCSC Genome Browser

CUT&TAG
CLEAVAGE UNDER TARGETS & TAGMENTATION
Short-read sequencing technique that identifies protein-DNA interactions and histone modifications with high sensitivity and low background.
Simplified & faster workflow derived from CUT&RUN
Protein is kept in its native state (no cross-linking)
Can use single-cell platforms (tagmentation instead of fragmentation)
o Fewer handling steps, which reduces the chances of sample degradation

Workflow
Cell Preparation:
o Cells are immobilized on magnetic beads coated with ConA
o Permeabilized with detergents (Triton X-100)
Primary Antibody Incubation:
o Binds to chromatin-bound protein of interest
Tn5 Transposase Incubation:
o Preloaded with sequencing adapters
o Binds to targeted antibody
Tagmentation:
o Mg²⁺ activates Tn5 transposase, which cleaves adjacent chromatin
o Simultaneously ligates sequencing adapters to DNA
DNA Purification:
o The remaining DNA is washed
Library Prep:
o PCR amplification
NGS: Illumina

Computational analysis
Read Mapping & Alignment: Bowtie or BWA
Peak Calling: MACS, SPP, & PeakSeq
Chromatin-State Annotation: ChromHMM & Segway
Motif Analysis: MEME & FIMO
Visualization: IGV or UCSC Genome Browser

Comparative summary
ChIP-seq: Broad genome-wide mapping with moderate resolution.
o Abundance of cells
o High-resolution binding site mapping is not needed
ChIP-exo: Genome-wide with base-pair precision.
o Abundance of cells
o Need high-resolution mapping at base-pair level
o Studying overlapping or weak binding sites
CUT&RUN: Genome-wide with high sensitivity and low background.
o Working with few cells
CUT&Tag: Genome-wide with fast profiling.
o Working with few cells
o Requires a single-cell approach

NGS techniques for chromatin ACCESSIBILITY analysis

ATAC-SEQ
ASSAY FOR TRANSPOSASE-ACCESSIBLE CHROMATIN USING SEQUENCING
ATAC-seq is a relatively recent and highly sensitive technique. It is especially valuable because it does not require prior knowledge of regulatory elements, making it a powerful tool for epigenetic discovery. One of the most exciting aspects of ATAC-seq is its ability to map open chromatin regions, which are regions of the genome that are accessible for regulatory proteins to bind.
This is critical for understanding gene regulation, transcription factor binding, and chromatin dynamics in a variety of biological contexts, from complex diseases to cancer, developmental biology, and immune system activation.
ATAC-seq can be performed on bulk cell populations or on single cells at high resolution.

Methodology
Sample Preparation
The sa