Genetics and Epigenetics (6 CFU)
Summary
This document discusses genetics and epigenetics, focusing on the regulation of genes and their role in understanding genetic diseases. It covers common topics, classifications of genetic disorders, and important online resources.
Genetics and Epigenetics

1. Genetics: a fundamental skill for medical biotechnologists
01/03/2022

The regulation of genes (genetic and epigenetic) is a key aspect of understanding genetic diseases. → Common topics: why a genetic disease affects children or elderly people, or some tissues and not others; why some mutations predispose to every type of cancer and others only to some types; etc.

Thousands of genetic disorders are known in humans. In general they are very complex, because our body is complex. Some of them are mendelian, so one gene causes a single genetic disorder (single phenotype); some are complex disorders whose genetic basis is not clearly understood; some are due to epigenetic alterations.

Important links for the study of genetic disorders:
➔ https://www.omim.org/
➔ https://www.ncbi.nlm.nih.gov/clinvar/

The list of known genes is increasing continuously, because we keep discovering new genes involved in clinical phenotypes. Some disorders are difficult to associate with a specific gene because the causative genes can be several, with functions that are not clearly understood. Mendelian disorders are easier to classify, unless they are very rare. Even so, for many mendelian disorders the causative gene(s) have not yet been identified.

Pleiotropy occurs when a gene influences two or more seemingly unrelated phenotypic traits. A pleiotropic gene can give different phenotypes: not only disease phenotypes, but also phenotypes seen in physiological features. Mutations in a pleiotropic gene may affect several traits simultaneously. However, the pathological phenotype of a human disease depends on the interaction of the causative gene(s) with several cis-acting genetic factors (on the same segment of the chromosome, for example promoters) and trans-acting genetic factors (proteins).
These factors represent the genetic background of a person → with the same gene, different people can express different phenotypes, because of the interaction with additional genes producing different genetic factors.

In the language of mendelian genetics:
Expressivity → the degree to which a phenotype is expressed by individuals having a particular genotype.
Penetrance → the proportion of individuals carrying a particular variant (or allele) of a gene (the genotype) who also express the associated trait (the phenotype).
Pleiotropy → a single genotype can give several phenotypes.
Complex genetic disease → genetic disease caused by the interaction of multiple genes and environmental factors. Complex diseases are also called multifactorial. Examples of complex diseases include cancer and heart disease.
Constitutive condition → not modifiable and not subject to regulation; a condition always present, in all tissues and situations.
Copy number variation (CNV) → the number of copies of a particular gene varies from one individual to the next.
Single nucleotide polymorphism (SNP) → substitution of a single nucleotide present in at least 1% of the population.
Epigenetic mutation → mutation that does not directly affect the DNA sequence but affects its expression through modifications of the chromatin state.

For complex genetic diseases, a further level of uncertainty is due to the interaction of several additive loci with the environment. Indeed, in laboratory mice, identical mutations often produce distinct phenotypes across different inbred strain backgrounds. This issue must be considered when designing and constructing mouse models of human disease → the same mutation is studied in different strains and correlated with the phenotype.

An epigenetic trait is a stably heritable phenotype due to features that do not imply a change in the DNA sequence. Molecular mechanisms involved: histone changes, DNA methylation at CpG islands in promoters, ncRNAs.
Epigenetic modifications may also be reversible, thus offering on the whole a powerful and flexible tool for gene regulation. Epigenetic changes can be affected by age, diet, smoking, stress, and disease.

The human genome organization

The protein-coding regions are a very small portion of the genome. The genome consists of the main nuclear genome and the mitochondrial genome. There are conserved regions, less conserved regions, heterochromatin regions and so on → the genome can be described in several ways: by a comparative aspect (conserved or not), by a functional aspect (nuclear or mitochondrial), by the chromatin organization. All these definitions are important, because the genome can be understood through its complex regulation, which is also related to the chromosomal organization of the genome.

Genes on chromosomes are expressed into proteins and, in general, they can be expressed wherever they are placed. An exception is reciprocal translocation, where the genes are subject to position effects and change their level of expression.

Example of FISH technology: the telomeres are labeled with this approach. It is important to know where the genes are located, but also how they are expressed and how they are placed in the 3D structure of the nucleus.

The human genome is organized in chromosomes. Its organization is more complex than previously understood, also because of copy number variation. At the beginning, copy number variants were not completely understood: in the first draft of the human genome, the repetitive sequences were discarded and considered noise.

Segmental duplications → low-copy repeats of genes; they have an evolutionary meaning.
Copy number variants → the copy number can vary between homologs (a single allele repeated twice in the right chromosome and once in the left chromosome), and the copies can also be present in different positions.
Segmental duplications can be copy number variants → the combination of the two concepts leads to the large heterogeneity among subjects.

Structural variations in the human genome range from 1 kb (a technical limit set by the methods used to discover these repeats) to several Mb, and cover about 15% of the human genome, depending on how the definition is given. For each human subject, about 5 CNVs are present, which can be population variants (polymorphisms) or "private" variations (mutations).

The presence of CNVs is associated with a gene dosage issue, and it may also be a cause of further mutation and genomic instability. CNVs confer flexibility to the genome → whenever a repeat is present, it can recombine with another part of the genome, giving unequal recombination products, which are mutations. CNVs therefore matter not only because they change the dosage level, but also because they offer the genome the possibility to recombine in a wrong way and introduce mutations.

Genomic disorders: a new medical genetics discipline

There is a new field in human genetics based on the discovery of human heterogeneity. In the top of the figure, you can see the classical idea that gene dosage differences can lead to a detrimental phenotype.

Examples: the Bar mutation in Drosophila changes the shape of the eye in different ways. This mutation is caused by a duplication of a certain segment of the Drosophila X chromosome. A single duplication leads to the mutated shape of the eye. Early in the history of Drosophila genetics, it was observed that if the duplication is carried in a heterozygous condition, females can produce a triplicated locus because of an unequal crossing-over between the normal chromosome and the duplicated one. In this way, some gametes transmit duplicated alleles, while others can transmit the normal allele starting from the duplicated one.
There are some important neurological disorders in which the FISH technique reveals that a duplication is present in one chromosome. The discovery of triplicated loci was also associated with a more severe disease. When we have a clinical phenotype and we suspect a genetic defect, we can infer that, in the absence of evidence of chromosome duplication, segmental duplication, or other evident causes, there is a new copy number variant to be discovered.

Chromosomes' discovery

Only in 1956 was it possible to define the correct chromosome number in humans → entire karyotype. Thanks to a mistake by a person who prepared a solution with the wrong molar concentration, it became possible to spread the chromosomes of a metaphase plate. Before this mistake, chromosomes were prepared by spreading them physically on the slide, pressing the cells, which was not very efficient. Incubation with hypotonic solution is the crucial step to separate the chromosomes in metaphase cells (remember: NO nuclear membrane at mitosis).

Chromosomes can be banded, and banding is one of the major tools in cytogenetics. Before the discovery of chromosome banding, only size and centromere position were available features for human karyotyping.

[Figure: definition of chromosomes in terms of size and centromere position]

In the human genome each chromosome pair has a specific banding pattern → chromosomes are not stained homogeneously. The banding is typical of each chromosome and allows us to distinguish the organization of the different chromosomes. Examples:
Chromosome 1 → two black stripes in the upper arm.
Chromosome 2 → the black bands are more evenly distributed than in chromosome 1.
Chromosome 3 → a single dark band instead of the 2 typical of chromosome 1.

It is important to know how to interpret the banding, because chromosomes can be observed at different resolutions and in different condensation states.
Chromosomes are highly condensed in metaphase, so after the discovery of banding we learned how to prepare chromosomes that are not too condensed. Chromosomes may be collected at metaphase or prometaphase, thus allowing different banding resolutions. We can study chromosomes at different resolution levels and therefore with a different total number of bands: the greater the resolution, the greater the number of bands per chromosome. We need a specific tool to define the alternation of bands. These bands are the first way to describe whether a karyotype is normal or not: it is possible to observe an additional chromosome, a missing chromosome or a rearrangement between different chromosomes, and also to discover duplications.

[Figure: human chromosome 4 shown at different banding resolutions]

The dark regions are the highly condensed parts. The banding is flexible and reflects chromatin remodeling → this could be an additional tool to predict in which regions the genes are expressed or not.

2. Methodologies of molecular investigation, from cytogenetics to genomics
3/03/22

In the human genome each chromosome pair has a specific banding pattern. Banding is a staining method that allows us to understand how the chromosomes are organized. With this method it is possible to note that the chromosomes are not stained homogeneously, because the dark and light bands are related to the different condensation of the chromatin. The banding pattern is stable, so it does not vary, and this tells us that this kind of organization has a functional and biological meaning.

This table summarizes the major techniques of chromosome banding. Some of them are based on classical microscopy and others on fluorescence microscopy. The first evidence (1970) came from a fluorescent dye, quinacrine, which gives this banding its name: Q-banding.
Quinacrine is a fluorescent dye with specific features; it gave the first demonstration of the alternation of condensed and less condensed regions in the genome. Quinacrine binds specific AT pairs, so AT-rich regions (at least three AT pairs) appear bright and GC-rich regions appear less bright. It was demonstrated that the banding is related to the AT or GC content, which is usually given as an average proportion for the genome. Now it is possible to assume that there are regions with different base content that change sharply along chromosomes: a band is followed by a band of different intensity. Therefore, there is a sharp change in the GC content in some regions of the genome. This is related to the organization of the chromosomes.

This method was useful to understand the biology of chromosomes and not only their function.

The pattern is coincident if we apply different banding methodologies. The fluorescent bands called Q-bands mark the same chromosomal segments as G-bands. They are the same, except that the alternation in this case depends on the intensity of fluorescence and not on the dark color. A single coincident pattern is found, irrespective of the banding technique applied. We can describe the same karyotype with different techniques, because what interests us is the interpretation of the banding patterns in terms of the molecular activity of the dyes.

G-banding is due to staining with Giemsa, a very well-known dye for cells. This banding is obtained by digesting the chromosome preparation with trypsin for a very short time. Otherwise, we would obtain a homogeneous staining of the chromosomes. With this method it is not the dye per se that makes the banding, but the trypsin and heat treatment: how stainable the chromosomes are is related to their capacity to renature.
The propensity to denature and renature depends on the chromatin condensation as well as on the chemical properties of the bonds: for example, GC-rich regions have a greater number of chemical bonds, because every GC pair has 3 hydrogen bonds instead of 2, and this influences the staining.

Another interesting banding is R-banding, which is the reverse of the classical G-banding: what is dark with the Giemsa staining is pale here. This is because the chromosomes are heat-denatured in saline before being stained with Giemsa. With the heat shock we obtain the same bands as with the trypsin digestion, but with a reversed pattern.

The other two banding methods are no longer in use. C-banding is a method to label the chromosomes at the centromeres, but now we can do this with FISH technology, which is better. T-banding is a method to label telomeres.

In silico bands (blue) generated from GC content, using the data from the human genome draft, resemble Giemsa bands. The bands represented in grey are the classical ideogram of the chromosomes according to G-banding, and the blue regions show the alternation according to the GC content. The coincidence is remarkable. Also note that there is a gap at the centromeres, because the first genome project gave no information about centromeres: these regions were discarded. In general, the blue bands of the in silico analysis coincided with the interpretation of dark and pale bands in terms of GC content.

The banding is a tool to describe the karyotype at specific regions. According to the number and position of the bands, we can describe the chromosomes in arms or specific subregions. Each chromosome can be divided into 2 arms: in some cases the centromere is in the middle of the chromosome, but it is always possible to identify a p arm (shorter) and a q arm (longer).
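The in silico bands mentioned above come from measuring GC content along the sequence. A minimal sketch of that idea follows, assuming fixed-size non-overlapping windows and a cut-off at the 41% genome-wide GC average quoted in these notes; the function names, the window size and the toy sequence are hypothetical and not the method used for the published figure.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows of fixed size."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def label_windows(gc_values: list[float], threshold: float = 0.41) -> list[str]:
    """Label each window relative to an assumed genome-wide GC average."""
    return ["GC-rich" if gc > threshold else "AT-rich" for gc in gc_values]


if __name__ == "__main__":
    # Toy sequence: an AT-only stretch followed by a GC-only stretch.
    toy = "ATATATATAT" + "GCGCGCGCGC"
    values = gc_windows(toy, window=10)
    print(values, label_windows(values))
```

Real band-scale analyses use windows of hundreds of kilobases or more; the tiny window here only keeps the example readable.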
We can see a list of numbers that describes the different regions of the chromosomes; in this way it is possible to locate genes. The number before the dot represents the region and the number after it represents the subregion. Regions and subregions of each arm are numbered starting from the centromere towards the telomere. This kind of numbering is human specific.

It is possible to see the red regions indicated in the 3 large acrocentric chromosomes and in the 2 small acrocentric chromosomes at the bottom; these regions, in the p arm, are enriched in ribosomal genes. These regions can recombine with each other because their homology is very high. They are also placed together in the so-called nucleolar region inside the interphase nucleus. Because of this location (which is due to their common function), their homology and their close position, these regions can recombine, and this can lead to chromosomal rearrangements that are recurrent in our species, called Robertsonian translocations: rearrangements involving two different acrocentric chromosomes.

Chromosome banding is a tool to describe chromosome rearrangements, because deletions, duplications, and inversions (where the banding is the reverse of the expected one) are visible. In some cases, the banding also gives the molecular explanation of a human disease. There is a syndrome, not so rare among human newborns, known as cri-du-chat syndrome (5p-). It is a severe defect that can be recognized at birth because the babies have a characteristic cry similar to that of a cat. It is a very severe syndrome with intellectual disability and other clinical symptoms, and it is not due to a mendelian mutated gene but to the loss of the terminal band of the p arm of chromosome 5.
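The band labels described above (chromosome, arm, digits before the dot, digits after the dot) can be split mechanically. The sketch below is only an illustration: `parse_band` and its field names are hypothetical, and the example label is arbitrary. In the usual cytogenetic convention the first digit before the dot is the region and the second is the band, so both are kept together here as `region_band`.

```python
import re
from typing import NamedTuple, Optional


class BandLabel(NamedTuple):
    chromosome: str          # "1".."22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    region_band: str         # digits before the dot, e.g. "15"
    sub_band: Optional[str]  # digits after the dot, or None


# Chromosome, arm, two digits, then an optional ".digits" suffix.
_BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d{2})(?:\.(\d+))?$")


def parse_band(label: str) -> BandLabel:
    """Split a label such as '5p15.2' into its components."""
    match = _BAND_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised band label: {label!r}")
    chromosome, arm, region_band, sub_band = match.groups()
    if chromosome.isdigit() and not 1 <= int(chromosome) <= 22:
        raise ValueError(f"unknown chromosome: {chromosome}")
    return BandLabel(chromosome, arm, region_band, sub_band)


if __name__ == "__main__":
    print(parse_band("5p15.2"))
    print(parse_band("Xq28"))
```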
The picture shows two chromosomes 5: the normal one (left side) has a pale region at the telomere, while the defective one (right side) has a p arm terminating with a dark band, the pale one being missing. The extent of the deletion can vary, and by looking at the regions in common among the subjects affected by this pathology it is possible to understand how many genes may be involved. Indeed, we have no clear idea of the critical segment: the hTERT gene (important for telomerase activity) is probably present, but it is not the only gene in that region, and this is probably a case of haploinsufficiency. In this case, one or more gene functions have gone from the normal double dosage to a single one, and this leads to the phenotype. The phenotype is the direct outcome of the deletion.

The ISCN (International System for Human Cytogenetic Nomenclature) is published regularly (every 4 years) and contains the recommendations of the International Standing Committee on Human Cytogenetic Nomenclature for the interpretation and communication of human cytogenetic and molecular cytogenomic nomenclature.

Chromosome bands have great theoretical and practical significance

We can understand the evolutionary history of the karyotype by studying the chromosome bands. It should be clear that they represent units of chromosome organization. They can also play a key epigenetic role in gene regulation: the different concentration of GC can influence the methylation of the DNA. We can see how these bands are involved in the so-called preferential fragility of the fragile sites. They allow us to identify and locate karyotype abnormalities (they are a barcode of the chromosomes), and for that reason they were used to construct the physical maps of the human genome.

Bands as fundamental units of chromosome organization, with a key role in gene regulation: a longstanding concept.
In this very old table (1998), the first idea that the chromosomes were banded for some intrinsic biological features derived from the comparison between G-bands and R-bands. Talking about the average content of AT and GC rich regions in the human genome, we have to consider that the situation is different for the centromeres: they have a content which is clearly far from the average content of the human genome, and this because the centromeres are repeated sequences of AT elements. The average content in the human genome is 41% for the GC pairs. Then, it was clear that apart from this feature (AT rich and GC rich content) we could understand additional features, for example the timing of replication of the genome. In general, we know that heterochromatin is definitely later replicated in the human genome, because observing the centromeres (highly heterochromatic) they always are the last portion of the genome under replication. Indeed, if you think about the huge task of DNA replication, how much time do human cells need to replicate in culture? It depends on the cell type: 48h for primary cells, some of them can also replicate in 24h. If you compare different cells with different properties in general, what is different in the cell cycle is the extension of G1: it can be prolonged or not. The S phase is in the same order of 8-9h, which is the time necessary to complete the replication of the entire human genome. During this time there is an order: some regions are early replicated with respect to the others. Not all the replication origins are activated at the same time, this is reasonable because of the large number of sequences to be replicated. What is unexpected is that the alternation of earlier and later replication sequences is associated with the chromosome banding. Considering the G-bands we can see regions of late replication in the dark bands and early replicating regions in the pale bands. How was it possible to discover this feature? 
(doi: 10.1101/cshperspect.a010132) The replication timing pattern can be defined by giving cultured cells a DNA precursor for 20-30 minutes (not more than 1h), which can then be labeled with fluorescent molecules, and by collecting the cells, which display different patterns according to the moment in which they were labeled. The nuclei are seen in interphase: in general, you will see spots within the nucleus (apart from the nucleolus, which remains dark because it is replicated in middle-late S phase together with the periphery of the nucleus), and finally the heterochromatic portion. It is possible to define a progression of 5 stages.

To understand the pattern of replication another technique was applied (in the 70s-80s): the so-called replication banding. It is a dynamic representation of chromosomes (not static like G-banding). The principle is to detect chromosome replication using bromodeoxyuridine (BrdU) labeling. It coincides with G(Q)/R banding, indicating that the timing of replication fully explains the meaning of these technical patterns.

The chromosomes appear banded, which is expected because the regions of the interphase nucleus do not all replicate at the same time. The dynamic property means that what is dark in one phase could be pale in another phase: in the picture we can see the example of chromosome 1. The replication bands were compared with R- and Q-bands. The early replication pattern (E) corresponds to that of the R-bands (R), while the late replication pattern (L) corresponds to that of the Q-bands (Q).

If we collect chromosomes labeled during the S phase, we will see for the same chromosome a pattern that depends on the moment in which we administered the precursor.
If at that moment the chromosome was at the beginning of the S phase, it will display an early alternation of fluorescent and non-fluorescent bands, while the reverse pattern will be found if we collect a chromosome that was replicating later at that moment.

By associating all these regions (early, mid, or late) with the classical static banding, it was possible to understand that each chromosome consists of an alternation of early and late replicating regions. This led to the idea that the concentration of genes differs along the chromosomes: some regions are enriched in coding genes and others are not. This is not trivial, because we can then try to understand why some chromosome regions are more prone to chromosome rearrangements. Is it possible that regions that do not contain coding genes are less problematic in human diseases? Is this due to the fact that the chromatin organization makes some regions fragile and others not? There are many questions to answer in order to better understand the genome organization, its functions and its consequences for human diseases.

G- and R-bands are differentially enriched in some elements, such as the LINEs (long interspersed nuclear elements) and the SINEs (short interspersed nuclear elements). The enrichment of the G-bands in LINEs is due to the transposition of these elements at a specific sequence. The consensus sequence is enriched in TA, and therefore more LINEs are inserted in the bands that correspond to the AT-rich regions. Instead, the SINEs that are abundant in the R-bands probably correspond to the old Alu elements.

Molecular cytogenetics
From karyotyping to chromosome function and genome organization

In situ hybridization is a technique proposed early in the history of cytogenetics. Gall and Pardue in 1969 used ribosomal RNA labeled with tritium to demonstrate that nucleic acid hybrids may be detected in situ.
The centromeric regions are detectable by nucleic acid hybridization with radioisotopes (a technique no longer used), but unfortunately in the chromosome preparation there was a lot of background noise. The technique was proposed very early but it was not applied for a long time; with the introduction of fluorescence it became possible to reuse the principle of in situ hybridization. In the picture we see that centromeric probes were used, labeled not with radioisotopes but with fluorescence. It is important to stress that, once you are aware of the basic principles of the method, you can vary the protocol in many ways.

When using nucleic acids, denaturation is necessary, and it must be achieved without affecting the structure of the chromosomes; the chromatin must also be preserved. This problem was solved with the FISH (Fluorescence In Situ Hybridization) strategy, which can take many different forms: from the detection of the centromeres to the detection of entire chromosomes. The principle is always the same: you have to denature your target and have a labeled probe, so that you can detect the hybridization on the target sequence.

Epigenetic studies need some very specialized approaches, because you must be able to apply immunofluorescence and FISH together (coupled protocol); with this, you can in general detect modified histones along the chromosomes in a specific region that is detectable by fluorescence in situ hybridization. So it is possible to use these techniques, but usually for very specific applications, and in general they are not applied to chromosome preparations, because the resolution of FISH is completely different from the resolution of PCR and other specific techniques.
With the FISH technique, the dark and pale bands contain millions of base pairs, so you never have a fine resolution of the chromosomes; you have instead a picture that is important for orientation, but in which you cannot distinguish details.

Applications of immunofluorescence

1994: Craig and Bickmore provided evidence for the heterogeneous organization of human chromosomes by using molecular cytogenetics.

In this picture you can see in blue the banding of the chromosomes obtained with DAPI, a fluorescent stain that detects the AT- and GC-rich regions with different brightness. On the left side different colors emerge: in green, the signal obtained with a DNA analogue (bromodeoxyuridine) that detects the late replication banding; these bands correspond to the AT regions (brighter). In red, the location of the CpG islands, defined by an antibody that detects the methylation of these sequences.

Not all the genome pictured here is late replicating or enriched in CpG islands, and there are also some dark regions with no match, but in general what is green is not red, because CpG islands are enriched in the early replicating genome. In this way it is possible to investigate the gene content also by comparing chromosomes. Some chromosomes show an enrichment at the telomeric ends.

Different probe types are commercially available or can be designed for metaphase and interphase multicolour FISH. Interphase studies are useful because the majority of the cells in our body are not replicating, and also because the analysis of the metaphase stage is very long and complex.
1. Probes for entire chromosomes → in interphase we will see a very large fluorescent domain
2. Probes for the different arms of chromosomes → to study how the different chromosomes are arranged in the nucleus
3.
Centromeric probes → very powerful because in the interphase nucleus we can define the presence of two signals, and this is important in prenatal diagnosis: we expect a Down syndrome sample to show three fluorescent signals if we probe chromosome 21.
4. Telomeric probes
5. Probes to detect rearrangements within the nucleus → application in blood cancers that are associated with very well known chromosome rearrangements. Knowing the location of the rearrangement, you can construct probes with different colors that show whether a reciprocal translocation is present or not, through the change from the expected to an unexpected color pattern within the interphase nucleus.

There was a very quick development of fluorophores that can be used to label the DNA and are easily detected, which makes many kinds of analysis possible. Looking at the rules of molecular biology: hybridization conditions must be defined taking into consideration the complexity of the target sequence and of its probe. Do not forget that chromosomes are in part repetitive material, and this can be a problem because it gives a lot of background noise.

A chromosome probe is not a single probe as long as the chromosome: it consists of very fragmented pieces of DNA, which hybridize better. The fluorescence is the result of several probes labeled in the same way. DNA, and especially the human genome, is an alternation of single-copy and repetitive elements. In the picture the white panels represent single-copy sequences and the grey parts represent repetitive elements. The problem is that when you fragment the probe to improve the efficiency of hybridization, you label both the white segments and the grey ones. Starting from your fragmented probe, you have generated labeled fragments that are either single copy, and so detect the target, or repetitive, and so can hybridize everywhere in the nucleus.
So, these almost grey or totally grey segments are a big problem for detection. To avoid this background noise, the hybridization is done under competitive conditions, in which an excess of unlabeled repetitive sequences is added to the probe. This procedure provides a pre-annealing step: at this stage the repetitive elements pair with the labeled fragments of the probe (repetitive sequences hybridize very quickly in comparison with single-copy genes). This step takes about 1h, and then we can hybridize the probe and obtain single, sharp signals without background noise.

Whole chromosome probes may be derived:
- from flow-sorted chromosomes (sorted according to the different size of the chromosomes). Flow sorting made it possible to obtain the separated sequences needed to start this kind of application
- by microdissection → very useful to obtain arm- or band-specific probes

In the last 10 years new probes have been developed to balance all the features needed to achieve hybridization while avoiding loss of signal and background noise. These probes are known as PNA (Peptide Nucleic Acid): they are artificially synthesized polymers forming PNA:DNA hybrids, which are more stable than DNA:DNA ones. Their use in FISH is increasing.

Mouse painting probes on mouse chromosomes 2 (red), 4 (green) and 6 (yellow); yellow is obtained by mixing red and green. This is useful if you are looking for translocations or other rearrangements. The chromosomes of laboratory mice are all acrocentric, with the centromere at one extremity. Mice are prone to Robertsonian translocations because of their acrocentric chromosomes.

Human chromosomes 1, 2 and 4 are identified by fluorescent probes. Here chromosome 4 is involved in a reciprocal translocation with an unknown chromosome.

Multicolour FISH is helpful to identify multiple anomalies, as in cancer genetics.
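Multicolour FISH rests on combinatorial labelling: with n fluorophores, each chromosome can carry any non-empty subset of them, which gives 2^n - 1 distinguishable labels. The sketch below works through that arithmetic for the 24 human chromosomes (22 autosomes plus X and Y). The fluorophore names `F1`..`F5` are placeholders, and the assignment order is arbitrary rather than that of any commercial probe set.

```python
from itertools import combinations


def n_labels(n_fluorophores: int) -> int:
    """Number of non-empty fluorophore combinations: 2**n - 1."""
    return 2 ** n_fluorophores - 1


def min_fluorophores(n_targets: int) -> int:
    """Smallest n such that 2**n - 1 >= n_targets."""
    n = 0
    while n_labels(n) < n_targets:
        n += 1
    return n


def assign_labels(targets, fluorophores):
    """Give each target a distinct non-empty combination of fluorophores."""
    combos = [
        combo
        for size in range(1, len(fluorophores) + 1)
        for combo in combinations(fluorophores, size)
    ]
    if len(combos) < len(targets):
        raise ValueError("not enough fluorophores for this many targets")
    return dict(zip(targets, combos))


if __name__ == "__main__":
    chromosomes = [str(i) for i in range(1, 23)] + ["X", "Y"]
    print(n_labels(4), n_labels(5))            # 15 31
    print(min_fluorophores(len(chromosomes)))  # 5
    labels = assign_labels(chromosomes, ["F1", "F2", "F3", "F4", "F5"])
    print(labels["1"], labels["Y"])
```

Four fluorophores give only 15 combinations, which is why five is the minimum for 24 chromosomes.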
Image analysis and specific sets of fluorophores are required. The goal is to discriminate all autosomes and sex chromosomes (in humans, 24 different chromosomes). A minimum set of 5 fluorophores allows all the chromosomes to be distinguished, because each chromosome is labeled with a single fluorophore or with a combination of fluorophores that the detection system can read.
M-FISH: each of 5-7 filters captures one fluorescent signal, then a merged image is obtained.
N = 2^n - 1 combinations, where n is the number of fluorophores → with n = 4 only 15 combinations are available, while n = 5 gives 31, enough for 24 chromosomes: a minimum of 5 fluorophores is necessary.

8/03/22
When to use interphase FISH
Advantages:
- it can be used when metaphases are not available for investigating chromosomes, as in non-proliferating tissues
- when the cell sample is small, as in prenatal diagnosis
- to speed up the analysis, for example to reach a diagnosis quickly, as in prenatal diagnosis
Limitations:
We are observing an interphase nucleus, so what we really record is the number of spots, which can be more or less sharp. We therefore have to select the best cells to inspect, and we also have to use internal controls: for example, probes that are expected to detect normal parts of a chromosome, or normal entire chromosomes, when we are looking for a numerical alteration.

For example, in red and in green we can detect chromosome 1 and chromosome 15. The spots are small probably because the chromosomes were detected through their centromeres (centromeric sequences can be chromosome specific, so centromere-specific probes can be used to detect the corresponding chromosome). With whole chromosome probes, instead, the risk is to obtain a very large fluorescent domain, and therefore unclear dots rather than robust evidence.

In interphase FISH multiple probes are necessary as controls. By probing a pair of chromosomes we can recognize trisomy, which is clear when we have 3 fluorescent spots (for example trisomy 15 in figure C).
In this case trisomy 15 can be distinguished from the signal in figure G (trisomy for both the chromosomes investigated, in this case 16 and 18). Obviously you are not detecting the entire karyotype, but because 2 pairs of chromosomes both show 3 signals, probably the cell is ….. so this is a way to be sure that you are enumerating the chromosomes correctly. The evidence for monosomy is less clear: the loss of one signal in a single cell can be an artifact, but we can examine many cells to test whether the loss is recurrent.

Picture on the left: scheme of the region of chromosome 17 in the human genome that is duplicated when a syndrome called Charcot-Marie-Tooth is present. The idea that the syndrome is due to the duplication of a small portion of the genome actually came from cytogenetics; at that time demonstrating a duplication with genomics was almost impossible, and it would have been considered an artifact. With cytogenetic approaches, instead, it was clear that a certain region of chromosome 17 was duplicated. Two probes were used:
- a specific probe, here in red, covering a limited region of interest within the locus (in general we avoid performing FISH with large regions because this leads to more background noise). If the region is duplicated we expect to see 2 spots that are separate but close to each other. In the picture we see the duplication as 2 close red spots and the normal chromosome 17 as a single red spot. The other red spots are background noise.
- a second probe, the green one, labeling a different portion of chromosome 17. We therefore expect to find a green spot close to the single red spot, or close to the double red spot when the duplication is present.

In the picture on the right there is one of the several translocations known in cancer.
In cancer, cytogenetics is complex because cancer cells have an unstable karyotype and we are often unable to predict how it will evolve during cancer proliferation. Some translocations and alterations, however, are clear markers, such as the Philadelphia chromosome. It is a very small marker chromosome, the outcome of a reciprocal translocation involving chromosome 22 (one of the smallest) and chromosome 9. It is observed in the large majority of patients with chronic myeloid leukemia, almost 90% of them, and it is associated with the development of therapeutic strategies based on different drugs.

The classical approach assumes that you know in advance what kind of rearrangement you are looking for: you design probes that recognize the normal chromosome 9 and the normal chromosome 22, placed very close to the breakpoint regions. The idea is that, when the translocation is present, the red signal that labels chromosome 9 and the green signal that labels chromosome 22 end up very close to each other. Because of the low resolution due to condensed chromatin, they appear as one yellow signal, together with a single red and a single green signal from the normal chromosomes. If this yellow signal is recurrent, it is not an error and the translocation is really present.

Regarding diagnosis, it is interesting that the companies developing these probes follow 2 strategies:
- The Fusion strategy → the one just discussed
- The Split Signal strategy → based on a single chromosome labeled both upstream and downstream of the known breakpoint region. If the chromosome is normal we see 2 yellow spots. If the translocation is present, red and green are split by the chromosome rearrangement: we find a green signal (marking one of the reciprocally translocated chromosomes), a red signal (marking the other) and a yellow signal (the normal chromosome).

Why 2 different strategies?
The Fusion strategy → applied when we already know exactly which 2 chromosomes are involved in the translocation.
The Split Signal strategy → we can demonstrate that a single chromosome is involved in reciprocal translocations with more than one partner. For example in cancer, chromosome 8, carrying the MYC gene, can rearrange with several chromosomes in regions carrying the immunoglobulin genes (Burkitt lymphoma).

Cytogenetic resolution:
- Banding techniques (conventional cytogenetics) → very poor resolution (4 Mb): we are reasoning in megabases of genome
- Molecular cytogenetics applied to metaphase chromosomes → 1-2 Mb resolution. This is due to the physical resolution of the microscope. Super-resolution microscopes are being introduced, but with them we lose the view of a complete chromosome or nucleus.
- Interphase FISH → improved resolution (→ 100 kb), as euchromatin is less condensed than in metaphase. For example, the Charcot-Marie-Tooth duplication cannot be demonstrated easily with metaphase FISH because the 2 signals would very often overlap. The locus is large enough to give some hope of demonstrating the duplication on chromosomes too, but interphase works much better.

Some high-resolution cytogenetic, or “cytogenomic”, methods were developed in the last 20 years; the majority of them are now widely used as routine tools in biomedicine. They rely on the elongation of fibers: the genome is stretched so that signals appear not as small round spots but as elongated fluorescent tracts → better resolution.

Fiber FISH
Chromatin is prepared by digestion: a solution normally used for chromatin extraction is applied directly on a classical microscope slide. During the treatment the slide is tilted, so that the extracted chromatin can run along the slide and elongate; the probes are then applied. The signals are not very easy to interpret, because this way of extracting DNA is not very reproducible.
This method allowed the detection of patterns that are not easy to observe with genomics, because rare patterns (rearrangements present in a very small percentage of the sample) would appear as background. It can be used when you suspect a situation like the one in the paper by Molina et al. 2012 (DOI: 10.1016/j.ygeno.2012.08.007): the authors suspected that some repeated regions could be found in different rearranged patterns. The presence of repeats introduced the possibility that the same region was oriented in opposite directions in a mixed sample of cells. They defined a set of probes, some with red and others with green fluorescence; because the probes were designed on the known sequence of a reference genome, they expected to find the same pattern in all the observations. Instead the pattern differed, and the difference within this region allowed 2 kinds of orientation to be identified.

Alternative protocols can be developed to make the observations more reproducible. An example is molecular combing.

Molecular combing:
Combing means turning curly hair into straight, elongated hair. The elongation and linearization of DNA is obtained with strategies based on its biochemical and biophysical features. A device extracts a coverslip from a DNA solution: if the coverslip is introduced vertically into the DNA solution and then extracted at a constant rate, the DNA changes shape at the interface between liquid and air. In particular, the coverslip has a charged surface, so that in the solution the DNA attaches to it by one extremity. The extraction must then be very constant and precise; if it is, the DNA elongates onto the surface. The DNA solution must contain high molecular weight DNA. A precise relationship exists between the measured length and the DNA size in kb: 1 µm = 2 kb.
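The length-to-size relationship of molecular combing can be turned into a small calculation. The following is only a minimal sketch: the function names and the example measurements are hypothetical, and the only value taken from the notes is the conversion factor of 2 kb per micrometer.

```python
# Conversion factor stated in the notes: 1 micrometer of combed DNA = 2 kb.
KB_PER_MICROMETER = 2.0


def length_to_kb(length_um: float) -> float:
    """Convert a length measured on a combed fiber (micrometers) to kilobases."""
    if length_um < 0:
        raise ValueError("length must be non-negative")
    return length_um * KB_PER_MICROMETER


def probe_gap_kb(end_first_um: float, start_second_um: float) -> float:
    """Genomic distance (kb) between the end of one probe signal
    and the start of the next one on the same fiber."""
    return length_to_kb(start_second_um - end_first_um)


# Hypothetical measurements along one fiber, in micrometers.
signal_length_kb = length_to_kb(25.0)   # a 25 µm signal
gap_kb = probe_gap_kb(25.0, 40.0)       # a 15 µm gap between two probes
print(signal_length_kb, gap_kb)
```

With these invented values a 25 µm signal corresponds to 50 kb and a 15 µm gap to 30 kb; comparing such values with the distances expected from the reference genome is one way to flag a possible rearrangement.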
Since you can make measurements, you can measure the distance between 2 probes and define whether and how they are rearranged.

Here is an example from my laboratory, 2 years ago: we expected 2 probes (red and green) to show the normal arrangement within a fragile site. We did find probes with the expected pattern and the expected interval between them, but several times we also found a rearrangement involving another probe, corresponding to the upstream region in this scheme, and therefore a duplication within this sequence. It was present in a small proportion of the molecules of a normal sample: in fragile sequences some rearrangements are normally observed.

Another application of combing is the study of how the genome is replicated (early or late replicating regions). We can study the DNA replication fork, which proceeds bidirectionally. We label with 2 specific analogs (chlorodeoxyuridine CldU or bromodeoxyuridine, incorporated instead of thymidine, or other halogenated analogs), each of which can be detected by a different fluorescent antibody. Therefore you can:
- label the first pulse of incorporation by culturing the cells in the presence of iododeoxyuridine (IdU),
- then wash out the medium,
- add chlorodeoxyuridine (CldU) and
- detect this second tract of replication with another antibody.
In this example you can observe a replication fork with a dark space in between, probably because the fork was already running when the label was given. We can measure the length of both arms and, assuming that replication proceeds at a constant rate, evaluate the fork rate at specific loci (using the equivalence between length and kb).

Unfortunately molecular combing cannot be combined with immunofluorescence to analyze epigenetics, since it needs a pure solution of DNA, in which the epigenetic marks are lost.
In this case FISH is the only solution for high-resolution observations, for example if you are interested in studying whether a centromeric sequence carries the active marks of the centromere or not. The epigenetic marks of centromeres are represented by a set of proteins.

So: the cytogenetic resolution of “Fiber FISH” is 1-5 kb → a great advantage, because we can distinguish signals separated by a few kb; this is still large, but very interesting because it allows patterns to be defined.

Historical summary of chromosome studies in science….

CGH - Comparative Genomic Hybridization
Used to identify chromosome imbalances (deletions or duplications) without observing chromosomes. It is very quick and efficient, and it is also applicable to the diagnosis of clinical syndromes that are not easy to define at the beginning, as in children with developmental delay who may carry a duplication or a deletion.
Protocol in brief:
- Reference DNA is labeled in red
- Test (patient) DNA is labeled in green
- Patient and control DNA are hybridized together to the microarray
- Yellow signal → no difference between the two
- Green signal → DNA gain in the patient (excess of the green-labeled test DNA)
- Red signal → DNA loss in the patient (excess of the red-labeled reference DNA)
The experiment should then be repeated with the labels inverted (test DNA in red and reference DNA in green) as an internal control. Of course COT-1 DNA is needed to block the repeats.
A clinical CGH array contains a few hundred thousand probes, chosen according to the interest. Research CGH arrays contain millions of probes and reach higher resolution.

None of these approaches is universal; they have to be combined: for example, CGH cannot detect balanced rearrangements. In this patient, for example, M-FISH detects several rearrangements that escape CGH analysis, whereas only high-resolution CGH detects 4 deletions on chromosomes 5 and 6, ranging from 1.4 Mb to 4.3 Mb (below the size of an average band).
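The reading of a single array CGH probe, and the use of the label-swap experiment as an internal control, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the intensity values and the threshold of 0.3 on the log2 ratio are hypothetical and do not come from the lecture or from any specific analysis software.

```python
import math


def classify_probe(test_intensity: float, reference_intensity: float,
                   threshold: float = 0.3) -> str:
    """Classify one probe from the test/reference fluorescence ratio.

    A log2 ratio near 0 means equal amounts of test and reference DNA.
    The threshold is an assumed, illustrative cut-off.
    """
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(test_intensity / reference_intensity)
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "balanced"


def confirmed_call(forward, label_swap, threshold: float = 0.3) -> str:
    """Accept a call only if the original and the label-swap experiment agree.

    Both arguments are (test_intensity, reference_intensity) pairs,
    already assigned to the correct sample in each experiment.
    """
    first = classify_probe(*forward, threshold)
    second = classify_probe(*label_swap, threshold)
    return first if first == second else "discordant"


# Hypothetical intensities for one probe in the two experiments.
print(classify_probe(150.0, 100.0))                  # ratio 1.5
print(classify_probe(50.0, 100.0))                   # ratio 0.5
print(confirmed_call((150.0, 100.0), (145.0, 100.0)))
```

Working on the test/reference ratio rather than on the colors themselves keeps the interpretation independent of which fluorophore was assigned to which sample, which is the point of repeating the experiment with inverted labels.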
SNP patterns
Detection of triploidy is also possible through SNP patterns. The examples come from a paper (Levy and Wapner, Fertility and Sterility (2018) 109:201-212) summarizing new applications in the field of prenatal diagnosis. Comparing chromosomes 15 and 21:
- With 2 chromosomes there are 3 different levels, corresponding to the possible combinations of SNP alleles (AA, AB, BB) → in the “allele differences” box for chromosome 15 we see 3 signals, so only 2 chromosomes are present
- With 3 chromosomes the possible SNP combinations are 4: AAA, AAB, ABB and BBB → in the “allele differences” box for chromosome 21 we see 4 signals, so trisomy. In the average signal graph, however, the signal is higher than normal but not as high as expected for a full trisomy: the median copy number value suggests about 60% mosaicism (2 = normal; 3 = trisomic)
A similar indication is obtained when a triploid genome is compared with a normal one.

Molecular cytogenetics: from karyotyping to chromosome function and genome organization
3. Concept of Synteny 08/03/22
Synteny: the conservation of the linear organization of the chromosomal sequence among species. Synteny concerns how a part of the genome is organized in a single array along a chromosome. For example, if you look at human chromosome 10, with its banding and gene positions, and compare it with the mouse genome, you will never find a similar whole chromosome, but some groups of genes maintain conserved synteny. For example, the upper tract of mouse chromosome 2 is entirely conserved in human chromosome 10, and the same holds for many other tracts. So some regions are maintained between species, while in other cases genes are scattered over different chromosomes. Human chromosome 21: the majority of its genes are found on mouse chromosome 16, with very strong synteny conservation.
Synteny is important as a strategy for study, and also because a region with conserved synteny probably shares common regulating factors. In general we are sure that conservation is not a random observation because of the number of segments that define the differences between the human karyotype and that of other animals, like the mouse. For each color you can find the correspondence with one of the human chromosomes. The colors are scattered, but not as much as they would be if the distribution were random.

Human-mouse synteny (the mouse has 20 chromosome pairs) is characterized by about 340 common segments, so conservation is higher than expected if breakage had occurred at random. The evolutionary divergence between mouse and human dates back about 90 million years, so we would expect a very large noise. A special example is the X chromosome, which matches the mouse X perfectly because of X chromosome inactivation in mammals. Even if the color is the same, however, when we inspect the sequence of the two X chromosomes we discover a lot of intrachromosomal rearrangements.

A comparison among primate karyotypes (H-C-G-O-M: human, chimpanzee, gorilla, orangutan, macaque) using the same molecular probes
This image is the result of hybridization with probes that define the human karyotype but can also be used on chromosome preparations of other primates, since the divergence time is short and we expect these genomes to be very similar. In fact the similarity in sequence is huge, but the similarity between karyotypes is lower. We have 46 chromosomes but the great apes have 48, because they have 2 short acrocentric chromosomes in place of one very long human chromosome. This major karyotypic difference was caused by the fusion of two ancestral chromosomes to form human chromosome 2, followed by the inactivation of one of the two original centromeres.
This is demonstrated by the bands (the banding of the two acrocentric chromosomes matches the upper and lower parts of the long human chromosome) and by the fluorescence pattern (the color pattern overlaps completely even though the chromosomes are separate).

The biological explanation
The biological explanation for synteny conservation is generally found in the existence of a higher order (“super-order”) of chromosomal organization, for example related to gene regulation. No gene is activated or repressed only because of the promoters or enhancers surrounding its sequence: higher-order features of chromosome organization also define a very complex way of regulating genomes, which has led to the study of the 3D organization of the genome.
The 3D organization of mammalian nuclei may correspond to the need for a spatial synteny. This interested several scientists, who applied FISH with multicolor probes and made it visible that each human chromosome occupies a specific territory within the nucleus. The original idea was to define the functions of the different chromosomes and parts of the genome, but the real story is different.

EARLY KNOWLEDGE
Historically, the idea that the nucleus has a spatial organization started with the observation of the nucleolus. The nucleolus (identified by classical conventional staining due to its heterochromatic organization) is a well-known nuclear territory, containing different chromosome segments that carry the arrays of rDNA genes (in humans, the p arms of acrocentric chromosomes 13-15 and 21-22) → this case shows the need to bring together genes with common functions.
Telomeres, centromeres and heterochromatin in general are placed at the periphery of the nucleus and are part of the late replicating regions of the genome.

4. The 3D organization of the human genome. Chromosome territories. Origin and genetic consequences of chromosome structural aberrations.
10/03/2022
Summary of the previous lecture
We discussed conventional and molecular cytogenetics, interphase FISH, and additional FISH methods in which the resolution is improved by working on elongated molecules, such as molecular combing and fiber FISH. Then we moved to cytogenomics, that is, to genomics-based techniques that allow us to evaluate chromosome changes and chromosome mutations, for example dosage imbalance. We also briefly discussed comparative genomic hybridization and SNP-based applications. Then we started discussing the meaning of “synteny” conservation, which is important because it leads to a 3D view of the genome. We briefly introduced the 3D genome and now we will see it in more detail: how the genome is organized and why we are interested in understanding the 3D genome.
We concluded with the early knowledge: the nucleolus is a well-known nuclear territory, containing different chromosome segments that carry the arrays of rDNA genes (in humans, the p arms of acrocentric chromosomes 13-15 and 21-22). The association of telomeres, centromeres and heterochromatin with the nuclear matrix is well known and predates the era of molecular cytogenetics.

Introduction to LADs
The existence of LADs (Lamina Associated Domains), chromatin loops enriched in repressed genes, is also well known. They are associated with heterochromatin and late replicating domains at the periphery of the nucleus. This is important because the nuclear periphery is not just an edge: it is also the boundary between the cytoplasm and the nuclear area itself. Therefore we must expect cross-talk between the two compartments, and we must avoid thinking of the periphery as something completely silent.
Chromosome banding is also an early and convincing indication that chromosomes are organized in different functional domains: not only the nucleus but every chromosome must be organized in domains, because, as we discussed, chromosome banding is not simply an alternation of dark and pale regions. Each band represents an enrichment in expressed or non-expressed genes. So, along the length of the chromosome, there is clearly an alternation of several functional domains.

Genome rearrangements
With interphase FISH we can perform an important type of investigation within the nucleus: we can use FISH probes to appreciate the spatial arrangement of the genome. In the scheme shown we can see all the chromosome domains. Each color represents a single human chromosome, so the probes used are all whole chromosome probes (those that in general are also expected to overlap during interphase and do not give sharp dots). The scheme shows convincingly that each chromosome belongs to a separate territory, although it does not depict the real situation.
The first achievements of FISH technology in the discovery of 3D organization came from two small chromosomes of the human genome, 18 and 19, which are differentially enriched in coding and expressed genes. Chromosome 18 is very poor in genes in comparison to chromosome 19. This is better shown in the image on the right →.
The gene density differs among chromosomes and is not a direct or simple function of chromosome length. This is clear if you observe the entire genome: even a medium-large chromosome, like the X, is not as enriched in coding genes as the small chromosome 19. Chromosome 19 is really enriched in genes and can be considered an exception; if we look at the “cloud”, there is no direct relationship between gene content and chromosome length.
Now, if we search for the enrichment of coding sequences using CpG islands, chromosome 19 has a size similar to chromosome 18 but a very different content of CpG islands.
In the reported graph, chromosome 18 is green (labeled ligated portion of the genome), whereas chromosome 19 is completely red and highly enriched in CpG islands. Again, gene density is not a function of chromosome length: even a very large chromosome is not necessarily enriched in highly expressed regions.
Once these concepts were understood, the same authors who described chromosomes 18 and 19 also studied their position inside the nucleus. They discovered that chromosome 18 (colored in red or in green, to make sure that the result does not depend on the color) is always positioned peripherally in the interphase nucleus. In a dual-color approach chromosome 19 is instead located inside the nucleus, especially in the center, independently of the probe color. The color chosen to label the chromosomes does not matter, because their position depends on the function of the nucleus and on the function of a specific tract of the genome.
This is not so surprising: telomeres, centromeres and non-coding regions in general are late replicating and located at the periphery, whereas the regulatory machinery is thought to be really active in the center of the nucleus, so the difference is highly convincing.
Some chromosomes show very sharp differences, as seen for 18 and 19, but for the majority of the other chromosomes the difference in the proportion and representation of expressed genes is not so convincing.
Another big problem is tissue specificity and the origin of the different cells: in general, the link between chromosome territories and gene density shows some tissue specificity. Chromosome size, instead, does not determine gene density: both large and small chromosomes can be peripheral or can stay in the center of the nucleus. In human fibroblasts, chromosome size is an additional determinant of chromosome territories (not in contrast with the role of gene density).
We can state that chromosome territories mark the position of the different chromosomes according to their gene density: if we look at the picture and compare gene density with the location of the territories, the description is pretty convincing, but it is also a function of cell type and tissue specificity.

Technical limitations for chromosome territories description
This kind of approach has some technical limitations. The investigation of the 3D genome is urgent because we can postulate, and then try to understand, whether these territories, which are suspected to have a functional role, also change according to the state of the cell, that is, when the phenotype is normal or pathological. Studying the 3D genome could reveal, for example, the biological meaning of some genetic diseases.
In this representation we can look again at the nuclear distribution of the chromosomes in different colors, but also at what the nucleus really looks like. We can appreciate, for example, the pores and the nuclear lamina, and therefore all the machinery associated not only with genome organization but also with what deals with translation and with export out of the nucleus.
Obviously we must take into account cell differentiation, tissue specificity, and the comparison between the normal and the pathological state of cells.
The 3D organization is cell and tissue specific and can change during cell differentiation as well as between healthy and pathological conditions.
These kinds of representations are nice but very difficult to generate and study (the application of FISH can be highly informative, but also very limiting); indeed, here we never see the entire set of chromosomes, only a subset of them.
In the first row we compare normal vs. malignant human breast cells, that is, cells from the same tissue with a normal and a cancer phenotype. Across many comparisons we can appreciate a difference in organization, which is sharper in the normal cell and less clear in cells with a cancer phenotype. The different organization of chromatin is obviously associated with the cancer phenotype.
So, what is the meaning of the maintenance of different chromosome territories? They are less maintained in cancer cells, but the same difference can be appreciated when we compare undifferentiated and differentiated cells, as in the example of human keratinocytes: the domains differ in size, condensation and position.
In the last row we are comparing normal human primary fibroblasts: on the left we can see chromosome territories for number of chromosomes. It is possible to observe the nuclear organization that is not good for early observations and to confirm molecular data.

Which are the problems with this approach?
Pitfalls in investigating the nuclear architecture by microscopy are:
- Artifacts may be introduced by fixation, and cells will be very flattened (a good 3D preparation cannot be observed);
- On the other hand, in living cells the duration of the observation is a key aspect influencing the results.
So, there is a paradox: to perform a good FISH assay we need well-fixed preparations, and the cells will be flattened; even with confocal microscopy the result is in fact an artifact with respect to the native conformation of the cell. If we apply the same approach to living cells, we find additional limitations, and the duration of the observation is crucial.
Over the years, many efforts were made to improve resolution and to extend the applications of microscopy. Now we can state the following:
- Consensus has been reached about the ability of chromosome territories to change position and shape according to cell type and cell cycle stage. Chromosome territories do exist (they are true features of the cell): by comparing different cells we see differences, but territories are also variable in shape and position in normal conditions, not only when we compare pathological clinical phenotypes. During normal cell life chromatin is clearly variable too.
- Different cell processes, e.g. DNA replication, transcription and RNA processing, are regulated through the nuclear architecture and the variation of nuclear territories.
- More specifically, the chromosome territories with high levels of movement and redistribution are those containing active genes. The distribution of chromosome territories changes because the set of activated genes varies during differentiation and among cell types, but also during normal cell life because of normally occurring biological processes.
- Globally, in spite of the observation of preferential interactions among chromosome territories, they must be considered only a probabilistic state of the 3D nucleus.
Additional information: chromosome territories do exist, but what we are defining is highly probabilistic, so nothing is really fixed. We can think that some preferential interactions are observed in cells, but in general we must not expect a fixed 3D state. Our view of the cell is probabilistic not only because of the heterogeneity along the cell life: thanks to other approaches that make it possible to skip the FISH step and use new tools for 3D reconstruction, we can now also define how cells differ on a cell-by-cell basis, using single-cell analysis. Thanks to this, we are reaching the idea that each single cell can have different chromosome territories.

Chromosome Conformation Capture Technique (3C)
We can now define how cells change and perform single-cell analysis. The first problem is that we must improve the resolution: for this we need 3C techniques, that is, all the techniques based on conformation capture approaches.
The idea behind 3C techniques starts from the concept that the genome has a 3D conformation with a biological meaning: every time a yellow domain is close to a green one, as in the image, interactions are probably more common between the two chromosomes and the domains associated with that location in the interphase nucleus. So, if a common regulation of genes is possible, it should be attributable to this kind of proximity or, vice versa, to the distance from the light blue region. This is true not only for regulation and transcription, but also for the possibility that these chromosomes interact preferentially during chromosome rearrangements and when mutations occur.
So, if we want to understand the proximity relationships of the genome in the cell, we can of course apply FISH technology, but many question marks remain.
Another possibility is based on the isolation of tracts of the genome located at the boundary of specific domains. The final result is the pairing of short fragments that come from different chromosomes and that can be amplified and read by sequencing, demonstrating that two parts of different chromosomes lie in the same part of the nucleus. Again, if we want to demonstrate a preferential proximity of some sequences, we can confirm it by sequencing, but first we must isolate these small parts of the genome. The approach is the following: to isolate specific tracts of the genome, first of all we have to introduce chemical bonds that hold the pieces of the genome together. This is done with chemical agents known as crosslinkers (formaldehyde in this protocol), which generate bridges connecting two chromosomes, or pieces of them. After digestion of the entire nucleus, if the cell had been treated with crosslinkers, some fragments will remain that link together different pieces (i.e., blue and green pieces). The rationale is that if two fragments are linked together, they were initially close enough to become the target of the crosslinker. Then the cohesive ends of these fragments are filled in with biotin-labeled nucleotides and ligated. Now the two pieces form a single molecule; after that, we can isolate these fragments through the biotin and perform PCR amplification coupled with deep sequencing, demonstrating the possible proximity of some genomic regions.
Starting from the biological meaning of the 3D conformation of the nucleus, the steps can be summarized as follows:
- Interacting chromosomal domains are crosslinked with formaldehyde (making bridges that connect chromosomes)
- Crosslinked DNA is digested with a restriction enzyme
- Cohesive ends are filled in with biotin
- Biotin-labeled ends are ligated
- DNA is purified after reverse crosslinking and fragmented by shearing
- Biotin-containing DNA is pulled down with streptavidin-coated magnetic beads
- PCR amplification and paired-end deep sequencing demonstrate the possible proximity of chromosomes
This is the so-called “3C” approach; after its introduction it was further improved, and we now speak about 4C and 5C techniques, which represent additional improvements in the ability to isolate and obtain this kind of information. The high-throughput, genome-wide version of these approaches is known as Hi-C (high-throughput chromosome conformation capture). This is a panel of techniques that can lead to several indications and results.

Hi-C (high-throughput chromosome conformation capture)
Hi-C experiments are of great value for investigating 3D interactions in the genome. At the end we get a scheme in which the distribution of contacts is described along a single chromosome. Here too, we start by crosslinking two different chromosomes, represented by two different colors; they are expected to give a specific signal on the basis of their position. After that, the ligation step is performed, followed by DNA purification and sequencing. Because the assay is performed on the entire genome of these cells, we obtain a scheme like the following, in which every proximity can be described and the distribution can be analyzed along a single chromosome. In the image we can see some territories, in green or red, with their relationships shown per chromosome and also per genome.
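In computational terms, the scheme obtained from a Hi-C experiment is a contact matrix. The following is a minimal sketch, not the pipeline used in the lecture or in any specific tool: it assumes the paired-end reads have already been mapped to the genome, represents each ligation product as two (chromosome, position) ends, and counts how often two genomic bins are found joined. The bin size, the toy read pairs and the function name `build_contact_matrix` are all hypothetical.

```python
from collections import Counter


def build_contact_matrix(read_pairs, bin_size):
    """Count contacts between genomic bins.

    read_pairs: iterable of ((chrom_a, pos_a), (chrom_b, pos_b)) tuples,
                one per sequenced ligation product (toy input, assumed
                to be already mapped to the genome).
    bin_size:   bin width in base pairs (an arbitrary choice here).
    Returns a Counter keyed by a sorted pair of (chrom, bin_index).
    """
    contacts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        # Store each pair in a fixed order so A-B and B-A are counted together.
        key = tuple(sorted((bin_a, bin_b)))
        contacts[key] += 1
    return contacts


# Toy data, invented for illustration only.
toy_pairs = [
    (("chr1", 120_000), ("chr1", 180_000)),    # both ends in the same bin
    (("chr1", 150_000), ("chr2", 2_300_000)),  # contact between two chromosomes
    (("chr2", 2_350_000), ("chr1", 110_000)),  # same two bins, reversed order
    (("chr1", 1_050_000), ("chr1", 130_000)),  # two bins of the same chromosome
]

matrix = build_contact_matrix(toy_pairs, bin_size=1_000_000)
for (bin_a, bin_b), count in sorted(matrix.items()):
    print(bin_a, bin_b, count)
```

In this toy input the chr1/chr2 bin pair is counted twice, which is the kind of signal that would be read as preferential proximity; real data would need many more read pairs and normalization before any such conclusion.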
Simply looking at the data at per-chromosome resolution, we can appreciate that proximity does not follow the linear order along the chromosome; our common concept of “proximity” is disrupted. The region in red is proximal to other red regions, but these areas are interspersed with green regions. This alternation resembles chromosome banding, which represents sharp differences in gene distribution. Indeed, what we see in red or green, also known as A and B domains, represents open or closed chromatin. When we speak of open or closed chromatin here, we are of course describing subtle differences, not as marked as the contrast between heterochromatin and euchromatin. In fact, they are more similar to the alternation of bands. This picture was obtained by a completely different technique, in which proximity is demonstrated by sequencing DNA fragments that were isolated together. This approach can be repeated on different cell subtypes and cell by cell with single-cell approaches, and can therefore be really powerful. One of the notions derived from chromosome conformation capture assays is the existence of the so-called TADs, which stands for Topologically Associating Domains. They represent a functional alternation of domains: in general, a TAD is a region of the chromosome that shares a panel of features, such as common replication or transcriptional features. Domains considered to be under the same regulation rule are defined as TADs. We will meet TADs in several examples and discover how informative they can be in research and in the study of human diseases; we will see how they can vary between normal and pathological conditions. For instance, TADs can coincide with the replication domains that we saw in the previous lecture: remember that replication timing does not follow the distribution of whole chromosomes.
We cannot find early- or late-replicating chromosomes; instead, every chromosome can be in part early and in part late replicating, exactly as we observe in the alternation of chromosome bands. Furthermore, replication timing can be measured in different ways, not only by cytogenetics and 3C techniques: there are also molecular approaches to verify it and to compare data coming from different assays. What is clear is that if we think of the TAD compartments, the A and B domains, as open or closed chromatin and compare their replication timing, we find that what is early replicating corresponds to open chromatin, while what is late replicating generally corresponds to a more closed compartment (as we generally expect from an image of the nucleus). For example, replication domains (RDs) are genome segments with coordinated replication timing (RT), and they seem to coincide with TADs. In this kind of representation, we can probably recognize something we remember from the observation of proliferating cells labeled with bromodeoxyuridine: it was discovered that there is a replication pattern that can be early, late or related to the nucleolus. Now we can also mark more clearly the early compartment as corresponding to A, with its TADs, and the late compartment as corresponding to B. This is more or less what is shown in the complex picture (right panel), which puts together genetic information and cell biology data: starting from DNA, the double helix and its conformation as chromatin, we can see that the story is not simply related to the linear organization of the chromosome, because each chromosome can be differentially associated with the periphery or with other compartments of the nucleus. With this scheme, we can think that these domains are better regulated if they maintain their position in close proximity, because regulators, whether positive or negative, must control everything within that domain.
Every time the situation changes from the normal one, through mutation or chromosome injury, we may lose not only the information itself but, in particular, the normal regulation of that genomic information. This concept is rather new, because we started from the view of the human genome as a simple sequence, and only after reading it did we have to discover all these kinds of regulation (some of them not trivial to find). In general, 3D studies give us a very important point of view and, in the last few years, improved techniques have led to robust data (the beginning of every new technique is always tough because laboratories can obtain contrasting data, but now there is a large consensus). We can now assume that some mutations can be very important even at a distance: two points that are distant in the genome can actually be close in the 3D conformation of the nucleus. By “distant” we mean that they lie on different chromosomes or in distant territories of the same chromosome, yet they are in close proximity within the nucleus. We can observe the influence of unpredicted silencers, enhancers or insulators that at first glance were so far away that they were not immediately recognized as important for a given phenotype. These long-range chromatin interactions can be considered tissue specific, because differences have been found between differentiated and undifferentiated cells and between cell types of different origin. This kind of tissue specificity is also a new concept that tends to escape our attention, especially in genetics classes: our body is differentially organized, and some of its compartments are preferentially involved in clinical phenotypes.
On the right we can see a graph showing the number of tissues that express a clinical phenotype in a single mendelian disease (the majority of the mendelian diseases we know affect a single tissue). Indeed, when we speak about a disease we usually add an adjective, as in neurological disease, muscular disease and so on, because in general a single gene defect leaves the phenotype almost normal in the majority of tissues. Discovering why some tissues are affected and others are not is important, of course, and it is also related to regulation based on the 3D conformation. So, 3D studies demonstrate that:
1. non-coding mutations may influence the activity of several DISTANT regulators such as:
- Silencers
- Enhancers
- Insulators
2. long-range chromatin interactions must be considered among the causes of tissue specificity of heritable human disease and cancer
A message that follows from the concept of tissue specificity is that a good cell model for studying a clinical phenotype should preferably be the cell type targeted by the disease; this is very difficult to obtain, because when we study the biochemistry underlying genetic defects we work on cell cultures that are not very similar to the real target tissue. We have understood that the 3D structure can vary between cell types, but it is also important to remember that this variation can follow the normal history of the cell, so it can vary along normal differentiation and the cell cycle; single-cell analysis provides data suggesting that TADs are not fixed representations of each cell type, and that there is also cell-to-cell variation (probabilistic state). Because of this huge amount of noise, even if analyses with 3C approaches are rather informative, they must always be validated. When we apply 3C or Hi-C, we must then in general use microscopy, even though it is less sensitive, to validate the results.
Another important message: when reading a paper that reports such results, check that their validation is present. Take home message:
- The 3D chromatin structure changes dynamically in the course of cell differentiation and during the cell cycle.
- Single-cell analyses indicate that TADs may be variable, suggesting that they represent contact preferences of a cell population.
- 3C and Hi-C results must be validated by microscopy (FISH).
Why is this information important for us? There are many examples of cytogenetic and cytogenomic applications in pediatrics, in cancer and in neuroscience. We will see how the genetic features of normal cells can be maintained or not; this is very important when we speak about stem cells, because we will analyze how they can be used in regenerative medicine.

Chromosomes participate in structural rearrangements
Now that we have a clear idea that the genome is ordered in 3D, we are ready to discuss in more detail the meaning of chromosome rearrangements. Chromosome rearrangements can be associated with human diseases in different ways: in the image we can observe the cri-du-chat syndrome, caused by the deletion of a piece of chromosome; the resulting heterozygous condition (one chromosome is deleted, while the other is normal) is enough to give the phenotype of this syndrome. The resulting karyotype is 5p–, so a piece of the short arm of chromosome 5 is missing. Some chromosome rearrangements can be detected by comparing species, as the professor previously showed in the comparison between different primates, including the human karyotype. Why are chromosome rearrangements associated with clinical diseases? Because they cause phenotypic defects that are in general detrimental; only in some cases can they be tolerated and become part of the evolutionary history and divergence of species.
In the majority of cases, however, the phenotypic consequences are very serious; the major ones follow deletions, as in the cri-du-chat syndrome, and are due to what is known as haploinsufficiency: a single copy of that part of the genome is not enough, in terms of gene dosage, to give the normal phenotype. A related term, used in cancer genetics, is LOH (Loss Of Heterozygosity). We talk about LOH when a deletion removes the normal allele, which is the dominant one, so that the recessive mutant allele on the other chromosome is unmasked and expressed in the phenotype. When a recessive allele involved in cancer is concerned, LOH can start a malignant proliferation phenotype. Inversions are very common chromosome rearrangements; because they involve the reversed orientation of a tract that is not lost, they are considered balanced. In general, these rearrangements are not associated with a clinical phenotype, but sometimes a positional effect (a change in the regulation of genes) is possible. We can expect to find inversions as evolutionary rearrangements that separate species without any consequence for the phenotype, but by serendipity we can also find some of these inversions in normal subjects. Reciprocal translocations are also balanced, and therefore we can again imagine that they are present in different human subjects. However, translocations can lead to positional effects too, and they are also associated with human infertility: during meiosis, each translocated chromosome has a normal partner. In the representation (“chromosome changes and phenotypic effects”) we can see a rearranged chromosome that has exchanged a segment with another one; at meiosis, each of the translocated chromosomes has a normal partner, so chromosome pairing implies that a normal blue chromosome, as represented on the left, must now pair with the blue-red chromosome and vice versa.
So, chromosome pairing does not lead to the formation of a simple chromosome pair, but to a pairing configuration of four chromosomes (a quadrivalent) characterized by a special shape (image on the right). The formation of this tetravalent structure then leads to different possibilities for segregation at meiosis: one gamete can receive the two translocated chromosomes, while the other receives the two normal chromosomes (a balanced condition, because each gamete has the entire chromosome content). In general, with a reciprocal translocation, ½ of the gametes will instead inherit one translocated and one normal chromosome; this is of course an unbalanced condition, because some information is missing, and the gamete will give a non-viable fertilization. Reciprocal translocations are usually balanced in information content, but functionally they lead to partial infertility, which in humans can also be a very serious phenotype; positional effects are also possible, because the genome is organized differently. At the bottom of the scheme (“chromosome changes and phenotypic effects”) we can better observe the biological meaning of positional effects: if a translocation between the blue and red chromosomes occurs, a new promoter can now act on the coding region, with a resulting de-regulated expression (it can be up-regulated or down-regulated). This different regulation can act on what normally is instead typical of random new chromosomes. We can also have chimeric proteins if two coding sequences are fused together; these chimeric proteins will be differentially expressed and regulated. We can also have loss of function if a piece of coding region is lost at the breakpoint. In conclusion, reciprocal translocations are balanced, unless something makes the difference.
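The segregation argument for a reciprocal translocation can be checked with a small enumeration. This is a simplified sketch under stated assumptions: each chromosome is reduced to two labeled segments, every gamete receives exactly one chromosome carrying centromere 1 and one carrying centromere 2, and all combinations are taken as equally likely. The segment labels and the names `is_balanced` and `gametes` are hypothetical; other segregation modes and recombination are ignored.

```python
from fractions import Fraction
from itertools import product

# Each chromosome is a tuple of segments: (centromeric part, distal part).
# Labels are illustrative only.
NORMAL_1 = ("1-cen", "1-distal")
NORMAL_2 = ("2-cen", "2-distal")
# Reciprocal translocation: the distal segments have been exchanged.
DER_1 = ("1-cen", "2-distal")
DER_2 = ("2-cen", "1-distal")

# A complete haploid content has every segment exactly once.
FULL_SET = sorted(NORMAL_1 + NORMAL_2)


def is_balanced(gamete):
    """Return True if every segment is present exactly once in the gamete."""
    segments = [seg for chrom in gamete for seg in chrom]
    return sorted(segments) == FULL_SET


# Assumption: one chromosome with centromere 1 and one with centromere 2
# per gamete, all four combinations equally likely.
gametes = list(product([NORMAL_1, DER_1], [NORMAL_2, DER_2]))
balanced = [g for g in gametes if is_balanced(g)]
unbalanced_fraction = Fraction(len(gametes) - len(balanced), len(gametes))

for g in gametes:
    print(g, "balanced" if is_balanced(g) else "unbalanced")
print("Unbalanced fraction:", unbalanced_fraction)
```

Under these assumptions the two gametes carrying either both normal or both translocated chromosomes are balanced, and the two mixed combinations are unbalanced, which reproduces the ½ mentioned in the text.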
An important question is also “why do we observe chromosome breakages associated with clinical phenotypes in general?” We can observe this condition especially when comparing different organisms; we discussed that synteny conservation is very strong for the X chromosome in mammals, but even in this case, if we compare the distribution of genes over long distances, we see that the human and mouse X chromosomes differ by several inversions. X chromosome rearrangements are a regular event in chromosome history; indeed, karyotypes are very plastic and able to participate in these kinds of events. But why do they occur so often? Because there are several opportunities for chromosomes to rearrange and, in general, every time a double strand break is present in DNA, the chromosomes are prone to rearrangement. This was found out decades ago, when Hermann Muller, a scientist working with Drosophila, discovered the protective activity of telomeres after treatment with X-rays, which led to the loss of telomere protection and to the resulting possibility of rearrangements. Now we know that these rearrangements are also associated with the normal loss of telomere length due to aging (so this happens not only because of exogenous agents or treatment with ionizing radiation). Every time a double strand break is introduced, it can make the chromosome really reactive; moreover, if there are two interruptions within the genome, the resulting pieces can rejoin, forming a structural rearrangement. Whether this structural rearrangement is viable or not then depends on the presence of the centromere: the rearranged chromosome can have a single centromere or two centromeres, and this will determine whether the rearrangement is heritable or not. The complete lack of a centromere also makes a difference, and we will better understand this topic in the next lectures.
Clearly, not all rearrangements are heritable by cells or gametes, but what is important is that these breaks in the DNA can occur either directly or indirectly, because they can also be introduced during DNA repair (indirect effect). Breaks can also come from exogenous sources such as ionizing radiation; furthermore, they can be modulated by many factors, such as age (telomeres), lifestyle, occupational exposure and some gene variants that predispose to this kind of chromosomal instability. Summary of the content from slides:
WHY DO CHROMOSOMES UNDERGO BREAKAGES?
Causes are associated with breakage of the double helix: DSBs (Double Strand Breaks)
DSBs can be introduced DIRECTLY (e.g. telomere erosion, ionizing radiation) or INDIRECTLY (e.g. through repair of chemical modifications/lesions) and represent endogenous or exogenous events
Possible modulating factors:
- age
- lifestyle
- occupational exposure
- gene variants in key genes
We also have to consider susceptibility features related to the DNA sequence and to epigenetic features of chromosomes: everything related to chromatin organization in a particular tract of the genome can influence breakage probability, because open chromatin is more accessible to different enzymes and therefore also to possible damage; endogenous factors, like histone modifications and other factors that can de-protect some tracts of the genome, can induce different susceptibility. In the image we can see an interesting example linked to the 3D story, because the DNA forms loops; it is now understood that in the segments that represent loop anchors (bottom image) breaks are often introduced because of this conformational state (torsional stress at the loop anchors is often associated with DSBs).
The second important concept is proximity, anticipated just before: if we think of the genome as organized in 3D, exchanges between two chromosomes that both carry a double strand break must be seen in this light: the possibility of forming a rearrangement will differ depending on whether the chromosomes are very far apart or close to each other. Of course, we will not observe a translocation if one chromosome is intact and only the other is broken, while if both chromosomes are broken the probability of a rearrangement decreases with the distance between them. Indeed, in the bottom cartoon we can see the nucleus as a basket containing different chromosomes: if a chromosome is broken and must search for a partner for DNA repair, this will clearly be more probable if the two are close together, like the orange and purple domains. This leads to the preferential occurrence of some rearrangements, because the chromosomes involved stay in proximity within the normal cell. This explains part of the reciprocal translocations that are recurrent in some clinical phenotypes. It is not only a matter of proximity: chromosome rearrangements must also have a phenotype, and therefore the explanation can also involve the selection of an outcome. When we find recurrent translocations, this can be due to proximity. We have seen how these locations can vary according to the history of the cell, so everything is limited to the aspects shown here. Summary of the content from slides:
PROXIMITY:
After spontaneous/induced formation, DSBs must be repaired, so they undergo the search for a partner
The 3D organization of the nucleus clearly influences the choice (as confirmed by recurrent translocations)

Quiz time:
1. TADs are
A. coregulated genes
B. coregulated chromosome domains
C. coregulated genome territories
Correct answer: B
2. Do TADs and LADs coincide?
A. reasonably yes
B. reasonably no
Correct answer: B
3. Do TADs and Replication domains coincide?
A. reasonably yes
B. reasonably no
Correct answer: A

5. Fragile sites
Introduction to fragile sites
We have seen that chromosome rearrangements can be due to proximity and chromatin conformation, but also to some very well known loci, the so-called fragile sites; they are an example of how sequence and chromatin conformation can influence the possibility for chromosomes to rearrange.
Human fragile sites: Human fragile sites are chromosome positions exhibiting recurrent breaks and rearrangements under specific in vitro culture conditions. Their fragility is associated with cancer and with neurological genetic diseases.
Human fragile sites are regions prone to rearrangement, and they are known by the name of the locus. For example, FRA16D is a fragile site on chromosome 16, identified by the letter D. FRA3B is one of the most unstable regions of chromosome 3. The letters represent the order of discovery, as for proteins. Common fragile sites show a kind of breakage that is not seen in 100% of the observations we can make on a cell culture or cell line. Moreover, we don’t expect 100% of cells to show this kind of visible breakage; indeed, we use the term “expression of fragile sites”, which is not intended as transcription but as the expression of instability or breakage. In general, fragile sites can be very often found as broken when