Genomics: A Study of Genes and Genomes PDF

Summary

This document provides an overview of genomics, exploring the structure and function of genes and genomes, transcriptomes, and proteomes, as well as comparative genomics, which analyzes similarities and differences between the genomes of different species. Keywords covered are genomics, gene expression, and genetics.

Full Transcript


13/01 GENOMICS
- Genomics is the complete study of genomes, that is, all the genetic material (DNA or RNA) present in a cell or organism.
- It aims to analyze the structure, function, evolution, and interactions of genes and their products.
- It includes disciplines such as genome sequencing, gene annotation, and the study of large-scale genetic interactions.
- Genomics is fundamental to understanding biological mechanisms and genetic diseases, and to developing approaches such as personalized medicine.
The gene:
- Is a functional region of DNA.
- Is made up of transcribed regions (which may code for proteins) and regulatory regions (cis-regulatory elements).
- Can be transcribed into a functional RNA.
- Its transcription is regulated both spatially (according to cell or tissue type) and temporally (during development or in response to stimuli).
cDNA (complementary DNA):
- DNA copies of messenger RNA (mRNA), synthesized by an enzyme called reverse transcriptase.
- Unlike genomic DNA, cDNAs contain only exon sequences (no introns), because they are generated from mRNA after splicing.
- cDNAs are widely used in molecular biology, particularly to study gene expression or to produce recombinant proteins.
The evolutionary hypothesis (the "RNA world"):
- Suggests that life began with RNA molecules, because RNA can both store genetic information and catalyze biochemical reactions.
- This hypothesis is supported by the existence of ribozymes, RNA molecules capable of catalyzing specific reactions, including the formation of peptide bonds during translation.
- Today, all cells operate according to the central dogma of molecular biology, in which genetic information is stored in DNA, transcribed into RNA, and then translated into proteins.
- However, some viruses deviate from this model, either by using RNA as their genetic material or by using alternative mechanisms (such as reverse transcription) for their replication and gene expression.
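The flow of the central dogma (DNA → RNA → protein), and the way reverse transcriptase turns an mRNA into a cDNA, can be sketched on toy strings. This is a minimal illustration under stated assumptions, not a bioinformatics tool: the function names are hypothetical, and the codon table is deliberately limited to the four codons used in the example rather than the full genetic code.

```python
# Toy sketch of the central dogma and of first-strand cDNA synthesis.
# Assumption: sequences are uppercase strings; CODON_TABLE is only a small
# subset of the real genetic code (enough for the example below).

CODON_TABLE = {"AUG": "M", "GCU": "A", "UGG": "W", "UAA": "*"}  # "*" = stop
RNA_TO_CDNA = {"A": "T", "U": "A", "C": "G", "G": "C"}  # base pairing RNA -> DNA


def transcribe(coding_strand: str) -> str:
    """The mRNA has the coding-strand sequence, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: complementary to the mRNA, written 5' to 3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


mrna = transcribe("ATGGCTTGGTAA")
print(mrna, translate(mrna), reverse_transcribe(mrna))
# AUGGCUUGGUAA MAW TTACCAAGCCAT
```

Because reverse_transcribe starts from the mature mRNA, its output never contains intron sequence, which is the property of cDNA described above.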
The structure of a gene comprises several functional elements, organized differently in prokaryotes and eukaryotes.
A) In prokaryotes, the gene is simpler, comprising a regulatory region for transcription initiation, a protein-coding region, and transcription termination signals.
B) In eukaryotes, the gene is more complex and includes exons (which contain the coding sequences), introns (non-coding sequences), and 5' and 3' regulatory regions.
- These regulatory regions play a key role. The 3' region influences the stability of the messenger RNA and carries the signals for polyadenylation and the end of transcription.
- The 5' region participates in the initiation and regulation of gene expression.
- Introns, which make up most of the length of a typical human gene, are removed from the pre-mRNA by a process called splicing. Splicing of this kind is characteristic of eukaryotes such as humans.
- Exons are the regions of the gene that remain in the mature mRNA after splicing and that carry the protein-coding sequence. They are often highly conserved during evolution because of their functional importance.
- Introns, although they do not directly code for proteins, may have several roles, including regulating gene expression, controlling splicing, and facilitating evolution by creating new genetic combinations, for example through alternative splicing.
Alternative splicing, which combines exons in different ways, allows multiple proteins to be produced from a single gene, increasing protein diversity. This can provide an evolutionary advantage by enabling adaptation, but can also cause disease if the process is poorly regulated.
Genomics deals with:
- Mapping and sequencing genomes.
- The study of the structure and function of genomes.
- Comparative genomics, which analyzes the similarities and differences between the genomes of different species.
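Splicing and the simplest form of alternative splicing (exon skipping) can be modeled as selecting substrings of a pre-mRNA. This is a toy sketch: the gene, its exon coordinates and the function names are hypothetical, introns are written in lowercase only for readability, and real alternative splicing also uses alternative splice sites, retained introns and mutually exclusive exons.

```python
from itertools import combinations


def splice(pre_mrna, exons):
    """Join exons, given as (start, end) half-open coordinates, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def exon_skipping_isoforms(pre_mrna, exons):
    """Toy alternative splicing: always keep the first and last exons and any
    subset of internal exons, in their original order."""
    first, *internal, last = exons
    isoforms = set()
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):  # combinations preserve order
            isoforms.add(splice(pre_mrna, [first, *kept, last]))
    return isoforms


# Hypothetical gene: 4 exons (uppercase) separated by 3 introns (lowercase).
pre_mrna = "AAA" + "gtaaag" + "CCC" + "gtccag" + "GGG" + "gtttag" + "TTT"
exons = [(0, 3), (9, 12), (18, 21), (27, 30)]

print(splice(pre_mrna, exons))  # AAACCCGGGTTT
print(sorted(exon_skipping_isoforms(pre_mrna, exons)))
# ['AAACCCGGGTTT', 'AAACCCTTT', 'AAAGGGTTT', 'AAATTT']
```

With n internal exons this model yields 2^n isoforms (here 2^2 = 4), a simple way to see how one gene can give rise to several proteins.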
Vocabulary:
- Gene → genome
- Transcript → transcriptome
- Protein → proteome
- Interactome: the set of molecular interactions in a cell, particularly between proteins or between proteins and other molecules.
- Microbiome: the set of microorganisms (bacteria, viruses, fungi) living in a given environment, such as the human body.
- Metabolome: the set of metabolites (small chemical molecules) present in a cell, tissue or organism, reflecting its physiological state.
Genomics research relies on the analysis of large amounts of genetic data. These data, derived from high-throughput sequencing and other advanced technologies, can provide valuable information but require sophisticated tools to be interpreted. Bioinformatics plays a central role by developing programs and algorithms to process, analyze and visualize these complex data.
The main objectives of genomics research include:
- Identifying and localizing genes responsible for genetic diseases
- Unraveling the molecular mechanisms of diseases
- Developing therapeutic protocols
- Implementing personalized gene therapies
It has become clear that most diseases are caused not by errors in the sequences of the genes themselves but rather by problems in their regulation.
Personalized medicine aims to adapt care to each individual in three main steps:
1. Prevention: identifying patients at risk early in order to implement appropriate preventive measures, at both the individual and the collective level.
2. Diagnosis: using precise data (molecular profile, genomics, physical examinations, imaging, medical history, etc.) to establish an accurate diagnosis and develop a personalized therapeutic strategy.
3. Treatment: offering targeted treatments, adapted to the patient's genetic characteristics, in order to improve therapeutic results while reducing side effects.
Synthetic Biology
Synthetic biology combines biology, engineering and computer science to design and modify biological systems, such as DNA, proteins or cells, for high-value applications. These include the production of human therapeutics, biomaterials, biofuels, biosensors, innovative chemicals, food ingredients, and solutions for bioremediation (environmental decontamination). It thus offers opportunities in medicine, agriculture, industry and environmental protection.
Genome size:
- The genomes of bacteria (such as Mycoplasma) are among the smallest, while those of flowering plants and amphibians are among the largest.
- Genome size varies widely within groups, indicating that it does not necessarily reflect the complexity of the organism. A bacterium has about 4,000 to 6,000 genes, while for mammals the figure is around 35,000; humans have about 20,000 to 25,000 genes.
- In bacteria like E. coli, most of the genome is coding (up to 85%). This reflects their need for a compact and efficient genome, adapted to their rapid life cycle and their often competitive environment.
- In humans, by contrast, coding sequences are a minority (only 2-5% of the genome). The large non-coding part (introns, repeated sequences, regulatory regions) allows complex regulation of gene expression, the evolution of sophisticated molecular interactions, and functional diversity through mechanisms such as alternative splicing.
- Today, genome size is no longer thought to correlate with complexity.
Genome division:
1. Introns (25.9%): Non-coding sequences located inside genes, removed during splicing.
2. DNA transposons (2.8%): Mobile sequences that can move around the genome.
3. Protein-coding regions (1.5%): Regions that code for proteins, essential for cellular functions.
4. Miscellaneous heterochromatin (8%): Highly condensed DNA, often transcriptionally inactive.
Then, repeated sequences (about ⅔ of the genome):
- Miscellaneous heterochromatin: Highly condensed DNA, often transcriptionally inactive.
- Segmental duplications: Copies of large segments of DNA, contributing to genetic diversity.
- LINEs (long interspersed nuclear elements): Long repeated sequences derived from retrotransposons, with a potential role in evolution.
- SINEs (short interspersed nuclear elements): Short repeats, also derived from retrotransposons.
- Miscellaneous unique sequences: Unique regions of the genome with no clearly defined function.
- LTR retrotransposons: Repeats flanked by long terminal repeats, related to retroviruses.
- Simple sequence repeats: Short repeated sequences, implicated in genetic variability and disease.
(No need to remember percentages.)
The dystrophin gene is one of the largest human genes, with a transcribed region of 2.5 Mb representing about 1.5% of the X chromosome. It contains 78 introns, reflecting a very complex gene structure. Mutations in this gene are responsible for Duchenne muscular dystrophy, a serious genetic disease affecting the muscles.
Pseudogenes are nonfunctional copies of genes, often resulting from duplications or mutations that leave them unable to produce a protein. They represent about 1% of the human genome.
Repeated sequences:
The repeated sequences in the human genome are divided into:
- Duplicated genes: Gene copies, including nonfunctional pseudogenes.
- Tandem repeats: Short sequences repeated one after another (CAGCAGCAG…), such as satellite DNA, often found in centromeres and telomeres.
- Interspersed repeats: Repeated sequences scattered throughout the genome, including LINEs (long repeats) and SINEs (short repeats), derived from retrotransposons.
Autonomous: These sequences, like LINEs, carry the genes (notably genes encoding enzymes such as reverse transcriptase) needed to move or copy themselves independently in the genome.
Non-autonomous: These sequences, like SINEs, lack the genes needed for their own mobility and rely on enzymes produced by autonomous sequences to move or replicate.
There are more SINEs than LINEs, but SINEs make up a smaller part of the genome because they are shorter.
Tandem repeats affect the genome by contributing to chromosomal structure (centromeres, telomeres), genetic variability (polymorphisms), and gene regulation. However, their instability can cause mutations associated with genetic diseases such as spinal and bulbar muscular atrophy or Huntington's disease (Huntington's chorea). Repeated sequences can also be found within, or insert into, introns or exons and can thus modify genes.
Genes can also form clusters: groups of neighboring genes, often with related functions (for example, the globin genes).
Genes can be oriented differently: genes located on opposite strands of the DNA are transcribed in opposite directions along the chromosome, although each RNA is always synthesized from 5' to 3'.
Conserved DNA sequences: comparative genomics
A) Common features between distant species: elements conserved between, for example, humans and fish demonstrate their functional importance throughout evolution.
B) Differences between closely related species: comparing humans and chimpanzees identifies species-specific genomic elements that reveal unique traits, despite their genetic similarity (about 99% identical; this figure is the percentage of conservation).
C) Variations within a species: in humans, genetic differences can reveal variants associated with diseases or specific biological characteristics.
Evolution: conserved sequences are those that confer an evolutionary advantage and are therefore maintained by natural selection. There are many more genomic differences between humans and mice than between humans and monkeys.
Staining of Human Chromosomes
Human chromosomes are stained using the FISH technique (fluorescence in situ hybridization).
DNA probes labeled with fluorescent dyes bind specifically to complementary regions (specific sequences) of the chromosomes. This allows each chromosome to be identified and visualized with a distinct color. Because the probes are designed from human chromosomes, this only works on human chromosomes: if the same probes were applied to mouse chromosomes, each chromosome would show a mixture of colors, because the corresponding sequences are located in completely different places.
Synteny: conservation of the order and organization of genes on the same chromosome segment between different species. It reflects common evolutionary origins and allows us to study the relationships between genomes and the evolution of species, and to identify homologous genes. For example, although human and mouse chromosomes are very different, synteny is observed for the X chromosome.
DNA Sequencing Techniques
The Human Genome Project led to the sequencing of the complete human genome through an international collaboration of many countries, each of which contributed by sequencing a specific part of the genome.
Shotgun sequencing - Craig Venter: a rapid sequencing method in which DNA is fragmented into small random pieces that are sequenced individually and then assembled, based on their overlaps, using bioinformatics programs. This approach accelerated the sequencing of the entire genome by avoiding the laborious step of prior mapping.
Next Generation Sequencing (NGS): a more advanced and automated technology that allows massively parallel sequencing, making the process faster, less expensive and capable of generating much larger amounts of data. Bases are detected by fluorescence. Individual reads can contain errors, but each position is read many times, which reduces the margin of error. The cost of sequencing a human genome has fallen sharply over the years, to about $1,000 today.
Nanopore sequencing: an innovative method that allows DNA or RNA molecules to be read directly, in real time. The DNA is passed through a nanopore (a tiny pore), and the changes in electrical current caused by the bases (A, T, C, G) passing through it are recorded to determine the sequence.
This technology is fast, portable, and capable of sequencing long molecules without prior amplification.
Why sequence DNA?
- To identify the genetic variations responsible for phenotypic differences and to understand how these variations lead to diseases or complications.
- To find sequences shared by patients in order to elucidate the underlying mechanisms.
- The approach is productive: new genes involved in diseases are discovered every year.
The OMIM (Online Mendelian Inheritance in Man) tool is a database that lists human genetic diseases and their associated genes. It classifies phenotypes (diseases and traits) according to their molecular basis. For example, of the 7,153 documented phenotypes, 4,622 are linked to specific mutations.
OMIM distinguishes between:
- Monogenic diseases, representing the majority of cases.
- Susceptibilities to complex diseases or infections.
- "Non-diseases" related to non-pathogenic genetic variations.
- Somatic genetic diseases, caused by mutations in somatic (non-germline) cells.
However, despite these advances, early detection remains low, risk assessment models are insufficient, and many disease mechanisms are still poorly understood, which limits the effectiveness of therapies.
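Two ideas from this part of the notes, that NGS reads each position many times to reduce errors and that sequencing aims to identify genetic variations, can be combined in a small sketch: take a majority vote at each position across several reads, then compare the consensus with a reference. Assumptions: the reads are already aligned to the reference coordinates ("." marks an uncovered position), the sequences are made up, and the function names are hypothetical. Real variant callers also weigh base quality and use statistical models; a simple majority vote would, for example, miss a heterozygous site where two alleles each appear in about half of the reads.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority base at each position; 'N' where no read covers the position."""
    length = max(len(read) for read in aligned_reads)
    bases = []
    for pos in range(length):
        column = [read[pos] for read in aligned_reads
                  if pos < len(read) and read[pos] != "."]
        bases.append(Counter(column).most_common(1)[0][0] if column else "N")
    return "".join(bases)


def find_variants(reference, sample):
    """(position, reference base, sample base) wherever the two differ."""
    return [(pos, ref, alt)
            for pos, (ref, alt) in enumerate(zip(reference, sample))
            if ref != alt and alt != "N"]


reference = "ACGTACGT"
reads = [
    "ACGTTCGT",
    "ACGTTCGT",
    "ACCTTCGT",  # sequencing error at position 2
    "ACGTTCGA",  # sequencing error at position 7
]
sample = consensus(reads)
print(sample)                            # ACGTTCGT
print(find_variants(reference, sample))  # [(4, 'A', 'T')]
```

The isolated errors at positions 2 and 7 are outvoted by the other reads, whereas the A→T change at position 4 appears in every read and is reported as a candidate variant.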
