Genomic and Transcriptomic Analysis Techniques PDF
Document Details
Uploaded by WholesomePond
JUST (Jordan University of Science and Technology)
Summary
This document outlines genomic and transcriptomic analysis techniques. It details DNA microarrays, explaining their principle, technique, interpretation, and limitations. Next-generation sequencing is also discussed, covering library preparation, amplification, sequencing processes, and data analysis.
Full Transcript
Genomic and Transcriptomic Analysis Techniques
DNA MICROARRAY
LECTURE 11

Recap…
• mRNA is an intermediary molecule that carries genetic information from the cell nucleus to the cytoplasm for protein synthesis.
• Whenever genes are expressed (are in their active state), many copies of the mRNA corresponding to those genes are produced.
• These mRNAs are translated into the corresponding proteins. So, by assessing the various mRNAs, we can indirectly assess gene expression. This helps in understanding the processes behind altered gene expression.
• Thus, mRNA acts as a surrogate marker. Since mRNA is degraded easily, it is necessary to convert it into the more stable cDNA form.

Definition
• A DNA microarray is a technique used to analyze the expression of thousands of genes simultaneously.
• E.g., more than 800 mutations in BRCA1 and BRCA2 cause hereditary ovarian and breast cancer.
• Purpose: It allows researchers to investigate the activity of genes within a cell or tissue sample, which can provide insights into gene regulation, gene function, and how genes are involved in biological processes such as disease development.
• Global expression profiling has the advantage that the results are not biased by preselection of genes.

Principle
• The technology rests on the ability to deposit many (tens of thousands of) different DNA sequences on a small surface, usually a glass slide (chip).
• The different DNA fragments are arranged in rows and columns such that the identity of each fragment is known through its location on the array.
• Binding relies on complementary base pairing by hydrogen bonding.
• The signal depends on the hybridization conditions (e.g., temperature).
• The strength of the signal depends on the amount of labeled sample bound to the spot.

Technique
• DNA microarrays are microscope slides printed with thousands of tiny spots in defined positions, with each spot containing a known DNA sequence or gene.
• Collect mRNA samples from both a reference sample (healthy individual) and an experimental sample.
• Convert each sample to cDNA and label it with a fluorescent dye of a different color: Cy3 (green) or Cy5 (red).
• Equal amounts of the two samples are then mixed and allowed to bind to the microarray slide (hybridization).
• The slide is washed multiple times.
• The microarray is scanned to measure the expression of each gene printed on the slide (a laser excites the fluorescent dyes).
• The data gathered through microarrays can be used to create gene expression profiles, which show simultaneous changes in the expression of many genes in response to a particular condition or treatment.

Interpretation: Colon Cancer Example
• A green spot means that the gene is not expressed in cancer cells but is expressed in normal cells. This gene may be involved in preventing colon cancer.
• A red spot means that the gene is expressed in cancer cells but not in normal cells. This gene may be involved in causing colon cancer.
• A yellow spot means that the gene is expressed in both normal and cancer cells. This gene is probably not involved in causing colon cancer.
• A black spot means that the gene is not expressed in either type of cell. This gene is probably not involved in causing colon cancer.

Microarray in Drug Discovery and Development
• Disease Pathway Identification: Researchers use genome-wide expression profiling to generate hypotheses for complex disease mechanisms and to identify drug targets and their pathways.
• Disease Pathway Validation: Once a disease pathway is identified, researchers need to know that disrupting the pathway will affect the disease etiology. Using whole-genome expression profiling, scientists can understand the wide range of effects, desirable and undesirable, that result from disrupting a pathway. They are then better able to evaluate potential targets for drug design.
• Compound Screening, Mechanisms of Action: Following disease pathway identification and validation, whole-genome microarray analysis can be used to characterize lead compounds for selectivity and specificity, and to identify molecules that disrupt expression of the intended disease genes. While existing technologies are well suited to measuring the anticipated action of a development compound, they do not typically identify additional or unexpected effects.
• Compound Screening, Mechanisms of Toxicity: Microarray gene expression screening not only helps to identify mechanisms of drug action, but also points to off-target effects that may suggest the compound produces too many side effects to be approved. For instance, if changes in gene expression match those of a known toxin, a compound can be eliminated from the screening process early in development, saving both time and money.

Limitations of DNA Microarray
• The results take a long time to analyze because the amount of data collected from each array is huge.
• The results may be too complex to interpret and are not always quantitative.
• The results are not always reproducible.
• The technology is expensive.
• The arrays provide an indirect measure of relative concentration.
• Especially for complex mammalian genomes, it is often difficult to design arrays in which multiple related DNA/RNA sequences do not bind to the same probe on the array.
• A DNA array can only detect sequences that the array was designed to detect.

Question…
• A patient with breast cancer has been shown to be resistant to available chemotherapies. Figure A shows the results of a gene expression array prior to starting a novel experimental biopharmaceutical. Figure B shows the results of a gene expression array 6 weeks into treatment.
1. Is this drug affecting the patient's gene expression?
2. Is this experimental treatment working for the patient?
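As a rough sketch of the colour rules from the colon cancer example, the snippet below classifies one spot from its two channel intensities. It assumes the reference (normal) sample carries Cy3 (green) and the experimental (cancer) sample carries Cy5 (red), which is consistent with the interpretation given earlier. The function name `classify_spot`, the background cut-off of 100 and the two-fold threshold are illustrative assumptions, not values from the lecture.

```python
import math


def classify_spot(cy3_normal, cy5_cancer, background=100.0, fold=2.0):
    """Return 'black', 'green', 'red' or 'yellow' for a single array spot.

    cy3_normal: green-channel intensity (reference sample, assumed Cy3)
    cy5_cancer: red-channel intensity (experimental sample, assumed Cy5)
    background and fold are hypothetical thresholds chosen for illustration.
    """
    # Neither sample gives a signal above background: gene not expressed in either.
    if cy3_normal < background and cy5_cancer < background:
        return "black"
    # log2 ratio of red to green; the +1 pseudo-count avoids division by zero.
    log_ratio = math.log2((cy5_cancer + 1.0) / (cy3_normal + 1.0))
    if log_ratio >= math.log2(fold):
        return "red"      # mainly expressed in the experimental (cancer) sample
    if log_ratio <= -math.log2(fold):
        return "green"    # mainly expressed in the reference (normal) sample
    return "yellow"       # similar expression in both samples


if __name__ == "__main__":
    for green, red in [(50, 40), (5000, 120), (150, 4000), (3000, 3300)]:
        print(green, red, classify_spot(green, red))
```

Applying the same classification to an array taken before treatment and one taken during treatment, then counting the spots whose colour changes, is one simple way to approach the first question above.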
Figure A    Figure B

DNA Sequencing

Sequencing
• The process of determining the order of the nucleotides adenine (A), thymine (T), cytosine (C), and guanine (G) along a DNA strand.

Types of Sequencing

Sanger Sequencing overview (https://www.youtube.com/watch?v=wdS3j0TgbjM)

Sanger's Chain Termination Sequencing
• It is a PCR-based method.
• It uses a modified DNA replication reaction (dNTPs and ddNTPs).
• Four separate reaction tubes are set up.
• Each tube contains identical DNA of interest (template), primers, dNTPs, DNA polymerase, and a small amount of one ddNTP.

Sequencing ladder

Sanger Sequencing
GOOD FOR:
• Sequencing single genes
• Sequencing amplicon targets up to 100 base pairs
• Sequencing 96 samples or less
• Identifying microbes
• Analyzing fragments
• Analyzing short tandem repeats (STRs)
DISADVANTAGES:
• Could only sequence 200-500 nucleotides in a single reaction.
• To sequence 1000 nucleotides, 2 reactions with overlapping sequences would be needed.
• The quality of a Sanger sequence is often not very good in the first 15 to 40 bases because that is where the primer binds.
• Sequence quality degrades after 700 to 900 bases.
• Expensive.

Next Generation Sequencing (NGS)
• The principle behind NGS is like that of Sanger sequencing, which relies on capillary electrophoresis.
• The genomic strand is fragmented, and the bases in each fragment are identified by emitted signals when the fragments are ligated against a template strand.
• The NGS method uses array-based sequencing, which combines the techniques developed in Sanger sequencing to process millions of reactions in parallel, resulting in very high speed and throughput at a reduced cost.
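The chain-termination idea behind the Sanger sequencing ladder described earlier can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: it takes the newly synthesized strand directly as input, ignores the primer, and assumes that termination happens at every possible position in at least one molecule. The function names and the example strand are hypothetical.

```python
def chain_termination_fragments(new_strand):
    """Map each ddNTP tube (A, C, G, T) to the fragment lengths it would contain.

    In the tube spiked with ddATP, synthesis can stop wherever an A is
    incorporated, so that tube holds fragments ending at every A position.
    The same holds for the other three tubes.
    """
    tubes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)
    return tubes


def read_ladder(tubes):
    """Read the sequence off the ladder, from the shortest fragment to the longest."""
    bands = [(length, base) for base, lengths in tubes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


if __name__ == "__main__":
    tubes = chain_termination_fragments("ATGCCGTA")
    for base, lengths in tubes.items():
        print(f"dd{base}TP tube: fragment lengths {lengths}")
    print("Sequence read from the ladder:", read_ladder(tubes))
```

Sorting the bands by length mirrors what gel or capillary electrophoresis does physically: the shortest fragments travel furthest, so reading the four lanes in order of fragment size gives the sequence.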
1. Library preparation: libraries are created using random fragmentation of DNA, followed by ligation with custom linkers.
2. Amplification: the library is amplified using clonal amplification methods and PCR.
3. Sequencing: DNA is sequenced using one of several different approaches.
4. Analysis: the sequencing output is analysed using bioinformatics tools.

Library Preparation
• First, DNA is fragmented either enzymatically or by sonication (excitation using ultrasound) to create smaller strands.
• Adaptors (short, double-stranded pieces of synthetic DNA, 80 bp) are then ligated to these fragments with the help of DNA ligase.

The adaptor
• The adaptor has three sections:
1. Sequencer binding site: allows fragments of DNA to bind to the surface of the chip or beads in the sequencer. It is complementary to a set of oligos already bound to the chip when it arrives.
2. Index (barcode): identifies which sample is which, since multiple samples can be sequenced at once.
3. Sequencing primer binding site: primers bind here to allow the polymerase to bind and extend during the sequencing reaction.

Clonal Amplification
• Template DNA is amplified via droplet or bridge PCR to generate millions of template molecules for the sequencing reaction.

Sequencing
• During the sequencing reaction, each nucleotide position on the template molecule generates an optical (e.g., fluorescence) or chemical (e.g., pH) signal in response to a process such as nucleotide addition on a growing complementary strand.
• This signal is recorded.

Data Analysis
• The optical or chemical signal is processed to determine the sequence.

Assembly
Consensus: TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
Reads:
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG
TTGTTCCCACAGACCGTG
TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
Coverage: # of reads underlying the consensus
Figure: the same assembly repeated with the annotations "6x coverage, 100% identity", "5x coverage, 80% identity", and "2x coverage, 50% identity".
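The coverage and identity annotations on the assembly can be read as per-column counts. The sketch below computes them for a toy example. It assumes each read has already been placed at a known offset on the consensus, and it interprets "identity" as the fraction of reads that agree with the consensus at a column. The consensus, reads and offsets are made up for illustration and are not the ones from the slides.

```python
def column_stats(consensus, placed_reads):
    """Return (depth, agree) lists with one entry per consensus position.

    placed_reads is a list of (offset, read) pairs, where offset is the
    0-based consensus position of the first base of the read.
    depth[i] is the coverage at column i (number of reads spanning it);
    agree[i] is how many of those reads match the consensus base.
    """
    depth = [0] * len(consensus)
    agree = [0] * len(consensus)
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(consensus):
                depth[pos] += 1
                if base == consensus[pos]:
                    agree[pos] += 1
    return depth, agree


if __name__ == "__main__":
    # Hypothetical consensus and reads; the third read carries one mismatch.
    consensus = "ACGTACGT"
    reads = [(0, "ACGT"), (2, "GTAC"), (2, "GAAC"), (4, "ACGT")]
    depth, agree = column_stats(consensus, reads)
    for pos, base in enumerate(consensus):
        identity = 100.0 * agree[pos] / depth[pos] if depth[pos] else 0.0
        print(f"{pos} {base} coverage={depth[pos]}x identity={identity:.0f}%")
```

Columns with low coverage or low identity are the ones where the consensus base is least well supported, which is why both numbers are shown on the assembly.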
Figure: the same assembly, annotated "1x coverage". Coverage: # of reads underlying the consensus

Third Generation Sequencing
• Unlike previous generations of sequencing technologies, which relied on short reads, third generation sequencing (3rd GS) generates long reads that can span entire genomic regions, providing researchers with a more complete picture of the genome.
• 3rd GS is based on several different technologies, including single-molecule real-time (SMRT) sequencing and nanopore sequencing.

SMRT
• In SMRT sequencing, a single DNA molecule is bound to an immobilized polymerase enzyme, which incorporates fluorescently labeled nucleotides as it reads the DNA sequence in real time. This generates a long, continuous read that can be several kilobases in length.

Nanopore
• Works by threading DNA through a tiny pore and measuring changes in electrical conductivity as the DNA bases pass through.
• This generates a continuous read that can also be several kilobases in length.

Advantages of Third Generation Sequencing
• The ability to generate long reads that can span entire genomic regions.
• Useful for studying complex genomic regions, such as repetitive regions, structural variations, and regions with high GC content, which are difficult or impossible to sequence using short-read technologies.
• Long reads can help resolve ambiguities in the genome, such as determining the exact location of gene boundaries, identifying alternative splicing events, and detecting mutations.

Disadvantages

Uses of DNA sequencing in Biopharma
• Target Identification and Validation
• Biomarker Discovery
• Clinical Trial Design
• Precision Medicine
• Bioproduction