Document Details

Uploaded by mxrieen

CSJMU Kanpur, India

Dr Gaurav Kumar

Tags

Transcriptomics, molecular biology, gene expression, biology

Summary

This document provides a detailed overview of transcriptomics, including various techniques such as microarrays, SAGE (Serial Analysis of Gene Expression), and real-time PCR. It explains the principles, applications, and advantages of each method in studying gene expression.

Full Transcript

Metabolic Engg/ Transcriptomics

Dr Gaurav Kumar, Assistant Professor, CSJMU Kanpur, India

System Biology

METABOLOMICS

Transcription

Evolution of Transcription Technology

Publication over the Years (RNA seq, Microarray, SAGE/EST)

SAGE (Serial Analysis of Gene Expression)

Isolation of mRNA: SAGE begins with the isolation of messenger RNA (mRNA) from a sample of interest. mRNA represents the active genes in a cell because it carries the genetic information from DNA to the ribosomes, where proteins are synthesized.

Creation of SAGE Libraries: The isolated mRNA is reverse-transcribed into cDNA (complementary DNA), from which short sequence tags are cut. These tags are typically 10-14 base pairs long and represent a snapshot of the genes that are being expressed in the cell at the time of sampling.

Concatenation and Cloning: The cDNA tags are then concatenated (linked together) to form long chains of tags. Each chain of concatenated tags is cloned into a vector for amplification and sequencing. This cloning step ensures that multiple copies of each chain are available for analysis.

Sequencing: The cloned tags are sequenced, which provides a unique digital signature for each gene. The sequences are typically determined using automated DNA sequencers.

Data Analysis: Once the tags are sequenced, the data are analyzed to determine the frequency of each tag. This frequency represents the expression level of the corresponding gene in the original sample. By comparing the tag frequencies in different samples, researchers can identify genes that are upregulated or downregulated under specific conditions.

Microarray

A microarray is a laboratory tool used to detect the expression of thousands of genes at the same time. DNA microarrays are microscope slides that are printed with thousands of tiny spots in defined positions, with each spot containing a known DNA sequence or gene.
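The SAGE data-analysis step described above (counting how often each tag occurs and comparing samples) can be sketched in a few lines of Python. This is a minimal illustration, not a real SAGE pipeline: the tag sequences are made up, the function names `tag_frequencies` and `compare_samples` are hypothetical, and the comparison is a plain greater-than/less-than check rather than a statistical test.

```python
from collections import Counter


def tag_frequencies(tags):
    """Return each tag's share of all sequenced tags in one sample."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def compare_samples(control_tags, treated_tags):
    """Label each tag as up, down, or unchanged in treated vs. control."""
    control = tag_frequencies(control_tags)
    treated = tag_frequencies(treated_tags)
    result = {}
    for tag in sorted(set(control) | set(treated)):
        c = control.get(tag, 0.0)  # a tag missing from a sample has frequency 0
        t = treated.get(tag, 0.0)
        if t > c:
            result[tag] = "up"
        elif t < c:
            result[tag] = "down"
        else:
            result[tag] = "unchanged"
    return result


# Made-up 10-base tags standing in for sequenced SAGE tags (assumption).
control_tags = ["AAAAAAAAAA"] * 2 + ["CCCCCCCCCC"] * 2
treated_tags = ["AAAAAAAAAA"] * 3 + ["CCCCCCCCCC"] * 1

print(tag_frequencies(control_tags))
print(compare_samples(control_tags, treated_tags))
```

In practice each tag would also be mapped back to the gene it came from, and differences between samples would be judged with a statistical test rather than a direct comparison of frequencies.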
Microarray Work Flow

The workflow of microarray data processing starts with raw image data acquired with laser scanners and ends with the results of data mining that have to be interpreted by biologists. The microarray data processing workflow includes issues related to (1) data management, (2) image processing (grid alignment, foreground separation, spot quality assessment, data quantification and normalization), (3) data analysis (identification of differentially expressed genes, data mining, integration with other knowledge sources, and quality and repeatability assessments of results), and (4) biological interpretation (visualization). The main objective of this project relates to image processing, namely grid alignment, foreground separation, spot quality assessment, data quantification, normalization and visualization.

Microarray for study of gene expression

• mRNA molecules are typically collected from both an experimental sample and a reference sample.
• The two mRNA samples are then converted into complementary DNA (cDNA), and each sample is labeled with a fluorescent probe of a different color.
• The two samples are then mixed together and allowed to bind to the microarray slide. The process in which the cDNA molecules bind to the DNA probes on the slide is called hybridization.
• Following hybridization, the microarray is scanned to measure the expression of each gene printed on the slide.
• The data gathered through microarrays can be used to create gene expression profiles, which show simultaneous changes in the expression of many genes in response to a particular condition or treatment.

Real-time PCR

Real-time polymerase chain reaction (real-time PCR) is commonly used to measure gene expression. It is more sensitive than microarrays in detecting small changes in expression but requires more input RNA and is less adaptable to high-throughput studies. It is best suited for studies of small subsets of genes. Its one major shortcoming is that the sequence of the target gene must be known so that PCR primers can be designed; hence real-time PCR can only be used for studying known genes.

Real-time PCR involves conversion of RNA to cDNA via reverse transcription, followed by several rounds of PCR to amplify and detect the genes of interest. The products can be detected in 'real time' by using SYBR Green or TaqMan probes.

1. Sample Preparation: A sample containing the nucleic acid of interest is first processed to extract and purify the target RNA. The purified RNA is then converted into complementary DNA (cDNA) by reverse transcription.
2. PCR Amplification: The cDNA is then subjected to PCR, which involves multiple cycles of heating and cooling. In each cycle, the DNA is denatured (separated into single strands), annealed (primers bind to the target DNA), and extended (DNA polymerase creates a copy of the target DNA).
3. Fluorescence Detection: To monitor the progress of the PCR reaction, a fluorescent dye or probe is used that binds to the newly synthesized DNA. As the DNA amplification progresses, the amount of fluorescence increases.
4. CT Value: The CT (cycle threshold) value is the cycle number at which the fluorescence signal in the PCR reaction crosses a certain threshold, indicating that a significant amount of DNA has been amplified. This threshold is set at a point above the background noise but below the maximum fluorescence signal. The lower the CT value, the more target DNA was present in the initial sample.

Limitation of Microarray, RNA seq comes to rescue

Novel transcripts, splice variants, single nucleotide variants, insertions, deletions, and gene fusions are not detected by microarrays.

RNA seq

TopHat

Input Data: TopHat takes as input RNA-Seq reads in the form of short nucleotide sequences, typically in FASTQ format. These reads are generated by high-throughput sequencing technologies such as Illumina.

Reference Genome/Transcriptome: You need a reference genome or transcriptome to which the reads will be aligned. In many cases, researchers use the reference genome of the species they are studying. Alternatively, if you have a reference transcriptome or a combination of both, TopHat can align the reads to the transcriptome and then map those alignments to the genome.

Spliced Alignment: One of the key features of TopHat is its ability to perform spliced alignment. This means that it can align reads that span exon-exon junctions. This is crucial in RNA-Seq because many eukaryotic genes are split into exons and introns, so a single read from a mature mRNA can cover two exons that are separated by an intron in the genome.

Data Preprocessing:

Quality Control: Assess the quality of the raw sequencing data. Trim or filter low-quality reads if necessary.

Read Alignment: Map the clean reads to a reference genome or transcriptome. Alternatively, you can perform de novo assembly if there is no reference available.

Read Count Quantification: Count the number of reads that align to each gene or transcript. This step creates a matrix of read counts, where each row corresponds to a gene and each column corresponds to a sample.

Normalization: Normalize the read counts to account for differences in sequencing depth and library size between samples. Common measures include TPM (Transcripts Per Million) and FPKM (Fragments Per Kilobase of transcript per Million mapped reads), which also adjust for transcript length.

Statistical Analysis: Apply statistical tests to identify differentially expressed genes. Common tools include edgeR, DESeq2, and limma-voom. These tests compare the expression levels between groups and calculate statistical significance (adjusted p-values or q-values).
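The adjusted p-values mentioned in the statistical analysis step are often obtained with the Benjamini-Hochberg procedure, which controls the false discovery rate. The sketch below is one minimal way to compute them in plain Python; the function name `benjamini_hochberg` and the example p-values are assumptions for illustration and are not taken from the slides.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four genes (assumption).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.4f}")
```

A gene is then usually called differentially expressed when its adjusted p-value falls below a chosen cutoff; the cutoff itself is a study design choice.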
Adjust for Multiple Testing: Because thousands of genes are tested at once, the p-values are corrected to limit the number of false positives.

Single cell Seq

Mining Functional Genes of Plants

Throughout their evolution, medicinal plants have developed various regulatory mechanisms to counter external stresses and adapt to their environments. Functional gene mining involves the identification of associated biosynthetic pathways, genes encoding key enzymes, and plant regulatory mechanisms, which helps improve our understanding of plant molecular biology.

Development of Molecular Markers Based on Transcriptome Sequencing

Simple sequence repeat (SSR) markers, also known as microsatellite DNA markers, are among the most commonly used molecular markers. The core sequence of the tandem repeat is one to six base pairs long, and dinucleotide repeats are the most common.

Exploring the Biosynthetic Pathways of Secondary Metabolites

Transcriptome sequencing is used to study the biosynthetic pathways of secondary metabolites and mine biosynthesis-related genes in different environments, growth stages, and organs.

Transcriptomics and Developmental Mechanisms in Medicinal Plants

Transcriptomics has been used to study the differences in gene expression in medicinal plants under abiotic stress and to identify genes that affect the growth and development of medicinal plants and their resistance to external stress. This information can help identify the key influencing factors in the growth and development process, provide a basis for the cultivation and breeding of medicinal plants, and facilitate the targeted selection of better varieties.

Biological interpretation of gene expression data

Heatmaps and clustering

In heat maps the data are displayed in a grid where each row represents a gene and each column represents a sample. The colour and intensity of the boxes represent gene expression. Cluster analysis aims to group the large number of genes in a gene expression profile dataset so that similar or related genes fall in the same cluster and different or unrelated genes fall in distinct ones.

Bioinformatics Analysis

Signaling Pathways

• Transmit signals from the membrane to gene regulation.
• Their function is enigmatic: some of the molecules involved are common to different functions, and how cross-interaction is avoided is unknown.

(www.hprd.org, from Pierre deMeyts)

KEGG Pathways

GSEA and DAVID Bioinformatics (Databases of Pathway Enrichment)

Gene Regulatory Network

The identification of hub genes

Network Construction: First, you need a network representation of your biological data. These networks are typically represented as graphs, where nodes represent genes or proteins, and edges represent interactions or connections between them.

Network Analysis: Perform network analysis to quantify the importance of each node (gene) within the network. Several network centrality measures can be used, but two of the most common ones are:
a. Degree Centrality: It measures the number of connections (edges) a node (gene) has. Nodes with a high degree are considered more central in the network.
b. Betweenness Centrality: It quantifies the number of times a node acts as a bridge along the shortest path between other nodes. Nodes with high betweenness centrality often connect different parts of the network and are considered important.

Biological Interpretation: Once you have identified hub genes, you should perform functional enrichment analysis and pathway analysis to understand the biological significance of these genes.

Summary of Analysis
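The hub-gene steps described above (build a network, then score each node by degree and betweenness centrality) can be illustrated with a small pure-Python sketch. The gene names `G1` to `G5` and the edge list are invented for the example, and the helper names are hypothetical. Betweenness is computed with Brandes' algorithm for an unweighted, undirected graph and is left unnormalized, so it counts node pairs.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency list from (gene, gene) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Invented interaction network (assumption): G1 links three genes, G4 bridges to G5.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]
adj = build_graph(edges)
degree = {gene: len(neighbours) for gene, neighbours in adj.items()}
betweenness = betweenness_centrality(adj)
hub = max(degree, key=degree.get)

print("degree:", degree)
print("betweenness:", betweenness)
print("highest-degree gene:", hub)
```

For real networks a graph library would normally be used instead of hand-written code, and the genes ranked highest would then go on to enrichment and pathway analysis.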
