NGS Data Analysis - MBI-31 PDF
Jamia Millia Islamia (Central University), New Delhi
Dr. Khalid Raza
Summary
This document provides an overview of a course on NGS data analysis, specifically Unit 1: Course Overview & Introduction. The learning objectives focus on understanding and analyzing large-scale NGS experiments and using biological networks.
Full Transcript
MBI-31
NGS DATA ANALYSIS
UNIT: #1
COURSE OVERVIEW & INTRODUCTION
Dr. Khalid Raza
DEPARTMENT OF COMPUTER SCIENCE
Jamia Millia Islamia (Central University), New Delhi
[email protected] | Web site: www.kraza.in

COURSE DESCRIPTION
Living systems are very complex to understand. Advances in computing and information technology make these systems easier to understand. This course aims to provide an advanced understanding of living systems through computation. The complexity of these systems, however, poses challenges for software and algorithms, and often requires entirely novel approaches in computer science. RNA sequencing has revolutionized the ways we can study biological processes. RNA-Seq is much more accurate than microarrays and has overtaken them. This course covers the fundamentals of Next Generation Sequencing (NGS), and the principles and use of practical tools to understand complex biological systems.

LEARNING OBJECTIVES
Understand and analyze large-scale NGS experiments.
Perform quality control, read mapping, and analysis of an RNA-sequencing study.
Learn to use the Linux operating system / Online Platform.
Understand and quantify TPs, FPs, and other metrics.
Understand the new paradigms of biological networks (gene regulatory networks).
Various applications of NGS.

SYLLABUS
1. Introduction to NGS: Generations of DNA Sequencing Technologies, A Typical NGS Experimental Workflow, Different NGS Platforms – Illumina, Ion Torrent Semiconductor Sequencing, Pacific Biosciences SMRT, ONT Nanopore; Major Applications of NGS.
2. Base Calling, Quality Control & Read Mapping: Base Calling, FASTQ File Format, Base Quality Score, NGS Data Quality Control and Preprocessing; Reads Mapping – Mapping Approaches and Algorithms, Selection of Mapping Algorithms and Reference Genome Sequences, SAM/BAM as the Standard Mapping File Format, Mapping File Examination and Operation, Tertiary Analysis, NGS Data Storage, Transfer, and Sharing, Computing Power Required for NGS Data Analysis, Bioinformatics Skills & Software Required for NGS Data Analysis.
3. Transcriptomics by RNA-Seq: Principle of RNA-Seq; Experimental Design: Factorial Design, Replication and Randomization, Sample Preparation, Sequencing Strategy; RNA-Seq Data Analysis: Data Quality Control and Reads Mapping, RNA-Seq Data Normalization, Identification of Differentially Expressed Genes, Differential Splicing Analysis, Visualization of RNA-Seq Data, Functional Analysis of Identified Genes; RNA-Seq as a Discovery Tool. Small RNA Sequencing: Data Generation, Preprocessing, Mapping, Identification of Known and Putative Small RNA Species, Normalization, Identification of Differentially Expressed Small RNAs, Functional Analysis of Identified Small RNAs.
4. Genotyping and Genomic Variation Discovery: Data Preprocessing, Mapping, Realignment, and Recalibration; Single Nucleotide Variant (SNV) and Indel Calling: SNV Calling, Identification of de novo Mutations, Indel Calling, Variant Calling from RNA-Seq Data, Variant Call Format (VCF) File, Evaluating VCF Results. Structural Variant (SV) Calling: Read-Pair-Based SV Calling, Breakpoint Determination, De novo Assembly-Based SV Detection, CNV Detection, Integrated SV Analysis; Annotation of Called Variants, Testing of Variant Association with Diseases or Traits.
5. De novo Genome Assembly & ChIP-Seq Analysis: Genomic Factors and Sequencing Strategies for de novo Assembly, Genomic Factors That Affect de novo Assembly, Sequencing Strategies for de novo Assembly; Assembly of Contigs, Sequence Data Preprocessing, Error Correction, and Assessment of Genome Characteristics, Contig Assembly Algorithms; Scaffolding, Assembly Quality Evaluation, Gap Closure, Limitations and Future Development. Principle of ChIP-Seq, Experimental Design: Experimental Control, Sequencing Depth, Replication; Read Mapping, Peak Calling, and Peak Visualization, Differential Binding Analysis, Functional Analysis, Motif Analysis, Integrated ChIP-Seq Data Analysis.

SOFTWARE TOOLS
S.No.  Tools                                Tentative problem type
1.     Linux operating system (BioLinux)    Basic commands & tools
2.     FastQC                               Sequence read quality check and improvement
3.     Mapping tools (Hands-on)             Mapping reads to known genomes under Linux
4.     Samtools                             Working with aligned SAM/BAM files
5.     DESeq2/Cufflinks (Hands-on)          Using statistical models available in DESeq2 and other Bioconductor packages for background correction, normalization, and analysis of DEGs
6.     Genome browsers (Hands-on)           Exploring and hands-on use of genome browsers
7.     IGV                                  Sequence read and alignment visualization

PRE-REQUISITES
Introduction to Bioinformatics
Introduction to sequence analysis
Basic concepts of statistics
Knowledge of computation

GENERAL GUIDELINES
Join the class at the scheduled time.
Attend classes regularly.
Submit your assignments on time.
Please cooperate with me during lectures.
Copying and plagiarism of any kind in the assignments is strictly prohibited.

EVALUATION (THEORY)
Quizzes
Assignments
Mid-Term Evaluation
Weblems
Semester-End Examination

INTRODUCTION

A very short history of DNA sequencing
“I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between DNAs”. Erwin Chargaff

Chargaff discovered two rules that led to the discovery of the double-helix structure of DNA (Chargaff’s rules, 1950):
i) In DNA, n(G) = n(C), and n(A) = n(T). This hinted at the base-pair makeup of DNA.
ii) The relative amounts of A, G, C, and T bases vary from species to species. This hinted that DNA rather than protein could be the genetic material.

James Watson and Francis Crick: discovery of the double-helix structure of DNA, 1953. Nobel Prize: 1962.
Robert Holley: first to sequence a nucleic acid, during 1964-65. Nobel Prize: 1968.
Frederick Sanger and Walter Gilbert: developed sequencing methods for DNA, 1977. Nobel Prize: 1980.

MILESTONES
First isolation of DNA: 1869 (Friedrich Miescher)
G=C and A=T; however, the G/C and A/T content of different organisms varies (Erwin Chargaff, 1950)
Double-helix model of DNA (Watson & Crick, 1953)
G/C content measured by annealing (Mandel & Marmur, 1968)
Robert Holley, first to sequence a nucleic acid, during 1964-65
Maxam-Gilbert Sequencing (1976-77)
Sanger Sequencing (1977)
Next-Generation Sequencing (2005)
Third Generation Sequencing (2008 onwards)

MILESTONES
Virus-3222 (Bacteriophage phi X 174), Size: 5,386 nt (1977)
Bacteria-2289 (Haemophilus influenzae), Size: 1.8 × 10^6 nt (1995)
Archaea-152 (Methanococcus jannaschii), Size: 1.7 × 10^6 nt (1996)
Eukarya-168 (S. cerevisiae), Size: 1.2 × 10^7 nt (1996); (H. sapiens), Size: 3 × 10^9 nt (2001)

DNA/GENOME SEQUENCING
Goal: Identifying the order of nucleotides across a genome.
[Figure "GENOME SEQUENCING": Genome, Short fragments of DNA, Short DNA sequences]
Problem: Current DNA sequencing methods can handle only short stretches of DNA at once (