Bioinformatics Introduction PDF
Dr. Ibrahim Zaghloul
Summary
This document provides an introduction to bioinformatics, covering topics such as the importance of bioinformatics and computational biology, different types of bio-molecules like DNA and RNA, and how to obtain sequences. It also looks at protein structures and genetic variations.
Full Transcript
BIOINFORMATICS (BIOCOMPUTING)
(1) INTRODUCTION
DR. IBRAHIM ZAGHLOUL

IMPORTANCE OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
Computational methods are used to study biological data in order to:
- Cope with the massive amount of biological data.
- Model the rules for biological processes.
- Identify important patterns.
- Visualize operations and data.
- Create simulations for biological processes.
- Predict effects and behaviors.
- Suggest mechanisms and actions.
- Work efficiently, reducing experimental time and resources.
- Identify higher-order relationships.

A general introduction:
- What problems are people working on?
- How do people solve these problems?
- What key computational techniques are needed?
- How much help has computing provided to biological research?

A way of thinking: tackling “biological problems” computationally
- How to look at a “biological problem” from a computational point of view?
- How to formulate a computational problem to address a biological issue?
- How to collect statistics from biological data?
- How to build a “computational” model?
- How to solve a computational modeling problem?
- How to test and evaluate a computational algorithm?

WHAT IS BIOINFORMATICS?
(Molecular) Bio-informatics
Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry), and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules on a large scale.
Bioinformatics is a practical discipline with many applications.

SCALES OF LIFE
[Figure: animal cell]

TWO KINDS OF CELLS
- Prokaryotes: no nucleus (bacteria); their genomes are circular.
- Eukaryotes: have a nucleus (animals, plants); linear genomes with multiple chromosomes in pairs. When pairing up, they look like: [figure]

Differences between organisms come from differences in their DNA.

DNA: THE SEQUENCE OF LIFE
What is DNA?
It is a long molecule that contains our unique genetic code.
It holds the instructions for making all the proteins in our bodies.
It is made of a chemical called deoxyribonucleic acid, or DNA for short.
DNA contains four basic building blocks or ‘bases’: adenine (A), cytosine (C), guanine (G) and thymine (T).

MOLECULAR BIOLOGY INFORMATION - DNA
[Figures]

GENES AND PROTEINS
Transcription: The enzyme RNA polymerase uses DNA as a template to produce a pre-mRNA transcript, which is then processed into a mature mRNA.
Translation: The mRNA is translated to build the protein molecule (polypeptide) encoded by the original gene. The ribosome reads the mRNA to produce a chain of amino acids.
https://www.youtube.com/watch?v=gG7uCskUOrA&t=161s

DNA TO PROTEIN
[Figure]

DNA TRANSCRIPTION
Transcription takes place in the nucleus (in eukaryotes). It uses DNA as a template to make an RNA (mRNA) molecule. During transcription, a strand of mRNA is made that is complementary to a strand of DNA.

MRNA TRANSLATION
Translation is the process by which a protein is synthesized from the information contained in a messenger RNA (mRNA). During translation, an mRNA sequence is read using the genetic code, which is a set of rules that defines how an mRNA sequence is to be translated into the 20-letter code of amino acids. Amino acids are the building blocks of proteins.

GENES AND PROTEINS
In the simplest picture, one gene encodes one protein. Like a program, a gene starts with a start codon (e.g. ATG); then each group of three nucleotides codes for one amino acid. Finally, a stop codon (e.g. TGA) signifies the end of the gene.
Sometimes, in the middle of a (eukaryotic) gene, there are introns, which are spliced out of the pre-mRNA (as junk) before translation. The parts that are kept are called exons. Locating genes and their exons in a genome is the task of gene finding.

CODON TO AMINO ACID TABLE
[Figure]

GENES AND PROTEINS
Amino acids: the building blocks of proteins. There are 20 different amino acids.
A protein: is composed of one or more long chains of amino acids that correspond to the DNA sequence of the encoding gene.
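The transcription and translation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `transcribe` and `translate` and the example gene are hypothetical (not from the slides), the input is taken to be the coding strand (so the mRNA is the same sequence with T replaced by U), introns are ignored, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (assumption: no introns)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: start codon ATG, four codons, stop codon TGA.
gene = "ATGGACTTCGATCAGTGA"
mrna = transcribe(gene)
print(mrna)             # AUGGACUUCGAUCAGUGA
print(translate(mrna))  # MDFDQ
```

Reading in steps of three from the start codon mirrors the "each three code one amino acid" rule; a real gene finder would also have to deal with introns, both strands, and all reading frames.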
A protein is a molecule that is found in all of life. Proteins play a variety of important roles in the body.
https://www.youtube.com/watch?v=gG7uCskUOrA&t=161s

AMINO ACIDS
[Figure]

FROM DNA TO PROTEINS
[Figure]

PROTEIN FOLDING AND STRUCTURE
The amino acid sequence of a protein folds into a 3-D structure.
Folding: the physical process in which the 3-D structure of a protein is formed, giving a conformation that is usually biologically functional.
Function is related to structure: a different structure implies a different function. [Wikipedia]

PROTEIN STRUCTURE
https://www.rcsb.org/structure/3ERT

GENETIC VARIANT EFFECT PREDICTION
[Figures]

GENETIC VARIATIONS
Causes of variations:
1. Mistakes in DNA replication
2. Environmental agents (radiation, chemical agents)
3. Transposable elements (transposons): a part of the DNA is moved or copied to another location in the genome
4. Horizontal transfer of DNA: an organism obtains genetic material from another organism that is not its parent (utilized in genetic engineering)

GENETIC VARIATIONS CONT.
Types of variations:
1. SNP (Single Nucleotide Polymorphism)
2. Indels (Insertion-Deletion)
3. Inversion
4. Duplication

SNP (Single Nucleotide Polymorphism)
Ref:    G A C T T C G A T C A
Sample: G A C G T C G A T C A

Synonymous SNP (sSNP) and non-synonymous SNP (nsSNP)
Ref:    G A C T T C G A T C A G    DFDQ
Sample: G A C G T C G A T C A A    DVDQ
The T-to-G change turns codon TTC (F) into GTC (V), so it is non-synonymous; the G-to-A change turns CAG into CAA, both of which code for Q, so it is synonymous.

Indels (Insertion-Deletion)
Insertion
Ref:    G A C T - - - - T C G
Sample: G A C T C G A T T C G
Deletion
Ref:    G A C T T C G A T C A
Sample: G A C - - C G A T C A

Inversion
Ref:    G A C T T C G A T C A
Sample: G A C G C T T A T C A
Duplication
Ref:    G A C G T C G A T C A
ReSeq:  G A C G T C G T C C A

DNA: THE SEQUENCE OF LIFE
- Nucleic acids: chemical molecules that carry information about the fundamental determinants of life.
- DNA: a string where each letter is a nucleotide {A, C, T, G}.
- The genome of a living organism is its entire DNA.
- Living organisms have up to trillions of cells (depending on the organism type).
- Each cell of the same living organism contains the same genome.
- DNA varies in length from a few million nucleotides (bacteria) to a few billion nucleotides (mammals).

DNA SEQUENCE
DNA is usually double-stranded, with one strand being the complement of the other:
- The complement of A is T.
- The complement of C is G.
Thus, a letter of DNA is usually called a base-pair rather than a nucleotide, since it actually represents two nucleotides: one in a DNA strand, and one in its reverse complement strand.
A DNA strand has a start and an end, and the direction from start to end is important.
For example, consider the following DNA:
ATGTCAGGC
TACAGTCCG
Each nucleotide is physically bound to the complementary nucleotide below it, forming one base-pair.
Assuming that the start-to-end direction of the top strand is from left to right, the start-to-end direction of the bottom strand will be from right to left.
Normally, we write DNA strings or substrings so that they start at the left and end at the right. Thus, a DNA molecule consists of two strands such as ATGTC and its reverse complement, GACAT.
When working with DNA sequences, both strands are equally important, and they are not equivalent.
DNA holds the instructions for making proteins.

RNA
RNA is usually single-stranded and consists of letters from the 4-sized alphabet of nucleotides {A, C, U, G}.
It exists to initiate and regulate protein production from scattered genes.
Also, many viruses have an RNA genome.

PROTEINS
Proteins are short strings (a few hundred letters) where each letter is one of the 20 amino acids.
Bacteria make around 500 to 1500 proteins, while the human genome makes around 100,000 proteins (counting variants; the number of human protein-coding genes is closer to 20,000).
Each protein is produced by a gene. A gene is a fragment of the DNA.
Every three adjacent nucleotides of a gene produce one amino acid letter of the corresponding protein.
Three adjacent nucleotides are called a codon. There are 4^3 = 64 possible codons.
Since 64 > 20, several different codons may produce the same amino acid.
Also, there is a special type of codon, called a stop codon, which indicates the end of the protein.

BIOLOGICAL SEQUENCES
So far we have discussed three types of sequences, or strings, in which we are interested:
- DNA sequences, which consist of letters from the 4-sized alphabet of nucleotides: {A, C, T, G}.
- Protein sequences, which consist of letters from the 20-sized alphabet of amino acids: {A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V}.
- RNA sequences, which consist of letters from the 4-sized alphabet of nucleotides: {A, C, U, G}.

OBTAINING SEQUENCES
Gene Sequences
www.ensembl.org
https://www.ncbi.nlm.nih.gov/

OBTAINING SEQUENCES
Protein Sequences
https://www.uniprot.org/
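As a recap of the sequence conventions in this lecture (complementary strands, the three alphabets, and the 4^3 = 64 codon count), here is a minimal Python sketch. The helper names `reverse_complement` and `possible_types` are hypothetical and chosen only for illustration; the alphabets are the ones listed under BIOLOGICAL SEQUENCES.

```python
from itertools import product

# Alphabets as listed in the slides.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ARNDCEQGHILKMFPSTWYV")

# Base-pairing rule: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the other strand, written in its own start-to-end direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def possible_types(sequence):
    """List every sequence type whose alphabet contains all the letters."""
    letters = set(sequence.upper())
    types = []
    if letters <= DNA_ALPHABET:
        types.append("DNA")
    if letters <= RNA_ALPHABET:
        types.append("RNA")
    if letters <= PROTEIN_ALPHABET:
        types.append("protein")
    return types


# All strings of three nucleotides: 4 * 4 * 4 = 64 codons.
codons = ["".join(triple) for triple in product("ACGT", repeat=3)]

print(reverse_complement("ATGTC"))  # GACAT
print(len(codons))                  # 64
print(possible_types("AUGUC"))      # ['RNA']
print(possible_types("DFDQ"))       # ['protein']
print(possible_types("ATGTC"))      # ['DNA', 'protein']
```

The last line shows that the alphabets overlap (A, C, G and T are also amino acid letters), so the letters alone do not always identify the sequence type; in practice the type is usually known from the database the sequence was obtained from.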