Summary

This document provides an overview of biological databases. It discusses the different types of databases, such as primary, secondary, and composite. It explains their uses in research and highlights the importance of data quality and accessibility in the field.

Full Transcript

B.Sc. (Hons) Botany / B.Sc. (P) Life Science, Semester III
DSE-2: Biostatistics & Bioinformatics for Plant Sciences
BIOINFORMATICS Practical: Biological databases (NCBI, EMBL, UniProt, PDB, PlantPepDB)
22/09/23

BIOLOGICAL DATABASES

A biological database is a collection of data that is structured, searchable, periodically updated and cross-referenced. The database administrator updates the data from time to time by editing existing entries and adding new ones.

Biological databases serve a critical purpose in collecting and organizing data related to biological systems. They provide computational support and a user-friendly interface that let researchers analyze biological data meaningfully. They aid the systematization of results from biological experiments and analyses, and so help organize and store all known data, which prevents recomputation and duplicated experiments. Biological databases make data available to scientists in one place, helping them obtain data for their research and for cross-validation. Biological data are stored in a computer-readable format, which is a fundamental step for biological data analysis.

• Biological databases hold biological data such as protein sequences, molecular structures and DNA sequences in an organized form.
• Scientists and researchers all over the world enter their experimental data and results into biological databases so that they are available to a wider audience.
• Public biological databases are generally free to use and contain a huge collection of varied biological data.
• One of the first databases to emerge was GenBank, an annotated collection of all publicly available DNA sequences.
• It is maintained by the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine at the National Institutes of Health (NIH).
• The data stored in a biological database are of two types: raw and curated (or annotated).
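The point that biological data are stored in a computer-readable format can be made concrete with FASTA, a widely used plain-text sequence format. FASTA is not named in these notes; it is used here only as an illustration, and the function name `parse_fasta` and the example records are hypothetical. The sketch reads FASTA text into a Python dictionary mapping each record's identifier to its sequence.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into {record_id: sequence}.

    A minimal sketch: it does not validate residue letters and assumes
    each record ID (first word after '>') is unique.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header line starts a new record, so store the previous one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            words = line[1:].split()
            current_id = words[0] if words else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalise case.
            # Lines appearing before the first header are ignored.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)  # store the final record
    return records


# Hypothetical example records (not real database entries).
example = """>example_dna_1 hypothetical DNA fragment
ATGCGT
acgtta
>example_peptide_1 hypothetical peptide
MKTAYIAK
"""

print(parse_fasta(example.splitlines()))
# {'example_dna_1': 'ATGCGTACGTTA', 'example_peptide_1': 'MKTAYIAK'}
```

Real databases attach much richer annotation than a FASTA header carries, but even this minimal form is enough for a program to search, compare and count sequences.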
KINDS OF BIOLOGICAL DATABASES
• Nucleotide sequence databases
• Protein sequence databases
• Gene expression databases
• Clinical databases
• Metabolic pathway databases
• Structural databases

Need for databases in Biology
• The need to store and communicate large datasets has grown.
• Biological information needs to be disseminated.
• Organized data must be provided for analysis and user-friendly retrieval.
• Biological data must be made available in computer-readable form.

Types of Biological Databases

Primary Databases or Archival Databases
Primary databases contain original data submitted by researchers. These databases are public and give scientists open access to the data, for example for annotation. They are repositories of raw and annotated sequence data that describe important properties of each sequence, and they can be accessed freely over the internet through the World Wide Web (WWW). Primary databases can be further classified as follows.

Sequence databases store information on DNA/nucleotide and protein sequences.
i. DNA/nucleotide databases store data on DNA/nucleotide sequences. Each database maintains its own set of submission and retrieval tools, but they exchange data daily so that all of them contain the same set of sequences. Important examples of DNA/nucleotide databases are GenBank, EMBL and DDBJ.
ii. Protein databases: primary protein sequence databases store information on protein sequences. Some primary protein sequence databases

Secondary Databases
Secondary databases contain data derived from analysing the entries of primary databases. They are either manually curated or automatically generated. Secondary databases often draw on information from numerous sources, including other databases (primary and secondary), controlled vocabularies and the scientific literature.
They contain information such as the conserved sequences, signature sequences and active-site residues of protein families, derived from multiple sequence alignments of sets of related proteins. Curated databases are maintained by one or more curators who select, enter or invite only the highest-quality data from selected research centres and database communities. The quality of the data is of utmost importance, whereas the quantity of data deposited is secondary.

Composite Databases
A composite database combines different primary database sources, so a single query searches several resources at once and there is no need to search each one separately. Although composite databases are compiled from various primary databases, non-redundancy is maintained by filtering out duplicate entries that come from different primary sources. Different composite databases use different combinations of primary databases and different criteria in their search algorithms.
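The non-redundancy filtering that composite databases perform can be illustrated with a toy merge. This is a minimal sketch under one stated assumption: two entries count as redundant only when their sequences are identical (ignoring letter case). Real composites such as OWL or NRDB applied their own criteria. The function `build_nonredundant` and the source names are hypothetical.

```python
from typing import Dict, List


def build_nonredundant(sources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Merge several sources into one non-redundant set of sequences.

    `sources` maps a source name to {entry_id: sequence}. The result maps each
    distinct sequence (upper-cased) to the "source:entry_id" labels that
    carried it, so identical sequences are stored once but no cross-reference
    is lost. Exact identity is the only redundancy criterion used here.
    """
    merged: Dict[str, List[str]] = {}
    for source_name, entries in sources.items():
        for entry_id, sequence in entries.items():
            key = sequence.strip().upper()
            merged.setdefault(key, []).append(f"{source_name}:{entry_id}")
    return merged


# Hypothetical sources and entries, used only for illustration.
sources = {
    "source_A": {"a1": "MKTAYIAK", "a2": "GAVLIP"},
    "source_B": {"b1": "mktayiak", "b2": "PEPTIDE"},
}

nr = build_nonredundant(sources)
print(len(nr))          # 3 distinct sequences from 4 entries
print(nr["MKTAYIAK"])   # ['source_A:a1', 'source_B:b1']
```

Keeping the list of source labels, rather than discarding duplicates outright, preserves the link back to every primary database that reported the sequence.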
Examples: OWL (a non-redundant composite of four publicly available primary sources: Swiss-Prot, PIR, GenBank (translations) and NRL-3D), the Non-Redundant Database (NRDB) and BioSilico.

Primary database vs. secondary database
Primary database
  Synonym: archival database
  Source of data: direct submission of experimentally derived data from researchers
  Examples: ENA, GenBank and DDBJ (nucleotide sequence); ArrayExpress and GEO (functional genomics data); Protein Data Bank (PDB; coordinates of three-dimensional macromolecular structures)
Secondary database
  Synonyms: curated database; knowledgebase
  Source of data: results of analysis, literature research and interpretation, often of data in primary databases
  Examples: InterPro (protein families, motifs and domains); UniProt Knowledgebase (sequence and functional information on proteins); Ensembl (variation, function, regulation and more layered onto whole genome sequences)

NUCLEOTIDE DATABASES
• DNA sequences, genes, gene products (proteins), mutations, gene coding, distribution patterns, motifs
• Genomics: genome, gene structure and expression, genetic maps, genetic disorders
• RNA sequence, secondary structure, 3D structure
• DNA databases: NCBI (GenBank), EMBL, DDBJ

PROTEIN DATABASES
A protein database is one or more datasets about proteins, which may include a protein's amino acid sequence, conformation, structure, and features such as active sites. Examples of protein databases: PDB, PIR, UniProt.

NUCLEOTIDE DATA BASES
NCBI (National Center for Biotechnology Information)
NCBI, which hosts the GenBank database, was established in November 1988 as part of the National Library of Medicine at the National Institutes of Health, Bethesda, Maryland, USA. Its aim was to create public databases, develop software tools for sequence analysis and disseminate biomedical information, mainly to aid research in computational biology.
All of NCBI's databases and software tools are available on the NCBI web page https://www.ncbi.nlm.nih.gov
• GenBank: the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
• Nucleotide: a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. "Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery."
[Figure: NCBI home page]

EMBL (EUROPEAN MOLECULAR BIOLOGY LABORATORY)
THE EMBL NUCLEOTIDE SEQUENCE DATABASE
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) is a central activity of the European Bioinformatics Institute (EBI) (http://www.ebi.ac.uk), an EMBL outstation located at the Wellcome Trust Genome Campus in Hinxton, near Cambridge, UK. This database is the European part of an international collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). The EMBL Database collects, organizes and distributes nucleotide sequence data and related biological information from public sources, and aims to collect and present nucleotide sequences and annotation with a comprehensive, global approach. The key goal of the EMBL Nucleotide Sequence Database is to build and maintain biological databases and other computational services that support data deposition and data analysis, and to make them available to the scientific community. https://www.embl.org

SWISS-PROT/UniProtKB
Swiss-Prot is an annotated protein sequence database, created at the Department of Medical Biochemistry of the University of Geneva and maintained since 1987. UniProtKB/Swiss-Prot is the manually annotated and reviewed section of the UniProt Knowledgebase (www.uniprot.org). This protein sequence and knowledge database is valued for its high-quality annotation, its use of standardized nomenclature, its direct links to specialized databases, and its minimal redundancy.
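Besides their web pages, NCBI, EMBL-EBI and UniProt also serve records to programs over HTTP. The sketch below only builds request URLs for a sequence in FASTA format; it assumes the public NCBI E-utilities EFetch endpoint, the ENA Browser API (the EMBL nucleotide data are now served through the European Nucleotide Archive, ENA) and the UniProt REST API. The helper names and the accession "EXAMPLE1" are hypothetical placeholders, and the endpoints should be checked against each service's current documentation before use.

```python
import urllib.request
from urllib.parse import urlencode


def ncbi_efetch_url(accession: str, db: str = "nuccore") -> str:
    """NCBI E-utilities EFetch URL returning a record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + urlencode(params)


def ena_fasta_url(accession: str) -> str:
    """ENA Browser API URL for a nucleotide record in FASTA format."""
    return f"https://www.ebi.ac.uk/ena/browser/api/fasta/{accession}"


def uniprot_fasta_url(accession: str) -> str:
    """UniProt REST API URL for a UniProtKB entry in FASTA format."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# "EXAMPLE1" is a placeholder; substitute a real accession number.
print(ncbi_efetch_url("EXAMPLE1"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=EXAMPLE1&rettype=fasta&retmode=text
```

Calling `fetch_text` on one of these URLs would download the record. NCBI asks scripted users to limit request rates (a few requests per second without an API key), so batch jobs should pause between calls.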
PROTEIN DATA BASES
PDB (PROTEIN DATA BANK)
• The Protein Data Bank (RCSB PDB) began in the 1970s through a group of young crystallographers, including Edgar Meyer, Gerson Cohen and Helen M. Berman.
• The project was initiated in 1971 at Brookhaven National Laboratory as an archive of three-dimensional structures of biological macromolecules, and it has grown into a major international resource.
• It contains the three-dimensional structures of biological macromolecules, determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and electron microscopy.
• https://www.rcsb.org

PlantPepDB
PlantPepDB is a manually curated database of plant peptides with various functions and therapeutic activities. It contains 3848 peptide entries compiled from 11 databases and 835 published research articles. Each entry provides comprehensive information about a peptide, including its sequence, functional information, source, physicochemical properties and tertiary structure. On the basis of their activities, the peptides are classified into 9 categories.
• Simple Search lets the user search any field and display any field (e.g. peptide name, plant source, peptide family).
• Advanced Search lets the user search multiple fields at once using various conditional operators.
• AA Composition Search lets the user search for peptides by the count or the percentage of a particular amino acid.
• Advanced Physico Chemical Search lets the user search peptides by their physicochemical properties. The user should already know the relevant property values to narrow down the search.
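The idea behind an amino acid composition search can be sketched offline: compute, for each peptide, how many times a residue occurs and what percentage of the sequence it makes up, then keep the peptides that meet the thresholds. This is an illustrative sketch, not PlantPepDB's implementation; the function names, the AND-combination of the two thresholds and the example peptides are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(sequence: str) -> Dict[str, Tuple[int, float]]:
    """Return {residue: (count, percent of sequence length)} for the 20 standard residues."""
    seq = sequence.strip().upper()
    counts = Counter(seq)  # missing residues count as 0
    length = len(seq)
    return {
        aa: (counts[aa], 100.0 * counts[aa] / length if length else 0.0)
        for aa in STANDARD_AA
    }


def composition_search(peptides: Dict[str, str], residue: str,
                       min_count: int = 0, min_percent: float = 0.0) -> List[str]:
    """Names of peptides whose count AND percentage of `residue` meet both thresholds."""
    residue = residue.upper()
    if residue not in STANDARD_AA:
        raise ValueError(f"not a standard amino acid: {residue!r}")
    hits = []
    for name, seq in peptides.items():
        count, percent = aa_composition(seq)[residue]
        if count >= min_count and percent >= min_percent:
            hits.append(name)
    return hits


# Hypothetical peptides, used only for illustration.
peptides = {
    "pep_A": "CCGKCC",   # 4 of 6 residues are C
    "pep_B": "GLFDKLK",  # no C
    "pep_C": "ACDCA",    # 2 of 5 residues are C
}

print(composition_search(peptides, "C", min_count=2))       # ['pep_A', 'pep_C']
print(composition_search(peptides, "C", min_percent=50.0))  # ['pep_A']
```

Searching by percentage rather than raw count matters for peptides of very different lengths: two cysteines are a large share of a 5-residue peptide but a small share of a 50-residue one.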
USES OF BIOLOGICAL DATABASES
• They help researchers study the available data.
• They help scientists understand the concepts of biological phenomena.
• They help us understand the molecular mechanisms of diseases, which supports better treatment, cure and efficient diagnosis.
• They help in developing personalized medicine, so that the best-suited drug can be prescribed.
• They are used in gene therapy to treat genetic diseases by changing the expression of the affected gene.
• They also help in drug design and drug development.
• In addition, they help in bioweapon creation, evolutionary studies, crop improvement and improving nutritional quality.
