Biological Database Lecture 1 PDF
Dr. Amira El_Zeiny
Summary
This lecture introduces the fundamental concepts of biological databases. It discusses the architecture and components of these databases, differentiating between primary, secondary, and composite databases. The material also explores traditional file-based systems and their limitations.
Full Transcript
Lecture 1: Introduction to Biological Databases
Dr. Amira El_Zeiny

*DNA Database: Overview

There are two major components to bioinformatics:

A. Storing and retrieving information about DNA sequencing.
1. Biological databases.
2. Querying these to retrieve data.

B. Manipulating the data (tools).
1. Finding features on sequences.
2. Sequence similarity searches (DNA alignment).
3. Protein families and function prediction.
4. Comparing sequences.

Database Types

There are three main types of biological databases: primary, secondary, and composite.

Primary Databases
o Archival databases.
o They archive the experimental results submitted by scientists.
o The data are given accession numbers when they are entered into the database.
o Examples: the nucleic acid databases GenBank and DDBJ, or protein databases.

Secondary Databases
o These databases hold the analysed results of the primary databases.
o Computational algorithms are applied to the primary database, and the meaningful, informative data that results is stored in the secondary database.
o Examples: InterPro (protein families, motifs, domains) and UniProt Knowledgebase (sequence and functional information on proteins).

Composite Databases
o The data in this type of database are first compared and then filtered based on desired criteria.
o The initial data are taken from the primary databases and then merged together based on certain conditions.
o They help in searching sequences rapidly.
o Examples: OWL, NRDB, and Swiss-Prot + TrEMBL.

Traditional File-Based Systems

A file is simply a collection of records that contain logically related data. Each record contains a logically connected set of one or more fields, where each field represents some characteristic of the real-world object being modeled.

Limitations of the File-Based Approach

1. Data redundancy and inconsistency: multiple file formats, and duplication of information in different files.
2. Difficulty in accessing data: a new program must be written to carry out each new task.
3. Data isolation: multiple files and formats.
4. Integrity problems: integrity constraints (e.g., account balance > 0) become "buried" in program code rather than being stated explicitly, which makes it hard to add new constraints or change existing ones.
5. Atomicity of updates: failures may leave the database in an inconsistent state with partial updates carried out. Example: a transfer of funds from one account to another should either complete or not happen at all.
6. Concurrent access by multiple users: concurrent access is needed for performance, but uncontrolled concurrent access can lead to inconsistencies.
7. Security problems: it is hard to provide user access to some, but not all, data.

Database systems offer solutions to all of the above problems.

What is a Database?

Database: a shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization.

Database Management System (DBMS): a software system that enables users to define, create, maintain, and control access to the database.

Database Application: a program that interacts with the database at some point in its execution.

Database System: contains not only the database itself but also a complete definition or description of the database structure and constraints, stored in the catalog (called metadata). The database and the metadata are managed by the DBMS and accessed by database applications.

Types of Databases

o Hierarchical database
o Network database
o Relational database
o Object-oriented database

Relational Database

A DBMS is said to be a relational DBMS (RDBMS) if the database relationships are represented in the form of tables. There are three key concepts in a relational DBMS:
1) relation
2) domain
3) attributes
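The integrity and atomicity limitations listed above, and the relational terms relation, domain, and attribute, can be illustrated with a small sketch. This is not the lecture's own code. It assumes Python's built-in sqlite3 module as a stand-in DBMS, and the table name account, its columns, and the sample rows are hypothetical. The table plays the role of a relation, its columns are the attributes, and the declared types plus the CHECK clause approximate each attribute's domain (SQLite enforces column types only loosely).

```python
import sqlite3


def open_bank():
    """Create an in-memory database holding one relation (table) named account.

    The table name, column names, and sample rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        # The integrity constraint is stated explicitly in the schema
        # instead of being buried in application code.
        " balance INTEGER NOT NULL CHECK (balance > 0))"
    )
    conn.executemany(
        "INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    """Return {account id: balance} for every row."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    conn = open_bank()
    print(transfer(conn, 1, 2, 30), balances(conn))
    print(transfer(conn, 1, 2, 500), balances(conn))
```

The credit is deliberately issued before the debit so that a failed transfer would leave a partial update behind if the two statements were not wrapped in a single transaction.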
Database Schemas

The overall description of the database is called the database schema.

o External schemas (subschemas): correspond to different views of the data.
o Conceptual schema (database schema): describes all the entities, attributes, and relationships, together with integrity constraints.
o Internal schema: contains the definitions of stored records, the methods of representation, the data fields, and the indexes and storage structures used.

Database Schema Mapping

Mappings translate information from one level to the next:
o External/conceptual mapping
o Conceptual/internal mapping
These mappings provide data independence.

The Database Management System (DBMS)

A DBMS provides the following facilities:

1. It allows users to define the database, usually through a Data Definition Language (DDL).
2. It allows users to insert, update, delete, and retrieve data from the database, usually through a Data Manipulation Language (DML) such as SQL.
3. It provides controlled access to the database. For example:
o a security system, which prevents unauthorized users from accessing the database;
o an integrity system, which maintains the consistency of stored data;
o a concurrency control system, which allows shared access to the database;
o a recovery system, which restores the database to a previous consistent state following a hardware or software failure;
o a user-accessible catalog, which contains descriptions of the data in the database.
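The DDL, DML, external schema, and catalog facilities described above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the lecture: it uses Python's built-in sqlite3 module as the DBMS, and the table sequence_entry, the view entry_summary, the accession values DEMO0001 and DEMO0002, and the sequences are all made-up placeholders, not real database records. A view stands in for an external schema, and SQLite's sqlite_master table stands in for the user-accessible catalog.

```python
import sqlite3


def build_sequence_db():
    """Build a tiny in-memory database of sequence records (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # DDL: define the conceptual schema, one relation keyed by accession number.
    conn.execute(
        "CREATE TABLE sequence_entry ("
        " accession TEXT PRIMARY KEY,"
        " organism TEXT NOT NULL,"
        " residues TEXT NOT NULL)"
    )
    # DDL: a view acts as an external schema, one particular view of the data.
    conn.execute(
        "CREATE VIEW entry_summary AS"
        " SELECT accession, organism, length(residues) AS seq_length"
        " FROM sequence_entry"
    )
    # DML: insert rows. The accessions and sequences are placeholders.
    conn.executemany(
        "INSERT INTO sequence_entry (accession, organism, residues)"
        " VALUES (?, ?, ?)",
        [
            ("DEMO0001", "Escherichia coli", "ATGGCGTAA"),
            ("DEMO0002", "Homo sapiens", "ATGAAACCCGGGTAG"),
        ],
    )
    conn.commit()
    return conn


def summaries(conn):
    """DML: retrieve (accession, sequence length) pairs through the view."""
    return conn.execute(
        "SELECT accession, seq_length FROM entry_summary ORDER BY accession"
    ).fetchall()


def catalog(conn):
    """List the tables and views recorded in SQLite's catalog table."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master"
        " WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    conn = build_sequence_db()
    print(summaries(conn))
    print(catalog(conn))
```

Queries written against entry_summary keep working if the storage of sequence_entry changes, as long as the view definition is updated to match, which is a small-scale picture of the data independence that schema mapping provides.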