Biological Databases and Bioinformatics

Questions and Answers

Why is bioinformatics essential in modern biological research?

It reduces the cost of genetic sequencing projects.
It helps manage and visualize the rapid growth of biological information. (correct)
It standardizes the publication of research findings.
It eliminates the need for traditional laboratory experiments.

Which of the following is a key element that distinguishes a biological database from other forms of data storage like web pages or journal articles?

Peer-reviewed data.
A specific tool for searching and data extraction. (correct)
Multimedia content such as images and videos.
Accessibility to the general public.

What is the primary role of bibliographic databases in bioinformatics?

Classifying organisms based on evolutionary relationships.
Abstracting medical and scientific literature. (correct)
Providing a direct link to protein structure data.
Storing and analyzing genomic sequences.

How do the INSDC member databases (DDBJ, GenBank, and EMBL-Bank) ensure data consistency?

By exchanging new and updated data on a daily basis.

Which type of biological database is most useful for researchers studying the expression profiles of genes in different cell types?

Microarray databases.

What characteristic defines a primary biological database?

It archives raw, unanalyzed data submitted by experimentalists.

What is the main function of the NCBI?

To create public databases and provide software for analyzing genomic data.

What is the key difference between a primary and a derivative sequence database?

Primary databases contain raw data submitted by researchers, while derivative databases contain processed and curated data.

Which of the following databases is NOT part of the International Nucleotide Sequence Database Collaboration (INSDC)?

Swiss-Prot.

The INSDC databases (GenBank, EMBL-Bank, DDBJ) primarily serve as:

Repositories for scientists' data.

In the GenBank divisions, what does ENV represent?

Environmental Samples.

How are GenBank records typically organized?

Into divisions, both organismal (traditional) and functional (bulk).

What feature distinguishes 'organismal' divisions from 'functional' divisions?

Organismal divisions are organized by taxonomy, while functional divisions are organized by sequence type.

What is the primary purpose of a RefSeq database?

To provide a curated, non-redundant set of reference sequences.

Which of the following is a key characteristic of RefSeq accessions?

They are updated to reflect current sequence data and biology.

If you want to find publications related to a specific protein, what is the most efficient first step using PubMed?

Type the protein name in the search bar.

What is a key feature of PubMed's search functionality regarding case sensitivity?

Searches are case-insensitive, so capitalization does not affect results.

In a GenBank record, what information does the 'DEFINITION' line provide?

A summary of the sequence and its biological context.

If a researcher wants to find the three collaborating databases that share data nightly, which of these databases is most useful to start with?

GenBank.

Which type of database would contain protein motifs of a particular protein family?

Specialized protein database.

What is the key function of a taxonomic database?

Classifying all organisms.

What must all databases have to allow for a specific search and data extraction?

A specific tool for searching and data extraction.

What kinds of genomic and transcriptomic fragments belong in DDBJ/EMBL/GenBank?

All kinds of sequence information.

Which database serves as the primary nucleotide sequence database?

DDBJ/EMBL/GenBank.

How can data be submitted to GenBank?

All of the listed methods: direct submission via the Web, batch submission by email, and FTP.

True or false: genomic, mRNA, and protein data exist in primary databases.

True.

What is the first thing to click in PubMed to find your desired paper?

Click the small arrow to the right of the Display drop-down menu.

Which of the following is NOT a feature of GenBank?

Requires paid access.

If you are examining growing trends in the use of genetic information, which graph would be best for tracing its development? (Assume both graphs have a linear time axis.)

The growth of GenBank.

Which section of a GenBank record contains information about the components, quality, and processing of the genetic information?

FEATURES.

When looking for high-quality genetic samples in GenBank records, which accession version should be used?

VERSION.

What does AFS1 mean in the following DEFINITION line: "Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA, complete cds."?

The gene (its gene symbol).

If a researcher notices that one of the sequences they find online differs from a previous entry, what should one examine?

The version record.

Which database is a 'derivative database'?

RefSeq.

From the following list, pick the quality most associated with RefSeq.

Curated.

If a PubMed search returns a single paper and you want to read its summary, which section must be present?

Abstract.

Which of these genetic features is present in eukaryotic genes?

Presence of introns.

Which type of file would be best to describe both the function and the sequence of a protein?

Flat file.

Flashcards

Disseminate (in Biology)

To spread or distribute biological data and information widely.

Computer-Readable Data

Biological data available in a format readable by computers.

Allow Data Analysis

Allows for the examination and assessment of biological information.

Tools for Data Extraction

At a minimum, a database must provide a tool for searching and extracting data.

Biological Sequences

Representation of biological sequences, like DNA or proteins.

Bibliographic Databases

Databases focusing on published scientific literature.

Taxonomic Databases

Databases that categorize and classify organisms.

Nucleotide Databases

Databases of nucleotide sequences.

Genomic Databases

Databases containing genomic information.

Protein Databases

Databases focused on protein information.

Microarray Databases

Databases for microarray experiment data.

Primary Databases

Databases containing raw experimental data submitted by researchers.

Derivative Databases

Databases containing data compiled and curated from primary sources.

GenBank

NCBI's primary sequence database for nucleotides.

Archival Nature

GenBank serves as an archive of nucleotide sequences as submitted by researchers.

INSDC

International Nucleotide Sequence Database Collaboration.

EMBL-Bank

One of the INSDC members, located at the European Bioinformatics Institute (EBI).

DDBJ

The DNA Data Bank of Japan; one of the INSDC members, located in Japan.

GenBank at NCBI

One of the INSDC members, located at the National Center for Biotechnology Information (NCBI) in the USA.

Accession Number

A unique identifier for a sequence.

Version

A number appended to the accession (ACCESSION.VERSION) that increases each time the sequence changes.

The Sequence

The ordered string of nucleotide bases that makes up the DNA molecule.

RefSeq

NCBI's curated database of non-redundant reference sequences.

PubMed

NCBI's portal for biomedical literature.

Case Sensitivity

Whether a search distinguishes uppercase from lowercase letters; PubMed searches are case-insensitive.

Study Notes

Biological databases covered include bibliographic, taxonomic, nucleotide, genomic, protein, and microarray databases.
Bioinformatics is used to keep pace with information growth, discover knowledge, visualize data, and globalize research.

Biological Information Categories:

Nucleic acids include DNA sequence, genes, gene products (proteins), mutation, gene coding, distribution patterns, and motifs.
Genomics covers genome, gene structure and expression, genetic map, and genetic disorders.
RNA sequence includes secondary structure, 3D structure, and interactions.
Proteins consist of protein sequence, corresponding gene, secondary structure, 3D structure, function, motifs, homology, and interactions.
Proteomics involves expression profiles and proteins in disease processes.
Ligands and drugs include inhibitors, activators, substrates, and metabolites.
Pathways contain molecular networks, chains of biological events, regulation, feedback, and kinetic data.

Function categories

Molecular action, such as binding sites, interactions, binding, and chemical reactions.
Biological effects through signaling, transport, feedback, and regulation.
Functional relationships involve protein families, motifs, and homologs.

Biological Databases Purpose

Disseminate biological data and information.
Provide biological data in computer-readable form.
Allow for analysis of biological data.
A database requires a specific tool for searching and data extraction.
Web pages, books, journal articles, tables, text files, and spreadsheets are not considered databases.

Objects in biology

Sequences, extended sequences (topologies), domains (sec. structure cartoons), 3D structure
Diagrams (hydrophobicity profiles, helical circles) and 3D cartoons

Biological Databases and the Web

Bibliographic data, phylogeny (taxonomy), 3D structures, DNA sequences, and protein sequences are interconnected through keyword, taxonomic, structural, DNA-sequence, and protein-sequence similarity links.

Bibliographic Databases

Bibliographic databases became available in machine-readable form in the early 1960s.
MEDLINE is accessible through EBI.
PUBMED is accessible through NCBI.
EMBASE is a commercial product.
BIOSIS is the successor to the old Biological Abstracts.
CAB International maintains abstract databases covering agriculture and parasitic diseases.
AGRICOLA covers the agricultural literature.

Taxonomic Databases

The Taxonomy Browser is a taxonomic database maintained by the NCBI.
It is hierarchical and sequence-based, aiming to centralize the classification of all organisms.
There are other taxonomy resources: NEWT, The Tree of Life project, Species 2000, International Organization for Plant Information, and Integrated Taxonomic Information System.
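Because the Taxonomy Browser is hierarchical, each taxon points to a parent, and a full classification is a walk up those links. The sketch below is a minimal illustration, assuming a hypothetical, heavily simplified parent table (real NCBI lineages have many more ranks and use numeric taxonomy IDs); the function names are illustrative only.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical, simplified child -> parent links (many intermediate ranks omitted).
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mus musculus": "Mus",
    "Mus": "Muridae",
    "Muridae": "Rodentia",
    "Rodentia": "Mammalia",
    "Mammalia": "Chordata",
    "Chordata": "Metazoa",
    "Metazoa": "Eukaryota",
}


def lineage(taxon: str) -> list[str]:
    """Return the taxon followed by its ancestors, ending at the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(a: str, b: str) -> Optional[str]:
    """Return the most specific taxon that appears in both lineages."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    return None


print(" > ".join(reversed(lineage("Homo sapiens"))))
print(lowest_common_ancestor("Homo sapiens", "Mus musculus"))  # Mammalia
```

Storing only parent links keeps each taxon's entry small; any full lineage, or the shared ancestor of two organisms, is derived on demand.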

Nucleotide Databases

The International Nucleotide Sequence Database Collaboration (INSDC) is a joint operation by EMBL-Bank at the European Bioinformatics Institute (EBI), the DNA Data Bank of Japan (DDBJ) at the Center for Information Biology (CIB) and GenBank at the National Center for Biotechnology Information (NCBI).
DDBJ, GenBank and EMBL-Bank exchange new and updated data daily for optimal synchronisation.
The exchange means all databases should contain the same data, except for sequences added in the last 24 hours.
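The daily exchange can be pictured as a merge after which every member holds the newest version of every record. This is a toy model, assuming each site is a hypothetical in-memory dictionary keyed by accession with a (version, sequence) value; it does not describe the actual INSDC transfer mechanism, and the accessions are made up.

```python
from typing import Dict, Tuple

# Hypothetical store: accession -> (version, sequence).
Store = Dict[str, Tuple[int, str]]


def exchange(*stores: Store) -> None:
    """Merge all stores in place so each holds the newest version of every record."""
    merged: Store = {}
    for store in stores:
        for accession, (version, sequence) in store.items():
            # Keep whichever copy of the record has the higher version number.
            if accession not in merged or version > merged[accession][0]:
                merged[accession] = (version, sequence)
    for store in stores:
        store.clear()
        store.update(merged)


ddbj = {"XX000001": (1, "ACGT")}
genbank = {"XX000002": (2, "GGCC")}
embl_bank = {"XX000002": (1, "GGCA")}

exchange(ddbj, genbank, embl_bank)
print(ddbj == genbank == embl_bank)   # True
print(embl_bank["XX000002"])          # (2, 'GGCC')
```

Records submitted after the most recent run exist at only one site until the next exchange, which is why the sites can differ by up to a day's worth of submissions.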

Genomic Databases

Genomic databases exist for organisms such as humans, rice, rat, and Drosophila.
These were originally published as conventional catalogues of genes or mutations.
Most of these have been made available in electronic form and new databases have been developed.
Genomic databases vary in data scope and storage methods.

Protein Databases

Protein databases can be grouped as simple sequence archives or annotated databases with added information.
Primary protein sequence databases: UniProtKB/Swiss-Prot
Specialised protein sequence databases: GOA and ENZYME
Secondary protein databases: InterPro
Structure databases: PDB

Microarrays and Gene Expression Databases

Microarray technology uses resources created by genome projects to answer questions such as:
Which genes are expressed in specific cell types of an organism, at particular times and under particular conditions?
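The core question here, which genes are expressed in a given cell type, reduces to filtering an expression matrix. A minimal sketch, assuming a hypothetical gene-by-cell-type table of normalized expression values and an arbitrary cutoff of 1.0; real microarray analysis also involves normalization and statistical testing that are not shown.

```python
from typing import List

# Hypothetical normalized expression values: gene -> {cell type: level}.
EXPRESSION = {
    "GENE_A": {"liver": 8.2, "neuron": 0.3},
    "GENE_B": {"liver": 0.1, "neuron": 5.7},
    "GENE_C": {"liver": 2.4, "neuron": 3.1},
}


def expressed_in(cell_type: str, threshold: float = 1.0) -> List[str]:
    """Return genes whose level in cell_type meets the threshold, sorted by name."""
    return sorted(
        gene
        for gene, levels in EXPRESSION.items()
        if levels.get(cell_type, 0.0) >= threshold
    )


print(expressed_in("liver"))    # ['GENE_A', 'GENE_C']
print(expressed_in("neuron"))   # ['GENE_B', 'GENE_C']
```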

Biological Databases Inter-connectivity

Links between biological databases allow researchers to connect an unknown DNA or protein sequence to related structures and literature.

Databases: Molecules to Systems

Components of biological systems, from nucleotide sequences to complete systems.

INSDC

The International Nucleotide Sequence Database Collaboration.
Participating organizations and retrieval systems include NIH/NCBI (GenBank), EMBL/EBI (SRS), and NIG/CIB (DDBJ, getentry).

Primary Databases

These contain raw, redundant data.
Data are submitted, "owned," and updated by experimentalists. Examples: GenBank, EMBL-Bank, DDBJ.

Derivative Databases

Human-curated derivative databases compile and curate data, for example GEO DataSets and the Structure & Literature databases.
Computationally-Derived examples: UniGene, HomoloGene, PubChem Compound
Combination examples: RefSeq, Genome Assembly, Conserved Domain and Structure databases

DDBJ/EMBL/GenBank

They are primary nucleotide sequence databases that serve as repositories for scientists' data and are archival.
These databases catalog regulatory and individual genes, large regions with several genes, complete genomes, cDNAs (mRNA), and various RNA types.
The databases also include genes from different strains, the same gene/organism published by others, or complete/partial genes (regions).

GenBank

GenBank is NCBI's primary nucleotide sequence database, and it is archival.
GenBank data comes through direct submissions, batch submissions via email (EST, GSS, STS), and FTP accounts.
Data is shared nightly among GenBank, the DNA Data Bank of Japan (DDBJ), and the European Molecular Biology Laboratory (EMBL) database.

NCBI

The National Center for Biotechnology Information was created in 1988 as part of the National Library of Medicine at NIH.
It establishes public databases, conducts research in computational biology, develops software tools for sequence analysis, and disseminates biomedical information.

Organization of GenBank into Divisions

GenBank records are grouped into traditional (organismal) divisions, based on taxonomy, or bulk (functional) divisions, based on sequence type.
Records are divided into 18 divisions: 12 traditional and 6 bulk.

GenBank Divisions (Traditional)

PRI (Primate), PLN (Plant and Fungal), BCT (Bacterial and Archaeal), INV (Invertebrate), ROD (Rodent), VRL (Viral), VRT (Other Vertebrate), MAM (Mammalian), PHG (Phage), SYN (Synthetic), ENV (Environmental Samples), UNA (Unannotated)
Organized by taxonomy.
Records come from direct submissions.
More accurate.

GenBank Divisions (Bulk)

EST (Expressed Sequence Tag), GSS (Genome Survey Sequence), HTG (High Throughput Genomic), STS (Sequence Tagged Site), HTC (High Throughput cDNA), PAT (Patent)
Organized by sequence type.
Records come from batch submissions.
Less accurate.
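Division codes appear in GenBank records, so a small lookup table is a convenient way to decode them. The sketch below simply encodes the two lists above; the function name is hypothetical.

```python
# Division codes and names as listed in the notes above.
TRADITIONAL = {
    "PRI": "Primate",
    "PLN": "Plant and Fungal",
    "BCT": "Bacterial and Archaeal",
    "INV": "Invertebrate",
    "ROD": "Rodent",
    "VRL": "Viral",
    "VRT": "Other Vertebrate",
    "MAM": "Mammalian",
    "PHG": "Phage",
    "SYN": "Synthetic",
    "ENV": "Environmental Samples",
    "UNA": "Unannotated",
}
BULK = {
    "EST": "Expressed Sequence Tag",
    "GSS": "Genome Survey Sequence",
    "HTG": "High Throughput Genomic",
    "STS": "Sequence Tagged Site",
    "HTC": "High Throughput cDNA",
    "PAT": "Patent",
}


def describe_division(code: str) -> str:
    """Return a readable description of a GenBank division code."""
    code = code.upper()
    if code in TRADITIONAL:
        return f"{code}: {TRADITIONAL[code]} (traditional, organized by taxonomy)"
    if code in BULK:
        return f"{code}: {BULK[code]} (bulk, organized by sequence type)"
    raise KeyError(f"unknown division code: {code}")


print(len(TRADITIONAL) + len(BULK))   # 18
print(describe_division("env"))
```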

The FlatFile Format in GenBank

The text fields of a record are indexed as search terms.
The sequence data itself is not indexed; use BLAST to search by sequence.
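To make the layout concrete, here is a minimal parser for a GenBank-style flat file. It assumes a hypothetical, abbreviated record (made-up accession XX000001 and a made-up 24-base sequence) and handles only the basics: keywords starting in column 0, indented continuation lines, and the sequence block between ORIGIN and //. It also builds a simple term index from the text fields while leaving the sequence out, mirroring the point that sequences are searched with BLAST rather than by text. The gene-symbol extraction is a heuristic of this sketch, not a GenBank rule; for real files, a dedicated parser such as Biopython's SeqIO is the safer choice.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical, abbreviated record; the accession and sequence are invented.
RECORD = """\
LOCUS       XX000001                  24 bp    mRNA    linear   PLN 01-JAN-2000
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   XX000001
VERSION     XX000001.1
FEATURES             Location/Qualifiers
     gene            1..24
                     /gene="AFS1"
ORIGIN
        1 atggcttcag cagcagttgc aaac
//
"""


def parse_flatfile(text: str) -> Dict[str, str]:
    """Collect top-level keyword fields and the sequence from one record."""
    fields: Dict[str, str] = {}
    sequence_parts = []
    key = None
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end of record
            break
        if in_sequence:                    # drop position numbers and spaces
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line[:1].isalpha():           # new keyword in column 0
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        elif key is not None:              # indented continuation line
            fields[key] += " " + line.strip()
    fields["SEQUENCE"] = "".join(sequence_parts).upper()
    return fields


def gene_symbol(definition: str) -> Optional[str]:
    """Heuristic: first parenthesized token made only of letters, digits, '_' or '-'."""
    match = re.search(r"\(([A-Za-z0-9_-]+)\)", definition)
    return match.group(1) if match else None


def index_terms(fields: Dict[str, str]) -> Set[str]:
    """Lower-cased words from the text fields; the sequence is deliberately skipped."""
    terms: Set[str] = set()
    for name, value in fields.items():
        if name != "SEQUENCE":
            terms.update(word.lower() for word in re.findall(r"[A-Za-z0-9]+", value))
    return terms


record = parse_flatfile(RECORD)
print(record["DEFINITION"])
print(record["VERSION"], gene_symbol(record["DEFINITION"]), len(record["SEQUENCE"]))
```

The heuristic skips "(E,E)" because it contains a comma, so the first match in this DEFINITION line is AFS1.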

Key Features of GenBank

It is predominantly a nucleotide sequence database.
The database is archival.
Stable accession numbers are assigned to each record.
Data submissions occur directly via the Web, through bulk email submissions, and via FTP for sequencing centers.
It relies on collaborations with other databases.

Sequence Revision History

Revised records keep their accession number; each change to the sequence increments the version.
RefSeq provides non-redundant sequences.
Many RefSeq records are validated by hand (curated).
RefSeq is a derivative database built from primary sequence data, with explicitly linked nucleotide and protein records.
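Because the version is a number appended to the accession, spotting a revised sequence amounts to comparing the two parts of an ACCESSION.VERSION identifier. A minimal sketch, assuming identifiers of that form; the function names and example identifiers are hypothetical.

```python
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str
    version: int


def parse_accession_version(text: str) -> AccessionVersion:
    """Split an identifier such as 'XX000001.2' into accession and version."""
    accession, sep, version = text.strip().partition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"expected ACCESSION.VERSION, got {text!r}")
    return AccessionVersion(accession, int(version))


def sequence_changed(old: str, new: str) -> bool:
    """True if both identifiers name the same record but carry different versions."""
    a, b = parse_accession_version(old), parse_accession_version(new)
    if a.accession != b.accession:
        raise ValueError("identifiers refer to different records")
    return a.version != b.version


print(parse_accession_version("NP_123456.2"))
print(sequence_changed("XX000001.1", "XX000001.2"))  # True
```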

RefSeq

It is NCBI's Derivative Sequence Database that features curated transcripts and proteins
Includes reviewed records for human, mouse, rat, fruit fly, zebrafish, Arabidopsis, and microbial genomes (proteins).
Includes assembled genomic regions (contigs) for human, mouse, rat, black poplar, zebrafish, cow, and dog.
Reference genomic records include sequences for gene regions and for complete chromosomes.
Microarrays are used to identify which genes are expressed at particular times and under particular test conditions.

RefSeq Accession Numbers

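RefSeq accessions use distinct prefixes for mRNAs, proteins, non-coding RNAs, genomic regions, contigs, and supercontigs:
NM_ curated mRNA
NP_ curated protein (e.g., NP_123456)
NR_ curated non-coding RNA (e.g., NR_123456)
XM_ predicted mRNA
XP_ predicted protein
NG_ reference genomic sequence
NT_, NW_ genomic contigs and supercontigs
NC_ complete genomic molecules, such as chromosomes and microbial replicons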
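Since each RefSeq accession carries a prefix before the underscore, the molecule type can be read straight off the identifier. The table below is a sketch based on the prefix conventions NCBI documents for RefSeq; it is abbreviated rather than exhaustive, so check NCBI's current documentation before relying on it. The function name is hypothetical.

```python
# Abbreviated RefSeq prefix table (not exhaustive).
REFSEQ_PREFIXES = {
    "NM": "curated mRNA",
    "NR": "curated non-coding RNA",
    "NP": "curated protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
    "NG": "reference genomic sequence (gene region)",
    "NC": "complete genomic molecule (e.g. chromosome or microbial replicon)",
    "NT": "genomic contig",
    "NW": "genomic contig or supercontig (WGS)",
}


def classify_refseq(accession: str) -> str:
    """Describe a RefSeq accession (with or without .version) by its prefix."""
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognized RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


for acc in ["NP_123456", "NR_123456.1", "NG_000001"]:
    print(acc, "->", classify_refseq(acc))
```

Ordinary GenBank accessions have no underscore, so they fall into the ValueError branch. This reflects RefSeq's distinct accession series.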

RefSeq Benefits

Benefits include non-redundancy, explicitly linked nucleotide and protein sequences, records that reflect current sequence data and biology, data validation, format consistency, a distinct accession series, and stewardship by NCBI staff and collaborators.
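Non-redundancy means one record stands for many submitted copies of the same sequence. As a very rough illustration, assuming hypothetical records and treating "redundant" as "identical sequence ignoring case", duplicates can be grouped as shown below. RefSeq's actual curation works per gene, transcript, and protein with human review and is far more involved.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_redundant(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Group accessions that carry an identical sequence (case-insensitive)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, sequence in records.items():
        groups[sequence.upper()].append(accession)
    # One entry per distinct sequence, listing every accession that carries it.
    return {sequence: sorted(accs) for sequence, accs in groups.items()}


# Hypothetical submitted records: accession -> sequence.
submitted = {"XX000001": "ACGT", "XX000002": "acgt", "XX000003": "GGCC"}
print(collapse_redundant(submitted))
```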

Finding a protein by its name using PubMed

First, locate the specialist for bioinformatics.
Then navigate to PubMed.
Type dUTPase in the "For" search box and click the Go button.
Look for the affiliated author names.
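The same dUTPase search can also be run programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL (db, term, and retmax are ESearch parameters); it does not contact NCBI, and the function name is hypothetical. Because PubMed searches are case-insensitive, the capitalization of the term does not matter.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch query URL for PubMed (no network access here)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(pubmed_search_url("dUTPase"))
```

Fetching that URL (for example with urllib.request.urlopen) returns XML whose IdList element holds the matching PubMed IDs. NCBI's usage guidelines limit how often such requests may be sent.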

Working with PubMed results for a single protein

Results can be saved with the File/Save option.
A blue rectangular link-out button on a PubMed record leads to the publisher's content, whether a scientific article or a book.
PubMed searches are case-insensitive.
The Display menu shows all display options.
