Questions and Answers
Why is bioinformatics essential in modern biological research?
- It reduces the cost of genetic sequencing projects.
- It helps manage and visualize the rapid growth of biological information. (correct)
- It standardizes the publication of research findings.
- It eliminates the need for traditional laboratory experiments.
Which of the following is a key element that distinguishes a biological database from other forms of data storage like web pages or journal articles?
- Peer-reviewed data.
- A specific tool for searching and data extraction. (correct)
- Multimedia content such as images and videos.
- Accessibility to the general public.
What is the primary role of bibliographic databases in bioinformatics?
- Classifying organisms based on evolutionary relationships.
- Abstracting medical and scientific literature. (correct)
- Providing a direct link to protein structure data.
- Storing and analyzing genomic sequences.
How do the INSDC nucleotide sequence databases (DDBJ, GenBank, and EMBL-Bank) ensure data consistency?
Which type of biological database is most useful for researchers studying the expression profiles of genes in different cell types?
What characteristic defines a primary biological database?
What is the main function of the NCBI?
What is the key difference between a primary and a derivative sequence database?
Which of the following databases is NOT part of the International Nucleotide Sequence Database Collaboration (INSDC)?
The INSDC databases (GenBank, EMBL-Bank, DDBJ) primarily serve as:
Among the GenBank divisions, what does ENV represent?
How are GenBank records typically organized?
What feature distinguishes 'organismal' divisions from 'functional' divisions?
What is the primary purpose of a RefSeq database?
Which of the following is a key characteristic of RefSeq accessions?
If you want to find publications related to a specific protein, what is the most efficient first step using PubMed?
What is a key feature of PubMed's search functionality regarding case sensitivity?
In a GenBank record, what information does the 'DEFINITION' line provide?
If a researcher wants to find the three collaborating databases that share data nightly, which of these databases is most useful to start with?
Which type of database would contain protein motifs of a particular protein family?
What is the key function of a taxonomic database?
What must all databases have to allow for a specific search and data extraction?
What kinds of genomic and transcriptomic fragments belong in DDBJ/EMBL/GenBank?
Which database serves as a primary nucleotide sequence database?
How can data be submitted to GenBank?
True or False: Genomic, mRNA, and protein data exist in primary databases.
What is the first thing to click in PubMed to find your desired paper?
Which of the following is NOT a feature of GenBank?
If you are examining the growing trend of using genetic information, which graph would be best for tracking its development? (Assume both graphs have a linear time dimension)
What type of information regarding the components, quality and process for genetic information is present in GenBank records?
When looking for high-quality genetic samples in GenBank records, which accession version should be used?
What does AFS1 mean in the following DEFINITION line: "Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA, complete cds."?
If a researcher notices that one of the sequences they find online differs from a previous entry, what should they examine?
Which database is a 'derivative database'?
From the following list, pick the quality most associated with RefSeq.
If a PubMed search returns only one paper and you want to read its description, which tag must be present?
Which of these genetic features is present in eukaryotic genes?
Which type of file would be best to describe both the function and code associated with a protein?
Flashcards
Disseminate (in Biology)
To spread or distribute biological data and information widely.
Computer-Readable Data
Biological data available in a format readable by computers.
Allow Data Analysis
Allows for the examination and assessment of biological information.
Tools for Data Extraction
Biological Sequences
Bibliographic Databases
Taxonomic Databases
Nucleotide Databases
Genomic Databases
Protein Databases
Microarray Databases
Primary Databases
Derivative Databases
GenBank
Archival Nature
INSDC
EMBL-Bank
DDBJ
GenBank at NCBI
Accession Number
Version
The Sequence
RefSeq
PubMed
Case Sensitivity
Study Notes
- Biological databases covered include bibliographic, taxonomic, nucleotide, genomic, protein, and microarray databases.
- Bioinformatics is used to keep pace with information growth, discover knowledge, visualize data, and globalize research.
Biological Information Categories:
- Nucleic acids include DNA sequence, genes, gene products (proteins), mutation, gene coding, distribution patterns, and motifs.
- Genomics covers genome, gene structure and expression, genetic map, and genetic disorders.
- RNA sequence includes secondary structure, 3D structure, and interactions.
- Proteins consist of protein sequence, corresponding gene, secondary structure, 3D structure, function, motifs, homology, and interactions.
- Proteomics involves expression profiles and proteins in disease processes.
- Ligands and drugs include inhibitors, activators, substrates, and metabolites.
- Pathways contain molecular networks, biological chain events, regulation, feedback, and kinetic data.
Function categories
- Binding sites, interactions, and molecular action, such as binding and chemical reaction.
- Biological effects through signaling, transport, feedback, and regulation
- Functional relationships involve protein families, motifs, and homologs.
Biological Databases Purpose
- Disseminate biological data and information.
- Provide biological data in computer-readable form.
- Allow for analysis of biological data.
- A database requires a specific tool for searching and data extraction.
- Web pages, books, journal articles, tables, text files, and spreadsheets are not considered databases.
Objects in biology
- Sequences, extended sequences (topologies), domains (sec. structure cartoons), 3D structure
- Diagrams (hydrophobicity profiles, helical circles) and 3D cartoons
Biological Databases and the Web
- Bibliographic data, phylogeny (taxonomy), 3D structures, DNA sequences, and protein sequences are interconnected through keyword, taxonomic, structural, and DNA or protein sequence similarities.
Bibliographic Databases
- Services that abstract the medical and scientific literature became available in machine-readable form in the early 1960s.
- MEDLINE is accessible through EBI.
- PubMed is accessible through NCBI.
- EMBASE is a commercial product.
- BIOSIS is the inheritor of the old Biological Abstracts.
- CAB International maintains abstract databases in agriculture and parasitic diseases.
- AGRICOLA is for the agricultural field.
Taxonomic Databases
- The Taxonomy Browser is a taxonomic database maintained by the NCBI.
- It is hierarchical and sequence-based, aiming to centralize the classification of all organisms
- There are other taxonomy resources: NEWT, The Tree of Life project, Species 2000, International Organization for Plant Information, and Integrated Taxonomic Information System.
Nucleotide Databases
- The International Nucleotide Sequence Database Collaboration (INSDC) is a joint operation by EMBL-Bank at the European Bioinformatics Institute (EBI), the DNA Data Bank of Japan (DDBJ) at the Center for Information Biology (CIB) and GenBank at the National Center for Biotechnology Information (NCBI).
- DDBJ, GenBank and EMBL-Bank exchange new and updated data daily for optimal synchronisation.
- The exchange means all databases should contain the same data, except for sequences added in the last 24 hours.
Genomic Databases
- Genomic databases exist for organisms such as human, rice, rat, and Drosophila.
- Many of these began as conventionally published catalogues of genes or mutations.
- Most of these have been made available in electronic form and new databases have been developed.
- Genomic databases vary in data scope and storage methods.
Protein Databases
- Protein databases can be grouped as simple sequence archives or annotated databases with added information.
- Primary protein sequence databases: UniProtKB/Swiss-Prot
- Specialised protein sequence databases: GOA and ENZYME
- Secondary protein databases: InterPro
- Structure databases: PDB
Microarrays and Gene Expression Databases
- Microarray technology uses resources created by genome projects to answer questions such as: which genes are expressed in specific cell types of an organism, at particular times and under particular conditions?
Biological Databases Inter-connectivity
- Cross-links between biological databases matter because they let researchers connect unknown DNA and protein sequences to structures and bibliographic records.
Databases: Molecules to Systems
- Databases cover the components of biological systems, from nucleotide sequences up to complete systems.
INSDC
- The International Nucleotide Sequence Database Collaboration.
- It involves NCBI at NIH (GenBank), EBI (EMBL-Bank, retrieved through SRS), and CIB at NIG (DDBJ, retrieved through getentry).
Primary Databases
- These feature raw and redundant data.
- Data is submitted, "owned," and updated by experimentalists. Examples: GenBank, EMBL-Bank, DDBJ.
Derivative Databases
- Human-curated derivative databases compile and curate data, for example GEO DataSets and the Structure and Literature databases.
- Computationally-Derived examples: UniGene, HomoloGene, PubChem Compound
- Combination examples: RefSeq, Genome Assembly, Conserved Domain and Structure databases
DDBJ/EMBL/GenBank
- They are primary nucleotide sequence databases that serve as repositories for scientists' data and are archival.
- These databases catalog regulatory and individual genes, large regions with several genes, complete genomes, cDNAs (mRNA), and various RNA types.
- The databases also include genes from different strains, the same gene/organism published by others, or complete/partial genes (regions).
GenBank
- It is NCBI's primary sequence database: an archival nucleotide sequence database.
- GenBank data comes through direct submissions, batch submissions via email (EST, GSS, STS), and FTP accounts.
- Data is shared nightly among GenBank, the DNA Data Bank of Japan (DDBJ), and the European Molecular Biology Laboratory database (EMBL).
NCBI
- The National Center for Biotechnology Information was created in 1988 as part of the National Library of Medicine at NIH.
- It establishes public databases, conducts research in computational biology, develops software tools for sequence analysis, and disseminates biomedical information.
GenBank Divisions
- Traditional divisions are organismal, grouping records by characteristics such as species; bulk divisions are functional, grouping records by sequence type.
- Records are divided into 18 divisions: 12 traditional and 6 bulk.
GenBank Divisions (Traditional)
- PRI (Primate), PLN (Plant and Fungal), BCT (Bacterial and Archaeal), INV (Invertebrate), ROD (Rodent), VRL (Viral), VRT (Other Vertebrate), MAM (Mammalian), PHG (Phage), SYN (Synthetic), ENV (Environmental Samples), UNA (Unannotated)
- Organized by taxonomy
- Direct submissions
- More accurate
GenBank Divisions (Bulk)
- EST (Expressed Sequence Tag), GSS (Genome Survey Sequence), HTG (High Throughput Genomic), STS (Sequence Tagged Site), HTC (High Throughput cDNA), PAT (Patent)
- Organized by sequence type
- Batch submissions
- Less accurate
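The two division lists above can be sketched as a small lookup table. This is only an illustration of the traditional/bulk split described in these notes; the function name `describe_division` is hypothetical and not part of any NCBI tool.

```python
# GenBank division codes, copied from the two lists in these notes.
TRADITIONAL = {
    "PRI": "Primate", "PLN": "Plant and Fungal", "BCT": "Bacterial and Archaeal",
    "INV": "Invertebrate", "ROD": "Rodent", "VRL": "Viral",
    "VRT": "Other Vertebrate", "MAM": "Mammalian", "PHG": "Phage",
    "SYN": "Synthetic", "ENV": "Environmental Samples", "UNA": "Unannotated",
}
BULK = {
    "EST": "Expressed Sequence Tag", "GSS": "Genome Survey Sequence",
    "HTG": "High Throughput Genomic", "STS": "Sequence Tagged Site",
    "HTC": "High Throughput cDNA", "PAT": "Patent",
}


def describe_division(code):
    """Return (full name, kind) for a division code; kind is 'traditional' or 'bulk'."""
    key = code.strip().upper()
    if key in TRADITIONAL:
        return TRADITIONAL[key], "traditional"
    if key in BULK:
        return BULK[key], "bulk"
    raise KeyError(f"unknown GenBank division code: {code!r}")


print(describe_division("env"))
print(describe_division("EST"))
```

The table sizes match the count given in the notes: 12 traditional plus 6 bulk divisions, 18 in total.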
The FlatFile Format in GenBank
- The flat file header is an indexed set of terms.
- The sequence data itself is not indexed; use BLAST to search by sequence.
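To make the header-versus-sequence distinction concrete, here is a minimal sketch that reads the keyword fields from a flat-file-style record and stops at the sequence. It assumes the usual layout in which a keyword occupies the first 12 columns and continuation lines are indented. The sample record is made up for illustration: its accession (`XX000001`), date, length, and sequence are placeholders, and only the DEFINITION text is taken from the quiz question above. `parse_header` is a hypothetical helper, not an NCBI function.

```python
# Toy record: accession, date, and sequence are invented placeholders.
SAMPLE_RECORD = """\
LOCUS       XX000001                  20 bp    mRNA    linear   PLN 01-JAN-2000
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   XX000001
VERSION     XX000001.1
ORIGIN
        1 atggaattca gagttcactt
//
"""


def parse_header(text):
    """Collect header fields, reading keywords from columns 1-12, up to ORIGIN."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("ORIGIN") or line.startswith("//"):
            break  # the sequence section is not part of the indexed header
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            current = keyword
            fields[current] = value
        elif current is not None:
            # A blank keyword column marks a continuation of the previous field.
            fields[current] += " " + value
    return fields


header = parse_header(SAMPLE_RECORD)
print(header["DEFINITION"])
print(header["VERSION"])
```

For real records, an established parser (for example the GenBank reader in Biopython) is a safer choice than hand-rolled column slicing.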
Key Features of GenBank
- It is predominantly a nucleotide sequence database.
- The database is archival
- Stable accession numbers are assigned to each record.
- Data submissions occur directly via the Web, through bulk email submissions, and via FTP for sequencing centers.
- It relies on collaborations with other databases.
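The stable accession number is usually shown together with a version suffix, as in `ACCESSION.VERSION`. The sketch below splits such an identifier and checks whether one identifier is a later version of the same record. It assumes the convention that the accession stays fixed while the version number increases when the sequence changes; the identifiers used are placeholders and both function names are hypothetical.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version); version is None if absent."""
    accession, dot, version = identifier.strip().partition(".")
    if not accession:
        raise ValueError("empty accession")
    if dot and not version.isdigit():
        raise ValueError(f"version must be a whole number: {identifier!r}")
    return accession, (int(version) if dot else None)


def is_revision_of(newer, older):
    """True when both identifiers share an accession and `newer` has a higher version."""
    acc_new, ver_new = split_accession(newer)
    acc_old, ver_old = split_accession(older)
    if ver_new is None or ver_old is None:
        return False  # cannot compare without explicit versions
    return acc_new == acc_old and ver_new > ver_old


print(split_accession("XX000001.2"))
print(is_revision_of("XX000001.2", "XX000001.1"))
```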
Sequence Revision History
- The revision history records the updates made to a record under its accession number.
- The RefSeq sequence database holds non-redundant sequences.
- Validated by hand.
- RefSeq is a derivative sequence database, not a primary one; its nucleotide and protein records are explicitly linked.
RefSeq
- It is NCBI's Derivative Sequence Database that features curated transcripts and proteins
- Features reviewed records for human, mouse, rat, fruit fly, zebrafish, Arabidopsis, and microbial genomes (proteins).
- Includes assembled genomic regions (contigs) for human, mouse, rat, black poplar, zebrafish, cow, and dog.
- Includes reference genomic records and chromosome records.
- Microarrays are used to identify which genes are expressed at particular times and under particular test conditions.
RefSeq Accession Numbers
- Accession series cover mRNAs and proteins as well as genomic records such as contigs and supercontigs.
- The prefix indicates the record type: curated mRNA (NM_123456), curated protein (NP_123456), curated non-coding RNA (NR_123456), predicted protein (XP_123456), reference genomic sequence (NG_123456), and microbial replicon (NC_123456).
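The distinct RefSeq accession series can be recognized from the two-letter prefix before the underscore. The sketch below maps prefixes to record types. The NP, NR, and NG examples come from these notes; the other prefixes (NM, XM, XP, NC, NT, NW) are filled in from the standard RefSeq scheme as an assumption, and `classify_refseq` is a hypothetical helper name.

```python
# Prefix -> record type. Entries beyond NP/NR/NG are assumed from the
# standard RefSeq accession scheme rather than taken from these notes.
REFSEQ_PREFIXES = {
    "NM": "curated mRNA",
    "NP": "curated protein",
    "NR": "curated non-coding RNA",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
    "NG": "reference genomic sequence",
    "NC": "chromosome or microbial replicon",
    "NT": "contig",
    "NW": "supercontig",
}


def classify_refseq(accession):
    """Return the record type for a RefSeq-style accession, or None if not recognized."""
    prefix, underscore, _rest = accession.strip().upper().partition("_")
    if not underscore:
        return None  # no underscore: not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix)


for acc in ("NP_123456", "NR_123456", "XX000001"):
    print(acc, "->", classify_refseq(acc))
```

The underscore is what makes the series "distinct": primary GenBank accessions do not contain one, so the same check separates RefSeq records from archival ones.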
RefSeq Benefits
- Benefits include non-redundancy, explicitly linked nucleotide and protein sequences, reflects current sequence data and biology, data validation, format consistency, a distinct accession series, and stewardship by NCBI staff and collaborators
Finding a protein by its name using PubMed
- First, locate the specialist for bioinformatics.
- Then navigate to PubMed.
- Type dUTPase in the "for" window and click the Go button.
- Look for the affiliated author names.
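The same search can also be expressed as a URL rather than typed into the web form. The sketch below builds a query for NCBI's E-utilities `esearch` endpoint, which is a programmatic alternative not covered in these notes; it only constructs the URL and makes no network request. The helper name `pubmed_search_url` and the `max_results` default are illustrative choices.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an alternative to the PubMed web form).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, max_results=20):
    """Build an esearch URL that queries PubMed for `term`."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


print(pubmed_search_url("dUTPase"))
```

Opening the printed URL in a browser returns a list of matching PubMed IDs; NCBI asks heavy users of E-utilities to follow its usage guidelines.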
Subroutines provided to support a single-minded protein
- FileSave option
- In a medical literature search engine such as PubMed, a blue rectangle on the record links out to the publisher's content, whether that is a scientific article or a book.
- PubMed searches are case-insensitive.
- Display all options