Questions and Answers
What is the primary goal of merging data from various sources in SWISS-PROT?
Which of these is NOT a criterion used by SWISS-PROT to distinguish itself from other protein sequence databases?
Which of the following is considered core data in SWISS-PROT?
What are the three main categories of information found in SWISS-PROT annotation?
What is the primary function of the Protein Information Resource (PIR)?
Which of the following databases is NOT included in the PIR-NREF database?
What is the main purpose of the superfamily classification in PIR-PSD?
Which of the following is NOT a feature of the Martinsried Institute for Protein Sequence (MIPS) database?
What is the purpose of the TREMBL database?
SWISS-PROT always includes complete source code for all proteins in the database.
The superfamily classification in PIR-PSD is based on the assumption that protein families are non-overlapping.
MIPS primarily focuses on collecting and annotating protein sequences from yeast organisms.
One of the key goals of PIR-NREF is to minimize redundancy in protein sequence data by integrating sequences from multiple databases.
The TREMBL database is intended to be a complete and comprehensive source of protein sequences, including all known variants and isoforms.
What is the main reason for the inclusion of TREMBL in SWISS-PROT releases?
Study Notes
SWISS-PROT Database
- SWISS-PROT is a protein sequence database established in 1986 and maintained collaboratively by the Department of Medical Biochemistry at the University of Geneva and the EMBL Data Library.
- Each sequence entry is composed of different line types, each with its own format.
- It follows the EMBL nucleotide sequence database format closely for standardization.
- SWISS-PROT is distinguished from other databases by its annotation, minimal redundancy, and integration with other databases.
Annotation
- SWISS-PROT data includes core data (sequence information, citations, taxonomic data) and annotation.
- Annotation covers protein function, post-translational modifications (e.g., carbohydrates, phosphorylation), domains and sites (e.g., calcium binding, ATP binding), secondary and tertiary structure, similarities to other proteins, diseases associated with deficiencies in the protein, sequence conflicts, and variants.
- Annotation is mainly located in comment lines (CC), feature tables (FT), and keyword lines (KW). Comments are categorized by "topics" for efficient data retrieval.
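The line-type layout described above can be illustrated with a small parser. This is a minimal sketch, not the official tooling: it assumes the usual flat-file convention of a two-letter line code followed by three spaces, CC blocks introduced by `-!- TOPIC:`, and `//` as the entry terminator. The sample entry and the function names are hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical, heavily abbreviated entry written for this example only.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN   STANDARD;   PRT;   10 AA.
CC   -!- FUNCTION: Illustrative function text that
CC       continues on a second line.
CC   -!- SUBCELLULAR LOCATION: Cytoplasmic.
KW   Calcium-binding; Phosphorylation.
FT   MOD_RES       5      5       PHOSPHORYLATION.
//
"""


def group_by_line_code(entry_text):
    """Map each two-letter line code to the list of its line bodies."""
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):  # entry terminator
            break
        if len(line) < 2:
            continue
        # Assumed layout: columns 1-2 hold the code, the body starts at column 6.
        groups[line[:2]].append(line[5:])
    return dict(groups)


def comment_topics(cc_bodies):
    """Collect CC blocks into {topic: text}, joining continuation lines."""
    topics = {}
    current = None
    for body in cc_bodies:
        if body.startswith("-!-"):
            topic, _, text = body[3:].partition(":")
            current = topic.strip()
            topics[current] = text.strip()
        elif current is not None:
            topics[current] += " " + body.strip()
    return topics


def keywords(kw_bodies):
    """Split KW lines into a flat list of keywords."""
    joined = " ".join(kw_bodies).rstrip(".")
    return [k.strip() for k in joined.split(";") if k.strip()]


groups = group_by_line_code(SAMPLE_ENTRY)
topics = comment_topics(groups["CC"])
kws = keywords(groups["KW"])
```

Indexing comments by topic in this way is what makes retrieval by topic cheap: a lookup such as `topics["FUNCTION"]` returns the joined comment text without scanning the whole entry.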
Minimal Redundancy
- SWISS-PROT aims to minimize redundancy by merging sequencing reports and indicating conflicts in the feature table of the corresponding entry.
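The idea of merging several sequencing reports into one entry while recording disagreements can be sketched as follows. This is a simplified illustration under stated assumptions: the reports are the same length (no insertions or deletions), the first report is arbitrarily taken as the displayed sequence, and the function name and reference labels are hypothetical. The real curation process is manual and considerably more involved.

```python
def merge_reports(reports):
    """Merge equal-length sequence reports into one sequence plus conflicts.

    `reports` maps a (hypothetical) reference label to its reported sequence.
    The first report is used as the displayed sequence; that choice is an
    assumption of this sketch, not a rule taken from SWISS-PROT.
    """
    labels = list(reports)
    primary = reports[labels[0]]
    conflicts = []
    for label in labels[1:]:
        other = reports[label]
        if len(other) != len(primary):
            raise ValueError("this sketch only handles equal-length reports")
        # Positions are 1-based, as is conventional for protein sequences.
        for pos, (shown, reported) in enumerate(zip(primary, other), start=1):
            if shown != reported:
                conflicts.append((pos, shown, reported, label))
    return primary, conflicts


# Made-up reports: REF2 disagrees with the others at position 6.
example_reports = {
    "REF1": "MKTAYIAKQR",
    "REF2": "MKTAYLAKQR",
    "REF3": "MKTAYIAKQR",
}
sequence, conflicts = merge_reports(example_reports)
```

Each tuple in `conflicts` corresponds to what would be recorded as a conflict feature on the single merged entry, so one entry replaces three separate reports without losing the disagreement.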
Integration with Other Databases
- SWISS-PROT is cross-referenced with 24 different databases to provide links to related information.
- The cross-referenced databases cover literature, nucleic acid sequences, protein sequences, protein tertiary structures, and specialized data collections.
- Annotation efforts concentrate on selected organisms, including Arabidopsis thaliana, Bacillus subtilis, Drosophila melanogaster, and others.
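Cross-references to other databases can be collected from an entry with a few lines of code. The sketch below assumes they appear on DR lines whose body is a semicolon-separated list (database name first, then identifiers) ending in a period, with `-` used as an empty placeholder. The identifiers shown are invented, and the function name is hypothetical.

```python
def parse_cross_references(dr_bodies):
    """Group DR line bodies into {database: [list of identifier fields]}."""
    refs = {}
    for body in dr_bodies:
        fields = [f.strip() for f in body.rstrip(".").split(";")]
        database = fields[0]
        # Drop empty fields and the "-" placeholder used for missing values.
        identifiers = [f for f in fields[1:] if f and f != "-"]
        refs.setdefault(database, []).append(identifiers)
    return refs


# Made-up DR line bodies (the leading "DR   " has already been stripped).
example_dr = [
    "EMBL; X00001; CAA00001.1; -.",
    "EMBL; X00002; CAA00002.1; -.",
    "PROSITE; PS00001; EXAMPLE_SITE; 1.",
]
xrefs = parse_cross_references(example_dr)
```

Grouping by database name makes it straightforward to answer questions such as "which nucleotide records back this protein entry" with a single dictionary lookup.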
PIR (Protein Information Resource)
- PIR is a resource for protein sequence identification and interpretation.
- It was established in 1984 and has been maintained by PIR-International since 1988.
- This association comprises the Protein Information Resource at the National Biomedical Research Foundation (NBRF), the Japan International Protein Information Database (JIPID), and the Martinsried Institute for Protein Sequences (MIPS).
PIR-PSD (Protein Sequence Database)
- PIR-PSD is an annotated protein database with over 283,000 sequences covering the entire taxonomic range.
- It focuses on superfamily classification, superfamily curation (signature domains, member categorization), and bibliography mapping and attribution.
- Superfamily classification allows sequences to be classified automatically and supports the construction of alignments and phylogenetic trees.
PIR-NREF (Non-redundant REFerence) Database
- PIR-NREF collects protein sequences from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB, totaling over a million entries.
- It identifies identical sequences, identical subsequences, and highly similar sequences (>95% identity) contributed by multiple sources.
- PIR-NREF aids in protein identification and sequence searching across the entire sequence collection or portions of specific genomes.
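The non-redundancy idea can be sketched as a grouping step. This is an illustration only: it assumes that records with the same organism and the same sequence collapse into one entry that remembers every source, and it finds identical subsequences by plain substring matching. The record layout, identifiers, and function names are hypothetical, and the >95% identity grouping is omitted because it would require a sequence alignment.

```python
from collections import defaultdict


def build_nonredundant(records):
    """Collapse records that share organism and sequence into one entry.

    `records` is an iterable of (source_db, entry_id, organism, sequence).
    Returns {(organism, sequence): [(source_db, entry_id), ...]}.
    """
    merged = defaultdict(list)
    for source_db, entry_id, organism, sequence in records:
        merged[(organism, sequence)].append((source_db, entry_id))
    return dict(merged)


def identical_subsequences(merged):
    """List (shorter_key, longer_key) pairs where one sequence contains the other."""
    keys = list(merged)
    pairs = []
    for shorter in keys:
        for longer in keys:
            if len(shorter[1]) < len(longer[1]) and shorter[1] in longer[1]:
                pairs.append((shorter, longer))
    return pairs


# Made-up records from several source databases.
example_records = [
    ("PIR-PSD", "A0001", "Homo sapiens", "MKTAYIAKQR"),
    ("SWISS-PROT", "P00001", "Homo sapiens", "MKTAYIAKQR"),
    ("TrEMBL", "Q00001", "Homo sapiens", "TAYIAK"),
    ("RefSeq", "NP_000001", "Mus musculus", "MKTAYIAKQR"),
]
nref = build_nonredundant(example_records)
subseq_pairs = identical_subsequences(nref)
```

Here the two human records with the same sequence become a single entry listing both sources, while the mouse record stays separate because the grouping key includes the organism.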
TREMBL (TRanslation from EMBL)
- The TREMBL database is a supplement to SWISS-PROT. It contains translations of coding sequences from the EMBL nucleotide sequence database that have not yet been incorporated into SWISS-PROT, which lets SWISS-PROT maintain its annotation quality.
- TREMBL is split into SP-TREMBL and REM-TREMBL.
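- SP-TREMBL holds entries that are expected to be incorporated into SWISS-PROT eventually, merged with existing entries where needed to keep redundancy minimal.
- REM-TREMBL holds the remaining entries that are not intended for SWISS-PROT, including sections on immunoglobulins, T-cell receptors, and synthetic or incomplete sequences.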