Questions and Answers
What is the primary goal of merging data from various sources in SWISS-PROT?
- To prioritize data from specific sequencing reports
- To reduce redundancy and provide a comprehensive view of protein information (correct)
- To create a separate entry for each literature report
- To eliminate conflicting information from the database
Which of these is NOT a criterion used by SWISS-PROT to distinguish itself from other protein sequence databases?
- Minimal redundancy
- Integration with other databases
- Annotation
- Complete source code availability (correct)
Which of the following is considered core data in SWISS-PROT?
- Sequence data (correct)
- Post-translational modifications
- Similarities to other proteins
- Function of the protein
What are the three main categories of information found in SWISS-PROT annotation?
What is the primary function of the Protein Information Resource (PIR)?
Which of the following databases is NOT included in the PIR-NREF database?
What is the main purpose of the superfamily classification in PIR-PSD?
Which of the following is NOT a feature of the Martinsried Institute for Protein Sequence (MIPS) database?
What is the purpose of the TREMBL database?
SWISS-PROT always includes complete source code for all proteins in the database.
The superfamily classification in PIR-PSD is based on the assumption that protein families are non-overlapping.
MIPS primarily focuses on collecting and annotating protein sequences from yeast organisms.
One of the key goals of PIR-NREF is to minimize redundancy in protein sequence data by integrating sequences from multiple databases.
The TREMBL database is intended to be a complete and comprehensive source of protein sequences, including all known variants and isoforms.
What is the main reason for the inclusion of TREMBL in SWISS-PROT releases?
Flashcards
SWISS-PROT
An annotated protein sequence database
1986
Year SWISS-PROT was established.
Annotation
Describing protein characteristics like function, modifications, and similarities.
Minimal redundancy
Integration with other databases
Core data
Annotation data
Sequence entries
EMBL nucleotide sequence database
PIR
Protein Information Resource (PIR)
NBRF
PIR 1-4
Study Notes
SWISS-PROT Database
- SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively by the Department of Medical Biochemistry at the University of Geneva and the EMBL Data Library.
- Each sequence entry is composed of lines of different types, each with its own format.
- For standardization, the entry format closely follows that of the EMBL nucleotide sequence database.
- SWISS-PROT is distinguished from other databases by its annotation, minimal redundancy, and integration with other databases.
Annotation
- SWISS-PROT data includes core data (sequence information, citations, taxonomic data) and annotation.
- Annotation covers protein function, post-translational modifications (e.g., carbohydrates, phosphorylation), domains and sites (e.g., calcium binding, ATP binding), secondary and tertiary structure, similarities to other proteins, diseases associated with deficiencies in the protein, sequence conflicts, and variants.
- Annotation is mainly located in comment lines (CC), feature tables (FT), and keyword lines (KW). Comments are categorized by "topics" for efficient data retrieval.
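To make the line-type layout concrete, here is a minimal Python sketch that groups the lines of one flat-file entry by their two-letter line code and then picks out the annotation lines (CC, FT, KW). It assumes the usual layout in which the code occupies the first two columns and the content starts at column 6. The sample entry and the function names are hypothetical and are not taken from a real database record.

```python
from collections import defaultdict

# Line codes that carry annotation, as listed in the notes above.
ANNOTATION_CODES = {"CC", "FT", "KW"}


def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code.

    Assumption: the code sits in columns 1-2 and the content starts at column 6.
    Sequence data lines have a blank code and are grouped under "  ".
    """
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip() or line.startswith("//"):
            continue  # skip blank lines and the entry terminator
        code, content = line[:2], line[5:].rstrip()
        groups[code].append(content)
    return dict(groups)


def annotation_lines(groups):
    """Keep only the comment, feature-table, and keyword lines."""
    return {code: lines for code, lines in groups.items() if code in ANNOTATION_CODES}


# Hypothetical entry, invented for illustration only.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN   STANDARD;   PRT;   10 AA.
CC   -!- FUNCTION: Hypothetical example protein.
FT   CONFLICT      3      3       T -> G (IN REF. 2).
KW   Calcium-binding; Phosphorylation.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""

groups = group_by_line_code(SAMPLE_ENTRY)
annotation = annotation_lines(groups)
```

The topic name in a comment line (FUNCTION in the sample) is what makes topic-based retrieval possible: a search tool can match on the text between "-!-" and the colon.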
Minimal Redundancy
- SWISS-PROT aims to minimize redundancy by merging sequencing reports and indicating conflicts in the feature table of the corresponding entry.
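The merging idea can be sketched as follows. This is a simplified illustration, not the procedure the SWISS-PROT curators use: it assumes all reports have the same length (no insertions or deletions), takes the first report as the reference, and records every differing position as a conflict. The function name and the reference labels are hypothetical.

```python
def merge_reports(reports):
    """Merge several sequencing reports of one protein into a single entry.

    reports: list of (reference_label, sequence) tuples; the first one is
    treated as the reference sequence (an assumption of this sketch).
    Returns (merged_sequence, conflicts), where each conflict is a tuple
    (position, reference_residue, reported_residue, reference_label)
    and positions are 1-based.
    """
    (_, merged), *others = reports
    conflicts = []
    for label, sequence in others:
        if len(sequence) != len(merged):
            raise ValueError("this sketch only handles equal-length reports")
        for position, (ref_res, other_res) in enumerate(zip(merged, sequence), start=1):
            if ref_res != other_res:
                conflicts.append((position, ref_res, other_res, label))
    return merged, conflicts


merged, conflicts = merge_reports([
    ("REF. 1", "MKTAYIAKQR"),
    ("REF. 2", "MKGAYIAKQR"),
])
```

Each conflict tuple holds what a feature-table line would need to state: the position, the two residues, and the report that disagrees.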
Integration with Other Databases
- SWISS-PROT is cross-referenced with 24 different databases to provide links to related information.
- The databases referenced often contain literature, nucleic acid sequences, protein sequences, protein tertiary structures, and specialized data collections.
- Annotation efforts focus on selected model organisms, including Arabidopsis thaliana, Bacillus subtilis, and Drosophila melanogaster, among others.
PIR (Protein Information Resource)
- PIR is a resource for protein sequence identification and interpretation.
- It was established in 1984 and has been maintained by PIR-International since 1988.
- PIR-International is an association of the Protein Information Resource at the National Biomedical Research Foundation (NBRF), the Japan International Protein Information Database (JIPID), and the Martinsried Institute for Protein Sequences (MIPS).
PIR-PSD (Protein Sequence Database)
- PIR-PSD is an annotated protein database with over 283,000 sequences covering the entire taxonomic range.
- It focuses on superfamily classification, superfamily curation (signature domains, member categorization), and bibliography mapping and attribution.
- This enables automated classification of sequences and permits creating alignments and phylogenetic trees.
PIR-NREF (Non-redundant REFerence) Database
- PIR-NREF collects protein sequences from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB, totaling over a million entries.
- For each entry it lists related sequences from the source databases: identical sequences, identical subsequences, and highly similar sequences (>95% identity).
- PIR-NREF aids in protein identification and sequence searching across the entire sequence collection or portions of specific genomes.
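The core of a non-redundant collection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PIR-NREF pipeline: records with the same sequence from the same organism are collapsed into one entry that remembers every source database and accession. The function name and the accessions are hypothetical, and the >95% identity neighbor search is not covered here.

```python
from collections import defaultdict


def build_nonredundant(records):
    """Collapse records that share organism and sequence into one entry.

    records: iterable of (database, accession, organism, sequence) tuples.
    Returns a dict mapping (organism, sequence) to the list of
    (database, accession) pairs that reported it.
    """
    merged = defaultdict(list)
    for database, accession, organism, sequence in records:
        merged[(organism, sequence)].append((database, accession))
    return dict(merged)


# Hypothetical records, invented for illustration only.
records = [
    ("SWISS-PROT", "X00001", "Homo sapiens", "MKTAYIAKQR"),
    ("PIR-PSD", "Y00001", "Homo sapiens", "MKTAYIAKQR"),
    ("GenPept", "Z00001", "Mus musculus", "MKTAYIAKQR"),
]
nonredundant = build_nonredundant(records)
```

Keying on the exact sequence keeps the merge cheap; finding highly similar neighbors would need a pairwise alignment step on top of this.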
TREMBL (TRanslation from EMBL)
- The TREMBL database is a supplement to SWISS-PROT. It contains translations of coding sequences from the EMBL nucleotide sequence database that have not yet been incorporated into SWISS-PROT, so that SWISS-PROT can maintain its quality standards without delaying access to new sequences.
- TREMBL is split into SP-TREMBL and REM-TREMBL.
- SP-TREMBL contains entries that are expected to be merged into SWISS-PROT eventually, which keeps redundancy minimal.
- REM-TREMBL contains the remaining, unmerged entries, including sections on immunoglobulins, T-cell receptors, and synthetic or incomplete sequences.
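The translation step behind the name can be illustrated with a short Python sketch. It assumes the standard genetic code, a coding sequence that starts in frame, and only unambiguous bases. It is not the software used to build TREMBL, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame coding sequence up to the first stop codon.

    Raises KeyError for codons with ambiguous bases such as N.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[start:start + 3]]
        if amino_acid == "*":
            break  # stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)


protein = translate_cds("ATGAAAACCGCGTAA")
```

In this example the codons ATG, AAA, ACC, and GCG translate to M, K, T, and A, and TAA ends the translation.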