Questions and Answers
Protein synthesis is a process that occurs in three steps: Transcription, Splicing, and Translation.
False
UniProtKB is a central repository of protein sequences and is supported by a collaboration between EBI, the Swiss Institute of Bioinformatics, and the Protein Information Resource (PIR).
True
Post-translational modifications refer to the changes that occur to proteins after they have been synthesized, transforming them into mature proteins.
True
Motifs or profiles databases contain exhaustive primary sequences of proteins with no abstractions or patterns.
Generalist databases, such as UniProtKB, only include sequences from very specifically defined sources.
There is only one type of protein sequence database available for researchers.
The quality level of annotation in databases like UniProtKB can vary between manual and automatic entry.
The primary sequence of proteins is unrelated to annotations and cross-references in databases.
PAM matrices represent evolutionary information based on distant protein relationships.
BLOSUM62 is derived from sequences clustered at 62% identity or greater.
Higher numbers in BLOSUM matrices indicate more evolutionary distance between sequences.
PAM250 corresponds to a residue identity of 45% between proteins.
BLOSUM1 corresponds to 1% identity and evaluates highly diverse protein alignments.
PAM1 corresponds to a residue identity of 99%.
The BLOSUM matrices are derived from individual sequences without any clustering.
PAM matrices are extrapolated from PAM1 to represent various evolutionary distances.
Introducing a gap in sequence alignment results in a negative score penalty.
The identity matrix for protein similarity uses a score of 1 for different amino acids.
Substitution models evaluate the likelihood of one specific amino acid replacing another during mutation.
1 PAM is defined as the time it takes for 1 out of 100 amino acids to mutate.
The Dayhoff Mutation Data Matrix is based on inferred evolutionary distances derived from genome sequencing.
The PAM matrix product allows for inference of homology in proteins beyond the twilight zone.
Gaps introduced in sequence alignments are beneficial as they eliminate the need for substitution models.
Scores in substitution models are based exclusively on the identity of the amino acids involved.
PDB format is advantageous because it is rarely supported by the majority of tools.
A significant disadvantage of the PDB format is the absolute limits on the size of certain items of data.
The mmCIF format was developed to simplify the handling of complicated structure data.
One disadvantage of the mmCIF format is that it is easily readable by humans and computers.
A notable feature of PDB format is its consistency across individual entries.
The mmCIF format is more suitable for accessing individual entries compared to the PDB format.
Hydrogen bonding and active sites are part of the data captured in the PDB format.
The maximum number of chains allowed in the PDB format is over 30.
R-factor should always be ≤ 0.4 for reliable models.
DRESS and RECOORD web servers provide improved versions of NMR models.
Local errors in a structure are indicated by residue B-factors < 50.
Predictions of atomic resolution in NMR structures can be made using the ResProx tool.
No guidelines exist for selecting NMR structures unlike X-ray structures.
Quality checks involve only comparisons against high-resolution structures of nucleic acids.
A structure showing a high number of outliers is likely to be problematic.
B-factor values are irrelevant for assessing the reliability of a structure.
The Ramachandran plot is used to check the stereochemical quality of protein structures by plotting the Ψ versus the Φ main chain torsion angles.
In a well-defined protein structure, residues are typically dispersed in the 'disallowed' regions of the Ramachandran plot.
Bad atom-atom contacts in protein structures are defined as two nonbonded atoms that have a center-to-center distance greater than the sum of their van der Waals radii.
Counts of unsatisfied hydrogen bond donors are a parameter evaluated in validating protein structures.
A real space R-factor is used to express how poorly each residue fits its electron density in a protein structure.
Knowledge-based potentials assess how 'happy' each residue is in its local environment according to predefined criteria.
The databases EDS and PDBREPORT provide pre-computed quality criteria for every structure in the Protein Data Bank (PDB).
Poorly defined protein structures generally show residues clustered tightly in the most favored regions of the Ramachandran plot.
Protein synthesis includes four steps: Transcription, Splicing, Translation, and Elimination.
Bioinformatics relies on databases that can provide sequences from any source, such as UniProtKB.
PAM and BLOSUM matrices are interchangeable for evaluating amino acid substitutions across all evolutionary distances.
The dynamic programming algorithm used for sequence alignments is optimized for both global and local alignments.
Post-translational modifications occur before protein synthesis is completed, altering proteins into their mature forms.
Transmembrane beta-strand barrels (TMB) typically contain 10 - 30 residues.
UniProtKB annotations may vary in quality depending on whether they are created manually or automatically.
Word-based methods for sequence alignments guarantee optimal alignments each time they are applied.
In the context of multiple sequence alignments, progressive methods begin by aligning the least similar sequences first.
Low-quality B-factor values indicate that residues are likely stable in their local environment.
The positive-inside rule indicates that positively charged residues are more prevalent in loop regions outside the membrane.
Diagonal transitions in the dynamic programming matrix represent gaps in the sequence alignment.
Motif databases derive information solely from full primary sequences without abstract representation.
PDB format is a flexible format that allows variable lengths for its entries.
The Ramachandran plot illustrates the steric arrangement of amino acid residues based on the angles of the main chain torsion.
Hydrophobicity analysis is particularly useful for predicting transmembrane beta-strand barrels.
The final alignment in dynamic programming corresponds to the path in the matrix that minimizes the score.
Methods for solubility and expressability prediction do not rely on machine learning techniques.
Back-tracing in sequence alignment starts from the top-left corner of the scoring matrix.
The mmCIF format is specifically designed to complicate the handling of structure data.
Gaps in sequence alignments are always beneficial as they improve alignment scores.
The substitution model scores are based solely on the identity of the corresponding amino acids.
The PDB format is the least supported format for 3D structure data representation.
ResProx tool is used to make predictions about atomic resolution in NMR structures.
Using an identity matrix, a score of 1 is assigned when two different amino acids are present.
The Dayhoff Mutation Data Matrix is based on a large sample of observed mutations for estimating evolutionary distances.
A gap in sequence alignment is treated as a positive score penalty to encourage shorter alignments.
Evolutionary distance in PAM is measured as the time for 1 out of 100 amino acids to remain unchanged.
The PAM250 matrix represents a scenario where the proteins considered have approximately 45% residue identity.
Substitution models assess the probability of observing mutations without considering evolutionary relations.
The introduction of more gaps in sequence alignment can enhance the accuracy of biologically meaningful alignments.
A Markov chain model is utilized to derive the PAM matrix product, which helps infer protein homology.
The maximum number of atom records in a PDB file is limited to 99,999.
The mmCIF format is rarely supported by visualization and computational tools.
PDB format is deemed suitable for computer extraction of information due to its consistency.
Each field of information in the mmCIF format is linked to other fields using a designated syntax.
PDB format allows for a maximum of 30 chains in a single file.
Inconsistencies within a single PDB entry include different residue numbering in the SEQRES and ATOM sections.
The advantages of the PDB format include being difficult to read and use.
The mmCIF format is suitable for accessing individual entries as it is easily readable.
In a Ramachandran plot, residues of a well-defined protein structure are typically dispersed in the 'disallowed' regions.
Bad atom-atom contacts are defined as two nonbonded atoms with a center-to-center distance less than the sum of their van der Waals radii.
Hydrogen bonding energies are not assessed during the validation of protein structures.
The Ramachandran plot is only useful in evaluating RNA structures, not protein structures.
A high number of unsatisfied hydrogen bond donors in a protein structure is a sign of good structural quality.
The real space R-factor is a metric that expresses how well each residue fits its electron density.
Knowledge-based potentials evaluate how 'unhappy' each residue is in its local environment, indicating a problematic overall structure.
All major databases provide pre-computed quality criteria for every structure in the Protein Data Bank (PDB).
Alternative splicing can result in multiple isoforms of proteins that share identical sequences.
Evolutionary information can enhance the accuracy of predictions related to protein properties.
The process of sequence alignment aims to assess the differences exclusively without considering evolutionary relationships.
Darwinian evolution posits that variations that enhance an individual's biological fitness will likely be inherited by future generations.
The assumption of large inter-individual differences is essential for Darwinian evolutionary theory.
Homology can be inferred solely from matching the primary sequences of proteins without any additional information.
Proteins can exhibit properties such as transmembrane regions solely based on their secondary structure.
Speciation is a direct result of the accumulation of inherited variations over time due to natural selection.
Function is solely dictated by sequence without regard for 3D structure.
Selective pressure operates primarily at the sequence level in proteins.
Homologous proteins arise from genes that evolved from a common ancestor.
Innovation in proteins occurs exclusively through large-scale genetic changes.
3D structures of proteins are unaffected by their amino acid sequences.
Adaptation in proteins leads to improved function in a given environment.
Mutations cannot be passed down to subsequent generations.
The sequence-structure-function paradigm emphasizes the relationship between these three aspects in proteins.
Protein synthesis involves processes including transcription, splicing, and translation, followed by post-translational modifications to form mature proteins.
UniProtKB is exclusively a specialist database that focuses solely on sequences from a limited biological pathway.
Motifs or profiles databases do not provide abstracted information from primary sequences of proteins.
Post-translational modifications occur prior to the synthesis of proteins and are essential for their final functional state.
The quality of annotations in databases like UniProtKB is only determined by automatic processes, with no human intervention.
BLOSUM matrices are used to evaluate evolutionary information based on proteins that share at least 62% identity.
Multiple databases such as WormBase exclusively provide exhaustive primary sequences without any additional annotations.
The mmCIF format is specifically designed to restrict access to individual entries, unlike PDB format.
A pairwise alignment technique is only associated with global alignments.
Local alignments only consider similarity across the entire sequence of proteins.
Substitution scores in amino-acid alignments are fixed and do not vary.
Homologous proteins are those that share structural, functional, or sequence similarities regardless of their evolutionary background.
Iterative methods are the only techniques used for multiple sequence alignments.
Gaps in sequence alignments receive a positive score, encouraging their introduction.
The purpose of a substitution matrix in sequence alignment is to optimize the total alignment score by pairing amino acids.
The concept of homology in proteins is irrelevant to their structure and function.
The dynamic programming algorithm for sequence alignments allows back-tracing from the top-left corner of the matrix for global alignment.
Progressive methods for multiple sequence alignments first align the most divergent sequences before adding similar ones.
Word methods in sequence alignment guarantee an optimal alignment by matching short non-overlapping sequence stretches.
In local alignment, the Smith & Waterman algorithm allows back-tracing from any position in the alignment matrix.
Dynamic programming algorithms for sequence alignments are known for being computationally efficient at all times.
The relative positions of matching regions in word methods define an offset, which is the sum of corresponding coordinates.
Substitution models assess the likelihood of residue pairs being aligned based solely on their sequence identity.
Access to the PDB archive is available only through paid subscriptions.
Systematic errors in model structures contribute to the overall accuracy of the data.
Most structures in the PDB are of high quality, typically containing only systematic errors.
Completely wrong structures can be caused by misinterpretation of the electron density map.
All structures in the PDB are guaranteed to be correct and free from any type of error.
Sequence-based and text-based queries are available through the wwPDB sites.
Random errors are less common than systematic errors in structural models.
Quality checks on structures require critical assessment before being used for specific purposes.
Study Notes
Bioinformatics Protein Sequences and Databases
- Bioinformatics protein sequences and databases are a crucial area of study
- Databases are used to store and manage protein sequences
Structure Prediction
- Artificial intelligence (AI) algorithms (like AlphaFold) can predict protein 3D structures from their amino acid sequences
- Google DeepMind developed AlphaFold
- This is a significant advancement in protein science
Protein Synthesis
- Protein synthesis occurs in two steps:
- Transcription: DNA → RNA
- Translation: mRNA → protein
- Post-translational modifications occur after translation, transforming the protein into a mature form.
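The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation: the function names are hypothetical, the input is assumed to be the coding strand, and the codon table covers only the codons used in the example (a full table has 64 entries).

```python
# Deliberately incomplete codon table: only the codons needed for the example.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "UGG": "W", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(coding_dna: str) -> str:
    """Transcription: coding-strand DNA -> mRNA (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translation: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCTTGGAAATAA")
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # MAWK
```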
Levels of Protein Structure
- Primary structure: sequence of amino acids
- Secondary structure: local folding patterns (alpha-helices, beta-sheets)
- Tertiary structure: overall 3D folding of the polypeptide chain
- Quaternary structure: arrangement of multiple polypeptide chains in a protein complex
Sources of Protein Sequences
- UniProtKB: comprehensive generalist database of protein sequences and annotations, including manual annotations
- Maintained by the UniProt consortium (EBI, the Swiss Institute of Bioinformatics, and the Protein Information Resource)
- Includes primary sequences, biological annotations, quality level, and cross-references to other databases
- WormBase: specialist resource focused on Caenorhabditis elegans and related nematodes
- Pfam: focuses on patterns within proteins; captures the most conserved features among related proteins
UniProtKB
- Contains reviewed protein entries (SwissProt) and automatic entries (TrEMBL)
- High-quality manual annotations provide reliable information
- ~570,000 curated protein records (2024)
- ~250,000,000 automatically translated protein records of lower quality (2024)
- Rich human-readable information about functions, names, taxonomy, subcellular locations, phenotypes, PTMs, expression, interactions, family/domains, sequence, and similar proteins.
UniProtKB - Specific Features
- Human-readable explanations of protein function and information about enzymatic parameters (activity, kinetics)
- Pathways and biological interpretations
- Access to mutations and their effect on protein activity
- Displays available 3D structures, links to AlphaFold predictions and other 3D structure databases
UniProtKB - Access
- Each protein entry has a unique accession number.
- Serial suffixes on the accession number identify isoforms (P21397-1, P21397-2).
Summary of 1D Predictions
- Primary sequence information can predict various properties of proteins like solvent accessibility, solubility, transmembrane regions, and secondary structure.
- Evolutionary information enhances accuracy of prediction methods.
Introduction to Sequence Alignment
- Alignments model similarities and differences between protein sequences to identify conserved regions
- Evolutionary relationships (homology) can be inferred from sequence similarity
A Few Words on Evolution
- Species develop through natural selection.
- Variations are small and heritable, and they contribute to evolutionary fitness.
- Selection pressure favors traits that improve function, allowing better survival and reproduction. This can lead to speciation.
- Evolutionary adaptation is a key concept.
A Few Words on Molecular Evolution
- Function is determined by 3D shape.
- Adaptations are influenced by environmental pressure.
- Structure is determined by the sequence of a protein.
Sequence, Structure, Function Paradigm
- 3D structure is determined by amino acid sequence.
- Function is determined by 3D structure of the protein.
- This is a core principle of protein science.
A Few Words on Molecular Evolution (2)
- Evolutionary changes occur at the sequence level as mutations.
- Through natural selection, selective pressure favors variations that enhance function and adaptation.
A Few Words on Molecular Evolution (3)
- Homology means two proteins trace back to a common ancestor.
- Paralogs are homologous proteins that evolved from the same ancestral gene through duplication.
Sequence Alignments
- Alignments aim to align similar regions of different protein sequences.
- Global alignments = consider similarity of the entire sequence.
- Local alignments = consider similarity in small parts of the sequence.
- Pairwise alignments = compare two sequences
- Multiple sequence alignments = align more than two sequences
- Methods for performing these alignments include dynamic programming, progressive methods, and iterative methods
- Algorithms like Needleman-Wunsch and Smith-Waterman
- Dot plot methods
- Word methods
- Matrices are used to score similarity (identity, substitution models)
- Matrices like Dayhoff’s PAM matrices and the BLOSUM matrices can help with scoring.
Dynamic Programming Algorithm
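- Finds the best-scoring pairwise alignment by filling in a two-dimensional scoring matrix.
- Each dimension corresponds to one of the proteins to be aligned. Each cell holds the best score for aligning the two sequences up to that position, based on substitution and gap scores.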
- Diagonal transitions show aligned positions, vertical and horizontal transitions identify gaps.
- The optimal path in the matrix shows the best match.
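The matrix fill and traceback described above can be sketched as a small Needleman-Wunsch (global alignment) implementation. The scoring values here (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions; in practice a substitution matrix such as PAM or BLOSUM would supply the match and mismatch scores.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # diagonal: aligned pair
                score[i - 1][j] + gap,            # vertical: gap in b
                score[i][j - 1] + gap,            # horizontal: gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The nested loop visits every cell, so time and memory grow with the product of the two sequence lengths. This is why dynamic programming gives the optimal alignment for the chosen scoring scheme but becomes expensive for long sequences or database searches.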
Word Methods
- Small sequence-stretches (k-tuples or words) in the query are matched across target sequences.
- The position of matches defines alignment offsets.
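As a rough sketch of the idea, the snippet below indexes all words of length k in a target sequence and counts the offsets at which query words match. The offset is taken here as the difference between the query and target start positions, the word length k=3 is an arbitrary choice, and the function name is hypothetical. Unlike dynamic programming, this heuristic is fast but does not guarantee an optimal alignment.

```python
from collections import Counter, defaultdict

def word_offsets(query, target, k=3):
    """Count matching k-letter words per offset (query pos - target pos)."""
    # Index every word in the target by its start position(s).
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)

    offsets = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            offsets[i - j] += 1
    return offsets

hits = word_offsets("MKTAYIAK", "GGMKTAYIAK")
print(hits.most_common(1))  # [(-2, 6)]
```

Many word hits sharing the same offset lie on one diagonal of the alignment matrix, which marks a candidate region worth aligning in detail.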
Multiple Sequence Alignments (MSA)
- MSA methods include dynamic programming, progressive methods, and iterative methods.
- Progressive methods start by aligning the most similar pairs and then add less similar sequences step by step, accounting for sequence length variations.
- T-Coffee improves this process by using additional information from pairwise alignments.
Beyond Pure Sequences: Patterns and Models
- Aligned sequences help define patterns for searches in databases.
- Useful tools include Position-Specific Scoring Matrices and Hidden Markov Models.
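To make the first of these tools concrete, here is a minimal sketch of a position-specific scoring matrix built from a gap-free toy alignment. The uniform background frequency (1/20), the pseudocount of 1, and the function names are simplifying assumptions for illustration; real tools use observed background frequencies and sequence weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background

def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per column mapping amino acid -> log2-odds score."""
    pssm = []
    for col in range(len(alignment[0])):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(s[col] for s in alignment if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_segment(pssm, segment):
    """Sum the column scores for a segment of the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

pssm = build_pssm(["MKV", "MKI", "MRV"])
print(score_segment(pssm, "MKV") > score_segment(pssm, "AAA"))  # True
```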
Secondary Structure Prediction
- Prediction of the conformational state of each amino acid residue.
- Common states include helix (H), strand (S), and coil (C)
- Example prediction software: PSIPRED
Solvent Accessibility Prediction
- Predict the extent to which a residue is accessible to solvent.
- Accessibility values vary between amino acid types.
- A simplified description distinguishes exposed from buried residues.
Solubility and Expressability Prediction
- Predicting solubility in an expression system or propensity for aggregation.
- Major methods rely heavily on machine-learning.
Transmembrane Region Prediction
- Predicts the membrane-spanning regions of transmembrane (TM) proteins.
- TMH (transmembrane helix): 12–35 residues.
- TMB (transmembrane beta-strand): 10–25 residues.
- Hydrophobicity is not helpful for TMB prediction; analysis of charged residues and the positive-inside rule are used.
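For transmembrane helices, a classic first-pass approach is a sliding-window hydropathy profile. The sketch below is not taken from the notes above: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6, which are commonly cited defaults. The toy sequence and function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Average hydropathy for every full window along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

def candidate_tm_windows(seq, window=19, cutoff=1.6):
    """Start indices of windows whose average hydropathy exceeds the cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value > cutoff]

toy = "DDKK" + "L" * 20 + "KKDD"   # hydrophobic core flanked by charges
print(len(candidate_tm_windows(toy)) > 0)    # True
print(candidate_tm_windows("DEKR" * 10))     # []
```

This kind of profile only suggests helical candidates. Beta-strand segments alternate hydrophobic and polar residues, so a window average like this does not single them out.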
Structural Databases
- A variety of databases exist for storage and analysis.
- Examples include wwPDB
Data Formats
- PDB
- mmCIF
- PDBML
- All three formats contain atom coordinates and associated data.
PDB Format
- Designed in the early 1970s.
- Rigid structure, 80 characters per line.
- Most widely supported.
- Contains atom coordinates, biological features and experimental details.
- Advantages: easy to read and use, supported by the majority of tools, and individual records are often easy to access.
- Disadvantages: inconsistency between records, limited information, and data constraints.
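Because the format is column-based, an ATOM record can be read with fixed string slices. The sketch below builds one illustrative 80-character record piece by piece (the coordinate values are made up) and parses it using the standard column positions; the function name is hypothetical.

```python
# One illustrative ATOM record, assembled field by field (1-based columns).
SAMPLE_ATOM = (
    "ATOM  "       # 1-6   record name
    "    1"        # 7-11  atom serial number
    " "            # 12    blank
    " N  "         # 13-16 atom name
    " "            # 17    alternate location
    "MET"          # 18-20 residue name
    " "            # 21    blank
    "A"            # 22    chain identifier
    "   1"         # 23-26 residue sequence number
    " "            # 27    insertion code
    "   "          # 28-30 blank
    "  27.340"     # 31-38 x (angstroms)
    "  24.430"     # 39-46 y
    "   2.614"     # 47-54 z
    "  1.00"       # 55-60 occupancy
    "  9.67"       # 61-66 B-factor (temperature factor)
    "          "   # 67-76 blank
    " N"           # 77-78 element symbol
    "  "           # 79-80 charge
)

def parse_atom_record(line):
    """Parse the main fields of a PDB ATOM record using fixed columns."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

atom = parse_atom_record(SAMPLE_ATOM)
print(atom["res_name"], atom["chain_id"], atom["x"], atom["b_factor"])
```

The fixed field widths are also the source of the data constraints mentioned above: a field cannot hold more characters than its column range allows.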
mmCIF Format
- Designed to handle complex data; contains each item of info explicitly assigned by a tag, linked by syntax
- Advantages: Computer parsable, database-wide consistency
- Disadvantage: Difficult to read.
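The tag-based layout can be illustrated with a deliberately small reader for single-line `_category.attribute value` items. This is only a sketch: it ignores `loop_` tables, multi-line values, and comments, and the sample values (including the placeholder entry ID) are made up. For real files a dedicated mmCIF parser should be used.

```python
import shlex

SAMPLE_MMCIF = """\
data_XXXX
_entry.id XXXX
_exptl.method 'X-RAY DIFFRACTION'
_cell.length_a 50.840
"""

def parse_simple_items(text):
    """Collect single-line tag/value pairs keyed by (category, attribute)."""
    items = {}
    for line in text.splitlines():
        if not line.startswith("_"):
            continue  # skip data_ headers and anything that is not a tag
        tokens = shlex.split(line)  # handles quoted values with spaces
        if len(tokens) != 2:
            continue  # not a simple tag/value line
        tag, value = tokens
        category, _, attribute = tag[1:].partition(".")
        items[(category, attribute)] = value
    return items

items = parse_simple_items(SAMPLE_MMCIF)
print(items[("exptl", "method")])   # X-RAY DIFFRACTION
print(items[("cell", "length_a")])  # 50.840
```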
Structural Databases (2)
- wwPDB: Extensive database of 3D structures (proteins, nucleic acids, oligosaccharides)
- Members: RCSB PDB, EMDB, PDBe, PDBj, BMRB
- Includes data from X-ray crystallography, NMR, and electron microscopy.
- Other databases include PDBsum, SCOP, Proteopedia, and the Structural Biology KnowledgeBase
wwPDB - Data Deposition
- All data are deposited in a single central repository
- Common standards are used ensuring consistency and quality of structures.
- PDB-IDs are unique identifiers for each structure.
wwPDB - Data Access
- Free access to the archive's resources
- Sites distribute archives that are updated regularly.
- Different interfaces facilitate varied searches
Structural Quality Assurance
- Structures are models that satisfy experimental data.
- These models can have random and systematic errors.
- Quality tests are crucial for identifying errors before the data are interpreted.
Errors in Deposited Structures
- Systematic errors affect the accuracy of the model relative to the true structure.
- Examples include misinterpretation of electron density, chain-tracing errors, and misinterpreted spectra
- Random errors are uncertainties in atomic positions.
- Examples include flipped side-chains or imprecise atom positions
Examples of Systematic Errors
- Completely wrong structures: tracing the protein chain through electron density yields an entirely incorrect fold.
- Incorrect connectivity between secondary elements: a false order of secondary components, and residues in improper places in the 3D model
- "Frame-shifts": Residues fitted into electron density of the next residue resulting in incorrect structure interpretation
Examples of Random Errors
- Uncertainties in atomic positions (e.g., 0.01–1.27 Å)
- Side-chain flips: symmetry in some amino acid side-chain shapes can lead to incorrect placements.
Rules of Thumb for Selecting Structures (X-ray)
- High resolution (≤ 2.0 Å) and low R-factor (≤ 0.2) indicate higher accuracy.
- The required resolution depends on the type of analysis: comparing overall folds tolerates lower resolution, whereas analysis of atomic detail requires higher resolution.
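- The R-factor measures how well the model agrees with the experimental data.
- R-free is calculated from a small fraction of the data that is excluded from refinement, which makes it a less biased measure of reliability.
- Residue B-factors above 50 indicate possible local errors; quality checks should be used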
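The thresholds above translate directly into a simple filter. In this sketch the entry names, their resolution and R-factor values, and the per-residue B-factors are all invented for illustration, and the function names are hypothetical.

```python
def passes_xray_rule_of_thumb(resolution, r_factor,
                              max_resolution=2.0, max_r_factor=0.2):
    """True if resolution (angstroms) and R-factor meet both thresholds."""
    return resolution <= max_resolution and r_factor <= max_r_factor

def flag_high_b_residues(b_factors, cutoff=50.0):
    """Return residue labels whose B-factor exceeds the cutoff."""
    return [residue for residue, b in b_factors.items() if b > cutoff]

# Hypothetical entries: name -> (resolution in angstroms, R-factor)
entries = {
    "entry_a": (1.8, 0.18),
    "entry_b": (2.9, 0.24),
    "entry_c": (1.5, 0.23),
}
selected = [name for name, (res, r) in entries.items()
            if passes_xray_rule_of_thumb(res, r)]
print(selected)  # ['entry_a']

# Hypothetical per-residue B-factors
print(flag_high_b_residues({"ALA12": 18.5, "LYS47": 63.2, "GLY90": 50.0}))
# ['LYS47']
```

Such a filter is only a first pass; a structure that passes can still contain local errors, so the quality checks described below remain necessary.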
Rules of Thumb for Selecting Structures (NMR)
- No simple rule of thumb exists due to differences in accuracy.
- Information on the quality of the structure is found in the original paper, and quality checks are often required.
Quality Checks of Structures
- Testing for normality compares the properties of a given structure against those commonly seen in known structures.
- Assessing outliers: multiple outliers suggest problems.
- Examples: Ramachandran plot (favorable/disallowed regions), unfavorable atom-atom contacts analysis, potential energy evaluation, real-space R-factor, and various other parameters.
Validation of Protein Structures
- Ramachandran plot (favored/disallowed regions)
- Unfavorable Atom-Atom Contacts
- Parameters like: H-bond donor counts, H-bond energies, real-space R-factor and others
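A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. The sketch below shows one common way to compute a dihedral angle from four atom positions; the coordinates are toy values chosen so the result is easy to check, and the function name is hypothetical. For a residue i, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four 3D points."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    # Normalize the central bond so projections onto it are simple dot products.
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # v and w are the parts of b0 and b2 perpendicular to the central bond.
    v = tuple(x - dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# Toy geometry: the central bond lies along the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # 0.0
```

Plotting the (phi, psi) pair of every residue and checking how many fall outside the favored regions gives the outlier count used in the checks above.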
Quality Information on the Web
- Several databases provide pre-calculated quality criteria for all wwPDB structures, including EDS, PDBsum, PDBREPORT, and RCSB PDB.
- The Electron Density Server (EDS) and PDBsum provide various quality parameters.
Description
This quiz covers essential topics in bioinformatics, focusing on protein sequences and databases, structure prediction using AI algorithms like AlphaFold, and the protein synthesis process. Test your understanding of protein structure levels, including primary, secondary, tertiary, and quaternary structures.