Questions and Answers
Protein synthesis is a process that occurs in three steps: Transcription, Splicing, and Translation.
False (B)
UniProtKB is a central repository of protein sequences and is supported by a collaboration between EBI, the Swiss Institute of Bioinformatics, and the Protein Information Resource.
True (A)
Post-translational modifications refer to the changes that occur to proteins after they have been synthesized, transforming them into mature proteins.
True (A)
Motifs or profiles databases contain exhaustive primary sequences of proteins with no abstractions or patterns.
Generalist databases, such as UniProtKB, only include sequences from very specifically defined sources.
There is only one type of protein sequence database available for researchers.
The quality level of annotation in databases like UniProtKB can vary between manual and automatic entry.
The primary sequence of proteins is unrelated to annotations and cross-references in databases.
PAM matrices represent evolutionary information based on distant protein relationships.
BLOSUM62 is derived from sequences clustered at 62% identity or greater.
Higher numbers in BLOSUM matrices indicate more evolutionary distance between sequences.
PAM250 corresponds to a residue identity of 45% between proteins.
BLOSUM1 corresponds to 1% identity and evaluates highly diverse protein alignments.
PAM1 corresponds to a residue identity of 99%.
The BLOSUM matrices are derived from individual sequences without any clustering.
PAM matrices are extrapolated from PAM1 to represent various evolutionary distances.
Introducing a gap in sequence alignment results in a negative score penalty.
The identity matrix for protein similarity uses a score of 1 for different amino acids.
Substitution models evaluate the likelihood of one specific amino acid replacing another during mutation.
1 PAM is defined as the time it takes for 1 out of 100 amino acids to mutate.
The Dayhoff Mutation Data Matrix is based on inferred evolutionary distances derived from genome sequencing.
The PAM matrix product allows for inference of homology in proteins beyond the twilight zone.
Gaps introduced in sequence alignments are beneficial as they eliminate the need for substitution models.
Scores in substitution models are based exclusively on the identity of the amino acids involved.
PDB format is advantageous because it is rarely supported by the majority of tools.
A significant disadvantage of the PDB format is the absolute limits on the size of certain items of data.
The mmCIF format was developed to simplify the handling of complicated structure data.
One disadvantage of the mmCIF format is that it is easily readable by humans and computers.
A notable feature of PDB format is its consistency across individual entries.
The mmCIF format is more suitable for accessing individual entries compared to the PDB format.
Hydrogen bonding and active sites are part of the data captured in the PDB format.
The maximum number of chains allowed in the PDB format is over 30.
R-factor should always be ≤ 0.4 for reliable models.
DRESS and RECOORD web servers provide improved versions of NMR models.
Local errors in a structure are indicated by residue B-factors < 50.
Predictions of atomic resolution in NMR structures can be made using the ResProx tool.
No guidelines exist for selecting NMR structures unlike X-ray structures.
Quality checks involve only comparisons against high-resolution structures of nucleic acids.
A structure showing a high number of outliers is likely to be problematic.
B-factor values are irrelevant for assessing the reliability of a structure.
The Ramachandran plot is used to check the stereochemical quality of protein structures by plotting the Ψ versus the Φ main chain torsion angles.
In a well-defined protein structure, residues are typically dispersed in the 'disallowed' regions of the Ramachandran plot.
Bad atom-atom contacts in protein structures are defined as two nonbonded atoms that have a center-to-center distance greater than the sum of their van der Waals radii.
Counts of unsatisfied hydrogen bond donors are a parameter evaluated in validating protein structures.
A real space R-factor is used to express how poorly each residue fits its electron density in a protein structure.
Knowledge-based potentials assess how 'happy' each residue is in its local environment according to predefined criteria.
The databases EDS and PDBREPORT provide pre-computed quality criteria for every structure in the Protein Data Bank (PDB).
Poorly defined protein structures generally show residues clustered tightly in the most favored regions of the Ramachandran plot.
Protein synthesis includes four steps: Transcription, Splicing, Translation, and Elimination.
Bioinformatics relies on databases that can provide sequences from any source, such as UniProtKB.
PAM and BLOSUM matrices are interchangeable for evaluating amino acid substitutions across all evolutionary distances.
The dynamic programming algorithm used for sequence alignments is optimized for both global and local alignments.
Post-translational modifications occur before protein synthesis is completed, altering proteins into their mature forms.
Transmembrane beta-strand barrels (TMB) typically contain 10 - 30 residues.
UniProtKB annotations may vary in quality depending on whether they are created manually or automatically.
Word-based methods for sequence alignments guarantee optimal alignments each time they are applied.
In the context of multiple sequence alignments, progressive methods begin by aligning the least similar sequences first.
Low-quality B-factor values indicate that residues are likely stable in their local environment.
The positive-inside rule indicates that positively charged residues are more prevalent in loop regions outside the membrane.
Diagonal transitions in the dynamic programming matrix represent gaps in the sequence alignment.
Motif databases derive information solely from full primary sequences without abstract representation.
PDB format is a flexible format that allows variable lengths for its entries.
The Ramachandran plot illustrates the steric arrangement of amino acid residues based on the angles of the main chain torsion.
Hydrophobicity analysis is particularly useful for predicting transmembrane beta-strand barrels.
The final alignment in dynamic programming corresponds to the path in the matrix that minimizes the score.
Methods for solubility and expressability prediction do not rely on machine learning techniques.
Back-tracing in sequence alignment starts from the top-left corner of the scoring matrix.
The mmCIF format is specifically designed to complicate the handling of structure data.
Gaps in sequence alignments are always beneficial as they improve alignment scores.
The substitution model scores are based solely on the identity of the corresponding amino acids.
The PDB format is the least supported format for 3D structure data representation.
ResProx tool is used to make predictions about atomic resolution in NMR structures.
Using an identity matrix, a score of 1 is assigned when two different amino acids are present.
The Dayhoff Mutation Data Matrix is based on a large sample of observed mutations for estimating evolutionary distances.
A gap in sequence alignment is treated as a positive score penalty to encourage shorter alignments.
Evolutionary distance in PAM is measured as the time for 1 out of 100 amino acids to remain unchanged.
The PAM250 matrix represents a scenario where the proteins considered have approximately 45% residue identity.
Substitution models assess the probability of observing mutations without considering evolutionary relations.
The introduction of more gaps in sequence alignment can enhance the accuracy of biologically meaningful alignments.
A Markov chain model is utilized to derive the PAM matrix product, which helps infer protein homology.
The maximum number of atom records in a PDB file is limited to 99,999.
The mmCIF format is rarely supported by visualization and computational tools.
PDB format is deemed suitable for computer extraction of information due to its consistency.
Each field of information in the mmCIF format is linked to other fields using a designated syntax.
PDB format allows for a maximum of 30 chains in a single file.
Inconsistencies within a single PDB entry include different residue numbering in the SEQRES and ATOM sections.
The advantages of the PDB format include being difficult to read and use.
The mmCIF format is suitable for accessing individual entries as it is easily readable.
In a Ramachandran plot, residues of a well-defined protein structure are typically dispersed in the 'disallowed' regions.
Bad atom-atom contacts are defined as two nonbonded atoms with a center-to-center distance less than the sum of their van der Waals radii.
Hydrogen bonding energies are not assessed during the validation of protein structures.
The Ramachandran plot is only useful in evaluating RNA structures, not protein structures.
A high number of unsatisfied hydrogen bond donors in a protein structure is a sign of good structural quality.
The real space R-factor is a metric that expresses how well each residue fits its electron density.
Knowledge-based potentials evaluate how 'unhappy' each residue is in its local environment, indicating a problematic overall structure.
All major databases provide pre-computed quality criteria for every structure in the Protein Data Bank (PDB).
Alternative splicing can result in multiple isoforms of proteins that share identical sequences.
The evolutionary information can enhance the accuracy of predictions related to protein properties.
The process of sequence alignment aims to assess the differences exclusively without considering evolutionary relationships.
Darwinian evolution posits that variations that enhance an individual's biological fitness will likely be inherited by future generations.
The assumption of large inter-individual differences is essential for Darwinian evolutionary theory.
Homology can be inferred solely from matching the primary sequences of proteins without any additional information.
Proteins can exhibit properties such as transmembrane regions solely based on their secondary structure.
Speciation is a direct result of the accumulation of inherited variations over time due to natural selection.
Function is solely dictated by sequence without regard for 3D structure.
Selective pressure operates primarily at the sequence level in proteins.
Homologous proteins arise from genes that evolved from a common ancestor.
Innovation in proteins occurs exclusively through large-scale genetic changes.
3D structures of proteins are unaffected by their amino acid sequences.
Adaptation in proteins leads to improved function in a given environment.
Mutations cannot be passed down to subsequent generations.
The sequence-structure-function paradigm emphasizes the relationship between these three aspects in proteins.
Protein synthesis involves processes including Transcription, Splicing, and Translation, followed by Post-translational modifications to form mature proteins.
UniProtKB is exclusively a specialist database that focuses solely on sequences from a limited biological pathway.
Motifs or profiles databases do not provide abstracted information from primary sequences of proteins.
Post-translational modifications occur prior to the synthesis of proteins and are essential for their final functional state.
The quality of annotations in databases like UniProtKB is only determined by automatic processes, with no human intervention.
BLOSUM matrices are used to evaluate evolutionary information based on proteins that share at least 62% identity.
Multiple databases such as WormBase exclusively provide exhaustive primary sequences without any additional annotations.
The mmCIF format is specifically designed to restrict access to individual entries, unlike PDB format.
A pairwise alignment technique is only associated with Global alignments.
Local alignments only consider similarity across the entire sequence of proteins.
Substitution scores in amino-acid alignments are fixed and do not vary.
Homologous proteins are those that share structural, functional, or sequence similarities regardless of their evolutionary background.
Iterative methods are the only techniques used for multiple sequence alignments.
Gaps in sequence alignments receive a positive score, encouraging their introduction.
The purpose of a substitution matrix in sequence alignment is to optimize the total alignment score by pairing amino acids.
The concept of homology in proteins is irrelevant to their structure and function.
The dynamic programming algorithm for sequence alignments allows back-tracing from the top-left corner of the matrix for global alignment.
Progressive methods for multiple sequence alignments first align the most divergent sequences before adding similar ones.
Word methods in sequence alignment guarantee an optimal alignment by matching short non-overlapping sequence stretches.
In local alignment, the Smith & Waterman algorithm allows back-tracing from any position in the alignment matrix.
Dynamic programming algorithms for sequence alignments are known for being computationally efficient at all times.
The relative positions of matching regions in word methods define an offset, which is the sum of corresponding coordinates.
Substitution models assess the likelihood of residue pairs being aligned based solely on their sequence identity.
Access to the PDB archive is available only through paid subscriptions.
Systematic errors in model structures contribute to the overall accuracy of the data.
Most structures in the PDB are of high quality, typically containing only systematic errors.
Completely wrong structures can be caused by misinterpretation of the electron density map.
All structures in the PDB are guaranteed to be correct and free from any type of error.
Sequence-based and text-based queries are available through the wwPDB sites.
Random errors are less common than systematic errors in structural models.
Quality checks on structures require critical assessment before being used for specific purposes.
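Several items above concern dynamic-programming alignment: diagonal moves pair two residues, the other two moves introduce gaps, and the alignment is recovered by back-tracing. A minimal sketch of a Needleman-Wunsch style global alignment is given here. The function name global_align and the default scores (identity scoring of 1 for a match and 0 for a mismatch, with a linear gap penalty of -1) are assumptions chosen for illustration, not values taken from the quiz.

```python
def global_align(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment sketch: identity scoring plus a linear gap penalty (assumed values)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # diagonal: pair two residues
                score[i - 1][j] + gap,      # vertical: gap in b
                score[i][j - 1] + gap,      # horizontal: gap in a
            )

    # Back-trace from the bottom-right corner to recover one highest-scoring path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = global_align("ACGT", "AGT")
```

With these assumed scores, aligning ACGT against AGT yields a score of 2 with a single gap (ACGT over A-GT). A local, Smith-Waterman style variant would additionally floor every cell at zero and start the back-trace from the highest-scoring cell rather than the corner.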
Flashcards
Protein Synthesis Steps
Protein synthesis involves transcription (DNA to RNA), splicing (RNA to mRNA), translation (mRNA to protein), and post-translational modifications (protein to mature protein).
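The first three steps on this card can be sketched as simple string operations. This is a toy model under stated assumptions: the input is the coding strand of the DNA, intron positions are supplied by hand, and CODON_TABLE contains only the five codons used in the example rather than the full genetic code. The function names and the example sequence are hypothetical.

```python
# Deliberately tiny codon table: only the codons used below, not the full genetic code.
CODON_TABLE = {"AUG": "M", "GCU": "A", "UGG": "W", "AAA": "K", "UAA": "*"}


def transcribe(coding_dna):
    """Coding-strand DNA to pre-mRNA: same sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def splice(pre_mrna, introns):
    """Remove introns given as (start, end) half-open index pairs; join the exons."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])
        pos = end
    exons.append(pre_mrna[pos:])
    return "".join(exons)


def translate(mrna):
    """Read codons in frame from position 0 until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the toy table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical gene: exon 1, a six-base intron, exon 2.
dna = "ATGGCT" + "GTAAGT" + "TGGAAATAA"
pre_mrna = transcribe(dna)
mrna = splice(pre_mrna, [(6, 12)])
protein = translate(mrna)
```

Here the spliced mRNA is AUGGCUUGGAAAUAA and the translated peptide is MAWK. Post-translational modification is not modelled.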
Protein Sequence Databases
Databases that store protein sequences, often with annotations and cross-references to other information. Types include generalist (like UniProtKB) and specialist databases (like WormBase) with different scopes.
UniProtKB
A central repository of protein sequences and functional information, known for detailed and quality annotations.
Protein Sequence Sources
Transcription
Translation
Splicing
Protein Structure Levels
Sequence Alignments
Identity Matrix
Substitution Models
PAM Matrix
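The quiz items describe PAM matrices as extrapolated from PAM1 through a Markov chain matrix product. A minimal sketch of that idea, assuming a made-up two-letter alphabet in which each letter has a 1% chance of changing per PAM unit: TOY_PAM1 is a hypothetical mutation probability matrix, not Dayhoff's data, and raising it to the 250th power gives the toy analogue of a PAM250 probability matrix.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step matrix: entry [i][j] is the probability that letter i
# is replaced by letter j after one PAM unit (1% change overall).
TOY_PAM1 = [
    [0.99, 0.01],
    [0.01, 0.99],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)
unchanged_after_250 = toy_pam250[0][0]
```

In this toy model the probability of seeing the same letter after 250 steps falls to just over one half, because repeated and reverse substitutions accumulate. Real PAM scoring matrices are further converted to log-odds scores, which this sketch does not do.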
Point Accepted Mutation (PAM)
Evolutionary Distance
Gap Penalty
250 PAM matrix
BLOSUM Matrices
PAM1
BLOSUM62
Protein Block
R-factor
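As a hedged illustration of the quantity this card names, the conventional crystallographic R-factor compares observed and calculated structure-factor amplitudes: the sum of their absolute differences divided by the sum of the observed amplitudes. The function name r_factor and the amplitude values below are made up for the example.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with amplitudes given as non-negative numbers."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated amplitude lists must have equal length")
    numerator = sum(abs(obs - calc) for obs, calc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections.
observed = [100.0, 50.0, 25.0, 25.0]
calculated = [90.0, 55.0, 30.0, 25.0]
r_value = r_factor(observed, calculated)
```

For these invented numbers the differences sum to 20 against an observed total of 200, so R is 0.1. A lower value means the model reproduces the measured data more closely. Rfree is the same calculation restricted to a set of reflections withheld from refinement.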
Rfree
Side chain torsional conformers
Protein structure quality
NMR structures quality
Structure Validation
B-Factors
ResProx/DRESS/RECOORD
PDB format
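The fixed-width layout of the PDB format explains the size limits mentioned in the quiz: the atom serial number occupies five columns, which matches the quoted ceiling of 99,999 atom records, and the chain identifier occupies a single column. A minimal sketch of slicing one ATOM record by column position follows. The function name parse_atom_record is hypothetical, and the sample record is assembled with format widths so that every field lands in its column.

```python
def parse_atom_record(line):
    """Slice one fixed-width ATOM/HETATM line into named fields (0-based Python slices)."""
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11: five digits at most
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22: a single character
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
    }


# Illustrative record; the coordinates and B-factor are invented values.
sample = (
    "ATOM  " + f"{1:>5d}" + " " + f"{' N':<4s}" + " " + "MET" + " " + "A"
    + f"{1:>4d}" + "    "
    + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}{1.00:6.2f}{9.67:6.2f}"
)
atom = parse_atom_record(sample)
```

Because every field is located by position rather than by a delimiter, a value that needs more characters than its column range allows cannot be represented.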
PDB format advantages
PDB format disadvantages
mmCIF format
mmCIF format advantages
mmCIF format disadvantages
Data consistency
Database size limitation
Ramachandran Plot
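A Ramachandran plot needs the two main chain torsion angles for each residue. Φ is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and Ψ by N(i), CA(i), C(i), N(i+1). The sketch below computes a dihedral angle from four points using only the standard library. The function name dihedral_degrees and the test coordinates are illustrative assumptions.

```python
import math


def _sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle about the p1-p2 bond, in degrees in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Idealised geometries: trans (180 degrees) and cis (0 degrees) arrangements.
trans_angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis_angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```

Applying the function to the backbone atoms of every residue gives the (Φ, Ψ) pairs that are plotted against each other.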
Disallowed Regions
Bad Contacts
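One quiz item defines a bad contact as two nonbonded atoms whose center-to-center distance is less than the sum of their van der Waals radii. That test can be sketched directly. The radii below are approximate Bondi values assumed for illustration, the tolerance parameter is a hypothetical knob, and the caller is assumed to pass only pairs of atoms that are not covalently bonded.

```python
import math

# Approximate van der Waals radii in angstroms (Bondi values, assumed for this sketch).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def is_bad_contact(atom_a, atom_b, tolerance=0.0):
    """Return True when two nonbonded atoms sit closer than the sum of their radii.

    Each atom is an (element, (x, y, z)) pair. `tolerance` is a hypothetical
    allowance subtracted from the radius sum before the comparison.
    """
    element_a, xyz_a = atom_a
    element_b, xyz_b = atom_b
    distance = math.dist(xyz_a, xyz_b)
    limit = VDW_RADII[element_a] + VDW_RADII[element_b] - tolerance
    return distance < limit


# Invented coordinates: a carbon and an oxygen 2.5 and 4.0 angstroms apart.
clash = is_bad_contact(("C", (0.0, 0.0, 0.0)), ("O", (2.5, 0.0, 0.0)))
no_clash = is_bad_contact(("C", (0.0, 0.0, 0.0)), ("O", (4.0, 0.0, 0.0)))
```

With the assumed radii, carbon and oxygen sum to 3.22 Å, so the 2.5 Å pair is flagged and the 4.0 Å pair is not.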
Hydrogen Bond Energy
Knowledge-Based Potentials
EDS
PDBsum
Real-Space R-Factor
Protein Synthesis
Protein Sequence Databases (Specialist)
Protein Sequence Databases (Specialist)
Signup and view all the flashcards
Motifs or Profiles Databases
Motifs or Profiles Databases
Signup and view all the flashcards
Levels of Protein Structure
Levels of Protein Structure
Signup and view all the flashcards
Solubility Property Prediction
Solubility Property Prediction
Signup and view all the flashcards
Transmembrane Region Prediction
Transmembrane Region Prediction
Signup and view all the flashcards
Transmembrane Helix (TMH)
Transmembrane Helix (TMH)
Signup and view all the flashcards
Transmembrane Beta-Strand Barrel (TMB)
Transmembrane Beta-Strand Barrel (TMB)
Signup and view all the flashcards
Positive-Inside Rule
Positive-Inside Rule
Signup and view all the flashcards
Why is data consistency important in structural databases?
What are the limitations of certain database formats for storing large complex molecules?
Why are gaps important?
What are the limitations of Identity Matrices?
What is the purpose of substitution models?
Dynamic Programming Algorithm
Global Alignment
Local Alignment
Word Methods for Alignment
Heuristic Methods
Progressive Alignment
Why are databases important for structure prediction?
What is UniProtKB?
What are the benefits of using evolutionary information in structure predictions?
What is the purpose of sequence alignment?
How does Darwinian evolution relate to protein sequence variation?
What is a PAM matrix?
How do BLOSUM matrices differ from PAM matrices?
What is the purpose of structure validation?
Protein Function
Protein Structure & Evolution
Homologous Proteins
Sequence/Structure/Function Paradigm
Selective Pressure
Adaptation
Evolution
Diversity
What are the two main types of protein sequence databases?
What's a primary source for protein sequences and annotations?
What are motifs or profiles databases?
What are the main steps of protein synthesis?
What are the levels of protein structure?
What are substitution models?
What is the difference between global and local alignment?
Molecular Evolution
Homology
Annotation Problem
Substitution Matrix
Dynamic Programming
What is dynamic programming?
Word methods
What is a substitution model?
wwPDB
PDB Archive Access
Structural Quality Assurance
Systematic Errors
Random Errors
Mistracing
Frame-shift Errors
Incorrect Fold
Study Notes
Bioinformatics Protein Sequences and Databases
- Protein sequences and the databases that store them are a core topic in bioinformatics
- Databases are used to store and manage protein sequences
Structure Prediction
- Artificial intelligence (AI) algorithms (like AlphaFold) can predict protein 3D structures from their amino acid sequences
- Google DeepMind developed AlphaFold
- This is a significant advancement in protein science
Protein Synthesis
- Protein synthesis occurs in two steps:
- Transcription: DNA → RNA
- Translation: mRNA → protein
- Post-translational modifications occur after translation, transforming the protein into a mature form.
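The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: it assumes the input is the coding (sense) DNA strand, so transcription reduces to replacing T with U, and the codon table is deliberately partial and covers only the codons in the example. The function names and the example sequence are made up for illustration.

```python
# Partial codon table: only the codons needed for the example sequence.
CODON_TABLE = {
    "AUG": "M",   # start codon, methionine
    "GCU": "A",   # alanine
    "UGG": "W",   # tryptophan
    "AAA": "K",   # lysine
    "UAA": None,  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Transcription (DNA -> RNA), assuming the coding strand is given."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation (mRNA -> protein), reading codons until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTTGGAAATAA")
protein = translate(mrna)
```

Post-translational modifications are not modelled here; they act on the translated chain afterwards.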
Levels of Protein Structure
- Primary structure: sequence of amino acids
- Secondary structure: local folding patterns (alpha-helices, beta-sheets)
- Tertiary structure: overall 3D folding of the polypeptide chain
- Quaternary structure: arrangement of multiple polypeptide chains in a protein complex
Sources of Protein Sequences
- UniProtKB: comprehensive generalist database of protein sequences and annotations, including manual annotations
- Maintained as a collaboration between EBI, the Swiss Institute of Bioinformatics, and the Protein Information Resource
- Includes biological information, annotation quality level, and links to other databases
- Also provides the primary sequence together with annotations and cross-references
- WormBase: specialist resource focused on nematodes such as Caenorhabditis elegans
- Pfam: focuses on patterns within proteins; captures the most conserved features among related proteins
UniProtKB
- Contains reviewed protein entries (Swiss-Prot) and automatically annotated entries (TrEMBL)
- High-quality manual annotations provide reliable information
- ~570,000 curated protein records (2024)
- ~250,000,000 automatically translated protein records of lower quality (2024)
- Rich human-readable information about functions, names, taxonomy, subcellular locations, phenotypes, PTMs, expression, interactions, family/domains, sequence, and similar proteins.
UniProtKB - Specific Features
- Human-readable explanations of protein function and information about enzymatic parameters (activity, kinetics)
- Pathways and biological interpretations
- Access to mutations and their effect on protein activity
- Displays available 3D structures, links to AlphaFold predictions and other 3D structure databases
UniProtKB - Access
- Unique accession numbers identify protein sequences.
- Serial suffixes on the accession number are used for variants (P21397-1, P21397-2).
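A small sketch of how the accession-plus-suffix form shown above could be split in code. The `UniProtId` type and `parse_uniprot_id` function are hypothetical helpers written for this example; they are not part of any UniProt API, and no format validation beyond the hyphen split is attempted.

```python
from typing import NamedTuple, Optional


class UniProtId(NamedTuple):
    accession: str          # e.g. "P21397"
    variant: Optional[int]  # serial suffix, or None when absent


def parse_uniprot_id(identifier: str) -> UniProtId:
    """Split 'P21397-2' into the accession and its serial variant number."""
    accession, separator, suffix = identifier.strip().partition("-")
    if not accession:
        raise ValueError(f"empty accession in {identifier!r}")
    # int() raises ValueError if the suffix is missing or not numeric.
    return UniProtId(accession, int(suffix) if separator else None)


canonical = parse_uniprot_id("P21397")
second_variant = parse_uniprot_id("P21397-2")
```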
Summary of 1D Predictions
- Primary sequence information can predict various properties of proteins like solvent accessibility, solubility, transmembrane regions, and secondary structure.
- Evolutionary information enhances accuracy of prediction methods.
Introduction to Sequence Alignment
- Alignments model similarities and differences between protein sequences to identify conserved regions
- Evolutionary relationships (homology) can be inferred from sequence similarity
A Few Words on Evolution
- Species develop through natural selection.
- Variations are small and heritable, and they contribute to evolutionary fitness.
- Selection pressure favors traits that improve function and so allow better survival and reproduction. This leads to speciation.
- Evolutionary adaptation is a key concept.
A Few Words on Molecular Evolution
- Function is determined by 3D shape.
- Adaptations are influenced by environmental pressure.
- Structure is determined by the sequence of a protein.
Sequence, Structure, Function Paradigm
- 3D structure is determined by amino acid sequence.
- Function is determined by 3D structure of the protein.
- This is a core principle of protein science.
A Few Words on Molecular Evolution (2)
- Evolutionary changes occur at the sequence level as mutations.
- Selective pressure favors variations that enhance function, producing adaptation through natural selection
A Few Words on Molecular Evolution (3)
- Homology means two proteins trace back to a common ancestor.
- Paralogs are homologous proteins that evolved from the same ancestral gene through duplication.
Sequence Alignments
- Alignments aim to align similar regions of different protein sequences.
- Global alignments = consider similarity of the entire sequence.
- Local alignments = consider similarity in small parts of the sequence.
- Pairwise alignments = compare two sequences
- Multiple sequence alignments = align more than two sequences.
- Methods for performing these alignments include dynamic programming (e.g., the Needleman-Wunsch and Smith-Waterman algorithms), progressive methods, iterative methods, dot plot methods, and word methods.
- Matrices are used to score similarity (identity matrices, substitution models).
- Substitution matrices such as Dayhoff's PAM matrices and the BLOSUM matrices assign scores to amino acid replacements.
Dynamic Programming Algorithm
- Finds the best-scoring pairwise alignment and thereby measures similarity.
- Uses a two-dimensional matrix in which each dimension corresponds to one of the proteins to be aligned. Each cell contains a score based on substitutions.
- Diagonal transitions show aligned positions, vertical and horizontal transitions identify gaps.
- The optimal path in the matrix shows the best match.
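The matrix filling and traceback described above can be sketched as a global (Needleman-Wunsch style) alignment. This is a minimal illustration under simple assumptions: a flat match/mismatch score stands in for a real substitution matrix, the gap penalty is linear, and the scoring values are arbitrary defaults chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def substitution(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + substitution(a[i - 1], b[j - 1])
            vertical = score[i - 1][j] + gap    # gap in b
            horizontal = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diagonal, vertical, horizontal)

    # Traceback from the bottom-right corner to recover one optimal path.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + substitution(a[i - 1], b[j - 1])):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best_score, top, bottom = needleman_wunsch("ACGT", "AGT")
```

Diagonal moves in the traceback correspond to aligned positions, while vertical and horizontal moves insert gaps. A local (Smith-Waterman style) variant would additionally clamp cell scores at zero and start the traceback from the highest-scoring cell.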
Word Methods
- Small sequence-stretches (k-tuples or words) in the query are matched across target sequences.
- The position of matches defines alignment offsets.
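A rough sketch of the word-matching idea: index every k-letter word of the target, look up each query word, and count how often each offset (target position minus query position) occurs. The function names and the toy sequences are assumptions for illustration; real word-based tools add scoring and extension steps that are not shown.

```python
from collections import Counter, defaultdict


def word_index(sequence, k):
    """Map each k-letter word to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def offset_counts(query, target, k=3):
    """Count exact word matches per alignment offset (target pos - query pos)."""
    index = word_index(target, k)
    offsets = Counter()
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            offsets[t - q] += 1
    return offsets


hits = offset_counts("WGHE", "AAWGHEAA", k=3)
best_offset, support = hits.most_common(1)[0]
```

An offset supported by many word hits marks a diagonal worth aligning in detail, which is what makes the approach fast compared with filling a full dynamic programming matrix.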
Multiple Sequence Alignments (MSA)
- MSA methods include dynamic programming, progressive methods, and iterative methods.
- Progressive methods start by aligning the most similar pairs and then add less similar sequences one at a time, accounting for differences in sequence length.
- Tools such as T-Coffee bring additional information into this process, for example consistency with pairwise alignments
Beyond Pure Sequences: Patterns and Models
- Aligned sequences help define patterns for searches in databases.
- Useful tools include Position-Specific Scoring Matrices and Hidden Markov Models.
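To make the idea of a Position-Specific Scoring Matrix concrete, the sketch below builds log-odds scores from a small gap-free alignment. Several simplifications are assumed and are not from the notes: a uniform background frequency, a fixed pseudocount, a reduced five-letter alphabet, and made-up example sequences.

```python
import math
from collections import Counter


def build_pssm(aligned, alphabet, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)  # assumption: uniform background
    total = len(aligned) + pseudocount * len(alphabet)
    pssm = []
    for position in range(len(aligned[0])):
        counts = Counter(sequence[position] for sequence in aligned)
        column = {
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        }
        pssm.append(column)
    return pssm


def score_segment(pssm, segment):
    """Sum the per-position scores of a segment as long as the PSSM."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


alignment = ["ACD", "ACE", "ACD", "GCD"]  # toy gap-free alignment
pssm = build_pssm(alignment, alphabet="ACDEG")
```

Segments resembling the conserved columns score high, so sliding `score_segment` along database sequences gives a simple pattern search. Hidden Markov Models extend this idea with explicit insertion and deletion states.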
Secondary Structure Prediction
- Prediction of the conformational state of each amino acid residue.
- Common states include helix (H), strand (E), and coil (C)
- Example prediction software: PSIPRED
Solvent Accessibility Prediction
- Predict the extent to which a residue is accessible to solvent.
- Accessibility values vary between amino acid types.
- A simplified description classifies residues as exposed or buried
Solubility and Expressability Prediction
- Predicting solubility in an expression system or propensity for aggregation.
- Major methods rely heavily on machine-learning.
Transmembrane Region Prediction
- Predicts which segments of a transmembrane (TM) protein span the membrane.
- TMH (transmembrane helix): 12–35 residues.
- TMB (transmembrane beta-strand): 10–25 residues.
- Hydrophobicity is not helpful for TMB prediction. Analysis of charged residues and the positive-inside rule are used.
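For transmembrane helices, one classic approach (not described in these notes, so treat it as an outside illustration) is a sliding-window average of residue hydropathy. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and the commonly quoted cutoff of 1.6; the function names and the test sequence are made up, and modern predictors rely on considerably richer models.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy for every full window, indexed by window start."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[start:start + window]) / window
        for start in range(len(sequence) - window + 1)
    ]


def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Return (first, last) window-centre indices of runs above the threshold."""
    half = window // 2
    segments, start, end = [], None, None
    for index, value in enumerate(hydropathy_profile(sequence, window)):
        centre = index + half
        if value >= threshold:
            if start is None:
                start = centre
            end = centre
        elif start is not None:
            segments.append((start, end))
            start = None
    if start is not None:
        segments.append((start, end))
    return segments


toy_sequence = "DDKKDDKKDD" + "L" * 22 + "DDKKDDKKDD"
segments = candidate_tm_segments(toy_sequence)
```

As the last bullet above notes, this kind of hydrophobicity signal does not carry over to beta-strand barrels.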
Structural Databases
- A variety of databases exist for storage and analysis.
- Examples include wwPDB
Data Formats
- Main formats: PDB, mmCIF, and PDBML
- Each contains atom coordinates and associated data.
PDB Format
- Designed in the early 1970s.
- Rigid structure, 80 characters per line.
- Most widely supported.
- Contains atom coordinates, biological features and experimental details.
- Advantages: Ease of reading and use, majority of tools support this format; often easy to access individual records.
- Disadvantages: Inconsistency between records, limited information, and data constraints.
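The rigid, column-based layout is what makes individual PDB records easy to read and also what constrains them. The sketch below writes and reads a single ATOM record using the fixed column positions of the PDB format; the helper names are hypothetical, only a subset of fields is handled, and the coordinates are illustrative values rather than data from a real entry.

```python
def format_atom_line(serial, name, res_name, chain, res_seq,
                     x, y, z, occupancy=1.0, b_factor=0.0, element=""):
    """Build an ATOM record; `name` should already be padded, e.g. ' CA '."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
        f"          {element:>2}"
    )


def parse_atom_line(line):
    """Read fields by fixed column position (0-based slices of 1-based columns)."""
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }


record = format_atom_line(1, " N  ", "MET", "A", 1, 27.340, 24.430, 2.614,
                          occupancy=1.00, b_factor=9.67, element="N")
atom = parse_atom_line(record)
```

The fixed field widths illustrate the data constraints listed above: for example, the serial number field is five characters wide, so it cannot hold more than 99,999 atoms.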
mmCIF Format
- Designed to handle complex data; each item of information is explicitly identified by a tag, and related items are linked through the syntax
- Advantages: Computer parsable, database-wide consistency
- Disadvantage: Difficult for humans to read.
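The tag-based layout can be illustrated with a deliberately minimal reader for simple `_category.item value` lines. This is only a sketch: it ignores `loop_` tables, multi-line values, and most of the real syntax, and the sample text uses placeholder values rather than a real entry. For actual work an established mmCIF parser should be used.

```python
import shlex

SAMPLE = """\
data_EXAMPLE
_entry.id        EXAMPLE
_cell.length_a   61.780
_exptl.method    'X-RAY DIFFRACTION'
"""


def read_simple_items(text):
    """Collect single-line tag/value pairs; everything else is skipped."""
    items = {}
    for line in text.splitlines():
        if not line.startswith("_"):
            continue  # data_ header, loops, comments, blank lines
        tokens = shlex.split(line)  # handles quoted values with spaces
        if len(tokens) == 2:
            tag, value = tokens
            items[tag] = value
    return items


items = read_simple_items(SAMPLE)
```

Because every value is addressed by its tag rather than by column position, a program can look up `_cell.length_a` directly, which is the machine-friendliness the notes contrast with the PDB format.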
Structural Databases (2)
- wwPDB: Extensive database of 3D structures (proteins, nucleic acids, oligosaccharides)
- Members: RCSB PDB, PDBe, PDBj, EMDB, and BMRB
- Includes data from X-ray crystallography, NMR, and electron microscopy.
- Other databases include PDBsum, SCOP, Proteopedia, and the Structural Biology Knowledgebase
wwPDB - Data Deposition
- All data are deposited in a central repository
- Common standards are used ensuring consistency and quality of structures.
- PDB-IDs are unique identifiers for each structure.
wwPDB - Data Access
- Free access to the archive's resources
- Sites distribute archives that are updated regularly.
- Different interfaces facilitate varied searches
Structural Quality Assurance
- Structures are models that satisfy experimental data.
- These models can have random and systematic errors.
- Validation tests are essential for identifying errors and interpreting the data correctly.
Errors in Deposited Structures
- Systematic errors: affect the accuracy of the model relative to the true structure.
- Examples include misinterpretations, tracing errors, and errors in spectral interpretation
- Random errors: uncertainties in atomic positions
- Examples include flipped side chains and small positional shifts
Examples of Systematic Errors
- Completely wrong structures: tracing the protein chain through electron density yields an entirely incorrect fold.
- Incorrect connectivity between secondary elements: a false order of secondary components, and residues in improper places in the 3D model
- "Frame-shifts": Residues fitted into electron density of the next residue resulting in incorrect structure interpretation
Examples of Random Errors
- Uncertainties in atomic positions (e.g., 0.01–1.27 Å)
- Side-chain flips: symmetry in some amino acid side-chain shapes can lead to incorrect placements.
Rules of Thumb for Selecting Structures (X-ray)
- High resolution (≤ 2.0 Å) and a low R-factor (≤ 0.2) indicate higher accuracy.
- The choice depends on the type of analysis; when comparing folds, an even higher resolution is required.
- The R-factor measures how well the model agrees with the experimental data and is one indicator of reliability.
- Rfree is an R-factor calculated from a small fraction of the data that is excluded from refinement, which makes it a less biased indicator.
- A high residue B-factor (> 50) can indicate local errors; further checks should be used.
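As a worked illustration of the R-factor and Rfree, the sketch below applies the standard definition R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs| to a working set and to a held-out free set. The amplitudes and the choice of free reflections are invented numbers for the example, and scaling between observed and calculated amplitudes is omitted.

```python
def r_factor(pairs):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over (obs, calc) pairs."""
    numerator = sum(abs(abs(obs) - abs(calc)) for obs, calc in pairs)
    denominator = sum(abs(obs) for obs, _ in pairs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, free_indices):
    """Split reflections into working and free sets and score each."""
    free = set(free_indices)
    pairs = list(zip(f_obs, f_calc))
    work_pairs = [pair for i, pair in enumerate(pairs) if i not in free]
    free_pairs = [pair for i, pair in enumerate(pairs) if i in free]
    return r_factor(work_pairs), r_factor(free_pairs)


# Invented structure-factor amplitudes; reflection 3 is held out as "free".
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 60.0, 50.0]
r_work, r_free = r_work_and_r_free(f_obs, f_calc, free_indices=[3])
```

Because the free reflections never influence refinement, a large gap between the two values is a warning sign that the model has been fitted too closely to the working data.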
Rules of Thumb for Selecting Structures (NMR)
- No simple rule of thumb exists due to differences in accuracy.
- Information on the quality of the structure is found in the original paper, and quality checks are often required.
Quality Checks of Structures
- Testing for normality compares a given structure against what is typical of known structures.
- Assessing outliers: multiple outliers suggest problems.
- Examples: Ramachandran plot (favorable/disallowed regions), unfavorable atom-atom contacts analysis, potential energy evaluation, real-space R-factor, and various other parameters.
Validation of Protein Structures
- Ramachandran plot (favored/disallowed regions)
- Unfavorable Atom-Atom Contacts
- Parameters like: H-bond donor counts, H-bond energies, real-space R-factor and others
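A Ramachandran plot places each residue by its backbone dihedral angles: phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below computes a dihedral angle from four points in plain Python; the coordinates are toy values, and deciding whether an angle pair is favored or disallowed would need region definitions that are not given in these notes.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(component / norm for component in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * component for component in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * component for component in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a planar cis arrangement and a planar trans arrangement.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applying `dihedral` to the backbone atoms of every residue yields the (phi, psi) pairs that validation tools compare against the favored and disallowed regions.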
Quality Information on the Web
- Several databases provide pre-calculated quality criteria for all wwPDB structures, including EDS, PDBsum, PDBREPORT, and RCSB PDB.
- The Electron Density Server (EDS) and PDBsum provide various quality parameters.