Protein Sequence Analysis and Metagenomics

Questions and Answers

Which database is known for high-quality annotation of some sequences?

MGnify
GenBank
Swiss-Prot (correct)
DDBJ

The MGnify database at EBI contains only perfect quality amplicons.

False (B)

What scoring scheme is simplest for scoring amino acid residues?

Score 1 for identical amino acids and 0 for different ones.
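
As a minimal sketch of this identity scheme, the snippet below scores two already aligned, equal-length sequences. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
def identity_score(a: str, b: str) -> int:
    """Return 1 for identical residues and 0 for different ones."""
    return 1 if a == b else 0


def score_ungapped(seq1: str, seq2: str) -> int:
    """Sum identity scores over two equal-length, pre-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(identity_score(a, b) for a, b in zip(seq1, seq2))


# The two example sequences differ at one position out of ten.
print(score_ungapped("HEAGAWGHEE", "HEAGPWGHEE"))  # prints 9
```

An identity scheme treats every mismatch alike, which is why substitution matrices such as BLOSUM62 are generally preferred for real protein comparisons.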

GenBank translates the initial DNA deposition into ___ sequences.

GenPept

Match the following terms with their definitions or descriptions:

Paralogues = Proteins resulting from gene duplication
Orthologues = Proteins that arise due to speciation
MGnify = Database with amplicons from metagenomes
UniProtKB = Database with high-quality annotations from Swiss-Prot

What is the primary purpose of pairwise protein sequence alignment?

To quantify similarity between species (A)

Errors in protein sequence databases are corrected quickly.

False (B)

Who pioneered the work in metagenomics?

Craig Venter

Which algorithm is known for finding mathematically optimal solutions but is too slow for general searching?

Smith-Waterman (C)

BLAST is significantly slower than the Smith-Waterman algorithm.

False (B)

What is the purpose of the E-value in database searching?

The E-value estimates the number of false positives found in the search.
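
The lesson does not give a formula for the E-value. As a hedged illustration, the sketch below uses the Karlin-Altschul form commonly quoted for BLAST, E = K × m × n × exp(−λ × S), where m is the query length, n is the database length and S is the raw alignment score. The function name and the numeric values of λ and K are assumptions for illustration only; real values depend on the scoring matrix and gap penalties in use.

```python
import math


def e_value(score: float, query_len: int, db_len: int,
            lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least `score`.

    Uses E = K * m * n * exp(-lambda * S); lam and k are statistical
    parameters that must be supplied by the caller.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative (assumed) parameter values, not taken from the lesson.
LAMBDA, K = 0.267, 0.041

for s in (40, 80):
    print(s, e_value(s, query_len=300, db_len=2_000_000, lam=LAMBDA, k=K))
```

Under this form, the E-value falls exponentially as the score rises and grows linearly with database size, so the same score is less significant in a larger database.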

The BLAST algorithm uses the ____ score to find short segments or seeds in the query.

BLOSUM62

Match the following terms with their definitions:

P-value = Probability of achieving a score by chance
E-value = Expected number of false positives
HSP = High-scoring pairs from alignments
FASTA = A once-popular search method that is no longer widely used

For which type of sequences is BLAST typically used?

DNA/DNA and Protein/6-frame DNA translations (D)

Short matches of less than 20 residues can be confidently suggested as true homology.

False (B)

What does the term HSP stand for in the context of the BLAST algorithm?

High Scoring Pair

What method does the Smith-Waterman algorithm primarily use for sequence alignment?

Comparison of segments of all possible lengths (A)
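
A minimal sketch of how a Smith-Waterman style local alignment score can be computed with dynamic programming. Resetting negative cells to zero is what lets the best-scoring segment start and end anywhere in either sequence. The function name and the match, mismatch and gap values are illustrative assumptions; a protein search would typically take its residue scores from a matrix such as BLOSUM62 instead.

```python
def smith_waterman_score(s1: str, s2: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score using a linear gap penalty."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # H[i][j] holds the best score of any local alignment ending at
    # s1[i-1] and s2[j-1]; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            # Zero lets a new local alignment start at this cell.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# Only the shared segment WGHE contributes: 4 matches x 2 = 8.
print(smith_waterman_score("AAAWGHEAAA", "CCCWGHECCC"))  # prints 8
```

Filling the whole matrix costs time proportional to the product of the two sequence lengths, which is why this approach is considered too slow for general database searching.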

The BLOSUM62 matrix is used to assign numerical values for every cell in the alignment array.

True (A)

What is the primary objective of the Needleman and Wunsch method introduced in 1970?

To produce the highest possible alignment score for two sequences.

The __________ penalty is introduced when constructing the alignment, affecting the best scoring path.

gap

Match the following sequence alignment methods with their primary characteristics:

Smith-Waterman = Local alignments using segments
Needleman-Wunsch = Global alignment using entire sequences
Dynamic Programming = Rigorous alignment producing highest score
BLOSUM62 = Matrix for assigning similarity scores

Where will the maximum match, or highest alignment score, always be found in the matrix?

Somewhere in the outer row or column (D)

What does conservative substitution in proteins refer to?

The maintenance of the chemical property of residues (B)

PAM250 was created to model sequences with 50% identity.

False (B)

What is the main purpose of the Needleman-Wunsch Algorithm?

To find the best global alignment of two sequences.
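
A minimal sketch of a Needleman-Wunsch style global alignment score, assuming simple match and mismatch scores and a linear gap penalty. This follows the common textbook formulation, in which end gaps are penalised and the score is read from the final cell; it is not the exact 1970 procedure. The function name and parameter values are illustrative assumptions.

```python
def needleman_wunsch_score(s1: str, s2: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the best global alignment score over the full sequences."""
    rows, cols = len(s1) + 1, len(s2) + 1
    F = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Every residue of both sequences is covered by the final cell.
    return F[-1][-1]


# ACGT against A-GT: three matches and one gap.
print(needleman_wunsch_score("ACGT", "AGT"))  # prints 2
```

Unlike the local variant, no cell is reset to zero, so a poorly matching region still lowers the final score.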

The gap penalty formula is given by Penalty = o + e × l, where o is the gap opening constant and e is the gap __________ constant.

extension
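
A short worked sketch of this formula, taking l as the length of the gap. The function name and the default values for o and e are assumptions chosen for illustration, not values given in the lesson.

```python
def affine_gap_penalty(length: int, gap_open: float = 10.0,
                       gap_extend: float = 0.5) -> float:
    """Penalty = o + e * l for a gap of `length` residues; no gap costs 0."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


for l in (1, 4, 10):
    print(l, affine_gap_penalty(l))
# prints:
# 1 10.5
# 4 12.0
# 10 15.0
```

With a large opening constant and a small extension constant, one long gap costs less than several short ones, which favours alignments with fewer, longer insertions or deletions.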

Which of the following is true about BLOSUM62?

It is the most widely used scoring matrix. (A)

What is the significance of gaps in sequence alignments?

Gaps represent insertions or deletions in sequences.

Flashcards

GenBank

A primary DNA sequence database maintained by NCBI (National Center for Biotechnology Information).

ENA (EMBL)

Another primary DNA sequence database, maintained by EMBL (European Molecular Biology Laboratory).

DDBJ

The DNA Data Bank of Japan, the Japanese equivalent of GenBank and ENA and also a primary DNA sequence database.