Computational Molecular Microbiology (MBIO 4700) Lecture Notes PDF
University of Manitoba
Abdullah Zubaer
Summary
These lecture notes from the Computational Molecular Microbiology (MBIO 4700) course at the University of Manitoba cover various topics in bioinformatics, including sequence alignment, substitution matrices, and the BLAST algorithm. The notes are provided by Abdullah Zubaer.
Full Transcript
Computational Molecular Microbiology (MBIO 4700)
Abdullah Zubaer, University of Manitoba

Working with sequences
• Sequence databases
• Sequence comparison
• Pairwise alignment
• Multiple sequence alignment
• Phylogenetic tree

How do you compare sequences?
ATGTCAGGTCTTA
ATGTCA
CTGTCAGGTCATA
CTGGCAGGTCTTA
ATGTCATTA

How do you compare sequences?
Differences to consider when comparing two sequences:
• Deletions/insertions (gaps)
• Substitutions (transitions/transversions)
• Recombination

Sequence alignment: major components
1. Type of alignment under consideration
2. Scoring system (e.g., substitution matrix) to rank alignments
3. Algorithm to find the optimal alignment
4. Statistical method to assess the significance of the alignment

Alignment types
▪ Pairwise vs multiple sequence alignment
▪ Local vs global alignment

Local vs Global alignment
▪ Local alignment
  ▪ Prioritizes the highest-similarity matches
  ▪ Aligns substrings
  ▪ Suitable for database searches or conserved-domain searches
▪ Global alignment
  ▪ Aligns the whole strings end to end
  ▪ Good for closely related sequences of similar length

Local vs Global alignment
https://www.majordifferences.com/2016/05/difference-between-global-and-local.html

Practice alignment
MEKDIKLNKNKINIFNKYINNKYKLVVPKTRINYEG
MAAVQGAISKRRKFVADGVFYAELNEFFQRELAEEG
MAAVQGAISKRRKFVADG
MQKDTKFLNKSNIFIKNINNKYKLIPFNIKINFVGE
MQKDTKFLNKSNIFIKNINNKYKLIPFNIKINFVGE

Substitution matrices
▪ A score is assigned to every match and mismatch
▪ A score is also assigned to gaps: the ‘gap penalty’
https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm

Substitution matrices
▪ DNA substitution
  ▪ Jukes and Cantor model (JC)
  ▪ Kimura model (K2P or K3P)
▪ Amino-acid substitution
  ▪ Dayhoff’s PAM (Accepted Point Mutation) matrix
  ▪ The Henikoffs’ BLOSUM (BLOcks SUbstitution Matrix)

PAM250
Pevsner J. 2015. Bioinformatics and Functional Genomics

BLOSUM62
Pevsner J. 2015. Bioinformatics and Functional Genomics

BLOSUM
▪ Based on biological significance: uses “observed” substitution rates
▪ BLOSUM matrices are preferred because they account for the fact that some amino acids show up in blocks (bias): regions in related proteins may be rich in aromatic, acidic, basic, polar, or nonpolar amino acids, etc.

Substitution matrices
▪ Choose a matrix
Pevsner J. 2015. Bioinformatics and Functional Genomics

Alignment algorithms
▪ Global alignment: Needleman-Wunsch algorithm
▪ Local alignment: Smith-Waterman algorithm
The algorithm is used to find the optimal alignment path.

Global alignment
Pevsner J. 2015. Bioinformatics and Functional Genomics

Local alignment
Pevsner J. 2015. Bioinformatics and Functional Genomics

BLAST algorithm
▪ Generates local alignments
▪ Developed for database searches
▪ Heuristic
▪ Makes a list of all “neighboring words” of a fixed length from the query, searches the database for them, and extends each hit in both directions
▪ Default word size is 3 (protein) and 11 (DNA)
▪ Uses the E-value to assess the significance of a hit
https://www.ncbi.nlm.nih.gov/books/NBK62051/
http://upload.wikimedia.org/wikipedia/en/5/56/Query_word.jpg

Project tips
▪ Study other sequences related to your sequence: comparative analysis of sequences
▪ Build data sets (data mining) of related sequences
▪ Why? To look for conserved features: segments that have not changed over long periods of time might be functionally significant
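
Worked example: global vs local alignment score
The notes name Needleman-Wunsch (global) and Smith-Waterman (local) but do not show the computation, so the following is a minimal Python sketch of both, not the course's own implementation. The function name align_score and the scoring values (match = +1, mismatch = -1, linear gap penalty = -2) are assumptions chosen for illustration; real protein alignments would look scores up in a substitution matrix such as BLOSUM62 and usually use affine gap penalties. Only the optimal score is returned; recovering the alignment path would need a traceback, which is omitted here.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of strings a and b.

    local=False: Needleman-Wunsch (global, end-to-end alignment).
    local=True:  Smith-Waterman (best-scoring pair of substrings).
    Scoring values are illustrative assumptions, not from the notes.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: gaps at the start of either sequence are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0  # highest cell seen so far (only used for local alignment)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] aligned to a gap in b
                score[i][j - 1] + gap,      # b[j-1] aligned to a gap in a
            )
            if local:
                cell = max(cell, 0)    # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full strings; local: best cell anywhere.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    # First two sequences from the "How do you compare sequences?" slide.
    long_seq, short_seq = "ATGTCAGGTCTTA", "ATGTCA"
    print(align_score(long_seq, short_seq, local=True))   # 6
    print(align_score(long_seq, short_seq, local=False))  # -8
```

With these assumed scores the local alignment finds the exact six-base match ATGTCA (score 6), while the global alignment must pay for seven gap positions to cover the longer sequence end to end (6 - 14 = -8). This is the reason the notes recommend local alignment for database and domain searches and global alignment for sequences of similar length.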