Computational Molecular Microbiology (MBIO 4700) Lecture Notes PDF

Summary

These lecture notes cover computational molecular microbiology, focusing on working with sequences, performing alignments, and Multiple Sequence Alignment (MSA) methods. The notes also explore MSA applications such as building databases and generating phylogenetic trees.

Full Transcript


Computational Molecular Microbiology (MBIO 4700)
Abdullah Zubaer, University of Manitoba

Working with sequences
• Sequence databases
• Sequence comparison
• Pairwise alignment
• Multiple sequence alignment
• Phylogenetic tree

Practice alignment
MEKDIKLNKNKINIFNKYINNKYKLVVPKTRINYEG
MAAVQGAISKRRKFVADGVFYAELNEFFQRELAEEG
MAAVQGAISKRRKFVADG
MQKDTKFLNKSNIFIKNINNKYKLIPFNIKINFVGE
MQKDTKFLNKSNIFIKNINNKYKLIPFNIKINFVGE

Multiple Sequence Alignment (MSA)
▪ Aligning multiple sequences together
▪ Some positions are more conserved than others, which is why MSA uses position-specific scoring
▪ Guide tree: sequences are not independent but are related by a phylogenetic tree

Given the correct phylogenetic tree for the sequences, the probability of a multiple alignment is the product of the probabilities of all the evolutionary events necessary to produce that alignment.

MSA methods
▪ Exact method
▪ Guided by tree
▪ Guided by sequence
▪ Guided by structure

MSA methods
▪ Exact method
▪ Progressive alignment method (ClustalW)
▪ Iterative refinement method (PRALINE, MUSCLE, T-COFFEE)
▪ Consistency-based method (MAFFT)
▪ Homology-based method (MAFFT)
▪ Structure-based method (Expresso)

Progressive alignment (simplified)
(Figure: Montanola et al. 2016)

Progressive alignment in MAFFT
(Figure: https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html)

Iterative refinement method
(Figure: Pevsner J. 2015. Bioinformatics and Functional Genomics)

Homology-based alignment
(Figure: https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html)

MSA applications
▪ Building databases
▪ Finding conserved domains
▪ Analyzing genomes
▪ Studying mutation and molecular evolution
▪ Generating phylogenetic trees

MSA applications
▪ Databases: protein classification databases such as Pfam, SMART, and InterPro
▪ Genome analysis (finding conserved regions, mobile elements, etc.)
▪ Finding conserved domains: Conserved Domain Database (CDD), WebLogo program

PRALINE (cytochrome P450: N-terminal region, conserved domain)
The PRALINE PSI P450 alignment using both PROFsec and DSSP secondary structure integration settings. The alignment has been sectioned to focus on the regions containing the conserved motifs of the cytochrome P450 enzymes (indicated by the black bars above the rulers). (A) The oxygen-binding motif. Colour code for conservation: from blue to red on a scale of 0 to 9; * = conserved in all sequences. Structure: red = helix; green = beta-sheet (strand); clear = coil (no structure/order).
Simossis, V. A., & Heringa, J. (2005). PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Research, 33(Web Server issue), W289–W294. https://doi.org/10.1093/nar/gki390

PSI-BLAST and PHI-BLAST
▪ Position-Specific Iterated BLAST (PSI-BLAST or ψ-BLAST) is a specialized kind of BLAST search used to find distantly related proteins that match a protein of interest
▪ PSI-BLAST builds a profile from an MSA of the hits found so far and uses that profile for the next iteration of the search
▪ Pattern-Hit Initiated BLAST (PHI-BLAST) is preferable when searching a database with a query that contains a short pattern
▪ For example, the three-residue pattern GXW is almost always present in the lipocalin family (X = any residue)

Solving evolution-related problems
(Figures: Pevsner J. 2015. Bioinformatics and Functional Genomics)

Phylogenetic tree building (practice)
▪ Generating a phylogenetic tree requires an MSA
▪ Accession numbers: NP_000198.1 (human), NP_001008996.1 (chimpanzee), NP_062003.1 (rat), NP_001123565.1 (dog), NP_001172013.1 (mouse), NP_001075804.1 (rabbit), NP_001103242.1 (pig), NP_990553.1 (chicken), NP_001172055.1 (cow), P01318.2 (sheep), XP_003422420.1 (elephant), and P67974.1 (sperm whale)
▪ Collect these sequences, make an MSA, and generate a tree
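The practice slide goes from an MSA to a tree. A minimal sketch of one distance-based way to do the second step, assuming the sequences are already aligned (equal length, "-" for gaps): compute uncorrected p-distances between every pair, then cluster with UPGMA and print a Newick tree. UPGMA and the p-distance are choices made here for brevity, not necessarily the method used in this course. UPGMA assumes a roughly constant rate of change across lineages. A comparable distance-clustering step underlies the guide tree in progressive alignment, although tools differ in the clustering method they use. The function names `p_distance` and `upgma` and the toy sequences are hypothetical; they are not the accession-number sequences listed above.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of compared columns that differ.

    Columns where either sequence has a gap ('-') are skipped
    (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return diffs / compared


def upgma(aligned: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; return a rooted Newick string."""
    if not aligned:
        raise ValueError("need at least one sequence")
    # cluster label -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in aligned}
    # distance between every pair of current clusters
    dist = {
        frozenset((i, j)): p_distance(aligned[i], aligned[j])
        for i, j in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        # merge the closest pair of clusters
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        height = dist.pop(pair) / 2  # UPGMA places the new node at d/2
        tree_i, n_i, h_i = clusters.pop(i)
        tree_j, n_j, h_j = clusters.pop(j)
        label = f"({i},{j})"
        subtree = f"({tree_i}:{height - h_i:.4f},{tree_j}:{height - h_j:.4f})"
        # distance to the merged cluster = leaf-count-weighted average
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((label, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[label] = (subtree, n_i + n_j, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    # Hypothetical toy alignment, not the accession-number sequences
    toy = {
        "A": "MKV-LT",
        "B": "MKVALT",
        "C": "MRVALS",
        "D": "QRIALS",
    }
    print(upgma(toy))
```

On the toy alignment this prints ((A:0.0000,B:0.0000):0.2750,(C:0.1667,D:0.1667):0.1083); which groups A with B and C with D. For the real practice data, the sequences would first be retrieved and aligned with an MSA tool, and a dedicated phylogenetics package would normally be used instead of this sketch.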
