Computational Molecular Microbiology (MBIO 4700) PDF
University of Manitoba
Abdullah Zubaer
Summary
These lecture notes for MBIO 4700 (Computational Molecular Microbiology) at the University of Manitoba cover bioinformatics concepts, including Markov and Hidden Markov Models, Monte Carlo simulations, and Bayes' theorem.
Full Transcript
Computational Molecular Microbiology (MBIO 4700)
ABDULLAH ZUBAER
UNIVERSITY OF MANITOBA

Concepts of bioinformatics
• Computational molecular microbiology and its relation to other disciplines
• Data (types and structures)
• Database
• Algorithm
• Markov and Hidden Markov Models
• Monte Carlo simulations
• Bayesian statistics

HMM exercises
GTRAGT -- introns AT rich ... AG
[Diagram: states 3’, E and END; a transition probability of 1.0 is marked after 3’]
TTAATTAAG
TTAATTAAGCCCGGG
State path:
logP for your HMM chain?
3’ = 3’ splice site. For this example, the emission probability for the 3’ splice site is 0.95 for G (and 0.05 for C).
END = 1.0 (like start = 1.0)
Generate a more complete “toy” HMM for the gene above: define the “states”, state path, emission probabilities and transition probabilities. Calculate logP and show all your work! (Eddy 2004)

HMM and alignment
https://en.wikipedia.org/wiki/Multiple_sequence_alignment

Profile HMM
Transition probability = 1.0 (no gaps)
Sequence alignment: match = position in alignment
Emission probabilities: define p values for which nucleotide or amino acid occurs at each “match” position within an alignment, e.g. p(M) = 0.983, p(L) = 0.001; the other 18 amino acids make up the rest (so that the total is 1.0)
http://www.biology.wustl.edu/gcg/hmmanalysis.html

HMMER webserver [http://www.ebi.ac.uk/Tools/hmmer]
Hidden Markov models (HMM) for detecting sequence similarity in databases (“fast and sensitive homology searches”)
Simon C Potter, Aurélien Luciani, Sean R Eddy, Youngmi Park, Rodrigo Lopez, Robert D Finn; HMMER web server: 2018 update, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W200–W204, https://doi.org/10.1093/nar/gky448

HMMER
https://hmmer-web-docs.readthedocs.io/en/latest/databases.html

HMMER
Why HMMER? (MORE sensitive: “deeper levels of similarity can be detected/estimated”)
BLAST (convenient and fast) has its limitations: many sequences in GenBank are not properly annotated or have errors.
Curated databases tend to have better quality control (but are sometimes limited to model systems).
Databases may provide more information on your gene/protein of interest.

Monte Carlo (MC) algorithm
• A type of randomized algorithm.
• Can be used in heuristic searches (“trial and error”).
• Computer-assisted randomization and sampling (or iterative resampling) of data points to be tested against a “model” (which is stochastic) -> protein folding, phylogenies, etc.
• Usually combined with a second element that removes “data points” that do not fit the model (i.e. low probabilities; free energy) and “stores” those that are above a certain cut-off (which can be set based on experimental/empirical data).
• Essential when dealing with “infinitely” large data sets.

MC simulation (example: proteins)
• Move set: the changes applied to the protein structure at each step of the simulation (infinite possibilities).
• What to keep, or acceptance criterion: the rule according to which new conformations are accepted or rejected (free energy).
• Energy function: a score that allows ranking (can define “cut-off” values).

Tree robot “metaphor”: keep tweaking the trees to achieve better scores.
[Flowchart with YES / NO / YES decision branches]
Heuristic (trial and error) in approach
https://www.wikiwand.com/en/Bayesian_inference_in_phylogeny

Trapped* “suboptimal peak” -> ESCAPE -> move to another chain that is making progress
Jordan Van Sewell (Winnipeg, ceramic artist)

Conditional probability and Bayes’ theorem
▪ When you draw a ball from the urn, the probability that it is a black ball is p, and the probability that it is a white ball is (1-p).
▪ So, what is the probability of drawing 2 black balls and 2 white balls, in any order?
▪ Bayes’ theorem describes the probability of an event, based on some conditions that can be related to the event.

Bayes’ theorem
INVERSE PROBABILITY

Bayes’ theorem (example)
▪ For example, the probability of winning a marathon based on how many hours of training per week: P(W|h) = P(h|W) * P(W) / P(h)
▪ Assuming P(W) is 1/1000 (based on the number of runners), P(25 hours) is 1%, and P(25 hours|W) is 20%
▪ What is P(W|25 hours)?
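
The two probability questions above can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The function names prob_k_black and posterior are hypothetical. The urn calculation assumes independent draws with a fixed p (i.e. drawing with replacement), and p = 0.5 is an arbitrary illustrative value, since the slides leave p unspecified.

```python
from math import comb


def prob_k_black(n_draws: int, k_black: int, p_black: float) -> float:
    """Probability of exactly k_black black balls in n_draws draws, in any order.

    Assumes independent draws with the same p_black each time (binomial model).
    """
    n_orderings = comb(n_draws, k_black)  # number of distinct black/white orderings
    return n_orderings * p_black**k_black * (1 - p_black) ** (n_draws - k_black)


def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Urn question: 2 black and 2 white in 4 draws, i.e. C(4,2) * p^2 * (1-p)^2.
# p = 0.5 is an assumed example value, not taken from the slides.
p_two_black_two_white = prob_k_black(n_draws=4, k_black=2, p_black=0.5)

# Marathon question, using the numbers given on the slide:
# P(W) = 1/1000, P(25 hours) = 1%, P(25 hours|W) = 20%.
p_win_given_25h = posterior(likelihood=0.20, prior=1 / 1000, evidence=0.01)

print(f"P(2 black, 2 white | p=0.5) = {p_two_black_two_white:.3f}")
print(f"P(W | 25 hours) = {p_win_given_25h:.3f}")
```

With the slide's numbers, P(W|25 hours) = 0.20 * 0.001 / 0.01 = 0.02, so training 25 hours per week raises the probability of winning from 0.1% to 2% under these assumptions. For the urn, the general answer is 6 * p^2 * (1-p)^2, because there are 6 ways to order 2 black and 2 white balls.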