Computational Molecular Microbiology (MBIO 4700) Fall 2023 Lecture Notes PDF
University of Manitoba
Abdullah Zubaer
Summary
These lecture notes on computational molecular microbiology cover proteins: protein databases, alignment editors, sequence-based analysis tools (hydrophobicity, motifs, domains, secondary structure) and protein structure prediction. They were delivered by Abdullah Zubaer at the University of Manitoba in the Fall 2023 semester.
Computational Molecular Microbiology (MBIO 4700)
ABDULLAH ZUBAER
UNIVERSITY OF MANITOBA

Proteins

Protein databases
▪ UniProt
▪ NCBI Protein
▪ GenBank
▪ InterPro
▪ Pfam
▪ PDB

Sequence/alignment editor: AliView
Alignment tools (MAFFT or MUSCLE) and alignment editing tools. File conversion feature (FASTA, Nexus, etc.).

Multiple sequence alignment editor: Jalview
http://www.jalview.org/
“Jalview is a free program for multiple sequence alignment editing, visualization and analysis. Use it to view and edit sequence alignments, analyze them with phylogenetic trees and principal components analysis (PCA) plots and explore molecular structures and annotation.”
For more information on Jalview and JABAWS:
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189-1191. doi:10.1093/bioinformatics/btp033
Troshin PV, Procter JB, Barton GJ (2011) Java bioinformatics analysis web services for multiple sequence alignment - JABAWS:MSA. Bioinformatics 27: 2001-2002. doi:10.1093/bioinformatics/btr304
http://www.compbio.dundee.ac.uk/jabaws/
http://www.compbio.dundee.ac.uk/software.html

Protein structure
Levels: primary, secondary, tertiary, quaternary.
Polypeptide backbone: runs from the N-terminus to the C-terminus; the side chains give the amino acids their biochemical/biophysical properties.
http://webhost.bridgew.edu/fgorga/proteins/peptides.htm
Pevsner 2015
Or “disordered” regions
http://www.rothamsted.ac.uk/notebook/courses/guide/prot.htm#Str
https://comis.med.uvm.edu/VIC/coursefiles/MD540/MD540Protein_Organization_10400_574581210/Protein-org/Protein_Organization_print.html
Hydrophobic interactions
Jehl P, Manguy J, Shields DC, Higgins DG, Davey NE (2016) ProViz - a web-based visualization tool to investigate the functional and evolutionary features of protein sequences. Nucleic Acids Research 44(W1): W11-W15. https://doi.org/10.1093/nar/gkw265
Jumper J, Evans R, Pritzel A, et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589. https://doi.org/10.1038/s41586-021-03819-2
https://alphafold.ebi.ac.uk/
Pevsner 2015

Protein structure prediction: BASIC information

1. Hydrophobicity can identify membrane-spanning proteins, such as transporters or signal transducers, and “hydrophobic pockets” that need to be “buried” inside a protein.
--> ExPASy server (Expert Protein Analysis System), ProtScale: https://web.expasy.org/protscale/
Experimentally determined hydrophobicities for each amino acid R group: http://blanco.biomol.uci.edu/hydrophobicity_scales.html

Rps3 (Ophiostoma, mtDNA):
LENNIKLENSCCALNKNKSNIFNKYINNKYKLVPFKTLVNYVNEPRYIPS
EFKEWNNSIYYFNFNNIKNLPVYDINLNKLLKSYFDLYFISKNKNNKFIS
IIKKKQRYSLNKIFISKADLKHTSSKIIITIYIFNRERIILIKNLIFLYS
LHFKTKSYLEKNKNLFFFESLKKKLNNKYEIFNKLKLNFNLNNLKFKDIM
LYKLSKLLSKFYNKKVEFNIINLNSYKYNSDILTDIFKKKVVNPNSKLIK
IMKFIGKKSLRASIGKTGDNYMDKTRISKSINYDLIPNKYKNLNISLIIE
NINFNETIKNIYNISNDTNENIIYNSIKYKLVVGVRLAIKGRLTKRYRAD
RSKLYSKTVGNLQNIDSSFKGLSSKLYRNKLNSNMQYTLDVYKRHVGAYA
VKGWISGR

ProtScale computes and represents (as a two-dimensional plot) the profile produced by any amino acid scale on a selected protein.
- The scales are based on different chemical and physical properties of the amino acids: “hydrophobicity scales, derived from experimental studies on partitioning of peptides in apolar and polar solvents, with the goal of predicting membrane-spanning segments that are highly hydrophobic, and secondary structure conformational parameter scales.”*
*Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A (2005) Protein Identification and Analysis Tools on the ExPASy Server. In: Walker JM (ed.) The Proteomics Protocols Handbook. Humana Press.

Rps3 (ribosomal protein SSU #3): hydrophobicity scores lie between 4.6 and -4.6. A score of 4.6 is the most hydrophobic and a score of -4.6 the most hydrophilic. Hydrophobic regions form the internal component of the folded protein.
NOTE: ProtParam (http://ca.expasy.org/tools/protparam.html) is a good start for basic information about your protein, such as hydrophobicity and other physicochemical properties.

2. Protein motifs (short “secondary feature/fold”): usually a short, conserved amino acid sequence associated with a specific function (e.g., ATP-binding sites).
PROSITE: a library of known motifs and their functions
http://ca.expasy.org/prosite/ (many tools)
https://prosite.expasy.org/ (PROSITE)
MotifScan: https://myhits.sib.swiss/cgi-bin/motif_scan

3. Protein domains: a region of a protein that folds into a semi-autonomous structure (tertiary structure); a domain can contain one or more motifs.
Pfam database (Protein families, EMBL): http://pfam.xfam.org/
El-Gebali S, et al. (2019) Nucleic Acids Research. doi:10.1093/nar/gky995
Many mammalian proteins are “mosaic proteins” produced by domain shuffling (evolution of complex proteins).
VAR1: Lys-rich RNA-binding domains

UniProt/InterPro
“UniProt (The UniProt Consortium, 2019) based on InterPro domains. InterPro classifies proteins into families and predicts the presence of functional domain(s).”
Blum M, Chang H, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD (2020) The InterPro protein families and domains database: 20 years on. Nucleic Acids Research. doi:10.1093/nar/gkaa977
https://www.ebi.ac.uk/interpro/

Secondary structure predictions: helices, beta strands, coils, order/disorder
SWISS-MODEL Workspace (also PRALINE, or JPred below)
JPred, A Protein Secondary Structure Prediction Server: http://www.compbio.dundee.ac.uk/jpred/
Explanation of the output: http://www.jalview.org/help/html/webServices/jnet.html

mtDNA rps3 (from Ophiostoma novo-ulmi subsp. americana): the same amino acid sequence as the Rps3 shown above.
Pfam family: VAR1 (PF05316), Mitochondrial ribosomal protein (VAR1). VAR1, along with 15S rRNA, is necessary for the formation of mature 37S subunits.

rps3 secondary structure prediction (RED = alpha helix, Green = beta sheets):
http://www.compbio.dundee.ac.uk/jpred4/results/jp_gHLireB/jp_gHLireB.results.html

Protein Data Bank: uS3m - rps3 (yeast)
https://www.rcsb.org
The Protein Data Bank (PDB) is a database of three-dimensional structural data; a PDB ID is roughly equivalent to an accession number.

Tertiary and higher orders of structure
Protein stability and folding: the native state should be the thermodynamically most favorable state.
2nd law of thermodynamics at constant temperature and pressure:
G = H - TS
H = enthalpy (“heat content”/relative comfort)
S = entropy (“freedom/disorder”)
T = absolute temperature
G --> minimum Gibbs free energy value (most stable); at constant T, a change such as folding is favorable when ΔG = ΔH - TΔS is negative.

Considerations:
1. Main chain conformation is limited by the allowed conformational angles of the polypeptide backbone (also, two atoms cannot occupy the same space).
2. Amino acid side chains offer more physicochemical versatility, generating a variety of different folding patterns.
20 amino acids (R groups):
- size variation (glycine: an H atom; phenylalanine: a benzene ring)
- charge (Asp and Glu -ve; Lys and Arg +ve; “salt bridges”)
- polarity: polar R groups can H-bond to other polar groups, the main chain, or polar solvents; nonpolar (hydrophobic) R groups interact with each other and/or prefer to be “hidden” from polar solvents (water) --> hydrophobicity
- shape and rigidity of the R group

Folds are constrained by:
1. All residues must be in stereochemically allowed conformations (main chain and side chains).
2. Buried polar atoms must be H-bonded to other buried polar atoms.
3. Hydrophobic surfaces must be buried (as much as possible), and the interior densely packed (excluding water molecules).
The conformation of a protein can change when it binds ligands, etc.

Reality: determining a structure experimentally requires producing crystals of the pure protein for X-ray crystallography (and the crystallized form may not be the native state); NMR-based approaches and cryo-EM are also used.

Methods for protein structure prediction include:
1. Homology modeling: find related protein(s) whose 3D structure(s) is/are known (a template) --> may provide a 3D model
2. Fold recognition: scan the sequence for amino acid patterns known from previous work (compiled in a database) to be associated with a particular fold --> nominates potential structures
3. “Attempts” to predict secondary structures: locates potential helical regions and beta strands
4. Prediction of novel folds (i.e., de novo or ab initio)
Ultimately, experimental data are NEEDED.

NCBI resources: https://www.ncbi.nlm.nih.gov/Structure/index.shtml

SWISS-MODEL Workspace
Arnold K, Bordoli L, Kopp J, Schwede T (2006) The SWISS-MODEL Workspace: a web-based environment for protein structure homology modeling. Bioinformatics 22: 195-201.
Tools:
- Tools for searching the SWISS-MODEL template library for suitable template structures (if available)
- Sequence Feature Scan: secondary structure prediction and domain assignment
https://swissmodel.expasy.org/

HEG (L.tranc.rps3/HE):
MQKDTKFLNKSNIFIKNINNKYKLIPFNIKINFVGENKYFPSDFKEWTNNIYYFNSNYIKNFP
VYDLNLNKLLKGYFDLYFNRESIQSKFKFFRKRRLSLNKIFVSKPEIKHTSSKTTITVYVYNRERI
VLLKKLIKLRKSLFKINNFFYKCKSISGDLYGKYFINVLYKELVYIRRCKLKLNLNELKFKDQFLH
KLSLLISKLYNKKVEFNIINLKSIVYNSSIFTEIMGKKLRNKNTSLLKTMKFILSKGIILEENNKKE
RSRLIKSVNFSLLENKYKNLNINSFVKDIDLNETIKDLYNIESKDNKDIVFDSIKYKNIGGIRLEA
KGRLTKRYRADRALFKVNWKGGLKNTDSSYKGLSSVNFRGNLKSNVEYSMGISKRRIGAFA
VKGWISGK

Fusion protein OR:
SYSTLANFPVQARNDNISPWTITGFADAESSFMLTVSKDSKRNTGWSVRPRFRIGLHNKDV
TILKSIREYLGAGIITSDKDARIRFESLKELEVVINHFDKYPLITQKRADYLLFKKAFYLIKNKEHL
TEEGLNQILTLKASLNLGLSEELKEAFPNTIPAEKLLVTGQEIPDSNWVAGFTAGEGSFYIRIA
KNSTLKTGYQVQSVFQITQDTRDIELMKNLISYLNCGNIRIRKYKGSEGIHDTCVDLVVTNLN
DIKEKIIPFFNKNHIIGVKLQDYRDWCKVVTLIDNKEHLTSEGLEKIQKIKEGMNRGRSL
OR: ribosomal protein 3/homing endonuclease-like protein fusion (mitochondrion) [Ophiostoma novo-ulmi subsp. americana]

>rnl-encoded LAGLIDADG type protein
421 sinpwiltgf adaegsfllr irnnnkssvg ystelgfqit lhnkdksile niqstwkvgv
481 iansgdnavs lkvtrfedlk viidhfekyp litqklgdym lfkqafcvme nkehlkingi
541 kelvrikakl nwgltdelkk afpeiisker slinknipnf kwlagftsge gcffvnliks
601 ksklgvqvql vfsitqhikd knlmnslity lgcgyikekn ksefswldfv vtkfsdindk
661 iipvfqentl igvkledfed wckvakliee kkhltesgld eikkiklnmn kgrvf
GenBank: ACCESSION AAY59060
I-OnuI HEG

Swiss-Model: Protein Structure
ExPASy Proteomics Server: https://ca.expasy.org/
https://www.expasy.org/proteomics
https://swissmodel.expasy.org/ (start modelling)
Tools (many) → https://ca.expasy.org/tools/
https://swissmodel.expasy.org/interactive (workspace)
https://www.proteinmodelportal.org/?pid=documentation#modelquality
https://www.proteinmodelportal.org/?pid=101
Can add your own data: templates!
Rps3 amino acid sequence (mtDNA version)
Can go with the default (the program will search for templates) OR can “input” templates (e.g., if you have unpublished data).
Ramachandran plots