Computational Methods Used in Prediction of Protein Structure
Maulana Abul Kalam Azad University of Technology
Poulami Majumder
Summary
This chapter provides an overview of computational methods used in predicting protein structure. It covers various approaches like homology modeling, protein threading, and ab initio methods. The chapter explains the four types of protein structure: primary, secondary, tertiary, and quaternary, and how computational methods are used to understand them.
Full Transcript
Computational Methods Used in Prediction of Protein Structure
Chapter · September 2019 · DOI: 10.1007/978-981-15-2445-5_8
Poulami Majumder, Maulana Abul Kalam Azad University of Technology (West Bengal University of Technology)

1 Introduction
Proteins are basic building blocks of life. They are key components of the body and are responsible for a wide range of physiological and biochemical reactions. Proteins are also a central subject in bioinformatics, where they are studied to understand the biological processes of life. Predicting protein structure is important for challenges such as drug design, medicinal applications and bio-industrial applications. Work on protein structure prediction has continued for more than 25 years. Many approaches have been tried, and computational approaches have become the most popular and useful in recent times.
Before turning to the different prediction approaches, it helps to review protein structure itself. Four levels of protein structure are recognized (Fig. 1): primary, secondary, tertiary and quaternary structure. The primary structure is the simple linear sequence of amino acid residues. The secondary structure is generally defined by the pattern of hydrogen bonds between the amide hydrogen and carbonyl oxygen atoms of residues along the peptide backbone.
There are two kinds of protein secondary structure: alpha helices and beta strands. In both, amino acid residues are linked to each other by hydrogen bonds. An alpha helix generally has 3.6 amino acids per turn, with hydrogen bonds formed between each residue and the residue four positions further along the chain. In beta strands there are two portions of the chain: one running upward with 5–10 consecutive amino acids and another running downward with 5–10 consecutive amino acids. H-bond interactions are formed mostly between residues on adjacent strands, which are connected by short loops [6, 7].

Fig. 1 Four levels of structure: Primary Structure (amino acid chain sequence); Secondary Structure (alpha helix and beta pleated sheet); Tertiary Structure (3D folded structure); Quaternary Structure (3D complex structure including more than one subunit)

P. Majumder, Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal 700064, India. © Springer Nature Singapore Pte Ltd. 2020. K. G. Srinivasa et al. (eds.), Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-2445-5_8

Secondary structure prediction is concerned with the pattern of alpha helices and beta strands along the chain, and it works mainly from the linear amino acid sequence, i.e. the primary structure. The arrangement, size and shape of the amino acid residues determine how well ligands fit the protein. The tertiary structure is the three-dimensional structure of monomeric and multimeric molecules. In this structure, alpha helices and beta strands pack together into a globular structure.
The folding of the tertiary structure is driven and stabilized by hydrophobic interactions, disulphide bonds, salt bridges and hydrogen bonds. The structure is not rigid; it fluctuates slightly and continuously. The quaternary structure is built from dimeric and/or multimeric assemblies stabilized by non-covalent bonds.
The structural annotation of a protein is key to understanding its function. It is also important to know whether a protein is in its correct conformation and, if so, whether that conformation gives efficient function. The amino acid sequence determines the protein structure, so in general the sequence is the starting point for predicting complex protein structures. However, it is hard to infer the particular function of a protein from the sequence alone. Knowledge of the primary structure, i.e. the linear amino acid sequence, is not enough, because the conformation fluctuates continuously.
Several experimental techniques have been developed to determine the three-dimensional structure of proteins, namely electron microscopy, spectroscopy, X-ray crystallography and nuclear magnetic resonance (NMR). However, a wide gap remains between the sequences that are known and the structures that are available, which is itself a challenge for protein structure prediction. Computational methods aim to resolve this challenge by predicting structure directly from the amino acid sequence. This chapter describes the major computational approaches to protein structure prediction, along with the software programs available for each, and aims to give an overview of computational methods used in protein structure prediction.
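The alpha helix geometry described in the introduction (3.6 residues per turn, with hydrogen bonds to the residue four positions along the chain) can be made concrete with a little arithmetic. The following is only an illustrative sketch, not part of the chapter: the function names are hypothetical, and it assumes an ideal helix with 1-based residue numbering.

```python
from typing import List, Tuple


def helix_turn_angle(residues_per_turn: float = 3.6) -> float:
    """Rotation about the helix axis contributed by one residue, in degrees."""
    return 360.0 / residues_per_turn


def helix_hbond_pairs(n_residues: int, offset: int = 4) -> List[Tuple[int, int]]:
    """Backbone hydrogen-bond partners (i, i + offset) in an ideal helix.

    Residues are numbered from 1. The default offset of 4 reflects the
    "every fourth residue" pattern of the alpha helix (an assumption of
    ideal geometry; real helices fray at their ends).
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


if __name__ == "__main__":
    print(round(helix_turn_angle(), 1))  # 100.0 degrees per residue
    print(helix_hbond_pairs(8))          # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

With 3.6 residues per turn, each residue rotates the chain by about 100 degrees, so residue i and residue i + 4 end up on the same face of the helix, a little more than one full turn apart.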
2 Computational Methods for Protein Structure Prediction
Three major computational strategies are used to predict protein structure: homology modelling or comparative techniques; protein threading or protein fold recognition; and ab initio or de novo techniques. Fig. 2 illustrates schematically how a prediction method is chosen among these.

Fig. 2 Decision-making chart for protein structure prediction method. Boxes: Protein Sequence; Database Searching; Multiple Sequence Alignment; Homologue in PDB (Yes: Homology Modelling; No: Secondary Structure Prediction, Fold Recognition); Predicted Fold (Yes: Sequence-Structure Alignment; No: Ab-initio Structure Prediction); 3D Protein Model

2.1 Homology Modelling Techniques
This technique builds an atomic-resolution model of an unknown "target" protein from its amino acid sequence and the experimental 3D structure of a related homologous protein. It identifies one or more known protein structures that are similar to the required structure, and aligns the sequences of the known and unknown proteins so that they match as well as possible. An unknown structure can also be predicted from multiple templates, each used for a different part of the protein. The accuracy of a homology model depends strongly on the amino acid sequence similarity between the target (the queried protein) and the template (an existing known protein). Models are considered reliable if the sequence similarity is more than 50%. In some cases the similarity is less than 20%, and those predictions need techniques other than homology modelling. Homology modelling is useful in the pharmaceutical industry for structure-based drug discovery and drug design. The process comprises the following steps (Fig. 3): template selection; amino acid sequence alignment between template and target protein; alignment correction and model backbone construction; side chain generation and optimization; overall model optimization, assessment and verification.

Fig. 3 Schematic illustration of basic process of homology modelling for protein structure prediction. Steps: Related Template Structure Identification; Template Selection; Alignment of Both Target and Template Sequence; Model Building; Evaluate Model; Model is OK (No / Yes); End Process

A number of homology modelling programs are available, and many of them are freely available or open source. Table 1 describes some significant programs and their functions. Many other programs are used throughout the world, such as IntFOLD, GeneSilico, Geno3D, STRUCTUROPEDIA and WHAT IF.

Table 1 Some useful computational methods based on homology modelling techniques (Courtesy Wikipedia)
Name: Description/function
RaptorX: One of the most popular methods. It performs protein 3D modelling, remote homology detection and binding site prediction.
Biskit: An open-source software package written in Python. It wraps external programs into an automated workflow.
ESyPred3D: An automated homology modelling program that handles template detection, alignment and 3D modelling. It focuses mainly on the alignment strategy.
FoldX: An algorithm based on an empirical force field for protein structure. Energy calculations and protein design are built in.
HHpred: An open-source software package. Sensitive template detection, alignment and 3D modelling.
MODELLER: Used to build tertiary and quaternary protein structure models.
Phyre and Phyre2: Free web-based service. One of the popular methods; it performs residue alignment, remote template detection and 3D modelling using multiple templates.
Prime: Performs sequence alignment, secondary structure prediction, homology modelling, protein refinement, loop prediction and side chain prediction.
Bhageerath-H: Established by IIT Delhi; focuses mainly on tertiary protein structure prediction.
SWISS-MODEL: Described as currently the most accurate homology modelling method for protein structure prediction. It works by finding local similarity and by fragment assembly.
YASARA: Template detection, hybridization of model fragments, alignment, and modelling of ligands and oligomers.

2.2 Protein Threading
Protein threading is another name for protein fold recognition. In this technique, known proteins with the same fold are used as templates for modelling the target protein. Protein threading differs subtly from homology modelling. Threading targets proteins that share only the fold with the template, so it aligns the target sequence to the template structure, whereas homology modelling is for comparatively easier targets and aligns the target sequence to the template sequence only. Specific interactions between amino acid residues affect protein folding, such as hydrogen bonds, hydrophobic interactions, van der Waals interactions and electrostatic forces. Almost 1300 different protein folds are known so far, and new folds are discovered each year. Protein threading comprises four major steps (Fig. 4):
1. Build a library of core fold templates representing the template structures (Protein Data Bank database).
2. Evaluate the compatibility between the aligned amino acid sequence and each template fold.
3. Search for the optimal alignment between the target sequence and the template structure.
4. Evaluate the best match based on statistical significance.

Fig. 4 Schematic representation of protein threading in simpler way. Steps: Search for Related Template Fold; Fold Assignment (Unsuccessful: Ab initio Modelling; Successful: Alignment of Target Sequence and Template Fold); Comparative Model Building; Evaluate Model; Model is OK; End Process

Table 2 lists some significant protein threading software. These programs help in computational modelling of a target protein from a template fold by the fold recognition method. Many computational algorithms have been proposed to find the optimal threading of a sequence onto a structure, but finding the best template for alignment is still difficult because of the limited number of structures in the PDB. Researchers are trying many combinatorial optimization approaches, such as simulated annealing, conditional random fields, branch and bound, and linear programming. If homology modelling has been performed to predict protein structure and the aligned sequence is very low (