Molecular Biology Instant Notes PDF

Molecular Biology
Third Edition

BIOS INSTANT NOTES
Series Editor: B.D. Hames, School of Biochemistry and Microbiology, University of Leeds, Leeds, UK

Biology
Animal Biology, Second Edition; Biochemistry, Third Edition; Bioinformatics; Chemistry for Biologists, Second Edition; Developmental Biology; Ecology, Second Edition; Genetics, Second Edition; Immunology, Second Edition; Mathematics & Statistics for Life Scientists; Medical Microbiology; Microbiology, Second Edition; Molecular Biology, Third Edition; Neuroscience, Second Edition; Plant Biology, Second Edition; Sport & Exercise Biomechanics; Sport & Exercise Physiology

Chemistry
Consulting Editor: Howard Stanbury
Analytical Chemistry; Inorganic Chemistry, Second Edition; Medicinal Chemistry; Organic Chemistry, Second Edition; Physical Chemistry

Psychology
Sub-series Editor: Hugh Wagner, Dept of Psychology, University of Central Lancashire, Preston, UK
Cognitive Psychology; Physiological Psychology; Psychology; Sport & Exercise Psychology

Molecular Biology
Third Edition
Phil Turner, Alexander McLennan, Andy Bates & Mike White
School of Biological Sciences, University of Liverpool, Liverpool, UK

Published by: Taylor & Francis Group
In US: 270 Madison Avenue, New York, NY 10016
In UK: 4 Park Square, Milton Park, Abingdon, OX14 4RN

© 2005 by Taylor & Francis Group
This edition published in the Taylor & Francis e-Library, 2007.

First edition published in 1997
Second edition published in 2000
Third edition published in 2005

ISBN 0-203-96732-1 (Master e-book ISBN)
ISBN 0-415-35167-7 (Print Edition)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

All rights reserved. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Molecular biology / Phil Turner … [et al.]. -- 3rd ed.
p. ; cm. -- (BIOS instant notes)
Includes bibliographical references and index.
ISBN 0-415-35167-7 (alk. paper)
1. Molecular biology -- Outlines, syllabi, etc.
[DNLM: 1. Molecular Biology -- Outlines. QH 506 M71902 2005]
I. Turner, Philip C. II. Series.
QH506.I4815 2005
572.8 -- dc22
2005027956

Editor: Elizabeth Owen
Editorial Assistant: Chris Dixon
Production Editor: Karin Henderson

Taylor & Francis Group is the Academic Division of T&F Informa plc.
Visit our web site at http://www.garlandscience.com

CONTENTS

Abbreviations
Preface to the third edition
Preface to the second edition
Preface to the first edition

Section A  Cells and macromolecules
  A1 Cellular classification
  A2 Subcellular organelles
  A3 Macromolecules
  A4 Large macromolecular assemblies

Section B  Protein structure
  B1 Amino acids
  B2 Protein structure and function
  B3 Protein analysis

Section C  Properties of nucleic acids
  C1 Nucleic acid structure
  C2 Chemical and physical properties of nucleic acids
  C3 Spectroscopic and thermal properties of nucleic acids
  C4 DNA supercoiling

Section D  Prokaryotic and eukaryotic chromosome structure
  D1 Prokaryotic chromosome structure
  D2 Chromatin structure
  D3 Eukaryotic chromosome structure
  D4 Genome complexity
  D5 The flow of genetic information

Section E  DNA replication
  E1 DNA replication: an overview
  E2 Bacterial DNA replication
  E3 The cell cycle
  E4 Eukaryotic DNA replication

Section F  DNA damage, repair and recombination
  F1 Mutagenesis
  F2 DNA damage
  F3 DNA repair
  F4 Recombination

Section G  Gene manipulation
  G1 DNA cloning: an overview
  G2 Preparation of plasmid DNA
  G3 Restriction enzymes and electrophoresis
  G4 Ligation, transformation and analysis of recombinants

Section H  Cloning vectors
  H1 Design of plasmid vectors
  H2 Bacteriophage vectors
  H3 Cosmids, YACs and BACs
  H4 Eukaryotic vectors

Section I  Gene libraries and screening
  I1 Genomic libraries
  I2 cDNA libraries
  I3 Screening procedures

Section J  Analysis and uses of cloned DNA
  J1 Characterization of clones
  J2 Nucleic acid sequencing
  J3 Polymerase chain reaction
  J4 Organization of cloned genes
  J5 Mutagenesis of cloned genes
  J6 Applications of cloning

Section K  Transcription in prokaryotes
  K1 Basic principles of transcription
  K2 Escherichia coli RNA polymerase
  K3 The E. coli σ⁷⁰ promoter
  K4 Transcription, initiation, elongation and termination

Section L  Regulation of transcription in prokaryotes
  L1 The lac operon
  L2 The trp operon
  L3 Transcriptional regulation by alternative σ factors

Section M  Transcription in eukaryotes
  M1 The three RNA polymerases: characterization and function
  M2 RNA Pol I genes: the ribosomal repeat
  M3 RNA Pol III genes: 5S and tRNA transcription
  M4 RNA Pol II genes: promoters and enhancers
  M5 General transcription factors and RNA Pol II initiation

Section N  Regulation of transcription in eukaryotes
  N1 Eukaryotic transcription factors
  N2 Examples of transcriptional regulation

Section O  RNA processing and RNPs
  O1 rRNA processing and ribosomes
  O2 tRNA processing and other small RNAs
  O3 mRNA processing, hnRNPs and snRNPs
  O4 Alternative mRNA processing

Section P  The genetic code and tRNA
  P1 The genetic code
  P2 tRNA structure and function

Section Q  Protein synthesis
  Q1 Aspects of protein synthesis
  Q2 Mechanism of protein synthesis
  Q3 Initiation in eukaryotes
  Q4 Translational control and post-translational events

Section R  Bacteriophages and eukaryotic viruses [a]
  R1 Introduction to viruses
  R2 Bacteriophages
  R3 DNA viruses
  R4 RNA viruses

Section S  Tumor viruses and oncogenes [b]
  S1 Oncogenes found in tumor viruses
  S2 Categories of oncogenes
  S3 Tumor suppressor genes
  S4 Apoptosis

Section T  Functional genomics and new technologies
  T1 Introduction to the 'omics
  T2 Global gene expression analysis
  T3 Proteomics
  T4 Cell and molecular imaging
  T5 Transgenics and stem cell technology

Section U  Bioinformatics
  U1 Introduction to bioinformatics

Further reading
Index

[a] Contributed by Dr M. Bennett, Department of Veterinary Pathology, University of Liverpool, Leahurst, Neston, South Wirral L64 7TE, UK.
[b] Contributed by Dr C. Green, School of Biological Sciences, University of Liverpool, PO Box 147, Liverpool L69 7ZB, UK.

ABBREVIATIONS

ADP        adenosine 5′-diphosphate
AIDS       acquired immune deficiency syndrome
AMP        adenosine 5′-monophosphate
ARS        autonomously replicating sequence
ATP        adenosine 5′-triphosphate
BAC        bacterial artificial chromosome
BER        base excision repair
BLAST      basic local alignment search tool
bp         base pairs
BRF        TFIIB-related factor
BUdR       bromodeoxyuridine
bZIP       basic leucine zipper
CDK        cyclin-dependent kinase
cDNA       complementary DNA
CHEF       contour clamped homogeneous electric field
CJD        Creutzfeldt–Jakob disease
CRP        cAMP receptor protein
CSF-1      colony-stimulating factor-1
CTD        carboxy-terminal domain
Da         Dalton
dNTP       deoxynucleoside triphosphate
ddNTP      dideoxynucleoside triphosphate
DMS        dimethyl sulfate
DNA        deoxyribonucleic acid
DNase      deoxyribonuclease
DOP-PCR    degenerate oligonucleotide primer PCR
dsDNA      double-stranded DNA
EDTA       ethylenediamine tetraacetic acid
EF         elongation factor
ELISA      enzyme-linked immunosorbent assay
EMBL       European Molecular Biology Laboratory
ENU        ethylnitrosourea
ER         endoplasmic reticulum
ES         embryonic stem
ESI        electrospray ionization
EST        expressed sequence tag
ETS        external transcribed spacer
FADH       reduced flavin adenine dinucleotide
FIGE       field inversion gel electrophoresis
FISH       fluorescent in situ hybridization
β-gal      β-galactosidase
GFP        green fluorescent protein
GMO        genetically modified organism
GST        glutathione-S-transferase
GTP        guanosine 5′-triphosphate
HIV        human immunodeficiency virus
HLH        helix–loop–helix
hnRNA      heterogeneous nuclear RNA
hnRNP      heterogeneous nuclear ribonucleoprotein
HSP        heat-shock protein
HSV-1      herpes simplex virus-1
ICAT       isotope-coded affinity tag
ICC        immunocytochemistry
ICE        interleukin-1 β converting enzyme
IF         initiation factor
Ig         immunoglobulin
IHC        immunohistochemistry
IHF        integration host factor
IP         immunoprecipitation
IPTG       isopropyl-β-D-thiogalactopyranoside
IRE        iron response element
IS         insertion sequence
ISH        in situ hybridization
ITS        internal transcribed spacer
JAK        Janus activated kinase
kb         kilobase pairs in duplex nucleic acid, kilobases in single-stranded nucleic acid
kDa        kiloDalton
LAT        latency-associated transcript
LC         liquid chromatography
LINES      long interspersed elements
LTR        long terminal repeat
MALDI      matrix-assisted laser desorption/ionization
MCS        multiple cloning site
miRNA      micro RNA
MMS        methylmethane sulfonate
MMTV       mouse mammary tumor virus
mRNA       messenger RNA
MS         mass spectrometry
NAD+       nicotinamide adenine dinucleotide
NER        nucleotide excision repair
NLS        nuclear localization signal
NMN        nicotinamide mononucleotide
NMD        nonsense mediated mRNA decay
NMR        nuclear magnetic resonance
nt         nucleotide
NTP        nucleoside triphosphate
ORC        origin recognition complex
ORF        open reading frame
PAGE       polyacrylamide gel electrophoresis
PAP        poly(A) polymerase
PCNA       proliferating cell nuclear antigen
PCR        polymerase chain reaction
PDGF       platelet-derived growth factor
PFGE       pulsed field gel electrophoresis
PTH        phenylthiohydantoin
RACE       rapid amplification of cDNA ends
RBS        ribosome-binding site
RER        rough endoplasmic reticulum
RF         replicative form
RFLP       restriction fragment length polymorphism
RISC       RNA-induced silencing complex
RNA        ribonucleic acid
RNAi       RNA interference
RNA Pol I    RNA polymerase I
RNA Pol II   RNA polymerase II
RNA Pol III  RNA polymerase III
RNase A    ribonuclease A
RNase H    ribonuclease H
RNP        ribonucleoprotein
ROS        reactive oxygen species
RP-A       replication protein A
rRNA       ribosomal RNA
RT         reverse transcriptase
RT–PCR     reverse transcriptase–polymerase chain reaction
SAGE       serial analysis of gene expression
SAM        S-adenosylmethionine
SDS        sodium dodecyl sulfate
siRNA      short interfering RNA
SINES      short interspersed elements
SL1        selectivity factor 1
snoRNP     small nucleolar RNP
SNP        single nucleotide polymorphism
snRNA      small nuclear RNA
snRNP      small nuclear ribonucleoprotein
SRP        signal recognition particle
Ssb        single-stranded binding protein
SSCP       single stranded conformational polymorphism
ssDNA      single-stranded DNA
STR        simple tandem repeat
SV40       simian virus 40
TAF        TBP-associated factor
TBP        TATA-binding protein
α-TIF      α-trans-inducing factor
tmRNA      transfer-messenger RNA
TOF        time-of-flight
Tris       tris(hydroxymethyl)aminomethane
tRNA       transfer RNA
UBF        upstream binding factor
UCE        upstream control element
URE        upstream regulatory element
UV         ultraviolet
VNTR       variable number tandem repeat
X-gal      5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
XP         xeroderma pigmentosum
YAC        yeast artificial chromosome
YEp        yeast episomal plasmid

PREFACE TO THE THIRD EDITION

It doesn't seem like five years have passed since we were all involved in writing the second edition of Instant Notes in Molecular Biology. However, during that period there have been a number of events and discoveries that are worthy of comment. We are impressed with how popular the text has become with students, not only in the United Kingdom, but all around the world. The second edition has been translated into Portuguese, Turkish, Polish, French, and Japanese. We have received comments and suggestions from as far afield as Katmandu and Istanbul as well as from much nearer home and we would like to thank everyone who has taken the time to make their comments known, as they have helped our improvements to the third edition.

Even though our text is a basic introduction to Molecular Biology, there have been some dramatic developments in the field since the last edition was written. These include the whole area of small RNA molecules, including micro RNAs and RNA interference, and we have updated the relevant sections to include this material. Other important developments include the rapid growth in the areas of genomics, proteomics, cell imaging and bioinformatics, and since we recognize that these areas will rapidly change in the future, we have pragmatically included two new sections at the end of the book to deal with these fast moving topics. We hope this will make changes to future editions less complex.

Several people who have helped us to keep on track with writing and production on the third edition deserve our thanks for their encouragement and patience, including Sarah Carlson, Liz Owen and Alison Nick, not to mention all our families who have tolerated our necessary preoccupations. We sincerely hope that the third edition continues to help students to get to grips with this interesting area of biology.

Phil Turner, Sandy McLennan, Andy Bates and Mike White
September 2005

PREFACE TO THE SECOND EDITION

To assess how to improve Instant Notes in Molecular Biology for the second edition, we studied the first edition readers' comments carefully and were pleasantly surprised to discover how little was deemed to have been omitted and how few errors had been brought to our attention. Thus, the problem facing us was how to add a number of fairly disparate items and topics without substantially affecting the existing structure of the book. We therefore chose to fit new material into existing topics as far as possible, only creating new topics where absolutely necessary. A superficial comparison might therefore suggest that little has changed in the second edition, but we have included, updated or extended the following areas: proteomics, LINES/SINES, signal transduction, BACs, Z-DNA, gene gun, genomics, DNA fingerprinting, DNA chips, microarrays, RFLPs, genetic polymorphism, genome sequencing projects, SSCP, automated DNA sequencing, positional cloning, chromosome jumping, PFGE, multiplex DNA amplification, RT-PCR, quantitative PCR, PCR screening, PCR mutagenesis, degenerate PCR and transgenic animals.

In addition, three completely new topics have been added. Arguably, no molecular biology text should omit a discussion of Crick's central dogma and it now forms the basis of Topic D5 – The flow of genetic information. Two other rapidly expanding and essential subjects are The cell cycle and Apoptosis, each of which, we felt, deserved its own topic. These have been added to Section E on DNA replication and Section S on Tumor viruses and oncogenes, respectively.

Finally, in keeping with the ethos of the first edition that Instant Notes in Molecular Biology should be used as a study guide and revision aid, we have added approximately 100 multiple choice questions grouped in section order. This single improvement will, we feel, greatly enhance the educational utility of the book.

Phil Turner, Sandy McLennan, Andy Bates and Mike White

Acknowledgments
We thank all those readers of the first edition who took the trouble to return their comments and suggestions, without which the second edition would have been less improved; Will Sansom, Andrea Bosher and Jonathan Ray at BIOS, who kept encouraging us; and finally our families, who once again had to suffer during the periods of (re)writing.

PREFACE TO THE FIRST EDITION

The last 20 years have witnessed a revolution in our understanding of the processes responsible for the maintenance, transmission and expression of genetic information at the molecular level – the very basis of life itself. Of the many technical advances on which this explosion of knowledge has been based, the ability to remove a specific fragment of DNA from an organism, manipulate it in the test tube, and return it to the same or a different organism must take pride of place. It is around this essence of recombinant DNA technology, or genetic engineering to give it its more popular title, that the subject of molecular biology has grown.

Molecular biology seeks to explain the relationships between the structure and function of biological molecules and how these relationships contribute to the operation and control of biochemical processes. Of principal interest are the macromolecules and macromolecular complexes of DNA, RNA and protein and the processes of replication, transcription and translation.
The new experimental technologies involved in manipulating these molecules are central to modern molecular biology. Not only do they yield fundamental information about the molecules, but they also have tremendous practical applications in the development of new and safe products such as therapeutics, vaccines and foodstuffs, in the diagnosis of genetic disease and in gene therapy.

An inevitable consequence of the proliferation of this knowledge is the concomitant proliferation of comprehensive, glossy textbooks, which, while beautifully produced, can prove somewhat overwhelming in both breadth and depth to first and second year undergraduate students. With this in mind, Instant Notes in Molecular Biology aims to deliver the core of the subject in a concise, easily assimilated form designed to aid revision.

The book is divided into 19 sections containing 70 topics. Each topic consists of a 'Key Notes' panel, with extremely concise statements of the key points covered. These are then amplified in the main part of the topic, which includes simple and clear black and white figures, which may be easily understood and reproduced. To get the best from this book, material should first be learnt from the main part of the topic; the Key Notes can then be used as a rapid revision aid. Whilst there is a reasonably logical order to the topics, the book is designed to be 'dipped into' at any point. For this reason, numerous cross-references are provided to guide the reader to related topics.

The contents of the book have been chosen to reflect both the major techniques used and the conclusions reached through their application to the molecular analysis of biological processes. They are based largely on the molecular biology courses taught by the authors to first and second year undergraduates on a range of biological science degree courses at the University of Liverpool.

Section A introduces the classification of cells and macromolecules and outlines some of the methods used to analyze them. Section B considers the basic elements of protein structure and the relationship of structure to function. The structure and physico-chemical properties of DNA and RNA molecules are discussed in Section C, including the complex concepts involved in the supercoiling of DNA. The organization of DNA into the intricate genomes of both prokaryotes and eukaryotes is covered in Section D. The related subjects of mutagenesis, DNA replication, DNA recombination and the repair of DNA damage are considered in Sections E and F.

Section G introduces the technology available for the manipulation of DNA sequences. As described above, this underpins much of our detailed understanding of the molecular mechanisms of cellular processes. A simple DNA cloning scheme is used to introduce the basic methods. Section H describes a number of the more sophisticated cloning vectors which are used for a variety of purposes. Section I considers the use of DNA libraries in the isolation of new gene sequences, while Section J covers more complex and detailed methods, including DNA sequencing and the analysis of cloned sequences. This section concludes with a discussion of some of the rapidly expanding applications of gene cloning techniques.

The basic principles of gene transcription in prokaryotes are described in Section K, while Section L gives examples of some of the sophisticated mechanisms employed by bacteria to control specific gene expression. Sections M and N provide the equivalent, but necessarily more complex, story of transcription in eukaryotic cells. The processing of newly transcribed RNA into mature molecules is detailed in Section O, and the roles of these various RNA molecules in the translation of the genetic code into protein sequences are described in Sections P and Q. The contributions that prokaryotic and eukaryotic viruses have made to our understanding of molecular information processing are detailed in Section R. Finally, Section S shows how the study of viruses, combined with the knowledge accumulated from many other areas of molecular biology, is now leading us to a detailed understanding of the processes involved in the development of a major human affliction – cancer.

This book is not intended to be a replacement for the comprehensive mainstream textbooks; rather, it should serve as a direct complement to your lecture notes to provide a sound grounding in the subject. The major texts, some of which are listed in the Further Reading section at the end of the book, can then be consulted for more detail on topics specific to the particular course being studied. For those of you whose fascination and enthusiasm for the subject have been sufficiently stimulated, the reading list also directs you to some more detailed and advanced articles to take you beyond the scope of this book.

Inevitably, there have had to be omissions from Instant Notes in Molecular Biology and we are sure each reader will spot a different one. However, many of these will be covered in other titles in the Instant Notes series, such as the companion volume, Instant Notes in Biochemistry.

Phil Turner, Sandy McLennan, Andy Bates and Mike White

Acknowledgments
We would like to acknowledge the support and understanding of our families for those many lost evenings and weekends when we could all have been in the pub instead of drafting and redrafting manuscripts. We are also indebted to our colleagues Malcolm Bennett and Chris Green for their contributions to the chapters on bacteriophages, viruses and oncogenes. Our thanks, too, go to the series editor, David Hames, and to Jonathan Ray, Rachel Robinson and Lisa Mansell of BIOS Scientific Publishers for providing prompt and helpful advice when required and for keeping the pressure on us to finish the book on time.

Section A – Cells and macromolecules

A1 CELLULAR CLASSIFICATION

Key Notes

Eubacteria
Structurally defined as prokaryotes, these cells have a plasma membrane, usually enclosed in a rigid cell wall, but no intracellular compartments. They have a single, major circular chromosome. They may be unicellular or multicellular. Escherichia coli is the best studied eubacterium.

Archaea
The Archaea are structurally defined as prokaryotes but probably branched off from the eukaryotes after their common ancestor diverged from the eubacteria. They tend to inhabit extreme environments. They are biochemically closer to eubacteria in some ways but to eukaryotes in others. They also have some biochemical peculiarities.

Eukaryotes
Cells of plants, animals, fungi and protists possess well-defined subcellular compartments bounded by lipid membranes (e.g. nuclei, mitochondria, endoplasmic reticulum). These organelles are the sites of distinct biochemical processes and define the eukaryotes.

Differentiation
In most multicellular eukaryotes, groups of cells differentiate during development of the organism to provide specialized functions (e.g. as in liver, brain, kidney). In most cases, they contain the same DNA but transcribe different genes. Like all other cellular processes, differentiation is controlled by genes. Co-ordination of the activities of different cell types requires communication between them.

Related topics
Subcellular organelles (A2)
Prokaryotic and eukaryotic chromosome structure (Section D)
Bacteriophages and eukaryotic viruses (Section R)

Eubacteria
The Eubacteria are one of two subdivisions of the prokaryotes. Prokaryotes are the simplest living cells, typically 1–10 µm in diameter, and are found in all environmental niches from the guts of animals to acidic hot springs. Classically, they are defined by their structural organization (Fig. 1). They are bounded by a cell (plasma) membrane comprising a lipid bilayer in which are embedded proteins that allow the exit and entry of small molecules. Most prokaryotes also have a rigid cell wall outside the plasma membrane which prevents the cell from swelling or shrinking in environments where the osmolarity differs significantly from that inside the cell. The cell interior (cytoplasm or cytosol) usually contains a single, circular chromosome compacted into a nucleoid and attached to the membrane (see Topic D1), and often plasmids [small deoxyribonucleic acid (DNA) molecules with limited genetic information, see Topic G2], ribonucleic acid (RNA), ribosomes (the sites of protein synthesis, see Section Q) and most of the proteins which perform the metabolic reactions of the cell. Some of these proteins are attached to the plasma membrane, but there are no distinct subcellular organelles as in eukaryotes to compartmentalize different parts of the metabolism. The surface of a prokaryote may carry pili, which allow it to attach to other cells and surfaces, and flagella, whose rotating motion allows the cell to swim. Most prokaryotes are unicellular; some, however, have multicellular forms in which certain cells carry out specialized functions. The Eubacteria differ from the Archaea mainly in their biochemistry. The eubacterium Escherichia coli has a genome size (DNA content) of 4600 kilobase pairs (kb), which is sufficient genetic information for about 3000 proteins. Its molecular biology has been studied extensively. The genome of the simplest bacterium, Mycoplasma genitalium, has only 580 kb of DNA and encodes just 470 proteins. It has a very limited metabolic capacity.

Fig. 1. Schematic diagram of a typical prokaryotic cell.

Archaea
The Archaea, or archaebacteria, form the second subdivision of the prokaryotes and tend to inhabit extreme environments. Structurally, they are similar to eubacteria. However, on the basis of the evolution of their ribosomal RNA (rRNA) molecules (see Topic O1), they appear as different from the eubacteria as both groups of prokaryotes are from the eukaryotes, and they display some unusual biochemical features, for example ether in place of ester linkages in membrane lipids (see Topic A3). The 1740 kb genome of the archaeon Methanococcus jannaschii encodes a maximum of 1738 proteins. Comparisons reveal that those involved in energy production and metabolism are most like those of eubacteria while those involved in replication, transcription and translation are more similar to those of eukaryotes. It appears that the Archaea and the eukaryotes share a common evolutionary ancestor which diverged from the ancestor of the Eubacteria.

Eukaryotes
Eukaryotes are classified taxonomically into four kingdoms comprising animals, plants, fungi and protists (algae and protozoa). Structurally, eukaryotes are defined by their possession of membrane-enclosed organelles (Fig. 2) with specialized metabolic functions (see Topic A2). Eukaryotic cells tend to be larger than prokaryotes: 10–100 µm in diameter. They are surrounded by a plasma membrane, which can have a highly convoluted shape to increase its surface area. Plants and many fungi and protists also have a rigid cell wall. The cytoplasm is a highly organized gel that contains, in addition to the organelles and ribosomes, an array of protein fibers called the cytoskeleton which controls the shape and movement of the cell and which organizes many of its metabolic functions. These fibers include microtubules, made of tubulin, and microfilaments, made of actin (see Topic A4). Many eukaryotes are multicellular, with groups of cells undergoing differentiation during development to form the specialized tissues of the whole organism.

Fig. 2. Schematic diagram of a typical eukaryotic cell.
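
As a rough illustration of the genome figures quoted above for E. coli, M. genitalium and M. jannaschii, the sketch below divides each genome size by the number of proteins it is said to encode. The function name kb_per_protein and the dictionary layout are illustrative assumptions, not part of the text; the numbers are those given above, where the E. coli protein count is approximate ("about 3000") and the M. jannaschii count is an upper limit.

```python
# Genome size (kb) and number of encoded proteins, as quoted in the text.
# The E. coli count is approximate; the M. jannaschii count is a maximum.
GENOMES = {
    "Escherichia coli": (4600, 3000),
    "Mycoplasma genitalium": (580, 470),
    "Methanococcus jannaschii": (1740, 1738),
}


def kb_per_protein(genome_kb, n_proteins):
    """Average amount of genomic DNA (kb) per encoded protein."""
    return genome_kb / n_proteins


for name, (genome_kb, n_proteins) in GENOMES.items():
    ratio = kb_per_protein(genome_kb, n_proteins)
    print(f"{name}: {ratio:.2f} kb of DNA per protein")
```

All three ratios fall between roughly 1 and 1.5 kb per protein. This is only a crude average over the whole genome, since it makes no allowance for non-coding DNA or for genes that specify RNA rather than protein.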

Differentiation
When a cell divides, the daughter cells may be identical in every way, or they may change their patterns of gene expression to become functionally different from the parent cell. Among prokaryotes and lower eukaryotes, the formation of spores is an example of such cellular differentiation (see Topic L3). Among complex multicellular eukaryotes, embryonic cells differentiate into highly specialized cells, for example muscle, nerve, liver and kidney. In all but a few exceptional cases, the DNA content remains the same, but the genes which are transcribed have changed. Differentiation is regulated by developmental control genes (see Topic N2). Mutations in these genes result in abnormal body plans, such as legs in the place of antennae in the fruit fly Drosophila. Studying such gene mutations allows the process of embryonic development to be understood. In multicellular organisms, co-ordination of the activities of the various tissues and organs is controlled by communication between them. This involves signaling molecules such as neurotransmitters, hormones and growth factors which are secreted by one tissue and act upon another through specific cell-surface receptors.

A2 SUBCELLULAR ORGANELLES

Key Notes

Nuclei
The membrane-bound nucleus contains the bulk of the cellular DNA in multiple chromosomes. Transcription of this DNA and processing of the RNA occurs here. Nucleoli are contained within the nucleus.

Mitochondria and chloroplasts
Mitochondria are the site of cellular respiration where nutrients are oxidized to CO2 and water, and adenosine 5′-triphosphate (ATP) is generated. They are derived from prokaryotic symbionts and retain some DNA, RNA and protein synthetic machinery, though most of their proteins are encoded in the nucleus. Photosynthesis takes place in the chloroplasts of plants and eukaryotic algae. Chloroplasts have a basically similar structure to mitochondria but with a thylakoid membrane system containing the light-harvesting pigment chlorophyll.

Endoplasmic reticulum
The smooth endoplasmic reticulum is a cytoplasmic membrane system where many of the reactions of lipid biosynthesis and xenobiotic metabolism are carried out. The rough endoplasmic reticulum has attached ribosomes engaged in the synthesis of membrane-targeted and secreted proteins. These proteins are carried in vesicles to the Golgi complex for further processing and sorting.

Microbodies
The lysosomes contain degradative, hydrolytic enzymes; the peroxisomes contain enzymes which destroy certain potentially dangerous free radicals and hydrogen peroxide; the glyoxysomes of plants carry out the reactions of the glyoxylate cycle.

Organelle isolation
After disruption of the plasma membrane, the subcellular organelles can be separated from each other and purified by a combination of differential centrifugation and density gradient centrifugation (both rate zonal and isopycnic). Purity can be assayed by measuring organelle-specific enzymes.

Related topics
Cellular classification (A1)
rRNA processing and ribosomes (O1)
Translational control and post-translational events (Q4)

Nuclei
The eukaryotic nucleus carries the genetic information of the cell in multiple chromosomes, each containing a single DNA molecule (see Topics D2 and D3). The nucleus is bounded by a lipid double membrane, the nuclear envelope, containing pores which allow passage of moderately large molecules (see Topic A1, Fig. 2). Transcription of RNA takes place in the nucleus (see Section M) and the processed RNA molecules (see Section O) pass into the cytoplasm where translation takes place (see Section Q). Nucleoli are bodies within the nucleus where rRNA is synthesized and ribosomes are partially assembled (see Topics M2 and O1).
Mitochondria and chloroplasts
Cellular respiration, that is the oxidation of nutrients to generate energy in the form of adenosine 5′-triphosphate (ATP), takes place in the mitochondria. These organelles are roughly 1–2 μm in diameter and there may be 1000–2000 per cell. They have a smooth outer membrane and a convoluted inner membrane that forms protrusions called cristae (see Topic A1, Fig. 2). They contain a small circular DNA molecule, mitochondrial-specific RNA and ribosomes on which some mitochondrial proteins are synthesized. However, the majority of mitochondrial (and chloroplast) proteins are encoded by nuclear DNA and synthesized in the cytoplasm. These latter proteins have specific signal sequences that target them to the mitochondria (see Topic Q4). The chloroplasts of plants are the site of photosynthesis, the light-dependent assimilation of CO2 and water to form carbohydrates and oxygen. Though larger than mitochondria, they have a similar structure except that, in place of cristae, they have a third membrane system (the thylakoids) in the inner membrane space. These contain chlorophyll, which traps the light energy for photosynthesis. Chloroplasts are also partly genetically independent of the nucleus. Both mitochondria and chloroplasts are believed to have evolved from prokaryotes which had formed a symbiotic relationship with a primitive nucleated eukaryote.

Endoplasmic reticulum
The endoplasmic reticulum is an extensive membrane system within the cytoplasm and is continuous with the nuclear envelope (see Topic A1, Fig. 2). Two forms are visible in most cells. The smooth endoplasmic reticulum carries many membrane-bound enzymes, including those involved in the biosynthesis of certain lipids and the oxidation and detoxification of foreign compounds (xenobiotics) such as drugs. The rough endoplasmic reticulum (RER) is so-called because of the presence of many ribosomes.
These ribosomes specifically synthesize proteins intended for secretion by the cell, such as plasma or milk proteins, or those destined for the plasma membrane or certain organelles. Apart from the plasma membrane proteins, which are initially incorporated into the RER membrane, these proteins are translocated into the interior space (lumen) of the RER where they are modified, often by glycosylation (see Topic Q4). The lipids and proteins synthesized on the RER are transported in specialized transport vesicles to the Golgi complex, a stack of flattened membrane vesicles which further modifies, sorts and directs them to their final destinations (see Topic A1, Fig. 2).

Microbodies
Lysosomes are small membrane-bound organelles which bud off from the Golgi complex and which contain a variety of digestive enzymes capable of degrading proteins, nucleic acids, lipids and carbohydrates. They act as recycling centers for macromolecules brought in from outside the cell or from damaged organelles. Some metabolic reactions which generate highly reactive free radicals and hydrogen peroxide are confined within organelles called peroxisomes to prevent these species from damaging cellular components. Peroxisomes contain the enzyme catalase, which destroys hydrogen peroxide:

2H2O2 → 2H2O + O2

Glyoxysomes are specialized plant peroxisomes which carry out the reactions of the glyoxylate cycle. Lysosomes, peroxisomes and glyoxysomes are collectively known as microbodies.

Organelle isolation
The plasma membrane of eukaryotes can be disrupted by various means including osmotic shock, controlled mechanical shear or by certain nonionic detergents. Organelles displaying large size and density differences, for example nuclei and mitochondria, can be separated from each other and from other organelles by differential centrifugation according to the value of their sedimentation coefficients (see Topic A4).
The cell lysate is centrifuged at a speed which is high enough to sediment only the heaviest organelles, usually the nuclei. The supernatant containing all the other organelles is removed then centrifuged at a higher speed to sediment the mitochondria, and so on (Fig. 1a). This technique is also used to fractionate suspensions containing cell types of different sizes, for example red cells, white cells and platelets in blood. These crude preparations of cells, nuclei and mitochondria usually require further purification by density gradient centrifugation. This is also used to separate organelles of similar densities. In rate zonal centrifugation, the mixture is layered on top of a pre-formed concentration (and, therefore, density) gradient of a suitable medium in a centrifuge tube. Upon centrifugation, bands or zones of the different components sediment at different rates depending on their sedimentation coefficients, and separate (Fig. 1b). The purpose of the density gradient of the supporting medium is to prevent convective mixing of the components after separation (i.e. to provide stability) and to ensure linear sedimentation rates of the components (it compensates for the acceleration of the components as they move further down the tube). In equilibrium (isopycnic) centrifugation, the density gradient extends to a density higher than that of one or more components of the mixture so that these components come to equilibrium at a point equal to their own density and stop moving. In this case, the density gradient can either be pre-formed, and the sample layered on top, or self-forming, in which case the sample may be mixed with the gradient material (Fig. 1c). Density gradients are made from substances such as sucrose, Ficoll (a synthetic polysaccharide), metrizamide (a synthetic iodinated heavy compound) or cesium chloride (CsCl), for separation of nucleic acids (see Topics C2 and G2).
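The stepwise pelleting used in differential centrifugation can be pictured with a small toy model. The sketch below is only an illustration: the particle names beyond nuclei and mitochondria, the sedimentation values and the spin cutoffs are all hypothetical placeholders, and a real spin does not separate particles this cleanly.

```python
# Toy model of differential centrifugation: each successive spin pellets every
# particle whose sedimentation coefficient is at or above that spin's cutoff.
# All numbers below are illustrative placeholders, not measured values.

def differential_centrifugation(particles, cutoffs):
    """Return (pellets, supernatant).

    particles: dict mapping particle name -> sedimentation coefficient
               (arbitrary units)
    cutoffs:   one cutoff per spin, slowest spin (highest cutoff) first
    pellets:   list of (cutoff, sorted names pelleted in that spin)
    """
    supernatant = dict(particles)
    pellets = []
    for cutoff in cutoffs:
        pellet = sorted(name for name, s in supernatant.items() if s >= cutoff)
        for name in pellet:
            del supernatant[name]  # pelleted material leaves the supernatant
        pellets.append((cutoff, pellet))
    return pellets, sorted(supernatant)


# Hypothetical lysate; only the ordering of the values matters here.
lysate = {"nuclei": 1e6, "mitochondria": 2e4, "smaller_organelles": 1e3}
pellets, remaining = differential_centrifugation(lysate, cutoffs=[1e5, 1e4])

for cutoff, names in pellets:
    print(f"spin with cutoff {cutoff:g}: pellet = {names}")
print("final supernatant:", remaining)
```

Each pass mirrors one spin in the text: the pellet is removed and the supernatant is carried forward to the next, faster spin.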
Purity of the subcellular fraction can be determined using an electron microscope or by assaying enzyme activities known to be associated specifically with particular organelles, for example succinate dehydrogenase in mitochondria.

Fig. 1. Centrifugation techniques. (a) Differential, (b) rate zonal and (c) isopycnic (equilibrium).

Section A – Cells and macromolecules
A3 MACROMOLECULES

Key Notes

Proteins and nucleic acids
Proteins are polymers of amino acids, and the nucleic acids DNA and RNA are polymers of nucleotides. They are both essential components of the machinery which stores and expresses genetic information. Proteins have many additional structural and functional roles.

Polysaccharides
α-Amylose and cellulose are polymers of glucose linked α(1→4) and β(1→4) respectively. Starch, a storage form of glucose found in plants, contains α-amylose together with the α(1→6) branched polymer amylopectin. Cellulose forms strong structural fibers in plants. Glucose is stored as glycogen in animals. Chitin, a polymer of N-acetylglucosamine, is found in fungal cell walls and arthropod exoskeletons. Mucopolysaccharides are important components of connective tissue.

Lipids
Triglycerides containing saturated and unsaturated fatty acids are the major storage lipids of animals and plants respectively. Structural differences between the two types of fatty acid result in animal triglycerides being solid and plant triglycerides (oils) liquid. Phospholipids and sphingolipids have polar groups in addition to the fatty acid components, and are important constituents of all cell membranes.

Complex macromolecules
Nucleoproteins contain both nucleic acid and protein, as in the enzymes telomerase and ribonuclease P. Glycoproteins and proteoglycans (mucoproteins) are proteins with covalently attached carbohydrate and are generally found on extracellular surfaces and in extracellular spaces.
Lipid-linked proteins and lipoproteins have lipid and protein components attached covalently or noncovalently respectively. Glycolipids have both lipid and carbohydrate parts. Mixed macromolecular complexes such as these provide a wider range of functions than the component parts.

Related topics
Large macromolecular assemblies (A4); Protein structure (Section B); Properties of nucleic acids (Section C)

Proteins and nucleic acids
Proteins are polymers of amino acids linked together by peptide bonds. The structures of the amino acids and of proteins are dealt with in detail in Section B. Proteins have both structural and functional roles. The nucleic acids DNA and RNA are polymers of nucleotides, which themselves consist of a nitrogenous base, a pentose sugar and phosphoric acid. Their structures are detailed in Section C. There are three main types of cellular RNA: messenger RNA (mRNA), ribosomal RNA (rRNA) and transfer RNA (tRNA). Nucleic acids are involved in the storage and processing of genetic information (see Topic D5), but the expression of this information requires proteins.

Polysaccharides
Polysaccharides are polymers of simple sugars covalently linked by glycosidic bonds. They function mainly as nutritional sugar stores and as structural materials. Cellulose and starch are abundant components of plants. Both are glucose polymers, but differ in the way the glucose monomers are linked. Cellulose is a linear polymer with β(1→4) linkages (Fig. 1a) and is a major structural component of the plant cell wall. About 40 parallel chains form horizontal sheets which stack vertically above one another. The chains and sheets are held together by hydrogen bonds (see Topic A4) to produce tough, insoluble fibers. Starch is a sugar store and is found in large intracellular granules which can be hydrolyzed quickly to release glucose for metabolism. It contains two components: α-amylose, a linear polymer with α(1→4) linkages (Fig.
1b), and amylopectin, a branched polymer with additional α(1→6) linkages. With up to 10^6 glucose residues, amylopectins are among the largest molecules known. The different linkages in starch produce a coiled conformation that cannot pack tightly, hence starch is water-soluble. Fungi and some animal tissues (e.g. liver and muscle) store glucose as glycogen, a branched polymer like amylopectin. Chitin is found in fungal cell walls and in the exoskeleton of insects and crustacea. It is similar to cellulose, but the monomer unit is N-acetylglucosamine. Mucopolysaccharides (glycosaminoglycans) form the gel-like solutions in which the fibrous proteins of connective tissue are embedded. Determination of the structures of large polysaccharides is complicated because they are heterogeneous in size and composition and because they cannot be studied genetically like nucleic acids and proteins.

Fig. 1. The structures of (a) cellulose, with β(1→4) linkages and (b) starch α-amylose, with α(1→4) linkages. Carbon atoms 1, 4 and 6 are labeled. Additional α(1→6) linkages produce branches in amylopectin and glycogen.

Lipids
While individual lipids are not strictly macromolecules, many are built up from smaller monomeric units and they are involved in many macromolecular assemblies (see Topic A4). Large lipid molecules are predominantly hydrocarbon in nature and are poorly soluble in water. Some are involved in the storage and transport of energy while others are key components of membranes, protective coats and other cell structures. Glycerides have one, two or three long-chain fatty acids esterified to a molecule of glycerol. In animal triglycerides, the fatty acids have no double bonds (saturated) so the chains are linear, the molecules can pack tightly and the resulting fats are solid. Plant oils contain unsaturated fatty acids with one or more double bonds. The angled structures of these chains prevent close packing so they tend to be liquids at room temperature.
Membranes contain phospholipids, which consist of glycerol esterified to two fatty acids and phosphoric acid. The phosphate is also usually esterified to a small molecule such as serine, ethanolamine, inositol or choline (Fig. 2). Membranes also contain sphingolipids such as ceramide, in which the long-chain amino alcohol sphingosine has a fatty acid linked by an amide bond. Attachment of phosphocholine to a ceramide produces sphingomyelin.

Fig. 2. A typical phospholipid: phosphatidylcholine containing esterified stearic and oleic acids.

Complex macromolecules
Many macromolecules contain covalent or noncovalent associations of more than one of the major classes of large biomolecules. This can greatly increase the functionality or structural capabilities of the resulting complex. For example, nearly all enzymes are proteins, but some have a noncovalently attached RNA component which is essential for catalytic activity. Associations of nucleic acid and protein are known as nucleoproteins. Examples are telomerase, which is responsible for replicating the ends of eukaryotic chromosomes (see Topics D3 and E3) and ribonuclease P, an enzyme which matures transfer RNA (tRNA). In telomerase, the RNA acts as a template for telomere DNA synthesis, while in ribonuclease P, the RNA contains the catalytic site of the enzyme. Ribonuclease P is an example of a ribozyme (see Topic O2). Glycoproteins contain both protein and carbohydrate components (the carbohydrate making up anything from less than 1% to more than 90% of the weight); glycosylation is the commonest form of post-translational modification of proteins (see Topic Q4). The carbohydrate is always covalently attached to the surface of the protein, never the interior, and is often variable in composition, causing microheterogeneity (Fig. 3). This has made glycoproteins difficult to study. Glycoproteins have functions that span the entire range of protein activities, and are usually found extracellularly.
They are important components of cell membranes and mediate cell–cell recognition. Proteoglycans (mucoproteins) are large complexes (>10^7 Da) of protein and mucopolysaccharide found in bacterial cell walls and in the extracellular space in connective tissue. Their sugar units often have sulfate groups, which makes them highly hydrated. This, coupled with their lengths (>1000 units), produces solutions of high viscosity. Proteoglycans act as lubricants and shock absorbers in extracellular spaces.

Fig. 3. Glycoprotein structure. The different symbols represent different monosaccharide units (e.g. galactose, N-acetylglucosamine).

Lipid-linked proteins have a covalently attached lipid component. This is usually a fatty acyl (e.g. myristoyl or palmitoyl) or isoprenoid (e.g. farnesyl or geranylgeranyl) group. These groups serve to anchor the proteins in membranes through hydrophobic interactions with the membrane lipids and also promote protein–protein associations (see Topic A4). In lipoproteins, the lipids and proteins are linked noncovalently. Because lipids are poorly soluble in water, they are transported in the blood as lipoproteins. These are basically particles of triglycerides and cholesterol esters coated with a layer of phospholipids, cholesterol and protein (the apolipoproteins). The structures of the apolipoproteins are such that their hydrophobic amino acids face towards the lipid interior of the particles while the charged and polar amino acids (see Topic B1) face outwards into the aqueous environment. This renders the particles soluble. Glycolipids, which include cerebrosides and gangliosides, have covalently linked lipid and carbohydrate components, and are especially abundant in the membranes of brain and nerve cells.
Section A – Cells and macromolecules
A4 LARGE MACROMOLECULAR ASSEMBLIES

Key Notes

Protein complexes
The eukaryotic cytoskeleton consists of various protein complexes: microtubules (made of tubulin), microfilaments (made of actin and myosin) and intermediate filaments (containing various proteins). These organize the shape and movement of cells and subcellular organelles. Cilia and flagella are also composed of microtubules complexed with dynein and nexin.

Nucleoprotein
Bacterial 70S ribosomes comprise a large 50S subunit, with 23S and 5S RNA molecules and 31 proteins, and a small 30S subunit, with a 16S RNA molecule and 21 proteins. Eukaryotic 80S ribosomes have 60S (28S, 5.8S and 5S RNAs) and 40S (18S RNA) subunits. Chromatin contains DNA and the basic histone proteins. Viruses are also nucleoprotein complexes.

Membranes
Membrane phospholipids and sphingolipids form bilayers with the polar groups on the exterior surfaces and the hydrocarbon chains in the interior. Membrane proteins may be peripheral or integral and act as receptors, enzymes, transporters or mediators of cellular interactions.

Noncovalent interactions
A large number of weak interactions hold macromolecular assemblies together. Charge–charge, charge–dipole and dipole–dipole interactions involve attractions between fully or partially charged atoms. Hydrogen bonds and hydrophobic interactions which exclude water are also important.

Related topics
Macromolecules (A3); Chromatin structure (D2); rRNA processing and ribosomes (O1); Bacteriophages and eukaryotic viruses (Section R)

Protein complexes
Cellular architecture contains many large complexes of the different classes of macromolecules with themselves or with each other. Many of the major structural and locomotory elements of the cell consist of protein complexes. The cytoskeleton is an array of protein filaments which organizes the shape and motion of cells and the intracellular distribution of subcellular organelles.
Microtubules are long polymers of tubulin, a 110 kiloDalton (kDa) globular protein (Fig. 1, see Topic B2). These are a major component of the cytoskeleton and of eukaryotic cilia and flagella, the hair-like structures on the surface of many cells which whip to move the cell or to move fluid across the cell surface. Cilia also contain the proteins nexin and dynein.

Fig. 1. Schematic diagram showing the (a) cross-sectional and (b) surface pattern of tubulin α and β subunits in a microtubule (see Topic B2). From: BIOCHEMISTRY, 4/E by Stryer © 1995 by Lubert Stryer. Used with permission of W.H. Freeman and Company. [After J.A. Snyder and J.R. McIntosh (1976) Annu. Rev. Biochem. 45, 706. With permission from the Annual Review of Biochemistry, Volume 45, © 1976, by Annual Reviews Inc.]

Microfilaments consisting of the protein actin form contractile assemblies with the protein myosin to cause cytoplasmic motion. Actin and myosin are also major components of muscle fibers. The intermediate filaments of the cytoskeleton contain a variety of proteins including keratin and have various functions, including strengthening cell structures. In all cases, the energy for motion is provided by the coupled hydrolysis of ATP or guanosine 5′-triphosphate (GTP).

Nucleoprotein
Nucleoproteins comprise both nucleic acid and protein. Ribosomes are large cytoplasmic ribonucleoprotein complexes which are the sites of protein synthesis (see Section Q). Bacterial 70S ribosomes have large (50S) and small (30S) subunits with a total mass of 2.5 × 10^6 Da. (The S value, e.g. 50S, is the numerical value of the sedimentation coefficient, s, and describes the rate at which a macromolecule or particle sediments in a centrifugal field. It is determined by both the mass and shape of the molecule or particle; hence S values are not additive.)
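A rough calculation helps to show why S values do not add. The sketch below makes a deliberately crude assumption that is not in the text: every particle is treated as a compact sphere of the same density. Under that assumption the frictional drag grows with the radius (mass to the power 1/3), so the sedimentation coefficient grows only as mass to the power 2/3. The function name is hypothetical.

```python
# Why S values are not additive: a sphere-model illustration.
# ASSUMPTION (not from the text): each particle is a compact sphere of
# identical density, so s is proportional to mass**(2/3), and therefore
# mass is proportional to s**(3/2).

def combined_s_for_spheres(s_values):
    """S value of one sphere whose mass equals the summed masses of the inputs."""
    total_mass_units = sum(s ** 1.5 for s in s_values)  # masses add; S values do not
    return total_mass_units ** (2.0 / 3.0)


naive_sum = 50 + 30                             # simply adding the subunit S values
sphere_model = combined_s_for_spheres([50, 30]) # adding masses, then converting back

print(f"naive sum: {naive_sum}S")
print(f"sphere model: {sphere_model:.1f}S")
print("observed for the bacterial ribosome: 70S")
```

Even this idealized model gives a value well below the naive sum of 80S. It also differs from the observed 70S, which is consistent with the statement that shape as well as mass determines the S value.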
The 50S subunit contains 23S and 5S RNA molecules and 31 different proteins while the 30S subunit contains a 16S RNA molecule and 21 proteins. Eukaryotic 80S ribosomes have 60S (with 28S, 5.8S and 5S RNAs) and 40S (with 18S RNA) subunits. Under the correct conditions, mixtures of the rRNAs and proteins will self-assemble in a precise order into functional ribosomes in vitro. Thus, all the information for ribosome structure is inherent in the structures of the components. The RNAs are not simply frameworks for the assembly of the ribosomal proteins, but participate in both the binding of the messenger RNA and in the catalysis of peptide bond synthesis (see Topic Q2). Chromatin is the material from which eukaryotic chromosomes are made. It is a deoxyribonucleoprotein complex made up of roughly equal amounts of DNA and small, basic proteins called histones (see Topic D2). These form a repeating unit called a nucleosome. Correct assembly of nucleosomes and many other protein complexes requires assembly proteins, or chaperones. Histones neutralize the repulsion between the negative charges of the DNA sugar–phosphate backbone and allow the DNA to be tightly packaged within the chromosomes. Viruses are another example of nucleoprotein complexes. They are discussed in Section R.

Membranes
When placed in an aqueous environment, phospholipids and sphingolipids naturally form a lipid bilayer with the polar groups on the outside and the nonpolar hydrocarbon chains on the inside. This is the structural basis of all biological membranes. Such membranes form cellular and organellar boundaries and are selectively permeable to uncharged molecules. The precise lipid composition varies from cell to cell and from organelle to organelle. Proteins are also a major component of cell membranes (Fig. 2).
Peripheral membrane proteins are loosely bound to the outer surface or are anchored via a lipid or glycosyl phosphatidylinositol anchor and are relatively easy to remove. Integral membrane proteins are embedded in the membrane and cannot be removed without destroying the membrane. Some protrude from the outer or inner surface of the membrane while transmembrane proteins span the bilayer completely and have both extracellular and intracellular domains (see Topic B2). The transmembrane regions of these proteins contain predominantly hydrophobic amino acids. Membrane proteins have a variety of functions, for example:
receptors for signaling molecules such as hormones and neurotransmitters;
enzymes for degrading extracellular molecules before uptake of the products;
pores or channels for the selective transport of small, polar ions and molecules;
mediators of cell–cell interactions (mainly glycoproteins).

Fig. 2. Schematic diagram of a plasma membrane showing the major macromolecular components.

Noncovalent interactions
Most macromolecular assemblies are held together by a large number of different noncovalent interactions. Charge–charge interactions (salt bridges) operate between ionizable groups of opposite charge at physiological pH, for example between the negative phosphates of DNA and the positive lysine and arginine side chains of DNA-binding proteins such as histones (see Topic D2). Charge–dipole and dipole–dipole interactions are weaker and form when either or both of the participants is a dipole due to the asymmetric distribution of charge in the molecule (Fig. 3a). Even uncharged groups like methyl groups can attract each other weakly through transient dipoles arising from the motion of their electrons (dispersion forces). Noncovalent associations between electrically neutral molecules are known collectively as van der Waals forces.

Fig. 3. Examples of (a) van der Waals forces and (b) a hydrogen bond.
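To give a feel for the size of a charge–charge interaction, the following sketch applies Coulomb's law to a pair of opposite unit charges. This is an illustrative estimate only: the 0.3 nm separation and the two relative permittivities (about 80 for bulk water, about 4 as a stand-in for a nonpolar interior) are assumed values chosen for the example, not figures from the text, and a uniform dielectric is a strong simplification.

```python
import math

# Physical constants (SI units)
E_CHARGE = 1.602176634e-19    # elementary charge, C
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23      # particles per mole


def coulomb_energy_kj_per_mol(q1, q2, distance_nm, relative_permittivity):
    """Coulomb energy of two point charges (in units of e) in a uniform dielectric."""
    r = distance_nm * 1e-9  # convert nm to m
    joules = (q1 * q2 * E_CHARGE ** 2) / (
        4 * math.pi * EPSILON_0 * relative_permittivity * r
    )
    return joules * AVOGADRO / 1000.0  # per mole, in kJ


# ASSUMED example values: +1 and -1 charges 0.3 nm apart.
in_water = coulomb_energy_kj_per_mol(+1, -1, 0.3, relative_permittivity=80.0)
in_low_dielectric = coulomb_energy_kj_per_mol(+1, -1, 0.3, relative_permittivity=4.0)

print(f"in water-like medium:    {in_water:.2f} kJ/mol")
print(f"in low-dielectric medium: {in_low_dielectric:.2f} kJ/mol")
```

The negative sign indicates attraction. In this simple model the interaction is twenty times stronger in the low-dielectric medium, because the energy scales inversely with the relative permittivity.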
Hydrogen bonds are of great importance. They form between a covalently bonded hydrogen atom on a donor group (e.g. –O-H or –N-H) and a pair of nonbonding electrons on an acceptor group (e.g. :O=C– or :N–) (Fig. 3b). Hydrogen bonds and other interactions involving dipoles are directional in character and so help define macromolecular shapes and the specificity of molecular interactions. The presence of uncharged and nonpolar substances, for example lipids, in an aqueous environment tends to force a highly ordered structure on the surrounding water molecules. This is energetically unfavorable as it reduces the entropy of the system. Hence, nonpolar molecules tend to clump together, reducing the overall surface area exposed to water. This attraction is termed a hydrophobic (water-hating) interaction and is a major stabilizing force in protein–protein and protein–lipid interactions and in nucleic acids.

Section B – Protein structure
B1 AMINO ACIDS

Key Notes

Structure
The 20 common amino acids found in proteins have a chiral α-carbon atom linked to a proton, amino and carboxyl groups, and a specific side chain which confers different physical and chemical properties. They behave as zwitterions in solution. Nonstandard amino acids in proteins are formed by post-translational modification.

Charged side chains
Glutamic acid and aspartic acid have additional carboxyl groups and usually impart a negative charge to proteins. Lysine has an ε-amino group, arginine a guanidino group and histidine an imidazole group. These three basic amino acids generally impart a positive charge to proteins.

Polar uncharged side chains
Serine and threonine have hydroxyl groups, asparagine and glutamine have amide groups and cysteine has a thiol group.

Nonpolar aliphatic side chains
Glycine is the simplest amino acid with no side chain. Proline is a secondary amino acid (imino acid). Alanine, valine, leucine and isoleucine have hydrophobic alkyl groups. Methionine has a thioether sulfur atom.
Aromatic side chains
Phenylalanine, tyrosine and tryptophan have bulky aromatic side chains which absorb ultraviolet light.

Related topics
Protein structure and function (B2); Mechanism of protein synthesis (Q2)

Structure
Proteins are polymers of L-amino acids. Apart from proline, all of the 20 amino acids found in proteins have a common structure in which a carbon atom (the α-carbon) is linked to a carboxyl group, a primary amino group, a proton and a side chain (R) which is different in each amino acid (Fig. 1). Except in glycine, the α-carbon atom is asymmetric – it has four chemically different groups attached. Thus, amino acids can exist as pairs of optically active stereoisomers (D- and L-). However, only the L-isomers are found in proteins. Amino acids are dipolar ions (zwitterions) in aqueous solution and behave as both acids and bases (they are amphoteric). The side chains differ in size, shape, charge and chemical reactivity, and are responsible for the differences in the properties of different proteins (Fig. 2). A few proteins contain nonstandard amino acids, such as 4-hydroxyproline and 5-hydroxylysine in collagen. These are formed by post-translational modification of the parent amino acids proline and lysine (see Topic Q4).

Fig. 1. General structure of an L-amino acid. The R group is the side chain.

Charged side chains
Taking pH 7 as a reference point, several amino acids have ionizable groups in their side chains which provide an extra positive or negative charge at this pH. The ‘acidic’ amino acids, aspartic acid and glutamic acid, have additional carboxyl groups which are usually ionized (negatively charged). The ‘basic’ amino acids have positively charged groups – lysine has a second amino group attached to the ε-carbon atom while arginine has a guanidino group. The imidazole group of histidine has a pKa near neutrality. Reversible protonation of this group under physiological conditions contributes to the catalytic mechanism of many enzymes. Together, acidic and basic amino acids can form important salt bridges in proteins (see Topic A4).

Fig. 2. Side chains (R) of the 20 common amino acids. The standard three-letter abbreviations and one-letter code are shown in brackets. Note a: The full structure of proline is shown as it is a secondary amino acid.

Polar uncharged side chains
These contain groups that form hydrogen bonds with water. Together with the charged amino acids, they are often described as hydrophilic (‘water-loving’). Serine and threonine have hydroxyl groups, while asparagine and glutamine are the amide derivatives of aspartic and glutamic acids. Cysteine has a thiol (sulfhydryl) group which often oxidizes to cystine, in which two cysteines form a structurally important disulfide bond (see Topic B2).

Nonpolar aliphatic side chains
Glycine has a hydrogen atom in place of a side chain and is optically inactive. Proline is unusual in being a secondary amino (or imino) acid. Alanine, valine, leucine and isoleucine have hydrophobic (‘water-hating’) alkyl groups for side chains and participate in hydrophobic interactions in protein structure (see Topic A4). Methionine has a sulfur atom in a thioether link within its alkyl side chain.

Aromatic side chains
Phenylalanine, tyrosine and tryptophan have bulky hydrophobic side chains. Their aromatic structure accounts for most of the ultraviolet (UV) absorbance of proteins, which absorb maximally at 280 nm. The phenolic hydroxyl group of tyrosine can also form hydrogen bonds.

Section B – Protein structure
B2 PROTEIN STRUCTURE AND FUNCTION

Key Notes

Sizes and shapes
Globular proteins, including most enzymes, behave in solution like compact, roughly spherical particles. Fibrous proteins have a high axial ratio and are often of structural importance, for example fibroin and keratin. Sizes range from a few thousand to several million Daltons.
Some proteins have associated nonproteinaceous material, for example lipid or carbohydrate or small co-factors.

Primary structure
Amino acids are linked by peptide bonds between α-carboxyl and α-amino groups. The resulting polypeptide sequence has an N terminus and a C terminus. Polypeptides commonly have between 100 and 1500 amino acids linked in this way.

Secondary structure
Polypeptides can fold into a number of regular structures. The right-handed α-helix has 3.6 amino acids per turn and is stabilized by hydrogen bonds between peptide N–H and C=O groups three residues apart. Parallel and antiparallel β-pleated sheets are stabilized by hydrogen bonds between different portions of the polypeptide chain.

Tertiary structure
The different sections of secondary structure and connecting regions fold into a well-defined tertiary structure, with hydrophilic amino acids mostly on the surface and hydrophobic ones in the interior. The structure is stabilized by noncovalent interactions and sometimes disulfide bonds. Denaturation leads to loss of secondary and tertiary structure.

Quaternary structure
Many proteins have more than one polypeptide subunit. Hemoglobin has two α and two β chains. Large complexes such as microtubules are constructed from the quaternary association of individual polypeptide chains. Allosteric effects usually depend on subunit interactions.

Prosthetic groups
Conjugated proteins have associated nonprotein molecules which provide additional chemical functions to the protein. Prosthetic groups include nicotinamide adenine dinucleotide (NAD+), heme and metal ions, for example Zn2+.

What do proteins do?
Proteins have a wide variety of functions.
Enzymes catalyze most biochemical reactions. Binding of substrate depends on specific noncovalent interactions.
Membrane receptor proteins signal to the cell interior when a ligand binds.
Transport and storage. For example, hemoglobin transports oxygen in the blood and ferritin stores iron in the liver.
Collagen and keratin are important structural proteins.
Actin and myosin form contractile muscle fibers.
Casein and ovalbumin are nutritional proteins providing amino acids for growth.
The immune system depends on antibody proteins to combat infection.
Regulatory proteins such as transcription factors bind to and modulate the functions of other molecules, for example DNA.

Domains, motifs, families and evolution
Domains form semi-independent structural and functional units within a single polypeptide chain. Domains are often encoded by individual exons within a gene. New proteins may have evolved through new combinations of exons and, hence, protein domains. Motifs are groupings of secondary structural elements or amino acid sequences often found in related members of protein families. Similar structural motifs are also found in proteins which have no sequence similarity. Protein families arise through gene duplication and subsequent divergent evolution of the new genes.

Related topics
Macromolecules (A3); Large macromolecular assemblies (A4); Amino acids (B1); Protein synthesis (Section Q)

Sizes and shapes
Two broad classes of protein may be distinguished. Globular proteins are folded compactly and behave in solution more or less as spherical particles; most enzymes are globular in nature. Fibrous proteins have very high axial ratios (length/width) and are often important structural proteins, for example silk fibroin and keratin in hair and wool. Molecular masses can range from a few thousand Daltons (Da) (e.g. the hormone insulin with 51 amino acids and a molecular mass of 5734 Da) to at least 5 million Daltons in the case of the enzyme complex pyruvate dehydrogenase. Some proteins contain bound nonprotein material, either in the form of small prosthetic groups, which may act as co-factors in enzyme reactions, or as large associations (e.g. the lipids in lipoproteins or the carbohydrate in glycoproteins, see Topic A3).
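The relationship between chain length and molecular mass can be checked with a quick estimate. The sketch below uses a rule-of-thumb average residue mass of about 110 Da; that figure is a widely used approximation and is an assumption here, not a value given in the text. The function name is hypothetical.

```python
# Rough protein mass estimate from the number of amino acid residues.
# ASSUMPTION: an average residue mass of ~110 Da (a common rule of thumb,
# not a value taken from the text). Real masses depend on composition.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_da(n_residues, residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Approximate molecular mass in Daltons for a chain of n_residues."""
    return n_residues * residue_mass


# Insulin: 51 amino acids, quoted mass 5734 Da.
insulin_estimate = estimate_mass_da(51)
relative_error = abs(insulin_estimate - 5734) / 5734

print(f"estimated insulin mass: {insulin_estimate:.0f} Da")
print(f"difference from quoted 5734 Da: {relative_error:.1%}")
```

For insulin the estimate falls within a few percent of the quoted mass, which is about as much as such a rule of thumb can be expected to deliver.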
Primary structure
The α-carboxyl group of one amino acid is covalently linked to the α-amino group of the next amino acid by an amide bond, commonly known as a peptide bond when in proteins. When two amino acid residues are linked in this way the product is a dipeptide. Many amino acids linked by peptide bonds form a polypeptide (Fig. 1). The repeating sequence of α-carbon atoms and peptide bonds provides the backbone of the polypeptide while the different amino acid side chains confer functionality on the protein. The amino acid at one end of a polypeptide has an unattached α-amino group while the one at the other end has a free α-carboxyl group. Hence, polypeptides are directional, with an N terminus and a C terminus. Sometimes the N terminus is blocked with, for example, an acetyl group. The sequence of amino acids from the N to the C terminus is the primary structure of the polypeptide. Typical sizes for single polypeptide chains are within the range 100–1500 amino acids, though longer and shorter ones exist.

Fig. 1. Section of a polypeptide chain. The peptide bond is boxed. In the α-helix, the CO group of amino acid residue n is hydrogen-bonded to the NH group of residue n + 4 (arrowed).

Secondary structure
The highly polar nature of the C=O and N–H groups of the peptide bonds gives the C–N bond partial double bond character. This makes the peptide bond unit rigid and planar, though there is free rotation between adjacent peptide bonds. This polarity also favors hydrogen bond formation between appropriately spaced and oriented peptide bond units. Thus, polypeptide chains are able to fold into a number of regular structures which are held together by these hydrogen bonds. The best known secondary structure is the α-helix (Fig. 2a).
The polypeptide backbone forms a right-handed helix with 3.6 amino acid residues per turn such that each peptide N–H group is hydrogen bonded to the C=O group of the peptide bond three residues away; in terms of residues, the C=O of residue n pairs with the N–H of residue n + 4 (Fig. 1). Sections of α-helical secondary structure are often found in globular proteins and in some fibrous proteins. The β-pleated sheet (β-sheet) is formed by hydrogen bonding of the peptide bond N–H and C=O groups to the complementary groups of another section of the polypeptide chain (Fig. 2b). Several sections of polypeptide chain may be involved side-by-side, giving a sheet structure with the side chains (R) projecting alternately above and below the sheet. If these sections run in the same direction (e.g. N terminus→C terminus), the sheet is parallel; if they alternate N→C and C→N, then the sheet is antiparallel. β-Sheets are strong and rigid and are important in structural proteins, for example silk fibroin. The connective tissue protein collagen has an unusual triple helix secondary structure in which three polypeptide chains are intertwined, making it very strong.

Tertiary structure
The way in which the different sections of α-helix, β-sheet, other minor secondary structures and connecting loops fold in three dimensions is the tertiary structure of the polypeptide (Fig. 3). The nature of the tertiary structure is inherent in the primary structure and, given the right conditions, most polypeptides will fold spontaneously into the correct tertiary structure as it is generally the lowest energy conformation for that sequence. However, in vivo, correct folding is often assisted by proteins called chaperones which help prevent misfolding of new polypeptides before their synthesis (and primary structure) is complete.
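
The α-helix geometry described under Secondary structure lends itself to a short worked example. The sketch below is a minimal illustration, assuming an idealized segment that is helical along its whole length. The function names are hypothetical, and the only numbers used are the 3.6 residues per turn from the text and the n to n + 4 hydrogen-bond spacing from the caption to Fig. 1.

```python
RESIDUES_PER_TURN = 3.6  # residues per turn of the alpha-helix, as stated in the text
HBOND_OFFSET = 4         # C=O of residue n pairs with N-H of residue n + 4 (Fig. 1)


def helix_turns(n_residues):
    """Number of helical turns spanned by an idealized helix of n_residues."""
    return n_residues / RESIDUES_PER_TURN


def helix_hbond_pairs(n_residues):
    """List 1-based (C=O residue, N-H residue) pairs within an idealized helix.

    Residues too close to the C-terminal end of the segment have no
    partner four positions further on, so they contribute no pair.
    """
    last_acceptor = n_residues - HBOND_OFFSET
    return [(n, n + HBOND_OFFSET) for n in range(1, last_acceptor + 1)]


pairs = helix_hbond_pairs(12)
turns = helix_turns(18)

print(f"12-residue helix: {len(pairs)} backbone hydrogen bonds, first pair {pairs[0]}")
print(f"18-residue helix: about {turns:.1f} turns")
```

Because each bond links residues four positions apart, a helical segment of N residues can form at most N − 4 such backbone hydrogen bonds, and segments of four residues or fewer form none.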
Folding is such that amino acids with hydrophilic side chains locate mainly on the exterior of the protein where they can interact with water or solvent ions, while the hydrophobic amino acids become buried in the interior from which water is excluded. This gives overall stability to the structure. Various types of noncovalent interaction between side chains hold the tertiary structure together: van der Waals forces, hydrogen bonds, electrostatic salt bridges between oppositely charged groups (e.g. the ε-NH₃⁺ group of lysine and the side chain COO⁻ groups of aspartate or glutamate) and hydrophobic interactions between the nonpolar side chains of the aliphatic and aromatic amino acids (see Topic A4). In addition, covalent disulfide bonds can form between two cysteine residues which may be far apart in the primary structure but close together in the folded tertiary structure. Disruption of secondary and tertiary structure by heat or extremes of pH leads to denaturation of the protein and formation of a random coil conformation.

Fig. 2. (a) α-Helix secondary structure. Only the α-carbon and peptide bond carbon and nitrogen atoms of the polypeptide backbone are shown for clarity. (b) Section of a β-sheet secondary structure.

Fig. 3. Schematic diagram of a section of protein tertiary structure.

Quaternary structure
Many proteins are composed of two or more polypeptide chains (subunits). These may be identical or different. Hemoglobin has two α-globin and two β-globin chains (α₂β₂). The same forces which stabilize tertiary structure hold these subunits together, including disulfide bonds between cysteines on separate polypeptides. This level of organization is known as the quaternary structure and has certain consequences. First, it allows very large protein molecules to be made. Tubulin is a dimeric protein made up of two small, nonidentical α and β subunits.
In a process coupled to the hydrolysis of tubulin-bound GTP, these dimers can polymerize into structures containing many hundreds of α and β subunits (see Topic A4, Fig. 1). These are the microtubules of the cytoskeleton. Secondly, it can provide greater functionality to a protein by combining different activities into a single entity, as in the fatty acid synthase complex. Often, the interactions between the subunits are modified by the binding of small molecules and this can lead to the allosteric effects seen in enzyme regulation.

Prosthetic groups
Many conjugated proteins contain covalently or noncovalently attached small molecules called prosthetic groups which give chemical functionality to the protein that the amino acid side chains cannot provide. Many of these are co-factors in enzyme-catalyzed reactions. Examples are nicotinamide adenine dinucleotide (NAD⁺) in many dehydrogenases, pyridoxal phosphate in transaminases, heme in hemoglobin and cytochromes, and metal ions, for example Zn²⁺. A protein without its prosthetic group is known as an apoprotein.

What do proteins do?
Enzymes. Apart from a few catalytically active RNA molecules (see Topic O2), all enzymes are proteins. These can enhance the rate of biochemical reactions by several orders of magnitude. Binding of the substrate involves various noncovalent interactions with specific amino acid side chains, including van der Waals forces, hydrogen bonds, salt bridges and hydrophobic forces. Specificity of binding can be extremely high, with only a single substrate binding (e.g. glucose oxidase binds only glucose), or it can be group-specific (e.g. hexokinase binds a variety of hexose sugars). Side chains can also be directly involved in catalysis, for example by acting as nucleophiles, or proton donors or abstractors.
Signaling. Receptor proteins in cell membranes can bind ligands (e.g.
hormones) from the extracellular medium and, by virtue of the resulting conformational change, initiate reactions within the cell in response to that ligand. Ligand binding is similar to substrate binding but the ligand usually remains unchanged. Some hormones are themselves small proteins, such as insulin and growth hormone.
Transport and storage. Hemoglobin transports oxygen in the red blood cells while transferrin transports iron to the liver. Once in the liver, iron is stored bound to the protein ferritin. Dietary fats are carried in the blood by lipoproteins. Many other molecules and ions are transported and stored in a protein-bound form. This can enhance solubility and reduce reactivity until they are required.
Structure and movement. Collagen is the major protein in skin, bone and connective tissue, while hair is made mainly from keratin. There are also many structural proteins within the cell, for example in the cytoskeleton. The major muscle proteins actin and myosin form sliding filaments which are the basis of muscle contraction.
Nutrition. Casein and ovalbumin are the major proteins of milk and eggs, respectively, and are used to provide the amino acids for growth of developing offspring. Seed proteins also provide nutrition for germinating plant embryos.
Immunity. Antibodies, which recognize and bind to bacteria, viruses and other foreign material (the antigen), are proteins.
Regulation. Transcription factors bind to and modulate the function of DNA. Many other proteins modify the functions of other molecules by binding to them.

Domains, motifs, families and evolution
Many proteins are composed of structurally independent units, or domains, that are connected by sections with limited higher order structure within the same polypeptide.
The connections can act as hinges to permit the individual domains to move in relation to each other, and breakage of these connections by limited proteolysis can often separate the domains, which can then behave like independent globular proteins. The active site of an enzyme is sometimes formed in a groove between two domains, which wrap around the substrate. Domains can also have a specific function such as binding a commonly used molecule, for example ATP. When such a function is required in many different proteins, the same domain structure is often found. In eukaryotes, domains are often encoded by discrete parts of genes called exons (see Topic O3). Therefore, it has been suggested that, during evolution, new proteins were created by the duplication and rearrangement of domain-encoding exons in the genome to produce new combinations of binding sites, catalytic sites and structural elements in the resulting new polypeptides. In this way, the rate of evolution of new functional proteins may have been greatly increased.

Structural motifs (also known as supersecondary structures) are groupings of secondary structural elements that frequently occur in globular proteins. They often have functional significance and can represent the essential parts of binding or catalytic sites that have been conserved during the evolution of protein families from a common ancestor. Alternatively, they may represent the best solution to a structural–functional requirement that has been arrived at independently in unrelated proteins. A common example is the βαβ motif in which the connection between two consecutive parallel strands of a β-sheet is an α-helix (Fig. 4). Two overlapping βαβ motifs (βαβαβ) form a dinucleotide (e.g. NAD⁺) binding site in many otherwise unrelated proteins. Sequence motifs consist of only a few conserved, functionally important amino acids ra
