BIOC 301 Biochemistry of Nucleic Acids Course Content Sep. 2024 PDF
BIOC 301 course content details the biochemistry of nucleic acids. The syllabus covers chromosome organization, structure, DNA replication, and gene manipulation. The course is for Y3S1 students with a prerequisite of BIOC 200 or 206.
**BIOC 301: Biochemistry of Nucleic Acids (30/30: C.F. 3.0) Y3S1**

Chromosome organization; structure of nucleic acids (nucleoside and nucleotide structure, secondary structure of nucleic acids, thermal properties of DNA); DNA replication; DNA recombination; the gene; bacterial conjugation, transformation and transduction; the genetic code and protein biosynthesis; (structure and properties) mutability of DNA and DNA repair mechanisms; gene manipulation; cancer genes. *(Prerequisite: BIOC 200/206).*

**3.1 BIOC 301: Biochemistry of Nucleic Acids (30/30: C.F. 3.0) Y3S1**

***Prerequisites: BIOC 200 or BIOC 206.***

**3.2 Purpose of the course**

This course explores the biochemical properties and functions of nucleic acids, focusing on the structure, synthesis, and regulation of DNA and RNA. Students will gain an in-depth understanding of the molecular mechanisms that underpin the flow of genetic information from DNA to RNA to protein.

**3.3 Expected learning outcomes of the course**

At the end of the course, learners are expected to be able to:

1. List cellular components, differences and function of nucleic acids.
2. Discuss the principles of gene regulation and expression.
3. Demonstrate understanding of the consequences of nucleotide alterations on protein primary structure and cellular functions.
4. Analyze structural-functional relationships of nucleic acids and proteins, and outline their application in genome engineering, cancer research etc.
**3.4 Course content**

**Chromosome organization** - structure of nucleic acids (nucleoside and nucleotide structure, secondary structure of nucleic acids, thermal properties of DNA), DNA replication, DNA recombination; **The gene** - gene structure, introns, exons, regulatory elements; **Bacterial genetics** - conjugation, transformation and transduction; **The genetic code and protein biosynthesis** - (structure and properties) mutability of DNA and DNA repair mechanisms; **Gene manipulation** - site-directed mutagenesis, genome engineering or editing techniques; **Cancer** - oncogenes, tumour suppressor genes, gene therapy for cancer.

**3.5 Instructional methods**

This course will be delivered by class lectures, independent learning and laboratory practicals.

**3.6 Instructional materials/equipment**

Lecture rooms fitted with writing boards or projector screens, lecture notes, laboratories, lab coats, practical manuals and writing materials.

**3.7 Student Assessment at course level**

| Type | Weighting (%) |
|---|---|
| Continuous Assessment Tests (CATs) | 30 |
| Final Examination | 70 |
| Total | 100 |

**3.8 Core Reading Materials for the course**

1. Bruce Alberts, Alexander Johnson, Julian Lewis, David Morgan, Martin Raff, Keith Roberts, Peter Walter (2014). Molecular Biology of the Cell, 6th Edition. Garland Science. ISBN-13: 978-0-8153-4432-2, ISBN: 0-8153-4432-5.
2. Nelson, D. L. and Cox, M. M. (2013). Lehninger Principles of Biochemistry, 6th Edition. W. H. Freeman and Company, New York. ISBN-13: 978-1429234146.

**3.9 Recommended reference materials for the course**

Reading material/course: [https://www.ebi.ac.uk/training/](https://www.ebi.ac.uk/training/)

BIOC 301: The biochemistry of nucleic acids

**Course Overview**

**Organization of the chromosome**

DNA occurs in various forms in different cells. The single chromosome of prokaryotic cells is typically a circular DNA molecule. Relatively little protein is associated with prokaryotic chromosomes.
**Prokaryotic genome**

Genetic material in a cell: All cells can give rise to new cells, and the information encoded in a living cell is passed from one generation to the next. The material that encodes this information is the genetic, or hereditary, material of the cell.

Prokaryotic genetic material: The prokaryotic (bacterial) genetic material is usually concentrated in a specific clear region of the cytoplasm called the nucleoid. The bacterial chromosome is a single, circular, double-stranded DNA molecule, mostly attached to the plasma membrane at one point. It does not contain any histone protein. *Escherichia coli* DNA is a circular molecule 4.6 million base pairs in length, containing 4288 annotated protein-coding genes (organized into 2584 operons), seven ribosomal RNA (rRNA) operons, and 86 transfer RNA (tRNA) genes. Certain bacteria, such as *Borrelia burgdorferi*, possess linear chromosomes, as eukaryotes do. Besides the chromosomal DNA, many bacteria also carry extrachromosomal genetic elements in the form of small, closed, circular DNA molecules called plasmids. Plasmids generally float freely in the cytoplasm and are classified by the genes they carry. Types of plasmids include F plasmids, R plasmids, virulence plasmids and metabolic plasmids.

***E. coli*: A Model Prokaryote**

Much of what is known about prokaryotic chromosome structure was derived from studies of *Escherichia coli*, a bacterium that lives in the human colon and is commonly used in laboratory cloning experiments. In the 1950s and 1960s, this bacterium became the model organism of choice for prokaryotic research when a group of scientists used phase-contrast microscopy and autoradiography to show that the essential genes of *E. coli* are encoded on a single circular chromosome packaged within the cell nucleoid (Mason & Powelson, 1956; Cairns, 1963). Prokaryotic cells do not contain nuclei or other membrane-bound organelles.
In fact, the word "prokaryote" literally means "before the nucleus." The nucleoid is simply the area of a prokaryotic cell in which the chromosomal DNA is located. This arrangement is not as simple as it sounds, however, especially considering that the *E. coli* chromosome is several orders of magnitude larger than the cell itself. So, if bacterial chromosomes are so huge, how can they fit comfortably inside a cell, much less in one small corner of it?

**DNA Supercoiling in prokaryotes**

The answer to this question lies in DNA packaging. Whereas eukaryotes wrap their DNA around proteins called histones to help package the DNA into smaller spaces, most prokaryotes do not have histones (with the exception of those species in the domain Archaea). Thus, one way prokaryotes compress their DNA into smaller spaces is through supercoiling. Imagine twisting a rubber band so that it forms tiny coils. Now twist it even further, so that the original coils fold over one another and form a condensed ball. When this type of twisting happens to a bacterial genome, it is known as supercoiling. Genomes can be negatively supercoiled, meaning that the DNA is twisted in the opposite direction of the double helix, or positively supercoiled, meaning that the DNA is twisted in the same direction as the double helix. Most bacterial genomes are negatively supercoiled during normal growth.

![](media/image2.png)

**Proteins Involved in Supercoiling**

During the 1980s and 1990s, researchers discovered that multiple proteins act together to fold and condense prokaryotic DNA. In particular, one protein called HU, which is the most abundant protein in the nucleoid, works with an enzyme called topoisomerase I to bind DNA and introduce sharp bends in the chromosome, generating the tension necessary for negative supercoiling.
Recent studies have also shown that other proteins, including integration host factor (IHF), can bind to specific sequences within the genome and introduce additional bends (Rice *et al*., 1996). The folded DNA is then organized into a variety of conformations (Sinden & Pettijohn, 1981) that are supercoiled and wound around tetramers of the HU protein, much like eukaryotic chromosomes are wrapped around histones (Murphy & Zimmerman, 1997). Once the prokaryotic genome has been condensed, DNA topoisomerase I, DNA gyrase, and other proteins help maintain the supercoils. One of these maintenance proteins, H-NS, plays an active role in transcription by modulating the expression of the genes involved in the response to environmental stimuli. Another maintenance protein, factor for inversion stimulation (FIS), is abundant during exponential growth and regulates the expression of more than 231 genes, including DNA topoisomerase I (Bradley *et al*., 2007).

![](media/image4.png)

**Eukaryotic Genome**

Eukaryotic genome organisation is the functional and spatial arrangement of **DNA** within the nucleus of eukaryotic cells. Eukaryotic genomes are defined by **linear chromosomes** contained within a **membrane-bound nucleus**, in contrast to prokaryotic genomes, which are usually arranged as circular chromosomes within the cytoplasm.

**Other differences include:**

| DNA in prokaryotic cells | DNA in eukaryotic cells |
|---|---|
| Double stranded and circular. The DNA is present in the nucleoid, which is not surrounded by a nuclear membrane. | Double stranded and linear, with two ends |
| Contains significantly less DNA | Contains more DNA |
| Found as a coiled loop floating in the cytoplasm | Found in the nucleus |
| Generally has no introns. Transcription is coupled with translation, which leaves no opportunity for intron splicing, since splicing would interrupt the coupling. | Has introns and exons |

In contrast, the DNA molecules of eukaryotic cells, each of which defines a chromosome, are linear and richly adorned with proteins. A class of arginine- and lysine-rich basic proteins called **histones** interact ionically with the anionic phosphate groups in the DNA backbone to form **nucleosomes,** structures in which the DNA double helix is wound around a protein "core" composed of pairs of four different histone polypeptides.

A diagram of the histone octamer. Nucleosomes consist of two turns of DNA supercoiled about a histone "core".

Chromosomes also contain a varying mixture of other proteins, so-called **nonhistone chromosomal proteins,** many of which are involved in regulating which genes in DNA are transcribed at any given moment. The amount of DNA in a diploid mammalian cell is typically more than 1000 times that found in an *E. coli* cell. Some higher plant cells contain more than 50,000 times as much.

Nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), carry genetic information which is read in cells to make the RNA and proteins by which living things function. The well-known structure of the DNA double helix allows this information to be copied and passed on to the next generation. In this article we summarise the structure and function of nucleic acids.
The article includes a historical perspective and summarises some of the early work which led to our understanding of this important molecule and how it functions; many of these pioneering scientists were awarded Nobel Prizes for their work. We explain the structure of the DNA molecule, how it is packaged into chromosomes and how it is replicated prior to cell division. We look at how the concept of the gene has developed since the term was first coined and how DNA is copied into RNA (transcription) and translated into protein (translation).

There are additional layers of organization in the eukaryotic nucleus. Just before cell division during mitosis, chromosomes can be seen as highly condensed and organized structures (**see figure below, (a)**). During interphase, chromosomes appear dispersed (**see figure below, (b)**, top), but they do not meander randomly in nuclear space (**see figure below, (b)**, bottom). Each chromosome is constrained within a subnuclear domain called a **chromosome territory** (**see figure below, (c)**). The exact location of chromosome territories varies from cell to cell in an organism, but some spatial patterns are evident. Some chromosomes have a higher density of genes than others (for example, human chromosomes 1, 16, 17, 19, and 22), and these tend to have territories in the center of the nucleus. Chromosomes with more heterochromatin tend to be located on the nuclear periphery. Spaces between chromosomes are often sites where transcriptional machinery and transcriptionally active genes on adjacent chromosomes are concentrated.

![](media/image6.png)

**Chromosomal organization in the eukaryotic nucleus. (a)** Condensed chromosomes at the mitotic anaphase in cells of the bluebell (*Endymion sp.*). **(b)** Interphase nuclei of human breast epithelial cells. The nucleus on the bottom has been treated so that its two copies of chromosome 11 fluoresce green. **(c)** Cartoon showing chromosome territories in a eukaryotic nucleus.
The interchromatin compartments are enriched in transcriptional machinery and have abundant actively transcribed genes. The nucleolus is a suborganelle within the nucleus where ribosomes are synthesized and assembled. \[Sources: (a) Pr. G. Giménez-Martín/Science Source. (b) Karen Meaburn and Tom Misteli/National Cancer Institute.\]

**The chromosome and scaffold proteins**

Chromosome higher order structure has been an enigma for over a century. The most important structural finding has been the presence of a chromosome scaffold composed of non-histone proteins, the so-called scaffold proteins. The structural and functional roles of the main scaffold proteins, condensin, Topo IIα, and KIF4, have been widely studied, but it is unclear how these proteins organize the scaffold and regulate chromosome condensation. The organization and function of the scaffold are therefore still controversial.

**Localization of hCAP-E and Topo IIα revealed by 3D-SIM. a** to **c**, Maximum intensity projections of z-stack images. Scale bars, 1 μm. **a**, Wide-field microscopy applying deconvolution imaging of PA chromosome immunostained for hCAP-E and Topo IIα. **b**, 3D-SIM image of the same PA chromosome as **a**. **c**, 3D-SIM image of HeLa-wt metaphase chromosome immunostained for hCAP-E and Topo IIα. Arrowheads indicate the double strands. Insets show magnified views of the white boxes in **a** and **b** as indicated. Scale bars, 250 nm. Red dotted lines represent double strands of chromosome scaffold. DNA is shown in blue. **d**, RGB line profile of yellow path in **c**.

Experiments have also shown loops of DNA attached to the chromosomal scaffold.

![](media/image8.png)

**Loops of DNA attached to a chromosomal scaffold. (a)** A swollen mitotic chromosome, produced in a buffer of low ionic strength, as seen in the electron microscope. Notice the appearance of chromatin loops at the margins. **(b)** Extraction of the histones leaves a proteinaceous chromosomal scaffold surrounded by naked DNA.
**(c)** The DNA appears to be organized in loops attached at their base to the scaffold in the upper left corner; scale bar = 1 *μ*m. The three images are at different magnifications. \[Sources: (a, b) Don W. Fawcett/Science Source. (c) U. K. Laemmli et al., "Metaphase chromosome structure: The role of nonhistone proteins," *Cold Spring Harb. Symp. Quant. Biol.* 42:351, 1978. © Cold Spring Harbor Laboratory Press.\]

**Packaging of DNA into eukaryotic cells**

DNA has to be highly condensed to fit into the bacterial cell or eukaryotic nucleus. In eukaryotes, histone proteins are used to condense the DNA into chromatin. The basic unit of chromatin is the nucleosome, which contains DNA wrapped almost two times around the histone octamer (comprising two copies each of the histone proteins H2A, H2B, H3 and H4) (Figure 4). Further levels of compaction are required to fit the DNA into the nucleus: the nucleosomes are folded upon themselves to form the 30-nm fibre, which is then folded again to form the 300-nm fibre, and during mitosis further compaction can occur, forming the chromatid, which is 700 nm in diameter.

Processes such as DNA replication and transcription need to occur in the chromatin environment, and the level of compaction acts as a barrier to proteins that need to interact with DNA. Chromatin structure therefore plays an important role in processes such as the regulation of gene expression in eukaryotes. DNA and the histone proteins can be chemically modified. These changes are called epigenetic modifications because they do not change the DNA sequence; however, they can be passed on during cell division and to subsequent generations, a process known as epigenetic inheritance. Because these epigenetic modifications can alter the chromatin structure, they regulate gene transcription and can affect the phenotype. Epigenetics plays key roles in many processes, including development, cancer, behaviour and addiction.
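The need for compaction described above can be made concrete with a rough length estimate. The sketch below is only illustrative: the rise of 0.34 nm per base pair (the usual textbook figure for B-form DNA) and the 2 μm cell length are assumptions that do not appear in these notes, and the function names are hypothetical.

```python
# Rise per base pair of B-form DNA, in nanometres.
# Assumption: standard textbook value, not stated in these notes.
RISE_NM_PER_BP = 0.34


def contour_length_nm(base_pairs: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Length of a fully extended double helix, in nanometres."""
    return base_pairs * rise_nm


def length_ratio(base_pairs: int, container_um: float) -> float:
    """How many times longer the extended DNA is than its container."""
    dna_um = contour_length_nm(base_pairs) / 1000.0  # nm -> micrometres
    return dna_um / container_um


ECOLI_BP = 4_600_000        # E. coli genome size quoted in the notes
ECOLI_CELL_LENGTH_UM = 2.0  # assumed typical cell length, for illustration only

if __name__ == "__main__":
    length_mm = contour_length_nm(ECOLI_BP) / 1e6  # nm -> mm
    ratio = length_ratio(ECOLI_BP, ECOLI_CELL_LENGTH_UM)
    print(f"Extended E. coli chromosome: {length_mm:.2f} mm")
    print(f"DNA length / cell length: {ratio:.0f}")
```

Under these assumptions the extended chromosome is about 1.6 mm long, several hundred times the length of the cell that contains it, which is why supercoiling (in bacteria) and nucleosome-based folding (in eukaryotes) are needed.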
Nuclear organisation plays an important role in many biological processes, including regulation of gene transcription. In recent years the development of several techniques, including microscopy, has allowed us to gain an understanding of the way the genome is organised in 3D. Individual chromosomes are not randomly spaced within the nucleus; each chromosome has a distinct territory. Actively transcribed regions from different chromosomes are often close to each other and near the interior of the nucleus, whereas inactive genes are on the periphery or near a special area called the nucleolus, where ribosomal RNA is transcribed. Whenever a cell divides, DNA replication must synthesize two copies of each chromosome present within the cell.

**The pentose components of nucleic acids**

D-Ribose is a component of ribonucleic acid (RNA), and 2-deoxy-D-ribose is a component of deoxyribonucleic acid (DNA). The aldopentoses D-ribose and 2-deoxy-D-ribose are components of nucleotides and nucleic acids.

**Ring form of pentoses in nucleic acids**

![](media/image10.png)

**Conformations of ribose. (a)** In solution, the straight-chain (aldehyde) and ring (*β*-furanose) forms of free ribose are in equilibrium. RNA contains only the ring form, *β*-D-ribofuranose. Deoxyribose undergoes a similar interconversion in solution, but in DNA exists solely as *β*-2′-deoxy-D-ribofuranose. **(b)** Ribofuranose rings in nucleotides can exist in four different puckered conformations. In all cases, four of the five atoms are nearly in a single plane. The fifth atom (C-2′ or C-3′) is on either the same (endo) or the opposite (exo) side of the plane relative to the C-5′ atom.

Nucleotides have a variety of roles in cellular metabolism.
They are the energy currency in metabolic transactions, the essential chemical links in the response of cells to hormones and other extracellular stimuli, and the structural components of an array of enzyme cofactors and metabolic intermediates. And, last but certainly not least, they are the constituents of nucleic acids: **deoxyribonucleic acid (DNA)** and **ribonucleic acid (RNA)**, the molecular repositories of genetic information. The structure of every protein, and ultimately of every biomolecule and cellular component, is a product of information programmed into the nucleotide sequence of cellular (or viral) nucleic acids. The ability to store and transmit genetic information from one generation to the next is a fundamental condition for life. **The nucleotide sequence in nucleic acids** The amino acid sequence of every protein in a cell, and the nucleotide sequence of every RNA, is specified by a nucleotide sequence in the cell's DNA. A segment of a DNA molecule that contains the information required for the synthesis of a functional biological product, whether protein or RNA, is referred to as a **gene**. A cell typically has many thousands of genes, and DNA molecules, not surprisingly, tend to be very large. The storage and transmission of biological information are the only known functions of DNA. RNAs have a broader range of functions, and several classes are found in cells. **Ribosomal RNAs (rRNAs)** are components of ribosomes, the complexes that carry out the synthesis of proteins. **Messenger RNAs (mRNAs)** are intermediaries, carrying information for the synthesis of a protein from one or a few genes to a ribosome. **Transfer RNAs (tRNAs)** are adapter molecules that faithfully translate the information in mRNA into a specific sequence of amino acids. In addition to these major classes, there are many RNAs with special functions. 
**Nucleotides and Nucleic Acids Have Characteristic Bases and Pentoses**

A **nucleotide** has three characteristic components: (1) a nitrogenous (nitrogen-containing) base, (2) a pentose, and (3) one or more phosphates. The molecule without a phosphate group is called a **nucleoside**. The nitrogenous bases are derivatives of two parent compounds, **pyrimidine** and **purine**. The bases and pentoses of the common nucleotides are heterocyclic compounds.

![](media/image12.png)

**Structure of nucleotides. (a) General structure showing the numbering convention for the pentose ring. This is a ribonucleotide. In deoxyribonucleotides the -OH group on the 2′ carbon (in red) is replaced with -H. (b) The parent compounds of the pyrimidine and purine bases of nucleotides and nucleic acids, showing the numbering conventions.**

**Key Convention:** The carbon and nitrogen atoms in the parent structures are conventionally numbered to facilitate the naming and identification of the many derivative compounds.

(**A**) A nucleotide (guanosine triphosphate). The nitrogenous base (guanine in this example) is linked to the 1′ carbon of the deoxyribose and the phosphate groups are linked to the 5′ carbon. A nucleoside is a base linked to a sugar. A nucleotide is a nucleoside with one or more phosphate groups. (**B**) A DNA strand containing four nucleotides with the nitrogenous bases thymine (T), cytosine (C), adenine (A) and guanine (G) respectively. The 3′ carbon of one nucleotide is linked to the 5′ carbon of the next via a phosphodiester bond. The 5′ end is at the top and the 3′ end at the bottom.

The base of a nucleotide is joined covalently (at N-1 of pyrimidines and N-9 of purines) in an *N*-*β*-glycosyl bond to the 1′ carbon of the pentose, and the phosphate is esterified to the 5′ carbon. The *N*-*β*-glycosyl bond is formed by removal of the elements of water (a hydroxyl group from the pentose and hydrogen from the base), as in *O*-glycosidic bond formation.
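The three-component definition above can be written down as a small data structure. This is a minimal sketch, not a standard library or chemistry API: the class name `Monomer` and its methods are hypothetical, and the base sets list only the five major bases.

```python
from dataclasses import dataclass

# The five major bases, grouped by parent compound.
PURINES = {"adenine", "guanine"}
PYRIMIDINES = {"cytosine", "thymine", "uracil"}


@dataclass(frozen=True)
class Monomer:
    """A base linked to a pentose, with zero or more phosphates (hypothetical model)."""
    base: str
    pentose: str          # "ribose" or "2-deoxyribose"
    phosphates: int = 0   # number of phosphate groups on the 5' carbon

    def kind(self) -> str:
        # Without a phosphate group the molecule is a nucleoside.
        return "nucleoside" if self.phosphates == 0 else "nucleotide"

    def glycosidic_atom(self) -> str:
        # The base is joined to C-1' of the pentose through N-9 (purines)
        # or N-1 (pyrimidines).
        if self.base in PURINES:
            return "N-9"
        if self.base in PYRIMIDINES:
            return "N-1"
        raise ValueError(f"not one of the five major bases: {self.base}")


if __name__ == "__main__":
    gtp = Monomer(base="guanine", pentose="ribose", phosphates=3)
    print(gtp.kind(), gtp.glycosidic_atom())
```

The point of the sketch is that "nucleoside" versus "nucleotide" depends only on the phosphate count, while the attachment atom depends only on whether the base is a purine or a pyrimidine.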
![](media/image14.png)

Both DNA and RNA contain two major purine bases, **adenine** (A) and **guanine** (G), and two major pyrimidines. In both DNA and RNA one of the pyrimidines is **cytosine** (C), but the second common pyrimidine is not the same in both: it is **thymine** (T) in DNA and **uracil** (U) in RNA. Only occasionally does thymine occur in RNA or uracil in DNA. The structures of the five major bases, and the nomenclature of their corresponding nucleosides and nucleotides, are summarized below.

Purines

![](media/image16.png)

**The nomenclature for nucleosides and nucleotides of purines**

Pyrimidines

![](media/image18.png)

The pentose ring follows the usual numbering convention for sugars, but in the pentoses of nucleotides and nucleosides the carbon numbers are given a prime (′) designation to distinguish them from the numbered atoms of the nitrogenous bases.

**The nomenclature for the major nucleosides and nucleotides**

**Pyrimidine, nucleoside, nucleotide, nucleic acid**

![](media/image20.png)

Note: "Nucleoside" and "nucleotide" are generic terms that include both ribo- and deoxyribo- forms. Also, ribonucleosides and ribonucleotides are here designated simply as nucleosides and nucleotides (e.g., riboadenosine as adenosine), and deoxyribonucleosides and deoxyribonucleotides as deoxynucleosides and deoxynucleotides (e.g., deoxyriboadenosine as deoxyadenosine). Both forms of naming are acceptable, but the shortened names are more commonly used. Thymine is an exception; "ribothymidine" is used to describe its unusual occurrence in RNA.

More examples of nucleotide nomenclature: 5′-ATP (adenosine 5′-triphosphate) and 3′-dGMP (deoxyguanosine 3′-monophosphate).

![](media/image22.png)

![](media/image24.png)

**Deoxyribonucleotides and ribonucleotides of nucleic acids.** All nucleotides are shown in their free form at pH 7.0. The nucleotide units of DNA **(a)** are usually symbolized as A, G, T, and C, sometimes as dA, dG, dT, and dC; those of RNA **(b)** as A, G, U, and C. In their free form the deoxyribonucleotides are commonly abbreviated dAMP, dGMP, dTMP, and dCMP; the ribonucleotides, AMP, GMP, UMP, and CMP. For each nucleotide in the figure, the more common name is followed by the complete name in parentheses. All abbreviations assume that the phosphate group is at the 5′ position. The nucleoside portion of each molecule is shaded in light red.
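The naming rules above are regular enough to express as a lookup plus a few string rules. The following is a small sketch of those rules only; the function names are hypothetical, and it covers just the five major bases with one to three phosphates.

```python
# Nucleoside names for the five major bases (one-letter base symbols).
NUCLEOSIDE_NAMES = {
    "A": "adenosine",
    "G": "guanosine",
    "C": "cytidine",
    "U": "uridine",
    "T": "thymidine",
}
PHOSPHATE_PREFIX = {1: "mono", 2: "di", 3: "tri"}
PHOSPHATE_ABBREV = {1: "MP", 2: "DP", 3: "TP"}


def nucleoside_name(base: str, deoxy: bool = False) -> str:
    """Shortened nucleoside name; thymine is the exception noted in the text."""
    if base == "T":
        # "ribothymidine" marks the unusual occurrence of thymine in RNA.
        return "deoxythymidine" if deoxy else "ribothymidine"
    name = NUCLEOSIDE_NAMES[base]
    return "deoxy" + name if deoxy else name


def nucleotide_name(base: str, phosphates: int = 1,
                    deoxy: bool = False, position: int = 5) -> str:
    """Full name, e.g. adenosine 5'-triphosphate."""
    prefix = PHOSPHATE_PREFIX[phosphates]
    return f"{nucleoside_name(base, deoxy)} {position}'-{prefix}phosphate"


def abbreviation(base: str, phosphates: int = 1, deoxy: bool = False) -> str:
    """Abbreviation such as dAMP; assumes the phosphate is at the 5' position."""
    return ("d" if deoxy else "") + base + PHOSPHATE_ABBREV[phosphates]


if __name__ == "__main__":
    print(nucleotide_name("A", phosphates=3))
    print(nucleotide_name("G", phosphates=1, deoxy=True, position=3))
    print(abbreviation("A", deoxy=True), abbreviation("U"))
```

The two `nucleotide_name` calls reproduce the worked examples in the text (5′-ATP and 3′-dGMP); the abbreviation helper deliberately takes no position argument because, as stated above, the abbreviations assume a 5′ phosphate.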
The figures above show the structures and names of the four major **deoxyribonucleotides** (deoxyribonucleoside 5′-monophosphates; sometimes referred to as deoxynucleotides and deoxynucleoside triphosphates), the structural units of DNAs, and the four major **ribonucleotides** (ribonucleoside 5′-monophosphates), the structural units of RNAs.

**Some adenosine monophosphates**

These adenosine monophosphates are formed by enzymatic and alkaline hydrolysis of RNA. Thus, cells also contain nucleotides with the phosphate group at positions other than the 5′ carbon. **Ribonucleoside 2′,3′-cyclic monophosphates** are isolatable intermediates, and **ribonucleoside 3′-monophosphates** are end products of the hydrolysis of RNA by certain ribonucleases. Other variations are adenosine 3′,5′-cyclic monophosphate (cAMP) and guanosine 3′,5′-cyclic monophosphate (cGMP).

![](media/image26.png)

**Minor nucleosides**

![](media/image27.png)

**Some minor purine and pyrimidine bases, shown as the nucleosides. (a)** Minor bases of DNA. 5-Methylcytidine occurs in the DNA of animals and higher plants, *N*6-methyladenosine in bacterial DNA, and 5-hydroxymethylcytidine in the DNA of animals and of bacteria infected with certain bacteriophages. **(b)** Some minor bases of tRNAs. Inosine contains the base hypoxanthine. Note that pseudouridine, like uridine, contains uracil; they are distinct in the point of attachment to the ribose: in uridine, uracil is attached through N-1, the usual attachment point for pyrimidines; in pseudouridine, through C-5.

Although nucleotides bearing the major purines and pyrimidines are most common, both DNA and RNA also contain some minor bases, as shown above.
In DNA the most common of these are methylated forms of the major bases; in some viral DNAs, certain bases may be hydroxymethylated or glucosylated. Altered or unusual bases in DNA molecules often have roles in regulating or protecting the genetic information. Minor bases of many types are also found in RNAs, especially in tRNAs.

**The nomenclature of minor nucleosides and nucleotides**

The nomenclature for the minor bases can be confusing. Like the major bases, many have common names. The nitrogenous base hypoxanthine (below), for example, may be shown as its nucleoside inosine.

![](media/image29.png)

When an atom in the purine or pyrimidine ring is substituted, the usual convention (used here) is simply to indicate the ring position of the substituent by its number, for example 5-methylcytosine, 7-methylguanine, and 5-hydroxymethylcytosine. The element to which the substituent is attached (N, C, O) is not identified. The convention changes when the substituted atom is exocyclic (not within the ring structure), in which case the type of atom is identified, and the ring position to which it is attached is denoted with a superscript. The amino nitrogen attached to C-6 of adenine is *N*6; similarly, the carbonyl oxygen and amino nitrogen at C-6 and C-2 of guanine are *O*6 and *N*2, respectively. Examples of this nomenclature are *N*6-methyladenosine and *N*2-methylguanosine.

![](media/image31.png)

**Key Convention:** Although DNA and RNA seem to have two distinguishing features (different pentoses, and the presence of uracil in RNA and thymine in DNA), it is the pentoses that uniquely define the identity of a nucleic acid. If the nucleic acid contains 2′-deoxy-D-ribose, it is DNA by definition, even if it contains uracil. Similarly, if the nucleic acid contains D-ribose, it is RNA, regardless of its base composition.
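The key convention above amounts to a classification rule that looks at the pentose and nothing else. A minimal sketch follows; the function name and the string labels used for the pentoses are assumptions made for illustration.

```python
def classify_nucleic_acid(pentose: str, bases: str = "") -> str:
    """Classify a nucleic acid by its pentose alone.

    `bases` is accepted only to make the point that base composition
    is deliberately ignored by the rule.
    """
    pentose_to_type = {
        "2'-deoxy-D-ribose": "DNA",
        "D-ribose": "RNA",
    }
    try:
        return pentose_to_type[pentose]
    except KeyError:
        raise ValueError(f"unrecognised pentose: {pentose!r}") from None


if __name__ == "__main__":
    # Uracil-containing polymer built on deoxyribose is still DNA.
    print(classify_nucleic_acid("2'-deoxy-D-ribose", bases="ACGU"))
    # Thymine-containing polymer built on ribose is still RNA.
    print(classify_nucleic_acid("D-ribose", bases="ACGT"))
```

A classifier keyed on the U/T distinction would misassign both examples, which is exactly the case the convention is meant to settle.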
**Phosphodiester Bonds Link Successive Nucleotides in Nucleic Acids**

The successive nucleotides of both DNA and RNA are covalently linked through phosphate-group "bridges," in which the 5′-phosphate group of one nucleotide unit is joined to the 3′-hydroxyl group of the next nucleotide, creating a **phosphodiester linkage.**

![](media/image33.png)

The phosphodiester bonds (one of which is shaded in the DNA) link successive nucleotide units. The backbone of alternating pentose and phosphate groups in both types of nucleic acid is highly polar. The 5′ and 3′ ends of the macromolecule may be free or may have an attached phosphoryl group.

Thus, the covalent backbones of nucleic acids consist of alternating phosphate and pentose residues, and the nitrogenous bases may be regarded as side groups joined to the backbone at regular intervals. The backbones of both DNA and RNA are hydrophilic. The hydroxyl groups of the sugar residues form hydrogen bonds with water. The phosphate groups, with a p*K*a near 0, are completely ionized and negatively charged at pH 7, and the negative charges are generally neutralized by ionic interactions with positive charges on proteins, metal ions, and polyamines.

**Key Convention:** All the phosphodiester linkages in DNA and RNA have the same orientation along the chain, giving each linear nucleic acid strand a specific polarity and distinct 5′ and 3′ ends. By definition, the **5′ end** lacks a nucleotide attached at the 5′ position, and the **3′ end** lacks a nucleotide attached at the 3′ position. Other groups (most often one or more phosphates) may be present on one or both ends. The 5′→3′ orientation of a strand of nucleic acid refers to the *ends* of the strand and the orientation of individual nucleotides, not the orientation of the individual phosphodiester bonds linking its constituent nucleotides.

The covalent backbone of DNA and RNA is subject to slow, nonenzymatic hydrolysis of the phosphodiester bonds.
In the test tube, RNA is hydrolyzed rapidly under alkaline conditions, but DNA is not; the 2′-hydroxyl groups in RNA (absent in DNA) are directly involved in the process. Cyclic 2′,3′-monophosphate nucleotides are the first products of the action of alkali on RNA and are rapidly hydrolyzed further to yield a mixture of 2′- and 3′-nucleoside monophosphates. The 2′ hydroxyl acts as a nucleophile in an intramolecular displacement. The 2′,3′-cyclic monophosphate derivative is further hydrolyzed to a mixture of 2′- and 3′-monophosphates. DNA, which lacks 2′ hydroxyls, is stable under similar conditions.

The nucleotide sequences of nucleic acids can be represented schematically, as illustrated below by a segment of DNA with five nucleotide units. The phosphate groups are symbolized by **P**, and each deoxyribose is symbolized by a vertical line, from C-1′ at the top to C-5′ at the bottom (but keep in mind that the sugar is always in its closed-ring *β*-furanose form in nucleic acids). The connecting lines between nucleotides (which pass through **P**) are drawn diagonally from the middle (C-3′) of the deoxyribose of one nucleotide to the bottom (C-5′) of the next.

![](media/image35.png)

Some simpler representations of this pentadeoxyribonucleotide are pA-C-G-T-A~OH~, pApCpGpTpA, and pACGTA.

**Key Convention:** The sequence of a single strand of nucleic acid is always written with the 5′ end at the left and the 3′ end at the right, that is, in the 5′→3′ direction.

A short nucleic acid is referred to as an **oligonucleotide**. The definition of "short" is somewhat arbitrary, but polymers containing 50 or fewer nucleotides are generally called oligonucleotides. A longer nucleic acid is called a **polynucleotide**.

**The Properties of Nucleotide Bases Affect the Three-Dimensional Structure of Nucleic Acids**

Free pyrimidines and purines are weakly basic compounds and thus are called bases.
The purines and pyrimidines common in DNA and RNA are aromatic molecules, a property with important consequences for the structure, electron distribution, and light absorption of nucleic acids. Electron delocalization among atoms in the ring gives most of the bonds in the ring partial double-bond character. One result is that pyrimidines are planar molecules and purines are very nearly planar, with a slight pucker.

**Tautomeric forms of purines and pyrimidines**

**Tautomerization:** the interconversion of two isomers that differ only in the position of protons (and, often, double bonds).

Watson and Crick suggested a mechanism for the spontaneous occurrence of transitions in a classic paper on the DNA double helix. They noted that some of the hydrogen atoms on each of the four bases can change their location to produce a *tautomer*. An amino group (-NH2) can tautomerize to an imino form (=NH). Likewise, a keto group (C=O) can tautomerize to an enol form (C-OH). These rare tautomers can form nonstandard base pairs that fit into a double helix. For example, the imino tautomer of adenine can pair with cytosine. This A\*-C pairing (the asterisk denotes the imino tautomer) would allow C to become incorporated into a growing DNA strand where T was expected, and it would lead to a mutation if left uncorrected. In the next round of replication, A\* will probably retautomerize to the standard form, which pairs as usual with thymine, but the cytosine residue will pair with guanine. Hence, one of the daughter DNA molecules will contain a G-C base pair in place of the normal A-T base pair.

**Base Pair with Mutagenic Tautomer.** The bases of DNA can exist in rare tautomeric forms. The imino tautomer of adenine can pair with cytosine, eventually leading to a transition from A-T to G-C.

![](media/image37.png)

Free pyrimidine and purine bases may exist in two or more tautomeric forms depending on the pH.
Uracil, for example, occurs in lactam, lactim, and double lactim forms. The structures of nitrogenous base tautomers that predominate at pH 7.0 are as shown below:

![](media/image39.png)

**Tautomeric forms of uracil.** The lactam form predominates at pH 7.0; the other forms become more prominent as pH decreases. The other free pyrimidines and the free purines also have tautomeric forms, but they are more rarely encountered.

The purine and pyrimidine bases are hydrophobic and relatively insoluble in water at the near-neutral pH of the cell. At acidic or alkaline pH, the bases become charged and their solubility in water increases. Hydrophobic stacking interactions, in which two or more bases are positioned with the planes of their rings parallel (like a stack of coins), are one of two important modes of interaction between bases in nucleic acids. The stacking also involves a combination of van der Waals and dipole-dipole interactions between the bases. Base stacking helps to minimize contact of the bases with water, and base-stacking interactions are very important in stabilizing the three-dimensional structure of nucleic acids.

**Molar extinction coefficients of nucleic acids**

All nucleotide bases absorb UV light, and nucleic acids are characterized by a strong absorption at wavelengths near 260 nm.

**Absorption spectra of the common nucleotides.** The spectra are shown as the variation in molar extinction coefficient with wavelength. The molar extinction coefficients at 260 nm and pH 7.0 (*ε*260) are listed in the table. The spectra of corresponding ribonucleotides and deoxyribonucleotides, as well as the nucleosides, are essentially identical. For mixtures of nucleotides, a wavelength of 260 nm (dashed vertical line) is used for absorption measurements.

The functional groups of pyrimidines and purines are ring nitrogens, carbonyl groups, and exocyclic amino groups.
Hydrogen bonds involving the amino and carbonyl groups are the most important mode of interaction between two (and occasionally three or four) complementary strands of nucleic acid. The most common hydrogen-bonding patterns are those defined by James D. Watson and Francis Crick in 1953, in which A bonds specifically to T (or U) and G bonds to C.

![](media/image41.png)

These two types of **base pairs** predominate in double-stranded DNA and RNA, and the tautomers that predominate at pH 7.0 (see above) are responsible for these patterns. It is this specific pairing of bases that permits the duplication of genetic information.

**Hydrogen-bonding patterns in the base pairs defined by Watson and Crick.** Here as elsewhere, hydrogen bonds are represented by three blue lines.

**Summary on Some Basics**

A nucleotide consists of a nitrogenous base (purine or pyrimidine), a pentose sugar, and one or more phosphate groups. Nucleic acids are polymers of nucleotides, joined together by phosphodiester linkages between the 5′-hydroxyl group of one pentose and the 3′-hydroxyl group of the next. There are two types of nucleic acid: RNA and DNA. The nucleotides in RNA contain ribose, and the common pyrimidine bases are uracil and cytosine. In DNA, the nucleotides contain 2′-deoxyribose, and the common pyrimidine bases are thymine and cytosine. The primary purines are adenine and guanine in both RNA and DNA.

**Functions of Nucleotides**

Nucleotides have a variety of roles in cellular metabolism. They are the energy currency in metabolic transactions, the essential chemical links in the response of cells to hormones and other extracellular stimuli, and the structural components of an array of enzyme cofactors and metabolic intermediates. And, last but certainly not least, they are the constituents of nucleic acids: **deoxyribonucleic acid (DNA)** and **ribonucleic acid (RNA)**, the molecular repositories of genetic information.
The structure of every protein, and ultimately of every biomolecule and cellular component, is a product of information programmed into the nucleotide sequence of cellular (or viral) nucleic acids. The ability to store and transmit genetic information from one generation to the next is a fundamental condition for life.

In addition to their roles as the subunits of nucleic acids, nucleotides have a variety of other functions in every cell: as energy carriers, components of enzyme cofactors, and chemical messengers.

**1. Nucleotides Carry Chemical Energy in Cells**

The phosphate group covalently linked at the 5′ hydroxyl of a ribonucleotide may have one or two additional phosphates attached. The resulting molecules are referred to as nucleoside mono-, di-, and triphosphates.

**Nucleoside phosphates.** General structure of the nucleoside 5′-mono-, di-, and triphosphates (NMPs, NDPs, and NTPs) and their standard abbreviations. In the deoxyribonucleoside phosphates (dNMPs, dNDPs, and dNTPs), the pentose is 2′-deoxy-D-ribose.

![](media/image43.png)

**The phosphate ester and phosphoanhydride bonds of ATP.** Hydrolysis of an anhydride bond yields more energy than hydrolysis of the ester. A carboxylic acid anhydride and carboxylic acid ester are shown for comparison.

Starting from the ribose, the three phosphates are generally labeled *α*, *β*, and *γ*. Hydrolysis of nucleoside triphosphates provides the chemical energy to drive many cellular reactions. Adenosine 5′-triphosphate, ATP, is by far the most widely used nucleoside triphosphate for this purpose, but UTP, GTP, and CTP are also used in some reactions. Nucleoside triphosphates also serve as the activated precursors of DNA and RNA synthesis.

The energy released by hydrolysis of ATP and the other nucleoside triphosphates is accounted for by the structure of the triphosphate group. The bond between the ribose and the *α* phosphate is an ester linkage. The *α*,*β* and *β*,*γ* linkages are phosphoanhydrides.
Hydrolysis of the ester linkage yields about 14 kJ/mol under standard conditions, whereas hydrolysis of each anhydride bond yields about 30 kJ/mol. ATP hydrolysis often plays an important thermodynamic role in biosynthesis. When coupled to a reaction with a positive free-energy change, ATP hydrolysis shifts the equilibrium of the overall process to favor product formation.

**2. Adenine Nucleotides Are Components of Many Enzyme Cofactors**

**Some coenzymes containing adenosine.** The adenosine portion is shaded in light red. Coenzyme A (CoA) functions in acyl group transfer reactions; the acyl group (such as the acetyl or acetoacetyl group) is attached to the CoA through a thioester linkage to the *β*-mercaptoethylamine moiety. NAD+ functions in hydride transfers, and FAD, the active form of vitamin B2 (riboflavin), in electron transfers. Another coenzyme incorporating adenosine is 5′-deoxyadenosylcobalamin, the active form of vitamin B12.

**3. Adenine Nucleotides Also Serve as Signals**

ATP and ADP also serve as signaling molecules in many unicellular and multicellular organisms, including humans. In mammals, certain neurons release ATP at synapses, which binds P2X receptors on the postsynaptic cell, triggering changes in membrane potential or the release of an intracellular second messenger that initiates diverse physiological processes, including taste, inflammation, and smooth muscle contraction. One important class of ATP receptors that mediate the sensation of pain is an obvious target for drug development. Extracellular ADP is a signaling molecule that acts through P2Y receptors in sensitive cell types. By preventing ADP from binding the P2Y receptors of platelets, the drug clopidogrel (Plavix) inhibits undesirable blood clotting in patients with cardiac disease.

**4. Some Nucleotides Are Regulatory Molecules**

Cells respond to their environment by taking cues from hormones or other external chemical signals.
The interaction of these extracellular chemical signals ("first messengers") with receptors on the cell surface often leads to the production of **second messengers** inside the cell, which in turn leads to adaptive changes in the cell interior. Often, the second messenger is a nucleotide. One of the most common is **adenosine 3′,5′-cyclic monophosphate (cyclic AMP**, or **cAMP**), formed from ATP in a reaction catalyzed by adenylyl cyclase, an enzyme associated with the inner face of the plasma membrane. Cyclic AMP serves regulatory functions in virtually every cell outside the plant kingdom.

![](media/image45.png)

Guanosine 3′,5′-cyclic monophosphate (cGMP) also has regulatory functions in many cells. Another regulatory nucleotide, ppGpp, is produced in bacteria in response to a slowdown in protein synthesis during amino acid starvation. This nucleotide inhibits the synthesis of the rRNA and tRNA molecules needed for protein synthesis, preventing the unnecessary production of nucleic acids.

![](media/image47.png)

**Summary on Other Functions of Nucleotides**

ATP is the central carrier of chemical energy in cells. The presence of an adenosine moiety in a variety of enzyme cofactors may be related to binding-energy requirements. Cyclic AMP, formed from ATP in a reaction catalyzed by adenylyl cyclase, is a common second messenger produced in response to hormones and other chemical signals. ATP and ADP serve as neurotransmitters in a variety of signaling pathways.

**Nucleic Acid Structure**

The discovery of the structure of DNA by Watson and Crick in 1953 gave rise to entirely new disciplines and influenced the course of many established ones. In this section we focus on DNA structure, some of the events that led to its discovery, and more recent refinements in our understanding of DNA. We also introduce RNA structure.
As in the case of protein structure, it is sometimes useful to describe nucleic acid structure in terms of hierarchical levels of complexity (primary, secondary, tertiary). The primary structure of a nucleic acid is its covalent structure and nucleotide sequence. Any regular, stable structure taken up by some or all of the nucleotides in a nucleic acid can be referred to as secondary structure. Most structures considered in the remainder of this chapter fall under the heading of secondary structure. The complex folding of large chromosomes within eukaryotic chromatin and bacterial nucleoids, or the elaborate folding of large tRNA or rRNA molecules, is generally considered tertiary structure. **DNA Is a Double Helix That Stores Genetic Information** DNA was first isolated and characterized by Friedrich Miescher in 1868. He called the phosphorus containing substance "nuclein." Not until the 1940s, with the work of Oswald T. Avery, Colin MacLeod, and Maclyn McCarty, was there any compelling evidence that DNA was the genetic material. Avery and his colleagues found that an extract of a virulent strain of the bacterium *Streptococcus pneumoniae* (causing disease in mice) could be used to transform a nonvirulent strain of the same bacterium into a virulent strain. They were able to demonstrate through various chemical tests that it was DNA from the virulent strain (not protein, polysaccharide, or RNA, for example) that carried the genetic information for virulence. Then in 1952, experiments by Alfred D. Hershey and Martha Chase, in which they studied the infection of bacterial cells by a virus (bacteriophage) with radioactively labeled DNA or protein, removed any remaining doubt that DNA, not protein, carried the genetic information. Another important clue to the structure of DNA came from the work of Erwin Chargaff and his colleagues in the late 1940s. 
They found that the four nucleotide bases of DNA occur in different ratios in the DNAs of different organisms and that the amounts of certain bases are closely related. These data, collected from DNAs of a great many different species, led Chargaff to the following conclusions:

1. The base composition of DNA generally varies from one species to another.
2. DNA specimens isolated from different tissues of the same species have the same base composition.
3. The base composition of DNA in a given species does not change with an organism's age, nutritional state, or changing environment.
4. In all cellular DNAs, regardless of the species, the number of adenosine residues is equal to the number of thymidine residues (that is, A = T), and the number of guanosine residues is equal to the number of cytidine residues (G = C). From these relationships it follows that the sum of the purine residues equals the sum of the pyrimidine residues; that is, A + G = T + C.

These quantitative relationships, sometimes called "Chargaff's rules," were confirmed by many subsequent researchers. They were a key to establishing the three-dimensional structure of DNA and yielded clues to how genetic information is encoded in DNA and passed from one generation to the next.

To shed more light on the structure of DNA, Rosalind Franklin and Maurice Wilkins used the powerful method of x-ray diffraction to analyze DNA fibers in the early 1950s. Although lacking the molecular definition of diffraction from crystals, the x-ray diffraction pattern generated from the fibers was informative, **confirming a helical structure and recurring bases**. The pattern revealed that DNA molecules are helical, with two periodicities along their long axis, a primary one of 3.4 Å and a secondary one of 34 Å.
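Chargaff's equivalences can be checked on any double-stranded sequence. The following is a minimal sketch, assuming the A-T and G-C pairing described earlier in these notes; the sequence `ATGCCGTAAG` and the helper names (`duplex_base_counts`, `obeys_chargaff`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Pairing partners assumed from the A-T / G-C rule described in the notes.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the duplex built from one strand."""
    strand = strand.upper()
    partner_strand = "".join(PARTNER[base] for base in strand)
    return Counter(strand + partner_strand)


def obeys_chargaff(counts: Counter) -> bool:
    """True when A = T and G = C, which also forces A + G = T + C."""
    return counts["A"] == counts["T"] and counts["G"] == counts["C"]


single = "ATGCCGTAAG"  # hypothetical sequence, not taken from the course notes
duplex = duplex_base_counts(single)

print(dict(Counter(single)))   # one strand alone need not satisfy A = T or G = C
print(dict(duplex))            # the duplex as a whole does
print(obeys_chargaff(duplex))
```

Counting a single strand on its own shows why the rules apply to the duplex: the equalities arise only once each base is matched with its partner on the opposite strand.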
The problem then was to formulate a three-dimensional model of the DNA molecule that could account not only for the x-ray diffraction data but also for the specific A = T and G = C base equivalences discovered by Chargaff and for the other chemical properties of DNA. James Watson and Francis Crick relied on this accumulated information about DNA to set about deducing its structure. In 1953 they postulated a three-dimensional model of DNA structure that accounted for all the available data. It consists of two helical DNA chains wound around the same axis to form a right-handed double helix. The hydrophilic backbones of alternating deoxyribose and phosphate groups are on the outside of the double helix, facing the surrounding water. The furanose ring of each deoxyribose is in the C-2′ endo conformation. The purine and pyrimidine bases of both strands are stacked inside the double helix, with their hydrophobic and nearly planar ring structures very close together and perpendicular to the long axis. The offset pairing of the two strands creates a **major groove** and **minor groove** on the surface of the duplex. Each nucleotide base of one strand is paired in the same plane with a base of the other strand. Watson and Crick found that the hydrogen-bonded base pairs, G with C and A with T, are those that fit best within the structure, providing a rationale for Chargaff's rule that in any DNA, G = C and A = T. It is important to note that three hydrogen bonds can form between G and C, symbolized G≡C, but only two can form between A and T, symbolized A=T. Pairings of bases other than G with C and A with T tend (to varying degrees) to destabilize the double-helical structure.

**Watson-Crick model for the structure of DNA.** The original model proposed by Watson and Crick had 10 base pairs, or 34 Å (3.4 nm), per turn of the helix; subsequent measurements revealed 10.5 base pairs, or 36 Å (3.6 nm), per turn.
**(a)** Schematic representation, showing dimensions of the helix. **(b)** Stick representation showing the backbone and stacking of the bases. **(c)** Space-filling model. When Watson and Crick constructed their model, they had to decide at the outset whether the strands of DNA should be **parallel** or **antiparallel**---whether their 3′,5′-phosphodiester bonds should run in the same or opposite directions. An antiparallel orientation produced the most convincing model, and later work with DNA polymerases provided experimental evidence that the strands are indeed antiparallel, a finding ultimately confirmed by x-ray analysis. To account for the periodicities observed in the x-ray diffraction patterns of DNA fibers, Watson and Crick manipulated molecular models to arrive at a structure in which the vertically stacked bases inside the double helix would be 3.4 Å apart; the secondary repeat distance of about 34 Å was accounted for by the presence of 10 base pairs in each complete turn of the double helix. The structure in aqueous solution differs slightly from that in fibers, having 10.5 base pairs per helical turn. ![](media/image49.png) As **shown in the figure above**, the two antiparallel polynucleotide chains of double-helical DNA are not identical in either base sequence or composition. Instead they are **complementary** to each other. Wherever adenine occurs in one chain, thymine is found in the other; similarly, wherever guanine occurs in one chain, cytosine is found in the other. **Complementarity of strands in the DNA double helix** The complementary antiparallel strands of DNA follow the pairing rules proposed by Watson and Crick. The base-paired antiparallel strands differ in base composition: the left strand has the composition A3T2G1C3; the right, A2T3G3C1. They also differ in sequence when each chain is read in the 5′→3′ direction. Note the base equivalences: A = T and G = C in the duplex. 
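The complementarity and helix dimensions described above can be worked through in a few lines. This is only a sketch: the 9-nucleotide strand `ACATCGACT` is a hypothetical sequence chosen to have the same composition (A3T2G1C3) as the left strand in the figure, not the figure's actual sequence, and the function names are illustrative. The rise of 3.4 Å per base pair and 10.5 base pairs per turn are the values quoted in the text.

```python
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

RISE_PER_BP = 3.4    # angstroms between stacked base pairs (value from the text)
BP_PER_TURN = 10.5   # base pairs per helical turn in solution (value from the text)


def complementary_strand(strand: str) -> str:
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(PARTNER[base] for base in reversed(strand.upper()))


def composition(strand: str) -> dict:
    """Count each of the four bases in a single strand."""
    return {base: strand.count(base) for base in "ATGC"}


def helix_length_and_turns(n_base_pairs: int) -> tuple:
    """Approximate duplex length (angstroms) and number of helical turns."""
    return n_base_pairs * RISE_PER_BP, n_base_pairs / BP_PER_TURN


left = "ACATCGACT"                   # hypothetical strand, composition A3 T2 G1 C3
right = complementary_strand(left)   # partner strand, read 5'->3'

print(right, composition(left), composition(right))
print(helix_length_and_turns(105))   # roughly 357 angstroms and 10 turns
```

The two strands differ in both sequence and composition, yet taking the complement of the partner returns the original strand, which is the property that lets either strand specify the other.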
The DNA double helix, or duplex, is held together by hydrogen bonding between complementary base pairs and by base-stacking interactions. The complementarity between the DNA strands is attributable to the hydrogen bonding between base pairs; however, the hydrogen bonds do not contribute significantly to the stability of the structure. The double helix is primarily stabilized by metal cations, which shield the negative charges of backbone phosphates, and by base-stacking interactions between complementary base pairs. Base-stacking interactions between adjacent G≡C pairs are stronger than those between adjacent A=T pairs or adjacent pairs including all four bases. Because of this, DNA duplexes with higher G≡C content are more stable.

The important features of the double-helical model of DNA structure are supported by much chemical and biological evidence. Moreover, the model immediately suggested a mechanism for the transmission of genetic information. The essential feature of the model is the complementarity of the two DNA strands. As Watson and Crick were able to see, well before confirmatory data became available, this structure could logically be replicated by (1) separating the two strands and (2) synthesizing a complementary strand for each. Because nucleotides in each new strand are joined in a sequence specified by the base-pairing rules stated above, each preexisting strand functions as a template to guide the synthesis of one complementary strand (see figure below).

**Replication of DNA as suggested by Watson and Crick.** The preexisting or "parent" strands become separated, and each is the template for biosynthesis of a complementary "daughter" strand (in pink).

These expectations were experimentally confirmed, inaugurating a revolution in our understanding of biological inheritance.

**DNA Can Occur in Different Three-Dimensional Forms**

DNA is a remarkably flexible molecule.
Considerable rotation is possible around several types of bonds in the sugar--phosphate (phosphodeoxyribose) backbone, and thermal fluctuation can produce bending, stretching, and unpairing (melting) of the strands. Many significant deviations from the Watson-Crick DNA structure are found in cellular DNA, some or all of which may be important in DNA metabolism. These structural variations generally do not affect the key properties of DNA defined by Watson and Crick: strand complementarity, antiparallel strands, and the requirement for A=T and G≡C base pairs.

Structural variation in DNA reflects three things: the different possible conformations of the deoxyribose; rotation about the contiguous bonds that make up the phosphodeoxyribose backbone; and free rotation about the C-1′--*N*-glycosyl bond (see figure below). Because of steric constraints, purines in purine nucleotides are restricted to two stable conformations with respect to deoxyribose, called syn and anti.

![](media/image51.png)

**Structural variation in DNA. (a)** The conformation of a nucleotide in DNA is affected by rotation about seven different bonds. Six of the bonds rotate freely. The limited rotation about bond 4 gives rise to ring pucker. This conformation is endo or exo, depending on whether the atom is displaced to the same side of the plane as C-5′ or to the opposite side. For purine bases in nucleotides, only two conformations with respect to the attached ribose units are sterically permitted, anti or syn. Pyrimidines occur in the anti conformation.

Z-form DNA is a more radical departure from the B structure; the most obvious distinction is the left-handed helical rotation. There are 12 base pairs per helical turn, and the structure appears more slender and elongated. The DNA backbone takes on a zigzag appearance. Certain nucleotide sequences fold into left-handed Z helices much more readily than others.
Prominent examples are sequences in which pyrimidines alternate with purines, especially alternating C and G (that is, in the helix, alternating C≡G and G≡C pairs) or 5-methyl-C and G residues. To form the left-handed helix in Z-DNA, the purine residues flip to the syn conformation, alternating with pyrimidines in the anti conformation. The major groove is barely apparent in Z-DNA, and the minor groove is narrow and deep. Whether A-DNA occurs in cells is uncertain, but there is evidence for some short stretches (tracts) of Z-DNA in both bacteria and eukaryotes. These Z-DNA tracts may play a role (as yet undefined) in regulating the expression of some genes or in genetic recombination.

**DNA Supercoiling**

**Supercoil:** The twisting of a helical (coiled) molecule on itself; a coiled coil.

**Supercoiled DNA:** DNA that twists upon itself because it is under- or overwound (and thereby strained) relative to B-form DNA.

Cellular DNA, as we have seen, is extremely compacted, implying a high degree of structural organization. The folding mechanism must not only pack the DNA but also permit access to the information in the DNA. Before considering how this is accomplished in processes such as replication and transcription, we need to examine an important property of DNA structure known as **supercoiling**. "Supercoiling" means the coiling of a coil. An old-fashioned telephone cord, for example, is typically a coiled wire. The path taken by the wire between the base of the phone and the receiver often includes one or more supercoils **(Fig. 24-9)**.

![](media/image53.png)

DNA is coiled in the form of a double helix, with both strands of the DNA coiling around an axis. The further coiling of that axis upon itself **(Fig. 24-10)** produces DNA supercoiling. As detailed below, DNA supercoiling is generally a manifestation of structural strain. When there is no net bending of the DNA axis upon itself, the DNA is said to be in a **relaxed** state.
Supercoiling affects, and is affected by, replication and transcription, both of which require a separation of DNA strands, a process complicated by the helical interwinding of the strands. That a DNA molecule would bend on itself and become supercoiled in tightly packaged cellular DNA would seem logical, and perhaps even trivial, were it not for one additional fact: many circular DNA molecules remain highly supercoiled even after they are extracted and purified, freed from protein and other cellular components. This indicates that supercoiling is an intrinsic property of DNA tertiary structure. It occurs in all cellular DNAs and is highly regulated by each cell.

**Certain DNA Sequences Adopt Unusual Structures**

Other sequence-dependent structural variations found in larger chromosomes may affect the function and metabolism of the DNA segments in their immediate vicinity. For example, bends occur in the DNA helix wherever four or more adenosine residues appear sequentially in one strand. Six adenosines in a row produce a bend of about 18°. The bending observed with this and other sequences may be important in the binding of some proteins to DNA.

**Palindromes in DNA**

**Palindrome:** In DNA, the term is applied to regions of DNA with **inverted repeats**, such that an inverted, self-complementary sequence in one strand is repeated in the opposite orientation in the paired strand.

![](media/image55.png)

The self-complementarity within each strand confers the potential to form **hairpin** or **cruciform** (cross-shaped) structures. When the inverted repeat occurs within each individual strand of the DNA, the sequence is called a **mirror repeat**. Mirror repeats do not have complementary sequences within the same strand and thus cannot form hairpin or cruciform structures. Sequences of these types are found in almost every large DNA molecule and can encompass a few base pairs or thousands.
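The distinction between a palindrome and a mirror repeat can be made concrete with a short check on a single strand written 5′→3′. This is a minimal sketch that handles only perfect repeats with no spacer between the two halves; the function names and the three test sequences are illustrative assumptions rather than examples from the course figures.

```python
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Partner strand of `seq`, written 5'->3'."""
    return "".join(PARTNER[base] for base in reversed(seq.upper()))


def is_palindrome(seq: str) -> bool:
    """Inverted repeat: the strand and its partner read identically 5'->3'."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


def is_mirror_repeat(seq: str) -> bool:
    """Mirror repeat: the strand reads the same forwards and backwards."""
    seq = seq.upper()
    return seq == seq[::-1]


# Illustrative sequences only.
for seq in ("GAATTC", "TTAGCACCACGATT", "GATTACA"):
    print(seq, is_palindrome(seq), is_mirror_repeat(seq))
```

A sequence that passes `is_palindrome` is self-complementary and so could, in principle, base-pair with itself to form a hairpin; one that passes only `is_mirror_repeat` has no complementary partner within the same strand.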
The extent to which palindromes occur as cruciforms in cells is not known, although some cruciform structures have been demonstrated in vivo in *Escherichia coli*. Self-complementary sequences cause isolated single strands of DNA (or RNA) in solution to fold into complex structures containing multiple hairpins.

**Palindromes and mirror repeats.** Palindromes are sequences of double-stranded nucleic acids with twofold symmetry. To superimpose one repeat (shaded sequence) on the other, it must be rotated 180° about the horizontal axis then 180° about the vertical axis, as shown by the coloured arrows. A mirror repeat, on the other hand, has a symmetric sequence within each strand. Superimposing one repeat on the other requires only a single 180° rotation about the vertical axis.

![](media/image57.png)

**Hairpins and cruciforms.** Palindromic DNA (or RNA) sequences can form alternative structures with intrastrand base pairing. **(a)** When only a single DNA (or RNA) strand is involved, the structure is called a hairpin. **(b)** When both strands of a duplex DNA are involved, it is called a cruciform. Blue shading highlights asymmetric sequences that can pair with the complementary sequence either in the same strand or in the complementary strand.

A palindrome in DNA consists of two closely spaced or adjacent inverted repeats. Certain palindromes have important biological functions as parts of various cis-acting elements and protein binding sites. However, many palindromes are known as fragile sites in the genome, sites prone to chromosome breakage which can lead to various genetic rearrangements or even cell death. The ability of certain palindromes to initiate genetic recombination lies in their ability to form secondary structures in DNA which can cause replication stalling and double-strand breaks. Given their recombinogenic nature, it is not surprising that palindromes in the human genome are involved in genetic rearrangements in cancer cells as well as
other known recurrent translocations and deletions associated with certain syndromes in humans. What follows is an overview of the current understanding of the molecular mechanisms of palindrome recombinogenicity and hence the possible implications of DNA palindromes in carcinogenesis. Several unusual DNA structures are formed from three or even four DNA strands.

**The Recombinogenic Nature of Palindromic Sequences**

**DNA Palindromes Can Form Secondary Structures**

A palindrome in DNA is a sequence consisting of two identical or highly similar inverted repeats which are either adjacent to one another or separated by a spacer region. If the repeats (also called the palindrome arms) are identical and have no spacer in between, the palindrome is referred to as perfect. The term quasipalindrome can be used to refer to a non-perfect palindrome. Palindromes are found in genomes of all species investigated so far and they often play important roles as binding sites for homodimeric proteins, parts of promoters, replication origins or other regulatory sequences.

![](media/image59.png)

However, many of the discovered palindromes have no known biological function and can be relatively long (from several dozen to several hundred base pairs). If a palindrome is of sufficient length, intrastrand base pairing can occur and this results in formation of secondary structures in DNA. In single-stranded DNA, a hairpin structure forms, while in double-stranded DNA, a cruciform structure consisting of two hairpins, one in each strand, forms. Each hairpin consists of a stem composed of complementary paired inverted repeats and a loop.
The loop either consists of bases within the spacer, if such a region is present in a specific palindrome, or of four to six bases which lie in the center of symmetry of the two inverted repeats and cannot be complementarily paired due to the rigidity of the DNA strand. It is considered that a hairpin structure can occur in the single-stranded lagging strand during DNA replication. A cruciform structure forms in dsDNA by gradual extrusion which begins at the center of the palindrome. After denaturation of a short region of DNA at the center, intrastrand base pairing occurs and a small proto-cruciform forms which can further extrude into a larger cruciform. Conditions which lead to cruciform extrusion have been extensively studied in vitro and theoretical kinetic models of cruciform extrusion were postulated. These experiments show that cruciform extrusion is thermodynamically unfavorable, and in those early studies, it was debated whether extrusion is even possible in vivo. However, processes such as transcription and replication can increase the density of negative supercoils in DNA. The added torsion can be released through cruciform extrusion and this is considered to be a mechanism driving extrusion in vivo.

**Triplex DNAs**

Nucleotides participating in a Watson-Crick base pair can form additional hydrogen bonds with a third strand, particularly with functional groups arrayed in the major groove. For example, the guanosine residue of a G≡C nucleotide pair can pair with a cytidine residue (if protonated) on a third strand; the adenosine of an A=T pair can pair with a thymidine residue. The N-7, *O*6, and *N*6 of purines, the atoms that participate in the hydrogen bonding with a third DNA strand, are often referred to as **Hoogsteen positions**, and the non-Watson-Crick pairing is called **Hoogsteen pairing**, after Karst Hoogsteen, who in 1963 first recognized the potential for these unusual pairings. Hoogsteen pairing allows the formation of **triplex DNAs**.
The triplexes shown in (a, b) are most stable at low pH because the C≡G·C+ triplet requires a protonated cytosine. In the triplex, the p*K*a of this cytosine is >7.5, altered from its normal value of 4.2. The triplexes also form most readily within long sequences containing only pyrimidines or only purines in a given strand. Some triplex DNAs contain two pyrimidine strands and one purine strand; others contain two purine strands and one pyrimidine strand. This is shown in Figures a and b below:

![](media/image61.png)

**Quadruplex DNA**

Four DNA strands can also pair to form a tetraplex (quadruplex), but this occurs readily only for DNA sequences with a very high proportion of guanosine residues (Fig. c, d).

![](media/image63.png)

The guanosine tetraplex, or **G tetraplex**, is quite stable over a broad range of conditions. The orientation of strands in the tetraplex can vary as shown in Figure e.

**Polycistronic versus monocistronic mRNA**

In bacteria and archaea, a single mRNA molecule may code for one or several polypeptide chains. If it carries the code for only one polypeptide, the mRNA is **monocistronic**; if it codes for two or more different polypeptides, the mRNA is **polycistronic**. In eukaryotes, most mRNAs are monocistronic. (For the purposes of this discussion, "cistron" refers to a gene. The term itself has historical roots in the science of genetics, and its formal genetic definition is beyond the scope of this text.) The minimum length of an mRNA is set by the length of the polypeptide chain for which it codes. For example, a polypeptide chain of 100 amino acid residues requires an RNA coding sequence of at least 300 nucleotides, because each amino acid is coded by a nucleotide triplet. However, mRNAs transcribed from DNA are always somewhat longer than the length needed simply to code for a polypeptide sequence (or sequences).
The additional, noncoding RNA includes sequences that regulate protein synthesis.

**The general structure of bacterial mRNAs is shown below:**

![](media/image65.png)

**Bacterial mRNA.** Schematic diagrams show **(a)** monocistronic and **(b)** polycistronic mRNAs of bacteria. Red segments represent RNA coding for a gene product; gray segments represent noncoding RNA. In the polycistronic transcript, noncoding RNA separates the three genes.

**Many RNAs Have More Complex Three-Dimensional Structures**

Messenger RNA is only one of several classes of cellular RNA. Transfer RNAs are adapter molecules that act in protein synthesis; covalently linked to an amino acid at one end, each tRNA pairs with the mRNA in such a way that amino acids are joined to a growing polypeptide in the correct sequence. Ribosomal RNAs are components of ribosomes. There is also a wide variety of special-function RNAs, including some (called ribozymes) that have enzymatic activity. The diverse and often complex functions of these RNAs reflect a diversity of structure much richer than that observed in DNA molecules.

The product of transcription of DNA is always single-stranded RNA. The single strand tends to assume a right-handed helical conformation dominated by base-stacking interactions **(see figure below)**. The stacking interactions are stronger between two purines than between a purine and a pyrimidine or between two pyrimidines. The purine-purine interaction is so strong that a pyrimidine separating two purines is often displaced from the stacking pattern so that the purines can interact. Any self-complementary sequences in the molecule produce more complex structures. RNA can base-pair with complementary regions of either RNA or DNA. Base pairing matches the pattern for DNA: G pairs with C and A pairs with U (or with the occasional T residue in some RNAs).
One difference is that base pairing between G and U residues is allowed in RNA when complementary sequences in two single strands of RNA (or within a single strand of RNA that folds back on itself to align the residues) pair with each other. The paired strands in RNA or RNA-DNA duplexes are antiparallel, as in DNA. When two strands of RNA with perfectly complementary sequences are paired, the predominant double-stranded structure is an A-form right-handed double helix. However, strands of RNA that are perfectly paired over long regions of sequence are uncommon. The three-dimensional structures of many RNAs, like those of proteins, are complex and unique. Weak interactions, especially base-stacking interactions, help stabilize RNA structures, just as they do in DNA. Z-form helices have been made in the laboratory (under very high-salt or high-temperature conditions). The B form of RNA has not been observed. Breaks in the regular A-form helix caused by mismatched or unmatched bases in one or both strands are common and result in bulges or internal loops.

![](media/image67.png)

**Secondary structure of RNAs. (a)** Bulge, internal loop, and hairpin loop. **(b)** The paired regions generally have an A-form right-handed helix, as shown for a hairpin. \[Source: (b) Modified from PDB ID 1GID, J. H. Cate et al., *Science* 273:1678, 1996.\]

**Typical right-handed stacking pattern of single-stranded RNA.** The bases are shown in yellow, the phosphorus atoms in orange, and the riboses and phosphate oxygens in green.

Hairpin loops form between nearby self-complementary (palindromic) sequences. Extensive base-paired helical segments are formed in many RNAs, and the resulting hairpins are the most common type of secondary structure in RNA.
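The RNA pairing rules described above (G with C, A with U, G with U tolerated, strands antiparallel) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not a folding algorithm: the function names are hypothetical, the stem length is supplied by the caller, and the example sequences are invented (they use a UUCG loop only as a convenient example).

```python
# Illustrative sketch only; names and example sequences are hypothetical.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(a, b):
    """Return 'WC' for a Watson-Crick pair, 'GU' for a G-U pair, else None."""
    if (a, b) in WATSON_CRICK:
        return "WC"
    if (a, b) in WOBBLE:
        return "GU"
    return None


def hairpin_stem(seq, stem_len):
    """Pair the first stem_len bases with the last stem_len bases of one RNA
    strand, antiparallel, and classify each pair from the stem's open end
    toward the loop."""
    s = seq.upper()
    if 2 * stem_len >= len(s):
        raise ValueError("stem arms must leave at least one loop base")
    five_prime_arm = s[:stem_len]
    three_prime_arm = s[-stem_len:][::-1]  # reversed: strands run antiparallel
    return [classify_pair(a, b) for a, b in zip(five_prime_arm, three_prime_arm)]


if __name__ == "__main__":
    print(hairpin_stem("GGACUUCGGUCC", 4))  # fully Watson-Crick stem
    print(hairpin_stem("GGACUUCGGUUC", 4))  # stem containing one G-U pair
    print(hairpin_stem("GGACUUCGAUCC", 4))  # a mismatch shows up as None
```

A `None` entry marks a position where the helix would be interrupted, which is the situation that gives rise to the bulges and internal loops mentioned above.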
![](media/image69.png)

**Base-paired helical structures in an RNA.** Shown here is the possible secondary structure of the M1 RNA component of the enzyme RNase P of *E. coli*, with many hairpins. RNase P, which also contains a protein component (not shown), functions in the processing of transfer RNAs. The two square brackets indicate additional complementary sequences that may be paired in the three-dimensional structure. The blue dots indicate non-Watson-Crick G=U base pairs (boxed inset). Note that G=U base pairs are allowed only when presynthesized strands of RNA fold up or anneal with each other. There are no RNA polymerases (the enzymes that synthesize RNAs on a DNA template) that insert a U opposite a template G, or vice versa, during RNA synthesis. \[Source: B. D. James et al., *Cell* 52:19, 1988.\]

**UUCG short base sequences**

Specific short base sequences (such as UUCG) are often found at the ends of RNA hairpins and are known to form particularly tight and stable loops. Such sequences may act as starting points for the folding of an RNA molecule into its precise three-dimensional structure. Other contributions are made by hydrogen bonds that are not part of standard Watson-Crick base pairs. For example, the 2′-hydroxyl group of ribose can hydrogen-bond with other groups. Some of these properties are evident in the tertiary structure of the phenylalanine transfer RNA of yeast (the tRNA responsible for inserting Phe residues into polypeptides) and in two RNA enzymes, or ribozymes, whose functions, like those of protein enzymes, depend on their three-dimensional structures.

**The figure below shows the three-dimensional structure in RNA. (a)** Three-dimensional structure of phenylalanine tRNA of yeast. Some unusual base-pairing patterns found in this tRNA are shown. Note also the involvement of the oxygen of a ribose phosphodiester bond in one hydrogen-bonding arrangement, and a ribose 2′-hydroxyl group in another (both in red).
**(b)** A hammerhead ribozyme (so named because the secondary structure at the active site looks like the head of a hammer), derived from certain plant viruses. Ribozymes, or RNA enzymes, catalyze a variety of reactions, primarily in RNA metabolism and protein synthesis. **(c)** A segment of mRNA known as an intron, from the ciliated protozoan *Tetrahymena thermophila*. This intron (a ribozyme) catalyzes its own excision from between exons in an mRNA strand. \[Sources: (a) PDB ID 1TRA, E. Westhof and M. Sundaralingam, *Biochemistry* 25:4868, 1986. (b) Modified from PDB ID 1MME, W. G. Scott et al., *Cell* 81:991, 1995. (c) Modified from PDB ID 1GRZ, B. L. Golden et al., *Science* 282:259, 1998.\]

**Summary on structure of nucleic acids**

- Many lines of evidence show that DNA bears genetic information. Some of the earliest evidence came from the Avery-MacLeod-McCarty experiment, which showed that DNA isolated from one bacterial strain can enter and transform the cells of another strain, endowing it with some of the inheritable characteristics of the donor. The Hershey-Chase experiment showed that the DNA of a bacterial virus, but not its protein coat, carries the genetic message for replication of the virus in a host cell.
- Putting together the available data, Watson and Crick postulated that native DNA consists of two antiparallel chains in a right-handed double-helical arrangement. Complementary base pairs, A=T and G≡C, are formed by hydrogen bonding within the helix. The base pairs are stacked perpendicular to the long axis of the double helix, 3.4 Å apart, with 10.5 base pairs per turn.
- DNA can exist in several structural forms. Two variations of the Watson-Crick form, or B-DNA, are A- and Z-DNA. Some sequence-dependent structural variations cause bends in the DNA molecule. DNA strands with appropriate sequences can form hairpin or cruciform structures or triplex or tetraplex DNA.
- Messenger RNA transfers genetic information from DNA to ribosomes for protein synthesis. Transfer RNA and ribosomal RNA are also involved in protein synthesis. RNA can be structurally complex; single RNA strands can fold into hairpins, double-stranded regions, or complex loops.

**Nucleic Acid Chemistry**

The role of DNA as a repository of genetic information depends in part on its inherent stability. The chemical transformations that do occur are generally very slow in the absence of an enzyme catalyst. The long-term storage of information without alteration is so important to a cell, however, that even very slow reactions that alter DNA structure can be physiologically significant. Processes such as carcinogenesis and aging may be intimately linked to slowly accumulating, irreversible alterations of DNA. Other, nondestructive alterations also occur and are essential to function, such as the strand separation that must precede DNA replication or transcription. In addition to providing insights into physiological processes, our understanding of nucleic acid chemistry has given us a powerful array of technologies that have applications in molecular biology, medicine, and forensic science. We now examine the chemical properties of DNA and a few of these technologies.

Double-Helical DNA and RNA Can Be Denatured

Solutions of carefully isolated, native DNA are highly viscous at pH 7.0 and room temperature (25 °C). When such a solution is subjected to extremes of pH or to temperatures above 80 °C, its viscosity decreases sharply, indicating that the DNA has undergone a physical change. Just as heat and extremes of pH denature globular proteins, they also cause denaturation, or melting, of double-helical DNA.
Disruption of the hydrogen bonds between paired bases and of base-stacking interactions causes unwinding of the double helix to form two single strands, completely separate from each other along the entire length or part of the length (partial denaturation) of the molecule. No covalent bonds in the DNA are broken (see the figure of reversible denaturation and annealing (renaturation) of DNA below).

Renaturation of a partially denatured DNA molecule is a rapid one-step process, as long as a double-helical segment of a dozen or more residues still unites the two strands. When the temperature or pH is returned to the range in which most organisms live, the unwound segments of the two strands spontaneously rewind, or **anneal**, to yield the intact duplex (**see figure below**). However, if the two strands are completely separated, renaturation occurs in two steps. In the first, relatively slow step, the two strands "find" each other by random collisions and form a short segment of complementary double helix. The second step is much faster: the remaining unpaired bases successively come into register as base pairs, and the two strands "zipper" themselves together to form the double helix.

![](media/image71.png)

**Figure showing reversible denaturation and annealing (renaturation) of DNA.**

The close interaction between stacked bases in a nucleic acid has the effect of decreasing its absorption of UV light relative to that of a solution with the same concentration of free nucleotides, and the absorption is decreased further when two complementary nucleic acid strands are paired. This is called the hypochromic effect. Denaturation of a double-stranded nucleic acid produces the opposite result: an increase in absorption called the hyperchromic effect. The transition from double-stranded DNA to the denatured, single-stranded form can thus be detected by monitoring UV absorption at 260 nm. Viral or bacterial DNA molecules in solution denature when they are heated slowly.
Each species of DNA has a characteristic denaturation temperature, or melting point (*t*m; formally, the temperature at which half the DNA is present as separated single strands): the higher its content of G≡C base pairs, the higher the melting point of the DNA. This is primarily because, as we saw earlier, G≡C base pairs make greater contributions to base stacking than do A=T base pairs. Thus the melting point of a DNA molecule, determined under fixed conditions of pH and ionic strength, can yield an estimate of its base composition. If denaturation conditions are carefully controlled, regions that are rich in A=T base pairs will denature while most of the DNA remains double-stranded. Such denatured regions (called bubbles) can be visualized with electron microscopy. In the strand separation of DNA that occurs in vivo during processes such as DNA replication and transcription, the site where strand separation is initiated is often rich in A=T base pairs, as we shall see.

**Figure showing heat denaturation of two DNA specimens.** The temperature at the midpoint of the transition (*t*m) is the melting point; it depends on pH and ionic strength and on the size and base composition of the DNA.

![](media/image73.png)

Relationship between *t*m and the G + C content of a DNA. \[Source: (b) Adapted from J. Marmur and P. Doty, *J. Mol. Biol.* 5:109, 1962.\]

Duplexes of two RNA strands or one RNA strand and one DNA strand (RNA-DNA hybrids) can also be denatured. Notably, RNA duplexes are more stable to heat denaturation than DNA duplexes. At neutral pH, denaturation of a double-helical RNA often requires temperatures 20 °C or more higher than those required for denaturation of a DNA molecule with a comparable sequence, assuming that the strands in each molecule are perfectly complementary.
The stability of an RNA-DNA hybrid is generally intermediate between that of RNA and DNA duplexes. The physical basis for these differences in thermal stability is not known.

**WORKED EXAMPLE on DNA Base Pairs and DNA Stability**

In samples of DNA isolated from two unidentified species of bacteria, X and Y, adenine makes up 32% and 17%, respectively, of the total bases. What relative proportions of adenine, guanine, thymine, and cytosine would you expect to find in the two DNA samples? What assumptions have you made? One of these species was isolated from a hot spring (64 °C). Which species is most likely the thermophilic bacterium, and why?

**Solution:** For any double-helical DNA, A = T and G = C. The DNA from species X has 32% A and therefore must contain 32% T. This accounts for 64% of the bases and leaves 36% as G≡C pairs: 18% G and 18% C. The sample from species Y, with 17% A, must contain 17% T, accounting for 34% of the bases. The remaining 66% of the bases are thus equally distributed as 33% G and 33% C. This calculation is based on the assumption that both DNA molecules are double-stranded. The higher the G + C content of a DNA molecule, the higher the melting temperature. Species Y, having the DNA with the higher G + C content (66%), most likely is the thermophilic bacterium; its DNA has a higher melting temperature and thus is more stable at the temperature of the hot spring.

Nucleotides and Nucleic Acids Undergo Nonenzymatic Transformations

Purines and pyrimidines, along with the nucleotides of which they are a part, undergo spontaneous alterations in their covalent structure. The rate of these reactions is generally *very slow*, but they are physiologically significant because of the cell's very low tolerance for alterations in its genetic information.
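The arithmetic in the worked example on species X and Y above can be reproduced with a small sketch. It assumes, as the solution does, that the DNA is double-stranded so that A = T and G = C; the function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; function names are hypothetical.
def base_composition(percent_a):
    """Given %A in double-stranded DNA, return the expected % of each base,
    assuming A = T and G = C."""
    if not 0 <= percent_a <= 50:
        raise ValueError("%A must lie between 0 and 50 for double-stranded DNA")
    percent_gc_each = (100 - 2 * percent_a) / 2
    return {"A": percent_a, "T": percent_a, "G": percent_gc_each, "C": percent_gc_each}


def gc_content(composition):
    """Return the combined G + C percentage."""
    return composition["G"] + composition["C"]


if __name__ == "__main__":
    species = {"X": base_composition(32), "Y": base_composition(17)}
    for name, comp in species.items():
        print(name, comp, "G+C =", gc_content(comp))
    # The species with the higher G + C content is expected to have the
    # higher melting temperature, as argued in the worked example.
    print(max(species, key=lambda name: gc_content(species[name])))
```

Running the sketch reproduces the figures in the solution (18% G and C for X, 33% G and C for Y) and picks Y as the sample with the higher G + C content.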
Alterations in DNA structure that produce permanent changes in the genetic information encoded therein are called **mutations**, and much evidence suggests an intimate link between the accumulation of mutations in an individual organism and the process of aging and carcinogenesis. Several nucleotide bases undergo spontaneous loss of their exocyclic amino groups (deamination).

1. For example, under typical cellular conditions, deamination of cytosine (in DNA) to uracil occurs in about one of every 10^7^ cytidine residues in 24 hours. This rate of deamination corresponds to about 100 spontaneous events per day, on average, in a mammalian cell. Deamination of adenine and guanine occurs at about 1/100th this rate. The slow cytosine deamination reaction seems innocuous enough, but it is almost certainly the reason why DNA contains thymine rather than uracil. The product of cytosine deamination (uracil) is readily recognized as foreign in DNA and is removed by a repair system. If DNA normally contained uracil, recognition of uracils resulting from cytosine deamination would be more difficult, and unrepaired uracils would lead to permanent sequence changes as they were paired with adenines during replication. Cytosine deamination would gradually lead to a decrease in G≡C base pairs and an increase in A=U base pairs in the DNA of all cells. Over the millennia, cytosine deamination could eliminate G≡C base pairs and the genetic code that depends on them. Establishing thymine as one of the four bases in DNA may well have been one of the crucial turning points in evolution, making the long-term storage of genetic information possible.

**Some well-characterized nonenzymatic reactions of nucleotides: (a)** Deamination reactions. Only the base is shown.

![](media/image75.png)

**Some well-characterized nonenzymatic reactions of nucleotides: (b)** Depurination, in which a purine is lost by hydrolysis of the *N*-*β*-glycosyl bond.
Loss of pyrimidines through a similar reaction occurs, but much more slowly. The resulting lesion, in which the deoxyribose is present but the base is not, is called an abasic site or an AP site (apurinic site or, rarely, apyrimidinic site). The deoxyribose remaining after depurination is readily converted from the *β*-furanose to the aldehyde form, further destabilizing the DNA at this position.

Another important reaction in deoxyribonucleotides is the hydrolysis of the *N*-*β*-glycosyl bond between the base and the pentose. The base is lost, creating a DNA lesion called an AP (apurinic, apyrimidinic) site or abasic site. Purines are lost at a higher rate than pyrimidines. As many as one in 10^5^ purines (10,000 per mammalian cell) are lost from DNA every 24 hours under typical cellular conditions. Depurination of ribonucleotides and RNA is much slower and less physiologically significant. In the test tube, loss of purines can be accelerated by dilute acid. Incubation of DNA at pH 3 causes selective removal of the purine bases, resulting in a derivative called apurinic acid.

Other non-enzymatic methods

**Formation of pyrimidine dimers induced by UV light as a means of non-enzymatic DNA transformation. (a)** One type of reaction (on the left) results in the formation of a cyclobutyl ring involving C-5 and C-6 of adjacent pyrimidine residues. An alternative reaction (on the right) results in a 6-4 photoproduct, with a linkage between C-6 of one pyrimidine and C-4 of its neighbor. **(b)** Formation of a cyclobutane pyrimidine dimer introduces a bend or kink into the DNA. \[Source: (b) PDB ID 1TTD, K. McAteer et al., *J. Mol. Biol.* 282:1013, 1998.\]

2. Other reactions are promoted by radiation. UV light induces the condensation of two ethylene groups to form a cyclobutane ring. In the cell, the same reaction between adjacent pyrimidine bases in nucleic acids forms cyclobutane pyrimidine dimers.
This happens most frequently between adjacent thymidine residues on the same DNA strand. A second type of pyrimidine dimer, called a 6-4 photoproduct, is also formed during UV irradiation. Ionizing radiation (x rays and gamma rays) can cause ring opening and fragmentation of bases as well as breaks in the covalent backbone of nucleic acids.

Virtually all forms of life are exposed to energy-rich radiation capable of causing chemical changes in DNA. Near-UV radiation (with wavelengths of 200 to 400 nm), which makes up a significant portion of the solar spectrum, is known to cause pyrimidine dimer formation and other chemical changes in the DNA of bacteria and of human skin cells. We are subjected to a constant field of ionizing radiation in the form of cosmic rays, which can penetrate deep into the earth, as well as radiation emitted from radioactive elements, such as radium, plutonium, uranium, radon, ^14^C, and ^3^H. X rays used in medical and dental examinations and in radiation therapy of cancer and other diseases are another form of ionizing radiation. It is estimated that UV and ionizing radiations are responsible for about 10% of all DNA damage caused by environmental agents.

3. **Effect of deaminating and alkylating agents, other mutagens and nitrous acid**

DNA also may be damaged by reactive chemicals introduced into the environment as products of industrial activity. There are two prominent classes of such agents: (1) deaminating agents, particularly nitrous acid (HNO~2~) or compounds that can be metabolized to nitrous acid or nitrites, and (2) alkylating agents. Nitrous acid, formed from organic precursors such as nitrosamines and from nitrite and nitrate salts, is a potent accelerator of the deamination of bases.

*Base analogs* such as 5-bromouracil and 2-aminopurine can be incorporated into DNA and are even more likely than normal nucleic acid bases to form transient tautomers that lead to transition mutations.
5-Bromouracil, an analog of thymine, normally pairs with adenine. However, the proportion of 5-bromouracil in the enol tautomer is higher than that of thymine because the bromine atom is more electronegative than is a methyl group on the C-5 atom. Thus, the incorporation of 5-bromouracil is especially likely to cause altered base-pairing in a subsequent round of DNA replication.

**Base Pair with 5-Bromouracil.** This analog of thymine has a higher tendency to form an enol tautomer than does thymine itself. The pairing of the enol tautomer of 5-bromouracil with guanine will lead to a transition from T-A to C-G.

![](media/image77.png)

Other mutagens act by chemically modifying the bases of DNA. For example, nitrous acid (HNO~2~) reacts with bases that contain amino groups. Adenine is oxidatively deaminated to hypoxanthine, cytosine to uracil, and guanine to xanthine. Hypoxanthine pairs with cytosine rather than with thymine. Uracil pairs with adenine rather than with guanine. Xanthine, like guanine, pairs with cytosine. Consequently, nitrous acid causes A-T ↔ G-C transitions.

A different kind of mutation is produced by flat aromatic molecules such as the acridines.

![](media/image79.png)

**Acridines.** Acridine dyes induce frameshift mutations by intercalating into the DNA, leading to the incorporation of an additional base on the opposite strand.

These compounds intercalate in DNA; that is, they slip in between adjacent base pairs in the DNA double helix. Consequently, they lead to the insertion or deletion of one or more base pairs. The effect of such mutations is to alter the reading frame in translation unless an integral multiple of three base pairs is inserted or deleted. In fact, the analysis of such mutants contributed greatly to the revelation of the triplet nature of the genetic code.

Some compounds are converted into highly active mutagens through the action of enzymes that normally play a role in detoxification.
A striking example is aflatoxin B1, a compound produced by molds that grow on peanuts and other foods. A cytochrome P450 enzyme converts this compound into a highly reactive epoxide. This agent reacts with the N-7 atom of guanosine to form an adduct that frequently leads to a G-C-to-T-A transversion.

**Aflatoxin Reaction.** The compound, produced by molds that grow on peanuts, is activated by cytochrome P450 to form a highly reactive species that modifies bases such as guanine in DNA, leading to mutations.

Bisulfite has effects similar to those of nitrous acid. Both agents are used as preservatives in processed foods to prevent the growth of toxic bacteria. They do not seem to increase cancer risks significantly when used in this way, perhaps because they are used in only small amounts and make only a minor contribution to the overall levels of DNA damage. (The potential health risk from food spoilage if these preservatives were not used is much greater.)

Alkylating agents can alter certain bases of DNA. For example, the highly reactive chemical dimethylsulfate can methylate a guanine to yield *O*6-methylguanine, which cannot base pair with cytosine.

![](media/image81.png)

**Chemical agents that cause DNA damage. (a)** Precursors of nitrous acid, which promotes deamination reactions. **(b)** Alkylating agents. Most generate modified nucleotides nonenzymatically.

![](media/image83.png)

An alkylating agent such as dimethylsulfate can methylate a guanine to yield *O*6-methylguanine, which cannot base pair with cytosine.

4. **Mutations by reactive oxygen species**

The most important source of mutagenic alterations in DNA is oxidative damage. Reactive oxygen species such as hydrogen peroxide, hydroxyl radicals, and superoxide radicals arise during irradiation or (more commonly) as a byproduct of aerobic metabolism. These species damage DNA through any of a large, complex group of reactions, ranging from oxidation of deoxyribose and base moieties to strand breaks.
Of these species, the hydroxyl radicals are responsible for most oxidative DNA damage. Cells have an elaborate defense system to destroy reactive oxygen species, including enzymes such as catalase and superoxide dismutase that convert reactive oxygen species to harmless products. A fraction of these oxidants inevitably escape cellular defenses, however, and are able to damage DNA. Accurate estimates for the extent of this damage are not yet available, but every day the DNA of each human cell is subjected to thousands of damaging oxidative reactions.

This is merely a sampling of the best-understood reactions that damage DNA. Many carcinogenic compounds in food, water, or air exert their cancer-causing effects by modifying bases in DNA. Nevertheless, the integrity of DNA as a polymer is better maintained than that of either RNA or protein, because DNA is the only macromolecule that has the benefit of extensive biochemical repair systems. These repair processes greatly lessen the impact of damage to DNA.

DNA ENZYMATIC TRANSFORMATION

5. Methylation by enzymes

Some Bases of DNA Are Methylated

Certain nucleotide bases in DNA molecules are enzymatically methylated. Adenine and cytosine are methylated more often than guanine and thymine. Methylation is generally confined to certain sequences or regions of a DNA molecule. In some cases, the function of methylation is well understood; in others, the function remains unclear. All known DNA methylases use *S*-adenosylmethionine as a methyl group donor. *E. coli* has two prominent methylation systems. One serves as part of a defense mechanism that helps the cell to distinguish its DNA from foreign DNA by marking its own DNA with methyl groups and destroying DNA (that is, foreign DNA) without the methyl groups (this is known as a restriction-modification system).
The other system methylates adenosine residues within the sequence (5′)GATC(3′) to *N*6-methyladenosine.

![](media/image85.png)

Methyl groups are added by the Dam (*D*NA *a*denine *m*ethylation) methylase, a component of a system that repairs mismatched base pairs formed occasionally during DNA replication.

**Methylation and mismatch repair.** Methylation of DNA strands can serve to distinguish parent (template) strands from newly synthesized strands in *E. coli* DNA, a function that is critical to mismatch repair. The methylation occurs at the *N*6 of adenines in (5′)GATC sequences. This sequence is a palindrome present in opposite orientations on the two strands.

In eukaryotic cells, about 5% of cytidine residues in DNA are methylated to 5-methylcytidine.

![](media/image87.png)

Methylation is most common at CpG sequences, producing methyl-CpG symmetrically on both strands of the DNA. The extent of methylation of CpG sequences varies by molecular region in large eukaryotic DNA molecules. Methylation suppresses the migration of segments of DNA called transposons. These methylations of cytosine also have structural significance. The presence of 5-methylcytosine in an alternating CpG sequence markedly increases the tendency for that segment of DNA to assume the Z form.

**DNA TRANSFORMATION USING PLASMIDS**

**Plasmid:** An extrachromosomal, independently replicating, small circular DNA molecule; commonly employed in genetic engineering.

In the laboratory, small plasmids can be introduced into bacterial cells by a process called **transformation**. The cells (often *E. 
coli*, but other bacterial species are also used) and plasmid DNA are incubated together at 0 °C in a calcium chloride solution, then are subjected to heat shock by rapidly shifting the temperature to between 37 °C and 43 °C. For reasons not well understood, some of the cells treated in this way take up the plasmid DNA. Some species of bacteria, such as *Acinetobacter baylyi*, are naturally competent for DNA uptake and do not require the calcium chloride--heat shock treatment. In an alternative method, called **electroporation**, cells incubated with the plasmid DNA are subjected to a high-voltage pulse, which transiently renders the bacterial membrane permeable to large molecules. Regardless of the approach, relatively few cells take up the plasmid DNA, so a method is needed to identify those that do. The usual strategy is to utilize one of two types of genes in the plasmid, referred to as selectable and screenable markers. A **selectable marker** either permits the growth of a cell (positive selection) or kills the cell (negative selection) under a defined set of conditions. The plasmid pBR322 provides markers for both positive and negative selection **(See Figure below)**. A **screenable marker** is a gene encoding a protein that causes the cell to produce a colored or fluorescent molecule. Cells are not harmed when the gene is present, and the cells that carry the plasmid are easily identified by the colored or fluorescent colonies they produce. Transformation of typical bacterial cells with purified DNA (never a very efficient process) becomes less successful as plasmid size increases, and it is difficult to clone DNA segments longer than about 15,000 bp when plasmids are used as the vector. **Figure Showing Use of pBR322 to clone foreign DNA in *E. coli* and identify cells containing the DNA.** \[Source: Elizabeth A. Wood, University of Wisconsin-Madison, Department of Biochemistry.