Lecture 1 - Tan: Bioanalytical Techniques PDF
Nanyang Technological University
Assoc Prof Tan Meng How
Summary
This lecture covers the fundamentals of biomolecules, including DNA, RNA, and proteins within the context of bioanalytical techniques. It outlines the structure, function and interactions of these molecules. The course is likely at an undergraduate level.
Full Transcript
CH4306 Bioanalytical Techniques
Assoc Prof TAN Meng How
N1.2-B2-33
[email protected]

Course Administration
• Lectures: Tuesdays 2.30-5.20pm (LT15)
• Textbook: Andreas Manz, Petra Dittrich, Nicole Pamme, and Dimitri Iossifidis. Bioanalytical Chemistry, Imperial College Press (2015, 2nd Edition).
• Grading scheme: Quiz 1 (20%), Quiz 2 (20%), Final Exam (60%)
• Topics: All 8 chapters in the textbook, plus functional genomics and enzymology

Lecture 1 – Biomolecules

Lecture Outline
• DNA (Hereditary Information)
• RNA (Coding and Non-Coding)
• Proteins (Traditional Workhorses)
• The Human Genome (Who We Are)

What is DNA?
• Building blocks of DNA: deoxyribonucleotides
• Each building block is composed of: (1) phosphoric acid/phosphate, (2) sugar (deoxyribose), (3) base
• The four DNA bases: adenine, thymine, cytosine, guanine
- Adenine and guanine belong to the double-ringed class of molecules called purines (abbreviated as R).
- Cytosine and thymine are both pyrimidines (abbreviated as Y).

DNA base-pairing
• The most fundamental role of DNA in the cell is in the storage and retrieval of biological information.
• DNA in a cell is double-stranded:
- Adenine forms two hydrogen bonds with thymine (A = T)
- Cytosine forms three hydrogen bonds with guanine (C ≡ G)
• The two DNA strands are reverse complements of each other.
• An example (the KRAS oncogene):
5’-ATGACTGAATATAAACTTGTGGTAGTTGGAGCTGGTGGCGTAGGCAAG … -3’
3’-TACTGACTTATATTTGAACACCATCAACCTCGACCACCGCATCCGTTC … -5’

Chargaff’s Rules
Erwin Chargaff was a biochemist who analysed the purine and pyrimidine base content in DNA from a variety of organisms.
He observed that although the base composition varied from species to species, in all the organisms that he studied, the percentage of A equalled that of T and the percentage of G equalled that of C.

DNA strands have directionality
• The 5’ end has a free phosphate group.
• The 3’ end has a free hydroxyl group.
• Figure labels: phosphodiester bond (in yellow); a nucleotide subunit (in grey).
• DNA is negatively charged.

DNA double helix
• The double-helix model of DNA structure was deduced from X-ray diffraction images of DNA.
• James Watson, Francis Crick, and Maurice Wilkins shared the 1962 Nobel Prize in Physiology or Medicine for the discovery.

Rosalind Franklin controversy
• A British molecular biologist (born 1920, died 1958).
• After obtaining her doctorate in physical chemistry from the University of Cambridge, she spent three years in Paris learning X-ray diffraction techniques.
• In the 1950s, she took beautiful X-ray photographs of DNA at King’s College London, where she was leading her own research group.
• Maurice Wilkins mistook her for a technician and showed her photographs to James Watson and Francis Crick without her permission.

B-DNA helix
• There are three types of DNA helices: A-DNA, B-DNA (most common), and Z-DNA.
• B-DNA is a right-handed helix.
• The bases form the core of the double helix, while the sugar/phosphate backbones are on the outside.
• The helical axis passes through the central bases.
• The two grooves between the backbones are called the major and minor grooves, based on their sizes.
• The B-DNA double helix makes one complete turn about its axis every 10.5 base pairs in solution.
• Although DNA is a relatively rigid polymer, it has three significant degrees of freedom: bending, twisting, and compression.

Comparison of DNA helices
A-DNA:
- Right-handed
- Planes of bases are tilted 20° relative to the axis
- 0.23 nm rise between base pairs
- Broadest helix type
Z-DNA:
- Left-handed
- Base pairs are rotated
- Exists transiently in the cell, as the conformation is unstable
- Occasionally induced by biological activity (e.g. transcription) and then quickly disappears

Landmark paper
"It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material."
A-T and G-C base pairings are also called Watson-Crick base pairings.

DNA Replication
• DNA replication is the process by which a double-stranded DNA molecule is copied to produce two identical DNA molecules.
• Replication is an essential process because, whenever a cell divides, the two new daughter cells must contain the same genetic information, or DNA, as the parent cell.
• DNA replication occurs at an extraordinarily high fidelity. The error rate or mutation rate is approximately 1 nucleotide change per 10^9 nucleotides each time the DNA is replicated.
• The DNA replication machinery is highly conserved from bacteria to humans. The mutation rate is roughly the same for all organisms.
• DNA replication occurs at a very fast rate. DNA is duplicated at rates as high as 1000 nucleotides per second.

Base-pairing underlies DNA replication
• Since A can only pair with T, while C can only pair with G, each strand of DNA can serve as a template.
• DNA replication is semi-conservative.

DNA synthesis is catalyzed by DNA polymerase
• Substrates: single-stranded DNA, deoxyribonucleoside triphosphates (dNTPs).
• The polymerase catalyzes the stepwise addition of a deoxyribonucleotide to one end of the primer strand.
• The reaction is driven by a large favorable free-energy change, caused by the release of pyrophosphate, which is further hydrolyzed to inorganic phosphate.
• The structure of DNA polymerase resembles a right hand in which the palm, fingers, and thumb grasp the DNA.
• The DNA polymerase has a proof-reading capability.

Recall: The two DNA strands in a double helix are anti-parallel
5’-ATGGATTTATCTGCTCTTCG-3’
3’-TACCTAAATAGACGAGAAGC-5’
(This is part of the BRCA1 gene, which can cause breast cancer when mutated.)

Two possible models for DNA replication
1) Both strands grow continuously.
5’-ATGGATTTATCTGCTCTTCG-3’
3’-TACC...
5’-ATGG...
3’-TACCTAAATAGACGAGAAGC-5’
2) DNA polymerization can occur in only one direction.
5’-ATGGATTTATCTGCTCTTCG-3’
                ...AAGC-5’
5’-ATGG...
3’-TACCTAAATAGACGAGAAGC-5’

The incorrect model
Why is the simplest model incorrect? No 3’-to-5’ DNA polymerase has ever been found!

DNA replication occurs only in the 5’ to 3’ direction
• The replication fork has an asymmetric structure:
- The DNA daughter strand that is synthesized continuously is known as the leading strand.
- The daughter strand that is synthesized discontinuously is known as the lagging strand.
• The DNA synthesized on the lagging strand must be made initially as a series of short DNA molecules called Okazaki fragments.
• For the lagging strand, the direction of nucleotide polymerization is opposite to the overall direction of DNA chain growth.

More details on the lagging strand
• Enzymes involved:
- DNA primase synthesizes short RNA primers, which are approximately 200 nucleotides apart.
- DNA polymerase extends from an RNA primer until it reaches another primer.
- RNase H erases RNA primers, thereby leaving gaps.
- DNA polymerase fills in the gaps.
- DNA ligase seals two consecutive fragments to produce a longer continuous DNA molecule.
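The anti-parallel pairing recalled above with the BRCA1 fragment can be checked in a few lines of code. The following is a minimal Python sketch and is not part of the lecture; the helper names `complement` and `reverse_complement` are hypothetical and introduced here only for illustration.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

top = "ATGGATTTATCTGCTCTTCG"     # 5'->3', BRCA1 fragment from the slide
bottom = "TACCTAAATAGACGAGAAGC"  # 3'->5', as written on the slide

# The slide writes the lower strand 3'->5', so it equals the plain complement.
print(complement(top) == bottom)  # True
# Read 5'->3', the same lower strand is the reverse complement.
print(reverse_complement(top))
```

Because a polymerase only extends in the 5’ to 3’ direction, the reverse complement is the order in which the partner strand would actually be synthesized.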
DNA replication: Preserving and propagating the cellular message
• A new daughter strand is assembled on each parent strand.
• Uses complementary base pairing, and requires a series of enzymes.
• Highly regulated process.

DNA packing in eukaryotes
A human cell's DNA totals ~3 meters in length. All this DNA has to fit into a tiny nucleus of 5-10 µm in diameter. This is like trying to stuff a piece of string 2 km long into a tiny bead smaller than 1 cm! To do this seemingly impossible feat, cells devised an ingenious packaging system: they wrap DNA around proteins called histones. The resulting DNA-protein complex is called chromatin.

What is a gene?
• A gene is the molecular unit of heredity of a living organism. It refers to a stretch of DNA that codes for a polypeptide or for an RNA chain that has a function in the organism.
• Living beings depend on genes, as they specify all proteins and functional RNA chains.
• Genes hold the information to build and maintain an organism's cells and pass genetic traits to offspring.
• E. coli has ~4000 genes, Saccharomyces cerevisiae has ~6000 genes, while humans have ~20,000 genes.
• An operon contains a cluster of genes under the control of a single promoter. The genes are transcribed together into only one mRNA strand. Operons are commonly found in bacteria.
• Example of an operon: the lac operon.

Gene structures
• Prokaryote (UTR: untranslated region; RBS: ribosome binding site)
• Eukaryote (particularly in higher organisms)

Alternative splicing generates protein diversity

DNA and histones can be modified

Lecture Outline
• DNA (Hereditary Information)
• RNA (Coding and Non-Coding)
• Proteins (Traditional Workhorses)
• The Human Genome (Who We Are)

What is RNA?
• Like DNA, RNA is assembled as a chain of nucleotides.
• However, there are some important differences between DNA and RNA.
The four DNA bases: adenine, thymine, cytosine, guanine
The four RNA bases: adenine, uracil, cytosine, guanine

Different sugars are used in DNA and RNA
• Deoxyribonucleotide (DNA) versus ribonucleotide (RNA)
• The 2’ free hydroxyl group is highly reactive and makes RNA unstable.

No B-RNA helix is possible
• Steric clash

What does RNA look like in the cell?
• Unlike DNA, RNA exists as single-stranded molecules that can fold back on themselves to form complex secondary structures.
• Stem-loop structures are commonly observed in RNA molecules. (Figure: structure of an rRNA)
• RNA structure is dynamic and can change depending on biological context and what other molecules bind to the RNA.

Riboswitches
A riboswitch is a regulatory segment of a messenger RNA molecule that binds a small molecule, resulting in a change in production of the proteins encoded by the mRNA.

Ribozymes
• RNA molecules can form complex shapes and have reactive functional groups.
• Unlike DNA, some RNA molecules can function as catalysts, just like protein enzymes. These RNA catalysts are known as ribozymes.
• In 1989, Thomas Cech and Sidney Altman shared the Nobel Prize in Chemistry for their "discovery of catalytic properties of RNA".
• It is now possible to make ribozymes that will specifically cleave any RNA molecule (e.g. a ribozyme has been designed to cleave the RNA of HIV).

Different types of RNA in the cell

Non-coding RNAs
• Not all RNAs in the cell go on to produce proteins.
• A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into a protein.
• For decades, scientists thought that there were only two classes of ncRNAs: tRNAs and rRNAs.
• Now, we know that there are other important classes of ncRNAs, such as microRNAs that are involved in RNA silencing and long ncRNAs (lncRNAs).
• Recently, numerous unannotated RNA transcripts have been uncovered. Are they protein-coding or non-coding?
- Check coding potential
- Check conservation
- Perform further experiments

What are tRNAs?
• tRNA is involved in the translation of mRNA into proteins.
• One end of the tRNA contains an anticodon loop that pairs with three bases (a codon) in the mRNA to specify a certain amino acid.
• The other end of the tRNA has the amino acid attached to the 3' OH group via an ester linkage.
• There is at least one specific tRNA for each of the 20 amino acids.

From DNA to RNA
• Transcription
• 5’ end capping
• Splicing of pre-mRNA
• 3’ end polyadenylation
• Nuclear export of mature mRNA
Some of the steps may occur concurrently.

Transcription: Creating RNA from a DNA Template
• DNA bases are exposed.
• One of the two strands of the DNA double helix, the antisense strand, acts as a template.
• The nucleotide sequence of the RNA chain is determined by complementary base-pairing.
• The RNA chain is elongated one nucleotide at a time.
• RNA molecules produced by transcription are single strands.

Key Players of Transcription
• DNA template (with promoter): The promoter is the site where the transcription machinery binds for the initiation of transcription. The two DNA strands are separated in that region, and the RNA polymerase then begins the transcription process.
• RNA polymerase: The enzyme catalyzes the formation of the phosphodiester bonds that link the nucleotides together to form a linear chain.
• Ribonucleoside triphosphates (NTPs)

Many RNA transcripts can be synthesized simultaneously
• Once synthesized, the RNA strand does not remain hydrogen bonded to the DNA template. Instead, the RNA chain behind the RNA polymerase is displaced and the DNA double helix re-forms.
• The almost immediate release of the RNA strand from the DNA means that many RNA copies can be made from the same gene in a short time. The synthesis of additional RNA molecules starts before the first RNA is completed.
• Over a thousand transcripts can be synthesized in an hour from a single gene.
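The template-directed copying described under transcription above can be sketched in code. This is a minimal Python illustration under stated assumptions, not the lecture's own material: the function name `transcribe` is hypothetical, the input is assumed to be the antisense (template) strand written 3’ to 5’, and the BRCA1 fragment from the replication slides is reused as the example.

```python
# Pairing rules during transcription: the RNA carries U where DNA would carry T.
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3to5):
    """Build the RNA 5'->3' by pairing each template base, read 3'->5'."""
    return "".join(RNA_PAIR[base] for base in template_3to5)

template = "TACCTAAATAGACGAGAAGC"  # antisense strand, 3'->5'
coding = "ATGGATTTATCTGCTCTTCG"    # sense strand, 5'->3'

mrna = transcribe(template)
print(mrna)
# The transcript matches the sense strand with every T replaced by U.
print(mrna == coding.replace("T", "U"))  # True
```

The check at the end shows why the non-template strand is called the sense (coding) strand: its sequence is the one that appears in the RNA.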
Transcription of two genes, as observed under the electron microscope

Protein isoforms
Alternative splicing is a regulated process whereby multiple proteins are produced from a single gene. In this process, particular exons of a gene may be included within or excluded from the final, processed mRNA produced from that gene.

An example of alternative splicing
• α-tropomyosin is an integral component of the actin cytoskeleton.
• Pink arrowheads indicate sites where cleavage and poly-A addition can occur.

RNA can be modified in >150 ways

Lecture Outline
• DNA (Hereditary Information)
• RNA (Coding and Non-Coding)
• Proteins (Traditional Workhorses)
• The Human Genome (Who We Are)

What are proteins?
• Amino acids: basic building blocks
• Synthesis
– Transcription of DNA to mRNA by NTPs and RNA polymerase
– Translation of mRNA to protein by tRNA and ribosomes
• The shape of a protein is important for its function.
• Levels of protein structure: primary, secondary, tertiary, and quaternary (several polypeptide chains combined).

Protein structures
• Polypeptides can fold into two common secondary structures:
- α-helix: The polypeptide backbone follows a helical path. There are 3.6 amino acid residues per turn of the helix.
- β-sheet: Strands of protein lie adjacent to one another, interacting laterally via H bonds between backbone carbonyl oxygen and amino H atoms. The strands may be parallel or antiparallel.
• A higher order structure is created by a combination of loops, α-helices, and β-sheets.
(Figure labels: α-helices, β-sheet)

Proteins are modular
• A protein can contain several domains of known or unknown functions.
• The presence of a well-studied domain in an uncharacterized protein can suggest the function of the protein.
• Novel non-natural proteins can be produced by swapping domains or joining different domains together.
(Example domains shown in the figure: PDE domain, RapGap, SAM, NO synthase)

Functions of proteins
Proteins have diverse biological functions, which can be classified into five main categories:
• Structural proteins: glycoproteins, collagen, keratin
• Catalytic proteins: enzymes
• Transport proteins: hemoglobin, serum albumin
• Regulatory proteins: hormones (insulin, growth hormones)
• Protective proteins: antibodies, thrombin

Kwashiorkor
• Kwashiorkor is a severe form of malnutrition, caused by a deficiency in dietary protein.
• The extreme lack of protein causes an osmotic imbalance, particularly in the gastro-intestinal system, which leads to swelling of the gut diagnosed as edema (retention of water).
• Classic symptoms include swelling of the ankles and feet as well as a distended abdomen.
• Generally, the disease can be treated by adding protein to the diet; however, it can have a long-term impact on a child's physical and mental development.

Amino Acids
• General structure: an α-carbon connected to four groups: an amino group, a carboxylic group, a hydrogen atom, and a substituent group (R group). R = side chain (varies between different amino acids).
• 20 amino acids are found in living organisms.
• The names of amino acids are often abbreviated to either a three-letter or a one-letter short form (e.g. Glycine, Gly, G).

Classifications of amino acids
• Amino acids can be assorted into six main groups, on the basis of their structure and the general chemical characteristics of their R groups.
• Knowing the class of an amino acid is useful for predicting the impact of a particular mutation (e.g.
a mutation from Asp to Glu is likely to have less impact than a mutation from Gly to His).

Formation of polypeptides
• Two amino acids can join to form a dipeptide.
• Polypeptides are chains of ≥3 amino acids.

Translation
Translation is the process whereby an mRNA molecule is decoded by a ribosome to produce a specific amino acid chain or polypeptide, which later folds into an active protein.
• An mRNA sequence is decoded in sets of three nucleotides.

Translating an mRNA
• Amino acids are added to the C-terminal end.
• The amino acid to be added is determined by complementary base-pairing between the anticodon of the tRNA and the next codon on the mRNA chain.
• The following cycle is repeated:
- A spent tRNA with the polypeptide sits in the P-site of the ribosome.
- A new tRNA binds to the adjacent vacant A-site on the ribosome.
- A peptide bond is formed between the existing polypeptide chain and the amino acid brought in by the new tRNA.
- The polypeptide is transferred to the new tRNA and the old tRNA leaves the P-site.
- The ribosome moves, so that the originally new tRNA is now at the P-site instead, leaving the A-site vacant.

The Genetic Code
1. The code is a triplet code
2. No gaps or overlaps between codons
3. The code is degenerate (≥1 codon per amino acid)
4. The 3rd base is often flexible (WOBBLE)
5. Some codons encode "stop"
6. The code is universal

The Genetic Code
(Codon table figure. Bases: uracil, cytosine, adenine, guanine. Stop codons: UAA, UGA, UAG.)

Reading frames in protein translation
• In principle, three reading frames are possible from any mRNA sequence.
• In reality, only one polypeptide chain will generally be produced.
• A ribosomal frameshift allows alternative translation of an mRNA sequence by changing the open reading frame. This mechanism is commonly found in viruses, as it allows the virus to encode multiple types of proteins from the same mRNA.
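The triplet decoding and the three possible reading frames described above can be made concrete with a short sketch. The Python below is illustrative only and rests on assumptions: the function name `translate` is hypothetical, the standard genetic code is written as a compact 64-letter string in U, C, A, G order, and the example mRNA is the transcript of the BRCA1 fragment shown in the replication slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna, frame=0):
    """Decode codons from the given frame offset (0, 1, or 2) until a stop codon."""
    protein = []
    for i in range(frame, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = "AUGGAUUUAUCUGCUCUUCG"
for frame in range(3):
    print(frame, translate(mrna, frame))
```

Each frame yields a different peptide from the same nucleotides, which is why a frameshift changes everything downstream of it. This sketch starts decoding at the given offset rather than searching for an AUG start codon.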
Binding of a ribosome to mRNA
• A ribosome binding site (RBS) is an mRNA sequence to which ribosomes can bind and initiate translation.
• In prokaryotes, it is a region 6-8 nucleotides upstream of the AUG codon called the Shine-Dalgarno sequence. The consensus sequence is AGGAGG; in E. coli, for example, the sequence is AGGAGGU.
• In eukaryotes, there is no Shine-Dalgarno sequence. Instead, the ribosomes recognize the 5’ cap of mature mRNAs. The 5’ cap consists of a guanine nucleotide connected to the mRNA via an unusual 5’-to-5’ triphosphate linkage. This guanosine is methylated on the 7 position directly after capping in vivo by a methyltransferase. It is referred to as a 7-methylguanylate cap, abbreviated m7G.
• For viruses, ribosomes recognize a nucleotide sequence known as the internal ribosome entry site (IRES). IRESes allow translation in a cap-independent manner.

Post-translational modifications
• Many proteins undergo post-translational modifications, which can alter their functions or regulate their activities.
• Different moieties can be attached to the proteins, such as:
- acetate
- phosphate
- lipids
- carbohydrates
- other peptides
• Glycosylation refers to the enzymatic process that attaches glycans to proteins (and also lipids). Protein glycosylation is an important research area in the biopharmaceutical industry. Production of human therapeutics with incorrect glycosylation patterns can trigger undesirable side reactions or immune responses in patients.
• Phosphorylation, a very common modification, is performed by a class of enzymes known as kinases, while the removal of the phosphate group is performed by phosphatases. Humans have ~500 distinct kinases.

Protein glycosylation
• Glycosylation refers to the attachment of sugar moieties to proteins.
• Protein glycosylation has multiple functions in the cell:
- The glycosylation pattern can serve to target the protein to a particular compartment.
- The sugars can also act as ligands for receptors on the cell surface to mediate cell attachment or stimulate signal transduction pathways.
- Because they can be very large and bulky, oligosaccharides can affect protein-protein interactions by either facilitating or preventing proteins from binding to cognate interaction domains.
• Glycosylation is thought to be the most complex post-translational modification due to the large number of enzymes involved.
• Glycosylated proteins (glycoproteins) are found in almost all living organisms that have been studied.

Diversity of glycosylation
Glycosylation increases the diversity of the proteome to a level unmatched by any other post-translational modification. The cell is able to facilitate this diversity, because almost every aspect of glycosylation can be modified, including:
• Glycosidic linkage – the site of glycan (oligosaccharide) binding
• Glycan composition – the types of sugars that are linked to a particular protein
• Glycan structure – branched or unbranched chains
• Glycan length – short- or long-chain oligosaccharides

Many cell signaling pathways rely on protein phosphorylation
• Regulation of proteins by phosphorylation is one of the most common modes of regulation of protein function.
• The targeted protein can be in a phosphorylated form or a dephosphorylated form. One of these two is an active form, while the other one is an inactive form.
• Protein kinases and phosphatases work independently, and the balance between them regulates the function of the targeted protein.
• In bacteria, the phosphorylated residues are histidines, aspartates, and (to a smaller extent) tyrosines.
• In eukaryotes, the phosphorylated residues are serines, threonines, or tyrosines.
Two-component signal transduction systems for environmental sensing
• A typical system consists of (at least) two proteins, a sensor histidine kinase and a response regulator.
• The histidine kinase detects a specific environmental stimulus through its (often periplasmic) sensor domain. This leads to a conformational change, resulting in ATP-dependent autophosphorylation of an invariant His residue.
• His~P serves as the phosphate donor for the receiver domain of the cognate response regulator, resulting in phosphorylation of a conserved Asp residue.
• Frequently, the response regulator dimerizes after being phosphorylated.
• This leads to an activation of the effector domain of the response regulator, mediating the cellular response, usually through differential expression of specific target genes. In other words, response regulators are often transcription factors.
• Reset of the system to the pre-stimulus state is achieved by dephosphorylation, either by the intrinsic phosphatase activity exhibited by the sensor kinase or by other phosphatases.

An example of a two-component system
• Activity of the FixL histidine kinase is inhibited by oxygen.
• In the absence of oxygen, FixL autophosphorylates at the membrane and initiates the signaling cascade via transphosphorylation of the FixJ response regulator.
• FixJ activates the transcription of fixK, and FixK then activates the expression of high-affinity terminal oxidases.
• A negative feedback loop exists in the network design: FixK turns on FixT, which acts as an inhibitor of FixL by mimicking a response regulator.

Lecture Outline
• DNA (Hereditary Information)
• RNA (Coding and Non-Coding)
• Proteins (Traditional Workhorses)
• The Human Genome (Who We Are)

What is a genome?
• A genome is a cell’s or an organism's complete set of hereditary information.
• The genome is typically encoded in DNA, except for some viruses where the genome is encoded in RNA instead.
• The genome includes all the genes and the non-coding sequences of the DNA.
• Each genome contains all of the information needed to build and maintain that organism.
• Characteristics of an organism’s genome include number of chromosomes, genome size, gene order, codon usage bias, GC-content, number of repetitive elements, etc.

Parts of a Genome
• Structural genes: DNA segments that code for some specific RNAs or proteins (e.g. mRNAs and tRNAs).
• Functional sequences: Regulatory elements, including promoters, operators, and insulators.
• Non-functional sequences: Introns and repetitive sequences. These used to be thought of as mostly “junk”, but evidence suggests that they might be functional in reality.

Timeline of key genome projects
1977 First DNA genome – Bacteriophage Φ-X174
(1) First mitochondrion genome
1982 First shotgun sequenced genome – Bacteriophage lambda
(2) First prokaryotic genome – Haemophilus influenzae
(3) First unicellular eukaryotic genome – Yeast
1998 First multicellular eukaryotic genome – Caenorhabditis elegans
(4) First insect genome – Drosophila melanogaster
2000 First plant genome – Arabidopsis thaliana
2001 Draft human genome published
(5) Draft mouse genome published
See the Genomes OnLine Database (https://gold.jgi.doe.gov/) for completed and ongoing genome sequencing projects.

Living organisms have a wide range of genome sizes

The Human Genome
The Human Genome Project is an international scientific research project with the goal of determining the sequence of chemical base pairs that make up human DNA, and of identifying all the genes (and other functional elements) in the human genome. The project was declared complete in 2003.
An analogy to the human genome stored on DNA is that of instructions stored in a book:
• The book (genome) would contain 23 chapters (chromosomes);
• Each chapter contains 48 to 250 million letters (A, C, G, T) without spaces;
• Hence, the book contains over 3.2 billion letters in total;
• The book fits into a cell nucleus the size of a pinpoint;
• At least one copy of the book (all 23 chapters) is contained in most cells of our body. The only exception in humans is found in mature red blood cells, which become enucleated during development and therefore lack a genome.

Public vs. Private Approaches
Public:
• Project formally launched in 1990.
• World's largest collaborative biological project, which was performed in twenty universities and research centers in the United States, the United Kingdom, Japan, France, Germany, and China.
• Cost $3 billion.
• First draft announced in 2000 by Bill Clinton (U.S.) and Tony Blair (U.K.) and published in 2001.
Private:
• Launched in 1998.
• Funded by Craig Venter and his firm Celera Genomics.
• $300 million.
• Relied upon data made available by the publicly funded project.
Celera’s view of the International Consortium (IC): Unfair competition, as the IC was delivering the same goods but with state funding.
International Consortium’s view of Celera: Unfair competition, as Celera was delivering the same goods but could use IC data, while the IC could not use Celera data.

What are transposons?
• A transposon is a small piece of DNA that inserts itself into another place in the genome. It was first observed in maize.
• Barbara McClintock was awarded a Nobel Prize in Physiology or Medicine in 1983 for her discovery of transposons.

Two subclasses of transposons
1) Retrotransposons are genetic elements that can amplify themselves in a genome and are ubiquitous components of the DNA of many eukaryotic organisms.
These DNA sequences use a "copy-and-paste" mechanism, whereby they are first transcribed into RNA, then converted back into identical DNA sequences using reverse transcription, and these sequences are then inserted into the genome at target sites. Retrotransposons are particularly abundant in plants. For example, in maize, 49–78% of the genome is made up of retrotransposons.
2) DNA transposons move in the genome of an organism via a single- or double-stranded DNA intermediate (no RNA involvement). The DNA transposons in the human genome today are no longer active and are thus called “fossils”.

Summary of the human genome
• The human genome is full of transposon-based repetitive elements (e.g. retrotransposons)!
• Much of the human genome is unexplored!

Types of repeats in the human genome

Satellite DNA
• Satellite DNA consists of very large arrays of tandemly repeating, non-coding DNA.
• Most satellite DNA is localized to the centromeric or telomeric region of the chromosome.
• Satellite DNA is also a key constituent of heterochromatin.

Centromere
• A region of DNA that helps to ensure that the replicated chromosomes are moved correctly into the two daughter cells.
• It is easily recognized as the most constricted part of the mitotic chromosomes (indicated by white arrows).
• Spindle fibers (“ropes”) are attached to the centromere via the kinetochore (a multiprotein complex) during cell division.

Telomere
• Telomeres (labeled in red) are the caps at the end of each strand of DNA that protect our chromosomes, like the plastic tips at the end of shoelaces.
• Telomeres get shorter each time a cell copies itself, but the important DNA stays intact (recall the Okazaki fragments).
• Eventually, telomeres get too short to do their job, causing our cells to age and stop functioning properly. Therefore, telomeres act as the aging clock in every cell.
Pseudogenes
Pseudogenes are genomic DNA sequences that are similar to normal genes but are considered to be (generally) non-functional; they are regarded as defunct relatives of functional genes. There are about 12,000 pseudogenes in the human genome!

Some facts about the human genome
• Presently estimated gene number: 20,000
• Average gene size: 27 kb
• The largest gene: Dystrophin (2.4 Mb; 0.6% coding; 16 hours to transcribe)
• The shortest gene: tRNA-Tyr (100% coding)
• Largest exon: ApoB exon 26 is 7.6 kb; smallest exon: <10 bp
• Average exon number: 9
• Largest exon number: 363 (Titin); smallest exon number: 1
• Largest intron: WWOX intron 8 is 800 kb; smallest intron: tens of bp

The UCSC Genome Browser (http://genome.ucsc.edu/)

Some terminology
• Genotype: The genetic makeup of a cell, an organism, or an individual; the sum of all the genetic instructions encoded in the DNA.
• Phenotype: The characteristics expressed by a cell; the outward appearance of an organism; observable traits that can be seen or measured, such as hair or eye color.
• Mutation: A permanent change in the nucleotide sequence of the genome of an organism, virus, or extrachromosomal DNA or other genetic elements.
• Allele: A particular form of a gene. Genes can acquire mutations in their sequence, leading to different variants, known as alleles, in the population. These alleles encode slightly different versions of a protein, which may cause different phenotype traits.
• Dominance (in genetics): Expression of one allele over another allele.

An example of genotype vs phenotype
(Recessive allele = b; dominant allele = B)

Causes of mutations
• Most mutations are due to errors in DNA replication.
• Some mutations are due to environmental factors or external agents:
- Smoking
- Excessive exposure to the sun (UV)

Types of mutations
• Point mutation: Results from the change of a single base.
• Silent mutation: The nucleotide change occurs in the wobble base (the third base of the codon), such that the amino acid remains unchanged (e.g. UCU and UCA both code for serine).
• Missense mutation: A point mutation in which a single nucleotide change results in a codon that codes for a different amino acid.
• Nonsense mutation: A point mutation in a sequence of DNA that results in a premature stop codon (a truncated and often nonfunctional protein is obtained).
• Insertions/deletions (indels): Nucleotides are inserted or deleted from a DNA sequence and may cause a frameshift if the number of bases added or removed is not a multiple of three.
• Back mutations or reversions: A second mutation that converts the mutated DNA (from a first mutation) back to its original wild type sequence.
• Suppressor mutations: A second mutation that occurs at a different site from the first mutation and restores the original wild type phenotype (note that the genotype is not restored).

An example of a missense mutation versus a nonsense mutation
• Missense mutation: CGT → CTT (complementary strand: GCA → GAA), which changes Arg to Leu. It is a missense mutation because the new nucleotide gives rise to another amino acid.
• Nonsense mutation: CGA → TGA (complementary strand: GCT → ACT). It is a nonsense mutation because the new nucleotide produces a stop codon (the stop codons are UAA, UAG, and UGA).
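The distinction between silent, missense, and nonsense point mutations can be expressed as a small lookup against the genetic code. The Python below is a sketch under stated assumptions rather than the lecture's method: the function name `classify_point_mutation` is hypothetical, codons are given as RNA triplets, and the examples are illustrative codon changes (UCU to UCA is the serine pair named in the text).

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_point_mutation(ref_codon, alt_codon):
    """Classify a single-base change within one sense codon."""
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"    # same amino acid, e.g. a wobble-base change
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # a different amino acid

print(classify_point_mutation("UCU", "UCA"))  # silent (Ser -> Ser)
print(classify_point_mutation("CGU", "CUU"))  # missense (Arg -> Leu)
print(classify_point_mutation("CGA", "UGA"))  # nonsense (Arg -> stop)
```

The sketch assumes the reference codon is a sense codon; changes that remove an existing stop codon, and indels, would need separate handling.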
An example of a suppressor mutation
Mutation in A: A and B cannot interact anymore. Either a reversion mutation brings A back to its original nucleotide state, or a suppressor mutation occurs in B so that B can now interact with the mutated A.

Conditional mutation
A conditional mutation is a mutation that has a wild-type (or less severe) phenotype under certain "permissive" environmental conditions and a mutant phenotype under certain "restrictive" conditions. In other words, it only has an effect on the cell under certain conditions.
• A temperature-sensitive mutation can cause cell death at a high temperature (restrictive condition), but might have no deleterious consequences at a lower temperature (permissive condition).
• A cold-sensitive mutation alters an essential protein so that it is inactivated at a low temperature instead; the mutant cells will not be able to grow at low temperatures that normally support growth of wild type cells.
(Figure labels: 28 °C, 35 °C)

Mutations can occur outside coding regions
• Mutations outside coding sequences can impact gene expression
• Mutations can affect the following regulatory elements:
- Promoter or enhancer sequences
- Termination sequences
- Splice donor and acceptor sites
- Ribosome binding sites
- Degradation signals
• Enhancers are cis-acting elements that specify where and when particular genes are expressed. By definition, they can act at a distance through chromosome looping, and their function is independent of orientation.
• Mutations in promoters or enhancers can affect the binding of transcription factors and hence gene expression

Large-scale mutations
Examples:
• Down syndrome: duplication of the entire chromosome 21. (Note: most other whole-chromosome duplications are not viable; the fetus dies before birth.)
• A reciprocal translocation between chromosome 9 and chromosome 22 gives rise to a fusion gene, BCR-ABL, which is oncogenic and causes 95% of chronic myelogenous leukemia (CML) cases. BCR-ABL is a hyperactive kinase that regulates cell proliferation.

Mutations can also be classified by their impact on protein function
Loss-of-function
• Amorph: complete loss-of-function; null protein
• Hypomorph: reduction of the protein's ability to work
Gain-of-function
• Hypermorph: increase in the protein's function
• Antimorph: a protein that interferes with the wild type protein's function; dominant negative
• Neomorph: acquisition of a new function
Note: connective tissue is made up of fibres that provide support for the body's tissues and organs.

Single nucleotide polymorphisms
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation among people. Each SNP represents a difference in a single nucleotide. With the accumulation of whole genome sequences, we are now obtaining a comprehensive understanding of SNPs in different organisms, including humans.
dbSNP: the NCBI (National Center for Biotechnology Information) database of genetic variation

An example of genotype-to-phenotype mapping
Alcohol dehydrogenase: a SNP at a single position creates a partially functional alcohol dehydrogenase (a hypomorph).

An introduction to GWAS
A genome-wide association study (GWAS) aims to find the associations between genetic variations and observable traits (e.g. major human diseases). For both healthy and diseased individuals, whole-genome sequencing or SNP arrays are performed.
Case (with disease) vs Control (no disease)
In GWAS, the DNA of two groups of participants is compared. The first group may be people with a disease (cases) and the second group may be healthy people without the disease (controls).
This approach is known as phenotype-first, in which the participants are classified first by their clinical manifestation(s), as opposed to genotype-first. Each person gives a sample of DNA, from which millions of genetic variants are read using SNP arrays. If one form of a variant (one allele) is more frequent in people with the disease, the variant is said to be associated with the disease. The associated SNPs are then considered to mark a region of the human genome that may influence the risk of disease.

GWAS: an overview (figure: cases vs controls)
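The case-versus-control comparison can be made concrete for a single SNP with a small Python sketch. It is only an illustration under stated assumptions: the function name `allele_association` and the allele counts are made up, and the test used here (a Pearson chi-square test on a 2x2 table of allele counts, without continuity correction) is one common choice rather than the method prescribed in the lecture.

```python
import math


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Test whether one allele is more frequent in cases than in controls.

    Arguments are allele counts (not numbers of people) for a single SNP.
    Uses a 2x2 Pearson chi-square test with 1 degree of freedom.
    All four row and column totals must be non-zero.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "case_alt_freq": a / (a + b),
        "control_alt_freq": c / (c + d),
        "odds_ratio": (a * d) / (b * c),
        "chi2": chi2,
        "p_value": p_value,
    }


# Hypothetical counts: the alternative allele is seen 60 times among
# 100 case alleles and 40 times among 100 control alleles.
result = allele_association(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
print(result)
```

Because a GWAS repeats such a test for millions of SNPs, a much stricter significance threshold than 0.05 is needed in practice; 5 × 10⁻⁸ is the conventional genome-wide cutoff.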