Genome 23: Functional Genomics, Proteomics, and Bioinformatics PDF

CHAPTER OUTLINE
23.1 Functional Genomics
23.2 Proteomics
23.3 Bioinformatics I: Overview of Computer Analyses and Gene Prediction
23.4 Bioinformatics II: Databases
23.5 Bioinformatics III: Homology

[Chapter-opening photo] A DNA microarray for measuring gene expression at the genome level. Each spot on the array corresponds to a specific gene. The color of each spot, which is produced via computer imaging techniques, indicates the amount of RNA transcribed from that gene. A DNA microarray allows researchers to simultaneously analyze the expression of many genes. Alfred Pasieka/Science Source

23 GENOMICS II: FUNCTIONAL GENOMICS, PROTEOMICS, AND BIOINFORMATICS

Chapter 22 discussed how genomics involves the mapping of an entire genome and the determination of a species’ complete DNA sequence. The amount of information contained within a species’ genome is enormous. The goal of functional genomics is to understand the roles of genetic sequences (DNA and RNA sequences) in a given species. In most cases, functional genomics is aimed at understanding gene function. At the genomic level, researchers can study genes as large groups. For example, the information gained from a genome-sequencing project can help researchers study entire metabolic pathways. Such research provides a description of the ways in which gene products interact to carry out cellular processes.

Because most genes code proteins, a goal of many molecular biologists is to understand the functional roles of all of the proteins a species produces. The entire collection of proteins that a given cell or organism makes is called its proteome, and the study of the functions and interactions of these proteins is termed proteomics. An objective of researchers in the field of proteomics is to understand the interplay among many proteins as they function to create and maintain cells and, ultimately, to determine the traits of a given species.

From a research perspective, functional genomics and proteomics can be broadly categorized in two ways: experimental and computational. The experimental approach involves the study of groups of genes or proteins using molecular techniques in the laboratory. In the first two sections in this chapter, we will focus on these techniques. In the last section, we will consider bioinformatics. As a very general definition, bioinformatics is the use of computers, mathematical tools, and statistical techniques to record, store, and analyze biological information. We often think of bioinformatics in the context of examining genetic research data, such as DNA sequences. In addition, bioinformatics can be applied to information from other sources, such as clinical data. This rapidly developing branch of biology is highly interdisciplinary, incorporating principles from mathematics, statistics, information science, chemistry, and physics. We will see how the field of bioinformatics has provided great insights in our understanding of functional genomics and proteomics.

23.1 FUNCTIONAL GENOMICS

Learning Outcomes:
1. Describe the composition of a DNA microarray, and explain how it is used.
2. Outline the method of RNA sequencing (RNA-Seq).
3. Define gene knockout, and explain why gene knockouts are useful.

Though the rapid sequencing of genomes, particularly the human genome, has generated great excitement among geneticists, many would argue that an understanding of genome function is fundamentally more interesting. In the past, our ability to study genes involved many of the techniques described in Chapter 20, such as gene cloning, Northern blotting, and gene editing. These approaches continue to provide a solid foundation for our understanding of gene function. More recently, genome-sequencing projects have enabled researchers to consider gene function at a more complex level. It is now possible to analyze groups of many genes simultaneously to determine how they work as integrated units to produce the characteristics of cells and the traits of multicellular organisms.

In this section, we will examine two methods, DNA microarrays and RNA sequencing, that enable researchers to monitor the expression of thousands of genes simultaneously. We will also consider why researchers are producing gene knockout collections in which each gene of a given species is separately inactivated in order to understand gene function.

A DNA Microarray Can Quantify Gene Transcription at the Whole Genome Level

In the 1990s, researchers developed a technology, called a DNA microarray (also called a gene chip), that makes it possible to quantify the expression of thousands of genes simultaneously. A DNA microarray is a small silica, glass, or plastic slide that is dotted with many different DNA sequences, each corresponding to a short sequence within a known gene. For example, one spot in a microarray may correspond to a sequence within the β-globin gene, whereas another could correspond to a gene that codes actin, which is a cytoskeletal protein. A single slide may contain tens of thousands of different spots in an area the size of a postage stamp. The relative location of each gene represented in the array is known.

How are microarrays made?
∙∙ Some are produced by spotting different samples of DNA onto a slide, much like the way an inkjet printer works. For many species, researchers know the entire genome sequence. With this information, they can make primers that flank any given gene and use PCR to synthesize the DNA from a specific gene. The DNA segments from many different gene sequences are then individually spotted onto the slide. Such DNA segments are typically 500–5000 nucleotides in length, and a few thousand to tens of thousands are spotted to make a single array.
∙∙ Alternatively, other microarrays contain shorter DNA segments (oligonucleotides) that are directly synthesized on the surface of the slide. In this case, the DNA sequence at a given spot is produced by selectively controlling the growth of an oligonucleotide using narrow beams of light. Such oligonucleotides are typically 25–30 nucleotides in length. Hundreds of thousands of different spots can be found on a single array.

Overall, the technology of making DNA microarrays is pretty amazing. A DNA microarray is used as a hybridization tool, as shown in Figure 23.1.

1. To begin this experiment, mRNAs were isolated from a sample of cells and then used to make fluorescently labeled cDNA. In this simplified example, the cells made three different mRNAs, from genes A, D, and F.
2. The mRNAs were mixed with fluorescently labeled nucleotides and reverse transcriptase to make fluorescently labeled cDNA.
3. The labeled cDNAs were then applied onto a DNA microarray. The cDNAs will be complementary to some of the DNA spots in the microarray. The cDNAs bind to the DNA in these spots (that is, they hybridize), thereby becoming bound to the microarray.
4. The array is then washed with a buffer to remove any unbound cDNAs and placed in a microscopy device called a laser scanner, which produces higher-resolution images than a conventional optical microscope. The device scans each pixel (the smallest element in a visual image), and after correction for local background, the final fluorescence intensity for each spot is obtained by averaging across the pixels in each spot. This results in a group of fluorescent spots at defined locations in the microarray.

High intensity of fluorescence in a particular spot means that a large amount of the cDNAs in the sample hybridized to the DNA at that location. Because the DNA sequence of each spot is already known, a fluorescent spot identifies cDNAs that are complementary to each DNA sequence. Furthermore, because the cDNAs were generated from mRNAs, this technique identifies RNAs that have been made in a particular cell type under a given set of conditions.

The technology of DNA microarrays has found many important uses (Table 23.1). Its most common use is for studying gene expression patterns. Such studies help us understand how genes are regulated in a cell-specific manner and how environmental conditions can induce or repress the transcription of genes. In some cases, microarrays can even help identify which genes code the proteins that participate in a complicated metabolic pathway. Microarrays can also be used as identification tools. For example, gene expression patterns can aid in the categorization of various tumors. Such identification is used to determine the best course of clinical treatment for a patient.

Instead of using labeled cDNAs, researchers can also hybridize labeled genomic DNA to a microarray. This technique can be used to identify mutant alleles in a population and to detect deletions and duplications. In addition, it is proving useful in correctly identifying closely related bacterial species and subspecies.

[Figure 23.1 labels: A mixture of 3 different types of mRNA (A, D, F). Add reverse transcriptase, poly-dT primers that anneal to the polyA tail of all mRNAs, and fluorescent nucleotides. Note: Only 1 complementary DNA strand is made. Fluorescently labeled cDNA that is complementary to the mRNA. A portion of a DNA microarray (spots A to F); each spot contains single-stranded DNA from a specific gene. Hybridize cDNAs to spots on the microarray. View with a laser scanner.]

FIGURE 23.1 Using a DNA microarray to study gene expression. A mixture of mRNAs isolated from a sample of cells is used to create cDNAs that are fluorescently labeled.
The cDNAs are applied to the microarray. In this simplified example, three cDNAs specifically hybridize to spots on the microarray. In an actual experiment, the sample produces hundreds or thousands of different cDNAs and the array contains tens of thousands of different spots. After hybridization, some spots become fluorescent and can be visualized using a laser scanner.
CONCEPT CHECK: Explain how this experiment provides information regarding the expression of genes.

TABLE 23.1 Applications of DNA Microarrays
Application: Description
Cell-specific gene expression: A comparison of microarray data using cDNAs derived from RNAs of different cell types can identify genes that are expressed in a cell-specific manner.
Gene regulation: Environmental conditions play an important role in gene regulation. A comparison of microarray data may reveal genes that are induced under one set of conditions and repressed under another.
Elucidation of metabolic pathways: Genes that code proteins that participate in a common metabolic pathway are often expressed in a parallel manner. This application overlaps with the study of gene regulation via microarrays.
Tumor profiling: Different types of cancer cells exhibit striking differences in their profiles of gene expression. Such a profile can be revealed by a DNA microarray analysis. This approach can be used as a method to classify tumors that are sometimes morphologically indistinguishable, which may provide information that can improve a patient’s clinical treatment.
Genetic variation: A mutant allele may not hybridize to a spot on a microarray as well as a wild-type allele. Therefore, microarrays have been used as a tool for detecting genetic variation. For example, they are used to identify disease-causing alleles in humans and mutations that contribute to quantitative traits in plants and other species.
Microbial strain identification: Microarrays can distinguish between closely related bacterial species and subspecies.
DNA-protein binding: Chromatin immunoprecipitation, which is described in Chapter 15 (refer back to Figure 15.14), can be used with DNA microarrays to determine where in the genome a particular protein binds to the DNA.

GENETIC TIPS
THE QUESTION: Samples of liver cells were collected from a healthy donor and from an individual with liver cancer. mRNAs were isolated from both samples of cells and subjected to DNA microarray analysis. In the results from the two samples, 77 spots on the microarray for the cells from the cancer patient were much brighter compared to those for the cells from the healthy donor. How would you interpret these results? Explain their meaning with regard to the growth of the cancer cells. (Note: Assume that each spot corresponds to a different gene.)
TOPIC: What topic in genetics does this question address? The topic is DNA microarray analysis. More specifically, the question is about applying the technique to compare healthy cells with cancer cells.
INFORMATION: What information do you know based on the question and your understanding of the topic? From the question, you know the results from a DNA microarray analysis that compares healthy cells and cancer cells. From your understanding of the topic, you may remember that the brightness of a spot on a microarray indicates how much of the cDNA, which was reverse transcribed from the cells’ mRNA, hybridized to the known DNA sequence at that location.
PROBLEM-SOLVING STRATEGY: Analyze data. Compare and contrast. One strategy to solve this problem is to compare the results from the cancer cells and the healthy cells and relate them to the transcription levels in the cells.
ANSWER: The cancer cells are overexpressing 77 genes that the normal cells are not overexpressing. The overexpression of some of these genes is likely to be contributing to the cancerous growth.

RNA Sequencing (RNA-Seq) Is a Newer Method for Identifying Expressed Genes

The transcriptome is the set of all RNA molecules, including mRNAs and non-coding RNAs, that are transcribed in one cell or in a population of cells. Researchers may focus on the identification of each type of RNA molecule and also on the relative concentrations of the different types. The invention of next-generation sequencing technologies, described in Chapter 22, has changed the way in which transcriptomes are studied. A particularly exciting advance is RNA sequencing (RNA-Seq), which was developed by Michael Snyder and colleagues in 2008. This method involves the sequencing of complementary DNAs (derived from RNAs) using next-generation DNA-sequencing methods.

RNA-Seq has several important applications. It is used to compare transcriptomes in the following ways:
∙∙ In different cell types
∙∙ In healthy versus diseased cells
∙∙ At different stages of development
∙∙ In response to different environmental agents, such as exposure to a hormone or to toxic chemicals

Figure 23.2 outlines a general strategy for RNA-Seq, but some steps may vary depending on the types of RNA molecules that are being studied and the method of next-generation sequencing that is used. The procedure always begins with the isolation of RNA molecules from a sample of one or more types of cells. Researchers or clinicians may want to analyze the entire population of RNAs, or they may want to analyze a subpopulation. For example, if researchers wanted to focus on eukaryotic mRNAs, they could “pull out” mRNAs by using heavy beads that are attached to polyT oligonucleotides. PolyT oligonucleotides bind to the polyA tails of eukaryotic mRNAs.
Because the polyT oligonucleotides are attached to heavy beads, the mRNAs can be separated from the rest of the RNAs by centrifugation.

After the desired population of RNAs has been obtained, the next step is to produce cDNAs. First, the RNAs are fragmented into small pieces. One way to make cDNAs is to attach short segments of DNA, called linkers, to the 5′ and 3′ ends of the RNA fragments. After the attachment of linkers, primers are added that are complementary to the linkers, and cDNAs are made via reverse transcriptase PCR, which is described in Chapter 20. The population of cDNAs is then subjected to next-generation DNA sequencing (see Chapter 22). This DNA sequencing produces a diverse collection of cDNA sequences.

The next phase of RNA-Seq is to compare the collection of cDNA sequences with the already known genome sequence of the organism from which the RNA was isolated. In other words, the cDNA sequences are aligned with the genomic DNA sequence (as shown at the bottom of Figure 23.2). This phase is accomplished with computer technology. When a cDNA sequence aligns with a gene sequence within the genome, this result means that the gene was expressed, because each cDNA sequence is derived from an RNA molecule. Also, as discussed in Chapter 12, eukaryotic pre-mRNAs often undergo splicing and alternative splicing (refer back to Figures 12.20 and 12.21). The alignment of cDNAs with the genomic DNA allows researchers to determine the pattern of RNA splicing that is found in a particular cell type under a given set of conditions.

[Figure 23.2 labels: (1) Isolate RNA from a sample of cells. In some cases, a researcher may want to focus on a subpopulation of RNAs, such as mRNAs or short non-coding RNAs. The illustration shows three different types of RNAs in different colors. In an actual experiment, there would be hundreds or thousands of different RNAs. Note: The green RNA is highly expressed, and the red RNA is expressed at a low level. (2) Break the RNAs into small fragments. (3) Attach short oligonucleotide linkers to the ends of the RNAs. (4) Synthesize cDNAs via reverse transcriptase PCR, using the RNAs as templates. The PCR primers are complementary to the linkers. (5) Sequence the cDNAs using a next-generation sequencing technology. (6) Using computer technology, align the cDNA sequences along the genomic sequence. Bottom panel: region of genomic DNA containing 3 genes, and their corresponding cDNA sequences, designated A, B, and C.]

FIGURE 23.2 The technique of RNA sequencing (RNA-Seq). This simplified example involves a population of three different RNA molecules, each shown in a different color. Protein-coding genes in complex eukaryotes typically contain introns, which are spliced out of the pre-mRNAs. The alignment at the bottom of the figure corresponds to the sequences of the mature mRNAs. The gaps between the cDNA sequences in the alignment indicate the locations of introns. (Note: In actual RNA-Seq experiments, the RNA fragments are much longer than the ones shown here. Though the optimal length varies depending on the method of DNA sequencing used, a common length for the RNA fragments is 100–300 nucleotides.)

Compared to the use of microarrays, RNA-Seq has several advantages:
∙∙ RNA-Seq is more accurate at quantifying the amount of each RNA transcript; the number of times that a particular cDNA sequence aligns with a gene is a measurement of the gene’s expression level.
∙∙ It is superior at detecting RNA transcripts that are in low abundance.
∙∙ It identifies the precise boundaries between exons and introns and allows researchers to discover new splice variants.
∙∙ It identifies the 5′ and 3′ ends of RNA transcripts and aids in the identification of transcriptional start sites.

Gene Knockout Collections Allow Researchers to Study Gene Function at the Genomic Level

One broad goal of functional genomics is to determine the functions of all of the genes in a species’ genome. Because each species has thousands of different genes, this is a very complicated task. One approach to achieving this goal is to produce collections of organisms of the same species in which each strain has one of its genes knocked out. For example, in E. coli, which has 4377 different genes, a complete knockout collection would be composed of 4377 different strains, with a different gene knocked out in each strain. A gene knockout is the alteration of a gene in a way that inactivates its function.

Why are knockout collections useful? Consider, for example, the phenotype produced by a particular gene knockout, which causes deafness in mice. Such a result suggests that the function of the normal gene is critical for hearing. Geneticists may also produce knockouts involving two or more genes to understand how the protein products of genes participate in a particular cellular pathway or contribute to a complex trait. In addition, gene knockouts in mice are used in the study of inherited human diseases. For example, as discussed in Chapter 21, gene knockouts have been used to study sickle cell disease.

Knockout collections are made in different ways. One way is via transposable elements. When a transposable element is inserted into a gene, it often inactivates the gene’s function. Another way to produce knockouts is via CRISPR-Cas technology, which was described in Chapter 20. In 2006, the National Institutes of Health (NIH) launched the Knockout Mouse Project. The goal of this program is to build a comprehensive and publicly available resource comprising a collection of mouse embryonic stem cells (ES cells) containing a loss-of-function mutation in every gene in the mouse genome.

The NIH Knockout Mouse Project collaborates with other large-scale efforts to produce mouse knockouts, including one in Canada, called the North American Conditional Mouse Mutagenesis Project (NorCOMM), and one in Europe, called the European Conditional Mouse Mutagenesis Program (EUCOMM). The collective goal of these programs is to create at least one loss-of-function mutation in each of the approximately 20,000 protein-coding genes in the mouse genome. In addition, knockout collections are currently available for other model organisms, including E. coli, S. cerevisiae, and C. elegans.

23.1 COMPREHENSION QUESTIONS
1. A DNA microarray is a slide that is dotted with
a. mRNAs from a sample of cells.
b. fluorescently labeled cDNAs.
c. known sequences of DNA.
d. known cellular proteins.
2. For the method of RNA sequencing (RNA-Seq), which of the following is the correct order of steps?
a. Isolate RNAs, synthesize cDNAs, fragment RNAs, sequence cDNAs, align cDNA sequences
b. Synthesize cDNAs, sequence cDNAs, isolate RNAs, fragment RNAs, align cDNA sequences
c. Isolate RNAs, fragment RNAs, synthesize cDNAs, sequence cDNAs, align cDNA sequences
d. Synthesize cDNAs, isolate RNAs, fragment RNAs, sequence cDNAs, align cDNA sequences
3. A gene knockout results in a gene
a. whose function has been inactivated.
b. that has been transferred to a different species.
c. that has been moved to a new location in the genome.
d. that has been eliminated from a species during evolution.

23.2 PROTEOMICS

Learning Outcomes:
1. List reasons why the proteome is larger than the genome.
2. Outline the techniques of two-dimensional gel electrophoresis and tandem mass spectrometry, and explain why they are used.
3. Describe two different types of protein microarrays, and discuss their uses.

Thus far, we have considered ways to characterize the genome of a given species and study its function. Because most genes code proteins, a logical next step is to examine the functional roles of the proteins that a cell or a species can make. As noted earlier in this chapter, this field is called proteomics, and the entire collection of a cell’s or organism’s proteins is its proteome.

Genomics represents only the first step in our comprehensive understanding of protein structure and function. Researchers often use genomic information to initiate proteomic studies, but such information must be followed up with research that involves the direct analysis of proteins. For example, as discussed in Section 23.1, the use of DNA microarrays and RNA-Seq can provide insights regarding the level of transcription of particular genes. However, an mRNA level may not provide an accurate measure of the abundance of a protein coded by a given gene. Protein levels are greatly affected, not only by the levels of mRNAs, but also by the rate of mRNA translation and by the turnover rate for a given protein. Therefore, data from DNA microarrays and RNA-Seq must be corroborated using other methods, such as Western blotting (discussed in Chapter 20), which directly determine the abundance of a protein in a given cell type.

As we move forward in the twenty-first century, the study of proteomes represents a key challenge facing molecular biologists. Much like genomics, proteomics will require the collective contributions of many research scientists, as well as improvements in technologies that are aimed at unraveling the complexities of the proteome. In this section, we will discuss the phenomena that increase protein diversity beyond genetic diversity. In addition, we will see how the techniques of two-dimensional gel electrophoresis and mass spectrometry can be used to isolate and identify cellular proteins and how protein microarrays are used to study protein expression and function.

The Proteome Is Much Larger Than the Genome

From the sequencing and analysis of an entire genome, researchers can identify all or nearly all of the genes for a given species. The entire proteome of a species, however, is larger than the genome, and its actual size is somewhat more difficult to determine.

What phenomena account for the larger size of the proteome? First, changes in pre-mRNAs may ultimately affect the resulting amino acid sequence of a protein.
∙∙ The most important alteration to pre-mRNAs that occurs commonly in eukaryotic species is alternative splicing (Figure 23.3a). A single pre-mRNA coded by a particular gene is frequently spliced into more than one mRNA. The splicing is often cell-specific, or it may be related to environmental conditions. As discussed in Chapter 12, alternative splicing is widespread, particularly among complex multicellular organisms. It can lead to the production of several or perhaps dozens of different polypeptide sequences from the same pre-mRNA, which greatly increases the number of potential proteins in the proteome.
∙∙ Similarly, RNA editing, the process of making a change in the nucleotide sequence of RNA after it has been transcribed (see Chapter 12), can lead to changes in the coding sequence of an mRNA. However, RNA editing is much less common than alternative splicing.

Another process that greatly diversifies the composition of a proteome is the phenomenon of posttranslational covalent modification (Figure 23.3b).
∙∙ Certain types of modifications can occur during the assembly and construction of a functional protein. These alterations include proteolytic processing, disulfide bond formation, and the covalent attachment of molecules, such as sugars or lipids. These are typically irreversible changes that are necessary to produce a functional protein.
∙∙ Other types of alterations, such as phosphorylation, acetylation, and methylation, are often reversible modifications that transiently affect the function of a protein.

A protein may be subject to several different types of posttranslational covalent modification, which can greatly increase the number of different forms of the protein found in a cell at any given time.

[Figure 23.3 labels: (a) Alternative splicing: a pre-mRNA with exons 1 to 6 is alternatively spliced and translated, giving products that contain exons 1, 2, 4, 5, 6, or exons 1, 3, 4, 5, 6, or exons 1, 2, 4, 6. (b) Posttranslational covalent modification. Permanent modifications: proteolytic processing; disulfide bond formation; attachment of molecules, such as a heme group, sugar, or phospholipid. Reversible modifications: phosphorylation (phosphate group), acetylation (acetyl group), methylation (methyl group).]

FIGURE 23.3 Cellular mechanisms that increase protein diversity. (a) Due to alternative splicing, the patterns of exons that remain in mature mRNAs can be different, creating multiple types of transcripts from the same gene. (b) After a protein is made, it can be modified in a variety of ways, some of which are permanent and some reversible. These changes are called posttranslational covalent modifications.
CONCEPT CHECK: Explain how these mechanisms affect protein diversity.

Protein Purification Methods Are Often Used in Proteomics

As we have just discussed, a species’ proteome is usually much larger than its genome. However, any given cell within a complex multicellular organism produces only a subset of the proteins found in the proteome of that species. For example, the human genome has approximately 20,000 different protein-coding genes, yet a muscle cell expresses only a subset of those genes at significant levels, perhaps 12,000 or so. The proteins a cell makes depend primarily on the type of cell, the stage of development, and the environmental conditions. An objective of researchers in the field of proteomics is the identification and functional characterization of all the proteins a particular type of cell can make. Because cells produce thousands of different proteins, this is a daunting task. Nevertheless, as in genomics, the past decade has seen important advances in researchers’ ability to isolate and identify specific proteins.

The first step in protein identification is to purify the protein of interest. Most methods of protein purification rely on the technique of chromatography, which separates proteins based on their chemical and/or physical properties. A sample containing a mixture of many different proteins is dissolved in a liquid solvent and exposed to some type of matrix, such as a gel, a column containing beads, or a strip of paper. Appendix A describes the general principles of chromatography (see Section A-3).

One type of chromatography that can be used to separate and purify proteins is two-dimensional (2D) gel electrophoresis. This technique can separate hundreds or even thousands of different proteins within a cell extract. The steps in this procedure are shown in Figure 23.4. As its name suggests, the technique involves two sequential gel electrophoresis procedures.

1. A sample of cells is lysed, and the proteins are loaded onto the top of a tube gel that separates them according to their net charge at a given pH.
2. A protein migrates to the point in the gel where its net charge is zero. This step is termed isoelectric focusing.
3. After the tube gel has been run, it is laid horizontally on top of a polyacrylamide slab gel, which is a flat, platelike gel that contains sodium dodecyl sulfate (SDS). The SDS coats the proteins with negative charges and denatures them.
4. As proteins move through the SDS slab gel, they are separated according to their molecular mass. Smaller proteins move toward the bottom of the gel more quickly than larger ones.
5. After the SDS slab gel has run, the proteins within the gel are stained with a dye. As seen in Figure 23.4, the end result is a collection of spots, each of which corresponds to a unique cellular protein, with the spots for proteins having greater molecular masses remaining higher in the gel.

The resolving power of 2D gel electrophoresis is extraordinary. Proteins that differ by a single charged amino acid can be resolved as two distinct spots using this method.

[Figure 23.4 labels: (a) The technique of two-dimensional gel electrophoresis: Lyse a sample of cells and load the resulting mixture of proteins onto an isoelectric focusing gel (pH 4.0 to pH 10.0). Proteins migrate until they reach the pH where their net charge is 0. At this point, a single band could contain 2 or more different proteins. Lay the tube gel onto an SDS gel and separate proteins according to their molecular mass (200 kDa at the top to 10 kDa at the bottom). (b) An SDS slab gel that has been stained to visualize proteins following 2D gel electrophoresis (horizontal axis pH 4.0 to pH 10.0; vertical axis 200 kDa to 10 kDa).]

FIGURE 23.4 Using two-dimensional gel electrophoresis to separate a mixture of cellular proteins. (a) The technique involves two electrophoresis procedures. First, a mixture of proteins is separated on an isoelectric focusing gel that has the shape of a tube. Proteins migrate to the pH in the gel where their net charge is zero. This tube gel is placed into a long well on top of a sodium dodecyl sulfate (SDS) polyacrylamide gel. This SDS slab gel separates the proteins according to their molecular mass. In this diagram, only a few spots are shown, but an actual experiment involves a mixture of hundreds or thousands of different proteins. (b) A photograph of an SDS slab gel that has been stained to visualize proteins following 2D gel electrophoresis. Each spot represents a unique cellular protein. SPL/Science Source

Various approaches can be followed to identify spots on a 2D gel that may be of interest to researchers.
∙∙ One possibility is that the gel for a given cell type may show a few very large spots that are not found when proteins from other cell types are analyzed. The relative abundance of those spots may indicate that a particular protein is important to that cell’s structure or function.
∙∙ Secondly, certain spots on a gel may be seen only under a given set of conditions. For example, a researcher may be interested in the effects of a hormone on the function of a particular cell type. Two-dimensional gel electrophoresis could be conducted on a sample in which the cells had not been exposed to the hormone and on another sample in which they had. Comparison of the results may reveal particular spots that are present only when the cells are exposed to the given hormone.
∙∙ Abnormal cells, such as cancer cells, often express proteins that are not found in normal cells. A researcher may compare normal and cancer cells via two-dimensional gel electrophoresis to identify proteins expressed only in the cancer cells.

Mass Spectrometry Is Used to Identify Proteins

Two-dimensional gel electrophoresis may be used as the first step in the separation of cellular proteins. The next goal is to correlate a given spot on a 2D gel with a particular protein. To accomplish this goal, a spot on a 2D gel can be cut out of the gel to obtain a tiny amount of the protein within the spot. In essence, the two-dimensional gel electrophoresis purifies a small amount of the cellular protein of interest. The next step is to identify that protein. This can be accomplished via mass spectrometry, a technique that accurately measures the mass of a molecule, such as a peptide that is a fragment of a protein.

[Figure 23.5 labels: Purified protein. Digest protein into small fragments using a protease. Determine the mass of these fragments with a first spectrometry step (plot of abundance versus mass/charge, 0 to 4000, with a fragment at 1652 Da). Analyze this fragment with a second spectrometry step. The peptide is fragmented from one end (plot of abundance versus mass/charge, 900 to 1800, with peaks labeled 1008, 1114, 1201, 1315, 1428, 1565, and 1652, read as –Asn Ser Asn Leu His Ser–). Short peptide sequences, such as this one, are used to search a database and identify the protein.]

FIGURE 23.5 The use of tandem mass spectrometry to de...

Figure 23.5 shows how mass spectrometry can be used to determine the amino acid sequence of a protein. The procedure shown here has two mass spectrometry steps and therefore is called tandem mass spectrometry. It begins with a purified protein that is digested into small peptides. These peptides are subjected to the first mass spectrometry step.
Figure 23.5 does not termine the amino acid sequence of a peptide and identify a protein. show the steps in mass spectrometry, but they are listed here: CONCEPT CHECK: What is the purpose of tandem mass spectrometry? 1. The peptides are mixed with an organic acid, and the mixture is dried on a metal slide. 3. The charged peptides are then accelerated via an electric 2. The sample is then subjected to a laser beam. This causes field and fly toward a detector. The time it takes for them to the peptides to be ejected from the slide in the form of an reach the detector is directly related to their mass/charge ratio, ionized gas in which each peptide contains one or more which provides an extremely accurate way of determining the positive charges. mass of a peptide. bro50795_ch23_609-630.indd 617 17/06/23 11:49 AM 618 C H A P T E R 2 3 :: GENOMICS II: FUNCTIONAL GENOMICS, PROTEOMICS, AND BIOINFORMATICS In Figure 23.5, the first mass spectrometry step determines the TA B L E 23.2 masses of six different peptides; we will focus on the second pep- Some Applications of Protein Microarrays tide, which has a mass of 1652 daltons (Da). This 1652-Da peptide is then broken down into many smaller fragments. The mixture of Application Description smaller fragments is analyzed by a second mass spectrometry step. Protein expression An antibody microarray can measure protein The second step reveals the amino acid sequence of the expression because each antibody in a given spot peptide, because the masses of all 20 amino acids are known. recognizes a specific amino acid sequence. Such a microarray can be used to study the expression of ∙∙ As shown at the bottom of Figure 23.5, the starting peptide proteins in a cell-specific manner. It can also be had a mass of 1652 Da. used to determine how environmental conditions ∙∙ When one amino acid at the end was removed, the resulting affect the levels of particular proteins. 
smaller peptide had a mass that was 87 Da less (i.e., 1565 Da); Protein function The substrate specificity and enzymatic activities of this indicated that a serine is at one end of the peptide, because groups of proteins can be analyzed by exposing a the mass of serine within a polypeptide chain is 87 Da. functional protein microarray to a variety of substrates. ∙∙ When two amino acids were removed at one end, the mass Protein-protein The ability of proteins to interact with each other was 224 Da less; this corresponds to the removal of one interactions can be determined by exposing a functional protein microarray to fluorescently labeled proteins. serine (87 Da) and one histidine (137 Da). ∙∙ When three amino acids were removed, the mass was de- Pharmacology The ability of drugs to bind to cellular proteins can creased by 337 Da; this corresponds to the removal of one ser- be determined by exposing a functional protein microarray to different kinds of labeled drugs. This ine (87 Da), one histidine (137 Da), and one leucine (113 Da). type of experiment can help to identify the proteins Therefore, from these measurements, we conclude that the amino within a cell to which a given drug may bind. acid sequence at one end of the 1652-Da peptide is serine- histidine-leucine. technology of protein microarrays. In addition, the synthesis and pu- How does this information lead to the identification of a spe- rification of proteins tend to be more time-consuming than the cific protein? As discussed in Chapter 22, the genome sequences of production of DNA, which can be amplified by PCR or directly syn- many species have already been determined. This information has thesized on the microarray itself. In spite of these technical difficul- allowed researchers to predict the amino acid sequences of most ties, the last few years have seen progress in the production and uses proteins that such species make and enter those sequences into a of protein microarrays (Table 23.2). 
database. With computer software described in Section 23.5, the The two common types of protein microarrays are antibody amino acid sequences obtained by tandem mass spectrometry can microarrays and functional protein microarrays. The purpose of an be used as query sequences to search a large database that contains antibody microarray is to quantify the amounts of particular pro- a collection of protein sequences. The computer program may lo- teins that are made by cells. Antibodies are proteins that are made cate a match between the experimental sequences and a specific by the immune systems of mammals and recognize antigens. One protein within a particular species. In this way, tandem mass spec- type of antigen that an antibody can recognize is a short peptide trometry makes it possible to identify a purified protein. sequence found within another protein. Therefore, an antibody can Tandem mass spectrometry can also identify covalent modi- specifically recognize a cellular protein. Researchers and commer- fications that have been made to proteins. For example, if an amino cial laboratories can produce different antibodies; each antibody acid within a peptide was phosphorylated, the mass of the peptide recognizes a specific peptide sequence. The antibodies can be spot- is increased by the mass of a phosphate group. This increase in ted onto a microarray. Cellular proteins can then be isolated, fluo- mass can be detected via tandem mass spectrometry. rescently labeled, and exposed to the antibody microarray. When a given protein is recognized by an antibody on the microarray, it is captured by the antibody and remains bound to that spot. The level Protein Microarrays Are Used to Study Protein of fluorescence at a given spot indicates the amount of the protein Expression and Function that is recognized by the particular antibody. Earlier in this chapter, we considered DNA microarrays, which The other type of array is a functional protein microarray. 
have been widely used to study gene expression at the RNA level. To make this type of array, researchers must purify cellular pro- The technology for making DNA microarrays is also being ap- teins and then spot them onto a microarray. The microarray can plied to make protein microarrays. In this type of technology, then be analyzed with regard to specific kinds of protein function. proteins, rather than DNA molecules, are spotted onto a small In 2000, for example, Heng Zhu, Michael Snyder, and colleagues silica, glass, or plastic slide. purified 119 proteins from yeast that were known to function as The production of protein microarrays is more challenging than protein kinases. These kinds of proteins attach phosphate groups the production of DNA microarrays because proteins are much more to other cellular proteins. A microarray was made consisting of easily damaged by the manipulations that occur during microarray different possible proteins that may or may not be phosphorylated formation. For example, the three-dimensional structure of a protein by these 119 kinases, and then the array was exposed to each of may be severely damaged by drying, which usually occurs during the kinases in the presence of radiolabeled ATP. By following the preparation of a microarray. This tendency for damage to occur has incorporation of phosphate into the proteins on the array, the created additional challenges for researchers who are developing the researchers determined the protein specificity of each kinase. bro50795_ch23_609-630.indd 618 17/06/23 11:49 AM 23.3 Bioinformatics I: Overview of Computer Analyses and Gene Prediction 619 On a much larger scale, the same group of researchers puri- In recent years, the marriage between genetics and computa- fied 5800 different yeast proteins and spotted them onto a microar- tional tools has yielded an important branch of science known as ray. The array was then exposed to fluorescently labeled calmodulin, bioinformatics. 
As mentioned earlier, bioinformatics is the use of which is a regulatory protein that binds calcium ions. Several pro- computers, mathematical tools, and statistical techniques to re- teins in the microarray were found to bind calmodulin. Although cord, store, and analyze biological information. In this section, we some of these were already known to do so, others that had not first consider the fundamental concepts that underlie the analysis been previously known to bind calmodulin were identified. of genetic sequences. We then explore how these methods are used to provide insights into functional genomics and proteomics. Chapter 29 will describe applications of bioinformatics in the area 23.2 COMPREHENSION QUESTIONS of evolutionary biology. In addition to reading this section, you may wish to actu- 1. Which of the following is a reason why the proteome of a ally run computer programs, which are widely available at uni- eukaryotic cell is usually much larger than its genome? versity and government websites (e.g., see www.ncbi.nlm.nih a. Alternative splicing.gov/Tools). This type of hands-on learning will help you to see b. RNA editing how valuable the computer has become as a tool for analysis of c. Posttranslational covalent modifications genetic data. d. All of the above are reasons for the larger size of the proteome of a eukaryotic cell. Sequence Files Are Analyzed by Computer 2. During two-dimensional gel electrophoresis, proteins are Programs separated based on Most people are familiar with computer programs, which consist a. their net charge at a given pH. of a series of operations that can manipulate and analyze data in a b. their mass. desired way. Simple computer programs that work on mobile de- c. their ability to bind to a specific resin. vices are referred to as apps (short for “applications”). In func- d. both a and b. tional genomics, many different types of computer programs have been designed. For example, a computer program can begin with 3. 
The technique of tandem mass spectrometry is used to determine a DNA sequence within a protein-coding gene and translate it into a. the amino acid sequence of a peptide fragment. an amino acid sequence. b. the nucleotide sequence of a segment of RNA. A first step in the computer analysis of genetic data is the c. the nucleotide sequence of a segment of DNA. creation of a computer data file to store the data. Such a file is d. the number of genes in a species’ genome. simply a collection of information in a form suitable for storage 4. Which of the following can be analyzed using a protein microarray? and manipulation on a computer. In genetic studies, a computer a. The amounts of particular proteins made by a sample of cells data file might contain an experimentally obtained DNA, RNA, or b. Protein function amino acid sequence. For example, a file could contain the DNA sequence of one strand of the lacY gene from Escherichia coli c. Protein-protein interactions (Figure 23.6). The numbers to the left represent the position in the d. All of the above sequence file of the first base in each row. 
23.3 BIOINFORMATICS I: 1 51 AT G TA C TAT T C T T T TA C T T T TAAAAAACAC T T TAT C AT G G A A AC T T T TG G GAGCCTACTT ATG T TC G G T T C C C G T T T T TC TAT T C T T T T T CCGATTTGGC OVERVIEW OF COMPUTER 101 151 TAC ATG AC AT G C TAT T T C T C C A AC C ATATC TG T TC TC G C T AGCAAAGTG AT TAT T C C A A ATAC G G G TAT CCGCTGTTTG TAT T T TG C C GTCTGCTTC ANALYSES AND GENE 201 251 TGACAAACTC TAG T C AT G T T GGGCTGCGCA TGCGCCGTTC A ATAC C TG C T T T TAT T T T TA G TG G AT TAT T TCTTCGGGCC AC G G C AT G T AC TG T TAC A A 301 TAG TAG G ATC G AT TG T TG G T G G TAT T TAT C TAG G C T T T TG PREDICTION TAC A AC AT T T 351 T T T TA AC G C C GGTGCGCCAG CAGTAGAGGC AT T TAT T G AG AAAGTCAGCC 401 GTCGCAGTAA T T T C G A AT T T GGTCGCGCGC GGATGTTTGG CTGTGTTGGC Learning Outcomes: 451 TGGGCGCTGT G T G C C T G AT TGTCGGCATC ATG T TC AC C A TC A ATA ATC A 501 GTTTGTTTTC TGGCTGGGCT CTGGCTGTGC AC TC ATC C TC G C C G T T T TAC 551 TC T T T T TC G C CAAAACGGAT GCGCCCTCTT CTGCCACGGT TGCCAATGCG 1. Describe how sequence files may be analyzed by computer 601 GTAGGTGCCA ACCATTCGGC AT T TAG C C T T A AGTG G CAC TGGAACTGTT AAACTGTGGT T T T G T C TAC T G TAT G T TAT T GGCGTTTCCT programs. 651 CAGACAGCCA 701 GCACCTACGA TG T T T T TG AC CAACAGTTTG C TA AT T T C T T TAC T TC G T TC 2. Outline different strategies for identifying gene sequences. 751 801 T T TG C TAC C G GGGCGCAATTA GTGAACAGGG CTTAACGCCT TACGCGGGTA C G AT TAT C T T TTTGGC TAC G CTTGCGCCA TA AC GACA AT C TG ATC AT TA 851 ATCGCATCGG TGGGAAAAAC GCCCTGCTGC TGGCTGGCAC TAT TAT C T C T 901 C TAC G TAT TA T TG G C TC ATC GTTCGCCACC TCAGCGCTGG A AG TG G T TAT Geneticists use computers to collect, store, manipulate, and ana- 951 1,001 TCTGAAAACG T TA A ATATAT C TG C ATATG T TACCAGCCAG T TGA AGTAC C TTTGAAGTGC GTTCCTGCTG G T T T T TC AG C GTGGGCTGCT
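The idea that a program can begin with a DNA sequence from a protein-coding gene and translate it into an amino acid sequence can be sketched in a few lines of Python. This is a minimal illustration and not the software the chapter refers to. It assumes the standard genetic code, reads the coding strand in frame from its first base, and stops at the first stop codon. The function name `translate` is a choice made for this example, and the test string is the first 21 bases of the lacY sequence in Figure 23.6.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, read 5' to 3' in frame 1.

    Translation stops at the first stop codon, and a trailing partial codon
    is ignored. Whitespace is removed first, so rows copied from a sequence
    file in blocks of 10 bases can be pasted in directly. Any character
    other than A, C, G, or T raises a KeyError.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 21 bases of the lacY coding sequence (Figure 23.6).
print(translate("ATGTACTATT TAAAAAACAC A"))  # MYYLKNT
```

The output uses one-letter amino acid codes, so MYYLKNT stands for Met-Tyr-Tyr-Leu-Lys-Asn-Thr. A fuller program would also need to handle the other reading frames and the complementary strand.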

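Returning to the tandem mass spectrometry example, the reasoning applied to the 1652-Da peptide (fragment masses of 1652, 1565, 1428, and 1315 Da) can also be written as a short Python sketch. It assumes nominal, whole-number masses for each amino acid within a polypeptide chain, and `read_ladder` is a hypothetical helper named for this example. Real spectra would need a mass tolerance and more careful scoring. Leucine and isoleucine share a nominal mass of 113 Da, and glutamine and lysine share 128 Da, so a mass difference alone cannot always name a single amino acid.

```python
# Nominal masses (Da) of the 20 amino acids within a polypeptide chain.
RESIDUE_MASS = {
    "Gly": 57, "Ala": 71, "Ser": 87, "Pro": 97, "Val": 99,
    "Thr": 101, "Cys": 103, "Leu": 113, "Ile": 113, "Asn": 114,
    "Asp": 115, "Gln": 128, "Lys": 128, "Glu": 129, "Met": 131,
    "His": 137, "Phe": 147, "Arg": 156, "Tyr": 163, "Trp": 186,
}


def read_ladder(masses):
    """List the candidate amino acids for each step down a fragment ladder.

    The masses are sorted from heaviest to lightest, and each drop between
    neighboring fragments is compared with the residue masses above. A step
    with no matching residue yields an empty list.
    """
    ordered = sorted(masses, reverse=True)
    calls = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        diff = heavier - lighter
        matches = sorted(aa for aa, mass in RESIDUE_MASS.items() if mass == diff)
        calls.append(matches)
    return calls


print(read_ladder([1652, 1565, 1428, 1315]))
# [['Ser'], ['His'], ['Ile', 'Leu']]
```

The result agrees with the serine-histidine-leucine reading in the text, with isoleucine listed as an equally possible third residue on mass alone. A phosphorylated residue would appear as a drop roughly 80 Da larger than the unmodified one (for example, 167 Da instead of 87 Da for serine), which this simple table would report as an empty match.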