Genome Sequencing Overview PDF
Document Details
Uploaded by WorthString7417
Philadelphia University - Jordan
Tags
Summary
This document provides an overview of genome sequencing, including chromosome walking, primer walking, and shotgun sequencing. It discusses different sequencing methods, their advantages and disadvantages, and how they are used to sequence genomes. The document also explores specific sequencing techniques like Illumina sequencing technology and its implementation.
Full Transcript
22.5 Overview of Genome Sequencing 593 unaffected and an affected individual. Each set of clones is sub- goal of physical mapping is to obtain a complete DNA sequence of jected to DNA sequencing, and those DNA sequences are com-...
22.5 Overview of Genome Sequencing 593 unaffected and an affected individual. Each set of clones is sub- goal of physical mapping is to obtain a complete DNA sequence of jected to DNA sequencing, and those DNA sequences are com- a chromosome or a genome. In a physical map, the DNA sequence pared with each other. When the researchers reach a spot where of a chromosome is annotated to show the locations of genes, mo- the DNA sequences differ between the unaffected and the affected lecular markers, and other sequences of interest. individual, such a site may be within the gene of interest. How- Due to their small sizes, the earliest genomes to be se- ever, this has to be confirmed by sequencing the region from sev- quenced were those of viruses, such as phage λ (discussed in eral unaffected and affected individuals to be certain the change in Chapter 18). As we will see in Section 22.6, the first genome se- DNA sequence is correlated with the disease. quence of a bacterium was obtained in 1995. Since then, the ge- Chromosome walking was invented at a time when inserting nomes of hundreds of thousands of species have been sequenced. DNA fragments into a cloning vector was needed to carry out di- Historically, genome sequencing in the 1990s and early deoxy sequencing of DNA. With the advent of new DNA sequenc- twenty-first century began with the insertion of chromosomal DNA ing methods that do not require cloning vectors, a modification to fragments into cloning vectors. The inserts within cloning vectors chromosome walking is a newer method called primer walking. were then subjected to DNA sequencing via the Sanger dideoxy This method is also called directed DNA sequencing because the sequencing method. (Methods of vector cloning and dideoxy se- first primer is designed from a known region of DNA and it guides quencing are discussed in Chapter 20.) However, as methods of the sequencing in a specific direction. DNA sequencing have advanced over the last few decades, the need Figure 22.7b shows the steps in the process. to insert chromosomal DNA fragments into vectors has become obsolete. In this section, we will focus on recent advances in ge- 1. A first primer is made that is complementary to one of the nome sequencing methods that do not rely on vector cloning. DNA strands where the marker is found. 2. Dideoxy sequencing is conducted in the 3′ direction to ob- Genome Sequencing Involves the Sequencing tain a sequence that is several hundred bases long. 3. The 3′ end of this sequence is used to make a second of Many Fragments of Chromosomal DNA primer. When sequencing an entire genome, researchers must consider 4. This second primer is used to sequence another stretch of factors such as genome size, the efficiency of the methods used to DNA, and the 3′ end of this second sequence is used to sequence DNA, and the costs of the project. Since genome-se- make a third primer. quencing projects began in the 1990s, researchers have learned 5. This process is repeated over and over until the gene of in- that the most efficient and inexpensive way to sequence genomes terest has been reached. is via an approach called shotgun sequencing. This sequencing strategy involves breaking the genome into a collection of many DNA fragments that are sequenced individually. The production 22.4 COMPREHENSION QUESTION of a genome sequence involves two main methods: 1. Chromosome walking is a technique in which a researcher be- 1. Determine the base sequence of many DNA fragments that gins at a specific site on a chromosome and analyzes are derived from chromosomal (genomic) DNA. __________________ until the gene of interest is reached. 2. Determine the order of the DNA fragments as they would a. bands on gel occur along the chromosome(s) in a given species’ b. a series of subclones genome. This order is deduced from overlaps of the DNA c. the ability of primers to bind to sites along a chromosome sequences among different DNA fragments, a process called de novo genome assembly. d. both a and c As a matter of chance, some DNA fragments will have overlap- ping regions, as shown schematically in Figure 22.8. The DNA 22.5 OVERVIEW OF GENOME sequences in the two different fragments are identical in the over- SEQUENCING lapping region. This allows researchers to order those fragments as they would occur in the intact chromosome. Learning Outcomes: De novo genome assembly is done with the aid of computer 1. Describe how shotgun sequencing and de novo genome software. The data from the sequencing of many DNA fragments are assembly are used to obtain a complete DNA sequence of a directly entered into sequence files. These files are analyzed using species’ genome. software that identifies overlapping regions. Based on these overlap- 2. Compare and contrast short-read sequencing and long-read ping regions, the software will generate a contiguous DNA sequence sequencing. that occurs along a chromosome. However, certain problems may be 3. Outline different methods of DNA sequencing, and explain encountered during the de novo genome assembly process: how Illumina sequencing technology works. ∙∙ Some regions of chromosomes, such as telomeres and cen- tromeres in eukaryotic species, may contain long stretches As described in Section 22.1, the distances between genes in a of highly repetitive sequences (refer back to Chapter 10). physical map are computed as the number of base pairs. A key The lengths of these regions are often difficult to bro50795_ch22_584-608.indd 593 17/06/23 11:47 AM 594 C H A P T E R 2 2 :: GENOMICS I: ANALYSIS OF DNA Overlapping region T T A C GG T A C C A G T T A C A A A T T C C A G A C C T A G T A C C A A T G C C A T GG T C A A T G T T T A A GG T C T GG A T C A T GG G A C C T A G T A C C GG A C T T A T T C G A T C C C C A A T T T T G C A T C T GG A T C A T GG C C T G A A T A A G C T A GGGG T T A A A A C G T A FI GURE 22.8 A comparison of two DNA fragments that contain an overlapping region. determine due to an inability to accurately identify over- Innovations in DNA Sequencing Have Made It Less laps within them. Expensive and Faster ∙∙ Even though researchers may sequence many DNA frag- ments, an occasional chromosomal region may not be in- Since DNA sequencing was invented in the 1970s, technological cluded as a matter of chance. This leaves one or more gaps advances have been aimed at making it less expensive and faster. in the physical map. Various methods can be used to close ∙∙ As discussed in Section 22.6, the Human Genome Project, gaps, such as using long-read DNA sequencing methods, which began in the early 1990s, was originally estimated to which are described next. cost about $3 billion to sequence a single genome. However, cost reductions due to innovations in DNA-sequencing tech- nology drove the actual cost down to about $300 million. By Genome Sequences Can Be Produced Using the end of the project, researchers estimated that if they were Short-Read and Long-Read DNA Sequencing starting again, they could have sequenced the genome for Methods less than $50 million. The project took about 13 years to DNA sequencing technologies can be broadly categorized as complete the sequencing of a single human genome. short-read sequencing or long-read sequencing depending on the ∙∙ In 2007, researchers undertook the sequencing of James length of the base sequences that are obtained from each fragment Watson’s genome, which cost less than $1 million. It took of chromosomal DNA. Short-read sequencing (SRS) produces 2 months to complete. base sequences up to a few hundred bases in length, whereas long- ∙∙ By 2011, the cost of sequencing a human genome had been read sequencing (LRS) typically generates sequences with reduced to about $5000 and could be completed in a matter lengths in the range from 10,000 to 30,000 bases, and sometimes of weeks. longer. Specific examples of short-read and long-read sequencing ∙∙ Depending on the method used, the sequencing of a single methods are described later in Table 22.2. human genome currently costs $600 or less, and it typi- What are the advantages and disadvantages of short-read cally takes one to two days! Such innovation makes it fea- versus long-read methods? sible to sequence an individual’s genome as a routine diagnostic procedure. Cost. SRS is usually less expensive than LRS, but the cost of LRS is coming down. The ability to rapidly sequence large amounts of genomic DNA is referred to as high-throughput sequencing. Different types of Accuracy. SRS is currently more accurate than LRS, but the technological advances have made this possible. accuracy of LRS is improving. De Novo Genome Assembly. LRS is better because the assembly ∙∙ First, different aspects of DNA sequencing have become process involves the alignment of far fewer DNA sequences, due automated, so samples can now be processed rapidly in a to the greater lengths of the fragments. LRS is also better at machine. For example, in Chapter 20, we considered how assembling regions that contain repetitive sequences. fluorescent labeling of nucleotides can automate the read- Detection of Genetic Variation. Both SRS and LRS are good at ing of a DNA sequence using a machine with a laser and a detecting genetic variation among different individuals, though fluorescence detector. SRS is considered better due to its higher accuracy. Even so, an ∙∙ High-throughput sequencers are able to use samples that advantage of LRS is that it can more easily determine if genetic contain mixtures of DNA fragments that have not been sub- variation in an offspring is inherited from the female or male par- jected to vector-based cloning. By comparison, the earliest ent. LRS is also better at detecting structural variations, such as methods of genome sequencing, such as the one featured an inversion. later in Figure 22.10, involved the insertion of DNA frag- ments into cloning vectors. At that time, dideoxy sequenc- Some challenges in genome sequencing can be resolved by ing required cloned DNA. The elimination of such cloning using a combination of both short-read and long-read technolo- steps in newer methods saves a great deal of time and money. gies. LRS can be used for an initial de novo assembly of an entire ∙∙ Another advance in sequencing technologies involves par- genome, and then SRS can be used to improve the accuracy of the allel sequencing, which allows multiple samples to be pro- sequencing. The use of SRS to improve the accuracy of LRS is cessed at once. The first parallel sequencing machines sometimes called “polishing” the DNA sequence. could simultaneously perform many sequencing runs via bro50795_ch22_584-608.indd 594 17/06/23 11:47 AM 22.5 Overview of Genome Sequencing 595 TA BL E 22.2 Examples of Next-Generation Sequencing Technologies Short or PCR Amplified DNA Technology* Long Read or Single Molecule Description Illumina Short read: a few hundred bases PCR amplified Adaptors are attached to the ends of short DNA fragments followed by a bridge amplification step. The sequences are determined by sequencing by synthesis, one nucleotide at a time, using fluorescently tagged dNTPs. See Figure 22.9. Ion-torrent (Thermo Fisher) Short read: several hundred bases PCR amplified Uses microwells on a semiconductor chip to measure changes in pH during the sequencing cycles. DNA nanoball Short read: