Real-Time DNA Sequencing from Single Polymerase Molecules
Document Details
John Eid,* Adrian Fehr,* Jeremy Gray,* Khai Luong,* John Lyle,* Geoff Otto,* Paul Peluso,* David Rank,* Primo Baybayan, Brad Bettman, Arkadiusz Bibillo, Keith Bjornson, Bidhan Chaudhuri, Frederick Christians, et al.
Summary
This document reports a study of real-time DNA sequencing from single polymerase molecules. The authors use zero-mode waveguide (ZMW) nanostructure arrays to observe DNA synthesis by individual polymerase molecules, examine polymerase dynamics such as pause sites at DNA secondary structure, and align the resulting reads to a known reference to measure biophysical parameters of polymerization at each template position.
Full Transcript
REPORTS
www.sciencemag.org SCIENCE VOL 323 2 JANUARY 2009

Real-Time DNA Sequencing from Single Polymerase Molecules

John Eid,* Adrian Fehr,* Jeremy Gray,* Khai Luong,* John Lyle,* Geoff Otto,* Paul Peluso,* David Rank,* Primo Baybayan, Brad Bettman, Arkadiusz Bibillo, Keith Bjornson, Bidhan Chaudhuri, Frederick Christians, Ronald Cicero, Sonya Clark, Ravindra Dalal, Alex deWinter, John Dixon, Mathieu Foquet, Alfred Gaertner, Paul Hardenbol, Cheryl Heiner, Kevin Hester, David Holden, Gregory Kearns, Xiangxu Kong, Ronald Kuse, Yves Lacroix, Steven Lin, Paul Lundquist, Congcong Ma, Patrick Marks, Mark Maxham, Devon Murphy, Insil Park, Thang Pham, Michael Phillips, Joy Roy, Robert Sebra, Gene Shen, Jon Sorenson, Austin Tomaney, Kevin Travers, Mark Trulson, John Vieceli, Jeffrey Wegener, Dawn Wu, Alicia Yang, Denis Zaccarin, Peter Zhao, Frank Zhong, Jonas Korlach,† Stephen Turner†

We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.

Pacific Biosciences, 1505 Adams Drive, Menlo Park, CA 94025, USA.
*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected] (J.K.); sturner@pacificbiosciences.com (S.T.)

The Sanger method for DNA sequencing (1) uses DNA polymerase to incorporate the 3′-dideoxynucleotide that terminates the synthesis of a DNA copy. This method relies on the low error rate of DNA polymerases, but exploits neither their potential for high catalytic rates nor high processivity (2–4). Increasing the speed and length of individual sequencing reads beyond the current Sanger technology limit will shorten cycle times, accelerate sequence assembly, reduce cost, enable accurate sequencing analysis of repeat-rich areas of the genome, and reveal large-scale genomic complexity (5, 6). Alternative approaches that increase sequencing performance have been reported [(7–10), reviewed in (11, 12)]. Several of these methods have been deployed as commercial sequencing systems (13–16), which have greatly increased overall throughput, enabling many applications that were previously unfeasible. However, because these methods all gate enzymatic activity, using various termination approaches, they have not yielded longer sequence reads (limited to ~400 nucleotides), nor do they exploit the high intrinsic rates of polymerase-catalyzed DNA synthesis.

The use of DNA polymerase as a real-time sequencing engine—that is, direct observation of processive DNA polymerization with base-pair resolution—has long been proposed but has been difficult to realize (7, 8, 17–22). To fully harness the intrinsic speed, fidelity, and processivity of these enzymes, several technical challenges must be met simultaneously. First, the speed at which each polymerase synthesizes DNA exhibits stochastic fluctuation, so polymerase molecules would need to be observed individually while they undergo template-directed synthesis. Because of the high nucleotide concentrations required by DNA polymerases (20), a reduction in the observation volume beyond what is afforded by conventional methods, such as confocal or total internal reflection microscopy, directly improves single-molecule detection. Second, deoxyribonucleoside triphosphate (dNTP) substrates must carry detection labels that do not inhibit DNA polymerization even when 100% of the native nucleotides are replaced with their labeled counterparts. Third, a surface chemistry is required that retains activity of DNA polymerase molecules and inhibits nonspecific adsorption of labeled dNTPs. Finally, an instrument is required that can faithfully detect and distinguish incorporation of four different labeled dNTPs. Here, we provide proof-of-concept for an approach to highly multiplexed single-molecule, real-time DNA sequencing based on the observation of the temporal […] 99% of the incident light.

The architecture of our method is shown in Fig. 1A. DNA sequence is determined by detecting fluorescence from binding of correctly base-paired (cognate) phospholinked dNTPs in the active site of the polymerase (Fig. 1B). A fluorescence pulse is produced by the polymerase retaining the cognate nucleotide with its color-coded fluorophore in the detection region of the ZMW. It lasts for a period governed principally by the rate of catalysis, and ends upon cleavage of the dye-linker-pyrophosphate group, which quickly diffuses from the ZMW detection region. The duration of the fluorophore retention is much longer than the time scales associated with diffusion (2 to 10 ms) or noncognate sampling ([…]

[…] contrast, when a fluorophore is linked to the terminal phosphate moiety (phospholinked), phosphodiester bond formation catalyzed by the DNA polymerase results in release of the fluorophore from the incorporated nucleotide, thus generating natural, unmodified DNA (21, 29–31). Φ29 DNA polymerase was selected for these studies because it is a stable, single-subunit enzyme with high speed, accuracy, and processivity that efficiently uses phospholinked dNTPs (32). It is capable of strand-displacement DNA synthesis and has been used in whole-genome amplification, showing minimal sequencing context bias (33). We introduced site-specific mutations in the enzyme and devised a linkage chemistry that allows 100% replacement of native nucleotides with four […]

[…] emission rate from each of the two phospholinked dNTPs as a function of time (Fig. 2C). Single-molecule events corresponding to phospholinked dNTP incorporations manifested as fluorescent pulses whose variable duration reflected the enzyme kinetics and exhibited stochastic fluctuations in intensity (because of counting statistics and dye photophysics). The reads contained pulses with the expected pattern: alternating blocks of like-colored pulses corresponding to the alternating blocks in the template. Furthermore, we observed the hallmarks of single-molecule fluorescent events: single-frame rise and fall times at the start and end of the pulse, respectively ([…] 4000 bases synthesized. The mean number of pulses per block was uniform over at least 1000 bases of incorporation (Fig. 3D); hence, this sequencing approach maintains accuracy irrespective of read length.

Fig. 3. Long read length activity of DNA polymerase. (A) DNA template design. The sequence of a circular, single-stranded template was designed to yield continuous incorporation via strand-displacement DNA synthesis of alternating blocks of two phospholinked nucleotides (A555-dCTP and A647-dGTP), interspersed with the other two unmodified dNTPs. (B) Time-resolved spectrum of fluorescence emission as in Fig. 2B with fluorescence time trace from a single ZMW. The corresponding total length of synthesized DNA is indicated by the top axis. (C) DNA polymerization rate profiles for several molecules. Examples of pause sites are indicated by arrows. The two lines indicate two persistent polymerization rates. (D) Error as a function of length of read for 14 rolling circle cycles (1008 total base incorporations; n = 186 reads). The fractional deviation from the average number of pulses per block (12 A555-dCTP and 12 A647-dGTP observed phospholinked dNTP pulses per cycle, respectively), mean ± SE, is plotted as a function of template position. The 95% confidence interval for the slope is –0.027 to +0.036 blocks per 1008 bases of incorporation.

The measurements described above were extended to four-color DNA sequencing. All four native nucleotides were fully replaced with the following set of phospholinked dNTPs: A555-dATP, A568-dTTP, A647-dGTP, and A660-dCTP (fig. S4). Two lasers were used for the excitation of the four fluorophores. Fluorescence pulses were identified by a threshold detection algorithm (37) based on dye-weighted summation as above. The base identities of pulses were automatically assigned by least-squares fitting of the four phospholinked dNTP reference spectra to the measured spectra (fig. S4) (26). The read extracted from the measured pulses matched well with the underlying sequence of the nascent DNA strand, spanning the entire length of the 150-base linear template (Fig. 4, A and B). Of the 158 total bases in the alignment, 131 were correctly identified by the automated base caller. The 27 errors consisted of 12 deletions, eight insertions, and seven mismatches.

Sequencing performance analysis was extended to a set of 449 reads that showed pulse trains consistent with single polymerase occupancy (table S3). In these data, errors are dominated by deletions, which stem from incorporation events or intervals between them that are too short to be reliably detected. Unlabeled nucleotide contamination (dark nucleotides) can be a source of deletion errors in single-molecule sequencing systems. Here, this was not the case because the initial phospholinked dNTP composition was >99.5% pure (fig. S5) and, unlike with base-linked nucleotides, the polymerase showed no preference for unlabeled versus labeled substrates (32). Additionally, a comparison of our observed deletion error rate with a deletion rate predicted solely from pulse width distributions shows that dark nucleotides need not be invoked as a source of error. For example, fig. S6 shows the pulse width distribution for A555-dATP and the projected probability of pulse detection for that nucleotide as a function of pulse width. From these data, the deletion rate is estimated to be 7.8%, consistent with the observed 7.4% deletion rate for this nucleotide. This error type can be addressed by engineering the enzyme to reduce the fraction of short incorporation events, increasing fluorophore brightness, and improving efficiency of light collection.

The majority of insertion errors were caused by dissociation of a cognate nucleotide from the active site before phosphodiester bond formation can occur, resulting in the erroneous duplication of a pulse. This error type can be addressed by modifying the enzyme to decrease the free energy of the enzyme-substrate bound state, thus decreasing the dissociation rate before catalysis. Mismatches in the reads were mainly caused by spectral misassignments of the A647 and A660 dyes (accounting for ~60% of the mismatch error), which show the least spectral separation amongst the four dyes (table S3). The remainder of the mismatches involved misassignments between the A555 and A568 dyes (other factors were below the sensitivity of the assay). Finding compatible dye sets with larger spectral separations, as well as increasing the brightness of the dyes and collection efficiency of the instrument, will reduce the frequency of these errors.

To survey possible sequence context dependencies of these error types, we quantitated the two most important kinetic parameters—pulse width and interpulse duration—as a function of sequence position over the 150-base template. To extract these parameters for each template location, we associated individual pulses from the 449 reads with their sequence positions using a Smith-Waterman alignment algorithm (38). Pulse widths and interpulse durations are displayed as a function of sequence position in Fig. 4, C and D, respectively. The average pulse widths depend weakly on dNTP identity and show statistically significant but only moderate variation across template position. The average interpulse durations were typically between 200 and 700 ms, except for a few instances with much higher values. These pause sites corresponded to regions with predicted stable secondary structure in the template and matched well with bulk capillary electrophoresis data (fig. S7). The major pause point seen at position 40 did not result in an increased frequency of dissociation events. The enzymatic rate of incorporation increased immediately after passing through the putative hairpin for experiments performed at 100 nM dNTP (from 0.7 to 1.25 bases/s) and at 250 nM dNTP (from 1.1 to 1.5 bases/s). This increased rate resulted from a decrease in interpulse duration; the pulse widths remained nearly constant. It is not surprising that the interpulse durations, which encompass motion of the polymerase relative to the DNA template, would be strongly affected by DNA secondary structure, whereas variations in the pulse widths, which are governed by local chemical processes in the active site, are less affected.

Pulse widths showed only moderate variability with sequence context, and the interpulse durations, although highly dependent on secondary structure, always produced average values above 200 ms. Thus, sequence errors in individual reads should be predominantly uncorrelated and amenable to molecular ensemble averaging. To test this hypothesis, we formed 100 consensus sequences with reads randomly subsampled from the data set to yield 15-molecule coverage, using the center-star algorithm (39). The median accuracy over this set of sequences was 99.3%, with a distribution of values shown in Fig. 4E. The consensus accuracy as a function of fold coverage is shown in fig. S8. To explore the possibility of systematic error beyond the fluorophore-dependent error rates (table S3), we analyzed the dependence of consensus error frequency on sequence context via the distribution of the number of times out of the 100 trials that each reference sequence position was reported incorrectly (Fig. 4F) (26). This histogram is in agreement with a context bias–free random model, showing that within the sensitivity of this study there were no other biophysical sources of systematic error.

Fig. 4. Single-molecule, real-time, four-color DNA sequencing. (A) Total intensity output of all four dye-weighted channels, with pulses colored corresponding to the least-squares fitting decisions of the algorithm. This section of a fluorescence time trace shows 28 bases of incorporations and three errors. The expected template sequence is shown above, with dashed lines corresponding to matches; errors are in lowercase. (B) The entire read that proceeds through all 150 bases of the linear template. On average, ~63% of reads proceeded through the entire length of the DNA template. (C) Average pulse width as a function of template position (extracted from n = 449 reads). (D) Cumulative interpulse duration plotted as a function of template position for two different phospholinked dNTP concentrations (250 nM, n = 449 reads; 100 nM, n = 868 reads). The arrow indicates a pause site observed for both conditions at position 40, corresponding to predicted secondary structure in the template at position 46 (fig. S7), taking into account the enzyme's footprint on the template (42). (E) Histogram of the sequence accuracy of 100 consensus sequences created by subsampling from 449 single-molecule reads to 15-fold average coverage. The median accuracy of the distribution is 99.3%. (F) Observed systematic bias compared with prediction from a random model free of sequence context bias. The error frequencies for observed (gray bars) and bias-free model data (black bars) are plotted in a histogram with the number of errors on the x axis and the number of different reference positions showing this many errors in 100 trials on the y axis. The random model is based on the observed error frequencies (table S3) (26).

The systematic variations in pulse width and interpulse duration seen in Fig. 4 do not interfere with the development of accurate consensus sequence. In fact, such variations constitute an additional signal that is dependent on DNA primary and secondary structure that can be exploited to increase the accuracy of the consensus. Another appealing feature of this sequencing approach is that, through the strand-displacing capability of the polymerase (demonstrated in Fig. 3), closed circular templates can be sequenced multiple times by a DNA polymerase in a single run. This allows determination of a circular consensus sequence using only one DNA molecule. The resulting insensitivity to sample heterogeneity will greatly improve detection of rare mutations. This single-molecule aspect also enables simplified sample preparation and minimizes reagent consumption because only small amounts of genomic DNA are required.

In addition to the sequence, the real-time aspect of our approach generates unprecedented information about DNA polymerase kinetics that will allow other uses of the technology. Because the system reports the kinetics of every base incorporation through the pulse width and the interpulse duration, the system can be used today to investigate kinetics of DNA polymerization with unprecedented resolution and speed, providing the distribution of kinetic parameters over hundreds of different sequence contexts in a single 5-min experiment. Because polymerase kinetics is sensitive to biological perturbation, our approach would allow investigation of DNA binding proteins, DNA polymerase inhibitors, and the effects of base methylation.

Commercially available high-throughput sequencing systems that rely on stepwise flushing of a solid support with reactants and subsequent scanning to read out a single base currently operate in the regime of ~1 hour per base sequenced (13, 14, 16). This low rate of sequence production is compensated by high multiplex levels (~10⁶ to 10⁸). The single-molecule real-time DNA sequencing approach demonstrated here represents an increase in the speed of the underlying sequencing cycle by approximately four orders of magnitude. Stepwise sequencing systems are characterized by relatively short read lengths because of the deleterious effects of interrupting enzyme activity. Exploiting uninterrupted DNA synthesis will enable sequence reads thousands of bases in length.

We have shown that with just 15 molecules, a consensus sequence with 99.3% median accuracy can be formed with no detectable sequence context bias and a uniform error profile within reads. The present level of accuracy can produce alignment and consensus adequate for resequencing applications. However, it would create challenges for de novo assembly or alignment into highly repetitive DNA. The accuracy of the system could be enhanced by improvements in enzyme kinetics. Reducing the free energy of the nucleotide-bound state through polymerase mutation and nucleotide modification would reduce the occurrence of cognate nucleotide dissociation and the attendant insertion errors. Lowering the rate of phosphodiester bond formation would lengthen the pulses, reducing the incidence of deletion errors. Deletions could also be reduced through increases in fluorophore brightness and system optical collection efficiency. Finally, circular consensus sequencing can be used to eliminate stochastic errors in single-molecule sequencing.

The limited experimental multiplex used here could be applied to sequencing small viral and bacterial genomes. Given that each ZMW is capable of producing sequence at a rate greater than 400 kb per day, just 14,000 functioning ZMWs are required to produce a raw read throughput equivalent to 1-fold coverage of a diploid human genome per day. This number is attainable using optics and detector technology available today. Even larger numbers of ZMWs could be simultaneously monitored using multimegapixel charge-coupled device or complementary metal-oxide semiconductor cameras expected within five years (40, 41). As these technologies evolve, it will be possible to provide later generations of this instrument with multiplex commensurate with current stepwise sequencing systems. Combining this level of multiplex with the high intrinsic speed and read length of single-molecule, real-time DNA sequencing will enable low-cost rapid genome sequencing.

References and Notes
1. F. Sanger, S. Nicklen, A. R. Coulson, Proc. Natl. Acad. Sci. U.S.A. 74, 5463 (1977).
2. L. Blanco et al., J. Biol. Chem. 264, 8935 (1989).
3. A. Kornberg, T. A. Baker, DNA Replication (Freeman, New York, ed. 2, 1992).
4. S. Tabor, H. E. Huber, C. C. Richardson, J. Biol. Chem. 262, 16212 (1987).
5. C. Feschotte, E. J. Pritham, Annu. Rev. Genet. 41, 331 (2007).
6. E. Tuzun et al., Nat. Genet. 37, 727 (2005).
7. S. Balasubramanian, D. R. Bentley, Patent WO 01/057248 (2001).
8. I. Braslavsky, B. Hebert, E. Kartalov, S. R. Quake, Proc. Natl. Acad. Sci. U.S.A. 100, 3960 (2003).
9. M. Ronaghi, M. Uhlen, P. Nyren, Science 281, 363 (1998).
10. J. Shendure et al., Science 309, 1728 (2005).
11. D. R. Bentley, Curr. Opin. Genet. Dev. 16, 545 (2006).
12. M. L. Metzker, Genome Res. 15, 1767 (2005).
13. J. B. Fan et al., Methods Enzymol. 410, 57 (2006).
14. T. D. Harris et al., Science 320, 106 (2008).
15. M. Margulies et al., Nature 437, 376 (2005).
16. A. Valouev et al., Genome Res. 18, 1051 (2008).
17. E. Y. Chan, U.S. Patent 6,210,896 (2001).
18. S. L. Cockroft, J. Chu, M. Amorin, M. R. Ghadiri, J. Am. Chem. Soc. 130, 818 (2008).
19. W. J. Greenleaf, S. M. Block, Science 313, 801 (2006).
20. M. J. Levene et al., Science 299, 682 (2003).
21. B. A. Mulder et al., Nucleic Acids Res. 33, 4865 (2005).
22. B. Reynolds, R. Miller, J. G. Williams, J. P. Anderson, Nucleosides Nucleotides Nucleic Acids 27, 18 (2008).
23. M. Foquet et al., J. Appl. Phys. 103, 034301 (2008).
24. M. J. Lang, P. M. Fordyce, A. M. Engh, K. C. Neuman, S. M. Block, Nat. Methods 1, 133 (2004).
25. A. M. Lieto, R. C. Cush, N. L. Thompson, Biophys. J. 85, 3294 (2003).
26. See supporting material on Science Online.
27. J. Ju et al., Proc. Natl. Acad. Sci. U.S.A. 103, 19635 (2006).
28. R. D. Mitra, J. Shendure, J. Olejnik, O. Edyta Krzymanska, G. M. Church, Anal. Biochem. 320, 55 (2003).
29. C. C. Kao, T. Widlanski, W. Vassiliou, J. Epp, U.S. Patent 6,399,335 (2002).
30. S. Kumar et al., Nucleosides Nucleotides Nucleic Acids 24, 401 (2005).
31. A. Sood et al., J. Am. Chem. Soc. 127, 2394 (2005).
32. J. Korlach et al., Nucleosides Nucleotides Nucleic Acids 27, 1072 (2008).
33. F. B. Dean et al., Proc. Natl. Acad. Sci. U.S.A. 99, 5261 (2002).
34. J. Korlach et al., Proc. Natl. Acad. Sci. U.S.A. 105, 1176 (2008).
35. P. M. Lundquist et al., Opt. Lett. 33, 1026 (2008).
36. C. Castro et al., Proc. Natl. Acad. Sci. U.S.A. 104, 4267 (2007).
37. K. Horne, Publ. Astron. Soc. Pac. 98, 609 (1986).
38. O. Gotoh, J. Mol. Biol. 162, 705 (1982).
39. D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology (Cambridge Univ. Press, Cambridge, 1997).
40. J. T. Bosiers et al., Proc. SPIE 6996, 69960Z (2008).
41. A. J. P. Theuwissen, Solid State Electron. 52, 1401 (2008).
42. A. J. Berman et al., EMBO J. 26, 3494 (2007).
43. We thank the entire staff at Pacific Biosciences, and J. Puglisi, M. Hunkapiller, R. Kornberg, K. Johnson, D. Haussler, W. Webb, and H. Craighead for many helpful discussions. Supported by National Human Genome Research Institute grant R01HG003710.

Supporting Online Material
www.sciencemag.org/cgi/content/full/1162986/DC1
Materials and Methods
Figs. S1 to S8
Tables S1 to S3
Movie S1
References

9 July 2008; accepted 20 October 2008
Published online 20 November 2008; 10.1126/science.1162986
Include this information when citing this paper.
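In the four-color experiments, fluorescence pulses were identified by a threshold detection algorithm (37) applied to a dye-weighted sum of the detection channels, and each pulse then yields the two kinetic parameters the paper tracks: pulse width and interpulse duration. The Python below is a minimal sketch of that bookkeeping, assuming one fixed threshold and arbitrary per-channel weights; it does not reproduce the method of ref. (37), and every function name and number in it is hypothetical.

```python
from typing import List, Sequence, Tuple


def dye_weighted_sum(channels: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Collapse per-channel intensity traces (one list per detection channel)
    into a single trace by weighted summation, frame by frame."""
    n_frames = len(channels[0])
    return [sum(w * ch[t] for w, ch in zip(weights, channels)) for t in range(n_frames)]


def detect_pulses(trace: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each run of
    frames whose intensity exceeds the threshold."""
    pulses: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold and start is None:
            start = i  # rising edge: a pulse begins
        elif value <= threshold and start is not None:
            pulses.append((start, i))  # falling edge: the pulse ends
            start = None
    if start is not None:
        pulses.append((start, len(trace)))  # pulse still "on" when the trace ends
    return pulses


def kinetic_parameters(pulses: Sequence[Tuple[int, int]], frame_time_s: float):
    """Pulse widths and interpulse durations, both in seconds."""
    widths = [(end - start) * frame_time_s for start, end in pulses]
    interpulse = [(nxt[0] - cur[1]) * frame_time_s for cur, nxt in zip(pulses, pulses[1:])]
    return widths, interpulse


# Toy two-channel trace; all numbers are made up for illustration.
channels = [[0, 0, 4, 5, 0, 0, 0, 1, 0], [0, 0, 2, 2, 0, 0, 0, 6, 0]]
trace = dye_weighted_sum(channels, [1.0, 1.0])
pulses = detect_pulses(trace, threshold=2.0)
widths, gaps = kinetic_parameters(pulses, frame_time_s=0.01)
print(pulses)  # [(2, 4), (7, 8)]
```

A fixed threshold is the simplest choice, and it makes the origin of deletion errors easy to see: an incorporation whose summed intensity never rises above the threshold produces no pulse at all.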
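Base identities were assigned by least-squares fitting of the four phospholinked dNTP reference spectra to each pulse's measured spectrum. One simple reading of that step, sketched here under stated assumptions, fits each reference separately with a non-negative scale factor and keeps the base with the smallest residual. The four-channel reference spectra below are invented placeholders, not the measured spectra of fig. S4, and the function names are hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical reference emission spectra over four detection channels.
# The paper measures these for each phospholinked dNTP (fig. S4);
# these numbers are placeholders that only mimic overlapping dyes.
REFERENCE_SPECTRA: Dict[str, Sequence[float]] = {
    "A": (0.70, 0.25, 0.04, 0.01),  # A555-dATP
    "T": (0.30, 0.60, 0.07, 0.03),  # A568-dTTP
    "G": (0.02, 0.05, 0.65, 0.28),  # A647-dGTP
    "C": (0.01, 0.04, 0.35, 0.60),  # A660-dCTP
}


def fit_residual(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Fit measured ~ a * reference by least squares (a >= 0) and return
    the sum of squared residuals."""
    dot_rm = sum(r * m for r, m in zip(reference, measured))
    dot_rr = sum(r * r for r in reference)
    amplitude = max(dot_rm / dot_rr, 0.0)  # closed-form least-squares scale, clamped at 0
    return sum((m - amplitude * r) ** 2 for r, m in zip(reference, measured))


def call_base(measured: Sequence[float],
              references: Dict[str, Sequence[float]] = REFERENCE_SPECTRA) -> str:
    """Assign the base whose reference spectrum best fits the pulse spectrum."""
    return min(references, key=lambda base: fit_residual(measured, references[base]))


print(call_base([0.0, 0.05, 0.66, 0.29]))  # G
```

When two references overlap strongly, their residuals differ only slightly for a noisy pulse, so small intensity fluctuations can flip the call; this is the kind of spectral misassignment the paper reports for the A647/A660 and A555/A568 dye pairs.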
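The deletion-rate estimate for A555-dATP (7.8% predicted versus 7.4% observed) combines the pulse-width distribution with the probability of detecting a pulse of a given width (fig. S6). The arithmetic can be sketched as a weighted sum over width bins. The exponential width distribution, the 50-ms mean width, and the hard 5-ms detection cutoff are assumptions chosen for illustration, not values from the paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def predicted_deletion_rate(width_probs: Sequence[float], detect_probs: Sequence[float]) -> float:
    """Expected fraction of incorporations missed by the pulse caller.

    width_probs[i]  : probability that a pulse width falls in bin i
    detect_probs[i] : probability that a pulse in bin i is detected
    """
    total = sum(width_probs)
    missed = sum(p * (1.0 - d) for p, d in zip(width_probs, detect_probs))
    return missed / total


def exponential_cutoff_deletion_rate(mean_width_s: float, min_width_s: float) -> float:
    """Closed form for exponentially distributed widths when every pulse
    shorter than min_width_s is missed: P(width < min_width_s)."""
    return 1.0 - math.exp(-min_width_s / mean_width_s)


# Hypothetical inputs: 50 ms mean width, 1 ms bins up to 1 s, pulses under 5 ms missed.
tau, dw = 0.05, 0.001
edges = [i * dw for i in range(1001)]
width_probs = [math.exp(-a / tau) - math.exp(-b / tau) for a, b in zip(edges, edges[1:])]
detect_probs = [1.0 if i >= 5 else 0.0 for i in range(1000)]
print(round(predicted_deletion_rate(width_probs, detect_probs), 4))  # 0.0952
print(round(exponential_cutoff_deletion_rate(tau, 5 * dw), 4))       # 0.0952
```

The binned sum and the closed form agree because, with a hard cutoff on a bin edge, the missed probability telescopes to P(width < cutoff). A measured detection curve that rises gradually with width would simply replace the 0/1 entries of `detect_probs`.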
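Consensus sequences were formed from 15 subsampled reads with the center-star algorithm (39). The sketch below is a simplified version under stated assumptions: unit edit costs, a center read chosen to minimize total edit distance to the other reads, a global alignment of every read to that center, and a majority vote per center position. Unlike the full center-star method, which merges the pairwise alignments into a true multiple alignment, it votes in the center read's coordinates and keeps an inserted string only when more than half of the reads carry an insertion at that point. All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def _dp(a: str, b: str) -> List[List[int]]:
    """Unit-cost edit-distance table (match 0, mismatch 1, insertion/deletion 1)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                           dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1)
    return dp


def edit_distance(a: str, b: str) -> int:
    return _dp(a, b)[len(a)][len(b)]


def align(center: str, read: str) -> List[Tuple[str, str]]:
    """One optimal global alignment as (center_char, read_char) pairs; '-' marks a gap."""
    dp = _dp(center, read)
    i, j, pairs = len(center), len(read), []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (center[i - 1] != read[j - 1]):
            pairs.append((center[i - 1], read[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((center[i - 1], "-"))  # base missing from the read
            i -= 1
        else:
            pairs.append(("-", read[j - 1]))  # extra base in the read
            j -= 1
    return pairs[::-1]


def center_star_consensus(reads: List[str]) -> str:
    # 1. Center = read with the smallest total edit distance to all reads.
    center = min(reads, key=lambda r: sum(edit_distance(r, o) for o in reads))
    base_votes = [Counter() for _ in center]                    # '-' counts as a deletion vote
    insert_votes = [Counter() for _ in range(len(center) + 1)]  # slot k sits before center[k]
    # 2. Align each read to the center and tally what it reports.
    for read in reads:
        pos, inserted = -1, ""
        for c, r in align(center, read):
            if c == "-":
                inserted += r
            else:
                if inserted:
                    insert_votes[pos + 1][inserted] += 1
                    inserted = ""
                pos += 1
                base_votes[pos][r] += 1
        if inserted:
            insert_votes[pos + 1][inserted] += 1
    # 3. Majority vote per position; keep an insertion only if most reads carry one.
    consensus = []
    for k in range(len(center) + 1):
        if sum(insert_votes[k].values()) > len(reads) / 2:
            consensus.append(insert_votes[k].most_common(1)[0][0])
        if k < len(center):
            base = base_votes[k].most_common(1)[0][0]
            if base != "-":
                consensus.append(base)
    return "".join(consensus)


reads = ["ACGTACGTAC", "ACGACGTAC", "ACGTACGGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
print(center_star_consensus(reads))  # ACGTACGTAC
```

Each toy read carries one independent error (a deletion, an insertion, or a mismatch), and the vote removes all of them; this is the ensemble-averaging argument the paper makes for uncorrelated single-molecule errors.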
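The throughput figures in the closing paragraphs can be checked with simple arithmetic. The haploid human genome size of roughly 3 × 10⁹ bases used below is an outside figure rather than a number stated in the paper.

```latex
\begin{aligned}
\text{per-ZMW rate} &> \frac{400{,}000\ \text{bases}}{86{,}400\ \text{s}} \approx 4.6\ \text{bases/s} \\
14{,}000\ \text{ZMWs} \times 4 \times 10^{5}\ \tfrac{\text{bases}}{\text{day}} &= 5.6 \times 10^{9}\ \tfrac{\text{bases}}{\text{day}} \\
\text{diploid human genome} &\approx 2 \times 3 \times 10^{9} = 6 \times 10^{9}\ \text{bases} \\
\frac{3600\ \text{s per base (stepwise)}}{(1/4.6)\ \text{s per base (real time)}} &\approx 1.7 \times 10^{4}
\end{aligned}
```

The second and third lines show why 14,000 ZMWs give roughly 1-fold raw coverage of a diploid human genome per day, and the last line is consistent with the stated speed-up of approximately four orders of magnitude over ~1 hour per base.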