Algorithm Design and Applications
Michael T. Goodrich, Roberto Tamassia
Summary
This textbook, Algorithm Design and Applications, by Michael T. Goodrich and Roberto Tamassia, provides a comprehensive exploration of algorithms and data structures. It covers fundamental topics such as algorithm analysis, basic data structures, sorting, graph algorithms, and computational complexity. The book is aimed at undergraduate computer science students.
Algorithm Design and Applications Michael T. Goodrich Department of Information and Computer Science University of California, Irvine Roberto Tamassia Department of Computer Science Brown University iii To Karen, Paul, Anna, and Jack – Michael T. Goodrich To Isabel – Roberto Tamassia Contents Preface xi 1 Algorithm Analysis 1 1.1 Analyzing Algorithms....................... 3 1.2 A Quick Mathematical Review.................. 19 1.3 A Case Study in Algorithm Analysis.............. 29 1.4 Amortization........................... 34 1.5 Exercises............................. 42 Part I: Data Structures 2 Basic Data Structures 51 2.1 Stacks and Queues....................... 53 2.2 Lists................................ 60 2.3 Trees............................... 68 2.4 Exercises............................. 84 3 Binary Search Trees 89 3.1 Searches and Updates...................... 91 3.2 Range Queries.......................... 101 3.3 Index-Based Searching..................... 104 3.4 Randomly Constructed Search Trees.............. 107 3.5 Exercises............................. 110 4 Balanced Binary Search Trees 115 4.1 Ranks and Rotations....................... 117 4.2 AVL Trees............................. 120 4.3 Red-Black Trees......................... 126 4.4 Weak AVL Trees......................... 130 4.5 Splay Trees............................ 139 4.6 Exercises............................. 149 5 Priority Queues and Heaps 155 5.1 Priority Queues.......................... 157 5.2 PQ-Sort, Selection-Sort, and Insertion-Sort.......... 158 5.3 Heaps............................... 163 5.4 Heap-Sort............................. 174 5.5 Extending Priority Queues.................... 179 5.6 Exercises............................. 182 v vi Contents 6 Hash Tables 187 6.1 Maps............................... 189 6.2 Hash Functions.......................... 192 6.3 Handling Collisions and Rehashing............... 198 6.4 Cuckoo Hashing......................... 206 6.5 Universal Hashing........................ 212 6.6 Exercises............................. 215 7 Union-Find Structures 219 7.1 Union-Find and Its Applications................. 221 7.2 A List-Based Implementation.................. 225 7.3 A Tree-Based Implementation.................. 228 7.4 Exercises............................. 236 Part II: Sorting and Selection 8 Merge-Sort and Quick-Sort 241 8.1 Merge-Sort............................ 243 8.2 Quick-Sort............................. 250 8.3 A Lower Bound on Comparison-Based Sorting........ 257 8.4 Exercises............................. 259 9 Fast Sorting and Selection 265 9.1 Bucket-Sort and Radix-Sort................... 267 9.2 Selection............................. 270 9.3 Weighted Medians........................ 276 9.4 Exercises............................. 279 Part III: Fundamental Techniques 10 The Greedy Method 283 10.1 The Fractional Knapsack Problem............... 286 10.2 Task Scheduling......................... 289 10.3 Text Compression and Huffman Coding............ 292 10.4 Exercises............................. 298 11 Divide-and-Conquer 303 11.1 Recurrences and the Master Theorem............. 305 11.2 Integer Multiplication....................... 313 11.3 Matrix Multiplication....................... 315 11.4 The Maxima-Set Problem.................... 317 11.5 Exercises............................. 319 Contents vii 12 Dynamic Programming 323 12.1 Matrix Chain-Products...................... 325 12.2 The General Technique..................... 
329 12.3 Telescope Scheduling...................... 331 12.4 Game Strategies......................... 334 12.5 The Longest Common Subsequence Problem......... 339 12.6 The 0-1 Knapsack Problem................... 343 12.7 Exercises............................. 346 13 Graphs and Traversals 353 13.1 Graph Terminology and Representations............ 355 13.2 Depth-First Search........................ 365 13.3 Breadth-First Search....................... 370 13.4 Directed Graphs......................... 373 13.5 Biconnected Components.................... 386 13.6 Exercises............................. 392 Part IV: Graph Algorithms 14 Shortest Paths 397 14.1 Single-Source Shortest Paths.................. 399 14.2 Dijkstra’s Algorithm........................ 400 14.3 The Bellman-Ford Algorithm................... 407 14.4 Shortest Paths in Directed Acyclic Graphs........... 410 14.5 All-Pairs Shortest Paths..................... 412 14.6 Exercises............................. 418 15 Minimum Spanning Trees 423 15.1 Properties of Minimum Spanning Trees............ 425 15.2 Kruskal’s Algorithm........................ 428 15.3 The Prim-Jarnı́k Algorithm.................... 433 15.4 Barůvka’s Algorithm....................... 436 15.5 Exercises............................. 439 16 Network Flow and Matching 443 16.1 Flows and Cuts.......................... 445 16.2 Maximum Flow Algorithms.................... 452 16.3 Maximum Bipartite Matching.................. 458 16.4 Baseball Elimination....................... 460 16.5 Minimum-Cost Flow....................... 462 16.6 Exercises............................. 469 viii Contents Part V: Computational Intractability 17 NP-Completeness 473 17.1 P and NP............................. 476 17.2 NP-Completeness........................ 483 17.3 CNF-SAT and 3SAT........................ 489 17.4 VERTEX-COVER, CLIQUE, and SET-COVER............ 492 17.5 SUBSET-SUM and KNAPSACK................... 496 17.6 HAMILTONIAN-CYCLE and TSP.................. 499 17.7 Exercises............................. 502 18 Approximation Algorithms 507 18.1 The Metric Traveling Salesperson Problem.......... 511 18.2 Approximations for Covering Problems............. 515 18.3 Polynomial-Time Approximation Schemes........... 518 18.4 Backtracking and Branch-and-Bound.............. 521 18.5 Exercises............................. 525 Part VI: Additional Topics 19 Randomized Algorithms 529 19.1 Generating Random Permutations............... 531 19.2 Stable Marriages and Coupon Collecting............ 534 19.3 Minimum Cuts.......................... 539 19.4 Finding Prime Numbers..................... 546 19.5 Chernoff Bounds......................... 551 19.6 Skip Lists............................. 557 19.7 Exercises............................. 563 20 B-Trees and External Memory 569 20.1 External Memory......................... 571 20.2 (2,4) Trees and B-Trees..................... 574 20.3 External-Memory Sorting.................... 590 20.4 Online Caching Algorithms................... 593 20.5 Exercises............................. 600 21 Multidimensional Searching 603 21.1 Range Trees........................... 605 21.2 Priority Search Trees....................... 609 21.3 Quadtrees and k-d Trees.................... 614 21.4 Exercises............................. 618 Contents ix 22 Computational Geometry 623 22.1 Operations on Geometric Objects................ 625 22.2 Convex Hulls........................... 630 22.3 Segment Intersection...................... 
638 22.4 Finding a Closest Pair of Points................. 642 22.5 Exercises............................. 646 23 String Algorithms 651 23.1 String Operations......................... 653 23.2 The Boyer-Moore Algorithm................... 656 23.3 The Knuth-Morris-Pratt Algorithm................ 660 23.4 Hash-Based Lexicon Matching................. 664 23.5 Tries................................ 669 23.6 Exercises............................. 680 24 Cryptography 685 24.1 Greatest Common Divisors (GCD)............... 687 24.2 Modular Arithmetic........................ 691 24.3 Cryptographic Operations.................... 699 24.4 The RSA Cryptosystem..................... 703 24.5 The El Gamal Cryptosystem................... 706 24.6 Exercises............................. 708 25 The Fast Fourier Transform 711 25.1 Convolution............................ 713 25.2 Primitive Roots of Unity..................... 715 25.3 The Discrete Fourier Transform................. 717 25.4 The Fast Fourier Transform Algorithm............. 721 25.5 Exercises............................. 727 26 Linear Programming 731 26.1 Formulating the Problem..................... 734 26.2 The Simplex Method....................... 739 26.3 Duality............................... 746 26.4 Applications of Linear Programming.............. 750 26.5 Exercises............................. 753 A Useful Mathematical Facts 761 Bibliography 765 Index 774 Preface This book is designed to provide a comprehensive introduction to the design and analysis of computer algorithms and data structures. We have made each chapter to be relatively independent of other chapters so as to provide instructors and readers greater flexibility with respect to which chapters to explore. Moreover, the exten- sive collection of topics we include provides coverage of both classic and emerging algorithmic methods, including the following: Mathematics for asymptotic analysis, including amortization and random- ization General algorithm design techniques, including the greedy method, divide- and-conquer, and dynamic programming Data structures, including lists, trees, heaps, search trees, B-trees, hash ta- bles, skip lists, union-find structures, and multidimensional trees Algorithmic frameworks, including NP-completeness, approximation algo- rithms, and external-memory algorithms Fundamental algorithms, including sorting, graph algorithms, computational geometry, numerical algorithms, cryptography, Fast Fourier Transform (FFT), and linear programming. Application-Motivated Approach This is an exciting time for computer science. Computers have moved beyond their early uses as computational engines to now be used as information processors, with applications to every other discipline. Moreover, the expansion of the Inter- net has brought about new paradigms and modalities for computer applications to society and commerce. For instance, computers can be used to store and retrieve large amounts of data, and they are used in many other application areas, such as sports, video games, biology, medicine, social networking, engineering, and sci- ence. Thus, we feel that algorithms should be taught to emphasize not only their mathematical analysis but also their practical applications. To fulfill this need, we have written each chapter to begin with a brief discus- sion of an application that motivates the topic of that chapter. 
In some cases, this application comes from a real-world use of the topic discussed in the chapter, and in other cases it is a contrived application that highlights how the topic of the chapter could be used in practice. Our intent in providing this motivation is to give readers a conceptual context and practical justification to accompany their reading of each chapter. In addition to this application-based motivation we include also detailed pseudocode descriptions and complete mathematical analysis. Indeed, we feel that mathematical rigor should not simply be for its own sake, but also for its pragmatic implications. xi xii Preface For the Instructor This book is structured to allow an instructor a great deal of freedom in how to orga- nize and present material. The dependence between chapters is relatively minimal, which allows the instructor to cover topics in her preferred sequence. Moreover, each chapter is designed so that it can be covered in 1–3 lectures, depending on the depth of coverage. Example Courses This book has several possible uses as a textbook. It can be used, for instance, for a core Algorithms course, which is classically known as CS7. Alternatively, it could be used for an upper-division/graduate data structures course, an upper- division/graduate algorithms course, or a two-course sequence on these topics. To highlight these alternatives, we give an example syllabus for each of these possible courses below. Example syllabus for a core Algorithms (CS7) course: 1. Algorithm Analysis (Skip, skim, or review Chapters 2–4 on fundamental data structures)1 5. Priority Queues and Heaps 6. Hash Tables 7. Union-Find Structures 8. Merge-Sort and Quick-Sort 9. Fast Sorting and Selection (if time permits) 10. The Greedy Method 11. Divide-and-Conquer 12. Dynamic Programming 13. Graphs and Traversals 14. Shortest Paths 15. Minimum Spanning Trees 16. Network Flow and Matching (if time permits) 17. NP-Completeness 18. Approximation Algorithms Optional choices from Chapters 19–26, as time permits The optional choices from Chapters 19–26 that could be covered at the end of the course include randomized algorithms, computational geometry, string algorithms, cryptography, Fast Fourier Transform (FFT), and linear programming. 1 These topics, and possibly even the topics of Chapters 5 and 6, are typically covered to at least a basic level in a Data Structures course that could be a prerequisite to this course. Preface xiii Example syllabus for an upper-division/graduate Data Structures course: 1. Algorithm Analysis 2. Basic Data Structures 3. Binary Search Trees 4. Balanced Binary Search Trees 5. Priority Queues and Heaps 6. Hash Tables 7. Union-Find Structures 8. Merge-Sort and Quick-Sort 13. Graphs and Traversals 14. Shortest Paths 15. Minimum Spanning Trees 20. B-Trees and External-Memory 21. Multi-Dimensional Searching Example syllabus for an upper-division/graduate Algorithms course: (Skip, skim, or review Chapters 1–8) 9. Fast Sorting and Selection 10. The Greedy Method 11. Divide-and-Conquer 12. Dynamic Programming 16. Network Flow and Matching 17. NP-Completeness 18. Approximation Algorithms 19. Randomized Algorithms 22. Computational Geometry 23. String Algorithms 24. Cryptography 25. The Fast Fourier Transform (FFT) 26. Linear Programming This course could be taught either as a stand-alone course or in conjunction with an upper-division Data Structures course, such as that given above. Of course, other options are also possible. 
Let us not belabor this point, how- ever, leaving such creative arrangements to instructors. xiv Preface Three Kinds of Exercises This book contains many exercises—over 800—which are divided between the fol- lowing three categories: reinforcement exercises, which test comprehension of chapter topics creativity exercises, which test creative utilization of techniques from the chapter application exercises, which test uses of the topics of the chapter for real- world or contrived applications The exercises are distributed so that roughly 35% are reinforcement exercises, 40% are creativity exercises, and 25% are application exercises. Web Added-Value Education This book comes accompanied by an extensive website: http://www.wiley.com/college/goodrich/ This site includes an extensive collection of educational aids that augment the topics of this book. Specifically for students we include the following: Presentation handouts in PDF format for most topics in this book Hints on selected exercises. The hints should be of particular interest for creativity and application problems that may be quite challenging for some students. For instructors using this book, there is a dedicated portion of the site just for them, which includes the following additional teaching aids: Solutions to selected exercises in this book Editable presentations in PowerPoint format for most topics in this book. Prerequisites We have written this book assuming that the reader comes to it with certain knowl- edge. In particular, we assume that the reader has a basic understanding of elemen- tary data structures, such as arrays and linked lists, and is at least vaguely familiar with a high-level programming language, such as C, C++, Java, or Python. Thus, all algorithms are described in a high-level “pseudocode,” which avoids some de- tails, such as error condition testing, but is suitable for a knowledgeable reader to convert algorithm descriptions into working code. In terms of mathematical background, we assume the reader is familiar with exponents, logarithms, summations, limits, and elementary probability. Even so, we review many of these concepts in Chapter 1, and we give a summary of other useful mathematical facts, including elementary probability, in Appendix A. Preface xv About the Authors Professors Goodrich and Tamassia are well-recognized researchers in algorithms and data structures, having published many papers in this field, with applications to computer security, cryptography, Internet computing, information visualization, and geometric computing. They have served as principal investigators in several joint projects sponsored by the National Science Foundation, the Army Research Office, and the Defense Advanced Research Projects Agency. They are also active in educational technology research. Michael Goodrich received his Ph.D. in Computer Sciences from Purdue Uni- versity in 1987. He is a Chancellor’s Professor in the Department of Computer Science at University of California, Irvine. Previously, he was a professor at Johns Hopkins University. His research interests include analysis, design, and implemen- tation of algorithms, data security, cloud computing, graph drawing, and computa- tional geometry. He is a Fulbright scholar and a fellow of the American Association for the Advancement of Science (AAAS), Association for Computing Machinery (ACM), and Institute of Electrical and Electronics Engineers (IEEE). 
He is a re- cipient of the IEEE Computer Society Technical Achievement Award, the ACM Recognition of Service Award, and the Pond Award for Excellence in Undergrad- uate Teaching. He serves on the advisory boards of the International Journal of Computational Geometry & Applications (IJCGA) and of the Journal of Graph Algorithms and Applications (JGAA). Roberto Tamassia received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 1988. He is the Plastech Professor of Computer Science in the Department of Computer Science at Brown University. He is also the Director of Brown’s Center for Geometric Computing. His research interests include data security, applied cryptography, cloud computing, analysis, design, and implementation of algorithms, graph drawing and computa- tional geometry. He is a fellow of the American Association for the Advancement of Science (AAAS), Association for Computing Machinery (ACM), and Institute for Electrical and Electronic Engineers (IEEE). He is also a recipient of the Tech- nical Achievement Award from the IEEE Computer Society. He co-founded the Journal of Graph Algorithms and Applications (JGAA) and the Symposium on Graph Drawing. He serves as co-editor-in-chief of JGAA. Acknowledgments There are a number of individuals with whom we have collaborated on research and educational projects about algorithms. Working with them helped us refine the vision and content of this book. Specifically, we thank Jeff Achter, Vesselin Ar- naudov, James Baker, Ryan Baker, Benjamin Boer, John Boreiko, Devin Borland, Lubomir Bourdev, Ulrik Brandes, Stina Bridgeman, Bryan Cantrill, Yi-Jen Chiang, xvi Preface Robert Cohen, David Ellis, David Emory, Jody Fanto, Ben Finkel, Peter Fröhlich, Ashim Garg, David Ginat, Natasha Gelfand, Esha Ghosh, Michael Goldwasser, Mark Handy, Michael Horn, Greg Howard, Benoı̂t Hudson, Jovanna Ignatowicz, James Kelley, Evgenios Kornaropoulos, Giuseppe Liotta, David Mount, Jeremy Mullendore, Olga Ohrimenko, Seth Padowitz, Bernardo Palazzi, Charalampos Pa- pamanthou, James Piechota, Daniel Polivy, Seth Proctor, Susannah Raub, Haru Sakai, John Schultz, Andrew Schwerin, Michael Shapiro, Michael Shim, Michael Shin, Galina Shubina, Amy Simpson, Christian Straub, Ye Sun, Nikos Triandopou- los, Luca Vismara, Danfeng Yao, Jason Ye, and Eric Zamore. We are grateful to our editor, Beth Golub, for her enthusiastic support of this project. The production team at Wiley has been great. Many thanks go to people who helped us with the book development, including Jayne Ziemba, Jennifer Wel- ter, Debbie Martin, Chris Ruel, Julie Kennedy, Wanqian Ye, Joyce Poh, and Ja- nis Soo. We are especially grateful to Michael Bannister, Jenny Lam, and Joseph Si- mons for their contributions to the chapter on linear programming. We would like to thank Siddhartha Sen and Robert Tarjan for an illuminating discussion about balanced search trees. We are truly indebted to the outside reviewers, and especially to Jack Snoeyink, for detailed comments and constructive criticism, which were extremely useful. These other outside reviewers included John Donald, Hui Yang, Nicholas Tran, John Black, My Thai, Dana Randall, Ming-Yang Kao, Qiang Cheng, Ravi Janar- den, Fikret Ercal, Jack Snoeyink, S. Muthukrishnan, Elliot Anshelevich, Mukkai Krishnamoorthy, Roxanne Canosa, Michael Cutler, Roger Crawfis, Glencora Bor- radaile, and Jennifer Welch. 
This manuscript was prepared primarily with LATEX for the text and Microsoft PowerPoint R , Visio R , and Adobe FrameMaker R for the figures. Finally, we warmly thank Isabel Cruz, Karen Goodrich, Giuseppe Di Battista, Franco Preparata, Ioannis Tollis, and our parents for providing advice, encourage- ment, and support at various stages of the preparation of this book. We also thank them for reminding us that there are things in life beyond writing books. Michael T. Goodrich Roberto Tamassia Chapter 1 Algorithm Analysis Microscope: U.S. government image, from the N.I.H. Medical Instrument Gallery, DeWitt Stetten, Jr., Museum of Medical Research. Hubble Space Tele- scope: U.S. government image, from NASA, STS-125 Crew, May 25, 2009. Contents 1.1 Analyzing Algorithms................... 3 1.2 A Quick Mathematical Review............... 19 1.3 A Case Study in Algorithm Analysis........... 29 1.4 Amortization......................... 34 1.5 Exercises........................... 42 2 Chapter 1. Algorithm Analysis Scientists often have to deal with differences in scale, from the microscopi- cally small to the astronomically large, and they have developed a wide range of tools for dealing with the differences in scale in the objects they study. Similarly, computer scientists must also deal with scale, but they deal with it primarily in terms of data volume rather than physical object size. In the world of information technology, scalability refers to the ability of a system to gracefully accommodate growing sizes of inputs or amounts of workload. Being able to achieve scalability for a computer system can mean the difference between a technological solution that can succeed in the marketplace or scientific application and one that becomes effectively unusable as data volumes increase. In this book, we are therefore inter- ested in the design of scalable algorithms and data structures. Simply put, an algorithm is a step-by-step procedure for performing some task in a finite amount of time, and a data structure is a systematic way of organiz- ing and accessing data. These concepts are central to computing, and this book is dedicated to the discussion of paradigms and principles for the design and imple- mentation of correct and efficient data structures and algorithms. But to be able to determine the degree to which algorithms and data structures are scalable, we must have precise ways of analyzing them. The primary analysis tool we use in this book is to characterize the running time of an algorithm or data structure operation, with space usage also being of interest. Running time is a natural measure for the purposes of scalability, since time is a precious resource. It is an important consideration in economic and sci- entific applications, since everyone expects computer applications to run as fast as possible. We begin this chapter by describing the basic framework needed for analyzing algorithms, which includes the language for describing algorithms, the computa- tional model that language is intended for, and the main factors we count when considering running time. We also include a brief discussion of how recursive algorithms are analyzed. In Section 1.1.5, we present the main notation we use to characterize running times—the so-called “big-Oh” notation. These tools comprise the main theoretical tools for designing and analyzing algorithms. 
In Section 1.2, we take a short break from our development of the framework for algorithm analysis to review some important mathematical facts, including dis- cussions of summations, logarithms, proof techniques, and basic probability. Given this background and our notation for algorithm analysis, we present a case study on algorithm analysis in Section 1.3, focusing on a problem often used as a test ques- tion during job interviews. We follow this case study in Section 1.4 by presenting an interesting analysis technique, known as amortization, which allows us to ac- count for the group behavior of many individual operations. Finally, we conclude the chapter with some exercises that include several problems inspired by questions commonly asked during job interviews at major software and Internet companies. 1.1. Analyzing Algorithms 3 1.1 Analyzing Algorithms The running time of an algorithm or data structure operation typically depends on a number of factors, so what should be the proper way of measuring it? If an algorithm has been implemented, we can study its running time by executing it on various test inputs and recording the actual time spent in each execution. Such measurements can be taken in an accurate manner by using system calls that are built into the language or operating system for which the algorithm is written. In general, we are interested in determining the dependency of the running time on the size of the input. In order to determine this, we can perform several experiments on many different test inputs of various sizes. We can then visualize the results of such experiments by plotting the performance of each run of the algorithm as a point with x-coordinate equal to the input size, n, and y-coordinate equal to the running time, t. (See Figure 1.1.) To be meaningful, this analysis requires that we choose good sample inputs and test enough of them to be able to make sound statistical claims about the algorithm. In general, the running time of an algorithm or data structure method increases with the input size, although it may also vary for distinct inputs of the same size. Also, the running time is affected by the hardware environment (processor, clock rate, memory, disk, etc.) and software environment (operating system, program- ming language, compiler, interpreter, etc.) in which the algorithm is implemented, compiled, and executed. All other factors being equal, the running time of the same algorithm on the same input data will be smaller if the computer has, say, a much faster processor or if the implementation is done in a program compiled into native machine code instead of an interpreted implementation run on a virtual machine. t (ms) 60 50 40 30 20 10 n 0 50 100 Figure 1.1: Results of an experimental study on the running time of an algorithm. A dot with coordinates (n, t) indicates that on an input of size n, the running time of the algorithm is t milliseconds (ms). 4 Chapter 1. Algorithm Analysis Requirements for a General Analysis Methodology Experimental studies on running times are useful, but they have some limitations: Experiments can be done only on a limited set of test inputs, and care must be taken to make sure these are representative. It is difficult to compare the efficiency of two algorithms unless experiments on their running times have been performed in the same hardware and soft- ware environments. It is necessary to implement and execute an algorithm in order to study its running time experimentally. 
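As a concrete illustration of the experimental approach just described, the following Python sketch (an addition for illustration, not code from the book) times a stand-in algorithm on inputs of growing size and prints (n, t) pairs of the kind plotted in Figure 1.1. The workload function is an arbitrary placeholder; substitute any algorithm of interest.

    import random
    import time

    def algorithm_under_test(data):
        # Placeholder workload: find the maximum element by a linear scan.
        best = data[0]
        for x in data:
            if x > best:
                best = x
        return best

    def run_experiment(sizes, trials=5):
        """Return a list of (n, milliseconds) measurements, averaged over trials."""
        results = []
        for n in sizes:
            data = [random.randint(0, 10**6) for _ in range(n)]
            start = time.perf_counter()
            for _ in range(trials):
                algorithm_under_test(data)
            elapsed = (time.perf_counter() - start) / trials
            results.append((n, elapsed * 1000.0))  # convert seconds to milliseconds
        return results

    if __name__ == "__main__":
        for n, t in run_experiment([10_000, 20_000, 40_000, 80_000]):
            print(f"n = {n:>6}  t = {t:8.3f} ms")

The measured times will of course depend on the hardware and software environment, which is exactly the limitation the analytic framework discussed next is meant to overcome.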
Thus, while experimentation has an important role to play in algorithm analysis, it alone is not sufficient. Therefore, in addition to experimentation, we desire an analytic framework that

- Takes into account all possible inputs
- Allows us to evaluate the relative efficiency of any two algorithms in a way that is independent from the hardware and software environment
- Can be performed by studying a high-level description of the algorithm without actually implementing it or running experiments on it.

This methodology aims at associating with each algorithm a function f(n) that characterizes the running time of the algorithm in terms of the input size n. Typical functions that will be encountered include n and n^2. For example, we will write statements of the type "Algorithm A runs in time proportional to n," meaning that if we were to perform experiments, we would find that the actual running time of algorithm A on any input of size n never exceeds cn, where c is a constant that depends on the hardware and software environment used in the experiment. Given two algorithms A and B, where A runs in time proportional to n and B runs in time proportional to n^2, we will prefer A to B, since the function n grows at a smaller rate than the function n^2.

We are now ready to "roll up our sleeves" and start developing our methodology for algorithm analysis. There are several components to this methodology, including the following:

- A language for describing algorithms
- A computational model that algorithms execute within
- A metric for measuring algorithm running time
- An approach for characterizing running times, including those for recursive algorithms.

We describe these components in more detail in the remainder of this section.

1.1.1 Pseudo-Code

Programmers are often asked to describe algorithms in a way that is intended for human eyes only. Such descriptions are not computer programs, but are more structured than usual prose. They also facilitate the high-level analysis of a data structure or algorithm. We call these descriptions pseudocode.

An Example of Pseudo-Code

The array-maximum problem is the simple problem of finding the maximum element in an array A storing n integers. To solve this problem, we can use an algorithm called arrayMax, which scans through the elements of A using a for loop. The pseudocode description of algorithm arrayMax is shown in Algorithm 1.2.

Algorithm arrayMax(A, n):
    Input: An array A storing n ≥ 1 integers.
    Output: The maximum element in A.

    currentMax ← A[0]
    for i ← 1 to n − 1 do
        if currentMax < A[i] then
            currentMax ← A[i]
    return currentMax

Algorithm 1.2: Algorithm arrayMax.

Note that the pseudocode is more compact than an equivalent actual software code fragment would be. In addition, the pseudocode is easier to read and understand.

Using Pseudo-Code to Prove Algorithm Correctness

By inspecting the pseudocode, we can argue about the correctness of algorithm arrayMax with a simple argument. Variable currentMax starts out being equal to the first element of A. We claim that at the beginning of the ith iteration of the loop, currentMax is equal to the maximum of the first i elements in A. Since we compare currentMax to A[i] in iteration i, if this claim is true before this iteration, it will be true after it for i + 1 (which is the next value of counter i). Thus, after n − 1 iterations, currentMax will equal the maximum element in A.
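For readers who want to see the pseudocode rendered in a real language, here is a direct Python translation of Algorithm 1.2 (added for illustration; the book itself stays with pseudocode). The comments map each statement back to the corresponding pseudocode line.

    def array_max(A, n):
        """Return the maximum element of A[0..n-1], mirroring Algorithm 1.2 (arrayMax).

        Assumes A stores n >= 1 comparable items.
        """
        current_max = A[0]              # currentMax <- A[0]
        for i in range(1, n):           # for i <- 1 to n - 1 do
            if current_max < A[i]:      #   if currentMax < A[i] then
                current_max = A[i]      #     currentMax <- A[i]
        return current_max              # return currentMax

    # Example usage: array_max([35, 12, 71, 8, 42], 5) returns 71.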
As with this example, we want our pseudocode descriptions to always be detailed enough to fully justify the correctness of the algorithm they describe, while being simple enough for human readers to understand. 6 Chapter 1. Algorithm Analysis What Is Pseudo-Code? Pseudo-code is a mixture of natural language and high-level programming con- structs that describe the main ideas behind a generic implementation of a data structure or algorithm. There really is no precise definition of the pseudocode language, however, because of its reliance on natural language. At the same time, to help achieve clarity, pseudocode mixes natural language with standard program- ming language constructs. The programming language constructs we choose are those consistent with modern high-level languages such as Python, C++, and Java. These constructs include the following: Expressions: We use standard mathematical symbols to express numeric and Boolean expressions. We use the left arrow sign (←) as the assignment operator in assignment statements (equivalent to the = operator in C, C++, and Java) and we use the equal sign (=) as the equality relation in Boolean expressions (equivalent to the “==” relation in C, C++, and Java). Method declarations: Algorithm name(param1, param2,...) declares a new method “name” and its parameters. Decision structures: if condition then true-actions [else false-actions]. We use indentation to indicate what actions should be included in the true-actions and false-actions, and we assume Boolean operators allow for short-circuit evaluation. While-loops: while condition do actions. We use indentation to indicate what actions should be included in the loop actions. Repeat-loops: repeat actions until condition. We use indentation to indicate what actions should be included in the loop actions. For-loops: for variable-increment-definition do actions. We use indentation to indicate what actions should be included among the loop actions. Array indexing: A[i] represents the ith cell in the array A. We usually index the cells of an array A of size n from 1 to n, as in mathematics, but sometimes we instead such an array from 0 to n − 1, consistent with C, C++, and Java. Method calls: object.method(args) (object is optional if it is understood). Method returns: return value. This operation returns the value specified to the method that called this one. When we write pseudocode, we must keep in mind that we are writing for a human reader, not a computer. Thus, we should strive to communicate high-level ideas, not low-level implementation details. At the same time, we should not gloss over important steps. Like many forms of human communication, finding the right balance is an important skill that is refined through practice. Now that we have developed a high-level way of describing algorithms, let us next discuss how we can analytically characterize algorithms written in pseu- docode. 1.1. Analyzing Algorithms 7 1.1.2 The Random Access Machine (RAM) Model As we noted above, experimental analysis is valuable, but it has its limitations. If we wish to analyze a particular algorithm without performing experiments on its running time, we can take the following more analytic approach directly on the high-level code or pseudocode. We define a set of high-level primitive operations that are largely independent from the programming language used and can be iden- tified also in the pseudocode. 
Primitive operations include the following: Assigning a value to a variable Calling a method Performing an arithmetic operation (for example, adding two numbers) Comparing two numbers Indexing into an array Following an object reference Returning from a method. Specifically, a primitive operation corresponds to a low-level instruction with an execution time that depends on the hardware and software environment but is nev- ertheless constant. Instead of trying to determine the specific execution time of each primitive operation, we will simply count how many primitive operations are executed, and use this number t as a high-level estimate of the running time of the algorithm. This operation count will correlate to an actual running time in a spe- cific hardware and software environment, for each primitive operation corresponds to a constant-time instruction, and there are only a fixed number of primitive opera- tions. The implicit assumption in this approach is that the running times of different primitive operations will be fairly similar. Thus, the number, t, of primitive opera- tions an algorithm performs will be proportional to the actual running time of that algorithm. RAM Machine Model Definition This approach of simply counting primitive operations gives rise to a computational model called the Random Access Machine (RAM). This model, which should not be confused with “random access memory,” views a computer simply as a CPU connected to a bank of memory cells. Each memory cell stores a word, which can be a number, a character string, or an address—that is, the value of a base type. The term “random access” refers to the ability of the CPU to access an arbitrary memory cell with one primitive operation. To keep the model simple, we do not place any specific limits on the size of numbers that can be stored in words of memory. We assume the CPU in the RAM model can perform any primitive operation in a constant number of steps, which do not depend on the size of the input. Thus, an accurate bound on the number of primitive operations an algorithm performs corresponds directly to the running time of that algorithm in the RAM model. 8 Chapter 1. Algorithm Analysis 1.1.3 Counting Primitive Operations We now show how to count the number of primitive operations executed by an al- gorithm, using as an example algorithm arrayMax, whose pseudocode was given back in Algorithm 1.2. We do this analysis by focusing on each step of the algo- rithm and counting the primitive operations that it takes, taking into consideration that some operations are repeated, because they are enclosed in the body of a loop. Initializing the variable currentMax to A corresponds to two primitive op- erations (indexing into an array and assigning a value to a variable) and is executed only once at the beginning of the algorithm. Thus, it contributes two units to the count. At the beginning of the for loop, counter i is initialized to 1. This action corresponds to executing one primitive operation (assigning a value to a vari- able). Before entering the body of the for loop, condition i < n is verified. This action corresponds to executing one primitive instruction (comparing two numbers). Since counter i starts at 1 and is incremented by 1 at the end of each iteration of the loop, the comparison i < n is performed n times. Thus, it contributes n units to the count. The body of the for loop is executed n − 1 times (for values 1, 2,... , n − 1 of the counter). 
At each iteration, A[i] is compared with currentMax (two primitive operations, indexing and comparing), A[i] is possibly assigned to currentMax (two primitive operations, indexing and assigning), and the counter i is incremented (two primitive operations, summing and assigning). Hence, at each iteration of the loop, either four or six primitive operations are performed, depending on whether A[i] ≤ currentMax or A[i] > currentMax. Therefore, the body of the loop contributes between 4(n − 1) and 6(n − 1) units to the count. Returning the value of variable currentMax corresponds to one primitive op- eration, and is executed only once. To summarize, the number of primitive operations t(n) executed by algorithm ar- rayMax is at least 2 + 1 + n + 4(n − 1) + 1 = 5n and at most 2 + 1 + n + 6(n − 1) + 1 = 7n − 2. The best case (t(n) = 5n) occurs when A is the maximum element, so that variable currentMax is never reassigned. The worst case (t(n) = 7n − 2) occurs when the elements are sorted in increasing order, so that variable currentMax is reassigned at each iteration of the for loop. 1.1. Analyzing Algorithms 9 Average-Case and Worst-Case Analysis Like the arrayMax method, an algorithm may run faster on some inputs than it does on others. In such cases we may wish to express the running time of such an algorithm as an average taken over all possible inputs. Although such an average case analysis would often be valuable, it is typically quite challenging. It requires us to define a probability distribution on the set of inputs, which is typically a diffi- cult task. Figure 1.3 schematically shows how, depending on the input distribution, the running time of an algorithm can be anywhere between the worst-case time and the best-case time. For example, what if inputs are really only of types “A” or “D”? An average-case analysis also typically requires that we calculate expected run- ning times based on a given input distribution. Such an analysis often requires heavy mathematics and probability theory. Therefore, except for experimental studies or the analysis of algorithms that are themselves randomized, we will, for the remainder of this book, typically charac- terize running times in terms of the worst case. We say, for example, that algorithm arrayMax executes t(n) = 7n − 2 primitive operations in the worst case, meaning that the maximum number of primitive operations executed by the algorithm, taken over all inputs of size n, is 7n − 2. This type of analysis is much easier than an average-case analysis, as it does not require probability theory; it just requires the ability to identify the worst-case input, which is often straightforward. In addition, taking a worst-case approach can actually lead to better algorithms. Making the standard of success that of having an algorithm perform well in the worst case necessarily requires that it perform well on every input. That is, designing for the worst case can lead to stronger algorithmic “muscles,” much like a track star who always practices by running uphill. 5 ms worst-case time 4 ms } Running Time average-case time? 3 ms best-case time 2 ms 1 ms A B C D E F G Input Instance Figure 1.3: The difference between best-case and worst-case time. Each bar repre- sents the running time of some algorithm on a different possible input. 10 Chapter 1. Algorithm Analysis 1.1.4 Analyzing Recursive Algorithms Iteration is not the only interesting way of solving a problem. Another useful tech- nique, which is employed by many algorithms, is to use recursion. 
In this technique, we define a procedure P that is allowed to make calls to itself as a subroutine, provided those calls to P are for solving subproblems of smaller size. The subroutine calls to P on smaller instances are called "recursive calls." A recursive procedure should always define a base case, which is small enough that the algorithm can solve it directly without using recursion.

We give a recursive solution to the array maximum problem in Algorithm 1.4. This algorithm first checks if the array contains just a single item, which in this case must be the maximum; hence, in this simple base case we can immediately solve the problem. Otherwise, the algorithm recursively computes the maximum of the first n − 1 elements in the array and then returns the maximum of this value and the last element in the array.

Algorithm recursiveMax(A, n):
    Input: An array A storing n ≥ 1 integers.
    Output: The maximum element in A.

    if n = 1 then
        return A[0]
    return max{recursiveMax(A, n − 1), A[n − 1]}

Algorithm 1.4: Algorithm recursiveMax.

As with this example, recursive algorithms are often quite elegant. Analyzing the running time of a recursive algorithm takes a bit of additional work, however. In particular, to analyze such a running time, we use a recurrence equation, which defines mathematical statements that the running time of a recursive algorithm must satisfy. We introduce a function T(n) that denotes the running time of the algorithm on an input of size n, and we write equations that T(n) must satisfy. For example, we can characterize the running time, T(n), of the recursiveMax algorithm as

    T(n) = 3                  if n = 1
    T(n) = T(n − 1) + 7       otherwise,

assuming that we count each comparison, array reference, recursive call, max calculation, or return as a single primitive operation. Ideally, we would like to characterize a recurrence equation like that above in closed form, where no references to the function T appear on the righthand side. For the recursiveMax algorithm, it isn't too hard to see that a closed form would be T(n) = 7(n − 1) + 3 = 7n − 4. In general, determining closed form solutions to recurrence equations can be much more challenging than this, and we study some specific examples of recurrence equations in Chapter 8, when we study some sorting and selection algorithms. We study methods for solving recurrence equations of a general form in Section 11.1.

1.1.5 Asymptotic Notation

We have clearly gone into laborious detail for evaluating the running time of such a simple algorithm as arrayMax and its recursive cousin, recursiveMax. Such an approach would clearly prove cumbersome if we had to perform it for more complicated algorithms. In general, each step in a pseudocode description and each statement in a high-level language implementation tends to correspond to a small number of primitive operations that does not depend on the input size. Thus, we can perform a simplified analysis that estimates the number of primitive operations executed up to a constant factor, by counting the steps of the pseudocode or the statements of the high-level language executed. Fortunately, there is a notation that allows us to characterize the main factors affecting an algorithm's running time without going into all the details of exactly how many primitive operations are performed for each constant-time set of instructions.

The "Big-Oh" Notation

Let f(n) and g(n) be functions mapping nonnegative integers to real numbers.
We say that f (n) is O(g(n)) if there is a real constant c > 0 and an integer constant n0 ≥ 1 such that f (n) ≤ cg(n) for every integer n ≥ n0. This definition is often pronounced as “f (n) is big-Oh of g(n)” or “f (n) is order g(n).” (See Figure 1.5.) Example 1.1: 7n − 2 is O(n). Proof: We need a real constant c > 0 and an integer constant n0 ≥ 1 such that 7n − 2 ≤ cn for every integer n ≥ n0. It is easy to see that a possible choice is c = 7 and n0 = 1, but there are other possibilities as well. cg(n) Running Time f(n) n0 Input Size Figure 1.5: The function f (n) is O(g(n)), for f (n) ≤ c · g(n) when n ≥ n0. 12 Chapter 1. Algorithm Analysis The big-Oh notation allows us to say that a function of n is “less than or equal to” another function (by the inequality “≤” in the definition), up to a constant factor (by the constant c in the definition) and in the asymptotic sense as n grows toward infinity (by the statement “n ≥ n0 ” in the definition). The big-Oh notation is used widely to characterize running times and space bounds of algorithm in terms of a parameter, n, which represents the “size” of the problem. For example, if we are interested in finding the largest element in an array of integers (see arrayMax given in Algorithm 1.2), it would be most natural to let n denote the number of elements of the array. For example, we can write the following precise statement on the running time of algorithm arrayMax from Algorithm 1.2. Theorem 1.2: The running time of algorithm arrayMax for computing the max- imum element in an array of n integers is O(n). Proof: As shown in Section 1.1.3, the number of primitive operations executed by algorithm arrayMax is at most 7n − 2. We may therefore apply the big-Oh definition with c = 7 and n0 = 1 and conclude that the running time of algorithm arrayMax is O(n). Let us consider a few additional examples that illustrate the big-Oh notation. Example 1.3: 20n3 + 10n log n + 5 is O(n3 ). Proof: 20n3 + 10n log n + 5 ≤ 35n3 , for n ≥ 1. In fact, any polynomial, ak nk + ak−1 nk−1 + · · · + a0 , will always be O(nk ). Example 1.4: 3 log n + log log n is O(log n). Proof: 3 log n + log log n ≤ 4 log n, for n ≥ 2. Note that log log n is not even defined for n = 1, but log log n < log n, for n ≥ 2. That is why we use n ≥ 2. Example 1.5: 2100 is O(1). Proof: 2100 ≤ 2100 · 1, for n ≥ 1. Note that variable n does not appear in the inequality, since we are dealing with constant-valued functions. Example 1.6: 5n log n + 2n is O(n log n). Proof: 5n log n + 2n ≤ 7n log n, for n ≥ 2 (but not for n = 1). As mentioned above, we are typically interested in characterizing the running time or space usage of algorithm in terms of a function, f (n), which we bound using the big-Oh notion. For this reason, we should use the big-Oh notation to characterize such a function, f (n), using an asymptotically small and simple func- tion, g(n). For instance, while it is true that a function, f (n) = 4n3 + 3n4/3 , is O(n5 ), it is more informative to say that such an f (n) is O(n3 ). Moreover, it is 1.1. Analyzing Algorithms 13 often difficult or cumbersome to characterize the running time or space usage of algorithm exactly, whereas characterizing such measures using the big-Oh notion is typically easier. Instead of always applying the big-Oh definition directly to obtain a big-Oh characterization, we can often use the following rules to simplify our task of figur- ing out the simplest characterization. 
Theorem 1.7: Let d(n), e(n), f(n), and g(n) be functions mapping nonnegative integers to nonnegative reals.

1. If d(n) is O(f(n)), then ad(n) is O(f(n)), for any constant a > 0.
2. If d(n) is O(f(n)) and e(n) is O(g(n)), then d(n) + e(n) is O(f(n) + g(n)).
3. If d(n) is O(f(n)) and e(n) is O(g(n)), then d(n)e(n) is O(f(n)g(n)).
4. If d(n) is O(f(n)) and f(n) is O(g(n)), then d(n) is O(g(n)).
5. If f(n) is a polynomial of degree d (that is, f(n) = a_0 + a_1 n + ··· + a_d n^d), then f(n) is O(n^d).
6. n^x is O(a^n) for any fixed x > 0 and a > 1.
7. log n^x is O(log n) for any fixed x > 0.
8. log^x n is O(n^y) for any fixed constants x > 0 and y > 0.

It is considered poor taste to include constant factors and lower-order terms in the big-Oh notation. For example, it is not fashionable to say that the function 2n^2 is O(4n^2 + 6n log n), although this is completely correct. We should strive instead to describe the function in the big-Oh in simplest terms.

Example 1.8: 2n^3 + 4n^2 log n is O(n^3).

Proof: We can apply the rules of Theorem 1.7 as follows:

- log n is O(n) (Rule 8).
- 4n^2 log n is O(4n^3) (Rule 3).
- 2n^3 + 4n^2 log n is O(2n^3 + 4n^3) (Rule 2).
- 2n^3 + 4n^3 is O(n^3) (Rule 5 or Rule 1).
- 2n^3 + 4n^2 log n is O(n^3) (Rule 4).

Some functions appear often in the analysis of algorithms and data structures, and we often use special terms to refer to them. Table 1.6 shows some terms commonly used in algorithm analysis.

    logarithmic    O(log n)
    linear         O(n)
    quadratic      O(n^2)
    polynomial     O(n^k)  (k ≥ 1)
    exponential    O(a^n)  (a > 1)

Table 1.6: Terminology for classes of functions.

Using the Big-Oh Notation

It is considered poor taste, in general, to say "f(n) ≤ O(g(n))," since the big-Oh already denotes the "less-than-or-equal-to" concept. Likewise, although common, it is not completely correct to say "f(n) = O(g(n))" (with the usual understanding of the "=" relation), and it is actually incorrect to say "f(n) ≥ O(g(n))" or "f(n) > O(g(n))." It is best to say "f(n) is O(g(n))." For the more mathematically inclined, it is also correct to say, "f(n) ∈ O(g(n))," for the big-Oh notation is, technically speaking, denoting a whole collection of functions.

Even with this interpretation, there is considerable freedom in how we can use arithmetic operations with the big-Oh notation, provided the connection to the definition of the big-Oh is clear. For instance, we can say, "f(n) is g(n) + O(h(n))," which would mean that there are constants c > 0 and n0 ≥ 1 such that f(n) ≤ g(n) + ch(n) for n ≥ n0. As in this example, we may sometimes wish to give the exact leading term in an asymptotic characterization. In that case, we would say that "f(n) is g(n) + O(h(n))," where h(n) grows slower than g(n). For example, we could say that 2n log n + 4n + 10√n is 2n log n + O(n).

Big-Omega and Big-Theta

Just as the big-Oh notation provides an asymptotic way of saying that a function is "less than or equal to" another function, there are other notations that provide asymptotic ways of making other types of comparisons. Let f(n) and g(n) be functions mapping integers to real numbers. We say that f(n) is Ω(g(n)) (pronounced "f(n) is big-Omega of g(n)") if g(n) is O(f(n)); that is, there is a real constant c > 0 and an integer constant n0 ≥ 1 such that f(n) ≥ cg(n), for n ≥ n0. This definition allows us to say asymptotically that one function is greater than or equal to another, up to a constant factor.
Likewise, we say that f(n) is Θ(g(n)) (pronounced "f(n) is big-Theta of g(n)") if f(n) is O(g(n)) and f(n) is Ω(g(n)); that is, there are real constants c′ > 0 and c′′ > 0, and an integer constant n0 ≥ 1 such that c′g(n) ≤ f(n) ≤ c′′g(n), for n ≥ n0. The big-Theta allows us to say that two functions are asymptotically equal, up to a constant factor. We consider some examples of these notations below.

Example 1.9: 3 log n + log log n is Ω(log n).

Proof: 3 log n + log log n ≥ 3 log n, for n ≥ 2.

This example shows that lower-order terms are not dominant in establishing lower bounds with the big-Omega notation. Thus, as the next example sums up, lower-order terms are not dominant in the big-Theta notation either.

Example 1.10: 3 log n + log log n is Θ(log n).

Proof: This follows from Examples 1.4 and 1.9.

Some Words of Caution

A few words of caution about asymptotic notation are in order at this point. First, note that the use of the big-Oh and related notations can be somewhat misleading should the constant factors they "hide" be very large. For example, while it is true that the function 10^100 n is Θ(n), if this is the running time of an algorithm being compared to one whose running time is 10n log n, we should prefer the Θ(n log n)-time algorithm, even though the linear-time algorithm is asymptotically faster. This preference is because the constant factor, 10^100, which is called "one googol," is believed by many astronomers to be an upper bound on the number of atoms in the observable universe. So we are unlikely to ever have a real-world problem that has this number as its input size. Thus, even when using the big-Oh notation, we should at least be somewhat mindful of the constant factors and lower-order terms we are "hiding."

The above observation raises the issue of what constitutes a "fast" algorithm. Generally speaking, any algorithm running in O(n log n) time (with a reasonable constant factor) should be considered efficient. Even an O(n^2)-time method may be fast enough in some contexts—that is, when n is small. But an algorithm running in Θ(2^n) time should never be considered efficient. This fact is illustrated by a famous story about the inventor of the game of chess. He asked only that his king pay him 1 grain of rice for the first square on the board, 2 grains for the second, 4 grains for the third, 8 for the fourth, and so on. But try to imagine the sight of 2^64 grains stacked on the last square! In fact, this number cannot even be represented as a standard long integer in most programming languages.

Therefore, if we must draw a line between efficient and inefficient algorithms, it is natural to make this distinction be that between those algorithms running in polynomial time and those requiring exponential time. That is, make the distinction between algorithms with a running time that is O(n^k), for some constant k ≥ 1, and those with a running time that is Θ(c^n), for some constant c > 1. Like so many notions we have discussed in this section, this too should be taken with a "grain of salt," for an algorithm running in Θ(n^100) time should probably not be considered "efficient." Even so, the distinction between polynomial-time and exponential-time algorithms is considered a robust measure of tractability.
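Before turning to little-oh and little-omega, here is a small numerical sanity check (an addition, not from the book) of the constants used in Examples 1.4, 1.9, and 1.10: for f(n) = 3 log n + log log n, the choices c = 4 and c = 3 with n0 = 2 witness the big-Oh and big-Omega bounds, respectively, and hence the big-Theta bound. The check is, of course, only evidence on a finite range, not a proof.

    import math

    def f(n):
        return 3 * math.log2(n) + math.log2(math.log2(n))

    def check_theta_bounds(limit=10**6):
        """Verify 3 log n <= f(n) <= 4 log n for 2 <= n <= limit; return a counterexample or None."""
        for n in range(2, limit + 1):
            g = math.log2(n)
            if not (3 * g <= f(n) <= 4 * g):
                return n  # no counterexample is expected
        return None

    if __name__ == "__main__":
        bad = check_theta_bounds()
        print("Bounds hold for all tested n" if bad is None else f"Bounds fail at n = {bad}")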
Little-Oh and Little-Omega

There are also some ways of saying that one function is strictly less than or strictly greater than another asymptotically, but these are not used as often as the big-Oh, big-Omega, and big-Theta. Nevertheless, for the sake of completeness, we give their definitions as well.

Let f(n) and g(n) be functions mapping integers to real numbers. We say that f(n) is o(g(n)) (pronounced "f(n) is little-oh of g(n)") if, for any constant c > 0, there is a constant n0 > 0 such that f(n) ≤ cg(n) for n ≥ n0. Likewise, we say that f(n) is ω(g(n)) (pronounced "f(n) is little-omega of g(n)") if g(n) is o(f(n)), that is, if, for any constant c > 0, there is a constant n0 > 0 such that g(n) ≤ cf(n) for n ≥ n0. Intuitively, o(·) is analogous to "less than" in an asymptotic sense, and ω(·) is analogous to "greater than" in an asymptotic sense.

Example 1.11: The function f(n) = 12n^2 + 6n is o(n^3) and ω(n).

Proof: Let us first show that f(n) is o(n^3). Let c > 0 be any constant. If we take n0 = (12 + 6)/c = 18/c, then 18 ≤ cn, for n ≥ n0. Thus, if n ≥ n0,

    f(n) = 12n^2 + 6n ≤ 12n^2 + 6n^2 = 18n^2 ≤ cn^3.

Thus, f(n) is o(n^3). To show that f(n) is ω(n), let c > 0 again be any constant. If we take n0 = c/12, then, for n ≥ n0, 12n ≥ c. Thus, if n ≥ n0,

    f(n) = 12n^2 + 6n ≥ 12n^2 ≥ cn.

Thus, f(n) is ω(n).

For the reader familiar with limits, we note that f(n) is o(g(n)) if and only if

    lim_{n→∞} f(n)/g(n) = 0,

provided this limit exists. The main difference between the little-oh and big-Oh notions is that f(n) is O(g(n)) if there exist constants c > 0 and n0 ≥ 1 such that f(n) ≤ cg(n), for n ≥ n0; whereas f(n) is o(g(n)) if for all constants c > 0 there is a constant n0 such that f(n) ≤ cg(n), for n ≥ n0. Intuitively, f(n) is o(g(n)) if f(n) becomes insignificant compared to g(n) as n grows toward infinity.

As previously mentioned, asymptotic notation is useful because it allows us to concentrate on the main factor determining a function's growth. To summarize, the asymptotic notations of big-Oh, big-Omega, and big-Theta, as well as little-oh and little-omega, provide a convenient language for us to analyze data structures and algorithms. As mentioned earlier, these notations provide convenience because they let us concentrate on the "big picture" rather than low-level details.

1.1.6 The Importance of Asymptotic Notation

Asymptotic notation has many important benefits, which might not be immediately obvious. Specifically, we illustrate one important aspect of the asymptotic viewpoint in Table 1.7. This table explores the maximum size allowed for an input instance for various running times to be solved in 1 second, 1 minute, and 1 hour, assuming each operation can be processed in 1 microsecond (1 μs). It also shows the importance of algorithm design, because an algorithm with an asymptotically slow running time (for example, one that is O(n^2)) is beaten in the long run by an algorithm with an asymptotically faster running time (for example, one that is O(n log n)), even if the constant factor for the faster algorithm is worse.

    Running Time    Maximum Problem Size (n)
                    1 second     1 minute     1 hour
    400n            2,500        150,000      9,000,000
    20n log n       4,096        166,666      7,826,087
    2n^2            707          5,477        42,426
    n^4             31           88           244
    2^n             19           25           31

Table 1.7: Maximum size of a problem that can be solved in one second, one minute, and one hour, for various running times measured in microseconds.
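The entries of Table 1.7 can be reproduced with a short script; the sketch below is an addition for illustration and assumes, as the table does, one primitive operation per microsecond. Its output matches the polynomial and exponential rows exactly and lands close to (though not exactly on) the rounded 20n log n entries, which depend on how the logarithm is approximated.

    import math

    FUNCTIONS = {
        "400n":      lambda n: 400 * n,
        "20n log n": lambda n: 20 * n * math.log2(max(n, 2)),
        "2n^2":      lambda n: 2 * n * n,
        "n^4":       lambda n: n ** 4,
        "2^n":       lambda n: 2 ** n,
    }

    def max_problem_size(f, budget_us):
        """Largest n with f(n) <= budget_us, found by doubling followed by bisection."""
        hi = 1
        while f(2 * hi) <= budget_us:
            hi *= 2
        lo, hi = hi, 2 * hi          # the answer lies in [lo, 2*lo)
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if f(mid) <= budget_us:
                lo = mid
            else:
                hi = mid
        return lo

    if __name__ == "__main__":
        budgets = {"1 second": 10**6, "1 minute": 6 * 10**7, "1 hour": 36 * 10**8}
        for name, f in FUNCTIONS.items():
            row = ", ".join(f"{label}: {max_problem_size(f, b):,}" for label, b in budgets.items())
            print(f"{name:>10}  {row}")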
The importance of good algorithm design goes beyond just what can be solved effectively on a given computer, however. As shown in Table 1.8, even if we achieve a dramatic speedup in hardware, we still cannot overcome the handicap of an asymptotically slow algorithm. This table shows the new maximum problem size achievable for any fixed amount of time, assuming algorithms with the given running times are now run on a computer 256 times faster than the previous one.

  Running Time    New Maximum Problem Size
  400n            256m
  20n log n       approx. 256((log m)/(7 + log m))m
  2n^2            16m
  n^4             4m
  2^n             m + 8

Table 1.8: Increase in the maximum size of a problem that can be solved in a certain fixed amount of time, by using a computer that is 256 times faster than the previous one, for various running times of the algorithm. Each entry is given as a function of m, the previous maximum problem size.

Ordering Functions by Their Growth Rates

Suppose two algorithms solving the same problem are available: an algorithm A, which has a running time of Θ(n), and an algorithm B, which has a running time of Θ(n^2). Which one is better? The little-oh notation says that n is o(n^2), which implies that algorithm A is asymptotically better than algorithm B, although for a given (small) value of n, it is possible for algorithm B to have lower running time than algorithm A. Still, in the long run, as shown in the above tables, the benefits of algorithm A over algorithm B will become clear. In general, we can use the little-oh notation to order functions by asymptotic growth rate, as we show in Table 1.9.

  Some Functions Ordered by Growth Rate    Common Name
  log n                                    logarithmic
  log^2 n                                  polylogarithmic
  √n                                       square root
  n                                        linear
  n log n                                  linearithmic
  n^2                                      quadratic
  n^3                                      cubic
  2^n                                      exponential

Table 1.9: An ordered list of simple functions such that if a function f(n) precedes a function g(n) in the list, then f(n) is o(g(n)). Using common terminology, the function log^c n, for any c > 0, is also polylogarithmic, and the functions n^2 and n^3 are also polynomial.

In Table 1.10, we illustrate the difference in the growth rate of the functions shown in Table 1.9.

  n         log n   log^2 n   √n     n log n      n^2             n^3            2^n
  4         2       4         2      8            16              64             16
  16        4       16        4      64           256             4,096          65,536
  64        6       36        8      384          4,096           262,144        1.84 × 10^19
  256       8       64        16     2,048        65,536          16,777,216     1.15 × 10^77
  1,024     10      100       32     10,240       1,048,576       1.07 × 10^9    1.79 × 10^308
  4,096     12      144       64     49,152       16,777,216      6.87 × 10^10   10^1233
  16,384    14      196       128    229,376      268,435,456     4.4 × 10^12    10^4932
  65,536    16      256       256    1,048,576    4.29 × 10^9     2.81 × 10^14   10^19728
  262,144   18      324       512    4,718,592    6.87 × 10^10    1.8 × 10^16    10^78913

Table 1.10: Growth rates of several functions. Note the point at which the function √n dominates log^2 n.

1.2 A Quick Mathematical Review

In this section, we briefly review some of the fundamental concepts from discrete mathematics that will arise in several of our discussions. In addition to these fundamental concepts, Appendix A includes a list of other useful mathematical facts that apply in the context of data structure and algorithm analysis.

1.2.1 Summations

A notation that appears again and again in the analysis of data structures and algorithms is the summation, which is defined as
∑_{i=a}^{b} f(i) = f(a) + f(a + 1) + f(a + 2) + ⋯ + f(b).
Summations arise in data structure and algorithm analysis because the running times of loops naturally give rise to summations.
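For instance, the following small Python fragment (an illustration, not code from the text) has an inner loop that performs i operations on iteration i of the outer loop, so its total operation count is exactly the summation ∑_{i=1}^{n} i examined below.

    def count_operations(n):
        """Count the constant-time operations performed by a triangular nested loop."""
        operations = 0
        for i in range(1, n + 1):    # outer loop: i = 1, 2, ..., n
            for _ in range(i):       # inner loop performs i operations
                operations += 1
        return operations

    print(count_operations(100))     # 5050, i.e., the sum of 1 through 100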
For example, a summation that often arises in data structure and algorithm analysis is the geometric summation.

Theorem 1.12: For any integer n ≥ 0 and any real number a such that 0 < a ≠ 1, consider
∑_{i=0}^{n} a^i = 1 + a + a^2 + ⋯ + a^n
(remembering that a^0 = 1 if a > 0). This summation is equal to
(1 − a^{n+1}) / (1 − a).

Summations as shown in Theorem 1.12 are called geometric summations, because each term is geometrically larger than the previous one if a > 1. That is, the terms in such a geometric summation exhibit exponential growth. For example, everyone working in computing should know that
1 + 2 + 4 + 8 + ⋯ + 2^{n−1} = 2^n − 1,
for this is the largest integer that can be represented in binary notation using n bits.

Another summation that arises in several contexts is
∑_{i=1}^{n} i = 1 + 2 + 3 + ⋯ + (n − 2) + (n − 1) + n.
This summation often arises in the analysis of loops in cases where the number of operations performed inside the loop increases by a fixed, constant amount with each iteration. This summation also has an interesting history. In 1787, a German elementary schoolteacher decided to keep his 9- and 10-year-old pupils occupied with the task of adding up all the numbers from 1 to 100. But almost immediately after giving this assignment, one of the children claimed to have the answer—5,050. That elementary school student was none other than Carl Gauss, who would grow up to be one of the greatest mathematicians of the 19th century. It is widely suspected that young Gauss derived the answer to his teacher's assignment using the following identity.

Theorem 1.13: For any integer n ≥ 1, we have
∑_{i=1}^{n} i = n(n + 1)/2.

Proof: We give two "visual" justifications of Theorem 1.13 in Figure 1.11, both of which are based on computing the area of a collection of rectangles representing the numbers 1 through n. In Figure 1.11a we draw a big triangle over an ordering of the rectangles, noting that the area of the rectangles is the same as that of the big triangle (n^2/2) plus that of n small triangles, each of area 1/2. In Figure 1.11b, which applies when n is even, we note that 1 plus n is n + 1, as is 2 plus n − 1, 3 plus n − 2, and so on. There are n/2 such pairings.

Figure 1.11: Visual justifications of Theorem 1.13. Both illustrations visualize the identity in terms of the total area covered by n unit-width rectangles with heights 1, 2, ..., n. In (a) the rectangles are shown to cover a big triangle of area n^2/2 (base n and height n) plus n small triangles of area 1/2 each (base 1 and height 1). In (b), which applies only when n is even, the rectangles are shown to cover a big rectangle of base n/2 and height n + 1.

1.2.2 Logarithms and Exponents

One of the interesting and sometimes even surprising aspects of the analysis of data structures and algorithms is the ubiquitous presence of logarithms and exponents, where we say
log_b a = c   if   a = b^c.
As is the custom in the computing literature, we omit writing the base b of the logarithm when b = 2. For example, log 1024 = 10.

There are a number of important rules for logarithms and exponents, including the following:

Theorem 1.14: Let a, b, and c be positive real numbers. We have
1. log_b(ac) = log_b a + log_b c
2. log_b(a/c) = log_b a − log_b c
3. log_b a^c = c log_b a
4. log_b a = (log_c a)/(log_c b)
5. b^{log_c a} = a^{log_c b}
6. (b^a)^c = b^{ac}
7. b^a b^c = b^{a+c}
8. b^a / b^c = b^{a−c}.
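As a quick numerical sanity check (an illustrative sketch, not part of the text), the following Python lines verify Rules 4 and 5 of Theorem 1.14 for one arbitrary choice of a, b, and c; the comparisons hold up to floating-point rounding.

    import math

    a, b, c = 7.0, 2.0, 10.0

    # Rule 4: log_b a = (log_c a) / (log_c b)
    print(math.isclose(math.log(a, b), math.log(a, c) / math.log(b, c)))   # True

    # Rule 5: b^(log_c a) = a^(log_c b)
    print(math.isclose(b ** math.log(a, c), a ** math.log(b, c)))          # True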
Also, as a notational shorthand, we use log^c n to denote the function (log n)^c and we use log log n to denote log(log n). Rather than show how we could derive each of the above identities, which all follow from the definition of logarithms and exponents, let us instead illustrate these identities with a few examples of their usefulness.

Example 1.15: We illustrate some interesting cases when the base of a logarithm or exponent is 2. The rules cited refer to Theorem 1.14.
log(2n log n) = 1 + log n + log log n, by Rule 1 (twice)
log(n/2) = log n − log 2 = log n − 1, by Rule 2
log √n = log n^{1/2} = (log n)/2, by Rule 3
log log √n = log((log n)/2) = log log n − 1, by Rules 2 and 3
log_4 n = (log n)/(log 4) = (log n)/2, by Rule 4
log 2^n = n, by Rule 3
2^{log n} = n, by Rule 5
2^{2 log n} = (2^{log n})^2 = n^2, by Rules 5 and 6
4^n = (2^2)^n = 2^{2n}, by Rule 6
n^2 · 2^{3 log n} = n^2 · n^3 = n^5, by Rules 5, 6, and 7
4^n / 2^n = 2^{2n} / 2^n = 2^{2n−n} = 2^n, by Rules 6 and 8

The Floor and Ceiling Functions

One additional comment concerning logarithms is in order. The value of a logarithm is typically not an integer, yet the running time of an algorithm is typically expressed by means of an integer quantity, such as the number of operations performed. Thus, an algorithm analysis may sometimes involve the use of the so-called "floor" and "ceiling" functions, which are defined respectively as follows:
⌊x⌋ = the largest integer less than or equal to x
⌈x⌉ = the smallest integer greater than or equal to x.
These functions give us a way to convert real-valued functions into integer-valued functions. Even so, functions used to analyze data structures and algorithms are often expressed simply as real-valued functions (for example, n log n or n^{3/2}). We should read such a running time as having a "big" ceiling function surrounding it.¹

¹ Real-valued running-time functions are almost always used in conjunction with the asymptotic notation described in Section 1.1.5, for which the use of the ceiling function would usually be redundant anyway. (See Exercise R-1.25.)

1.2.3 Simple Justification Techniques

We will sometimes wish to make strong claims about a certain data structure or algorithm. We may, for example, wish to show that our algorithm is correct or that it runs fast. In order to rigorously make such claims, we must use mathematical language, and in order to back up such claims, we must justify or prove our statements. Fortunately, there are several simple ways to do this.

By Example

Some claims are of the generic form, "There is an element x in a set S that has property P." To justify such a claim, we need only produce a particular x ∈ S that has property P. Likewise, some hard-to-believe claims are of the generic form, "Every element x in a set S has property P." To justify that such a claim is false, we need only produce a particular x from S that does not have property P. Such an instance is called a counterexample.

Example 1.16: A certain Professor Amongus claims that every number of the form 2^i − 1 is a prime, when i is an integer greater than 1. Professor Amongus is wrong.

Proof: To prove Professor Amongus is wrong, we need to find a counterexample. Fortunately, we need not look too far, for 2^4 − 1 = 15 = 3 · 5.
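A counterexample of this kind can also be found mechanically. The short Python sketch below (illustrative only; the helper function is not from the text) tests successive values of i until it finds one for which 2^i − 1 is composite.

    def is_prime(m):
        """Trial-division primality test; adequate for small m."""
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True

    # Find the smallest i > 1 that refutes Professor Amongus's claim.
    i = 2
    while is_prime(2**i - 1):
        i += 1
    print(i, 2**i - 1)    # 4 15, and indeed 15 = 3 * 5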
Contrapositives and Contradiction

Another set of justification techniques involves the use of the negative. The two primary such methods are the use of the contrapositive and the contradiction. To justify the statement "if p is true, then q is true," we instead establish that "if q is not true, then p is not true." Logically, these two statements are the same, but the latter, which is called the contrapositive of the first, may be easier to think about.

Example 1.17: If ab is odd, then a is odd and b is odd.

Proof: To justify this claim, let us prove the contrapositive, "If a is even or b is even, then ab is even." So, suppose a = 2i or b = 2i, for some integer i. Then we have ab = (2i)b = 2(ib), or we have ab = a(2i) = 2(ai); hence, in either case, ab is even. Since this establishes the contrapositive, it proves the original statement.

Besides showing a use of the contrapositive proof technique, the above example also contains an application of DeMorgan's law. This law helps us deal with negations, for it states that the negation of a statement, "p or q," is "not p and not q," and that the negation of a statement, "p and q," is "not p or not q."

Another justification technique is proof by contradiction, which also often involves using DeMorgan's law. In applying the proof by contradiction technique, we establish that a statement q is true by first supposing that q is false and then showing that this assumption leads to a contradiction (such as 2 ≠ 2 or 1 > 3). By reaching such a contradiction, we show that no consistent situation exists with q being false, so q must be true. Of course, in order to reach this conclusion, we must be sure our situation is consistent before we assume q is false.

Example 1.18: If ab is even, then a is even or b is even.

Proof: Let ab be even. We wish to show that a is even or b is even. So, with the hope of leading to a contradiction, let us assume the opposite, namely, suppose a is odd and b is odd. Then a = 2i + 1 and b = 2j + 1, for some integers i and j. Hence,
ab = (2i + 1)(2j + 1) = 4ij + 2i + 2j + 1 = 2(2ij + i + j) + 1;
that is, ab is odd. But this is a contradiction: ab cannot simultaneously be odd and even. Therefore, a is even or b is even.

Induction

Most of the claims we make about a running time or a space bound involve an integer parameter n (usually denoting an intuitive notion of the "size" of the problem). Moreover, most of these claims are equivalent to saying some statement q(n) is true "for all n ≥ 1." Since this is making a claim about an infinite set of numbers, we cannot justify this exhaustively in a direct fashion.

We can often justify claims such as those above as true, however, by using the technique of induction. This technique amounts to showing that, for any particular n ≥ 1, there is a finite sequence of implications that starts with something known to be true and ultimately leads to showing that q(n) is true. Specifically, we begin a proof by induction by showing that q(n) is true for n = 1 (and possibly some other values n = 2, 3, ..., k, for some constant k). Then we justify that the inductive "step" is true for n > k, namely, we show "if q(i) is true for all i < n, then q(n) is true." The combination of these two pieces completes the proof by induction.

Example 1.19: Consider the Fibonacci sequence: F(1) = 1, F(2) = 2, and F(n) = F(n − 1) + F(n − 2) for n > 2. We claim that F(n) < 2^n.

Proof: We will show our claim is right by induction.
Base cases: (n ≤ 2). F(1) = 1 < 2 = 2^1 and F(2) = 2 < 4 = 2^2.
Induction step: (n > 2). Suppose our claim is true for all n′ < n. Consider F(n). Since n > 2, F(n) = F(n − 1) + F(n − 2). Moreover, since n − 1 < n and n − 2 < n, we can apply the inductive assumption (sometimes called the "inductive hypothesis") to imply that F(n) < 2^{n−1} + 2^{n−2}. In addition,
2^{n−1} + 2^{n−2} < 2^{n−1} + 2^{n−1} = 2 · 2^{n−1} = 2^n,
so F(n) < 2^n.
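For small values of n, the claim can also be checked directly. The sketch below (illustrative only, using the indexing F(1) = 1, F(2) = 2 from Example 1.19) verifies F(n) < 2^n for the first thirty terms.

    def fib(n):
        """Fibonacci numbers as defined in Example 1.19: F(1) = 1, F(2) = 2."""
        if n <= 2:
            return n             # F(1) = 1, F(2) = 2
        a, b = 1, 2
        for _ in range(n - 2):
            a, b = b, a + b      # slide the window: (F(k-1), F(k)) -> (F(k), F(k+1))
        return b

    assert all(fib(n) < 2**n for n in range(1, 31))
    print("F(n) < 2^n holds for n = 1, ..., 30")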
Let us do another inductive argument, this time for a fact we have seen before.

Theorem 1.20: (which is the same as Theorem 1.13)
∑_{i=1}^{n} i = n(n + 1)/2.

Proof: We will justify this equality by induction.
Base case: n = 1. Trivial, for 1 = n(n + 1)/2, if n = 1.
Induction step: n ≥ 2. Assume the claim is true for all n′ < n. Consider n. Then
∑_{i=1}^{n} i = n + ∑_{i=1}^{n−1} i.
By the induction hypothesis, then
∑_{i=1}^{n} i = n + (n − 1)n/2,
which we can simplify as
n + (n − 1)n/2 = (2n + n^2 − n)/2 = (n^2 + n)/2 = n(n + 1)/2.

It is useful to think about the concreteness of the inductive technique. It shows that, for any particular n, there is a finite step-by-step sequence of implications that starts with something true and leads to the truth about n. In short, the inductive argument is a formula for building a sequence of direct proofs.

Loop Invariants

The final justification technique we discuss in this section is the loop invariant. To prove some statement S about a loop is correct, define S in terms of a series of smaller statements S_0, S_1, ..., S_k, where:
1. The initial claim, S_0, is true before the loop begins.
2. If S_{i−1} is true before iteration i begins, then one can show that