Data Structures and Algorithms in Python

Data Structures and Algorithms in Python

Michael T. Goodrich
Department of Computer Science
University of California, Irvine

Roberto Tamassia
Department of Computer Science
Brown University

Michael H. Goldwasser
Department of Mathematics and Computer Science
Saint Louis University

VP & PUBLISHER                  Don Fowley
EXECUTIVE EDITOR                Beth Lang Golub
EDITORIAL PROGRAM ASSISTANT     Katherine Willis
MARKETING MANAGER               Christopher Ruel
DESIGNER                        Kenji Ngieng
SENIOR PRODUCTION MANAGER       Janis Soo
ASSOCIATE PRODUCTION MANAGER    Joyce Poh

This book was set in LaTeX by the authors. Printed and bound by Courier Westford. The cover was printed by Courier Westford. This book is printed on acid free paper.

Founded in 1807, John Wiley & Sons, Inc. has been a valued source of knowledge and understanding for more than 200 years, helping people around the world meet their needs and fulfill their aspirations. Our company is built on a foundation of principles that include responsibility to the communities we serve and where we live and work. In 2008, we launched a Corporate Citizenship Initiative, a global effort to address the environmental, social, economic, and ethical challenges we face in our business. Among the issues we are addressing are carbon impact, paper specifications and procurement, ethical conduct within our business and among our vendors, and community and charitable support. For more information, please visit our website: www.wiley.com/go/citizenship.

Copyright © 2013 John Wiley & Sons, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc.
222 Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201)748-6011, fax (201)748-6008, website http://www.wiley.com/go/permissions.

Evaluation copies are provided to qualified academics and professionals for review purposes only, for use in their courses during the next academic year. These copies are licensed and may not be sold or transferred to a third party. Upon completion of the review period, please return the evaluation copy to Wiley. Return instructions and a free of charge return mailing label are available at www.wiley.com/go/returnlabel. If you have chosen to adopt this textbook for use in your course, please accept this book as your complimentary desk copy. Outside of the United States, please contact your local sales representative.

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

To Karen, Paul, Anna, and Jack – Michael T. Goodrich
To Isabel – Roberto Tamassia
To Susan, Calista, and Maya – Michael H. Goldwasser

Preface

The design and analysis of efficient data structures has long been recognized as a vital subject in computing and is part of the core curriculum of computer science and computer engineering undergraduate degrees. Data Structures and Algorithms in Python provides an introduction to data structures and algorithms, including their design, analysis, and implementation. This book is designed for use in a beginning-level data structures course, or in an intermediate-level introduction to algorithms course. We discuss its use for such courses in more detail later in this preface.

To promote the development of robust and reusable software, we have tried to take a consistent object-oriented viewpoint throughout this text.
One of the main ideas of the object-oriented approach is that data should be presented as being encapsulated with the methods that access and modify them. That is, rather than simply viewing data as a collection of bytes and addresses, we think of data objects as instances of an abstract data type (ADT), which includes a repertoire of methods for performing operations on data objects of this type. We then emphasize that there may be several different implementation strategies for a particular ADT, and explore the relative pros and cons of these choices. We provide complete Python implementations for almost all data structures and algorithms discussed, and we introduce important object-oriented design patterns as means to organize those implementations into reusable components.

Desired outcomes for readers of our book include that:

• They have knowledge of the most common abstractions for data collections (e.g., stacks, queues, lists, trees, maps).
• They understand algorithmic strategies for producing efficient realizations of common data structures.
• They can analyze algorithmic performance, both theoretically and experimentally, and recognize common trade-offs between competing strategies.
• They can wisely use existing data structures and algorithms found in modern programming language libraries.
• They have experience working with concrete implementations for most foundational data structures and algorithms.
• They can apply data structures and algorithms to solve complex problems.

In support of the last goal, we present many example applications of data structures throughout the book, including the processing of file systems, matching of tags in structured formats such as HTML, simple cryptography, text frequency analysis, automated geometric layout, Huffman coding, DNA sequence alignment, and search engine indexing.
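The ADT viewpoint described earlier in this preface can be illustrated with a minimal sketch. This is not code from the book: the names Stack, ArrayStack, LinkedStack, and drain are hypothetical, chosen only to show one abstract interface backed by two different implementation strategies.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """Hypothetical stack ADT: it names the operations, not the storage."""

    @abstractmethod
    def push(self, item):
        """Add item to the top of the stack."""

    @abstractmethod
    def pop(self):
        """Remove and return the top item; raise IndexError if empty."""

    @abstractmethod
    def __len__(self):
        """Return the number of stored items."""

    def is_empty(self):
        return len(self) == 0


class ArrayStack(Stack):
    """Implementation strategy 1: store items in a Python list."""

    def __init__(self):
        self._data = []

    def push(self, item):
        self._data.append(item)

    def pop(self):
        if not self._data:
            raise IndexError("pop from empty stack")
        return self._data.pop()

    def __len__(self):
        return len(self._data)


class LinkedStack(Stack):
    """Implementation strategy 2: a chain of (item, next) pairs."""

    def __init__(self):
        self._head = None  # None, or a tuple (item, rest_of_chain)
        self._size = 0

    def push(self, item):
        self._head = (item, self._head)
        self._size += 1

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        item, self._head = self._head
        self._size -= 1
        return item

    def __len__(self):
        return self._size


def drain(stack, items):
    """Push every item, then pop until empty; relies only on the ADT."""
    for x in items:
        stack.push(x)
    out = []
    while not stack.is_empty():
        out.append(stack.pop())
    return out


print(drain(ArrayStack(), [1, 2, 3]))
print(drain(LinkedStack(), [1, 2, 3]))
```

Because drain uses only the operations declared by the abstract class, it behaves the same with either implementation; the two classes differ only in how they store the items, which is where the trade-offs between strategies arise.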
Book Features

This book is based upon the book Data Structures and Algorithms in Java by Goodrich and Tamassia, and the related Data Structures and Algorithms in C++ by Goodrich, Tamassia, and Mount. However, this book is not simply a translation of those other books to Python. In adapting the material for this book, we have significantly redesigned the organization and content of the book as follows:

• The code base has been entirely redesigned to take advantage of the features of Python, such as use of generators for iterating elements of a collection.
• Many algorithms that were presented as pseudo-code in the Java and C++ versions are directly presented as complete Python code.
• In general, ADTs are defined to have consistent interface with Python’s built-in data types and those in Python’s collections module.
• Chapter 5 provides an in-depth exploration of the dynamic array-based underpinnings of Python’s built-in list, tuple, and str classes.
• New Appendix A serves as an additional reference regarding the functionality of the str class.
• Over 450 illustrations have been created or revised.
• New and revised exercises bring the overall total number to 750.

Online Resources

This book is accompanied by an extensive set of online resources, which can be found at the following Web site:

www.wiley.com/college/goodrich

Students are encouraged to use this site along with the book, to help with exercises and increase understanding of the subject. Instructors are likewise welcome to use the site to help plan, organize, and present their course materials. Included on this Web site is a collection of educational aids that augment the topics of this book, for both students and instructors. Because of their added value, some of these online resources are password protected.

For all readers, and especially for students, we include the following resources:

• All the Python source code presented in this book.
• PDF handouts of Powerpoint slides (four-per-page) provided to instructors.
• A database of hints to all exercises, indexed by problem number.

For instructors using this book, we include the following additional teaching aids:

• Solutions to hundreds of the book’s exercises.
• Color versions of all figures and illustrations from the book.
• Slides in Powerpoint and PDF (one-per-page) format.

The slides are fully editable, so as to allow an instructor using this book full freedom in customizing his or her presentations. All the online resources are provided at no extra charge to any instructor adopting this book for his or her course.

Contents and Organization

The chapters for this book are organized to provide a pedagogical path that starts with the basics of Python programming and object-oriented design. We then add foundational techniques like algorithm analysis and recursion. In the main portion of the book, we present fundamental data structures and algorithms, concluding with a discussion of memory management (that is, the architectural underpinnings of data structures). Specifically, the chapters for this book are organized as follows:

1. Python Primer
2. Object-Oriented Programming
3. Algorithm Analysis
4. Recursion
5. Array-Based Sequences
6. Stacks, Queues, and Deques
7. Linked Lists
8. Trees
9. Priority Queues
10. Maps, Hash Tables, and Skip Lists
11. Search Trees
12. Sorting and Selection
13. Text Processing
14. Graph Algorithms
15. Memory Management and B-Trees
A. Character Strings in Python
B. Useful Mathematical Facts

A more detailed table of contents follows this preface, beginning on page xi.

Prerequisites

We assume that the reader is at least vaguely familiar with a high-level programming language, such as C, C++, Python, or Java, and that he or she understands the main constructs from such a high-level language, including:

• Variables and expressions.
• Decision structures (such as if-statements and switch-statements).
• Iteration structures (for loops and while loops).
• Functions (whether stand-alone or object-oriented methods).

For readers who are familiar with these concepts, but not with how they are expressed in Python, we provide a primer on the Python language in Chapter 1. Still, this book is primarily a data structures book, not a Python book; hence, it does not give a comprehensive treatment of Python.

We delay treatment of object-oriented programming in Python until Chapter 2. This chapter is useful for those new to Python, and for those who may be familiar with Python, yet not with object-oriented programming.

In terms of mathematical background, we assume the reader is somewhat familiar with topics from high-school mathematics. Even so, in Chapter 3, we discuss the seven most-important functions for algorithm analysis. In fact, sections that use something other than one of these seven functions are considered optional, and are indicated with a star (⋆). We give a summary of other useful mathematical facts, including elementary probability, in Appendix B.

Relation to Computer Science Curriculum

To assist instructors in designing a course in the context of the IEEE/ACM 2013 Computing Curriculum, the following table describes curricular knowledge units that are covered within this book.
Knowledge Unit                                    Relevant Material
AL/Basic Analysis                                 Chapter 3 and Sections 4.2 & 12.2.4
AL/Algorithmic Strategies                         Sections 12.2.1, 13.2.1, 13.3, & 13.4.2
AL/Fundamental Data Structures and Algorithms     Sections 4.1.3, 5.5.2, 9.4.1, 9.3, 10.2, 11.1, 13.2, Chapter 12 & much of Chapter 14
AL/Advanced Data Structures                       Sections 5.3, 10.4, 11.2 through 11.6, 12.3.1, 13.5, 14.5.1, & 15.3
AR/Memory System Organization and Architecture    Chapter 15
DS/Sets, Relations and Functions                  Sections 10.5.1, 10.5.2, & 9.4
DS/Proof Techniques                               Sections 3.4, 4.2, 5.3.2, 9.3.6, & 12.4.1
DS/Basics of Counting                             Sections 2.4.2, 6.2.2, 12.2.4, 8.2.2 & Appendix B
DS/Graphs and Trees                               Much of Chapters 8 and 14
DS/Discrete Probability                           Sections 1.11.1, 10.2, 10.4.2, & 12.3.1
PL/Object-Oriented Programming                    Much of the book, yet especially Chapter 2 and Sections 7.4, 9.5.1, 10.1.3, & 11.2.1
PL/Functional Programming                         Section 1.10
SDF/Algorithms and Design                         Sections 2.1, 3.3, & 12.2.1
SDF/Fundamental Programming Concepts              Chapters 1 & 4
SDF/Fundamental Data Structures                   Chapters 6 & 7, Appendix A, and Sections 1.2.1, 5.2, 5.4, 9.1, & 10.1
SDF/Developmental Methods                         Sections 1.7 & 2.2
SE/Software Design                                Sections 2.1 & 2.1.3

Mapping IEEE/ACM 2013 Computing Curriculum knowledge units to coverage in this book.

About the Authors

Michael Goodrich received his Ph.D. in Computer Science from Purdue University in 1987. He is currently a Chancellor’s Professor in the Department of Computer Science at University of California, Irvine. Previously, he was a professor at Johns Hopkins University. He is a Fulbright Scholar and a Fellow of the American Association for the Advancement of Science (AAAS), Association for Computing Machinery (ACM), and Institute of Electrical and Electronics Engineers (IEEE). He is a recipient of the IEEE Computer Society Technical Achievement Award, the ACM Recognition of Service Award, and the Pond Award for Excellence in Undergraduate Teaching.

Roberto Tamassia received his Ph.D.
in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 1988. He is the Plastech Professor of Computer Science and the Chair of the Department of Computer Science at Brown University. He is also the Director of Brown’s Center for Geometric Computing. His research interests include information security, cryptography, analysis, design, and implementation of algorithms, graph drawing and computational geometry. He is a Fellow of the American Association for the Advancement of Science (AAAS), Association for Computing Machinery (ACM) and Institute of Electrical and Electronics Engineers (IEEE). He is also a recipient of the Technical Achievement Award from the IEEE Computer Society.

Michael Goldwasser received his Ph.D. in Computer Science from Stanford University in 1997. He is currently a Professor in the Department of Mathematics and Computer Science at Saint Louis University and the Director of their Computer Science program. Previously, he was a faculty member in the Department of Computer Science at Loyola University Chicago. His research interests focus on the design and implementation of algorithms, having published work involving approximation algorithms, online computation, computational biology, and computational geometry. He is also active in the computer science education community.

Additional Books by These Authors

• M.T. Goodrich and R. Tamassia, Data Structures and Algorithms in Java, Wiley.
• M.T. Goodrich, R. Tamassia, and D.M. Mount, Data Structures and Algorithms in C++, Wiley.
• M.T. Goodrich and R. Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples, Wiley.
• M.T. Goodrich and R. Tamassia, Introduction to Computer Security, Addison-Wesley.
• M.H. Goldwasser and D. Letscher, Object-Oriented Programming in Python, Prentice Hall.

Acknowledgments

We have depended greatly upon the contributions of many individuals as part of the development of this book.
We begin by acknowledging the wonderful team at Wiley. We are grateful to our editor, Beth Golub, for her enthusiastic support of this project, from beginning to end. The efforts of Elizabeth Mills and Katherine Willis were critical in keeping the project moving, from its early stages as an initial proposal, through the extensive peer review process. We greatly appreciate the attention to detail demonstrated by Julie Kennedy, the copyeditor for this book. Finally, many thanks are due to Joyce Poh for managing the final months of the production process.

We are truly indebted to the outside reviewers and readers for their copious comments, emails, and constructive criticism, which were extremely useful in writing this edition. We therefore thank the following reviewers for their comments and suggestions: Claude Anderson (Rose-Hulman Institute of Technology), Alistair Campbell (Hamilton College), Barry Cohen (New Jersey Institute of Technology), Robert Franks (Central College), Andrew Harrington (Loyola University Chicago), Dave Musicant (Carleton College), and Victor Norman (Calvin College). We wish to particularly acknowledge Claude for going above and beyond the call of duty, providing us with an enumeration of 400 detailed corrections or suggestions.

We thank David Mount, of University of Maryland, for graciously sharing the wisdom gained from his experience with the C++ version of this text. We are grateful to Erin Chambers and David Letscher, of Saint Louis University, for their intangible contributions during many hallway conversations about the teaching of data structures, and to David for comments on early versions of the Python code base for this book. We thank David Zampino, a student at Loyola University Chicago, for his feedback while using a draft of this book during an independent study course, and Andrew Harrington for supervising David’s studies.
We also wish to reiterate our thanks to the many research collaborators and teaching assistants whose feedback shaped the previous Java and C++ versions of this material. The benefits of those contributions carry forward to this book.

Finally, we would like to warmly thank Susan Goldwasser, Isabel Cruz, Karen Goodrich, Giuseppe Di Battista, Franco Preparata, Ioannis Tollis, and our parents for providing advice, encouragement, and support at various stages of the preparation of this book, and Calista and Maya Goldwasser for offering their advice regarding the artistic merits of many illustrations. More importantly, we thank all of these people for reminding us that there are things in life beyond writing books.

Michael T. Goodrich
Roberto Tamassia
Michael H. Goldwasser

Contents

Preface

1 Python Primer
  1.1 Python Overview
    1.1.1 The Python Interpreter
    1.1.2 Preview of a Python Program
  1.2 Objects in Python
    1.2.1 Identifiers, Objects, and the Assignment Statement
    1.2.2 Creating and Using Objects
    1.2.3 Python’s Built-In Classes
  1.3 Expressions, Operators, and Precedence
    1.3.1 Compound Expressions and Operator Precedence
  1.4 Control Flow
    1.4.1 Conditionals
    1.4.2 Loops
  1.5 Functions
    1.5.1 Information Passing
    1.5.2 Python’s Built-In Functions
  1.6 Simple Input and Output
    1.6.1 Console Input and Output
    1.6.2 Files
  1.7 Exception Handling
    1.7.1 Raising an Exception
    1.7.2 Catching an Exception
  1.8 Iterators and Generators
  1.9 Additional Python Conveniences
    1.9.1 Conditional Expressions
    1.9.2 Comprehension Syntax
    1.9.3 Packing and Unpacking of Sequences
  1.10 Scopes and Namespaces
  1.11 Modules and the Import Statement
    1.11.1 Existing Modules
  1.12 Exercises

2 Object-Oriented Programming
  2.1 Goals, Principles, and Patterns
    2.1.1 Object-Oriented Design Goals
    2.1.2 Object-Oriented Design Principles
    2.1.3 Design Patterns
  2.2 Software Development
    2.2.1 Design
    2.2.2 Pseudo-Code
    2.2.3 Coding Style and Documentation
    2.2.4 Testing and Debugging
  2.3 Class Definitions
    2.3.1 Example: CreditCard Class
    2.3.2 Operator Overloading and Python’s Special Methods
    2.3.3 Example: Multidimensional Vector Class
    2.3.4 Iterators
    2.3.5 Example: Range Class
  2.4 Inheritance
    2.4.1 Extending the CreditCard Class
    2.4.2 Hierarchy of Numeric Progressions
    2.4.3 Abstract Base Classes
  2.5 Namespaces and Object-Orientation
    2.5.1 Instance and Class Namespaces
    2.5.2 Name Resolution and Dynamic Dispatch
  2.6 Shallow and Deep Copying
  2.7 Exercises

3 Algorithm Analysis
  3.1 Experimental Studies
    3.1.1 Moving Beyond Experimental Analysis
  3.2 The Seven Functions Used in This Book
    3.2.1 Comparing Growth Rates
  3.3 Asymptotic Analysis
    3.3.1 The “Big-Oh” Notation
    3.3.2 Comparative Analysis
    3.3.3 Examples of Algorithm Analysis
  3.4 Simple Justification Techniques
    3.4.1 By Example
    3.4.2 The “Contra” Attack
    3.4.3 Induction and Loop Invariants
  3.5 Exercises

4 Recursion
  4.1 Illustrative Examples
    4.1.1 The Factorial Function
    4.1.2 Drawing an English Ruler
    4.1.3 Binary Search
    4.1.4 File Systems
  4.2 Analyzing Recursive Algorithms
  4.3 Recursion Run Amok
    4.3.1 Maximum Recursive Depth in Python
  4.4 Further Examples of Recursion
    4.4.1 Linear Recursion
    4.4.2 Binary Recursion
    4.4.3 Multiple Recursion
  4.5 Designing Recursive Algorithms
  4.6 Eliminating Tail Recursion
  4.7 Exercises

5 Array-Based Sequences
  5.1 Python’s Sequence Types
  5.2 Low-Level Arrays
    5.2.1 Referential Arrays
    5.2.2 Compact Arrays in Python
  5.3 Dynamic Arrays and Amortization
    5.3.1 Implementing a Dynamic Array
    5.3.2 Amortized Analysis of Dynamic Arrays
    5.3.3 Python’s List Class
  5.4 Efficiency of Python’s Sequence Types
    5.4.1 Python’s List and Tuple Classes
    5.4.2 Python’s String Class
  5.5 Using Array-Based Sequences
    5.5.1 Storing High Scores for a Game
    5.5.2 Sorting a Sequence
    5.5.3 Simple Cryptography
  5.6 Multidimensional Data Sets
  5.7 Exercises

6 Stacks, Queues, and Deques
  6.1 Stacks
    6.1.1 The Stack Abstract Data Type
    6.1.2 Simple Array-Based Stack Implementation
    6.1.3 Reversing Data Using a Stack
    6.1.4 Matching Parentheses and HTML Tags
  6.2 Queues
    6.2.1 The Queue Abstract Data Type
    6.2.2 Array-Based Queue Implementation
  6.3 Double-Ended Queues
    6.3.1 The Deque Abstract Data Type
    6.3.2 Implementing a Deque with a Circular Array
    6.3.3 Deques in the Python Collections Module
  6.4 Exercises

7 Linked Lists
  7.1 Singly Linked Lists
    7.1.1 Implementing a Stack with a Singly Linked List
    7.1.2 Implementing a Queue with a Singly Linked List
  7.2 Circularly Linked Lists
    7.2.1 Round-Robin Schedulers
    7.2.2 Implementing a Queue with a Circularly Linked List
  7.3 Doubly Linked Lists
    7.3.1 Basic Implementation of a Doubly Linked List
    7.3.2 Implementing a Deque with a Doubly Linked List
  7.4 The Positional List ADT
    7.4.1 The Positional List Abstract Data Type
    7.4.2 Doubly Linked List Implementation
  7.5 Sorting a Positional List
  7.6 Case Study: Maintaining Access Frequencies
    7.6.1 Using a Sorted List
    7.6.2 Using a List with the Move-to-Front Heuristic
  7.7 Link-Based vs. Array-Based Sequences
  7.8 Exercises

8 Trees
  8.1 General Trees
    8.1.1 Tree Definitions and Properties
    8.1.2 The Tree Abstract Data Type
    8.1.3 Computing Depth and Height
  8.2 Binary Trees
    8.2.1 The Binary Tree Abstract Data Type
    8.2.2 Properties of Binary Trees
  8.3 Implementing Trees
    8.3.1 Linked Structure for Binary Trees
    8.3.2 Array-Based Representation of a Binary Tree
    8.3.3 Linked Structure for General Trees
  8.4 Tree Traversal Algorithms
    8.4.1 Preorder and Postorder Traversals of General Trees
    8.4.2 Breadth-First Tree Traversal
    8.4.3 Inorder Traversal of a Binary Tree
    8.4.4 Implementing Tree Traversals in Python
    8.4.5 Applications of Tree Traversals
    8.4.6 Euler Tours and the Template Method Pattern ⋆
  8.5 Case Study: An Expression Tree
  8.6 Exercises

9 Priority Queues
  9.1 The Priority Queue Abstract Data Type
    9.1.1 Priorities
    9.1.2 The Priority Queue ADT
  9.2 Implementing a Priority Queue
    9.2.1 The Composition Design Pattern
    9.2.2 Implementation with an Unsorted List
    9.2.3 Implementation with a Sorted List
  9.3 Heaps
    9.3.1 The Heap Data Structure
    9.3.2 Implementing a Priority Queue with a Heap
    9.3.3 Array-Based Representation of a Complete Binary Tree
    9.3.4 Python Heap Implementation
    9.3.5 Analysis of a Heap-Based Priority Queue
    9.3.6 Bottom-Up Heap Construction ⋆
    9.3.7 Python’s heapq Module
  9.4 Sorting with a Priority Queue
    9.4.1 Selection-Sort and Insertion-Sort
    9.4.2 Heap-Sort
  9.5 Adaptable Priority Queues
    9.5.1 Locators
    9.5.2 Implementing an Adaptable Priority Queue
  9.6 Exercises

10 Maps, Hash Tables, and Skip Lists
  10.1 Maps and Dictionaries
    10.1.1 The Map ADT
    10.1.2 Application: Counting Word Frequencies
    10.1.3 Python’s MutableMapping Abstract Base Class
    10.1.4 Our MapBase Class
    10.1.5 Simple Unsorted Map Implementation
  10.2 Hash Tables
    10.2.1 Hash Functions
    10.2.2 Collision-Handling Schemes
    10.2.3 Load Factors, Rehashing, and Efficiency
    10.2.4 Python Hash Table Implementation
  10.3 Sorted Maps
    10.3.1 Sorted Search Tables
    10.3.2 Two Applications of Sorted Maps
  10.4 Skip Lists
    10.4.1 Search and Update Operations in a Skip List
    10.4.2 Probabilistic Analysis of Skip Lists ⋆
  10.5 Sets, Multisets, and Multimaps
    10.5.1 The Set ADT
    10.5.2 Python’s MutableSet Abstract Base Class
    10.5.3 Implementing Sets, Multisets, and Multimaps
  10.6 Exercises

11 Search Trees
  11.1 Binary Search Trees
    11.1.1 Navigating a Binary Search Tree
    11.1.2 Searches
    11.1.3 Insertions and Deletions
    11.1.4 Python Implementation
    11.1.5 Performance of a Binary Search Tree
  11.2 Balanced Search Trees
    11.2.1 Python Framework for Balancing Search Trees
  11.3 AVL Trees
    11.3.1 Update Operations
    11.3.2 Python Implementation
  11.4 Splay Trees
    11.4.1 Splaying
    11.4.2 When to Splay
    11.4.3 Python Implementation
    11.4.4 Amortized Analysis of Splaying ⋆
  11.5 (2,4) Trees
    11.5.1 Multiway Search Trees
    11.5.2 (2,4)-Tree Operations
  11.6 Red-Black Trees
    11.6.1 Red-Black Tree Operations
    11.6.2 Python Implementation
  11.7 Exercises

12 Sorting and Selection
  12.1 Why Study Sorting Algorithms?
  12.2 Merge-Sort
    12.2.1 Divide-and-Conquer
    12.2.2 Array-Based Implementation of Merge-Sort
    12.2.3 The Running Time of Merge-Sort
    12.2.4 Merge-Sort and Recurrence Equations ⋆
    12.2.5 Alternative Implementations of Merge-Sort
  12.3 Quick-Sort
    12.3.1 Randomized Quick-Sort
    12.3.2 Additional Optimizations for Quick-Sort
  12.4 Studying Sorting through an Algorithmic Lens
    12.4.1 Lower Bound for Sorting
    12.4.2 Linear-Time Sorting: Bucket-Sort and Radix-Sort
  12.5 Comparing Sorting Algorithms
  12.6 Python’s Built-In Sorting Functions
    12.6.1 Sorting According to a Key Function
  12.7 Selection
    12.7.1 Prune-and-Search
    12.7.2 Randomized Quick-Select
    12.7.3 Analyzing Randomized Quick-Select
  12.8 Exercises

13 Text Processing
  13.1 Abundance of Digitized Text
    13.1.1 Notations for Strings and the Python str Class
  13.2 Pattern-Matching Algorithms
    13.2.1 Brute Force
    13.2.2 The Boyer-Moore Algorithm
    13.2.3 The Knuth-Morris-Pratt Algorithm
  13.3 Dynamic Programming
    13.3.1 Matrix Chain-Product
    13.3.2 DNA and Text Sequence Alignment
  13.4 Text Compression and the Greedy Method
    13.4.1 The Huffman Coding Algorithm
    13.4.2 The Greedy Method
  13.5 Tries
    13.5.1 Standard Tries
    13.5.2 Compressed Tries
    13.5.3 Suffix Tries
    13.5.4 Search Engine Indexing
  13.6 Exercises

14 Graph Algorithms
  14.1 Graphs
    14.1.1 The Graph ADT
  14.2 Data Structures for Graphs
    14.2.1 Edge List Structure
    14.2.2 Adjacency List Structure
    14.2.3 Adjacency Map Structure
    14.2.4 Adjacency Matrix Structure
    14.2.5 Python Implementation
  14.3 Graph Traversals
    14.3.1 Depth-First Search
    14.3.2 DFS Implementation and Extensions
    14.3.3 Breadth-First Search
  14.4 Transitive Closure
  14.5 Directed Acyclic Graphs
    14.5.1 Topological Ordering
  14.6 Shortest Paths
    14.6.1 Weighted Graphs
    14.6.2 Dijkstra’s Algorithm
  14.7 Minimum Spanning Trees
    14.7.1 Prim-Jarník Algorithm
    14.7.2 Kruskal’s Algorithm
    14.7.3 Disjoint Partitions and Union-Find Structures
  14.8 Exercises

15 Memory Management and B-Trees
  15.1 Memory Management
    15.1.1 Memory Allocation
    15.1.2 Garbage Collection
    15.1.3 Additional Memory Used by the Python Interpreter
  15.2 Memory Hierarchies and Caching
    15.2.1 Memory Systems
    15.2.2 Caching Strategies
  15.3 External Searching and B-Trees
    15.3.1 (a,b) Trees
    15.3.2 B-Trees
  15.4 External-Memory Sorting
    15.4.1 Multiway Merging
  15.5 Exercises

A Character Strings in Python
B Useful Mathematical Facts
Bibliography
Index

Chapter 1
Python Primer

Contents
  1.1 Python Overview
    1.1.1 The Python Interpreter
    1.1.2 Preview of a Python Program
  1.2 Objects in Python
    1.2.1 Identifiers, Objects, and the Assignment Statement
    1.2.2 Creating and Using Objects
    1.2.3 Python’s Built-In Classes
  1.3 Expressions, Operators, and Precedence
    1.3.1 Compound Expressions and Operator Precedence
  1.4 Control Flow
    1.4.1 Conditionals
    1.4.2 Loops
  1.5 Functions
    1.5.1 Information Passing
    1.5.2 Python’s Built-In Functions
  1.6 Simple Input and Output
    1.6.1 Console Input and Output
    1.6.2 Files
  1.7 Exception Handling
    1.7.1 Raising an Exception
    1.7.2 Catching an Exception
  1.8 Iterators and Generators
  1.9 Additional Python Conveniences
    1.9.1 Conditional Expressions
    1.9.2 Comprehension Syntax
    1.9.3 Packing and Unpacking of Sequences
  1.10 Scopes and Namespaces
  1.11 Modules and the Import Statement
    1.11.1 Existing Modules
  1.12 Exercises

1.1 Python Overview

Building data structures and algorithms requires that we communicate detailed instructions to a computer. An excellent way to perform such communication is to use a high-level computer language, such as Python. The Python programming language was originally developed by Guido van Rossum in the early 1990s, and has since become a prominently used language in industry and education. The second major version of the language, Python 2, was released in 2000, and the third major version, Python 3, was released in 2008. We note that there are significant incompatibilities between Python 2 and Python 3. This book is based on Python 3 (more specifically, Python 3.1 or later). The latest version of the language is freely available at www.python.org, along with documentation and tutorials.

In this chapter, we provide an overview of the Python programming language, and we continue this discussion in the next chapter, focusing on object-oriented principles. We assume that readers of this book have prior programming experience, although not necessarily using Python. This book does not provide a complete description of the Python language (there are numerous language references for that purpose), but it does introduce all aspects of the language that are used in code fragments later in this book.

1.1.1 The Python Interpreter

Python is formally an interpreted language. Commands are executed through a piece of software known as the Python interpreter.
The interpreter receives a command, evaluates that command, and reports the result of the command. While the interpreter can be used interactively (especially when debugging), a programmer typically defines a series of commands in advance and saves those commands in a plain text file known as source code or a script. For Python, source code is conventionally stored in a file named with the .py suffix (e.g., demo.py).

On most operating systems, the Python interpreter can be started by typing python from the command line. By default, the interpreter starts in interactive mode with a clean workspace. Commands from a predefined script saved in a file (e.g., demo.py) are executed by invoking the interpreter with the filename as an argument (e.g., python demo.py), or using an additional -i flag in order to execute a script and then enter interactive mode (e.g., python -i demo.py).

Many integrated development environments (IDEs) provide richer software development platforms for Python, including one named IDLE that is included with the standard Python distribution. IDLE provides an embedded text editor with support for displaying and editing Python code, and a basic debugger, allowing step-by-step execution of a program while examining key variable values.

1.1.2 Preview of a Python Program

As a simple introduction, Code Fragment 1.1 presents a Python program that computes the grade-point average (GPA) for a student based on letter grades that are entered by a user. Many of the techniques demonstrated in this example will be discussed in the remainder of this chapter. At this point, we draw attention to a few high-level issues, for readers who are new to Python as a programming language.

Python’s syntax relies heavily on the use of whitespace.
Individual statements are typically concluded with a newline character, although a command can extend to another line, either with a concluding backslash character (\), or if an opening delimiter has not yet been closed, such as the { character in defining the points dictionary.

Whitespace is also key in delimiting the bodies of control structures in Python. Specifically, a block of code is indented to designate it as the body of a control structure, and nested control structures use increasing amounts of indentation. In Code Fragment 1.1, the body of the while loop consists of the subsequent 8 lines, including a nested conditional structure.

Comments are annotations provided for human readers, yet ignored by the Python interpreter. The primary syntax for comments in Python is based on use of the # character, which designates the remainder of the line as a comment.

    print('Welcome to the GPA calculator.')
    print('Please enter all your letter grades, one per line.')
    print('Enter a blank line to designate the end.')
    # map from letter grade to point value
    points = {'A+':4.0, 'A':4.0, 'A-':3.67, 'B+':3.33, 'B':3.0, 'B-':2.67,
              'C+':2.33, 'C':2.0, 'C-':1.67, 'D+':1.33, 'D':1.0, 'F':0.0}
    num_courses = 0
    total_points = 0
    done = False
    while not done:
        grade = input()                          # read line from user
        if grade == '':                          # empty line was entered
            done = True
        elif grade not in points:                # unrecognized grade entered
            print("Unknown grade {0} being ignored".format(grade))
        else:
            num_courses += 1
            total_points += points[grade]
    if num_courses > 0:                          # avoid division by zero
        print('Your GPA is {0:.3}'.format(total_points / num_courses))

Code Fragment 1.1: A Python program that computes a grade-point average (GPA).

1.2 Objects in Python

Python is an object-oriented language and classes form the basis for all data types.
In this section, we describe key aspects of Python’s object model, and we introduce Python’s built-in classes, such as the int class for integers, the float class for floating-point values, and the str class for character strings. A more thorough presentation of object-orientation is the focus of Chapter 2.

1.2.1 Identifiers, Objects, and the Assignment Statement

The most important of all Python commands is an assignment statement, such as

    temperature = 98.6

This command establishes temperature as an identifier (also known as a name), and then associates it with the object expressed on the right-hand side of the equal sign, in this case a floating-point object with value 98.6. We portray the outcome of this assignment in Figure 1.1.

Figure 1.1: The identifier temperature references an instance of the float class having value 98.6.

Identifiers

Identifiers in Python are case-sensitive, so temperature and Temperature are distinct names. Identifiers can be composed of almost any combination of letters, numerals, and underscore characters (or more general Unicode characters). The primary restrictions are that an identifier cannot begin with a numeral (thus 9lives is an illegal name), and that there are 33 specially reserved words that cannot be used as identifiers, as shown in Table 1.1.

    Reserved Words
    False   as       continue   else      from     in         not     return   yield
    None    assert   def        except    global   is         or      try
    True    break    del        finally   if       lambda     pass    while
    and     class    elif       for       import   nonlocal   raise   with

Table 1.1: A listing of the reserved words in Python. These names cannot be used as identifiers.

For readers familiar with other programming languages, the semantics of a Python identifier is most similar to a reference variable in Java or a pointer variable in C++. Each identifier is implicitly associated with the memory address of the object to which it refers.
A Python identifier may be assigned to a special object named None, serving a similar purpose to a null reference in Java or C++.

Unlike Java and C++, Python is a dynamically typed language, as there is no advance declaration associating an identifier with a particular data type. An identifier can be associated with any type of object, and it can later be reassigned to another object of the same (or different) type. Although an identifier has no declared type, the object to which it refers has a definite type. In our first example, the characters 98.6 are recognized as a floating-point literal, and thus the identifier temperature is associated with an instance of the float class having that value.

A programmer can establish an alias by assigning a second identifier to an existing object. Continuing with our earlier example, Figure 1.2 portrays the result of a subsequent assignment, original = temperature.

Figure 1.2: Identifiers temperature and original are aliases for the same object.

Once an alias has been established, either name can be used to access the underlying object. If that object supports behaviors that affect its state, changes enacted through one alias will be apparent when using the other alias (because they refer to the same object). However, if one of the names is reassigned to a new value using a subsequent assignment statement, that does not affect the aliased object; rather, it breaks the alias. Continuing with our concrete example, we consider the command:

    temperature = temperature + 5.0

The execution of this command begins with the evaluation of the expression on the right-hand side of the = operator. That expression, temperature + 5.0, is evaluated based on the existing binding of the name temperature, and so the result has value 103.6, that is, 98.6 + 5.0.
That result is stored as a new floating-point instance, and only then is the name on the left-hand side of the assignment statement, temperature, (re)assigned to the result. The subsequent configuration is diagrammed in Figure 1.3. Of particular note, this last command had no effect on the value of the existing float instance that identifier original continues to reference.

Figure 1.3: The temperature identifier has been assigned to a new value, while original continues to refer to the previously existing value.

1.2.2 Creating and Using Objects

Instantiation

The process of creating a new instance of a class is known as instantiation. In general, the syntax for instantiating an object is to invoke the constructor of a class. For example, if there were a class named Widget, we could create an instance of that class using a syntax such as w = Widget(), assuming that the constructor does not require any parameters. If the constructor does require parameters, we might use a syntax such as Widget(a, b, c) to construct a new instance.

Many of Python’s built-in classes (discussed in Section 1.2.3) support what is known as a literal form for designating new instances. For example, the command temperature = 98.6 results in the creation of a new instance of the float class; the term 98.6 in that expression is a literal form. We discuss further cases of Python literals in the coming section.

From a programmer’s perspective, yet another way to indirectly create a new instance of a class is to call a function that creates and returns such an instance. For example, Python has a built-in function named sorted (see Section 1.5.2) that takes a sequence of comparable elements as a parameter and returns a new instance of the list class containing those elements in sorted order.
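As a brief illustration of creating an instance indirectly, the following sketch calls the built-in sorted function on a small list. The names data and result and the sample values are chosen here purely for illustration.

```python
# Hypothetical sample data; the names and values are illustrative only.
data = [31, 7, 19, 2]

result = sorted(data)     # sorted builds and returns a new list instance

print(result)             # [2, 7, 19, 31]
print(data)               # [31, 7, 19, 2]  (the original list is unchanged)
print(result is data)     # False: two distinct list objects
```

The call leaves its argument untouched, so the caller ends up holding two separate list objects.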
Calling Methods

Python supports traditional functions (see Section 1.5) that are invoked with a syntax such as sorted(data), in which case data is a parameter sent to the function. Python’s classes may also define one or more methods (also known as member functions), which are invoked on a specific instance of a class using the dot (“.”) operator. For example, Python’s list class has a method named sort that can be invoked with a syntax such as data.sort(). This particular method rearranges the contents of the list so that they are sorted.

The expression to the left of the dot identifies the object upon which the method is invoked. Often, this will be an identifier (e.g., data), but we can use the dot operator to invoke a method upon the immediate result of some other operation. For example, if response identifies a string instance (we will discuss strings later in this section), the syntax response.lower().startswith('y') first evaluates the method call, response.lower(), which itself returns a new string instance, and then the startswith('y') method is called on that intermediate string.

When using a method of a class, it is important to understand its behavior. Some methods return information about the state of an object, but do not change that state. These are known as accessors. Other methods, such as the sort method of the list class, do change the state of an object. These methods are known as mutators or update methods.

1.2.3 Python’s Built-In Classes

Table 1.2 provides a summary of commonly used, built-in classes in Python; we take particular note of which classes are mutable and which are immutable. A class is immutable if each object of that class has a fixed value upon instantiation that cannot subsequently be changed. For example, the float class is immutable. Once an instance has been created, its value cannot be changed (although an identifier referencing that object can be reassigned to a different value).
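The difference between reassigning an identifier and mutating an object can be seen in a short experiment. The sketch below reuses the temperature example from earlier in the chapter and contrasts it with a list; the names data and alias are introduced here only for illustration.

```python
# Immutable case: reassignment binds the name to a different float object.
temperature = 98.6
original = temperature            # alias for the same float object
temperature = temperature + 5.0   # rebinds temperature; the float 98.6 is untouched
print(original)                   # 98.6

# Mutable case: a mutator changes the one object that both names share.
data = [3, 1, 2]                  # illustrative values
alias = data                      # alias for the same list object
data.sort()                       # sort is a mutator of the list class
print(alias)                      # [1, 2, 3]
print(alias is data)              # True
```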
    Class       Description                            Immutable?
    bool        Boolean value                          yes
    int         integer (arbitrary magnitude)          yes
    float       floating-point number                  yes
    list        mutable sequence of objects            no
    tuple       immutable sequence of objects          yes
    str         character string                       yes
    set         unordered set of distinct objects      no
    frozenset   immutable form of set class            yes
    dict        associative mapping (aka dictionary)   no

Table 1.2: Commonly used built-in classes for Python.

In this section, we provide an introduction to these classes, discussing their purpose and presenting several means for creating instances of the classes. Literal forms (such as 98.6) exist for most of the built-in classes, and all of the classes support a traditional constructor form that creates instances that are based upon one or more existing values. Operators supported by these classes are described in Section 1.3. More detailed information about these classes can be found in later chapters as follows: lists and tuples (Chapter 5); strings (Chapters 5 and 13, and Appendix A); sets and dictionaries (Chapter 10).

The bool Class

The bool class is used to manipulate logical (Boolean) values, and the only two instances of that class are expressed as the literals True and False. The default constructor, bool(), returns False, but there is no reason to use that syntax rather than the more direct literal form. Python allows the creation of a Boolean value from a nonboolean type using the syntax bool(foo) for value foo. The interpretation depends upon the type of the parameter. Numbers evaluate to False if zero, and True if nonzero. Sequences and other container types, such as strings and lists, evaluate to False if empty and True if nonempty. An important application of this interpretation is the use of a nonboolean value as a condition in a control structure.

The int Class

The int and float classes are the primary numeric types in Python. The int class is designed to represent integer values with arbitrary magnitude.
Unlike Java and C++, which support different integral types with different precisions (e.g., int, short, long), Python automatically chooses the internal representation for an integer based upon the magnitude of its value. Typical literals for integers include 0, 137, and -23. In some contexts, it is convenient to express an integral value using binary, octal, or hexadecimal. That can be done by using a prefix of the number 0 and then a character to describe the base. Examples of such literals are, respectively, 0b1011, 0o52, and 0x7f.

The integer constructor, int(), returns value 0 by default. But this constructor can be used to construct an integer value based upon an existing value of another type. For example, if f represents a floating-point value, the syntax int(f) produces the truncated value of f. For example, both int(3.14) and int(3.99) produce the value 3, while int(-3.9) produces the value -3. The constructor can also be used to parse a string that is presumed to represent an integral value (such as one entered by a user). If s represents a string, then int(s) produces the integral value that string represents. For example, the expression int('137') produces the integer value 137. If an invalid string is given as a parameter, as in int('hello'), a ValueError is raised (see Section 1.7 for discussion of Python’s exceptions). By default, the string must use base 10. If conversion from a different base is desired, that base can be indicated as a second, optional, parameter. For example, the expression int('7f', 16) evaluates to the integer 127.

The float Class

The float class is the sole floating-point type in Python, using a fixed-precision representation. Its precision is more akin to a double in Java or C++, rather than those languages’ float type. We have already discussed a typical literal form, 98.6. We note that the floating-point equivalent of an integral number can be expressed directly as 2.0.
Technically, the trailing zero is optional, so some programmers might use the expression 2. to designate this floating-point literal. One other form of literal for floating-point values uses scientific notation. For example, the literal 6.022e23 represents the mathematical value $6.022 \times 10^{23}$.

The constructor form of float() returns 0.0. When given a parameter, the constructor attempts to return the equivalent floating-point value. For example, the call float(2) returns the floating-point value 2.0. If the parameter to the constructor is a string, as with float('3.14'), it attempts to parse that string as a floating-point value, raising a ValueError if the string cannot be parsed.

Sequence Types: The list, tuple, and str Classes

The list, tuple, and str classes are sequence types in Python, representing a collection of values in which the order is significant. The list class is the most general, representing a sequence of arbitrary objects (akin to an “array” in other languages). The tuple class is an immutable version of the list class, benefiting from a streamlined internal representation. The str class is specially designed for representing an immutable sequence of text characters. We note that Python does not have a separate class for characters; they are just strings with length one.

The list Class

A list instance stores a sequence of objects. A list is a referential structure, as it technically stores a sequence of references to its elements (see Figure 1.4). Elements of a list may be arbitrary objects (including the None object). Lists are array-based sequences and are zero-indexed, thus a list of length n has elements indexed from 0 to n - 1 inclusive. Lists are perhaps the most used container type in Python and they will be extremely central to our study of data structures and algorithms. They have many valuable behaviors, including the ability to dynamically expand and contract their capacities as needed.
In this chapter, we will discuss only the most basic properties of lists. We revisit the inner working of all of Python’s sequence types as the focus of Chapter 5.

Python uses the characters [ ] as delimiters for a list literal, with [ ] itself being an empty list. As another example, ['red', 'green', 'blue'] is a list containing three string instances. The contents of a list literal need not be expressed as literals; if identifiers a and b have been established, then syntax [a, b] is legitimate.

The list() constructor produces an empty list by default. However, the constructor will accept any parameter that is of an iterable type. We will discuss iteration further in Section 1.8, but examples of iterable types include all of the standard container types (e.g., strings, lists, tuples, sets, dictionaries). For example, the syntax list('hello') produces a list of individual characters, ['h', 'e', 'l', 'l', 'o']. Because an existing list is itself iterable, the syntax backup = list(data) can be used to construct a new list instance referencing the same contents as the original.

Figure 1.4: Python’s internal representation of a list of integers, instantiated as primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]. The implicit indices of the elements (0 through 10) are shown below each entry.

The tuple Class

The tuple class provides an immutable version of a sequence, and therefore its instances have an internal representation that may be more streamlined than that of a list. While Python uses the [ ] characters to delimit a list, parentheses delimit a tuple, with ( ) being an empty tuple. There is one important subtlety. To express a tuple of length one as a literal, a comma must be placed after the element, but within the parentheses. For example, (17,) is a one-element tuple.
The reason for this requirement is that, without the trailing comma, the expression (17) is viewed as a simple parenthesized numeric expression.

The str Class

Python’s str class is specifically designed to efficiently represent an immutable sequence of characters, based upon the Unicode international character set. Strings have a more compact internal representation than the referential lists and tuples, as portrayed in Figure 1.5.

Figure 1.5: A Python string, which is an indexed sequence of characters (here the string SAMPLE, with indices 0 through 5).

String literals can be enclosed in single quotes, as in 'hello', or double quotes, as in "hello". This choice is convenient, especially when using another of the quotation characters as an actual character in the sequence, as in "Don't worry". Alternatively, the quote delimiter can be designated using a backslash as a so-called escape character, as in 'Don\'t worry'. Because the backslash has this purpose, the backslash must itself be escaped to occur as a natural character of the string literal, as in 'C:\\Python\\', for a string that would be displayed as C:\Python\. Other commonly escaped characters are \n for newline and \t for tab. Unicode characters can be included, such as '20\u20AC' for the string 20€.

Python also supports using the delimiter ''' or """ to begin and end a string literal. The advantage of such triple-quoted strings is that newline characters can be embedded naturally (rather than escaped as \n). This can greatly improve the readability of long, multiline strings in source code. For example, at the beginning of Code Fragment 1.1, rather than use separate print statements for each line of introductory output, we can use a single print statement, as follows:

    print("""Welcome to the GPA calculator.
    Please enter all your letter grades, one per line.
    Enter a blank line to designate the end.""")
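The literal forms for strings described above can be compared side by side. The following is a small sketch; the variable names are chosen here for illustration and do not come from the text.

```python
double = "Don't worry"         # a double-quoted literal may contain a single quote
escaped = 'Don\'t worry'       # the same text, with the quote escaped
path = 'C:\\Python\\'          # each \\ in the literal yields one backslash
price = '20\u20AC'             # \u20AC is the Unicode escape for the euro sign

print(double == escaped)       # True: both literals denote the same characters
print(path)                    # C:\Python\
print(len(path))               # 10
print(len(price))              # 3: the escape denotes a single character

# A triple-quoted literal may span lines; the line break becomes a newline character.
banner = """Welcome to the GPA calculator.
Please enter all your letter grades, one per line."""
print(banner.count('\n'))      # 1
```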
The set and frozenset Classes

Python’s set class represents the mathematical notion of a set, namely a collection of elements, without duplicates, and without an inherent order to those elements. The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set. This is based on a data structure known as a hash table (which will be the primary topic of Chapter 10). However, there are two important restrictions due to the algorithmic underpinnings. The first is that the set does not maintain the elements in any particular order. The second is that only instances of immutable types can be added to a Python set. Therefore, objects such as integers, floating-point numbers, and character strings are eligible to be elements of a set. It is possible to maintain a set of tuples, but not a set of lists or a set of sets, as lists and sets are mutable. The frozenset class is an immutable form of the set type, so it is legal to have a set of frozensets.

Python uses curly braces { and } as delimiters for a set, for example, as {17} or {'red', 'green', 'blue'}. The exception to this rule is that { } does not represent an empty set; for historical reasons, it represents an empty dictionary (see next paragraph). Instead, the constructor syntax set() produces an empty set. If an iterable parameter is sent to the constructor, then the set of distinct elements is produced. For example, set('hello') produces {'h', 'e', 'l', 'o'}.

The dict Class

Python’s dict class represents a dictionary, or mapping, from a set of distinct keys to associated values. For example, a dictionary might map from unique student ID numbers to larger student records (such as the student’s name, address, and course grades). Python implements a dict using an almost identical approach to that of a set, but with storage of the associated values.
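The restrictions on set elements noted above are easy to observe directly. The following is a minimal sketch with illustrative names; it also checks what the literal { } produces.

```python
letters = set('hello')
print(len(letters))              # 4: the duplicate 'l' is stored only once
print('e' in letters)            # True

pairs = {(1, 2), (3, 4)}         # tuples are immutable, so they may be elements
nested = {frozenset({1, 2})}     # a set of frozensets is legal

try:
    bad = {[1, 2]}               # lists are mutable, so this is rejected
except TypeError as err:
    print('TypeError:', err)

empty = {}
print(type(empty).__name__)      # dict: { } is an empty dictionary, not a set
print(type(set()).__name__)      # set
```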
A dictionary literal also uses curly braces, and because dictionaries were introduced in Python prior to sets, the literal form { } produces an empty dictionary. A nonempty dictionary is expressed using a comma-separated series of key:value pairs. For example, the dictionary {'ga': 'Irish', 'de': 'German'} maps 'ga' to 'Irish' and 'de' to 'German'.

The constructor for the dict class accepts an existing mapping as a parameter, in which case it creates a new dictionary with identical associations as the existing one. Alternatively, the constructor accepts a sequence of key-value pairs as a parameter, as in dict(pairs) with pairs = [('ga', 'Irish'), ('de', 'German')].

1.3 Expressions, Operators, and Precedence

In the previous section, we demonstrated how names can be used to identify existing objects, and how literals and constructors can be used to create instances of built-in classes. Existing values can be combined into larger syntactic expressions using a variety of special symbols and keywords known as operators. The semantics of an operator depends upon the type of its operands. For example, when a and b are numbers, the syntax a + b indicates addition, while if a and b are strings, the operator indicates concatenation. In this section, we describe Python’s operators in various contexts of the built-in types.

We continue, in Section 1.3.1, by discussing compound expressions, such as a + b * c, which rely on the evaluation of two or more operations. The order in which the operations of a compound expression are evaluated can affect the overall value of the expression. For this reason, Python defines a specific order of precedence for evaluating operators, and it allows a programmer to override this order by using explicit parentheses to group subexpressions.
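Both points from this introduction, type-dependent operator semantics and the effect of precedence, can be checked with a few expressions. The values below are arbitrary and chosen only for illustration.

```python
a, b, c = 2, 3, 4                # illustrative values

default_order = a + b * c        # multiplication is evaluated before addition
grouped = (a + b) * c            # parentheses override the default order
print(default_order)             # 14
print(grouped)                   # 20

# The same + operator means concatenation when its operands are strings.
print('ab' + 'cd')               # abcd
```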
Logical Operators

Python supports the following keyword operators for Boolean values:

    not    unary negation
    and    conditional and
    or     conditional or

The and and or operators short-circuit, in that they do not evaluate the second operand if the result can be determined based on the value of the first operand. This feature is useful when constructing Boolean expressions in which we first test that a certain condition holds (such as a reference not being None), and then test a condition that could have otherwise generated an error condition had the prior test not succeeded.

Equality Operators

Python supports the following operators to test two notions of equality:

    is        same identity
    is not    different identity
    ==        equivalent
    !=        not equivalent

The expression a is b evaluates to True precisely when identifiers a and b are aliases for the same object. The expression a == b tests a more general notion of equivalence. If identifiers a and b refer to the same object, then a == b should also evaluate to True. Yet a == b also evaluates to True when the identifiers refer to different objects that happen to have values that are deemed equivalent. The precise notion of equivalence depends on the data type. For example, two strings are considered equivalent if they match character for character. Two sets are equivalent if they have the same contents, irrespective of order. In most programming situations, the equivalence tests == and != are the appropriate operators; use of is and is not should be reserved for situations in which it is necessary to detect true aliasing.

Comparison Operators

Data types may define a natural order via the following operators:

    <     less than
    <=    less than or equal to
    >     greater than
    >=    greater than or equal to

These operators have expected behavior for numeric types, and are defined lexicographically, and case-sensitively, for strings. An exception is raised if operands have incomparable types, as with 5 < 'hello'.
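The logical, equality, and comparison operators above can be exercised together in a short sketch. All names here (a, b, c, data, safe) are illustrative.

```python
a = [1, 2, 3]
b = a                  # alias: b names the same list object as a
c = list(a)            # a distinct list with equivalent contents

print(a is b)          # True
print(a == c)          # True: equivalent element by element
print(a is c)          # False: two different objects

# Short-circuiting: len(data) is never evaluated, because the first test fails.
data = None
safe = data is not None and len(data) > 0
print(safe)            # False

# Operands of incomparable types cause an exception to be raised.
try:
    5 < 'hello'
except TypeError:
    print('incomparable types')
```

Without the short-circuit rule, the call len(data) would itself raise an error, since None has no length.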
Arithmetic Operators

Python supports the following arithmetic operators:

    +     addition
    -     subtraction
    *     multiplication
    /     true division
    //    integer division
    %     the modulo operator

The use of addition, subtraction, and multiplication is straightforward, noting that if both operands have type int, then the result is an int as well; if one or both operands have type float, the result will be a float.

Python takes more care in its treatment of division. We first consider the case in which both operands have type int, for example, the quantity 27 divided by 4. In mathematical notation, 27 ÷ 4 = 6 3/4 = 6.75. In Python, the / operator designates true division, returning the floating-point result of the computation. Thus, 27 / 4 results in the float value 6.75. Python supports the pair of operators // and % to perform the integral calculations, with expression 27 // 4 evaluating to int value 6 (the mathematical floor of the quotient), and expression 27 % 4 evaluating to int value 3, the remainder of the integer division. We note that languages such as C, C++, and Java do not support the // operator; instead, the / operator returns the truncated quotient when both operands have integral type, and the result of true division when at least one operand has a floating-point type.

Python carefully extends the semantics of // and % to cases where one or both operands are negative. For the sake of notation, let us assume that variables n and m represent respectively the dividend and divisor of a quotient n/m, and that q = n // m and r = n % m. Python guarantees that q * m + r will equal n. We already saw an example of this identity with positive operands, as 6 * 4 + 3 = 27. When the divisor m is positive, Python further guarantees that 0 ≤ r < m. As a consequence, we find that -27 // 4 evaluates to -7 and -27 % 4 evaluates to 1, as (-7) * 4 + 1 = -27. When the divisor is negative, Python guarantees that m < r ≤ 0.
As an example, 27 // -4 is -7 and 27 % -4 is -1, satisfying the identity 27 = (-7) * (-4) + (-1).

The conventions for the // and % operators are even extended to floating-point operands, with the expression q = n // m being the integral floor of the quotient, and r = n % m being the "remainder" to ensure that q * m + r equals n. For example, 8.2 // 3.14 evaluates to 2.0 and 8.2 % 3.14 evaluates to approximately 1.92 (subject to floating-point rounding), as 2.0 * 3.14 + 1.92 = 8.2.

Bitwise Operators

Python provides the following bitwise operators for integers:

    ~     bitwise complement (prefix unary operator)
    &     bitwise and
    |     bitwise or
    ^     bitwise exclusive-or
    <<    shift bits left, filling in with zeros
    >>    shift bits right, filling in with sign bit

Sequence Operators

Each of Python's built-in sequence types (str, tuple, and list) supports the following operator syntaxes:

    s[j]                  element at index j
    s[start:stop]         slice including indices [start,stop)
    s[start:stop:step]    slice including indices start, start + step, start + 2*step, ..., up to but not equalling or passing stop
    s + t                 concatenation of sequences
    k * s                 shorthand for s + s + s + ... (k times)
    val in s              containment check
    val not in s          non-containment check

Python relies on zero-indexing of sequences, thus a sequence of length n has elements indexed from 0 to n - 1 inclusive. Python also supports the use of negative indices, which denote a distance from the end of the sequence; index -1 denotes the last element, index -2 the second to last, and so on.

Python uses a slicing notation to describe subsequences of a sequence. Slices are described as half-open intervals, with a start index that is included, and a stop index that is excluded. For example, the syntax data[3:8] denotes a subsequence including the five indices: 3, 4, 5, 6, 7. An optional "step" value, possibly negative, can be indicated as a third parameter of the slice. If a start index or stop index is omitted in the slicing notation, it is presumed to designate the respective extreme of the original sequence.
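Two of the conventions described above, the identity q * m + r = n for // and %, and the half-open interpretation of slices, can be checked directly. The following is a minimal sketch restricted to int operands (with floats the identity holds only up to rounding); the helper name check_division_identity and the list data are hypothetical.

```python
def check_division_identity(n, m):
    """Return (q, r) for n // m and n % m after checking Python's guarantees."""
    q, r = n // m, n % m
    assert q * m + r == n          # the identity q*m + r == n
    if m > 0:
        assert 0 <= r < m          # remainder range for a positive divisor
    else:
        assert m < r <= 0          # remainder range for a negative divisor
    return q, r

print(check_division_identity(27, 4))     # (6, 3)
print(check_division_identity(-27, 4))    # (-7, 1)
print(check_division_identity(27, -4))    # (-7, -1)

data = list(range(10))     # [0, 1, ..., 9]
print(data[3:8])           # [3, 4, 5, 6, 7]
print(data[-1])            # 9
print(data[::-2])          # [9, 7, 5, 3, 1]
print(data[:3], data[7:])  # [0, 1, 2] [7, 8, 9]
```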
Because lists are mutable, the syntax s[j] = val can be used to replace an element at a given index. Lists also support a syntax, del s[j], that removes the designated element from the list. Slice notation can also be used to replace or delete a sublist.

The notation val in s can be used for any of the sequences to see if there is an element equivalent to val in the sequence. For strings, this syntax can be used to check for a single character or for a larger substring, as with 'amp' in 'example'.

All sequences define comparison operations based on lexicographic order, performing an element by element comparison until the first difference is found. For example, [5, 6, 9] < [5, 7] because of the entries at index 1. Therefore, the following operations are supported by sequence types:

    s == t    equivalent (element by element)
    s != t    not equivalent
    s < t     lexicographically less than
    s <= t    lexicographically less than or equal to
    s > t     lexicographically greater than
    s >= t    lexicographically greater than or equal to

Operators for Sets and Dictionaries

Sets and frozensets support the following operators:

    key in s        containment check
    key not in s    non-containment check
    s1 == s2        s1 is equivalent to s2
    s1 != s2        s1 is not equivalent to s2
    s1 <= s2        s1 is subset of s2
    s1 < s2         s1 is proper subset of s2
    s1 >= s2        s1 is superset of s2
    s1 > s2         s1 is proper superset of s2
    s1 | s2         the union of s1 and s2
    s1 & s2         the intersection of s1 and s2
    s1 - s2         the set of elements in s1 but not s2
    s1 ^ s2         the set of elements in precisely one of s1 or s2

Note well that sets do not guarantee a particular order of their elements, so the comparison operators, such as

[...] 1. This function is defined as follows: $x = \log_b n$ if and only if $b^x = n$. By definition, $\log_b 1 = 0$. The value b is known as the base of the logarithm.

The most common base for the logarithm function in computer science is 2, as computers store integers in binary, and because a common operation in many algorithms is to repeatedly divide an input in half. In fact, this base is so common that we will typically omit it from the notation when it is 2.
That is, for us, $\log n = \log_2 n$.

116 Chapter 3. Algorithm Analysis

We note that most handheld calculators have a button marked LOG, but this is typically for calculating the logarithm base-10, not base-two.

Computing the logarithm function exactly for any integer n involves the use of calculus, but we can use an approximation that is good enough for our purposes without calculus. In particular, we can easily compute the smallest integer greater than or equal to $\log_b n$ (its so-called ceiling, $\lceil \log_b n \rceil$). For a positive integer n, this value is equal to the number of times we can divide n by b before we get a number less than or equal to 1. For example, the evaluation of $\lceil \log_3 27 \rceil$ is 3, because ((27/3)/3)/3 = 1. Likewise, $\lceil \log_4 64 \rceil$ is 3, because ((64/4)/4)/4 = 1, and $\lceil \log_2 12 \rceil$ is 4, because (((12/2)/2)/2)/2 = 0.75 ≤ 1.

The following proposition describes several important identities that involve logarithms for any base greater than 1.

Proposition 3.1 (Logarithm Rules): Given real numbers a > 0, b > 1, c > 0 and d > 1, we have:

    1. $\log_b (ac) = \log_b a + \log_b c$
    2. $\log_b (a/c) = \log_b a - \log_b c$
    3. $\log_b (a^c) = c \log_b a$
    4. $\log_b a = \log_d a / \log_d b$
    5. $b^{\log_d a} = a^{\log_d b}$

By convention, the unparenthesized notation $\log n^c$ denotes the value $\log(n^c)$. We use a notational shorthand, $\log^c n$, to denote the quantity $(\log n)^c$, in which the result of the logarithm is raised to a power.

The above identities can be derived from converse rules for exponentiation that we will present on page 121. We illustrate these identities with a few examples.

Example 3.2: We demonstrate below some interesting applications of the logarithm rules from Proposition 3.1 (using the usual convention that the base of a logarithm is 2 if it is omitted).

    $\log(2n) = \log 2 + \log n = 1 + \log n$, by rule 1
    $\log(n/2) = \log n - \log 2 = \log n - 1$, by rule 2
    $\log n^3 = 3 \log n$, by rule 3
    $\log 2^n = n \log 2 = n \cdot 1 = n$, by rule 3
    $\log_4 n = (\log n)/\log 4 = (\log n)/2$, by rule 4
    $2^{\log n} = n^{\log 2} = n^1 = n$, by rule 5.
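The ceiling computation by repeated division described above can be sketched in a few lines. The function name ceil_log is hypothetical, and the sketch multiplies a running power of b rather than dividing n, which is an equivalent formulation (dividing n by b a total of k times leaves a value of at most 1 exactly when b^k ≥ n) that keeps the arithmetic in integers. The final line checks rule 4 numerically for one value.

```python
import math

def ceil_log(n, b):
    """Return the ceiling of log base b of n, for integers n >= 1 and b >= 2."""
    count = 0
    power = 1
    # Find the smallest count with b**count >= n, using only integer
    # arithmetic so that no floating-point error can creep in.
    while power < n:
        power *= b
        count += 1
    return count

print(ceil_log(27, 3))   # 3
print(ceil_log(64, 4))   # 3
print(ceil_log(12, 2))   # 4

# Rule 4 (change of base), checked numerically up to rounding.
n = 1000
print(math.isclose(math.log(n, 4), math.log2(n) / 2))   # True
```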
As a practical matter, we note that rule 4 gives us a way to compute the base-two logarithm on a calculator that has a base-10 logarithm button, LOG, for $\log_2 n = \mathrm{LOG}\, n / \mathrm{LOG}\, 2$.

3.2. The Seven Functions Used in This Book 117

The Linear Function

Another simple yet important function is the linear function, $f(n) = n$. That is, given an input value n, the linear function f assigns the value n itself.

This function arises in algorithm analysis any time we have to do a single basic operation for each of n elements. For example, comparing a number x to each element of a sequence of size n will require n comparisons. The linear function also represents the best running time we can hope to achieve for any algorithm that processes each of n objects that are not already in the computer's memory, because reading in the n objects already requires n operations.

The N-Log-N Function

The next function we discuss in this section is the n-log-n function, $f(n) = n \log n$, that is, the function that assigns to an input n the value of n times the logarithm base-two of n. This function grows a little more rapidly than the linear function and a lot less rapidly than the quadratic function; therefore, we would greatly prefer an algorithm with a running time that is proportional to $n \log n$ over one with quadratic running time. We will see several important algorithms that exhibit a running time proportional to the n-log-n function. For example, the fastest possible algorithms for sorting n arbitrary values require time proportional to $n \log n$.

The Quadratic Function

Another function that appears often in algorithm analysis is the quadratic function, $f(n) = n^2$. That is, given an input value n, the function f assigns the product of n with itself (in other words, "n squared").
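To make the relative growth of the linear, n-log-n, and quadratic functions concrete, the following sketch tabulates all three for a few powers of two. The helper name growth_row and the chosen input sizes are hypothetical, picked only for illustration.

```python
import math

def growth_row(n):
    """Return (n, n * log2(n), n squared) for a positive integer n."""
    return n, n * math.log2(n), n * n

# One row per input size: linear, n-log-n, quadratic.
for n in (16, 1024, 1_048_576):
    linear, nlogn, quadratic = growth_row(n)
    print(f'{linear:>9} {nlogn:>12.0f} {quadratic:>15}')
```

For n = 1024 the middle column is only ten times the first, while the last column is 1024 times the first, which is the gap the text describes.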
The main reason why the quadratic function appears in the analysis of algorithms is that there are many algorithms that have nested loops, where the inner loop performs a linear number of operations and the outer loop is performed a linear number of times. Thus, in such cases, the algorithm performs $n \cdot n = n^2$ operations.

Nested Loops and the Quadratic Function

The quadratic function can also arise in the context of nested loops where the first iteration of a loop uses one operation, the second uses two operations, the third uses three operations, and so on. That is, the number of operations is

    $1 + 2 + 3 + \cdots + (n-2) + (n-1) + n$.

In other words, this is the total number of operations that will be performed by the nested loop if the number of operations performed inside the loop increases by one with each iteration of the outer loop.

This quantity also has an interesting history. In 1787, a German schoolteacher decided to keep his 9- and 10-year-old pupils occupied by having them add up the integers from 1 to 100. But almost immediately one of the children claimed to have the answer! The teacher was suspicious, for the student had only the answer on his slate. But the answer, 5050, was correct and the student, Carl Gauss, grew up to be one of the greatest mathematicians of his time. We presume that young Gauss used the following identity.

Proposition 3.3: For any integer n ≥ 1, we have:

    $1 + 2 + 3 + \cdots + (n-2) + (n-1) + n = \frac{n(n+1)}{2}$.

We give two "visual" justifications of Proposition 3.3 in Figure 3.3.

Figure 3.3: Visual justifications of Proposition 3.3. Both illustrations visualize the identity in terms of the total area covered by n unit-width rectangles with heights 1, 2, ..., n. In (a), the rectangles are shown to cover a big triangle of area $n^2/2$ (base n and height n) plus n small triangles of area 1/2 each (base 1 and height 1).
In (b), which applies only when n is even, the rectangles are shown to cover a big rectangle of base n/2 and height n + 1.

The lesson to be learned from Proposition 3.3 is that if we perform an algorithm with nested loops such that the operations in the inner loop increase by one each time, then the total number of operations is quadratic in the number of times, n, we perform the outer loop. To be fair, the number of operations is $n^2/2 + n/2$, and so this is just over half the number of operations of an algorithm that uses n operations each time the inner loop is performed. But the order of growth is still quadratic in n.

The Cubic Function and Other Polynomials

Continuing our discussion of functions that are powers of the input, we consider the cubic function, $f(n) = n^3$, which assigns to an input value n the product of n with itself three times. This function appears less frequently in the context of algorithm analysis than the constant, linear, and quadratic functions previously mentioned, but it does appear from time to time.

Polynomials

Most of the functions we have listed so far can each be viewed as being part of a larger class of functions, the polynomials. A polynomial function has the form,

    $f(n) = a_0 + a_1 n + a_2 n^2 + a_3 n^3 + \cdots + a_d n^d$,

where $a_0, a_1, \ldots, a_d$ are constants, called the coefficients of the polynomial, and $a_d \neq 0$. Integer d, which indicates the highest power in the polynomial, is called the degree of the polynomial.

For example, the following functions are all polynomials:

    $f(n) = 2 + 5n + n^2$
    $f(n) = 1 + n^3$
    $f(n) = 1$
    $f(n) = n$
    $f(n) = n^2$

Therefore, we could argue that this book presents just four important functions used in algorithm analysis, but we will stick to saying that there are seven, since the constant, linear, and quadratic functions are too important to be lumped in with other polynomials.
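Returning to the lesson of Proposition 3.3, the operation count of a nested loop whose inner loop grows by one step per outer iteration can be checked empirically against n(n + 1)/2. This is a minimal sketch; the function name count_triangular_ops is hypothetical.

```python
def count_triangular_ops(n):
    """Count inner-loop steps when iteration i of the outer loop does i steps."""
    ops = 0
    for i in range(1, n + 1):      # outer loop runs n times
        for _ in range(i):         # inner loop performs i operations
            ops += 1
    return ops

# Compare the measured count with the closed form n(n + 1)/2.
for n in (1, 10, 100):
    print(n, count_triangular_ops(n), n * (n + 1) // 2)
```

For n = 100 both columns give 5050, the sum attributed to young Gauss in the text.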
Running times that are polynomials with small degree are generally better than polynomial running times with larger degree.

Summations

A notation that appears again and again in the analysis of data structures and algorithms is the summation, which is defined as follows:

    $\sum_{i=a}^{b} f(i) = f(a) + f(a+1) + f(a+2) + \cdots + f(b)$,

where a and b are integers and a ≤ b. Summations arise in data structure and algorithm analysis because the running times of loops naturally give rise to summations.

Using a summation, we can rewrite the formula of Proposition 3.3 as

    $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.

Likewise, we can write a polynomial f(n) of degree d with coefficients $a_0, \ldots, a_d$ as

    $f(n) = \sum_{i=0}^{d} a_i n^i$.

Thus, the summation notation gives us a shorthand way of expressing sums of increasing terms that have a regular structure.

The Exponential Function

Another function used in the analysis of algorithms is the exponential function, $f(n) = b^n$, where b is a positive constant, called the base, and the argument n is the exponent. That is, function f(n) assigns to the input argument n the value obtained by multiplying the base b by itself n times. As was the case with the logarithm function, the most common base for the exponential function in algorithm analysis is b = 2. For example, an integer word contain
