Data Structures and Algorithms with Python and C++ PDF
Document Details
2009
David M. Reed, John Zelle
Tags
Related
- Data Structures & Algorithms Lecture Notes 1 PDF
- Data Structures - Ellis Horowitz, Sartaj Sahni - Fundamentals - PDF
- Data Structures and Algorithms Study Guide (Sorsogon State University)
- CS301 Data Structures Past Paper PDF 2010
- Introduction to Java Programming and Data Structures (2019) by Y. Daniel Liang - PDF
- Data Structures & Algorithm Past Paper (July 2024) PDF
Summary
This textbook details data structures and algorithms, covering both Python and C++ implementations. It covers topics like abstraction, analysis, container classes, and recursion. The book is suitable for undergraduate computer science students.
Full Transcript
Data Structures and Algorithms Using Python and C++ David M. Reed Capital University John Zelle Wartburg College Franklin, Beedle & Associates, I ncorporated 8536 SW St. Helens Drive, Ste. D Wilsonvi...
Data Structures and Algorithms Using Python and C++ David M. Reed Capital University John Zelle Wartburg College Franklin, Beedle & Associates, I ncorporated 8536 SW St. Helens Drive, Ste. D Wilsonville, Oregon 97070 www.fbeedle.com President and Publisher Ji m Leisy (ji m [email protected]) Project M anager Tom Sumner Editor Stepha n ie Welch Printed i n the U.S.A. Names of a l l products herein are used for identification purposes only and are trademarks and/or registered trademarks of their respective owners. Franklin, Beedle & Associates, I nc., makes no claim of ownershi p or corporate association with the products or com pa nies that own them. ©2009 Fra n kl i n, Beedle & Associates I ncorporated. No part of this book may be repro d uced, stored i n a retrieval system, tra nsmitted , or tra nscribed, in any form or by a ny means-electronic, mechanical , telepathic, photocopying, recordi ng, or otherwise-with out prior written permission of the publ isher. Req uests for permission shou ld be addressed as fol lows: Rights a nd Permissions Fra n kli n , Beedle & Associates, I ncorporated 8536 SW St. Helens Drive, Suite D Wilsonvi l le, Oregon 97070 ISBN 978-1-59028-233-5 Library of Congress Catalogi ng-in-Publication data may be obtained from the publisher. Contents Preface.............. Xl Chapter 1 Abstraction and Analysis 1 1. 1 Introduction......... 1 1. 1. 1 Programming in the Large. 2 1. 1. 2 The Road Ahead.. 3 1. 2 Functional Abstraction.... 5 1. 2.1 Design by Contract.. 5 1. 2. 2 Testing Preconditions 9 1. 2. 3 Top-Down Design " 13 1. 2. 4 Documenting Side Effects 16 1. 3 Algorithm Analysis.. 17 1. 3. 1 Linear Search....... 17 1. 3. 2 Binary Search...... 21 1. 3. 3 Informal Algorithm Comparison 22 1. 3. 4 Formal Analysis......... 24 1. 3. 5 Big 0 Notation vs. Theta Notation 30 1.4 Chapter Summary 32 1. 5 Exercises...... 33 Chapter 2 Data Abstraction 39 2. 1 Overview............. 39 2. 2 Abstract Data Types....... 40 2. 2. 1 From Data Type to ADT 40 2. 2. 2 Defining an ADT.... 41 2. 2. 3 Implementing an ADT. 43 2. 3 ADTs and Objects....... 46 iii iv Contents 2. 3. 1 Specification......... 46 2. 3. 2 Implementation........ 48 2. 3.3 Changing the Representation 50 2. 3. 4 Object-Oriented Design and Programming. 51 2.4 A n Example ADT: Dataset 55 2. 4. 1 The Process of OOD.. 55 2. 4. 2 Identifying an ADT. , 56 2. 4. 3 Implementing the ADT 58 2.5 A n Example ADT: Rational. 60 2. 5. 1 Operator Overloading 60 2. 5. 2 The Rational Class... 61 2.6 Incremental Development and Unit Testing 63 2.7 Chapter Summary 67 2.8 Exercises 68 Chapter 3 Container Classes 75 3.1 Overview.................. 75 3.2 Python Lists............... 76 3.3 A Sequential Collection: A Deck of Cards 77 3.4 A Sorted Collection: Hand... 80 3. 4. 1 Creating a Bridge Hand 81 3. 4. 2 Comparing Cards... 83 3. 4. 3 Sorting Cards..... 85 3.5 Python List Implementation 87 3. 5. 1 Array-based Lists... 87 3. 5. 2 Efficiency Analysis.. 89 3.6 Python Dictionaries (Optional) 89 3. 6. 1 A Dictionary ADT... 90 3. 6. 2 Python Dictionaries.. 90 3. 6. 3 Dictionary Implementation 93 3. 6. 4 A n Extended Example: A Markov Chain 95 3.7 Chapter Summary 99 3.8 Exercises 100 Chapter 4 Linked Structures and Iterators 107 4. 1 Overview.......... 107 4. 2 The Python Memory Model.... 108 4. 2. 1 Passing Parameters..... 1 14 4. 3 A Linked Implementation of Lists. 1 17 Contents v 4. 4 Linked Implementation of a List ADT 122 4. 5 Iterators............... 135 4. 5. 1 Iterators in Python....... 136 4. 5. 2 Adding an Iterator t o LList.. 137 4. 5. 3 Iterating with a Python Generator 138 4. 6 A Cursor-based List API (Optional) 140 4. 6. 1 A Cursor API..... 141 4. 6. 2 A Python CursorList 142 4. 6.3 A Linked CursorList 144 4. 7 Links vs. Arrays. 147 4. 8 Chapter Summary 148 4. 9 Exercises 148 Chapter 5 Stacks and Queues 155 5.1 Overview...... 155 5.2 Stacks........ 155 5. 2. 1 The Stack ADT 156 5. 2. 2 Simple Stack Applications 157 5. 2.3 Implementing Stacks.... 159 5. 2. 4 An Application: Expression Manipulation 160 5. 2. 5 An Application: Grammar Processing (Optional) 163 5.3 Queues................ 169 5. 3. 1 A Queue ADT....... 169 5. 3. 2 Simple Queue Applications 1 70 5.4 Queue Implementations...... 1 72 5.5 An Example Application: Queueing Simulations (Optional) 1 74 5.6 Chapter Summary 180 5.7 Exercises.. 181 Chapter 6 Recursion 189 6. 1 Introduction. 189 6. 2 Recursive Definitions.... 191 6. 3 Simple Recursive Examples 193 6. 3. 1 Example: String Reversal 193 6. 3. 2 Example: Anagrams... 195 6. 3. 3 Example: Fast Exponentiation 197 6. 3. 4 Example: Binary Search. 198 6. 4 Analyzing Recursion 199 6. 5 Sorting............... 202 vi Contents 6. 5. 1 Recursive Design: Mergesort.. 202 6. 5. 2 Analyzing Mergesort....... 205 6. 6 A "Hard" Problem: The Tower of Hanoi 207 6. 7 Chapter Summary 212 6. 8 Exercises 212 Chapter 7 Trees 223 7. 1 Overview................... 223 7.2 Tree Terminology............... 224 7.3 A n Example Application: Expression Trees 226 7.4 Tree Representations.......... 228 7.5 An Application: A Binary Search Tree... 230 7. 5. 1 The Binary Search Property..... 230 7. 5. 2 Implementing A Binary Search Tree 230 7. 5. 3 Traversing a BST........... 238 7. 5. 4 A Run-time Analysis of BST Algorithms. 241 7. 6 Implementing a Mapping with BST (Optional) 242 7. 7 Chapter Summary 245 7. 8 Exercises 245 Chapter 8 C++ Introduction for Python Programmers 255 8. 1 Introd uction.................... 255 8.2 C++ History and Background......... 256 8.3 Comments, Blocks of Code, Identifiers, and Keywords 262 8.4 Data Types and Variable Declarations........ 263 8.5 Include Statements, Namespaces, and Input/Output 267 8.6 Compiling................ 271 8.7 Expressions and Operator Precedence 274 8.8 Decision Statements 277 8.9 Type Conversions.. 281 8. 10 Looping Statements 282 8. 1 1 Arrays........ 285 8. 1 1. 1 Single-Dimension Arrays. 285 8. 1 1. 2 M ulti-Dimensional Arrays 287 8. 1 1. 3 Arrays of Characters... 287 8. 1 2 Function Details......... 288 8. 1 2. 1 Declarations, Definitions, and Prototypes 289 8. 1 2. 2 Pass by Value... 292 8. 1 2. 3 Pass by Reference............. 293 Contents vii 8. 1 2. 4 Passing Arrays as Parameters. 294 8. 1 2. 5 const Parameters..... 296 8. 1 2. 6 Default Parameters..... 297 8.13 Header Files and Inline Functions. 298 8. 14 Assert Statements and Testing.. 303 8.15 The Scope and Lifetime of Variables 305 8.16 Common C++ Mistakes by Python Programmers. 306 8. 1 7 Additional C++ Topics (Optional). 307 8. 1 7. 1 The C++ Switch Statement. 307 8. 1 7. 2 Creating C++ Namespaces 310 8. 1 7. 3 Global Variables 310 8.18 Chapter Summary 312 8.19 Exercises... 313 Chapter 9 C++ Classes 319 9. 1 Basic Syntax and Semantics 319 9.2 Strings......... 330 9.3 File Input and Output... 333 9.4 Operator Overloading... 335 9.5 Class Variables and Methods 343 9.6 Chapter Summary 347 9.7 Exercises.......... 348 Chapter 10 C++ Dynamic Memory 353 10. 1 Introduction... 353 10.2 C++ Pointers....... 360 10.3 Dynamic Arrays..... 366 10.4 Dynamic Memory Classes 371 1 0.4. 1 Destructor..... 371 1 0. 4. 2 Copy Constructor 373 1 0. 4. 3 Assignment Operator 378 1 0. 4. 4 A Complete Dynamic Array Class 381 1 0. 4. 5 Reference Return Types 386 1 0. 5 Dynamic Memory Errors..... 388 1 0. 5. 1 Memory Leaks...... 388 1 0. 5. 2 Accessing Invalid Memory 390 1 0. 5. 3 Memory Error Summary. 393 1 0. 6 Chapter Summary 395 1 0. 7 Exercises..... 395 viii Contents Chapter 1 1 C++ Linked Structures 403 11.1 Introduction......... 403 1 1.2 A C++ Linked Structure Class 404 1 1.3 A C++ Linked List...... 407 1 1.4 C++ Linked Dynamic Memory Errors 42 1 1 1.5 Chapter Summary 422 1 1.6 Exercises 422 Chapter 12 C++ Templates 427 1 2. 1 Introduction..... 427 1 2. 2 Template Functions 429 1 2. 3 Template Classes.. 431 1 2. 3. 1 The Standard Template Library vector Class. 431 1 2. 3. 2 User-defined Template Classes 435 1 2. 4 Chapter Summary 440 1 2. 5 Exercises 440 Chapter 13 Heaps, Balanced Trees, and Hash Tables 443 13. 1 Introduction......... 443 13.2 Priority Queues and Heaps.......... 444 1 3. 2. 1 Heapsort................ 451 1 3. 2. 2 Notes on Heap and Priority Queue Implementations 452 13.3 Balanced Trees.... 453 13.4 Other Tree Structures 465 13.5 Hash Tables.... 465 13.6 Chapter Summary 478 13.7 Exercises 478 Chapter 14 Graphs 485 14. 1 Introduction........ 485 14.2 Graph Data Structures.. 487 14.3 Shortest Path Algorithms 491 1 4. 3. 1 The Unweighted Shortest Path 492 1 4. 3. 2 The Weighted Shortest Path 496 14.4 Depth First Algorithms.. 501 14.5 Minimum Spanning Trees...... 507 1 4. 5. 1 Kruskal's Algorithm..... 507 1 4. 5. 2 The Disjoint Set Data Structure 509 1 4. 5. 3 Prim's Algorithm......... 511 Contents ix 1 4. 6 Chapter Summary 512 1 4. 7 Exercises..... 513 Chapter 15 Algorithm Techniques 517 1 5. 1 Introduction........ 517 1 5. 2 Divide and Conquer.......... 518 1 5. 2. 1 Analyzing Recursive Functions 518 1 5. 2. 2 Quicksort.... 521 1 5. 3 Greedy Algorithms........... 527 1 5. 4 Dynamic Programming........ 536 1 5. 4. 1 Longest Common Subsequence 537 1 5.4. 2 Memoization......... 541 1 5. 4. 3 Matrix Chain Multiplication 541 1 5. 5 NP-Complete Problems 543 1 5. 6 Chapter Summary 545 1 5. 7 Exercises. 546 Appendix Glossary 549 Index 563 Preface This book is intended for use in a traditional college-level data structures course (commonly known as CS2). The authors have found that Python is an excellent language for an introductory course. Its relatively simple syntax allows students to focus on problem solving more so than the more complicated syntax of languages such as Java, C++, and Ada. Python's built-in data structures and large standard library also allow students to write more interesting programs than can easily be written with other languages. This book assumes that students have learned the basic syntax of Python and been exposed to the use of existing classes. Most traditional CSI courses that use Python will have covered all the necessary topics, and some may have covered a few of the topics covered in this book. Python's object-oriented features make it an elegant language for starting a data structures course. We have found that most students successfully completing a CSI course know how to use classes, but many of them need more experience to learn how to design and write their own classes. This is not surprising given the limited amount of time that is typically spent designing classes in a CSI course. We address this issue by including a number of examples of class design in the first few chapters of this book. Starting with Python in a CS2 course allows students to continue expanding their skills and gain experience designing and writing classes in a simple, familiar language. Python also makes it relatively easy to learn about linked structures. Every name in Python is a reference, so there is no additional syntax that needs to be learned in order to write linked structures. These advantages allow topics to be covered more quickly than is possible using more complex languages. One potential drawback of Python for a data structures course is that it hides the complexity of memory management. This is a benefit in a first course, but we think that in a second course it is important that students begin to understand some of these low-level details that the Python interpreter hides from them. Since xi xii Preface we can cover the basic data structures in less time using Python, there is time to learn a second language, even in a single-semester CS2 course. After the students have continued to improve their Python programming skills while covering the first few chapters of the book, it is relatively easy for them to learn a second object oriented language. By using C++ as the second language, the students are exposed to a lower-level, compiled language. The syntax of C++ is more complicated than Python, but that is a relatively small hurdle once students have mastered fundamental programming concepts using Python. For example, now that they understand the basic concepts of programming and the semantics of statements such as conditional statements and looping statements, they can focus on learning the C++ syntax for these statements. Once the students have learned fundamental C++ syntax, we cover the con cepts of dynamic memory management by rewriting linked structures in C++. This reinforces the basic data structure concepts while focusing on the memory management issues. This book is not intended to provide complete coverage of the C++ language; instead, it is designed to introduce a large subset of the C++ language so students can understand the low-level details of memory management and write object-oriented code in C++. After covering the basics of the C++ language, we also introduce some of the more advanced data structures by providing Python implementations and leaving the student to write them in C++. In effect, Python becomes an executable pseudocode for presenting key algorithms and data structures. Coverage O ptions Since Python allows coverage o f topics more quickly than other languages, a five semester-hour CS2 course can cover most, if not all, of this book. One of the authors covers the entire book over two courses that are three semester-hours each. In the three semester-hour CS2 course, the first seven chapters are covered in eight weeks and then the first three C++ chapters are covered in seven weeks, allowing plenty of time for the students to write a significant amount of C++ code. The final five chapters are covered in detail in the second three semester-hour course. This allows a week of review at the beginning of the course and more time to discuss the advanced algorithms and data structures in the last three chapters. Depending on the amount of experience students have with object-oriented programming, the first three chapters of the book may be covered fairly quickly or may require more detailed coverage. We introduce the asymptotic run-time analysis of algorithms in the first chapter so that we can analyze the running time of all Preface xiii the data structure implementations. We also introduce one of Python's unit-testing frameworks early on so the students can formally test their code. After completing the discussion of linked structures in Chapter 4, the basic concepts of stacks and queues can be covered quickly, or the example applications can be used to continue developing algorithm and design skills. Some CSI courses cover recursion, although in the our experiences, most students do not fully understand recursion the first time they study it. Since the study of tree data structures requires recursion, a chapter on recursion (Chapter 6) is included before trees (Chapter 7). After the chapter on trees, the book switches to C++. Chapter 8 provides an introduction to C++ assuming the reader knows Python. Chapter 9 covers the details of writing and using classes in C++. Chapters 10 and 1 1 cover the issues of dynamic memory and writing linked structures in C++. We strongly recommend that you cover chapters 8 through 1 1 in order. Chapter 12 covers the basics of using and writing template code, but is not intended to provide complete coverage of templates. Chapter 12 may be skipped as none of the remaining chapters require an understanding of templates. The last three chapters cover some of the advanced data structures and algorithms. These three chapters can be covered in any order although there are a few references to topics in the other chapters. Acknowledgments We would like to thank Nancy Saks and Steve Bogaerts at Wittenberg University for comments on early drafts of the C++ chapters. We also thank Franklin, Beedle & Associates, especially Jim Leisy and Tom Sumner. We also need to thank our editor, Stephanie Welch, who found numerous minor mistakes, helping us improve the quality of this book. David thanks Capital University for their support through a sabbatical, which allowed him to start the book. David would also like to thank his students who used early drafts and provided suggestions for improvements. The following Capital University students deserve special recognition for their many suggestions: Kyle Beal, Michael Herold, Jeff Huenemann, John Larison, and Brenton Wolfe. Finally, David would like to thank his wife, Sherri, for her support and understanding during the countless hours required to complete this book. John extends thanks to his departmental colleagues at Wartburg College who have always been supportive of his writing endeavors and his students who have helped "test drive" some of the material. Most importantly, John thanks his family, especially his wife Elizabeth Bingham; for the love, support, and understanding that makes projects like this possible. Chapter 1 Abstraction and Analysis O bjectives To understand how programming "in the large" differs from programming "in the small." To understand the motivation and use for pre- and postconditions. To develop design and decomposition skills. To understand the importance of algorithm efficiency and learn how to analyze the running time of simple algorithms. [II] I ntrod uction Believe i t or not, a first course i n computer programming covers all the tools strictly necessary to solve any problem that can be solved with a computer. A very famous computer scientist named Alan Turing conjectured, and it is now widely accepted, that any problem solvable with computers requires only the basic statements that all computer programming languages include: decision statements (e.g. , if) , looping statements (e.g. , for and while) and the ability to store and retrieve data. Since you already know about these, you may wonder what else there is to learn. That's a good question. 1 2 Chapter 1 Abstraction a nd Ana lysis 1 1.1.1 1 P rogra m m i ng i n the Large If you think of computer programming as a process similar to constructing a building, right now you have the knowledge equivalent to how to use a few tools such as a hammer, screwdriver, saw, and drill. Those might be all the tools necessary to build a house, but that does not mean you can build yourself a habitable home, let alone one that meets modern building codes. That's not to say that you can't do some useful things. You are probably capable of building benches or birdhouses, you're just not yet ready for the challenges that come with a larger project. In programming, just as in house construction, tackling bigger projects requires additional knowledge, techniques, and skills. This book is intended to give you a solid foundation of this additional knowledge that you can build on in future courses and throughout your career. As you work your way through this material, you will be making a transition from programming "in the small" to programming "in the large." Software projects can vary in difficulty in many ways. Obviously, they may range from the very small (e.g., a program to convert temperatures from Celsius to Fahrenheit) to the very large (e.g. , a computer operating system) to anything in between. Projects also differ widely in how mission-critical the developed systems are. A web-based diary need not be designed to the same exacting specifications as, say, an online banking system, and neither is as critical as the software controlling a medical life-support device. There is no single property that makes any particular project "large" or "dif ficult." In general, though, there are a number of characteristics that distinguish real-world programming from the simpler academic exercises that you have probably seen so far. Here are some of them: program size So far you may have written programs that comprise up to hundreds (perhaps thousands) of lines of code. It is not uncommon for real applications to have hundreds of thousands or millions of lines. For example, the Linux operating system kernel contains around six million lines of code. single programmer vs. programming team Most of the programs you have worked on so far have probably been your own projects. However, most software today is produced by teams of developers working together. No single programmer has complete knowledge of every facet of the system. working from scratch vs. existing code base You have probably written most of your programs pretty much starting from scratch. In real-world projects, program- 1. 1 I ntroduction 3 ming happens in the context of existing applications. Existing systems may be extended, borrowed from, superseded, or used in concert with new software. system lifetime When you are first learning to program, you may write many pro grams just for practice. Once your program has been graded, it may not ever be looked at again. Most real software projects have extended lifetimes. While they are in use, they continue to be refined, improved, and updated. environment complexity A small project may be written in a single programming language using a small set of standard libraries. Larger projects tend to use many languages and a vast array of supporting development tools and software libraries. 1 1. 1.2 1 The Road Ahead The fundamental problem of programming in the large is managing the associated complexity. Humans are good at keeping track of only a few things at a time. In order to deal with a complex software system, we need ways of limiting the number of details that have to be considered at any given moment. The process of ignoring some details while concentrating on those that are relevant to the problem at hand is called abstraction. Effective software development is an exercise in building appropriate abstractions. Therefore, we will visit the idea of abstraction frequently throughout this book. Another important technique in coping with complexity is to reuse solutions that have been developed before. As a programmer, you will need to learn how to use various application programming interfaces (APIs) for the tools/libraries you will use. An API is the collection of classes and functions that a library of code provides and an explanation of how to use them (i.e. , what the parameters and return types are and what they represent). For example, you have already learned some simple APls such as the functions provided in the Python math module and methods for built-in data structures such as the list and dictionary. Another common example of an API is a graphical user interface (GUI) toolkit. Most languages provide APIs for accomplishing many common tasks. APIs will vary from language to language and from one operating system to another. This book cannot possibly begin to cover even a small fraction of the APIs that you will learn and use during your career. However, by learning a few APIs and, more importantly, by learning to develop your own APls, you will acquire the skills that will make it easy for you to master new APIs in the future. Just as important as being able to reuse existing code through APIs is the ability to leverage existing knowledge of good design principles. Over the years, 4 Chapter 1 Abstraction a nd Analysis computer scientists have developed algorithms for solving common problems (e.g. , searching and sorting) and ways of structuring data collections that are used as the basic building blocks in most programs. In this course, you will be learning how these algorithms and data structures work so that you can write larger, more complicated programs that are well designed and maintainable using these well understood components. Studying these existing algorithms and data structures will also help you learn how to create your own novel algorithms and data structures for the unique problems you will face in the future. Computer scientists have also developed techniques for analyzing and classifying the efficiency of various algorithms and data structures so that you can predict whether or not a program using them will solve problems in a reasonable amount of time and within the memory constraints you have. Naturally, you will also need to learn algorithm analysis techniques so that you can analyze the efficiency of the algorithms you invent. This book covers abstraction and data structures using two different program ming languages. Getting experience in more than one language is important for a number of reasons. Seeing how languages differ, you can start to gain an appreciation of how different tools available to the developer are suitable for different tasks. Having a larger toolkit at your disposal makes it easier to solve a wider variety of problems. However, the most important advantage is that you will also see how the underlying ideas of abstraction, reuse, and analysis are applied in both languages. Only by seeing different approaches can you really appreciate what are underlying principles versus what are just details of a particular language. Rest assured, those underlying principles will be useful no matter what languages or environments you may have in your future. Speaking of programming languages, at about the time this book is going to press, a new version of Python is coming out (Python 3.0). The new version includes significant redesign and will not be backward compatible with programs written for the 2.x versions of Python. The code in this book has been written in Python 2.x style. As much as possible, we have tried to use conventions and features that are also compatible with Python 3.0, and the conversion to 3.0 is straightforward. To make the code run in Python 3.0, you need to keep the following changes in mind. print becomes a function call. You must put parentheses around the sequence of expressions to print. The input function acts like the old raw_input. If you want to evaluate user input, you must do it yourself explicitly (eval ( input ( "Enter a number : ") ). ) 1.2 Functional Abstraction 5 range no longer produces a list. You can still use it as before in f or loops (e.g. , for i in range ( 10) :) , but you need to use something like nums = list (range ( 10) ) to produce an explicit list. The single slash operator, / , always produces floating point division. Use the double slash, / / , for integer division (this also works in Python 2.x). We have provided both Python 2.x and Python 3.0 versions of all the code from the text in the online resources, so you should be able to use this book comfortably with any modern version of Python. [I1J Fu nctiona l Abstraction I n order t o tackle a large software project, i t is essential t o b e able t o break i t into smaller pieces. One way of dividing a problem into smaller pieces is to decompose it into a set of cooperating functions. This is called functional (or procedural) abstraction. 1 1.2.1 1 Design by Contract To see how writing functions is an example of abstraction, let's look at a simple example. Suppose you are writing a program that needs to calculate the square root of some value. Do you know how to do this? You may or may not actually know an algorithm for computing square roots, but that really doesn't matter, because you know how to use the square root function from the Python math library. I import math wer = math. sqrt (x) You can use the sqrt function confidently, because you know what it does, even though you may not know exactly how it accomplishes that task. Thus, you are focusing on some aspects of the sqrt function (the what) while ignoring certain details (the how). That's abstraction. This separation of concerns between what a component does and how it ac complishes its task is a particularly powerful form of abstraction. If we think of a function in terms of providing a service, then the programs that use the function are called clients of the service, and the code that actually performs the function is said to implement the service. A programmer working on the client needs to know 6 Chapter 1 Abstraction a nd Ana lysis only what the function does. He or she does not need to know any of the details of how the function works. To the client, the function is like a magical black box that carries out a needed operation. Similarly, the implementer of the function does not need to worry about how the function might be used. He or she is free to concentrate only on the details of how the function accomplishes its task, ignoring the larger picture of where and why the function is actually called. In order to accomplish this clean separation, the client and implementer must have a firm agreement about what the function is to accomplish. That is, they must have a common understanding of the interface between the client code and the implementation. The interface forms a sort of abstraction barrier that separates the two views of the function. Figure 1. 1 illustrates the situation for the Python string split method (or the equivalent split function in the string module). The diagram shows that the function/method accepts one required parameter that is a string and one optional parameter that is a string and returns a list of strings. The client using the split function/method does not need to be concerned with how the code works (i.e. , what's inside the box) , just how to use it. What we need is a careful description of what a function will do, without having to describe how the function will accomplish the task. Such a description is called a specification. list of strings split function/ method optional separator string (defaults to any whitespace) Figure 1. 1 : Split function as black box with interface Obviously, one important part of a specification is describing how the function is called. That is, we need to know the name of the function, what parameters are required, and what if anything the function will return. This information is sometimes called the signature of a function. Beyond the signature, a specification also requires a precise description of what the function accomplishes. We need to know how the result of calling the function relates to the parameters that are provided. Sometimes, this is done rather informally. For example, suppose you are writing a math library function for square root. Consider this specification of the function: 1.2 Functional Abstraction 7 def sqrt (x) : " " " Computes the square root of x " " " This doesn't really do the job. The problem with such informal descriptions is that they tend to be incomplete and ambiguous. Remember, both the client and the implementer (even if they're one and the same person) should be able to fulfill their roles confidently based only on the specification. That's what makes the abstraction process so useful. What if the implementation computes the square root of x, but does not return the result? Technically, the specification is met, but the function will not be useful to the client. Is it OK if sqrt ( 16) returns -4? What if the implementation works only for floating-point numbers, but the client calls the function with an integer parameter? Whose fault is it then if the program crashes? What happens if the client calls this function with a negative number? Perhaps it returns a complex number as a result, or perhaps it crashes. What happens if the client calls this function with a string parameter? The bottom line is that the simple, informal description just does not tell us what to expect. Now this may sound like nitpicking, since everyone generally "understands" what the square root function should do. If we had any questions, we could just test our assumptions by either looking at the code that implements the function or by actually trying it out (e.g. , try computing sqrt ( - 1 ) and see what happens). But having to do either of these things breaks the abstraction barrier between the client and the implementation. Forcing the client coder to understand the actual implementation means that he or she has to wrestle with all the details of that code, thus losing the benefit of abstraction. On the other hand, if the client programmer simply relies on what the code actually does (by trying it out ) , he or she risks making assumptions that may not be shared by the implementer. Suppose the implementer discovers a better way of computing square roots and changes the implementation. Now the client's assumptions about certain "fringe" behavior may be incorrect. If we keep the abstraction barrier firmly in place, both the client code and the implementation can change radically; the abstraction barrier ensures that the program will continue to function properly. This desirable property is called implementation independence. Hopefully you can see how precise specification of components is important when programming in the large. In most situations, careful specification is an absolute necessity; real disaster can loom when specifications are not clearly spelled out and adhered to. In one notorious example, NASA's 1999 Mars Climate Orbiter mission crashed at a loss of $125 million due to a mismatch in assumptions: a module was being given information in imperial units, but was expecting them in metric units. 8 Chapter 1 Abstraction a nd Ana lysis Clearly, we need something better than an informal comment to have a good specification. Function specifications are often written in terms of preconditions and postconditions. A precondition of a function is a statement of what is assumed to be true about the state of the computation at the time the function is called. A postcondition is a statement about what is true after the function has finished. Here is a sample specification of the sqrt function using pre- and postconditions: def sqrt (x) : " " " C omputes the square root of x. pre : x i s an int or a f loat and x >= 0 post : returns the non-negat ive square root of x " " " The job of the precondition is to state any assumptions that the implementation makes, especially those about the function parameters. In doing so, it describes the parameters using their formal names (x in this case). The postcondition describes whatever the code accomplishes as a function of its input parameters. Together, the pre- and post conditions describe the function as a sort of contract between the client and the implementation. If the client guarantees that the precondition is met when the function is called, then the implementation guarantees the postcondition will hold when the function terminates. For this reason, using pre- and postconditions to specify the modules of a system is sometimes called design by contract. Pre- and post conditions are specific examples of a particular kind of documen tation known as program assertions. An assertion is a statement about the state of the computation that is true at a specific point in the program. A precondition must be true just before a function executes, and the postcondition must be true immediately after. We will see later that there are other places in a program where assertions can also be extremely valuable for documentation. If you are reading very carefully, you might be a bit uneasy about the postcondi tion from the sqrt example above. That postcondition describes what the function is supposed to do. Technically speaking, an assertion should not state what a function does, but rather what is now true at a given point in a program. It would be more correct to state the postcondition as something like post : RETVAL == y'x, where RETVAL is a name used to indicate the value that was just returned by the function. Despite being less technically accurate, most programmers tend to use the less formal style of postcondition presented in our example. Given that the informal style is more popular and no less informative, we'll continue to use the "returns this, that, and the other thing" form of postcondition. Those who are sticklers for honest-to-goodness assertions can, no doubt, do the necessary translation. 1. 2 Functional Abstraction 9 This brings up an important point about pre- and post conditions in particular, and specifications in general. The whole point of a specification is that it provides a succinct and precise description of a function or other component. If the specification is ambiguous or longer or more complicated than the actual implementation code, then little has been gained. Mathematical notations tend to be succinct and exact, so they are often useful in specifications. In fact , some software engineering methods employ fully formal mathematical notations for specifying all system components. The use of these so-called formal methods adds precision to the development process by allowing properties of programs to be stated and proved mathematically. In the best case, one might actually be able to prove the correctness of a program, that is, that the code of a program faithfully implements its specification. Using such methods requires substantial mathematical prowess and has not been widely adopted in industry. For now, we'll stick with somewhat less formal specifications but use well-known mathematical and programming notations where they seem appropriate and helpful. Another important consideration is where to place specifications in code. In Python, a developer has two options for placing comments into code: regular comments (indicated with a leading #) and docstrings (string expressions at the top of a module or immediately following a function or class heading). Docstrings are carried along with the objects to which they are attached and are inspectable at run-time. Docstrings are also used by the internal Python help system and by the PyDoc documentation utility. This makes docstrings a particularly good medium for specifications, since API documentation can then be created automatically using PyDoc. As a rule of thumb, docstrings should contain information that is of use to client programmers, while internal comments should be used for information that is intended only for the implementers. 1 1.2.2 1 Testi ng P recond itions The basic idea of design by contract requires that if a function's precondition is met when it is called, then the postcondition must be true at the end of the function. If the precondition is not met, then all bets are off. This raises an interesting question. What should the function do when the precondition is not met? From the standpoint of the specification, it does not matter what the function does in this case, it is "off the hook," so to speak. If you are the implementer, you might be tempted to simply ignore any precondition violations. Sometimes, this means executing the function body will cause the program to immediately crash; other times the code might run, but produce nonsensical results. Neither of these outcomes seems particularly good. 10 Chapter 1 Abstraction a nd Ana lysis A better approach is to adopt defensive programming practices. An unmet precondition indicates a mistake in the program. Rather than silently ignoring such a situation, you can detect the mistake and deal with it. But how exactly should the function do this? One idea might be to have it print an error message. The sqrt function might have some code like this: def sqrt (x) : if x < 0 : print ' Error : can ' t take the square root of a negat ive ' else : The problem with printing an error message like this is that the calling program has no way of knowing that something has gone wrong. The output might appear, for example, in the middle of a generated report. Furthermore, the actual error message might go unnoticed. In fact, if this is a general-purpose library, it's very possible that the sqrt function is called within a GUI program, and the error message will not even appear anywhere at all. Most of the time, it is simply not appropriate for a function that implements a ( service to print out messages unless printing something is part of the specification ) of the method. It would be much better if the function could somehow signal that an error has occurred and then let the client program decide what to do about the problem. For some programs, the appropriate response might be to terminate the program and print an error message; in other cases, the program might be able to recover from the error. The point is that such a decision can be made only by the client. The function could signal an error in a number of ways. Sometimes, returning an out-of-range result is used as a signal. Here's an example: : t (xl , I def if x < 0 : return - 1 Since the specification of sqrt clearly implies that the return value cannot be negative, the value - 1 can be used to indicate an error. Client code can check the result to see if it is OK. Another technique that is sometimes used is to have a global ( ) accessible to all parts of the program variable that records errors. The client code checks the value of this variable after each operation to see if there was an error. 1. 2 Functional Abstraction 11 Of course, the problem with this ad hoc approach to error detection is that a client program can become riddled with decision structures that constantly check to see whether an error has occurred. The logic of the code starts looking something like this: x = someOperat ion ( ) i f x i s not OK : fix x y = anotherOperat ion (x) if y is not OK : abort z = yetAnotherOperat ion (y) if z is not OK : z = SOME_DEfAULT_VALUE The continual error checking with each operation obscures the intent of the original algorithm. Most modern programming languages now include exception handling mecha nisms that provide an elegant alternative for propagating error information in a program. The basic idea behind exception handling is that program errors don't directly lead to a "crash," but rather they cause the program to transfer control to a special section called an exception handler. What makes this particularly useful is that the client does not have to explicitly check whether an error has occurred. The client just needs to say, in effect, "here's the code I want to execute should any errors come up." The run-time system of the language then makes sure that, should an error occur, the appropriate exception handler is called. In Python, run-time errors generate exception objects. A program can include a try statement to catch and deal with these errors. For example, taking the square root of a negative number causes Python to generate a ValueError, which is a subclass of Python's general Exception class. If this exception is not handled by the client, it results in program termination. Here is what happens interactively: » > sqrt ( - 1 ) Traceback (most recent call last ) : file " < stdin> " , line 1 , in ? ValueError : math domain error » > Alternatively, the program could "catch" the exception with a try statement: 12 Chapter 1 Abstraction and Ana lysis » > try : sqrt ( - l ) except ValueError : print "Doops , sorry. " Doops , sorry. » > The statement(s) indented under try are executed, and if an error occurs, Python sees whether the error matches the type listed in any except clauses. The first matching except block is executed. If no except matches, then the program halts with an error message. To take advantage of exception handling for testing preconditions, we just need to test the precondition in a decision and then generate an appropriate exception object. This is called raising an exception and is accomplished by the Python raise statement. The raise statement is very simple: raise where is an expression that produces an exception object containing information about what went wrong. When the raise statement executes, it causes the Python interpreter to interrupt the current operation and transfer control to an exception handler. If no suitable handler is found, the program will terminate. The sqrt function in the Python library checks to make sure that its parameter is non-negative and also that the parameter has the correct type (either int or float). The code for sqrt could implement these checks as follows: def sqrt (x) : if x < 0 : raise ValueError ( ' math domain error ' ) if type (x) not in (type ( l ) , type ( 1 1) , type ( 1. 0) ) : raise TypeError ( ' number expected ' ) # compute square root here Notice that there are no elses required on these conditions. When a raise executes, it effectively terminates the function, so the "compute square root" portion will only execute if the preconditions are met. Oftentimes, it is not really important what specific exception is raised when a precondition violation is detected. The important thing is that the error is diagnosed as early as possible. Python provides a statement for erubedding assertions directly into code. The statement is called assert. It takes a Boolean expression and raises an Asserti onError exception if the expression does not evaluate to True. Using assert makes it particularly easy to enforce preconditions. 1. 2 Functional Abstraction 13 def sqrt (x) : assert x >= 0 and type (x) in (type ( 1 ) , type ( 11 ) , type ( 1. 0) ) As you can see, the assert statement is a very handy way of inserting ssertions directly into your code. This effectively turns the documentation of preconditions (and other assertions) into extra testing that helps to ensure that programs behave correctly, that is, according to specifications. One potential drawback of this sort of defensive programming is that it adds extra overhead to the execution of the program. A few CPU cycles will be consumed checking the preconditions each time a function is called. However, given the ever-increasing speed of modern processors and the potential hazards of incorrect programs, that is a price that is usually well worth paying. That said, one additional benefit of the assert statement is that it is possible to turn off the checking of assertions, if desired. Executing Python with a -0 switch on the command line causes the interpreter to skip testing of assertions. That means it is possible to have assertions on during program testing but turn them off once the system is judged to be working and placed into production. Of course, checking assertions during testing and then turning them off in the production system is akin to practising a tightrope act 10 feet above the ground with a safety net in place and then performing the actual stunt 100 feet off the ground on a windy day without the net. As important as it is to catch errors during testing, it's even more important to catch them when the system is in use. Our advice is to use assertions liberally and leave the checking turned on. 1 1.2.3 1 Top- Down Design One popular technique for designing programs that you probably already know about is top-down design. Top-down design is essentially the direct application of functional abstraction to decompose a large problem into smaller, more manageable components. As an example, suppose you are developing a program to help your instructor with grading. Your instructor wants a program that takes a set of exam scores as input and prints out a report that summarizes student performance. Specifically, the program should report the following statistics about the data: high score This is the largest number in the data set. low score This is the smallest number in the data set. 14 Chapter 1 Abstraction a nd Ana lysis mean This is the "average" score in the data set. It is often denoted x and calculated using this formula: L Xi _ X= -- n That is, we sum up all of the scores (Xi denotes the ith score) and divide by the number of scores (n). standard deviation This is a measure of how spread out the scores are. The standard deviation, s, is given by the following formula: s= In this formula x is the mean, Xi represents the ith data value, and n is the number of data values. The formula looks complicated, but it is not hard to compute. The expression (x Xi ) 2 is the square of the "deviation" of an - individual item from the mean. The numerator of the fraction is the sum of the deviations (squared) across all the data values. As a starting point for this program, you might develop a simple algorithm such as this. Get scores from the user Calculate the minimum score Calculate the maximum score Calculate the average (mean) score Calculate the st andard deviat ion Suppose you are working with a friend to develop this program. You could divide this algorithm up into parts and each work on various pieces of the program. Before going off and working on the pieces, however, you will need a more complete design to ensure that the pieces that each of you develops will fit together to solve the problem. Using top-down design, each line of the algorithm can be written as a separate function. The design will just consist of the specification for each of these functions. One obvious approach is to store the exam scores in a list that can be passed as a parameter to various functions. Using this approach, here is a sample design: 1. 2 Functional Abstraction 15 # stat s. py def get_scores ( ) : " " " Get scores interactively from the user post : returns a list of numbers obtained from user " " " def min_value (nums) : " " " f ind the minimum pre : nums is a list of numbers and len (nums) > 0 post : returns smallest number in nums " " " def max_value (nums) : " " " f ind the maximum pre : nums is a list of numbers and len (nums) > 0 post : returns largest number in nums " " " def average (nums ) : " " " calculate the mean pre : nums is a list of numbers and len (nums) > 0 post : returns the mean (a f loat ) of the values in nums " " " def std_deviation (nums ) : " " " calculate the standard deviation pre : nums is a list of numbers and len (nums) > 1 post : returns the standard deviation (a f loat ) of the values in nums " " " With the specification of these functions in hand, you and your friend should easily be able to divvy up the functions and complete the program in no time. Let's implement one of the functions just to see how it might look. Here's an implementation of std_deviation. def std_deviation (nums ) : xbar = average (nums) sum = 0. 0 f or num in nums : sum += (xbar - num) **2 return math. sqrt ( sum / (len (nums) - 1 ) ) Notice how this code relies on the average function. Since we have that function specified, we can go ahead and use it here with confidence, thus avoiding duplication 16 Chapter 1 Abstraction and Ana lysis of effort. We have also used the "shorthand" += operator, which you may not have seen before. This is a convenient way of accumulating a sum. Writing x += y produces the same result as writing x = x + y. The rest of the program is left for you to complete. As you can see, top-down design and functional specification go hand in hand. As necessary functionality is identified, a specification formalizes the design decisions so that each part can be worked on in isolation. You should have no trouble finishing up this program. 1 1. 2.4 1 Documenting Side Effects In order for specifications to be effective, they must spell out the expectations of both the client and the implementation of a service. Any effect of a service that is visible to the client should be described in the postcondition. For example, suppose that the std_deviation function had been implemented like this: def std_deviation (nums ) : # This is bad code. Don ' t use it. xbar = average (nums) n = len (nums) sum = 0. 0 while nums ! = [] : num = nums. pop ( ) sum + = (xbar - num) **2 return math. sqrt (sum / (n - 1 ) ) This version uses the pop ( ) method of Python lists. The call to nums. pop ( ) returns the last number in the list and also deletes that item from the list. The loop continues until all the items in the list have been processed. This version of std_deviat i on returns the correct value, so it would seem to meet the contract specified by the pre- and postconditions. However, the list object nums passed as a parameter is mutable, and the changes to the list will be visible to the client. The user of this code is likely to be quite surprised when they find out that calling std_deviation ( examScores) causes all the values in examScores to be deleted! These sorts of interactions between function calls and other parts of a program are called side effects. In this case, the deletion of items in examScores is a side effect of calling the std_deviat i on function. Generally, it's a good idea to avoid side effects in functions, but a strict prohibition is too strong. Some functions are designed to have side effects. The pop method of the list class is a good example. It's used in the case where one wants to get a value and also, as a side effect , remove the value from the list. What is crucial is that any side effects of a function should 1.3 Algorithm Ana lysis 17 be indicated in its postcondition. Since the postcondition for std_deviation did not say anything about nums being modified, an implementation that does this is implicitly breaking the contract. The only visible effects of a function should be those that are described in its postcondition. By the way, printing something or placing information in a file are also examples of side effects. When we said above that functions should generally not print any thing unless that is part of their stated functionality, we were really just identifying one special case of (potentially) undocumented side effects. J Algorith Ana lysis m When we start dealing with programs that contain collections of data, we often need to know more about a function than just its pre- and postconditions. Dealing with a list of 10 or even 100 exam scores is no problem, but a list of customers for an online business might contain tens or hundreds of thousands of items. A programmer working on problems in biology might have to deal with a DNA sequence containing Inillions or even billions of nucleotides. Applications that search and index web pages have to deal with collections of a similar magnitude. When collection sizes get large, the efficiency of an algorithm can be just as critical as its correctness. An algorithm that gives a correct answer but requires 10 years of computing time is not likely to be very useful. Algorithm analysis allows us to characterize algorithms according to how much time and memory they require to accomplish a task. In this section, we'll take a first look at techniques of algorithm analysis in the context of searching a collection. 1 1.3.1 1 Li near Search Searching is the process of looking for a particular value in a collection. For example, a program that maintains the membership list for a club might need to look up the information about a particular member. This involves some form of a search process. It is a good problem for us to examine because there are numerous algorithms that can be used, and they differ in their relative efficiency. Boiling the problem down to its simplest essence, we'll consider the problem of finding a particular number in a list. The same principles we use here will apply to more complex searching problems such as searching through a customer list to find those who live in Iowa. The specification for our simple search problem looks like this: 18 Chapter 1 Abstraction a nd Ana lysis def search ( items , target ) : II lI lI Locate target in item s pre : items is a list of numbers post : returns non-negative x where items [x] == target , if target in items ; returns - 1 , otherwise ll ll il Here are a couple interactive examples that illustrate its behavior: » > search ( [3 , 1 , 4 , 2 , 5] , 4) 2 » > search ( [3 , 1 , 4 , 2 , 5] , 7 ) -1 In the first example, the function returns the index where 4 appears in the list. In the second example, the return value -1 indicates that 7 is not in the list. Using the built-in Python list methods, the search function is easily imple mented: # search1. py def search ( items , target ) : try : return items. index (target ) except ValueError : return -1 The index method returns the first position in the list where a target value occurs. If target is not in the list, index raises a ValueError exception. In that case, we catch the exception and return - 1. Clearly, this function meets the specification; the interesting question for us is how efficient is this method? One way to determine the efficiency of an algorithm is to do empirical testing. We can simply code the algorithm and run it on different data sets to see how long it takes. A simple method for timing code in Python is to use the t ime module's time function, which returns the number of seconds that have passed since January 1 , 1970. We can just call that method before and after our code executes and print the difference between the times. If we placed our search function in a module named search1 py, we could test it directly like this:. 1.3 Algorithm Analysis 19 # t ime_search. py import t ime from search1 import search items = range ( 1 000000) # create a big l ist start = t ime. t ime ( ) search ( items , 999999) # look f or the last item stop = t ime. t ime ( ) print stop - start start = t ime. t ime ( ) search ( items , 499999) # look f or the middle item stop = t ime. t ime ( ) print stop - start start = t ime. t ime ( ) search ( items , 10) # look f or an item near the front stop = t ime. t ime ( ) print stop - start Try this code on your computer and note the time to search for the three numbers. What does that tell you about how the index method works? By the way, the Python library contains a module called timei t that provides a more accurate and sophisticated way of timing code. If you are doing much empirical testing, it's worth checking out this module. Let's try our hand at developing our own search algorithm using a simple "be the computer" strategy. Suppose that I give you a page full of numbers in no particular order and ask whether the number 13 is in the list. How will you solve this problem? If you are like most people, you simply scan down the list comparing each value to 13. When you see 13 in the list, you quit and tell me that you found it. If you get to the very end of the list without seeing 13, then you tell me it's not there. This strategy is called a linear search. You are searching through the list of items one by one until the target value is found. This algorithm translates directly into simple code. # search2. py def search ( items , target ) : for i in range (len ( items) ) : if items [i] == target : return i return 1 - 20 Chapter 1 Abstraction and Ana lysis You can see here that we have a simple f or loop to go through the valid indexes for the list (range ( len ( items ) ) ). We test the item at each position to see if it is the target. If the target is found, the loop terminates by immediately returning the index of its position. If this loop goes all the way through without finding the item, the function returns - 1. One problem with writing the function this way is that the range expression creates a list of indexes that is the same size as the list being searched. Since an int generally requires four bytes (32 bits) of storage space, the index list in our test code would require four megabytes of memory for a list of one million numbers. In addition to the memory usage, there would also be considerable time wasted creating this second large list. Python has an alternative form of the range function called xrange that could be used instead. An xrange is used only for iteration, it does not actually create a list. However, the use of xrange is discouraged in new Python code. 1 If your version of Python is 2.3 or newer, you can use the enumerate function. This elegant alternative allows you to iterate through a list and, on each iteration, you are handed the next index along with the next item. Here's how the search looks using enumerate. # search3. py def search ( it ems , target ) : f or i , item in enumerate ( items ) : if item == target : return i return 1 - Another approach would be to avoid the whole range/xrange/enumerate issue by using a while loop instead. # search4. py def search ( it ems , target ) : i = 0 yhile i < len ( items) : if items [i] == target : return i i += 1 return - 1 1 In Python 3. 0 , the standard range expression behaves like xrange and does not actually create a list. 1. 3 Algorith m Ana lysis 21 Notice that all of these search functions implement the same algorithm, namely linear search. How efficient is this algorithm? To get an idea, you might try experimenting with it. Try timing the search for the three values as you did using the list index method. The only code you need to change is the import of the actual search function, since the parameters and return values are the same. Because we wrote to a specification, the client code does not need to change, even when different implementations are mixed and matched. This is implementation independence at work. Pretty cool, huh? 1 1.3.2 1 Bi nary Search The linear search algorithm was not hard to develop, and it will work very nicely for modest-sized lists. For an unordered list, this algorithm is as good as any. The Python in and index operations both implement linear searching algorithms. If we have a very large collection of data, we might want to organize it in some way so that we don't have to look at every single item to determine where, or if, a particular value appears in the list. Suppose that the list is stored in sorted order (lowest to highest). As soon as we encounter a value that is greater than the target value, we can quit the linear search without looking at the rest of the list. On average, that saves us about half of the work. But if the list is sorted, we can do even better than this. When a list is ordered, there is a much better searching strategy, one that you probably already know. Have you ever played the number guessing game? I pick a number between 1 and 100, and you try to guess what it is. Each time you guess, I will tell you if your guess is correct, too high, or too low. What is your strategy? If you play this game with a very young child, they might well adopt a strategy of simply guessing numbers at random. An older child might employ a systematic approach corresponding to linear search, guessing 1 , 2, 3, 4, and so on until the lnystery value is found. Of course, virtually any adult will first guess 50. If told that the number is higher, then the range of possible values is 50- 100. The next logical guess is 75. Each time we guess the middle of the remaining numbers to try to narrow down the possible range. This strategy is called a binary search Binary means two, and at each step, we are dividing the remaining numbers into two parts. We can employ a binary search strategy to look through a sorted list. The basic idea is that we use two variables to keep track of the endpoints of the range in the list where the item could be. Initially, the target could be anywhere in the list, so we start with variables low and high set to the first and last positions of the list, respectively. 22 Chapter 1 Abstraction a nd Ana lysis The heart of the algorithm is a loop that looks at the item in the middle of the remaining range to compare it to x. If x is smaller than the middle item, then we move high, so that the search is narrowed to the lower half. If x is larger, then we move low, and the search is narrowed to the upper half. The loop terminates when x is found or there are no longer any more places to look (i.e. , low > high). The code below implements a binary search using our same search API. # bsearch. py def search ( items , target ) : low = 0 high = len ( it ems )- 1 while low c = Card(4 , ' c ' ) » > print c Four of Clubs We have now translated our ADT into an object-oriented form. Clients of this class will use dot notation to perform operations on the ADT. Here's the code that prints out all 52 cards translated into its object-based form: 48 Chapter 2 Data Abstraction # printcards. py # S imple test of the Card ADT from Card import Card def printAII ( ) : f or suit in ' cdhs ' : f or rank in range ( 1 , 14) : card = Card (rank , suit ) print ' Rank : ' , card. rank ( ) print ' Suit : ' , card. suit ( ) print card if __ name printAII ( ) Notice that the constructor is invoked by using the name o f the class, Card, and the __str method is implicitly called by Python when it is asked to print the card. __ 1 2.3.2 1 I m plementation We can translate our previous implementation of the card ADT into our new class based implementation. Now the rank and suit components of a card can just be stored in appropriate instance variables: # Card. py class Card (obj ect ) : II II II A simple playing card. A Card is characterized by two components : rank : an integer value in the range 1 - 13 , inclusive (Ace-King) suit : a character in ' cdhs ' f or clubs , diamonds , hearts , and spade s. 11 11 11 SUITS = ' cdhs ' SUIT_NAMES = [ ' Clubs ' , ' Diamonds ' , ' Heart s ' , ' Spades ' ] RANKS = range ( 1 , 14) RANK NAMES [ ' Ace ' , ' Two ' , ' Three ' , ' Four ' , ' Five ' , ' Six ' , ' Seven ' , ' Eight ' , ' Nine ' , ' Ten ' , ' Jack ' , ' Queen ' , ' King ' ] def _ _ init_ _ ( self , rank , suit ) : II II II Constructor pre : rank in range ( 1 , 14) and suit in ' cdhs ' post : self has the given rank and suit II II II self. rank_num = rank self. suit _char = suit 2.3 ADTs a nd Objects 49 def suit ( self ) : " " " Card suit post : Returns the suit of self as a single character " " " return self. suit _char def rank (self ) : " " " Card rank post : Returns the rank of self as an int " " " return self. rank_num def suitName (self ) : " " " Card suit name post : Returns one of ( ' clubs ' , ' diamonds ' , ' hearts ' , ' spades ' ) corresponding to self ' s suit. " " " index = self. SUITS. index (self. suit_char) return self. SUIT_NAMES [index] def rankName (self ) : " " " Card rank name post : Returns one of ( ' ace ' , ' two ' , ' three ' ,... , ' king ' ) corresponding t o self ' s rank. " " " index = self. RANKS. index ( self. rank_num) return self. RANK_NAMES [index] def __ str __ ( self ) : " " " String representation post : Returns string represent ing self , e. g. ' Ace of Spades ' " " " return self. rankName ( ) + ' of ' + self. suitName ( ) Notice that the lookup tables from the previous version have now been imple mented as variables that are assigned inside of the Card class but outside of any of the methods of the class. These are class variables. They "live" inside the class definition, so there is one copy shared by all instances of the class. These variables are accessed just like instance variables using the self. convention. When Python is asked to retrieve the value of an object's attribute, it first checks to see if the attribute has been assigned directly for the object. If not, it will look in the object's class to find it. For example, when the sui tName method accesses self. SUITS , Python sees that self does not have a SUIT attribute, so the value from the Card class is used (because self is a Card). 50 Chapter 2 Data Abstraction You now have three different kinds of variables for storing information in pro grams: regular (local) variables, instance variables, and class variables. Choosing the right kind of variable for a given piece of information is an important decision when implementing ADTs. The first question you must answer is whether the data needs to be remembered from one method invocation to another. If not, you should use a local variable. The index variable used in rankName 0 is a good example of a local variable; its value is no longer needed once the method terminates. Notice that there is also a local variable called index in the sui tName method. These are two completely independent variables, even though they happen to have the same name. Each exists only while the method where they are used is executing. We could have written this code using an instance variable self. index in these two methods. Doing so would be a misleading design choice, because we have no reason to hang onto the value of index from the last execution of rankName or sui tName. Reusing an instance variable in this case would imply a connection where none exists. Data that does need to be remembered from one method invocation to another should be stored in either instance variables or class variables. The decision about which to use in this case depends on whether the data may be different from one object to the next or whether it is the same for all objects of the class. In our card example, self. rank_num and self. suit_char are values that will vary among cards. They are part of the intrinsic state of a particular card, so they have to be instance variables. The suit names, on the other hand will be the same for all cards of the class, so it makes sense to use a class variable for that. Constants are often good candidates for class variables, since, by definition, they are the same from one object to the next. However, there are also times when non-constant class variables make sense. Keeping these simple rules in mind should help you turn your ADTs into working classes. As you can see there is a natural correspondence between the notion of an ADT and an object-oriented class. When using an object-oriented language, you will usually want to implement an ADT as a class. The nice thing about using classes is that they naturally combine the two facets of an ADT (data and operations) into a single programming structure. 1 2.3.3 1 C h a ngi ng the Representation We have emphasized that the primary strength of using ADTs to design software is implementation independence. However, the playing card example that we've discussed so far has not really illustrated this point. After all, we said that a card has a rank that is an int and a suit that is a character, then we simply stored these 2.3 ADTs and Objects 51 values as instance variables. Isn't the client directly using the representation when it manipulates suits and ranks? The reason it seems that the client has access to the representation in this case is simply because the concrete representation that we've chosen directly mirrors the data types that are used to pass information to and from the ADT. However, since access to the data takes place through methods (like suit and rank) we can actually change the concrete representation without affecting the client code. This is where the independence comes in. Suppose we are developing card games for a handheld device such as a PDA or cell phone. On such a device, we might have strict memory limitations. Our current representation of cards requires two instance variables for each card; the rank, which is a 32-bit int i and the suit, which is a character. An alternative way to think about cards is simply to number them. Since there are 52 cards, each can be represented as a number from 0 to 5 1. Think of putting the cards in order so that all the clubs come first, diamonds second, etc. Within each suit , put the cards in rank order. Now we have a complete ordering where the first card in the deck is the ace of clubs, and the last card is the king of spades. Given a card's number, we can calculate its rank and suit. Since there are 13 cards in each suit, dividing the card number by 13 (using integer division) produces a value between 0 and 3 (inclusive). Clubs will yield a 0, diamonds a 1 , etc. Furthermore, the remainder from the division will give the relative position of the card within the suit (i.e. , its rank). For example, if the card number is 37, 37//13 = 2 so the suit is hearts, and 37%13 = 11 which corresponds to a rank of queen since the first card in a suit (the ace) will have a remainder of O. So card 37 is the queen of hearts. Using this approach, the concrete representation of our Card ADT can be a single number. We leave it as an exercise for the reader to complete an implementation of the Card class using this more-memory-efficient alternative representation. 1 2.3.4 1 Object-Oriented Design a nd Progra m m i ng As you have seen, there is a close correspondence between the ideas of ADTs and object-oriented programming. But there is more to object-orientation (00) than just implementing ADTs. Most 00 gurus talk about three features that together make development truly object-oriented: encapsulation, polymorphism, and inheritance. 52 Chapter 2 Data Abstraction Enca psu lation As you know, objects know stuff and do stuff. They combine data and operations. This process of packaging some data along with the set of operations that can be performed on the data is called encapsulation. Encapsulation is one of the major attractions of using objects. It provides a convenient way to compose solutions to complex problems that corresponds to our intuitive view of how the world works. We naturally think of the world around us as consisting of interacting objects. Each object has its own identity, and knowing what kind of object it is allows us to understand its nature and capabilities. When you look out your window, you see houses, cars, and trees, not a swarming mass of countless molecules or atoms. From a design standpoint , encapsulation also provides the critical service of separating the concerns of "what" vs. "how." The actual implementation of an object is independent of its use. Encapsulation is what gives us implementation independence. Encapsulation is probably the chief benefit of using objects, but alone it only makes a system object-based. To be truly objected- oriented, the approach must also have the characteristics of polymorphism and inheritance. Polymorphism Literally, the word polymorphism means "many forms." When used in object oriented literature, this refers to the fact that what an object does in response to a message (a method call) depends on the type or class of the object. Consider a simple example. Suppose you are working with a graphics library for drawing two dimensional shapes. The library provides a number of primitive geometric shapes that can be drawn into a window on the screen. Each shape has an operation that actually draws the shape. We have a collection of classes something like this: class Circle (obj ect ) : def draw ( self , window) : # code to draw the circle class Rectangle ( obj ect ) : def draw ( self , window) : # code to draw the rectangle class Polygon (obj ect ) : def draw ( self , window) : # code to draw the polygon 2.3 ADTs a nd Objects 53 Of course, each of these classes would have other methods in addition to its draw method. Here we're just giving a basic outline for illustration. Suppose you write a program that creates a list containing a mixture of geometric objects: circles, rectangles, polygons, etc. To draw all of the objects in the list, you would write code something like this: I f or obj in obj ect s : obj. draw (win) Now consider the single line of code in the loop body. What function is called when obj draw (win) executes? Actually, this single line of code calls several distinct. functions. When obj is a circle, it executes the draw method from the circle class. When obj is a rectangle, it is the draw method from the rectangle class, and so on. The draw operation takes many forms; the particular one used depends on the type of obj. That's the polymorphism. Polymorphism gives object-oriented systems the flexibility for each object to perform an action just the way that it should be performed for that object. If we didn't have objects that supported polymorphism we'd have to do something like this: f or obj in obj ect s : if type (obj ) is Circle : draw_ circle (.. ). elif type (obj ) is Rectangle : draw_rectangle (... ) elif type (obj ) is Polygon : draw_polygon (.. ). Not only is this code more cumbersome, it is also much less flexible. If we want to add another type of object to our library, we have to find all of the places where we made a decision based on the object type and add another branch. In the polymorphic version, we can just create another class of geometric object that has its own draw method, and all the rest of the code remains exactly the same. Polymorphism allows us to extend the program without having to go in and modify the existing code. I n h erita nce The third important property for object-oriented development is inheritance. As its name implies, the idea behind inheritance is that a new class can be defined to borrow behavior from another class. The new class (the one doing the borrowing) 54 Chapter 2 Data Abstraction is called a subclass, and the existing class (the one being borrowed from) is its superclass. For example, if we are building a system to keep track of employees, we might have a class Employee that contains the general information and methods that are common to all employees. One sample attribute would be a homeAddress method that returns the home address of an employee. Within the class of all employees, we might distinguish between SalariedEmployee and HourlyEmployee. We could make these subclasses of Employee, so they would share methods like homeAddres s ; however, each subclass would have its own monthlyPay function, since pay is computed differently for these different classes of employees. Figure 2. 1 shows a simple class diagram depicting this situation. The arrows with open heads indicate inheritance; the subclasses inherit the homeAddress method defined in the Employee class, but each defines its own implementation of the monthlyPay method. Employee HourlyEmployee Salaried Employee Figure 2. 1 : Simple example of inheritance with subclasses inheriting one shared method and each separately implementing one method Inheritance provides two benefits. One is that we can structure the classes of a system to avoid duplication of operations. We don't have to write a separate homeAddress method for the HourlyEmployee and SalariedEmployee classes. A closely related benefit is that new classes can often be based on existing classes, thus promoting code reuse. 2.4 An Exa m ple ADT: Dataset 55 1 2. 4 1 An Exa m ple A DT: Dataset Now that we've covered ADTs and object-oriented principles, let's go back and look at the simple statistics problem introduced in Chapter 1. This time, we're going to tackle the problem using an object-oriented approach. 1 2.4. 1 1 The P rocess of 0 0 0 The essence of design is describing a system in terms of "black boxes" and their interfaces. Each component provides a service or set of services through its interface. In top-down design, functions serve the role of our black boxes. A client program can use a function as long as it understands what the function does. The details of how the task is accomplished are encapsulated in the function definition. In object-oriented design (OOD) , the black boxes are objects. If we can break a large problem into a set of cooperating classes, we drastically reduce the complexity that must be considered to understand any given part of the program. Each class stands on its own. Object-oriented design is the process of finding and defining a useful set of classes for a given problem. Like all design, it is part art and part science. There are many different approaches to OOD, each with its own special tech niques, notations, gurus, and textbooks. Probably the best way to learn about design is to do it. The more you design, the better you will get. Just to get you started, here are some intuitive guidelines for object-oriented design: 1. Look for object candidates. Your goal is to define a set of objects that will be helpful in solving the problem. Start with a careful consideration of the problem statement. Objects are usually described by nouns. You might underline all of the nouns in the problem statement and consider them one by one. Which of them will actually be represented in the program? Which of them have "interesting" behavior? Things that can be represented as primitive data types (numbers or strings) are probably not important candidates for objects. Things that seem to involve a grouping of related data items (e.g. , coordinates of a point or personal data about an employee) probably are. 2. Identify instance variables. Once you have uncovered some possible objects, think about the information that each object will need to do its job. What kinds of values will the instance variables have? Some object attributes will have primitive values; others might themselves be complex types that suggest other useful objects/classes. Strive to find good "home" classes for all the data in your program. 56 Chapter 2 Data Abstraction 3. Think about interfaces. When you have identified a potential object / class and some associated data, think about what operations would be required for objects of that class to be useful. You might start by considering the verbs in the problem statement. Verbs are used to describe actions what must be done. List the methods that the class will require. Remember that all manipulation of the object's data should be done through the methods you provide. 4. Refine the nontrivial methods. Some methods will look like they can be accomplished with a couple of lines of code. Other methods will require considerable work to develop an algorithm. Use top-down design and stepwise refinement to flesh out the details of the more difficult methods. As you go along, you may very well discover that some new interactions with other classes are needed, and this might force you to add new methods to other classes. Sometimes you may discover a need for a brand new kind of object that calls for the definition of another class. 5. Design iteratively. As you work through the design, you will bounce back and forth between designing new classes and adding methods to existing classes. Work on whatever seems to be demanding your attention. No one designs a program top to bottom in a linear, systematic fashion. Make progress wherever it seems progress needs to be made. 6. Try out alternatives. Don't be afraid to scrap an approach that doesn't seem to be working or to follow an idea and see where it leads. Good design involves a lot of trial and error. When you look at the programs of others, you are seeing finished work, not the process they went through to get there. If a program is well designed, it probably is not the result of a first try. Fred Brooks, a legendary software engineer, coined the maxim: "Plan to throw one away." Often you won't really know how a system should be built until you've already built it the wrong way. 7. Keep it simple. At each step in the design, try to find the simplest approach that will solve the problem at hand. Don't design in extra complexity until it is clear that a more complex approach is needed. 1 2.4.2 1 Identifyi ng a n A DT Recall that in the statistics problem our goal was to report some simple statistics for a set of exam scores. What are the likely candidates for objects in this program? 2.4 An Exam ple ADT: Dataset 57 Looking at the problem description, we are going to have to manipulate scores (a noun). Should a score be an object? Since a score is just a number, it looks like one of the built-in numeric types can be used for this, probably float. What else is there? In order to compute the required statistics, we need to keep track of an entire set of scores. In statistics, we would call these scores a dataset. Collections are often good candidates for ADTs; let's try specifying a Dataset class. It's obvious that we want methods that return the minimum value, maximum value, mean, and standard deviation of the values in the dataset, since those are the statistics called for in the original problem. The only remaining question is how we get the numbers into the Dataset in the first place. Once simple approach is to have an add method that places another number in the dataset. We can construct an initially empty set and then add the numbers one at a time. Here's a sample specification: # Dataset. py class Dataset ( obj ect ) : " " "Dataset is a collect ion of numbers from which s imple descript ive statistics can be computed. " " " def __ init __ (self ) : " " " post : self is an empty Dataset " " " def add (self , x) : " " " add x to the data set post : x is added to the data set " " " def min (self ) : " " " f ind the minimum pre : size of self >= 1 post : returns smallest number in self " " " def max (self ) : " " " f ind the maximum pre : size of self >= 1 post : returns largest number in self " " " def average (self ) : " " " calculate the mean pre : size of self >= 1 post : returns the mean of the values in self " " " def std_deviation (nums ) : " " " calculate the standard deviat ion pre : size of self >= 2 post : returns the st andard deviat ion of the values in self " " " 58 Chapter 2 Data Abstraction Examining this specification immediately suggests one more operation that we should add to the ADT. Since various operations have preconditions based on how many values are in the dataset, we really should have an operation that returns this. It's always a good idea to ensure that the preconditions of ADT operations are testable. This allows the client to make sure it is using an ADT properly and also allows the implementation to easily check the preconditions. Let's add one more method: def s ize (self ) : 11 11 11 post : returns the size of self (number of values added) As before, we can "test " our design by writing some code that makes use of it. In this case, we can actually write the main program for our application, relying on the Dataset ADT to do the hard work. All we need is a sentinel loop to input the data: # test_Dataset. py def main e ) : print ' This is a program to compute the min , max , mean and ' print ' standard deviat ion for a set of numbers. \n ' data = Dataset 0 while True : xStr = raw_input ( ' Enter a number « Enter> to quit ) : ' ) if xStr == ". break try : x = f loat (xStr) except ValueError : print ' Invalid Entry Ignored : Input was not a number ' continue dat a. add (x) print ' Summary of ' , data. size ( ) , ' scores. ' print ' Min : ' , data. min ( ) print ' Max : ' , data. max ( ) print ' Mean : ' , data. average ( ) print ' Standard Deviat ion : ' , data. std_deviation ( ) i f __name main e ) 1 2.4.3 1 I m plementing the ADT To implement our Dataset ADT, we need to come up with a concrete representation for the set of numbers. One obvious approach would be to use a list of numbers, 2.4 An Exa m ple ADT: Dataset 59 just as we did in the original version developed using top-down design. In this approach, the add method would simply append another number to the list. Each of the statistics methods could then loop through the list of numbers to perform their calculations. Of course, as with virtually any ADT, there are other possible concrete repre sentations. Do we really need to store a list of all the numbers in the Dataset? Actually, none of the methods really needs to know the specific numbers in the collection, they just need some summary information about the numbers. Clearly, for the min and max methods, we just need to know the smallest and largest values, respectively, that have been a