Data Structures & Algorithms PDF

Summary

This document provides an overview of data structures, including arrays, linked lists, trees, and hash tables, and details their advantages and disadvantages. It also covers how to choose the right data structure for various tasks, and the relationship of general-purpose data structures. The document discusses sorting algorithms and their time and space complexities, and it also includes a comparison of special-purpose structures like stacks, queues, and priority queues. The document is intended for use in a computer science course, or related training materials.

Full Transcript

Data Structures & Algorithms Data Structures - When to Use What ? ? Overview of Data Structures Another way to look at data structures is to focus on their strengths and weaknesses. Data Structure Advantages Disadvantages Array...

Data Structures & Algorithms Data Structures - When to Use What ? ? Overview of Data Structures Another way to look at data structures is to focus on their strengths and weaknesses. Data Structure Advantages Disadvantages Array Quick insertion, very fast access if index known. Slow search, slow deletion, fixed size. Ordered array Quicker search than unsorted array. Slow insertion and deletion, fixed size. Stack Provides last-in, first-out access. Slow access to other items. Queue Provides first-in, first-out access. Slow access to other items. Linked list Quick insertion, quick deletion. Slow search. Binary tree Quick search, insertion, deletion (if tree remains Deletion algorithm is complex. balanced). Red-black tree Quick search, insertion, deletion. Tree always Complex. balanced. 2-3-4 tree Quick search, insertion, deletion. Tree always Complex. balanced. Similar trees good for disk storage. Hash table Very fast access if key known. Fast insertion. Slow deletion, access slow if key not known, inefficient memory usage. Heap Fast insertion, deletion, access to largest item. Slow access to other items. How to decide... How to decide which general-purpose data structure—array, linked list, tree, or hash table —to use? How to decide which specialized data structure —stack, queue, or priority queue—to use? How to decide which sorting algorithm to use? General-purpose data structures If you need to store real-world data such as personnel records, inventories, contact lists, or sales data, you need a general-purpose data structure. The structures of this type that we’ve discussed in this book are arrays, linked lists, trees, and hash tables. We call these general-purpose data structures because they are used to store and retrieve data using key values. They provide convenient access to any data item (as opposed to specialized structures such as stacks, which allow access to only certain data items). The relationship of general-purpose data structures Speed and Algorithms The general-purpose data structures can be roughly arranged in terms of speed: Arrays and linked lists are slow, trees are fairly fast, and hash tables are very fast. However, don’t draw the conclusion that it’s always best to use the fastest structures. There’s a penalty for using them. First, they are—in varying degrees—more complex to program than the array and linked list. Also, hash tables require you to know in advance about how much data can be stored, and they don’t use memory very efficiently. Ordinary binary trees will revert to slow O(N) operation for ordered data, and balanced trees, which avoid this problem, are difficult to program. Computers Grow Faster Every Year The fast structures come with penalties, and another development makes the slow structures more attractive. Every year there’s an increase in the CPU and memory-access speed of the latest computers. Moore’s Law (postulated by Gordon Moore in 1965) specifies that CPU performance will double every 18 months. This adds up to an astonishing difference in performance between the earliest computers and those available today, and there’s no reason to think this increase will slow down any time soon. Pointers Are Faster C++ has an advantage over some languages in the speed with which objects can be manipulated because, in many data structures, C++ stores only pointers, not actual objects. Therefore most algorithms will run faster than if actual objects occupy space in a data structure. In analyzing the algorithms it’s not the case, as when objects themselves are stored, that the time to “move” an object depends on the size of the object. Because only a pointer is moved, it doesn’t matter how large the object is. Libraries Libraries of data structures are available commercially for all major programming languages. Languages themselves may have some structures built in. C++, includes vector, stack, and various other container classes as part of the Standard Template Library (STL). Using a ready-made library might eliminate or at least reduce the programming necessary to create the data structures. When that’s the case, using a complex structure such as a balanced tree, or a delicate algorithm such as quicksort, becomes a more attractive possibility. However, you must ensure that the class can be adapted to your particular situation. Arrays In many situations the array is the first kind of structure you should consider when storing and manipulating data. Arrays are useful when 1. The amount of data is reasonably small. 2. The amount of data is predictable in advance. If you have plenty of memory, you can relax the second condition; just make the array big enough to handle any foreseeable influx of data. If insertion speed is important, use an unordered array. If search speed is important, use an ordered array with a binary search. Deletion is always slow in arrays because an average of half the items must be moved to fill in the newly vacated cell. Traversal is fast in an ordered array but not supported in an unordered array. Vectors, such as the vector class supplied with the C++ STL, are arrays that expand themselves when they become too full. Linked Lists Consider a linked list whenever the amount of data to be stored cannot be predicted in advance or when data will frequently be inserted and deleted. The linked list obtains whatever storage it needs as new items are added, so it can expand to fill all of the available memory; there is no need to fill “holes” during deletion, as there is in arrays. Insertion is fast in an unordered list. Searching and deletion are slow (although deletion is faster than in an array), so, like arrays, linked lists are best used when the amount of data is comparatively small. A linked list is somewhat more complicated to program than an array, but is simple compared with a tree or hash table. Binary Search Trees A binary tree is the first structure to consider when arrays and linked lists prove too slow. A tree provides fast O(log N) insertion, searching, and deletion. Traversal is O(N), which is the maximum for any data structure (by definition, you must visit every item). You can also find the minimum and maximum quickly, and traverse a range of items. An unbalanced binary tree is much easier to program than a balanced tree, but unfortunately ordered data can reduce its performance to O(N) time, no better than a linked list. However, if you’re sure the data will arrive in random order, there’s less necessity to using a balanced tree. Balanced Trees Of the various kinds of balanced trees, we discussed red-black trees and 2-3-4 trees. They are both balanced trees, and thus guarantee O(log N) performance whether the input data is ordered or not. However, these balanced trees are challenging to program, with the red-black tree being the more difficult. They also impose additional memory overhead, which might or might not be significant. The problem of complex programming is reduced if a commercial class can be used for a tree. In some cases a hash table might be a better choice than a balanced tree. Hash table performance doesn’t degrade when the data is ordered. There are other kinds of balanced trees, including AVL trees, splay trees, 2-3 trees, and so on, but they are not as commonly used as the red-black tree. Hash Tables Hash tables are the fastest data storage structure. This makes them a necessity for situations where a computer program, rather than a human, is interacting with large amounts of data. Hash tables are typically used in spelling checkers and as symbol tables in computer language compilers, where a program must check thousands of words or symbols in a fraction of a second. Hash tables might also be useful when a person, as opposed to a computer, initiates dataaccess operations. As noted above, hash tables are not sensitive to the order in which data is inserted, and so can take the place of a balanced tree. Programming is much simpler than for balanced trees. Hash tables require additional memory, especially for open addressing. Also, the amount of data to be stored must be known fairly accurately in advance because an array is used as the underlying structure. A hash table with separate chaining is the most robust implementation, unless the amount of data is known accurately in advance, in which case open addressing offers simpler programming because no linked list class is required. Hash tables don’t support any kind of ordered traversal, or access to the minimum or maximum items. If these capabilities are important, the binary search tree is a better choice. Comparing the General-Purpose Storage Structures summarizes the speeds of the various general-purpose data storage structures using Big O notation. Traversal implies visiting the data items in order of ascending or descending keys. A dash (—) means the indicated operation is not supported. Special-Purpose Data Structures The special-purpose data structures are the stack, the queue, and the priority queue. These structures, rather than supporting a database of user-accessible data, are more often used by a computer program to aid in carrying out some algorithm. Stacks, queues, and priority queues are abstract data types (ADTs) that are implemented by a more fundamental structure such as an array or linked list. These ADTs present a simple interface to the user, typically allowing only insertion and the ability to access or delete only one data item. These items are For stacks: the last item inserted For queues: the first item inserted For priority queues: the item with the highest priority These ADTs can be seen as conceptual aids. Their functionality could be obtained using the underlying structure (such as an array) directly, but the reduced interface they offer simplifies many problems. These ADTs can’t be conveniently searched for an item by key value, or traversed. Stack A stack is used when you want access only to the last data item inserted; it’s a last-infirst-out (LIFO) structure. A stack is often implemented as an array or a linked list. The array implementation is efficient because the most recently inserted item is placed at the end of the array, where it’s also easy to delete it. Stack overflow can occur, but is not likely if the array is reasonably sized, because stacks seldom contain huge amounts of data. If the stack will contain a lot of data and the amount can’t be predicted accurately in advance (as when recursion is implemented as a stack) a linked list is a better choice than an array. A linked list is efficient because items can be inserted and deleted quickly from the head of the list. Stack overflow can’t occur (unless the entire memory is full). A linked list is slightly slower than an array because memory allocation is necessary to create a new link for insertion, and deallocation of the link is necessary at some point, usually following removal of an item from the list. Queue A queue is used when you want access only to the first data item inserted; it’s a first-infirst-out (FIFO) structure. Like stacks, queues can be implemented as arrays or linked lists. Both are efficient. The array requires additional programming to handle the situation where the queue wraps around at the end of the array. A linked list must be double-ended, to allow insertions at one end and deletions at the other. As with stacks, the choice between an array implementation and a linked list implementation is determined by how well the amount of data can be predicted. Use the array if you know about how much data there will be; otherwise, use a linked list. Priority Queue A priority queue is used when the only access desired is to the data item with the highest priority. This is the item with the largest (or sometimes the smallest) key. Priority queues can be implemented as ordered arrays or ordered linked lists. Insertion in these structures is slow, but deletion is fast. The array works when the amount of data to be stored can be predicted in advance; the linked list when the amount of data is unknown. A vector can be substituted for the array. A priority queue can also be implemented as a heap. A heap is a tree-like structure, usually based on an array, that provides fast access to the largest (or smallest) data item. As the basis for a priority queue, the heap allows insertion in O(log N) time; unfortunately, deletion is also O(log N), not as fast as an ordered array. The heap is more complicated than the array or linked list. However it’s the structure of choice when insertion speed is vital. Comparison of Special-Purpose Structures Shows the Big O times for stacks, queues, and priority queues. These structures don’t support searching or traversal. Sorting As with the choice of data structures, it’s worthwhile initially to try a slow but simple sort, such as the insertion sort. It might be that the fast processing speeds available in modern computers will allow sorting of your data in reasonable time. (As a wild guess, the slow sort might be appropriate for under 1,000 items.) Insertion sort is also good for almost-sorted files, operating in about O(N) time if not too many items are out of place. This is typically the case where a few new items are added to an already- sorted file. If the insertion sort proves too slow you can use one of the more complex but faster sorts: mergesort or quicksort. Mergesort requires extra memory and is somewhat slower than quicksort, so quicksort is the usual choice when the fastest sorting time is necessary. Sorting However, quicksort is suspect if there’s a danger that the data may not be random, in which case it may deteriorate to O(N2) performance in some implementations. For potentially non-random data, heapsort is better. Quicksort is also prone to subtle errors if it is not implemented correctly. Small mistakes in coding can make it work poorly for certain arrangements of data, a situation that might be hard to diagnose. The shellsort is intermediate in speed between the slow sorts like the insertion sort and the fast sorts like mergesort and quicksort. It’s considerably easier to program than the faster sorts, and might therefore be useful in situations when there’s too much data for a slow sort but not enough to justify a fast sort. The heapsort is based on the heap structure, just mentioned in connection with priority queues. The heapsort rivals the mergesort in its ability to handle non-random data. Comparison of sorting algorithms Summarizes the running time for various sorting algorithms. The column labeled Comparison attempts to estimate the minor speed differences between algorithms with the same average Big O times. (There’s no entry for shellsort because there are no other algorithms with the same Big O performance.) Literature Michael T. Goodrich, Roberto Tamassia, David M. Mount Data Structures and Algorithms in C++ Second Edition; ISBN-13 978-0-470-38327-8; 2011, John Wiley & Sons, Inc. Robert Lafore, Sams Teach Yourself Data Structures and Algorithms in 24 Hours Paperback; ISBN-10: 0672316331; Sams; Pap/Cdr edition (May 1999) Weiss, Mark Allen. Data structures and algorithm analysis in C++ / Mark Allen Weiss — Fourth edition (2014) Source Code for Data Structures and Algorithm Analysis in C++ (Fourth Edition) - http://users.cs.fiu.edu/~weiss/dsaa_c++4/code/

Use Quizgecko on...
Browser
Browser