Module 8: Space and Time Trade-Offs (PDF)
Document Details
Uploaded by GallantReal
Saudi Electronic University
Summary
This document discusses space–time trade-offs in algorithm design, focusing on input-enhancement techniques such as counting sort and string-searching algorithms. It also details strategies for handling collisions in hashing and illustrates searching in B-trees.
Full Transcript
Saudi Electronic University — College of Computing and Informatics — 26/12/2021
DS352: Design and Analysis of Algorithms
Module 8: Space and Time Trade-Offs

Week Learning Outcomes
- Describe space and time trade-offs
- Express sorting by counting
- Demonstrate input enhancement in string matching

Topic Outline
- Sorting by counting
- Input enhancement in string matching

Space-for-Time Trade-Offs
There are two varieties of space-for-time algorithms:
- Input enhancement: preprocess the input (or part of it) to store some information to be used later in solving the problem
  - counting sorts
  - string-searching algorithms
- Prestructuring: preprocess the input to make accessing its elements easier
  - hashing
  - indexing schemes (e.g., B-trees)

Sorting by Counting
As a first example of applying the input-enhancement technique, consider its application to the sorting problem.
Comparison-counting sort:
- For each element of the list to be sorted, count the total number of elements smaller than that element and record the results in a table. These numbers indicate the positions of the elements in the sorted list.
- Thus, we can sort the list by simply copying its elements to their appropriate positions in a new, sorted list.
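The comparison-counting procedure just described can be sketched in Python. This is a minimal illustration, not the slides' own pseudocode (which appears only as a figure); the variable names are mine.

```python
def comparison_counting_sort(a):
    """Sort a list by counting, for each element, how many elements
    are smaller than it; that count is its position in the output."""
    n = len(a)
    count = [0] * n
    # Compare every pair exactly once, crediting the larger element.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1
            else:
                count[i] += 1
    # Copy each element to its computed position in a new list.
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]
    return s
```

For example, `comparison_counting_sort([62, 31, 84, 96, 19, 47])` returns the sorted list `[19, 31, 47, 62, 84, 96]`. Note that every pair is compared, which is where the n(n−1)/2 comparison count discussed next comes from.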
Sorting by Counting: Pseudocode and Example
(The pseudocode and the worked example appeared as figures on the original slides and are not reproduced here.)

Sorting by Counting: Efficiency
The number of times its basic operation, the comparison A[i] < A[j], is executed is equal to the sum we have encountered several times already:

C(n) = Σ (i = 0 to n−2) Σ (j = i+1 to n−1) 1 = n(n−1)/2

The algorithm makes the same number of key comparisons as selection sort and uses a linear amount of extra space.

Input Enhancement in String Matching

String searching by brute force
- pattern: a string of m characters to search for
- text: a (long) string of n characters to search in

Brute-force algorithm:
- Step 1: Align the pattern at the beginning of the text.
- Step 2: Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters are found to match (successful search) or a mismatch is detected.
- Step 3: While a mismatch is detected and the text is not yet exhausted, realign the pattern one position to the right and repeat Step 2.
Time complexity (worst case): O(mn)

String searching by preprocessing
Several string-searching algorithms are based on the input-enhancement idea of preprocessing the pattern:
- The Knuth-Morris-Pratt (KMP) algorithm preprocesses the pattern left to right to get useful information for later searching.
- The Boyer-Moore algorithm preprocesses the pattern right to left and stores the information in two tables.
- Horspool's algorithm simplifies the Boyer-Moore algorithm by using just one table.
O(m + n) time in the worst case.

Horspool's Algorithm
A simplified version of the Boyer-Moore algorithm:
- It preprocesses the pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurs.
- It always makes a shift based on the text's character c aligned with the last compared (mismatched) character in the pattern, according to the shift table's entry for c.

Horspool's Algorithm: Example
Consider searching for the pattern BARBER in some text. In general, the following four possibilities can occur:

Case 1: If there are no c's in the pattern (e.g., c is the letter S in our example), we can safely shift the pattern by its entire length m.

Case 2: If there are occurrences of character c in the pattern but it is not the last one there (e.g., c is the letter B in our example), the shift should align the rightmost occurrence of c in the pattern with the c in the text.

Case 3: If c happens to be the last character in the pattern but there are no c's among its other m − 1 characters, the pattern should be shifted by the entire pattern's length m. (This case cannot arise for BARBER itself; it would apply, for instance, to the letter R and the pattern LEADER.)

Case 4: If c happens to be the last character in the pattern and there are other c's among its first m − 1 characters (e.g., c is the letter R in our example), the rightmost occurrence of c among the first m − 1 characters in the pattern should be aligned with the text's c.

Horspool's Algorithm: Steps
Step 1: For a given pattern of length m and the alphabet used in both the pattern and the text, construct the shift table as described above.
Step 2: Align the pattern against the beginning of the text.
Step 3: Repeat the following until either a matching substring is found or the pattern reaches beyond the last character of the text. Starting with the last character in the pattern, compare the corresponding characters in the pattern and text until either all m characters are matched (then stop) or a mismatching pair is encountered. In the latter case, retrieve the entry t(c) from column c of the shift table, where c is the text's character currently aligned against the last character of the pattern, and shift the pattern by t(c) characters to the right along the text.
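The shift table and the search loop described above can be sketched in Python. This is my own sketch of Horspool's algorithm, not the slides' omitted pseudocode; the function names are illustrative.

```python
def shift_table(pattern):
    """Horspool shift table: t(c) = distance from the rightmost occurrence
    of c among the first m-1 pattern characters to the pattern's last
    position; characters absent from that prefix get the default shift m."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):           # the last character is deliberately excluded
        table[pattern[i]] = m - 1 - i
    return table

def horspool(pattern, text):
    """Return the index of the leftmost occurrence of pattern in text,
    or -1 if there is none."""
    m, n = len(pattern), len(text)
    t = shift_table(pattern)
    i = m - 1                        # text index aligned with the pattern's last char
    while i < n:
        k = 0                        # number of characters matched, right to left
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1         # full match found
        i += t.get(text[i], m)       # shift by t(c); default m covers Cases 1 and 3
    return -1
```

For the pattern BARBER the table entries are t(B) = 2, t(A) = 4, t(R) = 3, t(E) = 1, and 6 for every other character, and `horspool("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP")` returns 16.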
Horspool's Algorithm: Pseudocode
(The pseudocode appeared as a figure on the original slide and is not reproduced here.)

Boyer-Moore Algorithm
Based on the same two ideas:
- comparing pattern characters to the text from right to left
- precomputing shift sizes, here in two tables:
  - the bad-symbol table indicates how much to shift based on the text's character causing a mismatch
  - the good-suffix table indicates how much to shift based on the matched part (suffix) of the pattern, taking advantage of the periodic structure of the pattern

Bad-Symbol Shift in the Boyer-Moore Algorithm
- If the rightmost character of the pattern doesn't match, the Boyer-Moore algorithm acts as Horspool's does.
- If the rightmost character of the pattern does match, Boyer-Moore compares the preceding characters right to left until either all of the pattern's characters match or a mismatch on the text's character c is encountered after k > 0 matches.

Good-Suffix Shift in the Boyer-Moore Algorithm
The good-suffix shift d2 is applied after 0 < k < m last characters have been matched:
d2(k) = the distance between (the last letter of) the matched suffix of size k and (the last letter of) its rightmost occurrence in the pattern that is not preceded by the same character that precedes the suffix.
Example: for CABABA, d2(1) = 4.
If there is no such occurrence, match the longest part (tail) of the k-character suffix with a corresponding prefix; if there are no such suffix–prefix matches, d2(k) = m.
Example: for WOWWOW, d2(2) = 5, d2(3) = 3, d2(4) = 3, d2(5) = 3.

Boyer-Moore Algorithm: Steps
Step 1: Fill in the bad-symbol shift table.
Step 2: Fill in the good-suffix shift table.
Step 3: Align the pattern against the beginning of the text.
Step 4: Repeat until a matching substring is found or the text ends. Compare the corresponding characters right to left. If no characters match, retrieve entry t1(c) from the bad-symbol table for the text's character c causing the mismatch and shift the pattern to the right by t1(c).
If 0 < k < m characters are matched, retrieve entry t1(c) from the bad-symbol table for the text's character c causing the mismatch and entry d2(k) from the good-suffix table, and shift the pattern to the right by d = max{d1, d2}, where d1 = max{t1(c) − k, 1}.

Summary (Module 8)
- Space and time trade-offs in algorithm design are a well-known issue for both theoreticians and practitioners of computing. As an algorithm design technique, trading space for time is much more prevalent than trading time for space.
- Input enhancement is one of the two principal varieties of trading space for time in algorithm design. Its idea is to preprocess the problem's input, in whole or in part, and store the additional information obtained in order to accelerate solving the problem afterward. Sorting by distribution counting and several important algorithms for string matching are examples of algorithms based on this technique.
- Distribution counting is a special method for sorting lists of elements from a small set of possible values.
- Horspool's algorithm for string matching can be considered a simplified version of the Boyer-Moore algorithm. Both algorithms are based on the ideas of input enhancement and right-to-left comparisons of a pattern's characters.
- Prestructuring, the second type of technique that exploits space-for-time trade-offs, uses extra space to facilitate faster and/or more flexible access to the data.

List of Readings
Required reading:
- Chapters 1–2 from Anany Levitin (2012). Introduction to the Design and Analysis of Algorithms, 3rd edition. Pearson.
Recommended reading:
- Chapter 1 from Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein (2022). Introduction to Algorithms.
MIT Press, 4th edition.
- Chapter 1 from Michael T. Goodrich, Roberto Tamassia.

Thank You

Saudi Electronic University — College of Computing and Informatics — 26/12/2021
DS352: Design and Analysis of Algorithms
Module 9: Space and Time Trade-Offs

Week Learning Outcomes
- Apply hashing in various problem domains
- Describe B-trees

Topic Outline
- Hashing
- B-Trees

Hashing
Hashing is a very efficient method for implementing a dictionary, i.e., a set with the operations:
- find
- insert
- delete
It is based on representation-change and space-for-time trade-off ideas.
Important applications:
- symbol tables
- databases (extendible hashing)

Hash Tables and Hash Functions
The idea of hashing is to map the keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function:
h: K → location (cell) in the hash table
Example: student records, key = SSN. Hash function: h(K) = K mod m, where m is some integer (typically prime).
If m = 1000, where is the record with SSN = 314159265 stored?
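The question above can be answered directly from the definition of the division hash function; the SSN and table size are the slide's example values.

```python
def h(key, m):
    """Division-method hash function: h(K) = K mod m."""
    return key % m

# The record with SSN 314159265, for a table of size m = 1000,
# hashes to cell 314159265 mod 1000 = 265.
print(h(314159265, 1000))  # -> 265
```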
Generally, a hash function should:
- be easy to compute
- distribute keys about evenly throughout the hash table

Collisions
If h(K1) = h(K2) for distinct keys K1 and K2, there is a collision. Good hash functions result in fewer collisions, but some collisions should be expected (the birthday paradox).
Two principal hashing schemes handle collisions differently:
- Open hashing: each cell is the header of a linked list of all keys hashed to it.
- Closed hashing: one key per cell; in case of a collision, another cell is found by
  - linear probing: use the next free cell
  - double hashing: use a second hash function to compute the increment

Open Hashing (Separate Chaining)
Keys are stored in linked lists outside the hash table, whose elements serve as the lists' headers.
Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = (sum of the positions of K's letters in the alphabet) mod 13

Open Hashing: Efficiency
If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m. This ratio is called the load factor.
For ideal hash functions, the average numbers of probes in successful (S) and unsuccessful (U) searches are

S ≈ 1 + α/2 and U = α

The load factor α is typically kept small (ideally, about 1). Open hashing still works if n > m.
The efficiency of insertion and deletion is identical to that of searching, and they are all O(1) in the average case if the number of keys n is about equal to the hash table's size m.

Closed Hashing (Open Addressing)
Keys are stored inside the hash table itself, without the use of linked lists.
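The separate-chaining example above (the eight words and the letter-position hash function are the slide's) can be reproduced with a small sketch; Python lists stand in for the linked lists.

```python
def h(word, m=13):
    """Sum of the letters' positions in the alphabet (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord('A') + 1 for ch in word) % m

def build_chained_table(keys, m=13):
    """Open hashing: each cell heads a chain of the keys hashed to it."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[h(k, m)].append(k)
    return table

keys = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
table = build_chained_table(keys)
# ARE and SOON both hash to 11, so cell 11 holds the chain ARE -> SOON.
```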
The table size m must be at least as large as the number of keys n. Different strategies can be used for collision resolution.
Linear probing checks the cell following the one where the collision occurs. If that cell is empty, the new key is installed there. If the next cell is already occupied, the availability of that cell's immediate successor is checked, and so on. Note that if the end of the hash table is reached, the search wraps to the beginning of the table; i.e., the table is treated as a circular array.

Closed Hashing: Example
(The worked example appeared as a figure on the original slide and is not reproduced here.)

Closed Hashing: Efficiency
- Does not work if n > m.
- Avoids pointers.
- Deletions are not straightforward.
- The number of probes to find/insert/delete a key depends on the load factor α = n/m (hash table density) and the collision resolution strategy. For linear probing:

S = (1/2)(1 + 1/(1 − α)) and U = (1/2)(1 + 1/(1 − α)²)

As the table fills up (α approaches 1), the number of probes in linear probing increases dramatically.

Activity
In groups, research double hashing as a collision resolution strategy.

B-Trees
A principal device for organizing large data sets is an index, which provides information about the location of records with given key values. For data sets of structured records, the most important index organization is the B-tree, introduced by R. Bayer and E. McCreight [Bay72]. It generalizes a search tree by permitting more than a single key in the same node.
In the B-tree version considered here, all data records (or record keys) are stored at the leaves, in increasing order of the keys. The parental nodes are used for indexing. Specifically, each parental node contains n − 1 ordered keys K1 < ... < Kn−1, assumed, for the sake of simplicity, to be distinct.
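The linear-probing scheme described above can be sketched by inserting the same eight keys as in the separate-chaining example. The slide's worked figure is not reproduced, so the final layout noted in the comments is recomputed from the definitions, not copied.

```python
def h(word, m=13):
    """Sum of the letters' positions in the alphabet (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord('A') + 1 for ch in word) % m

def insert_linear_probing(table, key):
    """Place key in the first free cell at or after h(key), wrapping
    around the end of the table (circular array). Assumes n < m,
    i.e., the table always has a free cell."""
    m = len(table)
    i = h(key, m)
    while table[i] is not None:
        i = (i + 1) % m              # probe the next cell, with wraparound
    table[i] = key

table = [None] * 13
for k in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    insert_linear_probing(table, k)
# SOON collides with ARE at cell 11 and lands in cell 12;
# PARTED then collides at cell 12 and wraps around to cell 0.
```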
B-Trees (continued)
The keys are interposed with n pointers to the node's children so that all the keys in subtree T0 are smaller than K1; all the keys in subtree T1 are greater than or equal to K1 and smaller than K2, with K1 being equal to the smallest key in T1; and so on, through the last subtree Tn−1, whose keys are greater than or equal to Kn−1, with Kn−1 being equal to the smallest key in Tn−1.
(An illustration appeared as a figure on the original slide and is not reproduced here.)

B-Tree Properties
A B-tree of order m ≥ 2 must satisfy the following structural properties:
- The root is either a leaf or has between 2 and m children.
- Each node, except for the root and the leaves, has between ⌈m/2⌉ and m children (and hence between ⌈m/2⌉ − 1 and m − 1 keys).
- The tree is (perfectly) balanced, i.e., all its leaves are at the same level.
(The slide's example of a B-tree of order 4 appeared as a figure and is not reproduced here.)

B-Trees: Searching
Starting with the root:
- Follow a chain of pointers to the leaf that may contain the search key.
- Search for the key among the keys of that leaf.
- Use binary search if the number of keys at a node is large enough, since keys are stored in sorted order at both parental nodes and leaves.

B-Trees: Searching Time Complexity
The height of the B+-tree is h = O(log n), where n is the number of keys stored in the tree. In addition to the disk reads, each linear search within a node takes O(t) time, where t bounds the number of keys per node, so the total time complexity is O(t log_t n).
Note: if the linear search inside each node is replaced by a binary search, the total time complexity becomes O(log2 t · log_t n).
(The searching examples (a), (b), and (c) appeared as figures on the original slides and are not reproduced here.)

Activity
In groups, find out how the insertion and deletion operations are done in a B-tree.

Summary (Module 9)
- Hashing is a very efficient approach to implementing dictionaries. It is based on the idea of mapping keys into a one-dimensional table. The size limitations of such a table make it necessary to employ a collision resolution mechanism.
- The two principal varieties of hashing are open hashing or separate chaining (with keys stored in linked lists outside of the hash table) and closed hashing or open addressing (with keys stored inside the table). Both enable searching, insertion, and deletion in O(1) time, on average.
- The B-tree is a balanced search tree that generalizes the idea of the 2-3 tree by allowing multiple keys at the same node. Its principal application, called the B+-tree, is for keeping index-like information about data stored on a disk. By choosing the order of the tree appropriately, one can implement the operations of searching, insertion, and deletion with just a few disk accesses even for extremely large files.

List of Readings
Required reading:
- Chapters 1–2 from Anany Levitin (2012). Introduction to the Design and Analysis of Algorithms, 3rd edition. Pearson.
Recommended reading:
- Chapter 1 from Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein (2022). Introduction to Algorithms. MIT Press, 4th edition.
- Chapter 1 from Michael T. Goodrich, Roberto Tamassia.

Thank You