ch07_Space and Time Trade-Offs_StringSearch-Hashing.pdf
Document Details
Uploaded by PlayfulErbium
UKM
Tags
Full Transcript
Algorithms Analysis and Design Chapter 7: Space and Time Trade-Offs A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 3 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 1 Space –for-time trad...
Algorithms Analysis and Design Chapter 7: Space and Time Trade-Offs A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 3 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 1 Space –for-time tradeoffs Two varieties of space-for-time algorithms: input enhancement — preprocess the input (or its part) to store some info to be used later in solving the problem counting sorts string searching algorithms prestructuring — preprocess the input to make accessing its elements easier hashing indexing schemes (e.g., B-trees) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 2 Review: String searching by brute force pattern: a string of m characters to search for text: a (long) string of n characters to search in Brute force algorithm Step 1 Align pattern at beginning of text Step 2 Moving from left to right, compare each character of pattern to the corresponding character in text until either all characters are found to match (successful search) or a mismatch is detected Step 3 While a mismatch is detected and the text is not yet exhausted, realign pattern one position to the right and repeat Step 2 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 3 String searching by preprocessing Several string searching algorithms are based on the input enhancement idea of preprocessing the pattern Knuth-Morris-Pratt (KMP) algorithm preprocesses pattern left to right to get useful information for later searching Boyer -Moore algorithm preprocesses pattern right to left and store information into two tables Horspool’s algorithm simplifies the Boyer-Moore algorithm by using just one table A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 4 Horspool’s Algorithm A simplified version of Boyer-Moore algorithm: preprocesses pattern to generate a shift table that determines how much to shift the pattern when a Character Shift mismatch occurs A B … … always makes a shift based on the text’s character c aligned with the last character in the pattern according to the shift table’s entry for c A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 5 How far to shift? Look at first (rightmost) character in text that was compared: 1. The character is not in the pattern.....C...................... BAOBAB 2. The character is in the pattern (but not the rightmost) .....O...................... (O occurs once in pattern) BAOBAB .....A...................... (A occurs twice in pattern) BAOBAB 3. The rightmost characters do match.....B...................... BAOBAB A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 6 Shift table (1) The length of the shift is determined by the shift table. shift[c] is defined for all c ∈ Σ: (Σ: alphabet set) 1. If c does not occur in P, shift[c] = m. (m: pattern length) 2. Otherwise, shift[c] = m − 1 − i, where P[i] = c is the last occurrence of c in P[0..m − 2]. 1. The character is not in the pattern Character Shift......C.................. - 6 BAOBAB 2. The character is in the pattern ......A...................... (O occurs once or more in pattern) BAOBAB Character Shift A 1 012345 shift[‘A’]= 6-1-4=1 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 7 Shift table (2) The length of the shift is determined by the shift table. shift[c] is defined for all c ∈ Σ: (Σ: alphabet set) If c does not occur in P, shift[c] = m. (m: pattern length) Otherwise, shift[c] = m − 1 − i, where P[i] = c is the last occurrence of c in P[0..m − 2]. 01234567 m=8 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 8 Shift table (3) Shift sizes can be precomputed by the formula - distance from c’s rightmost occurrence in pattern among its first m-1 characters to its right end t (c) = - pattern’s length m, otherwise by scanning pattern before search begins and stored in a table called shift table Shift table is indexed by text and pattern alphabet Example: for P=“BAOBAB”, shift[‘A’]=1 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 9 Horspool’s Algorithm: Example 1 Text: “BARD LOVED BANANAS”, P:”BAOBAB” A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 6 BARD LOVED BANANAS BAOBAB ……(No matchshift[‘L’]=6) Space BARD LOVED_BANANAS BAOBAB ……(No match shift[‘B’]=2) BARD LOVED BANANAS BAOBAB ……(No match shift[‘N’]=6) BAOBAB (unsuccessful search) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 10 Horspool’s Algorithm: Example 2 11 Boyer-Moore Horspool Example A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 12 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 13 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 14 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 15 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 16 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 17 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 18 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 19 Boyer-Moore Horspool Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 20 Hashing prestructuring — preprocess the input to make accessing its elements easier A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 21 Hashing A very efficient method for implementing a dictionary, i.e., a set with the operations: – find – insert – delete Based on representation-change and space-for-time tradeoff ideas Important applications: – symbol tables – databases (extendible hashing) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 22 Hash tables and hash functions The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function, h: K location (cell) in the hash table Example: student records, key K= SSN: h(K) = K mod m where m is some integer (typically, prime) e.g.: Given m = 1000, where is record with K= 314159265 stored? Generally, a hash function should: be easy to compute distribute keys about evenly throughout the hash table A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 23 Collisions If h(K1) = h(K2), there is a collision Good hash functions result in fewer collisions but some collisions should be expected (birthday paradox) Two principal hashing schemes handle collisions differently: Open hashing – each cell is a header of linked list of all keys hashed to it Closed hashing – one key per cell – in case of collision, finds another cell by – linear probing: use next free bucket – double hashing: use second hash function to compute increment A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 24 Open hashing (Separate chaining) Keys are stored in linked lists outside a hash table whose elements serve as the lists’ headers. Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED h(K) = ((sum of K’s letters’ positions in the alphabet) MOD 13) Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12 0 1 2 3 4 5 6 7 8 9 10 11 12 A AND MONEY FOOL HIS ARE PARTED SOON Search for“Introduction A. Levitin KID to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 25 Closed hashing (Open addressing) Keys are stored inside a hash table. Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 11 12 The table is treated as a circular array 0 1 2 3 4 5 6 7 8 9 10 11 12 A A FOOL A AND FOOL A AND FOOL HIS A AND MONEY FOOL HIS A AND MONEY FOOL HIS ARE A AND MONEY FOOL HIS ARE SOON PARTED A AND MONEY FOOL HIS ARE A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 26 Hashing - Example ASCII start=65 cap , ASCII=97 small 11 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 27 Hashing – Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 28 Hashing – Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 29 Hashing – Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 30 Hashing – Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 31 Hashing – Example … cont. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 32 Hashing Algorithms A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 33 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 34 Closed Hashing – Linear Probing A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 35 Closed Hashing – Linear Probing A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 36 Closed Hashing – Linear Probing A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 37 Closed Hashing – Linear Probing A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 38 Closed Hashing – Linear Probing A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 39 Closed Hashing – Linear Probing A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 40 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 41 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 42 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 43 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 44 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 45 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 46 Open Hashing – Separate Chaining A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 47 Hashing - Summary A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved. 48