Hashing Algorithms Explained - Hash Tables

Summary

This document explains hashing algorithms and hash tables. The document also covers collision resolution strategies, linear probing, quadratic probing, and double hashing, while providing examples.

Full Transcript

Algorithm HashChaining – steps 1. Index = HashFunction(K) 11. Endif 2. ptr = H[Index] 12. EndWhile 3. flag = False 13. If (flag = FALSE) then 4. While(ptr≠NULL) and (flag 14. print “key don’t exist” =False) 15. If...

Algorithm HashChaining – steps 1. Index = HashFunction(K) 11. Endif 2. ptr = H[Index] 12. EndWhile 3. flag = False 13. If (flag = FALSE) then 4. While(ptr≠NULL) and (flag 14. print “key don’t exist” =False) 15. If (INSERT) then 5. if (ptr->DATA = K) then 16. InsertEnd_SL(H[Index]) 6. flag = TRUE 17. EndIf 7. Return(ptr) 18. Stop 8. Exit 9. Else 10. ptr = ptr->LINK Downloaded from Ktunotes.in Open Addressing If we have enough contiguous memory to store all the keys (m > N)  store the keys in the table itself e.g., insert 14 No need to use linked lists anymore Basic idea: ◦ Insertion: if a slot is full, try another one, until you find an empty one ◦ Search: follow the same sequence of probes Search time depends on the length of the probe sequence! Downloaded from Ktunotes.in Generalize hash function notation: A hash function contains two arguments now: insert 14 (i) Key value, and (ii) Probe number h(k,p), p=0,1,...,m-1 Example Probe sequences ◦ Must be a permutation of ◦ There are m! possible permutations ◦ Good hash functions should be able to produce all m! probe sequences Downloaded from Ktunotes.in Common Open Addressing Methods Linear probing Quadratic probing Double hashing Note: None of these methods can generate more than m2 different probing sequences! Downloaded from Ktunotes.in Linear Probing In linear probing, collisions are resolved by sequentially scanning an array (with wraparound) until an empty cell is found. ◦ i.e. f is a linear function of i, typically f(i)= i. Example: ◦ Insert items with keys: 89, 18, 49, 58, 9 into an empty hash table. ◦ Table size is 10. ◦ Hash function is hash(x) = x mod 10. ◦ f(i) = i; Downloaded from Ktunotes.in Figure : Linear probing hash table after each insertion Downloaded from Ktunotes.in Search and Delete The Search algorithm follows the same probe sequence as the insert algorithm. ◦ A search for 58 would involve 4 probes. ◦ A search for 19 would involve 5 probes. We must use lazy deletion (i.e. marking items as deleted) ◦ Standard deletion (i.e. physically removing the item) cannot be performed. ◦ e.g. remove 89 from hash table. Downloaded from Ktunotes.in Clustering Problem As long as table is big enough, a free cell can always be found, but the time to do so can get quite large. Worse, even if the table is relatively empty, blocks of occupied cells start forming. This effect is known as primary clustering. Any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster. Downloaded from Ktunotes.in Algorithm LinearProbe Input: K is the key value to be searched or inserted Output: Return location if found, else insert K in the table if no overflow occurs DS: Hash table H of size HSIZE Downloaded from Ktunotes.in Algorithm LinearProbe - Steps 1. flag = FALSE 15. if(H[i]=K) then 2. Index = HashFunction(K) 16. flag = TRUE 3. If (K=H[Index]) then 17. Return i 4. Return Index 18. Exit 5. Exit 19. Else 6. Else 20. i = i mod h+1 7. i = index + 1 21. EndIf 8. while(i≠ index) and (not flag) do 22. EndIf 9. if (H[i] = NULL) and (H[i] < 0) then 23. EndWhile 10. if(INSERT)then 24. If flag = FALSE and i = index then 11. H[i] = K 25. print “The table is overflow” 12. flag = TRUE 26. EndIf 13. EndIf 27. EndIf 14. Else 28. Stop Downloaded from Ktunotes.in Quadratic Probing Quadratic Probing eliminates primary clustering problem of linear probing. Collision function is quadratic. ◦ The popular choice is f(i) = i2. If the hash function evaluates to h and a search in cell h is inconclusive, we try cells h + 12, h+22, … h + i2. H(K) + i2 mod h, i = 1,2,3… ◦ i.e. It examines cells 1,4,9 and so on away from the original probe. Remember that subsequent probe points are a quadratic number of positions from the original probe point. Downloaded from Ktunotes.in Figure A quadratic probing hash table after each insertion (note that the table size was poorly chosen because it is not a prime number). Downloaded from Ktunotes.in Quadratic Probing Problem: ◦ We may not be sure that we will probe all locations in the table (i.e. there is no guarantee to find an empty cell if table is more than half full.) ◦ If the hash table size is not prime this problem will be much severe. However, there is a theorem stating that: ◦ If the table size is prime and load factor is not larger than 0.5, all probes will be to different locations and an item can always be inserted. Downloaded from Ktunotes.in Quadratic Probing Load Factor The load factor α of a hash table with n elements is given by the following formula: α = n / table.length Downloaded from Ktunotes.in Some considerations How efficient is calculating the quadratic probes? ◦Linear probing is easily implemented. Quadratic probing appears to require * and % operations. ◦However by the use of the following equation, this is overcome: ◦ Hi = Hi-1+2i – 1 (mod M) Downloaded from Ktunotes.in Some Considerations What happens if load factor gets too high? ◦ Dynamically expand the table as soon as the load factor reaches 0.5, which is called rehashing. ◦ Always double to a prime number. ◦ When expanding the hash table, reinsert the new table by using the new hash function. Downloaded from Ktunotes.in Rehashing Rehashing Algorithm: 1. Allocate a new hash table twice the size of the original table. 2. Reinsert each element of the old table into the new table (using the hash function). 3. Reference the new table as the hash table.. Downloaded from Ktunotes.in Analysis of Quadratic Probing Quadratic probing has not yet been mathematically analyzed. Although quadratic probing eliminates primary clustering, elements that hash to the same location will probe the same alternative cells. This is know as secondary clustering. Techniques that eliminate secondary clustering are available. ◦ the most popular is double hashing. Downloaded from Ktunotes.in Double Hashing Idea: Spread out the search for an empty slot by using a second hash function ◦ No primary or secondary clustering h(k,i) = (h1(k) + i h2(k) ) mod m, i=0,1,... Good choice of Hash2(X) can guarantee does not get “stuck” as long as  < 1 ◦ Integer keys: example Hash2(X) = R – (X mod R) where R is a prime smaller than TableSize Downloaded from Ktunotes.in Double Hashing Disadvantage: harder to delete an element Can generate m2 probe sequences maximum Downloaded from Ktunotes.in Double Hashing: Example 0 1 79 h1(k) = k mod 13 2 h2(k) = 1+ (k mod 11) 3 h(k,i) = (h1(k) + i h2(k) ) mod 13 4 69 5 98 Insert key 14: 6 h1(14,0) = 14 mod 13 = 1 7 72 h(14,1) = (h1(14) + h2(14)) mod 13 8 9 14 = (1 + 4) mod 13 = 5 10 h(14,2) = (h1(14) + 2 h2(14)) mod 13 11 50 = (1 + 8) mod 13 = 9 12 160 Downloaded from Ktunotes.in Hashing – performance factors 1.Hash function :should distribute the keys and entries evenly throughout the entire table o should minimize collisions 2.Collision resolution strategy: a. Open Addressing: store the key/entry in a different position b. Separate Chaining: chain several keys/entries in the same position 3.Table size : a. Too large a table, will cause a wastage of memory o b. Too small a table will cause increased collisions and eventually force rehashing (creating a new hash table of larger size and copying the contents of the current hash table into it) o c. The size should be appropriate to the hash function used and should typically be a prime number Downloaded from Ktunotes.in Applications Keeping track of customer account information at a bank ◦ Search through records to check balances and perform transactions Keep track of reservations on flights ◦ Search to find empty seats, cancel/modify reservations Search engine ◦ Looks for all documents containing a given word Downloaded from Ktunotes.in Special Case: Dictionaries Dictionary = data structure that supports mainly two basic operations: insert a new item and return an item with a given key Queries: return information about the set S: ◦ Search (S, k) ◦ Minimum (S), Maximum (S) ◦ Successor (S, x), Predecessor (S, x) Modifying operations: change the set ◦ Insert (S, k) ◦ Delete (S, k) – not very often Downloaded from Ktunotes.in