Hashing - Unit I PDF
Document Details
Uploaded by Deleted User
MIT-WPU
Tags
Summary
This document provides an introduction to hashing, a fundamental concept in data structures, and is part of a larger unit concerning data structures. The document covers introductory topics such as collision resolution strategies and hash table overflow.
Full Transcript
Hashing Introduction to hashing Hash functions Collision resolution strategies- Open Addressing and Chaining Hash Table Overflow 3/29/2023 Advanced Data Structures 1 Introduction to Hashing Suppose that we want to store 10,000 students r...
Hashing Introduction to hashing Hash functions Collision resolution strategies- Open Addressing and Chaining Hash Table Overflow 3/29/2023 Advanced Data Structures 1 Introduction to Hashing Suppose that we want to store 10,000 students records (each with a 5-digit ID) in a given container. ❑ A linked list implementation would take O(n) time. ❑ A height balanced tree would give O(log n) access time.(will see in Unit-4) ❑ Using an array of size 100,000 would give O(1) access time but will lead to a lot of space wastage. Is there some way that we could get O(1) access without wasting a lot of space? The answer is hashing. 3/29/2023 Advanced Data Structures 2 Introduction to Hashing ❑Why Hashing? ◦ Sequential Search requires O(n) Comparisons. ◦ It is not useful for large database. ◦ Binary Search requires less comparisons O(nlogn) ◦ But it requires data to be sorted. 3/29/2023 Advanced Data Structures 3 Introduction to Hashing ❑What is Hashing? ◦ Record for the key value is directly referred by calculating address from key value ◦ Address of element x is obtained by computing arithmetic function f(x). ◦ Function f(x) is called as hash function ◦ Table used for storing records is known as hash table ◦ Best Case Time Complexity of hashing=O(1) ◦ Worst Case Time Complexity =O(n) 3/29/2023 Advanced Data Structures 4 Example Hash Table 0 1 Items 2 john 25000 3 john 25000 Hash phil 31250 key Functio 4 phil 31250 dave 27500 n 5 mary 28200 6 dave 27500 7 mary 28200 key 8 9 3/29/2023 Advanced Data Structures 5 Hash Tables – Conceptual View buckets table obj1 7 key=15 6 hash value/index 5 Obj3 Obj2 key=4 key=30 4 3 2 Obj4 key=2 1 0 Obj5 key=1 3/29/2023 Advanced Data Structures 6 Key terms ❖ Hash Table: is an array of size MAX[0 to MAX-1] ❖ Hash Function: that transforms a key into an address. ❖ Bucket- Bucket is an index position in hash table that can store more than one record ❖ When the same index is mapped with two keys, then both the records are stored in the same bucket ❖ Probe-Each action of address calculation and check for success is called probe ❖ Collision-The result of two keys hashing into the same address is called collision ❖ Synonym-Keys those hash to the same address are called synonyms 3/29/2023 Advanced Data Structures 7 Key terms ❖ Overflow-The result of more keys hashing to the same address and if there is no room in the bucket, then it is said that overflow has occurred ❖ Collision and overflow are synonymous when the bucket is of size 1 ❖ Open or external hashing-When we allow records to be stored in potentially unlimited space, it is called as open or external hashing ❖ Closed or internal hashing-When we use fixed space for storage eventually limiting the number of records to be stored, it is called as closed or internal hashing 3/29/2023 Advanced Data Structures 8 Key Terms Load density- The maximum storage capacity that is maximum number of records that can be accommodated is called as loading density Full Table- Full table is the one in which all locations are occupied Owing to the characteristics of hash functions, there are always empty locations Load factor- the number of records stored in table divided by maximum capacity of table, expressed in terms of percentage 3/29/2023 Advanced Data Structures 9 Hashing ❑Example Location X X F(X) 31,42,35,67,24,19 1 31 31 1 2 42 F(x)=x mod 10 42 2 3 35 5 4 24 67 7 5 35 24 4 6 19 9 7 67 8 9 19 3/29/2023 Advanced Data Structures 10 key Hash(key) Address ❖ The resulting address is used as the basis for storing and retrieving records and this address is called as home address of the record ❖ For array to store a record in a hash table, hash function is applied to the key of the record being stored, returning an index within the range of the hash table ❖ The item is then stored in the table of that index position 3/29/2023 Advanced Data Structures 11 Basic Idea Use hash function to map keys into positions in a hash table Ideally If element e has key k and h is hash function, then e is stored in position h(k) of table To search for e, compute h(k) to locate position. If no element, dictionary does not contain e. 3/29/2023 Advanced Data Structures 12 Hash Functions A hash function is any function that can be used to map a data set of an arbitrary size to a data set of a fixed size, which falls into the hash table. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes. Example: H(x)=x mod 10 Here, H(x) is called as hash function while value returned by function is called as hash value. 3/29/2023 Advanced Data Structures 13 Characteristics of a good hashing function ◦ The average performance of hashing depends on how the hash function distributes the set of keys among the slots ◦ Assumption is that any given record is equally likely to hash into any of the slots, independently of whether any other record has been already hashed to it or not ◦ This assumption is called as simple uniform hashing ◦ A good hash function is the one which satisfies the assumption of simple uniform hashing 3/29/2023 Advanced Data Structures 14 Characteristics of a good hashing function ❖ Addresses generated from the key are uniformly and randomly distributed ❖ Small variations in the value of key will cause large variations in the record addresses to distribute records (with similar keys) evenly ❖ The hashing function must minimize the collision 3/29/2023 Advanced Data Structures 15 Division Method One of the required features of the hash function is that the resultant index must be within the table index range One simple choice for a hash function is to use the modulus division indicated as MOD The function returns an integer If any parameter is NULL, the result is NULL Hash(Key) = Key % M 3/29/2023 Advanced Data Structures 16 Hash Tables: Insert Example For example, if we hash keys 0…1000 into a hash table with 5 entries and use h(key) = key mod 5 , we get the following sequence of events: Insert 2 Insert 21 Insert 34 Insert 54 key data key data key data 0 0 0 1 1 21 … 1 21 … There is a collision at 2 2 … 2 2 … 2 2 … array entry #4 3 3 3 4 4 4 34 … ??? 3/29/2023 Advanced Data Structures 17 Multiplication Method The multiplication method works as: 1. Multiply the key ‘k’ by a constant A in the range 0 < A < 1 and extract the fractional part of kA 2. Then multiply this value by M and take the floor of the result Hash(k) = M (kA ), Donald Knuth suggested to use A=0.61803398987 Ex: key=107, assume M=50 key=123 m=1000 h(k)= M*107*0.61803398987 = 66.12 fractional part=0.12 h(k)=50*0.12 =6 h(k)=6 That means 107 will be placed at index 6 in hash table. 3/29/2023 Advanced Data Structures 18 Multiplication Method Disadvantage: Slower than the division method. Advantage: Value of m is not critical. ◦ Typically chosen as a power of 2, i.e., m = 2p, which makes implementation easy. Example: m = 1000, k = 123, A ≈ 0.6180339887… h(k) = ⎣1000(123 * 0.6180339887 )⎦ = ⎣1000 ( 76.018... ⎦. = ⎣1000 (.018... )⎦ = 18 3/29/2023 Advanced Data Structures 19 Mid-Square Hashing ❖ The mid-square hashing suggests to take square of the key and extract the middle digits of the squared key as address ❖ The difficulty is when the key is large. As the entire key participates in the address calculation, if the key is large, then it is very difficult to store the square of it as the square of key should not exceed the storage limit ❖ So mid-square is used when the key size is less than or equal to 4 digits 3/29/2023 Advanced Data Structures 20 Keys and addresses using mid-square The difficulty of storing larger numbers square can be overcome if for squaring we use few of digits of key instead of the whole key 3/29/2023 Advanced Data Structures 21 Keys and addresses using mid-square We can select a portion of key if key is larger in size and then square the portion of it Key Square Hashed Address 234137 234*234=54756 475 567187 567*567=321489 148 3/29/2023 Advanced Data Structures 22 Folding Technique ❖ In folding technique, the key is subdivided into subparts that are combined or folded and then combined to form the address ❖ For the key with digits, we can subdivide the digits in three parts, add them up, and use the result as an address. ❖ Here the size of subparts of key could be as that of the address 3/29/2023 Advanced Data Structures 23 Folding Technique (contd…) ❖ There are two types of folding methods: ❖ Fold shift — Key value is divided into several parts of that of the size of the address. Left, right, and middle parts are added ❖ Fold boundary — Key value is divided into parts of that of the size of the address. Left and right parts are folded on fixed boundary between them and the centre part 3/29/2023 Advanced Data Structures 24 Folding Technique ❖ For example, if the key is 987654321, it is understood as Left 987 Centre 654 Right 321 ❖ For fold shift, addition is ❖ 987 + 654 + 321 = 1962 ❖ Now discard digit 1 and the address is 962 ❖ For fold boundary, addition of reverse part is ❖ 789 + 456 + 123 = 1368 ❖ Discard digit 1 and the address is 368 3/29/2023 Advanced Data Structures 25 Extraction Method ❖ When a portion of the key is used for the address calculation, the technique is called as the extraction method ❖ In digit extraction, few digits are selected and extracted from the key which are used as the address 3/29/2023 Advanced Data Structures 26 Keys and addresses using digit extraction Key Hashed Address 345678 357 234137 243 952671 927 3/29/2023 Advanced Data Structures 27 Universal Hashing ❖ The main idea behind universal hashing is to select the hash function at random at run time from a carefully designed set of functions ❖ Because of randomization, the algorithm can behave differently on each execution, even for the same input ❖ This approach guarantees good average case performance, no matter what keys are provided as input 3/29/2023 Advanced Data Structures 28 Universal Hashing (at the beginning of the execution) 3/29/2023 Advanced Data 29 Structures Issues ❖ A problem arises, however, when the hash function returns the same value when applied to two different keys ❖ To handle the situation, where two records need to be hashed to the same address we can implement a table structure, so as to have a room for two or more members at the same index positions 3/29/2023 Advanced Data Structures 30 Do you see any problems with this approach? 0 U (universe of keys) h(k1) h(k4) k1 K (actual k4 k2 h(k2) = h(k5) keys) Collisions! k5 k3 h(k3) m-1 3/29/2023 Advanced Data 31 Structures Collision Resolution Strategies Collision-Element to be inserted is mapped to Location X same location 1 31 2 42 Example: 3 31,42,35,67,24,19,22 4 24 5 35 F(x)=x mod 10 6 7 67 8 Where to store 22? 9 19 3/29/2023 Advanced Data Structures 32 Collision Resolution Strategies Collision-Element to be inserted is mapped to Location X same location 1 31 2 42 Example: 3 22 31,42,35,67,24,19,22 4 24 5 35 F(x)=x mod 10 6 7 67 8 Where to store 22? 9 19 3/29/2023 Advanced Data Structures 33 Collision Resolution Strategies 1. Open Addressing 1. Linear probing 2. Quadratic probing 3. Double hashing, and 4. Key offset 2. Separate chaining (or linked list) 3. Bucket hashing (defers collision but does not prevent it) 3/29/2023 Advanced Data Structures 34 Open Addressing When collision occurs, it is resolved by finding an available empty location other than the home address. If Hash(key) is not empty, the positions are probed in the following sequence until an empty location is found. When we reach the end of the table ,the search is wrapped around to start and search continues till the current collision location. Closed hash tables use open addressing. 3/29/2023 Advanced Data Structures 35 Linear Probing ❖ A hash table in which a collision is resolved by putting the item in the next empty place in following the occupied place is called linear probing ❖ This strategy looks for the next free location until it is found ❖ The function that we can use for probing linearly from the next location is as follows: ❖ (Hash(x) + p(i)) MOD Max ❖ As p(i) = i for linear probing, the function becomes ❖ (Hash(x)+ i)) MOD Max ❖ Initially i = 1, if the location is not empty then it becomes 2, 3, 4, …, and so on till empty location is found. 3/29/2023 Advanced Data Structures 36 Linear Probing ❖ 1.With replacement 2.Without replacement ❖ With replacement : ❖ If the slot is already occupied by the key there are two possibilities, that is, either it is home address (collision) or not key’s home address ❖ If the key’s actual address is different, then the new key having the address at that slot is placed at that position and the key with other address is placed in the next empty position 3/29/2023 Advanced Data Structures 38 Linear Probing Example : Given the input {4371, 1323, 6173, 4199, 4344, 9699, 1889} and hash function as Key % 10, show the results for the following: 1. Open addressing using linear probing 2. Open addressing using quadratic probing 3. Open addressing using double hashing h2 (x) = 7−(x MOD 7) 3/29/2023 Advanced Data Structures 39 Open addressing using linear probing with replacement Initial Insert Insert Insert Insert Insert Insert Insert ly 4371 1323 6173 4199 4344 9699 1889 0 9699 9699 1 4371 4371 4371 4371 4371 4371 4371 2 1889 3 1323 1323 1323 1323 1323 1323 4 6173 6173 4344 4344 4344 5 6173 6173 6173 6 7 8 9 4199 4199 4199 4199 3/29/2023 Advanced Data Structures 40 Linear Probing ❖ Without replacement : ❖ When some data is to be stored in hash table, and if the slot is already occupied by the key then another empty location is searched for a new record ❖ There are two possibilities when location is occupied—it is its home address or not key’s home address. ❖ In both the cases, the without replacement strategy empty position is searched for the key that is to be stored 3/29/2023 Advanced Data Structures 41 Open addressing using linear probing without replacement 3/29/2023 Advanced Data Structures 42 Linear Probing without replacement Example h(k) = k mod 13 Insert keys: 31 73 44 32 41 18 44 59 32 22 31 73 3/29/2023 Advanced Data Structures 43 Linear Probing without replacement Example(cont.) 3/29/2023 Advanced Data Structures 44 Linear probing (chaining)with and without replacement 12,01,04,03,07,08,10,02,05,14 Assume buckets from 0 to 9 and each bucket has 1 slot. 3/29/2023 Advanced Data Structures 45 Open addressing using linear probing with replacement Initi 12 01 04 03 07 08 10 02 05 14 ally 0 10 10 10 10 1 01 01 01 01 01 01 01 01 01 2 12 12 12 12 12 12 12 12 12 12 3 03 03 03 03 03 03 03 4 04 04 04 04 04 04 04 04 5 02 05 05 6 02 02 7 07 07 07 07 07 07 8 08 08 08 08 08 9 14 3/29/2023 Advanced Data Structures 46 Open addressing using linear probing without replacement Initi 12 01 04 03 07 08 10 02 05 14 ally 0 10 10 10 10 1 01 01 01 01 01 01 01 01 01 2 12 12 12 12 12 12 12 12 12 12 3 03 03 03 03 03 03 03 4 04 04 04 04 04 04 04 04 5 02 02 02 6 05 05 7 07 07 07 07 07 07 8 08 08 08 08 08 9 14 3/29/2023 Advanced Data Structures 47 Functions for Linear Probing without replacement insert function Algorithm int insert_linear_prob(int hashtable[],int key) { if(i==loc) print(“Hash is full”) loc=key % MAX; } If(hashTable[loc]==-1) { hashtable[loc]=key; return(loc); } else{i=(loc+1)%MAX; while(i!=loc) { if(hashTable[i]==-1) { hashtable[i]=key; return(i); } i=(i+1)%MAX; } 3/29/2023 Advanced Data Structures 49 Functions for Linear Probing without replacement- search function Search_linear_probe(hashtable,key) { int i,j; loc=key % MAX; if(hashtable[loc]==key) print “key found” else{ i=(loc+1)%MAX; while(i!=loc){ if(hashtable[i]==key) {print “key found” break;} i=(i+1)%MAX;} } if(i==loc) print(“key not found”) } 3/29/2023 Advanced Data Structures 50 Functions for Linear Probing with replacement insert function Algorithm int insert_linear_prob(int hashtable[],int key) while(i!=loc) { { loc=key % MAX; if(hashTable[i]==-1) If(hashTable[loc]==-1) { { hashtable[loc]=key; hashtable[i]=temp; return(loc); return(i);} } i=(i+1)%MAX; else{ } temp=key if(loc!=(hashtable[loc]%MAX){ if(i==loc) print(“Hash is full”) temp=hashtable[loc]; } Hashtable[loc]=key;} i=(loc+1)%MAX; 3/29/2023 Advanced Data Structures 51 Functions for Linear Probing with replacement- search function Search_linear_probe(hashtable,key) { int i,loc; loc=key % MAX; if(hashtable[loc]==key) print “key found” else{ i=(loc+1)%MAX; while(i!=loc){ if(hashtable[i]==key) {print “key found” break;} i=(i+1)%MAX;} } if(i==loc) print(“key not found”) } 3/29/2023 Advanced Data Structures 52 Clustering One problem with the above technique is the tendency to form “clusters” A cluster is a group of items not containing any open slots The bigger a cluster gets, the more likely it is that new values will hash into the cluster, and make it ever bigger Clusters cause efficiency to degrade 3/29/2023 Advanced Data Structures 53 Quadratic Probing ❖ In quadratic probing, we add offset by amount square of collision probe number ❖ In quadratic probing, the empty location is searched using the following formula ❖ (Hash(Key) + i2) MOD Max where i lies between 1 to (Max − 1)/2 ❖ Quadratic probing probes the array at location Hash(key)+1, Hash(key)+4, _____,Hash(key)+9 etc. ❖ Quadratic probing works much better than linear probing, but to make full use of hash table, there are constraints on the values of i and Max so that the address lies in table boundaries 3/29/2023 Advanced Data Structures 54 Open addressing using quadratic probing ❖ Let us insert these keys using quadratic probing ❖ For 6173, the hashed address 6173 % 10 gives 3 and it is not empty, hence using quadratic probing we get the address as follows: Hash(6173) = (6173 + 12) % 10 = 4 and as it is empty, the key 6173 is stored there ❖ Now while inserting 4344, the location 4 is not empty and hence quadratic probing generates the address as (4344 + 12) % 10 = 5 and as is empty 4344 is stored ❖ For key 9699, the address is (9699 + 12) % 10 = 0 and is empty so store. ❖ While inserting 1889, the address (1889 + 12) % 10 = 0 is not empty so probe again ❖ The address (1889 + 22) % 10 = 3 is not empty so probe again. ❖ The address(1889 + 32) % 10 = 8 is empty so store 1889 at location 8 3/29/2023 Advanced Data Structures 55 Open addressing using quadratic probing Initially Insert Insert Insert Insert Insert Insert Insert 4371 1323 6173 4199 4344 9699 1889 0 9699 9699 1 4371 4371 4371 4371 4371 4371 4371 2 3 1323 1323 1323 1323 1323 1323 4 6173 6173 6173 6173 6173 5 4344 4344 4344 6 7 8 1889 9 4199 4199 4199 4199 3/29/2023 Advanced Data Structures 56 Quadratic Probing If cell j is already occupied then location j+1,j+4,j+9 are checked. This reduces primary clustering of data. Main Idea: Spread out the search for an empty slot – Increment by i2 instead of i hi(X) = (Hash(X) + i2) % TableSize h0(X) = Hash(X) % TableSize h1(X) = Hash(X) + 1 % TableSize h2(X) = Hash(X) + 4 % TableSize h3(X) = Hash(X) + 9 % TableSize 3/29/2023 Advanced Data Structures 57 Let the sequence of keys = 9 ,19 ,29 ,39 ,49 ,59,71 These keys are to be inserted into the hash table. The hash function for indexing,=Kmod10, where k = key value. index keys 0 19 1 71 2 3 29 4 59 5 49 6 7 8 39 9 9 3/29/2023 Advanced Data Structures 58 Quadratic Probing Operation Algorithm Insert(int hashtable[],int key) { start=key % MAX; j=start for(i=0;inext=NULL; }; } node *hashtable[MAX]; } 3/29/2023 Advanced Data Structures 79 Separate Chaining-Used with Open Hashing Operations on Hash Table: else Algorithm Insert(node *hashtable[],int x) { { for(p=hashtable[loc];p->next!=NULL;p=p->next); int loc; p->next=q; node *p,*q; } loc=x % MAX; } q=new node; q->data=x; q->next=NULL; If(hashtable[loc]->next==NULL) Hashtable[loc]->next=q; 3/29/2023 Advanced Data Structures 80 Separate Chaining-Used with Open Hashing Operations on Hash Table: Algorithm node *find(node *hashtable[],int x) { int loc, node *p; loc=x % MAX; p=hashtable[loc]; While(p!=NULL && x!=p->data) p=p->next; return (p); } 3/29/2023 Advanced Data Structures 81 Extendible Hashing Suppose that g=2 and bucket size = 3. Suppose that we have records with these keys and hash function h(key) = key mod 64: key h(key) = key mod 64 bit pattern 1111 23 010111 3333 5 000101 1235 19 010011 2378 10 001010 1212 60 111100 1456 48 110000 2134 22 010110 2345 41 101001 1111 23 010111 8231 39 100111 2222 46 101110 9999 15 001111 Extendible Hashing ❖ Directories and buckets are two key terms in this algorithm. Buckets are the holders of hashed data, while directories are the holders of pointers pointing towards these buckets. Each directory has a unique ID. ❖ Global Depth: It is associated with the Directories. They denote the number of bits which are used by the hash function to categorize the keys. Global Depth = Number of bits in directory id. ❖ Local Depth: It is the same as that of Global Depth except for the fact that Local Depth is associated with the buckets and not the directories. Local depth in accordance with the global depth is used to decide the action that to be performed in case an overflow occurs. Local Depth is always less than or equal to the Global Depth. Extendible Hashing Algorithm 1.Initialize the bucket depths and the global depth of the directories. 2.Convert data into a binary representation. 3.Consider the "global depth" number of the least significant bits (LSBs) of data. 4.Map the data according to the ID of a directory. 5.Check for the following conditions if a bucket overflows (if the number of elements in a bucket exceeds the set limit): Global depth == bucket depth: Split the bucket into two and increment the global depth and the buckets' depth. Re-hash the elements that were present in the split bucket. Global depth > bucket depth: Split the bucket into two and increment the bucket depth only. Re-hash the elements that were present in the split bucket. Repeat the steps above for each element. 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 l=2 local depth is 2 and bucket size is 3 global depth is 2 g=2 l=2 00 01 10 l=2 11 l=2 1111 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 110111 l=2 g=2 l=2 00 01 10 l=2 11 1 1 l=2 1111 1111 3333 1235 3333 2378 1212 1456 2134 2345 1111 8231 2222 9999 000101 l=2 g=2 l=2 00 3333 0 01 1 10 l=2 11 l=2 1111 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 010011 l=2 g=2 l=2 00 3333 01 10 l=2 11 1 1 l=2 1111 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 l=2 1212 1456 g=2 l=2 00 3333 2345 01 10 l=2 11 2378 2134 l=2 1111 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 100111 l=2 1212 1456 g=2 l=2 00 3333 2345 01 10 l=2 11 1 2378 2134 1 Bucket overflow occurs. As g = l, directory structure is doubled. l=2 1111 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 g=3 l=2 00 1212 1456 01 10 l=2 11 3333 2345 00 l=2 01 2378 2134 10 11 l=2 1111 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 l=2 g=3 1212 1456 000 001 l=2 010 3333 2345 011 l=2 100 2378 2134 101 110 l=3 l=2 111 1111 1235 1111 l=3 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 100111 l=2 g=3 1212 1456 000 001 l=2 010 3333 2345 011 l=2 100 2378 2134 101 110 l=3 l=2 111 11 1235 1 l=3 1111 1111 8231 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 2222 101110 l=2 g=3 1212 1456 000 001 l=2 010 3333 2345 011 l=2 100 2378 2134 2222 101 110 11 l=3 l=2 0 111 1235 l=3 1111 1111 8211 1111 3333 1235 2378 1212 1456 2134 2345 1111 8231 2222 9999 9999 001111 l=2 g=3 1212 1456 000 001 l=2 010 3333 2345 011 l=2 100 2378 2134 101 110 l=3 l=2 111 11 1B 2u3c5ket overflow occurs again. As g = l, 1 directory structure should be doubled again and same process will repeat. l=3 1111 1111 8211 Data = {28,4,19,1,22,16,12,0,5,7} Bucket limit = 3 Convert the data into binary representation: 28 = 11100 4 = 00100 19 = 10011 1 = 00001 22 = 10110 16 = 10000 12 = 01100 0 = 00000 5 = 00101 7 = 00111 https://www.educative.io/answers/what-is-extendible-hashing 28,4,22 g=1 0 1 19,1 28,4,16 g=2 00 01 22 Initialize the hash table w global d 10 11 19,1 g=3 000 28,4,12 001 010 16,0 011 22 100 101 110 19,1,5 111 7 FAQ What is hashing? Why there is need of hashing? What is the time complexity of hashing? What is collision? Define hash function, hash key What are the collision resolution techniques? 3/29/2023 Advanced Data Structures 103 MCQs A hash table of length 10 uses open addressing with hash function h(k)=k mod 10, and linear probing. After inserting 6 values into an empty hash table, the table is as shown below. Which one of the following choices gives a possible order in which the key values could have been inserted in the table? A)46, 42, 34, 52, 23, 33 B)34, 42, 23, 52, 33, 46 C)46, 34, 42, 23, 52, 33 D)42, 46, 33, 23, 34, 52 3/29/2023 Advanced Data Structures 104 MCQs How many different insertion sequences of the key values using the same hash function and linear probing will result in the hash table shown above? A.10 B.20 C.30 D.40 3/29/2023 Advanced Data Structures 105 MCQs The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table? 3/29/2023 Advanced Data Structures 106 MCQs Consider a hash table of size seven, with starting index zero, and a hash function (3x + 4)mod7. Assuming the hash table is initially empty, which of the following is the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed hashing? Note that ‘_’ denotes an empty location in the table. A 8, _, _, _, _, _, 10 B 1, 8, 10, _, _, _, 3 C 1, _, _, _, _, _,3 D 1, 10, 8, _, _, _, 3 3/29/2023 Advanced Data Structures 107 MCQs Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the hash function x mod 10, which of the following statements are true? i. 9679, 1989, 4199 hash to the same value ii. 1471, 6171 has to the same value iii. All elements hash to the same value iv. Each element hashes to a different value (GATE CS 2004) A i only B ii only C i and ii only D iii or iv 3/29/2023 Advanced Data Structures 108 Practice Assignments 1. Write a pseudo code to implement quadratic probing 2. Store roll numbers of the students in a database and implement double hashing onto it. 3. Store roll numbers of the students in a database and implement linear probing with and without replacement onto it. 4. Implement chaining method by identifying one suitable application. 3/29/2023 Advanced Data Structures 109 Takeaway Hashing is one of efficient searching technique. Hashing’s best case time complexity is O(1) Hashing collision is resolved by using different collision resolution techniques. 3/29/2023 Advanced Data Structures 110 References 1. Horowitz, Sahani, Dinesh Mehta, “Fundamentals of Data Structures in C++”, Galgotia Publisher, ISBN: 8175152788, 9788175152786. 2. Peter Brass, “Advanced Data Structures”, Cambridge University Press, ISBN: 978-1-107-43982 3/29/2023 Advanced Data Structures 111