Hashing Technique PDF
Document Details
Uploaded by PeaceableTucson
Tags
Summary
This document presents a detailed overview of hashing, a crucial technique in data structures for efficiently storing, retrieving, and managing data. It explains how hashing functions map data items to specific locations minimizing search time and addressing potential collisions. It's a useful overview for students and professionals interested in data structures and algorithms.
Full Transcript
Hashing Technique Hashing in data structures is a technique used to map data items to specific locations, or "buckets," in a hash table, enabling efficient data storage and retrieval. Hashing is a well-known technique to search any particular element among several elements. hashin...
Hashing Technique Hashing in data structures is a technique used to map data items to specific locations, or "buckets," in a hash table, enabling efficient data storage and retrieval. Hashing is a well-known technique to search any particular element among several elements. hashing is an efficient technique in data structures for storing, retrieving, and managing data by transforming keys into specific locations in a hash table, enabling fast data access and reducing search times. It minimizes the number of comparisons while performing the search. Hashing is the process of mapping large amount of data item to smaller table with the help of hashing function. Hashing is also known as Hashing Algorithm or Message Digest Function. It is a technique to convert a range of key values into a range of indexes of an array. It is used to facilitate the next level searching method when compared with the linear or binary search. Hashing allows to update and retrieve any data entry in a constant time O(1) 1)Purpose of Hashing 1) Hashing is primarily used to optimize search operations, allowing data to be located quickly with minimal comparisons. 2) It’s especially useful when dealing with large datasets, as it can reduce search time to near constant time, O(1), for retrieval and updates. 2) Benefits of Hashing Allows constant-time complexity, O(1), for search, insert, and delete operations in ideal scenarios. Efficient for applications requiring frequent access to stored data, like database indexing and caching. 3) How Hashing Works (Mechanism) Hashing involves applying a hash function to a data item (or key), which generates a unique integer (the hash value). This hash value corresponds to an index in a hash table, where the data is stored. An array data structure called as Hash table is used to store the data items. Based on the hash key value, data items are inserted into the hash table. 1) Hash Value Hash key value is a special value that serves as an index for a data item. It indicates where the data item should be be stored in the hash table. Hash key value is generated using a hash function 2) Hash Function The hash function is central to hashing. It takes an input (key) and returns a numeric index within a range (e.g., 0 to N-1 for an array of size N). A good hash function distributes data evenly across the table, reducing the chance of collisions (when two keys produce the same hash value). Hash function takes the data item as an input and returns a small integer value as an output. The small integer value is called as a hash value. Hash value of the data item is then used as an index for storing it into the hash table. 3) Hash Table A hash table (or hash map) is the data structure that holds the key-value pairs. Keys are hashed to locate the appropriate slot in the table, enabling quick access to values. Each index in the table points to a "bucket," which holds the data. Hash table or hash map is a data structure used to store key-value pairs. It is a collection of items stored to make it easy to find them later. It uses a hash function to compute an index into an array of buckets or slots from which the desired value can be found. It is an array of list where each list is known as bucket. It contains value based on the key. Hash table is synchronized and contains only unique elements. Example with solution (No collision): Collisions and Resolution Techniques Collisions occur when two keys produce the same hash value and attempt to store data at the same index. Common collision resolution techniques: Linear Probing: Finds the next available slot sequentially. Quadratic Probing: Uses a quadratic function to find the next available slot, reducing clustering. Double Hashing: Applies a second hash function to resolve collisions. Chaining: Stores all elements that hash to the same index in a linked list at that index. Collision Resolution Techniques (open Addressing & closed Addressing) are the techniques used for resolving or handling the collision Open Addressing (closed hashing) : 1) Linear probing : In linear probing, when a collision occurs, the algorithm checks the next available slot sequentially until it finds an empty one. Example: Imagine we have a hash table with 10 slots and we’re inserting values using the hash function: Hash Key = Key % Table Size. 1) Insert 12: 12 % 10 = 2 → Place 12 in slot 2. 2) Insert 22: 22 % 10 = 2 → Collision! Slot 2 is taken. 3) Linear probing checks the next slot: slot 3. 4) Place 22 in slot 3. This way, we resolve the collision by moving to the next slot. Solve this problem with table size = 7 2) Quadratic probing : If a slot in the hash table, calculated by h(key), is already full: 1) Try the next slot using (h(key)+ 1^2) % TableSize. 2) If that slot is also full, try (h(key) + 2^2) % TableSize. 3) If that slot is still full, try (h(key) + 3^2) % TableSize, and so on Example: Using table with 10 slots, and Hash Key = Key % Table Size: 1) Insert 12: 12 % 10 = 2 → Place 12 in slot 2. 2) Insert 22: 22 % 10 = 2 → Collision in slot 2. 3) Instead of moving to the next slot, quadratic probing tries slot (2 + 1^2) % 10 = 3 → Place 22 in slot 3. 4) Insert 32: 32 % 10 = 2 → Collision in slot 2. 5) Next, try (2 + 2^2) % 10 = 6 → Place 32 in slot 6. Quadratic probing spreads out the search more than linear probing, reducing clustering. Solve this problem with table size = 7 Quick Revision McQs : 1) What is the primary purpose of hashing in data structures? - A) To sort data items - B) To map data items to specific locations in a hash table for efficient retrieval - C) To encrypt data for security purposes - D) To compress data for storage 2) Which of the following is true about hash tables? - A) They can store duplicate keys - B) They are always sorted - C) They use hash functions to map keys to specific indices - D) They use binary search for data retrieval 3) What is a hash function? - A) A function that encrypts data - B) A function that generates an index for data items - C) A function that sorts data in ascending order - D) A function that compresses data 4) Which collision resolution technique checks the next available slot sequentially? - A) Chaining - B) Linear Probing - C) Quadratic Probing - D) Double Hashing 5) In quadratic probing, if there’s a collision at index h(key) which slot will be checked next? - A) ( h(key) + 1 - B) ( h(key) + 1^2 ) - C) ( h(key) + 2 ) - D) ( h(key) + 3 ) 6) Which of these is not a common collision resolution technique? - A) Linear Probing - B) Quick Hashing - C) Quadratic Probing - D) Double Hashing 7) What is the time complexity of an ideal hash table operation for search, insert, and delete? - A) ( O(n) - B) O(log n) - C) O(1) - D) O(n^2) 8) In the example of quadratic probing, if the hash table size is 10 and there is a collision at index 2, what index will be checked after (2 + 2^2) % 10 )? - A) Index 2 - B) Index 3 - C) Index 6 - D) Index 7 9) Double hashing involves: - A) Using a linked list to store elements at the same index - B) Using two hash functions to handle collisions - C) Storing elements at the same index in an array - D) Searching for an empty slot by linear probing Answers : 1) -Answer: B) 2) Answer: C) 3) Answer:B) 4) Answer:B) 5) Answer: B) 6) -Answer: B) 7) Answer:C) 8) Answer: C) 9) Answer:B) 3) Double Hashing : Double hashing is a collision resolution technique used in hash tables. When two keys hash to the same slot (a collision), double hashing uses a second hash function to determine how far to move or "probe" to find an open slot. This technique helps spread out keys more evenly and reduces clustering, making the table more efficient for lookups and insertions. When the result of the hash function results in a collision, a second hash function is used. How Double Hashing Works : In double hashing, we use two hash functions: 1. Primary hash function (hash1): Calculates the initial position in the table. 2. Secondary hash function (hash2): Determines the "step size" or "increment" to move from the initial position when a collision occurs. Why Double Hashing Is Effective : Double hashing is effective because: 1)It produces unique step sizes for different keys, reducing clustering. 2)Keys spread out more evenly in the table, improving performance. 3)It's less likely to revisit the same slots repeatedly. 4) Increse table size ( rehashing) : closed Addressing (open hashing) : 1) Bucket In data structures, a bucket is a concept used in hash tables and hashing to group or store data items that map to the same index. This technique helps manage collisions where multiple items map to the same location in a data structure. By organizing data items into buckets, we can improve the efficiency of search and insertion operations in a hash table A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records. How Buckets Work in Hash Tables: 1. Hash Function: A hash function maps a key to an index in the array (the hash table). 2. Buckets: If two different keys have the same hash code (causing a collision), they are stored in the same "bucket." 3. Collision Resolution: Buckets allow the hash table to handle collisions. Each bucket can use a linked list, array, or balanced tree to store multiple values associated with the same hash index. 2) Separate Chaining : A chain is simply a linked list of all the elements with the same hash key. A linked list is created at each index in the hash table. A data items key is hashed to the index in hashing, and the item is inserted into the linked list at the index Other items that hash to the same index are simply added to the linked list Thanks eng / esRaa shehab