Podcast
Questions and Answers
Which of the following is NOT a requirement of a good hash function?
Which of the following is NOT a requirement of a good hash function?
- It should distribute keys unevenly across the table to minimize collisions. (correct)
- It should be deterministic, producing the same output for the same input.
- It should be computationally efficient, allowing for fast computation of hash values.
- It should minimize collisions, ideally not hashing two different keys to the same index.
What is the primary purpose of hashing?
What is the primary purpose of hashing?
- To compress data for efficient storage.
- To sort data in ascending order.
- To encrypt data for secure storage.
- To map data of arbitrary size to a fixed-size value. (correct)
In the context of hash tables, what is a collision?
In the context of hash tables, what is a collision?
- When a key is successfully inserted into the hash table.
- When a key cannot be found in the hash table.
- When two different keys hash to the same index. (correct)
- When a hash table is full.
Which collision resolution strategy involves storing a linked list of entries in each bucket of the hash table?
Which collision resolution strategy involves storing a linked list of entries in each bucket of the hash table?
Which collision resolution technique involves finding the next available slot in the table itself?
Which collision resolution technique involves finding the next available slot in the table itself?
In linear probing, if i
is the initial index and k
is the number of collisions, what is the formula to check the next index in a hash table of size tableSize
?
In linear probing, if i
is the initial index and k
is the number of collisions, what is the formula to check the next index in a hash table of size tableSize
?
Which probing technique uses the formula (i + c1 * k + c2 * k^2) % tableSize
to resolve collisions?
Which probing technique uses the formula (i + c1 * k + c2 * k^2) % tableSize
to resolve collisions?
What is the purpose of 'rehashing' in the context of hash tables?
What is the purpose of 'rehashing' in the context of hash tables?
Why is using a prime number for the table size beneficial in hash tables?
Why is using a prime number for the table size beneficial in hash tables?
Which collision resolution strategy requires extra memory due to the use of linked lists or other data structures?
Which collision resolution strategy requires extra memory due to the use of linked lists or other data structures?
In the context of open addressing, what is a 'tombstone' used for?
In the context of open addressing, what is a 'tombstone' used for?
What is a drawback of linear probing in open addressing?
What is a drawback of linear probing in open addressing?
Which of the following is an application of hashing?
Which of the following is an application of hashing?
In double hashing, what is the purpose of using a second hash function?
In double hashing, what is the purpose of using a second hash function?
What is the load factor of a hash table?
What is the load factor of a hash table?
What is the expected time complexity for search, insert, and delete operations in an ideal hash table?
What is the expected time complexity for search, insert, and delete operations in an ideal hash table?
Which data structure is commonly used in chaining to handle collisions?
Which data structure is commonly used in chaining to handle collisions?
What is a disadvantage of quadratic probing compared to linear probing?
What is a disadvantage of quadratic probing compared to linear probing?
Symbol tables in compilers often leverage which application of hashing?
Symbol tables in compilers often leverage which application of hashing?
Which of the following describes the 'clustering risk' associated with linear probing?
Which of the following describes the 'clustering risk' associated with linear probing?
Flashcards
What is Hashing?
What is Hashing?
Mapping data to a fixed-size value for fast data access.
Hashing's Role
Hashing's Role
Backbone of hash tables, enabling average-case constant time operations (O(1)).
Hash Function
Hash Function
Takes a key and returns an index in the hash table.
Deterministic Hash
Deterministic Hash
Signup and view all the flashcards
Uniform Distribution
Uniform Distribution
Signup and view all the flashcards
Minimize Collisions
Minimize Collisions
Signup and view all the flashcards
What is a Collision?
What is a Collision?
Signup and view all the flashcards
Chaining (Separate Chaining)
Chaining (Separate Chaining)
Signup and view all the flashcards
Open Addressing
Open Addressing
Signup and view all the flashcards
Linear Probing
Linear Probing
Signup and view all the flashcards
Quadratic Probing
Quadratic Probing
Signup and view all the flashcards
Double Hashing
Double Hashing
Signup and view all the flashcards
Load Factor
Load Factor
Signup and view all the flashcards
Rehashing
Rehashing
Signup and view all the flashcards
Clustering
Clustering
Signup and view all the flashcards
Tombstone
Tombstone
Signup and view all the flashcards
Study Notes
- Hashing maps data to a fixed-size value, called a hash code or hash value, and is used for fast data access, insertion, and deletion.
- Hashing is the foundation of hash tables, which allows for average-case constant time (O(1)) operations.
Hash Functions
- A hash function takes a key and returns an index in the hash table.
- A good hash function must be deterministic (same input yields the same output), efficient (fast to compute), uniform (distributes keys evenly), and minimizes collisions.
Collisions
- A collision occurs when two different keys hash to the same index.
Collision Resolution Strategies
- Chaining (Separate Chaining): Each bucket stores a linked list of entries and is easy to implement. The table size doesn't limit the number of elements, but can be slower if many items hash to the same index.
- Open Addressing: Finds the next available slot in the table itself.
- Linear Probing: Checks the next index = ((i + k) % tableSize) until a free spot is found.
- i is the index the key was originally hashed to.
- k is the number of collisions for that key, which is also the current step number in the probing process.
- Quadratic Probing: Checks the index = (i + c1 * k + c2 * k²) % tableSize until a free spot is found.
- i is the original index from the hash function.
- k is the number of collisions for that key, which is also your current step number in the probing process.
- c1 and c2 are constants (commonly c1 = 0, c2 = 1).
- Double Hashing: Uses a second hash function to determine the step size, index = (i + k * h2(key)) % tableSize.
- h2 is the second hash function and k is the number of collisions so far.
Load Factor
- Load Factor (a) = number of elements / table size.
- It measures how full the table is; as it approaches 1, collision chances increase.
- Rehashing is common practice when 'a' exceeds a threshold (e.g., 0.7).
- Rehashing involves creating a new table, typically doubling the original size to the nearest prime number, and re-inserting all elements using the new table and hash function to maintain efficiency.
Picking a Table Size
- Using key % size when the size is not prime can lead to clustering.
- Clustering: Keys hash to nearby locations, forming large blocks and slowing search, insert, and delete operations.
- With tableSize = 10, keys like 10, 20, 30 all hash to 0, creating non-uniform distribution.
- With quadratic probing, non-prime sizes can cause repeated or skipped indices.
- Using a prime number for the table size minimizes modular arithmetic patterns, ensuring a more uniform spread of keys.
Picking a Collision Resolution Strategy
- Chaining (Separate Chaining): Each bucket holds a list of entries, is easy to implement, and handles high load factors but requires extra memory.
- Open Addressing: All entries are stored in the table itself.
- Linear Probing probes the next slot: (i + k) % tableSize which is simple but causes primary clustering and performance degradation.
- Quadratic Probing probes using: (i + c1 * k + c2 * k^2) % tableSize which reduces primary clustering and has better spread but has secondary clustering and more complex deletions.
- Deleting an item in open addressing can break the probe sequence for other keys, leading to incorrect searches.
- Simply marking a slot as empty can cause searches for later-inserted keys to stop too early.
- To fix this, deleted slots are marked with a tombstone that indicates the slot was occupied but is now deleted, signaling to continue probing.
- Tombstones preserve probe sequence integrity but add complexity by distinguishing between empty, tombstone, and occupied slots.
- Deleting an item in open addressing can break the probe sequence for other keys, leading to incorrect searches.
- Double Hashing probes using a second hash function: (i + k * h2(key)) % tableSize, which minimizes clustering and provides unique probe sequences but requires two good hash functions and is more complex.
Performance
- Chaining has high extra space, low clustering risk, unlimited load factor limit and Easy deletion handling
- Linear Probing has None extra space, High clustering risk, ~0.7 max load factor limit and Tricky (tombstones) deletion handling
- Quadratic Probing has None extra space, Medium clustering risk, ~0.5-0.7 load factor limit and Tricky (tombstones) deletion handling
- Double Hashing has None extra space, Low clustering risk, ~0.7 load factor limit and Tricky (tombstones) deletion handling
Runtime of Hash Map Operations
- Search average/best case: O(1), worst case: O(n)
- Insert average/best case: O(1), worst case: O(n)
- Delete average/best case: O(1), worst case: O(n)
- In ideal scenarios, hash tables provide constant-time operations.
Applications of Hashing
- Dictionaries/Maps (e.g., unordered_map in C++)
- Database indexing
- Caching (e.g., using hash values as keys)
- Symbol tables in compilers
- Password storage (cryptographic hashing)
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.