Hashing and Hash Tables

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson
Download our mobile app to listen on the go
Get App

Questions and Answers

Which of the following is NOT a requirement of a good hash function?

  • It should distribute keys unevenly across the table to minimize collisions. (correct)
  • It should be deterministic, producing the same output for the same input.
  • It should be computationally efficient, allowing for fast computation of hash values.
  • It should minimize collisions, ideally not hashing two different keys to the same index.

What is the primary purpose of hashing?

  • To compress data for efficient storage.
  • To sort data in ascending order.
  • To encrypt data for secure storage.
  • To map data of arbitrary size to a fixed-size value. (correct)

In the context of hash tables, what is a collision?

  • When a key is successfully inserted into the hash table.
  • When a key cannot be found in the hash table.
  • When two different keys hash to the same index. (correct)
  • When a hash table is full.

Which collision resolution strategy involves storing a linked list of entries in each bucket of the hash table?

<p>Chaining (Separate Chaining) (D)</p> Signup and view all the answers

Which collision resolution technique involves finding the next available slot in the table itself?

<p>Open Addressing (C)</p> Signup and view all the answers

In linear probing, if i is the initial index and k is the number of collisions, what is the formula to check the next index in a hash table of size tableSize?

<p>$(i + k) \mod tableSize$ (C)</p> Signup and view all the answers

Which probing technique uses the formula (i + c1 * k + c2 * k^2) % tableSize to resolve collisions?

<p>Quadratic Probing (C)</p> Signup and view all the answers

What is the purpose of 'rehashing' in the context of hash tables?

<p>To create a new, larger hash table and re-insert all elements. (A)</p> Signup and view all the answers

Why is using a prime number for the table size beneficial in hash tables?

<p>It minimizes clustering and ensures a more uniform spread of keys. (B)</p> Signup and view all the answers

Which collision resolution strategy requires extra memory due to the use of linked lists or other data structures?

<p>Chaining (Separate Chaining) (B)</p> Signup and view all the answers

In the context of open addressing, what is a 'tombstone' used for?

<p>To indicate that a slot was occupied but is now deleted. (B)</p> Signup and view all the answers

What is a drawback of linear probing in open addressing?

<p>It can lead to primary clustering, which degrades performance. (D)</p> Signup and view all the answers

Which of the following is an application of hashing?

<p>Database indexing (B)</p> Signup and view all the answers

In double hashing, what is the purpose of using a second hash function?

<p>To determine the step size for probing. (D)</p> Signup and view all the answers

What is the load factor of a hash table?

<p>The ratio of the number of elements to the table size. (C)</p> Signup and view all the answers

What is the expected time complexity for search, insert, and delete operations in an ideal hash table?

<p>O(1) (C)</p> Signup and view all the answers

Which data structure is commonly used in chaining to handle collisions?

<p>Linked List (B)</p> Signup and view all the answers

What is a disadvantage of quadratic probing compared to linear probing?

<p>It can be harder to implement deletions correctly. (B)</p> Signup and view all the answers

Symbol tables in compilers often leverage which application of hashing?

<p>Efficient lookup of identifiers (B)</p> Signup and view all the answers

Which of the following describes the 'clustering risk' associated with linear probing?

<p>High, because it can lead to blocks of occupied slots. (C)</p> Signup and view all the answers

Flashcards

What is Hashing?

Mapping data to a fixed-size value for fast data access.

Hashing's Role

Backbone of hash tables, enabling average-case constant time operations (O(1)).

Hash Function

Takes a key and returns an index in the hash table.

Deterministic Hash

Same input always produces the same output.

Signup and view all the flashcards

Uniform Distribution

Keys are evenly distributed across the table.

Signup and view all the flashcards

Minimize Collisions

Two different keys ideally do not hash to the same index.

Signup and view all the flashcards

What is a Collision?

When two different keys hash to the same index.

Signup and view all the flashcards

Chaining (Separate Chaining)

Each bucket stores a linked list of entries.

Signup and view all the flashcards

Open Addressing

Finds the next available slot in the table itself.

Signup and view all the flashcards

Linear Probing

Checks the next index = ((i + k)% tableSize) continuously.

Signup and view all the flashcards

Quadratic Probing

Checks index = (i + c1 * k + c2 * k²) % tableSize.

Signup and view all the flashcards

Double Hashing

Uses a second hash function to decide the step size.

Signup and view all the flashcards

Load Factor

Equals: number of elements / table size.

Signup and view all the flashcards

Rehashing

Creating a new table and re-inserting elements.

Signup and view all the flashcards

Clustering

Keys hash to nearby locations, forming large blocks.

Signup and view all the flashcards

Tombstone

Special placeholder indicating a deleted slot.

Signup and view all the flashcards

Study Notes

  • Hashing maps data to a fixed-size value, called a hash code or hash value, and is used for fast data access, insertion, and deletion.
  • Hashing is the foundation of hash tables, which allows for average-case constant time (O(1)) operations.

Hash Functions

  • A hash function takes a key and returns an index in the hash table.
  • A good hash function must be deterministic (same input yields the same output), efficient (fast to compute), uniform (distributes keys evenly), and minimizes collisions.

Collisions

  • A collision occurs when two different keys hash to the same index.

Collision Resolution Strategies

  • Chaining (Separate Chaining): Each bucket stores a linked list of entries and is easy to implement. The table size doesn't limit the number of elements, but can be slower if many items hash to the same index.
  • Open Addressing: Finds the next available slot in the table itself.
  • Linear Probing: Checks the next index = ((i + k) % tableSize) until a free spot is found.
    • i is the index the key was originally hashed to.
    • k is the number of collisions for that key, which is also the current step number in the probing process.
  • Quadratic Probing: Checks the index = (i + c1 * k + c2 * k²) % tableSize until a free spot is found.
    • i is the original index from the hash function.
    • k is the number of collisions for that key, which is also your current step number in the probing process.
    • c1 and c2 are constants (commonly c1 = 0, c2 = 1).
  • Double Hashing: Uses a second hash function to determine the step size, index = (i + k * h2(key)) % tableSize.
    • h2 is the second hash function and k is the number of collisions so far.

Load Factor

  • Load Factor (a) = number of elements / table size.
  • It measures how full the table is; as it approaches 1, collision chances increase.
  • Rehashing is common practice when 'a' exceeds a threshold (e.g., 0.7).
    • Rehashing involves creating a new table, typically doubling the original size to the nearest prime number, and re-inserting all elements using the new table and hash function to maintain efficiency.

Picking a Table Size

  • Using key % size when the size is not prime can lead to clustering.
    • Clustering: Keys hash to nearby locations, forming large blocks and slowing search, insert, and delete operations.
    • With tableSize = 10, keys like 10, 20, 30 all hash to 0, creating non-uniform distribution.
    • With quadratic probing, non-prime sizes can cause repeated or skipped indices.
  • Using a prime number for the table size minimizes modular arithmetic patterns, ensuring a more uniform spread of keys.

Picking a Collision Resolution Strategy

  • Chaining (Separate Chaining): Each bucket holds a list of entries, is easy to implement, and handles high load factors but requires extra memory.
  • Open Addressing: All entries are stored in the table itself.
    • Linear Probing probes the next slot: (i + k) % tableSize which is simple but causes primary clustering and performance degradation.
    • Quadratic Probing probes using: (i + c1 * k + c2 * k^2) % tableSize which reduces primary clustering and has better spread but has secondary clustering and more complex deletions.
      • Deleting an item in open addressing can break the probe sequence for other keys, leading to incorrect searches.
        • Simply marking a slot as empty can cause searches for later-inserted keys to stop too early.
        • To fix this, deleted slots are marked with a tombstone that indicates the slot was occupied but is now deleted, signaling to continue probing.
        • Tombstones preserve probe sequence integrity but add complexity by distinguishing between empty, tombstone, and occupied slots.
    • Double Hashing probes using a second hash function: (i + k * h2(key)) % tableSize, which minimizes clustering and provides unique probe sequences but requires two good hash functions and is more complex.

Performance

  • Chaining has high extra space, low clustering risk, unlimited load factor limit and Easy deletion handling
  • Linear Probing has None extra space, High clustering risk, ~0.7 max load factor limit and Tricky (tombstones) deletion handling
  • Quadratic Probing has None extra space, Medium clustering risk, ~0.5-0.7 load factor limit and Tricky (tombstones) deletion handling
  • Double Hashing has None extra space, Low clustering risk, ~0.7 load factor limit and Tricky (tombstones) deletion handling

Runtime of Hash Map Operations

  • Search average/best case: O(1), worst case: O(n)
  • Insert average/best case: O(1), worst case: O(n)
  • Delete average/best case: O(1), worst case: O(n)
  • In ideal scenarios, hash tables provide constant-time operations.

Applications of Hashing

  • Dictionaries/Maps (e.g., unordered_map in C++)
  • Database indexing
  • Caching (e.g., using hash values as keys)
  • Symbol tables in compilers
  • Password storage (cryptographic hashing)

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

해싱(Hashing)에 대한 퀴즈
25 questions
Hashing in Computer Science
12 questions

Hashing in Computer Science

ImmaculateFallingAction avatar
ImmaculateFallingAction
Hashing, Maps, and Dictionary Problem Quiz
16 questions
Use Quizgecko on...
Browser
Browser