Hash Tables in Java, Annotated Slides (PDF)

Summary

These slides are part of a presentation on hash tables and their implementations in Java;  the slides cover data structures (specifically looking at hash tables), and contain examples along with explanations of the concepts involved.

Full Transcript

✎ Unit 08: Hash tables Unit Objectives. Here you will: learn about how hash tables work learn about what is meant by hashing and its purpose learn about a range of terminology related to hash table learn about the issues surrounding the use...

✎ Unit 08: Hash tables Unit Objectives. Here you will: learn about how hash tables work learn about what is meant by hashing and its purpose learn about a range of terminology related to hash table learn about the issues surrounding the use of a hash table learn about some typical hashing methods explore some implementations using the concept of hashing in the JCF ✔ SHS Wong DSA 08.Hashing 1/25 ✎ Hashing Elements can be stored in an array, contens contents contents.length-2 contents.length-3 0 1 2... contents.length-1 contents SHS Wong DSA 08.Hashing 2/25 ✎ Hashing Elements can be stored in an array, a linked structure next next next next next null element element element element element SHS Wong DSA 08.Hashing 2/25 ✎ Hashing Elements can be stored in an array, a linked structure or a... hash value hash table hash cell/bucket function collision perfect hash function: no collision SHS Wong DSA 08.Hashing 2/25 ✎ Hashing Elements can be stored in an array, a linked structure or a... hash table Hash function: For computing the location of an element in a hash table. SHS Wong DSA 08.Hashing 2/25 ✎ Hashing Elements can be stored in an array, a linked structure or a... hash table Hash function: For computing the location of an element in a hash table. Cell or Bucket: A storage location in the hash table. SHS Wong DSA 08.Hashing 2/25 ✎ Hashing Elements can be stored in an array, a linked structure or a... hash table Hash function: For computing the location of an element in a hash table. Cell or Bucket: A storage location in the hash table. Hashing: A computing process to determine the location where an element will be (or is) stored in a hash table. SHS Wong DSA 08.Hashing 2/25 ✎ Hashing Elements can be stored in an array, a linked structure or a... hash table Hash function: For computing the location of an element in a hash table. Cell or Bucket: A storage location in the hash table. Hashing: A computing process to determine the location where an element will be (or is) stored in a hash table. Hash value: A value computed from a hash function, i.e. the index of the location in which an element is to be stored within a hash table. ✔ SHS Wong DSA 08.Hashing 2/25 ✎ Hash Function and Hash Table: an Example Hash Table Anna Dan Elsa Heidi Hash Chris Function Mae Tim Will Yannie ✔ SHS Wong DSA 08.Hashing 3/25 ✎ Hashing Method – Extraction: an Example One example of hash function is extraction... SHS Wong DSA 08.Hashing 4/25 ✎ Hashing Method – Extraction: an Example One example of hash function is extraction... Let’s assume that we are to create a hash table with 26 cell/bucket. Each cell will store one name that begins with different letter. We create a simple hash function that uses the first letter of each name, e.g.: Hash Value Name Hash Value Name A Ann M Mary D Doug T Tim E Elizabeth W Walter H Hal Y Young Our array will look like p.5. What would the hash function look like? ✔ SHS Wong DSA 08.Hashing 4/25 ✎ Hashing Method – Extraction: an Example Hash Value Name A Ann D Doug E Elizabeth H Hal M Mary T Tim W Walter Y Young What is the hash value for the name ”Zoe”? ✔ SHS Wong DSA 08.Hashing 5/25 ✎ Hashing Method – Extraction: an Example Hash Value Name A Ann D Doug E Elizabeth H Hal M Mary T Tim W Walter Y Young What is the hash value for the name ”Zoe”? ANSWER: ’Z’ ✔ SHS Wong DSA 08.Hashing 5/25 ✎ Hashing Method – Extraction Consider our earlier hashing example: Assume that the first cell/bucket of the hash table is at location 0, the second at 1, the third at 2,... , the last one at 25. ➠ Range of locations: 0 − 25 Given the name Chris, how can we work out in which cell/bucket it should be kept? ✔ SHS Wong DSA 08.Hashing 6/25 ✎ Hashing Method – Extraction Consider our earlier hashing example: Assume that the first cell/bucket of the hash table is at location 0, the second at 1, the third at 2,... , the last one at 25. ➠ Range of locations: 0 − 25 Given the name Chris, how can we work out in which cell/bucket it should be kept? Answer: 1. Find out the ASCII code of the first letter, i.e. C. ➠ A = 65, B = 66, · · · , Z = 90 2. Compute the index using the formula: index = ASCII code − 65 ➠ index = 67 - 65 = 2. Chris will be kept in location 2, i.e. the 3rd cell in the given hash table. ✔ SHS Wong DSA 08.Hashing 6/25 ✎ Hashing Method – Division Another example of hash function is division... 0 1 2 3... p-2 p-1 SHS Wong DSA 08.Hashing 7/25 ✎ Hashing Method – Division Another example of hash function is division... Let’s assume that we are to create a hash table with 50, 000 cells (or buckets). Each cell will store one Aston University student record. We create a simple hash function as follows: hashValue(key) = |key| mod p where: ⋄ key is the candidate number of a student ⋄ p can be the size of the hash table, i.e. number of cells/buckets Given the candidate number 340175, in which cell will the student record be stored? ✔ 340175 / 50000 SHS Wong DSA 08.Hashing 7/25 ✎ Hashing Method – Division Another example of hash function is division... Let’s assume that we are to create a hash table with 50, 000 cells (or buckets). Each cell will store one Aston University student record. We create a simple hash function as follows: hashValue(key) = |key| mod p where: ⋄ key is the candidate number of a student ⋄ p can be the size of the hash table, i.e. number of cells/buckets Given the candidate number 340175, in which cell will the student record be stored? In Cell 40175. ✔ SHS Wong DSA 08.Hashing 7/25 ✎ Hashing Hashing enables the access time to a particular element to be independent of the number of elements in the table. All operations on an element of a hash table should be O(1)... theoretically. SHS Wong DSA 08.Hashing 8/25 ✎ Hashing Hashing enables the access time to a particular element to be independent of the number of elements in the table. All operations on an element of a hash table should be O(1)... theoretically. However, this efficiency can only be realised if each element maps to a unique position in the table. A collision occurs when ⩾ 2 elements map to the same location. SHS Wong DSA 08.Hashing 8/25 ✎ Hashing Hashing enables the access time to a particular element to be independent of the number of elements in the table. All operations on an element of a hash table should be O(1)... theoretically. However, this efficiency can only be realised if each element maps to a unique position in the table. A collision occurs when ⩾ 2 elements map to the same location. A hash function that maps each element to a unique location is called perfect hash function. SHS Wong DSA 08.Hashing 8/25 ✎ Hashing Hashing enables the access time to a particular element to be independent of the number of elements in the table. All operations on an element of a hash table should be O(1)... theoretically. However, this efficiency can only be realised if each element maps to a unique position in the table. A collision occurs when ⩾ 2 elements map to the same location. A hash function that maps each element to a unique location is called perfect hash function. A hash function that does a good job distributing elements in the ✔ table with few collisions will still result in O(1) operations. SHS Wong DSA 08.Hashing 8/25 ✎ Collision in Hashing: an Example Adding ”Molly” to the hash table will result in collision: Hash Table 1 Anna 2 3 Chris 4 Dan 5 Elsa 6 7 computed hash value: 13 8 Heidi 9 cell/bucket 10 Hash 11 Molly 12 Function 13 Mae Molly 14 Using extraction method 15 16 17 18 19 collision 20 Tim 21 22 23 Will 24 25 Yannie ✔ 26 SHS Wong DSA 08.Hashing 9/25 ✎ Resolving Collisions in a Hash Table: Chaining Need to resolve collision, because: SHS Wong DSA 08.Hashing 10/25 ✎ Resolving Collisions in a Hash Table: Chaining Need to resolve collision, because: ⋄ A perfect hash function is not easy to develop. SHS Wong DSA 08.Hashing 10/25 ✎ Resolving Collisions in a Hash Table: Chaining Need to resolve collision, because: ⋄ A perfect hash function is not easy to develop. ⋄ Often we also do not know the size of the dataset. ➠ We need a method to handle collisions, e.g. chaining. The chaining method: ⋄ Treats each cell in the hash table conceptually as a collection. SHS Wong DSA 08.Hashing 10/25 ✎ Resolving Collisions in a Hash Table: Chaining Need to resolve collision, because: ⋄ A perfect hash function is not easy to develop. ⋄ Often we also do not know the size of the dataset. ➠ We need a method to handle collisions, e.g. chaining. The chaining method: ⋄ Treats each cell in the hash table conceptually as a collection. ⋄ Two implementations: · Using a linked structure (p.11), or SHS Wong DSA 08.Hashing 10/25 ✎ Resolving Collisions in a Hash Table: Chaining Need to resolve collision, because: ⋄ A perfect hash function is not easy to develop. ⋄ Often we also do not know the size of the dataset. ➠ We need a method to handle collisions, e.g. chaining. The chaining method: ⋄ Treats each cell in the hash table conceptually as a collection. ⋄ Two implementations: · Using a linked structure (p.11), or · Using an array based structure with an overflow area (p.12). ✔ SHS Wong DSA 08.Hashing 10/25 How to resolve collision.... ✎ Chaining using a linear linked structure Ann Ben Cathy, Clare Doug Elizabeth Felix, Frances, Fred Greg Hal Ivan Jane, John Ken Lara, Lawrence, Lenny Mary Nathan Oliver ✔ SHS Wong DSA 08.Hashing 11/25 ✎ Chaining using an overflow area Ann Ben Cathy Doug Elizabeth Felix Greg Hal Ivan Jane Ken Lara Mary Nathan Oliver Clare Frances John Lawrence Fred ✔ Lenny SHS Wong DSA 08.Hashing 12/25 ✎ How large should the table be? Another issue surrounding hashing is: How large should the table be? SHS Wong DSA 08.Hashing 13/25 ✎ How large should the table be? Another issue surrounding hashing is: How large should the table be? If we have a dataset of size n and a perfect hash function, the table should be of size n. SHS Wong DSA 08.Hashing 13/25 ✎ How large should the table be? Another issue surrounding hashing is: How large should the table be? If we have a dataset of size n and a perfect hash function, the table should be of size n. Without a perfect hash function, but know that the size of the dataset is n, a good rule of thumb is to make the table 150% the size of the dataset. SHS Wong DSA 08.Hashing 13/25 ✎ How large should the table be? Another issue surrounding hashing is: How large should the table be? If we have a dataset of size n and a perfect hash function, the table should be of size n. Without a perfect hash function, but know that the size of the dataset is n, a good rule of thumb is to make the table 150% the size of the dataset. If we do not know the size of the dataset, we will depend upon dynamic resizing. ✔ SHS Wong DSA 08.Hashing 13/25 ✎ Resizing a Hash Table Dynamically... is similar to expanding the capacity of an array. Resizing dynamically means to: ⋄ Create a new, larger hash table, ⋄ Insert all of the entries of the old table into the new one, and ⋄ Rehash the entries. SHS Wong DSA 08.Hashing 14/25 ✎ Resizing a Hash Table Dynamically... is similar to expanding the capacity of an array. Resizing dynamically means to: ⋄ Create a new, larger hash table, ⋄ Insert all of the entries of the old table into the new one, and ⋄ Rehash the entries. Deciding when to resize is important to dynamic resizing: SHS Wong DSA 08.Hashing 14/25 ✎ Resizing a Hash Table Dynamically... is similar to expanding the capacity of an array. Resizing dynamically means to: ⋄ Create a new, larger hash table, ⋄ Insert all of the entries of the old table into the new one, and ⋄ Rehash the entries. Deciding when to resize is important to dynamic resizing: ⋄ We can simply resize when the table is full. ➠ But, the performance of hash tables degrades seriously as they become full. SHS Wong DSA 08.Hashing 14/25 ✎ Resizing a Hash Table Dynamically... is similar to expanding the capacity of an array. Resizing dynamically means to: ⋄ Create a new, larger hash table, ⋄ Insert all of the entries of the old table into the new one, and ⋄ Rehash the entries. Deciding when to resize is important to dynamic resizing: ⋄ We can simply resize when the table is full. ➠ But, the performance of hash tables degrades seriously as they become full. ⋄ A better approach is to use a load factor. SHS Wong DSA 08.Hashing 14/25 ✎ Resizing a Hash Table Dynamically... is similar to expanding the capacity of an array. Resizing dynamically means to: ⋄ Create a new, larger hash table, ⋄ Insert all of the entries of the old table into the new one, and ⋄ Rehash the entries. Deciding when to resize is important to dynamic resizing: ⋄ We can simply resize when the table is full. ➠ But, the performance of hash tables degrades seriously as they become full. ⋄ A better approach is to use a load factor. The load factor of a hash table is the percentage occupancy of the table at which the table will be resized. SHS Wong DSA 08.Hashing 14/25 ✎ Resizing a Hash Table Dynamically... is similar to expanding the capacity of an array. Resizing dynamically means to: ⋄ Create a new, larger hash table, ⋄ Insert all of the entries of the old table into the new one, and ⋄ Rehash the entries. Deciding when to resize is important to dynamic resizing: ⋄ We can simply resize when the table is full. ➠ But, the performance of hash tables degrades seriously as they become full. ⋄ A better approach is to use a load factor. The load factor of a hash table is the percentage occupancy of the table at which the table will be resized. ✔ ➠ E.g. a hash table with a load factor of 75 will resize when it is 75% full. SHS Wong DSA 08.Hashing 14/25 ✎ Hash Functions There are a wide variety of hash functions that provide good distribution of various types of data. SHS Wong DSA 08.Hashing 15/25 ✎ Hash Functions There are a wide variety of hash functions that provide good distribution of various types of data. To compute a hash value for an object, we need to feed data about the object to a hash function. What sort of data about the object would be suitable for feeding to a hash function? : SavingAccount balance : 20.5 accountNumber : 1 ✔ name : "John Benjy" SHS Wong DSA 08.Hashing 15/25 ✎ Working with Hash Functions A hash function typically use part of the object information (i.e. the key) to compute the hash value. Question: Which field of the following SavingAccount object would be a suitable key? : SavingAccount balance : 20.5 accountNumber : 1 name : "John Benjy" ✔ SHS Wong DSA 08.Hashing 16/25 ✎ Working with Hash Functions A hash function typically use part of the object information (i.e. the key) to compute the hash value. Question: Which field of the following SavingAccount object would be a suitable key? : SavingAccount balance : 20.5 accountNumber : 1 name : "John Benjy" Answer: accountNumber ✔ SHS Wong DSA 08.Hashing 16/25 ✎ Hash Functions Extraction involves using only a part of an element’s value (or key) to compute the location at which to store the element. (Cf p.5) SHS Wong DSA 08.Hashing 17/25 ✎ Hash Functions Extraction involves using only a part of an element’s value (or key) to compute the location at which to store the element. (Cf p.5) ➠ E.g.: we extracted the 1st letter of a string and computed its value relative to the letter A. SHS Wong DSA 08.Hashing 17/25 ✎ Hash Functions Extraction involves using only a part of an element’s value (or key) to compute the location at which to store the element. (Cf p.5) ➠ E.g.: we extracted the 1st letter of a string and computed its value relative to the letter A. Division uses the remainder of the key divided by some positive integer, p, as the index of the element, e.g.: hashValue(key) = |key| mod p where p can be the size of the hash table. SHS Wong DSA 08.Hashing 17/25 ✎ Hash Functions Extraction involves using only a part of an element’s value (or key) to compute the location at which to store the element. (Cf p.5) ➠ E.g.: we extracted the 1st letter of a string and computed its value relative to the letter A. Division uses the remainder of the key divided by some positive integer, p, as the index of the element, e.g.: hashValue(key) = |key| mod p where p can be the size of the hash table. A good hash function should assign an element to a cell (aka bucket) in a random manner. ✔ ➠ This will ensure an even distribution of elements in the available cells. SHS Wong DSA 08.Hashing 17/25 ✎ The need for Hashing Consider a library application storing information about books using the ISBN (a thirteen-digit) number as the key to identify books. SHS Wong DSA 08.Hashing 18/25 ✎ The need for Hashing Consider a library application storing information about books using the ISBN (a thirteen-digit) number as the key to identify books. There are more than 1013 , i.e. 10, 000, 000, 000, 000 possible ISBNs. An array of this size (with the ISBN as index) is very wasteful of storage as even a large library is unlikely to hold more than (say) 50,000 books, so most elements would be empty. SHS Wong DSA 08.Hashing 18/25 ✎ The need for Hashing Consider a library application storing information about books using the ISBN (a thirteen-digit) number as the key to identify books. There are more than 1013 , i.e. 10, 000, 000, 000, 000 possible ISBNs. An array of this size (with the ISBN as index) is very wasteful of storage as even a large library is unlikely to hold more than (say) 50,000 books, so most elements would be empty. Using a hash table of size 75,000 (or perhaps 100,000) and a good hash function on the ISBN, a book can be looked up in constant time and a new book can be added in constant time. Hashing prevents the need for reserving 1013 storage spaces for values that are likely not to appear in the collection. ✔ SHS Wong DSA 08.Hashing 18/25 ✎ Hash Functions in the Java Language ➠ The output of a hash function indicates where an element is (to be) stored within the hash table. In Java, an object’s hash code is used to determine where to put the object in a hash table. ➠ Identical objects, as indicated by method equals, must have the same hash code. Method hashCode in class java.lang.Object returns the object’s hash code: 1 public int hashCode() The integer returned is based on the memory location that the object was initially placed. ✔ SHS Wong DSA 08.Hashing 19/25 ✎ Hash Functions in the Java Language (Cont’d) Identical objects must return the same value when invoking their hashCode methods. Under this definition, no two object can be the same! Many classes tend to override the inherited hashCode method so as to redefine object identity. SHS Wong DSA 08.Hashing 20/25 ✎ Hash Functions in the Java Language (Cont’d) Identical objects must return the same value when invoking their hashCode methods. Under this definition, no two object can be the same! Many classes tend to override the inherited hashCode method so as to redefine object identity. E.g. two String objects are considered identical if they contain the same sequence of characters. SHS Wong DSA 08.Hashing 20/25 ✎ Hash Functions in the Java Language (Cont’d) Identical objects must return the same value when invoking their hashCode methods. Under this definition, no two object can be the same! Many classes tend to override the inherited hashCode method so as to redefine object identity. E.g. two String objects are considered identical if they contain the same sequence of characters. If method equals(Object) is overridden, method hashCode will also needed to be overridden. SHS Wong DSA 08.Hashing 20/25 ✎ Hash Functions in the Java Language (Cont’d) Identical objects must return the same value when invoking their hashCode methods. Under this definition, no two object can be the same! Many classes tend to override the inherited hashCode method so as to redefine object identity. E.g. two String objects are considered identical if they contain the same sequence of characters. If method equals(Object) is overridden, method hashCode will also needed to be overridden. ➠ Identical objects must return the same value when invoking their hashCode methods. ✔ SHS Wong DSA 08.Hashing 20/25 ✎ Redefining the method hashCode: an Example 1 public class Student { 2 3 private String id; 4 private String surname; 5 private String firstname; 6 7 // Other details omitted. 8 9 public int hashCode() { 10 return id.hashCode(); 11 } 12 13 public boolean equals(Object another) { 14 if (another instanceof Student) { 15 return this.id.equals(another.id); 16 } 17 return false; 18 } 19 } ✔ SHS Wong DSA 08.Hashing 21/25 ✎ hashCode in Class String Class java.lang.String inherits and overrides method hashCode: 1 public int hashCode() The hash code is computed using: hash code = s ∗ 31(n−1) + s ∗ 31(n−2) +... + s[n − 1] where s[i] is the ith character of the string, and n is the length of the string. The hash value of the empty string (i.e. "") is 0. Two String objects having exactly the same sequence of characters will be treated as identical objects. ✔ SHS Wong DSA 08.Hashing 22/25 ✎ Redefining the method hashCode: an Exercise Two DishInMenu objects are considered to be the same if their menuId and dishId are the same. 1 public class DishInMenu { 2 private String menuId; 3 private String dishId; 4 private String nameOfDish; 5 6 // Other fields and constructors omitted. 7 8 public int hashCode() { 9 return menuId.hashCode() + dishId.hashCode(); 10 } 11 12 13 ✔ 14 // Other methods omitted. 15 } SHS Wong DSA 08.Hashing 23/25 ✎ Redefining the method hashCode: an Exercise Two DishInMenu objects are considered to be the same if their menuId and dishId are the same. 1 public class DishInMenu { 2 private String menuId; 3 private String dishId; 4 private String nameOfDish; 5 6 // Other fields and constructors omitted. 7 8 public int hashCode() { 9 return menuId.hashCode() + dishId.hashCode(); 10 } 11 12 13 ✔ 14 // Other methods omitted. 15 } SHS Wong DSA 08.Hashing 23/25 ✎ Redefining the method hashCode: an Exercise Two DishInMenu objects are considered to be the same if their menuId and dishId are the same. 1 public class DishInMenu { 2 private String menuId; 3 private String dishId; 4 private String nameOfDish; 5 6 // Other fields and constructors omitted. 7 8 public int hashCode() { 9 return menuId.hashCode() + dishId.hashCode(); 10 } (menuId + dishId).hashCode(); 11 12 13 ✔ 14 // Other methods omitted. 15 } SHS Wong DSA 08.Hashing 23/25 ✎ Redefining the method hashCode: an Exercise Two DishInMenu objects are considered to be the same if their menuId and dishId are the same. 1 public class DishInMenu { 2 private String menuId; 3 private String dishId; 4 private String nameOfDish; 5 6 // Other fields and constructors omitted. 7 8 public int hashCode() { 9 return (menuId + dishId).hashCode(); 10 } 11 12 // !!!! DON’T FORGET TO OVERRIDE equals(Object) ALSO. 13 ✔ 14 // Other methods omitted. 15 } SHS Wong DSA 08.Hashing 23/25 ✎ Hash Tables in the Java Collections Framework JCF includes a range of ADTs backed by a hash table: SHS Wong DSA 08.Hashing 24/25 ✎ Hash Tables in the Java Collections Framework JCF includes a range of ADTs backed by a hash table: Hashtable: Implements a hash table, mapping keys to values. SHS Wong DSA 08.Hashing 24/25 ✎ Hash Tables in the Java Collections Framework JCF includes a range of ADTs backed by a hash table: Hashtable: Implements a hash table, mapping keys to values. HashMap: Hash table based implementation of Map. SHS Wong DSA 08.Hashing 24/25 ✎ Hash Tables in the Java Collections Framework JCF includes a range of ADTs backed by a hash table: Hashtable: Implements a hash table, mapping keys to values. HashMap: Hash table based implementation of Map. HashSet: Implements Set, backed by a hash table. SHS Wong DSA 08.Hashing 24/25 ✎ Hash Tables in the Java Collections Framework JCF includes a range of ADTs backed by a hash table: Hashtable: Implements a hash table, mapping keys to values. HashMap: Hash table based implementation of Map. HashSet: Implements Set, backed by a hash table. LinkedHashSet: A hash table & a linked list implementation of Set. LinkedHashMap: A hash table & a linked list implementation of Map. SHS Wong DSA 08.Hashing 24/25 ✎ Hash Tables in the Java Collections Framework JCF includes a range of ADTs backed by a hash table: Hashtable: Implements a hash table, mapping keys to values. HashMap: Hash table based implementation of Map. HashSet: Implements Set, backed by a hash table. LinkedHashSet: A hash table & a linked list implementation of Set. LinkedHashMap: A hash table & a linked list implementation of Map. ✔ etc. SHS Wong DSA 08.Hashing 24/25 ✎ Learning Outcomes Learning Outcomes. You should now be able to: describe what is meant by hashing explain a range of terminology related to hash table identify when to use a hash table to model a collection explain how some basic hashing algorithms works predict the effect on the data storage when using a selected hashing algorithm use some implementations of hashing in JCF appropriately SHS Wong DSA 08.Hashing 25/25

Use Quizgecko on...
Browser
Browser