OCR AS/A Level SLR14 Data Structures Part 5 - Hash Tables PDF

Summary

These notes cover data structures and hash tables. They explain the concept of a hash table, its use in programming, and how it is applicable to file systems and code compilation. The document also briefly introduces methods of resolving collisions.

Full Transcript

OCR AS Level SLR14 Data structures (H046) OCR A...

OCR AS Level SLR14 Data structures (H046) OCR A Level (H446) Overview of structures to store data – Part 5: Hash tables OCR A Level (H446) SLR14 - 1.4 Data st ructures part 5 - Hash tables Teacher site: craigndave.org | Student site: student.craigndave.org | Smart SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 What is a hash table? The goal with a hash table is to immediately find an item in a sorted or unsorted list without the need to compare other items in the data set. Programming languages use hash tables to implement a dictionary data structure. A hashing function is used to calculate the position of an item in a hash table. SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 What is a hash table? The hashing function is Converted value Hash applied to an item to Initial data fed to hash function determine a hash value — Alabam function 65 108 97 98 its position in a hash table. a 97 109 97 Addre Data There are many different ss hashing functions in use 71 101 111 114 1 today. A simple example is Georgia 2 adding up the ASCII values of 103 105 97 all the characters in a string 3 and calculating the modulus 70 108 111 114 4 of that value by the size of Florida Florida 105 100 97 5 5 the hash table. Assuming a table size of 10: 6 o Florida: F = 70, l = 108, Delawa 68 101 108 97 7 o = 111, r = 114, i = re 119 97 114 101 8 105, d = 100, a = 97 o 70 + 108 + 111 + 114 + 9 105 + 100 + 97 = 705 67 97 108 105 Address = 10 o 705 MOD 10 = 5 Californ 102 111 114 (sum)MOD o The position of Florida is ia 110 105 97 10 5. SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 What is a hash table? A hash table needs to be at Converted value Hash least large enough to store all Initial data fed to hash function the data items but is usually Alabam function 65 108 97 98 significantly larger to a 97 109 97 Addre Data minimise the chance of the ss algorithm returning the same value for more than one item, 71 101 111 114 1 known as a collision – for Georgia 2 example: 103 105 97 o Delaware: D = 68, e = 3 101, l = 108, a = 97, w = 70 108 111 114 4 Florida 119, a = 97, r = 114, e = Florida 101 105 100 97 5 5 o 68 + 101 + 108 + 97 + 6 119 + 97 + 114 + 101 = Delawa 68 101 108 97 7 805 re 119 97 114 101 8 o 805 MOD 10 = 5 o The position of Delaware 9 is 5. 67 97 108 105 Address = 10 Californ 102 111 114 (sum)MOD Since two data items cannot ia 110 105 97 10 occupy the same position in the hash table, a collision SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Properties of a good hashing A good hashing function function should: Be calculated quickly Result in as few collisions as possible Use as little memory as possible SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Resolving collisions There are many strategies for Converted value Hash resolving collisions Initial data fed to hash function generated from hashing Alabam function 65 108 97 98 functions. a 97 109 97 Addre Data A simple solution is to ss repeatedly check the next 71 101 111 114 1 available space in the hash Georgia 2 table until an empty position 103 105 97 is found and store the item 3 there – this is known as open 70 108 111 114 4 Florida addressing. Florida 105 100 97 5 5 To find the item later, the 6 hashing function delivers Delawa 68 101 108 97 7 the start position from which re 119 97 114 101 8 a linear search can then be applied until the item is found 9 – this is known as linear 67 97 108 105 Address = 10 probing. Californ 102 111 114 (sum)MOD ia 110 105 97 10 SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Resolving collisions In this example, we can see Converted value Hash that Delaware has a hash Initial data fed to hash function value of 5, but that address Alabam function 65 108 97 98 is occupied by Florida. a 97 109 97 Addre Data 1 Therefore, Delaware is placed ss Alabama at 6, the next available 71 101 111 114 2 1 Georgia position. Georgia 2 103 105 97 California has a hash value 3 of 6 but cannot occupy its 5 4 intended position, so it must 70 108 111 114 Florida Florida be stored at the next 105 100 97 5 Delaware 6 available position, 7. 6 California Delawa 68 101 108 97 7 A disadvantage of linear re 119 97 114 101 8 probing in this way is that it prevents other items from 9 being stored at their correct 67 97 108 105 Address = 10 location in the hash table. Californ 102 111 114 (sum)MOD ia 110 105 97 10 It also results in clustering — several positions being SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Resolving collisions Notice that, with a table size Converted value Hash of 10, two collisions occur. Initial data fed to hash function Alabam function 65 108 97 98 With a table size of 5, three a 97 109 97 Addre Data collisions occur, resulting in 1 ss a less efficient algorithm but Alabama a reduced memory footprint. 71 101 111 114 2 1 Georgia Georgia 2 If the table size is increased 103 105 97 to 11, no collisions occur. 3 5 4 With hashing algorithms, 70 108 111 114 Florida Florida there is often a trade-off 105 100 97 5 Delaware 6 between the efficiency of the 6 algorithm and the size of the California Delawa 68 101 108 97 7 hash table. re 119 97 114 101 8 The process of finding an 9 alternative position for items 67 97 108 105 Address = 10 in the hash table is known Californ 102 111 114 (sum)MOD as rehashing. ia 110 105 97 10 SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Resolving collisions An alternative method of handling collisions is to use a two-dimensional hash Converted value Hash table. Initial data fed to hash function Alabam function 65 108 97 98 It is then possible for more a 97 109 97 than one item to be placed at the same position, known as Addre Data chaining. Here, we see that 71 101 111 114 ss both Florida and Georgia 0 1 103 105 97 1 Delaware can occupy Alabama the same position 2 1 Georgia but in different 70 108 111 114 2 elements of the 2D Florida 105 100 97 3 array. 5 4 Florida Delaware Delawa 68 101 108 97 5 California re 119 97 114 101 6 6 7 67 97 108 105 Address = Californ 8 102 111 114 (sum)MOD ia 110 105 97 10 9 10 SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Another possibility would be Resolving collisions to use a second table for collisions, known as an Converted value Hash overflow table, which can Initial data fed to hash function then be searched Alabam function 65 108 97 98 sequentially. a 97 109 97 You could also use a linked list to maintain your overflow Overflow – again, searched 71 101 111 114 Georgia Addre Data Addre table sequentially. OverflowData 103 105 97 1 ss ss Delawar Alabama Delaware 1 2 1 1e Georgia 70 108 111 114 2 2 Florida 105 100 97 3 3 free 2 5 4 4 Florida Delawa 68 101 108 97 5 5 re 119 97 114 101 California 6 6 free 6 3 67 97 108 105 7 7 Californ Address = 102 111 114 (sum)MOD 8 8 ia 110 105 97 10 9 9 10 10 SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 What are the applications of a hash table? Hash tables can be used in situations where items in a large data set need to be found quickly. Typical uses include: o File systems linking a file name to the path of the file. SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 What are the applications of a Index Token Token class Data type hash table? [Keyword:Function] 0 Function Keyword Hash tables can be used in [Identifier:checkScore] situations where items in a [Separator:(] 1 checkScore Identifier Integer large data set need to be [Identifier:score] 2 ( Separator found quickly. Typical uses [Keyword:As] 3 score Identifier Integer include: [Datatype:Integer] o File systems linking a file 4 As Keyword [Separator:)] name to the path of the [Keyword:If] 5 Integer Datatype file. [Identifier:score] 6 ) Separator o Identifying the keywords [Operator:>] 7 If Keyword in a programming [Literal:75] language during 8 > Operator [Keyword:Then] compilation. [Keyword:Return] 9 75 Literal [Quote:"] 10 Then Keyword [Literal:Pass] 11 Return Keyword [Quote:"] 12 " Quote [Keyword:Else] [Keyword:Return] 13 Pass Literal [Quote:"] 14 Else Keyword [Literal:Fail] 15 Fail Literal [Quote:"] 16 EndIf Keyword [Keyword:EndIf] [Keyword:EndFunction] 17 EndFunction Keyword SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 What operations can be performed on a hash table? Add: Adds a new item to a hash table Delete: Removes an item from a hash table Retrieve: Retrieves an item from a hash table using its hash value SLR14 Data structures | Data structures – Part 5: Hash tables OCR H046/H466 Having watched this video, you should be able to answer the following key question. How do hash tables work?

Use Quizgecko on...
Browser
Browser