Document Details

GentleTellurium9861

Uploaded by GentleTellurium9861

PES University Online

2024

M S Anand

Tags

data structures algorithms data management computer science

Summary

This document is a set of lecture notes on data structures and their applications. It covers topics such as BFS, DFS, Tries, Suffix trees, and Hashing, and includes examples and diagrams.

Full Transcript

Data structures and their applications Unit - IV Compiled by M S Anand Department of Computer Science Data Structures and their applications Introduction Text Book(s): 1. "Data Structures using C / C++", Langsum Yedidyah, Moshe J Augenstein, Aaron M Tenenbaum Pearson E...

Data structures and their applications Unit - IV Compiled by M S Anand Department of Computer Science Data Structures and their applications Introduction Text Book(s): 1. "Data Structures using C / C++", Langsum Yedidyah, Moshe J Augenstein, Aaron M Tenenbaum Pearson Education Inc, 2nd edition, 2015. Reference Book(s): 1. "Data Structures and Program Design in C", Robert Kruse, Bruce Leung, C.L Tondo, Shashi Mogalla, Pearson, 2nd Edition, 2019 20/11/2024 Data structures and their applications Applications of BFS and DFS Graph traversal techniques like BFS and DFS have countless. real-world applications. From social network analysis to web crawling, these algorithms help solve complex problems in diverse domains. They're essential for tasks like pathfinding, resource allocation, and connectivity analysis. BFS and DFS each have unique strengths. BFS excels at finding shortest paths and nearest neighbors DFS is great for exploring all possible paths. Understanding their trade-offs helps choose the right approach for specific problems, whether it's optimizing network flow or building recommendation systems. 20/11/2024 Data structures and their applications Applications of BFS and DFS Real world applications of BFS and DFS. Social network analysis Finds shortest paths between users to identify connections (LinkedIn) Identifies clusters or communities based on user interactions (Facebook groups) Recommends friends or connections based on shared interests (Twitter suggestions) Web crawling and indexing Discovers and indexes web pages for search engines (Google) Identifies broken links or dead ends to maintain website integrity Analyzes website structure and connectivity for SEO optimization (Search Engine Optimization) Pathfinding in navigation systems Finds shortest routes between locations for GPS navigation (Google Maps) Explores all possible paths in a maze or grid for game AI (Pac-Man) Generates directions or navigation instructions for users (Waze) 20/11/2024 Data structures and their applications Applications of BFS and DFS Resource allocation and scheduling. Assigns tasks or resources to minimize conflicts in project management (Trello) Optimizes resource utilization and efficiency in supply chain management Detects and resolves resource deadlocks in operating systems 20/11/2024 Data structures and their applications Applications of BFS and DFS Graph connectivity problems. Identifying isolated subgraphs or disconnected components Detects network partitions or segmentation in distributed systems Groups related nodes based on connectivity for data clustering (k- means) Determining reachability between nodes Checks if a path exists between two nodes in a communication network Verifies if a graph is fully connected for network reliability analysis Finding bridges or articulation points Identifies critical edges or nodes that disconnect the graph (load balancing) Assesses network vulnerability and resilience for security analysis Solving flood fill or coloring problems Assigns colors or labels to connected regions in image segmentation (Photoshop) Identifies boundaries or contours in image processing (OpenCV) 20/11/2024 Data structures and their applications Applications of BFS and DFS BFS and DFS in diverse domains. Recommendation systems Suggests friends, products, or content based on graph proximity (Amazon) Performs collaborative filtering and personalized recommendations (Netflix) Network analysis and optimization Identifies influential nodes or hubs in social networks (X (Twitter) influencers) Detects bottlenecks or single points of failure in computer networks Optimizes network flow or capacity in transportation systems (airline routes) Web search and ranking algorithms Calculates page rank or authority scores for web pages (Google PageRank) Identifies important or relevant web pages for search results 20/11/2024 Data structures and their applications BFS v/s DFS BFS vs DFS. BFS characteristics Explores nodes in increasing order of depth or distance from starting node Guarantees shortest paths in unweighted graphs (shortest path algorithms) Suitable for finding shortest paths or nearest neighbors (A* search) Requires more memory to store the queue of nodes to visit DFS characteristics Explores nodes as far as possible along each branch before backtracking May not find shortest paths but can explore all possible paths (maze solving) Suitable for exploring all connected components or cycles. Requires less memory as it uses a stack instead of a queue 20/11/2024 Data structures and their applications Connectivity of a graph A connected graph is a graph where there is a path between any two. vertices. A path is a sequence of edges that connects two vertices, and every edge and vertex in the path is distinct. A graph that is not connected is called a disconnected graph. In a connected graph, there is no unreachable vertex. In the area of Information Technology, connectivity refers to internet connectivity through which various devices are connected to the global network. 20/11/2024 Data structures and their applications Connectedness in the direct graph Using adjacency matrix. Traverse the graph using either BFS or DFS traversal method. After the traversal, if at least one node is marked as “not visited”, then the graph is a disconnected graph. Finding all the paths from a given source to destination 20/11/2024 Data structures and their applications Paths between two nodes in a graph Using DFS: The idea is to do Depth First Traversal of given. directed graph. Start the traversal from source. Keep storing the visited vertices in an array say ‘path[]’. If we reach the destination vertex, print contents of path[]. The important thing is to mark current vertices in path[] as beingVisited, so that the traversal doesn’t go in a cycle. A program in C to print all the paths between a source node and a destination node 20/11/2024 Data structures and their applications Trie A trie is a tree-like data structure whose nodes store the letters of an. alphabet. By structuring the nodes in a particular way, words and strings can be retrieved from the structure by traversing down a branch path of the tree. A trie, also called digital tree or prefix tree, is a kind of search tree—an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Keys tend to be associated with leaves, though some inner nodes may correspond to keys of interest. Hence, keys are not necessarily associated with every node. 20/11/2024 Data structures and their applications Trie. 20/11/2024 Data structures and their applications Trie The shape and the structure of a trie is always a set of linked. nodes, connecting back to an empty root node. An important thing to note is that the number of child nodes in a trie depends completely upon the total number of values possible. For example, if we are representing the English alphabet, then the total number of child nodes is directly connected to the total number of letters possible. In the English alphabet, there are 26 letters, so the total number of child nodes will be 26. The size of a trie is directly correlated to the size of all the possible values that the trie could represent. 20/11/2024 Data structures and their applications Trie A trie could be pretty small or big, depending on what it contains.. But, so far, all we’ve talked about is the root node, which is empty. Where do the letters of different words live if the root node doesn’t house them all? The answer to that lies in the root node’s references to its children. Let’s take a closer look at what a single node in a trie looks like, and hopefully this will start to become more clear. 20/11/2024 Data structures and their applications Trie In the example, we have a trie that has an empty root node, which. has references to children nodes. If we look at the cross-section of one of these child nodes, we’ll notice that a single node in a trie contains just two things: 1. A value, which might be null 2. An array of references to child nodes, all of which also might be null 3. Each node in a trie, including the root node itself, has only these two aspects to it. When a trie representing the English language is created, it consists of a single root node, whose value is usually set to an empty string: "". 4. That root node will also have an array that contains 26 references, all of which will point to null at first. As the trie grows, those pointers start to get filled up with references to other nodes, which we’ll see an example of pretty soon. 20/11/2024 Data structures and their applications Trie 5. The way that those pointers or references are represented is. particularly interesting. We know that each node contains an array of references/links to other nodes. What’s cool about this is that we can use the array’s indices to find specific references to nodes. For example, our root node will hold an array of indices 0 through 25, since there are 26 possible slots for the 26 letters of the alphabet. Since the alphabet is in order, we know that the reference to the node that will contain the letter A will live at index 0 20/11/2024 Data structures and their applications Trie. 20/11/2024 Data structures and their applications Trie Build a Trie to represent the nursery rhyme that starts off with. something like “Peter Piper picked a peck of pickled peppers”. Looking at our trie (slide #21), we can see that we have an empty root node, as is typical for a trie structure. We also have six different words that we’re representing in this trie: Peter, piper, picked, peck, pickled, and peppers. To make this trie easier to look at, we have drawn the references that actually have nodes in them; it’s important to remember that, even though they’re not illustrated here, every single node has 26 references to possible child nodes. 20/11/2024 Data structures and their applications Trie Notice how there are six different “branches” to this trie, one for. each word that’s being represented. We can also see that some words are sharing parent nodes. For example, all of the branches for the words Peter, peck, and peppers share the nodes for p and for e. Similarly, the path to the word picked and pickled share the nodes p, i, c, and k. 20/11/2024 Data structures and their applications Trie. 20/11/2024 Data structures and their applications Trie So, what if we wanted to add the word pecked to this list of words. represented by this trie? We’d need to do two things in order to make this happen: 1. First, we’d need to check that the word pecked doesn’t already exist in this trie. 2. Next, if we’ve traversed down the branch where this word ought to live and the word doesn’t exist yet, we’d insert a value into the node’s reference where the word should go. In this case, we’d insert e and d at the correct references. But how do we actually go about checking if the word exists? And how do we insert the letters into their correct places? Let’s look at a trie that is empty, and try inserting something into it. 20/11/2024 Data structures and their applications Trie We know that we’ll have an empty root node, which will have a. value of "", and an array with 26 references in it, all of which will be empty (pointing to null) to start. Let’s say that we want to insert the word "pie", and give it a value of 5. We’ll work our way through the key, using each letter to build up our trie and add nodes as necessary. We’ll first look for the pointer for p, since the first letter in our key "pie" is p. Since this trie doesn’t have anything in just yet, the reference at p in our root node will be null. So, we’ll create a new node for p, and the root node now has an array with 25 empty slots, and 1 slot (at index 15) that contains a reference to a node. 20/11/2024 Data structures and their applications Trie. 20/11/2024 Data structures and their applications Trie Are we done?. Now we have a node at index 15, holding the value for p. But, our string is "pie", so we’re not done yet. We’ll do the same thing for this node: check if there is a null pointer at the next letter of the key: i. Since we encounter another null link for the reference at i, we’ll create another new node. Finally, we’re at the last character of our key: the e in "pie". We create a new node for the array reference to e, and inside of this third node that we’ve created, we’ll set our value: 5. In the future, if we want to retrieve the value for the key "pie", we’ll traverse down from one array to another, using the indices to go from the nodes p, to i, to e; when we get to the node at the index for e, we’ll stop traversing, and retrieve the value from that node, which will be 5. 20/11/2024 Data structures and their applications Trie. 20/11/2024 Data structures and their applications Trie Trie | (Insert and Search). is an efficient information reTrieval data structure. Using Trie, search Trie complexities can be brought to optimal limit (key length). Every node of Trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as end of word node. A Trie node field isEndOfWord is used to distinguish the node as end of word node. A simple structure to represent nodes of the English alphabet can be: // Trie node struct TrieNode { struct TrieNode *children[ALPHABET_SIZE]; // isEndOfWord is true if the node represents end of a word bool isEndOfWord; }; 20/11/2024 Data structures and their applications Trie Inserting a key. Inserting a key into Trie is a simple approach. Every character of the input key is inserted as an individual Trie node. Note that the children is an array of pointers (or references) to next level trie nodes. The key character acts as an index into the array children. If the input key is new or an extension of the existing key, we need to construct non-existing nodes of the key, and mark end of the word for the last node. If the input key is a prefix of the existing key in Trie, we simply mark the last node of the key as the end of a word. The key length determines Trie depth. 20/11/2024 Data structures and their applications Trie Searching for a key. Searching for a key is similar to insert operation, however, we only compare the characters and move down. The search can terminate due to the end of a string or lack of key in the trie. In the former case, if the isEndofWord field of the last node is true, then the key exists in the trie. In the second case, the search terminates without examining all the characters of the key, since the key is not present in the trie. 20/11/2024 Data structures and their applications Trie Algorithm for deleting a node. During delete operation we delete the key in bottom up manner using recursion. The following are possible conditions when deleting key from trie, 1. Key may not be there in trie. Delete operation should not modify trie. 2. Key present as unique key (no part of key contains another key (prefix), nor the key itself is prefix of another key in trie). Delete all the nodes. 3. Key is prefix key of another long key in trie. Unmark the leaf node. 4. Key present in trie, having atleast one other key as prefix key. Delete nodes from end of key until first leaf node of longest prefix key. 20/11/2024 Data structures and their applications Trie A sample implementation is here:. trie.h trie_main.c trie_utils.c 20/11/2024 Data structures and their applications Suffix tree. 20/11/2024 Data structures and their applications Suffix tree A suffix is a substring that consists of all the characters in the. string from a particular location to the very end. For instance, the suffixes for the string "banana" are "banana," "anana," "nana," "ana," "na," and "a." These suffixes are all stored in a tree-like data structure called a suffix tree. An ordered tree data structure called a trie is effective at storing a dynamic set of strings. Each edge of a suffix tree corresponds to a single character, and the pathways from the root to the leaves make up the suffixes of the starting string. A suffix tree's compact representation is one of its key characteristics. A suffix tree merges such similar branches to conserve space because numerous suffixes in a string have the same prefixes. This is essential for the data structure's effectiveness, especially when working with lengthy strings. 20/11/2024 Data structures and their applications Suffix tree Numerous string-related algorithms use suffix trees for a variety of. purposes, including pattern searching, locating the largest repeated substring, building a generic suffix tree for multiple strings, and effectively resolving various string-based issues. Suffix trees are widely used in bioinformatics for tasks including genome assembly, gene searching, and identifying sequence similarities. Additionally, they are used in text editors for quick search and replace operations and in other programs that deal with extensive string processing. However, building a suffix tree might take a lot of resources because the tree needs to be stored, which limits its application to very lengthy strings or big datasets. Other compressed variations have been created to address space efficiency issues while keeping quick search capabilities, such as suffix arrays and FM- indexes. 20/11/2024 Data structures and their applications Suffix tree Substring searches benefit greatly from suffix trees. Standard. substring search algorithms have an O(n*m) time complexity when given a text string of length n and a pattern string of length m. However, substring search can be executed in O(m) time with a pre-built suffix tree, making it substantially faster for big text strings and numerous pattern searches. 20/11/2024 Data structures and their applications Suffix tree Applications of Suffix tree:.Substring Search: Suffix trees make it possible to quickly search for substrings. By navigating the suffix tree, you may quickly determine whether a pattern string exists in the original string given the pattern string. About the length of the pattern, this operation can be completed in linear time. Longest Common Substring: Suffix trees can be used to identify the longest common substring that several strings have in common. You can quickly determine the longest common substring by building a generic suffix tree for each input text. Pattern Matching with Wildcards: Suffix trees are capable of handling pattern matching when using wildcards (jokers). When you have patterns with ambiguous or missing characters, this is helpful. Longest Repeated Substring: Suffix trees can be used to identify the longest repeating substring in a given string. This has a variety of uses, including data compression and bioinformatics. 20/11/2024 Data structures and their applications Suffix tree Palindromes: Suffix trees are effective at locating the longest palindromic.substring within a given string, which is helpful for tasks like DNA analysis and RNA folding, among others. Multiple Pattern Matching: You may quickly look for many patterns in the source string using a single suffix tree. Applications like text indexing and searching benefit from this. Generalized Suffix Tree: You can quickly locate common substrings and carry out other related actions by building a generalized suffix tree for a group of strings. Data Compression: Suffix trees are used in data compression methods like the Burrows-Wheeler Transform (BWT) and Burrows-Wheeler Inverse Transform, which are related. Bioinformatics: Suffix trees are widely used in bioinformatics to examine DNA sequences, biological strings, and genomic data. Text Editing and Manipulation: Suffix trees can help with several text editing tasks, including substring substitution, insertion, and deletion. 20/11/2024 Data structures and their applications Suffix tree Advantages of Suffix tree:. 1. efficient memory use because all prefixes are compressed. 2. Fast pattern matching and substring searches with linear time complexity. 3. the ability to quickly identify the longest shared substring amongst different strings. 4. handling numerous pattern-matching tasks well. 5. support for wildcard (joker) substring searches. 6. the capacity to quickly identify the longest repeating substring in a string. 7. Finding the longest palindromic substring in a string quickly. 8. building in linear time with effective techniques. 9. many uses in text manipulation, data compression, and bioinformatics. 10. a powerful tool for different computer science jobs involving strings. 20/11/2024 Data structures and their applications Suffix tree Disadvantages of Suffix tree:. 1. The complexity of Construction: Creating a suffix tree can be challenging and may call for complex algorithms. 2. Memory Usage: Suffix trees can use up a lot of memory, especially when dealing with lengthy input strings. 3. Construction Time: Compared to more straightforward data structures, suffix trees may require more time to build initially. 4. Non-Dynamic: Suffix trees must be rebuilt for any changes to the source string since they are difficult to modify after they have been constructed. 5. Operation Complexity: When compared to other data structures, suffix trees can have more difficult-to-implement operations. 6. Overhead for Small Strings: For short strings, building a suffix tree may be more time-consuming than it is worth. 7. Suffix trees are best suited for activities involving strings; they may not be as flexible for work involving other forms of data. 20/11/2024 Data structures and their applications Hashing What is hashing?. Hashing is the process of transforming any given key or a string of characters into another value. This is usually represented by a shorter, fixed-length value or key that represents and makes it easier to find or employ the original string. The most popular use of hashing is for setting up hash tables. A hash table stores key and value pairs in a list that's accessible through its index. Because the number of keys and value pairs is unlimited, the hash function maps the keys to the table size. A hash value then becomes the index for a specific element. A hash function generates new values according to a mathematical hashing algorithm, known as a hash value or simply a hash. To prevent the conversion of a hash back into the original key, a good hash always uses a one-way hashing algorithm. 20/11/2024 Data structures and their applications Hashing How does hashing work?. Hashing involves three components: Input. The data entered into the algorithm is called input. This data can have any length and format. For instance, an input could be a music file or a paper. In hashing, every piece of input data is used to produce a single output. Hash function. The central part of the hashing process is the hash function. This function takes the input data and applies a series of mathematical operations to it, resulting in a fixed-length string of characters. The hash function ensures that even a small change in the input data produces a significantly different hash value. Hash output. Unlike the input, the hashing process's output or hash value has a set length. It is challenging to determine the length of the original input because outputs have a set length, which contributes to an overall boost in security. A hash value is a string of characters and numbers that a hacker might not be able to read, keeping a person's information private. As each hash value is distinct, hash values are also frequently referred to as fingerprints. 20/11/2024 Data structures and their applications Hashing Applications and Benefits of Hashing. Hashing has applications in various fields such as cryptography, computer science and data management. Some common uses and benefits of hashing include the following: Data integrity Efficient data retrieval Digital signatures Password storage Fast searching Efficient caching Cryptographic applications Space efficiency Blockchain technology Data compression Database management 20/11/2024 Data structures and their applications Hashing Disadvantages of hashing. While hashing offers several benefits, it also has certain drawbacks and limitations, including the following: Risk of collisions. Hashing can sometimes suffer from collisions, which occur when two different inputs produce the same hash value. Collisions can lead to decreased performance and increased lookup time, especially if the number of collisions is high. Techniques such as chaining and open addressing can be used to handle collisions, but they can introduce additional complexity. For example, the cache performance of chaining isn't always the best, as keys use a linked list. Non-reversible. Since hash functions are intended to be one-way functions, reversing the process and getting the original input data isn't computationally viable. This could be a drawback if reverse lookup is necessary. Limited sorting. Hashing isn't ideal if data needs to be sorted in a specific order. While hash tables are designed for efficient lookup and retrieval, they don't provide inherent support for sorting operations. If sorting is a requirement, other data structures such as balanced search trees might be worth considering. 20/11/2024 Data structures and their applications Hashing Space overhead. To store the hash values and the related data, hashing. typically requires more storage space. This space overhead can be substantial when working with big data sets and can be a cause for concern when storage resources are limited. Key dependency. Hashing relies on the uniqueness of keys to ensure efficient data retrieval. If the keys aren't unique, collisions can occur more frequently, leading to performance degradation. It's important to carefully choose or design keys to minimize the likelihood of collisions. Difficulty in setting up. Configuring a hash table or a hashing algorithm can be more complex compared to other data structures. Handling collisions, resizing the hash table and ensuring efficient performance requires careful consideration and planning and can make hashing challenging to set up. 20/11/2024 Data structures and their applications Hashing What is hashing in data structure?. Hashing is used in data structures to efficiently store and retrieve data. The Dewey Decimal System, which enables books to be organized and stored based on their subject matter, has worked well in libraries for many years and the underlying concept works just as well in computer science. Software engineers can save both file space and time by shrinking the original data assets and input strings to short alphanumeric hash keys. When someone is looking for an item on a data map, hashing narrows down the search. In this scenario, hash codes generate an index to store values. Here, hashing is used to index and retrieve information from a database because it helps accelerate the process. It's much easier to find an item using its shorter hashed key than its original value. 20/11/2024 Data structures and their applications Hashing.. What are hash tables?. Hash tables are a type of data structure in which the address/ index value of the data element is generated from a hash function. This enables very fast data access as the index value behaves as a key for the data value. In other words, hash tables store key-value pairs but the key is generated through a hashing function. So the search and insertion function of a data element becomes much faster as the key values themselves become the index of the array which stores the data. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. 20/11/2024 Data structures and their applications Hashing …. 20/11/2024 Data structures and their applications Hashing.. How to choose the hash function? If. your data consists of integers then the easiest hash function is to return the remainder of the division of key and the size of the table. It is important to keep the size of the table as a prime number. But more complex functions can be written to avoid the collision. If your data consists of strings then you can add all the ASCII values of the alphabets and modulo the sum with the size of the table. Some applications of Hash Table Password Verification. Compiler Operation. Linking File name and path together. Operating Systems Implementation (without handling collision) hashTable.h hashTableMain.c hashTableUtils.c 20/11/2024 Data structures and their applications Hashing.. What is Collision?. Since a hash function gets us a small number for a key which is a big integer or string, there is a possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called collision and must be handled using some collision handling technique. What are the chances of collisions with large table? Collisions are very likely even if we have big table to store keys. An important observation is Birthday Paradox. With only 23 persons, the probability that two people have the same birthday is 50%. 20/11/2024 Data structures and their applications Hashing – Resolving collisions How to handle Collisions?. Following are the collision resolution techniques used: 1. Open Hashing (Separate chaining) 2. Closed Hashing (Open Addressing) a. Liner Probing b. Quadratic probing c. Double hashing Note Load factor – It is a measure of how full the hash table is. It is calculated as: Load factor = Number of slots occupied / Total size of the hash table 20/11/2024 Data structures and their applications Hashing - Resolving collisions Open Hashing (Separate chaining). Collisions are resolved using a list of elements to store objects with the same key together. Suppose you wish to store a set of numbers = {0,1,2,4,5,7} into a hash table of size 5. Now, assume that we have a hash function H, such that H(x) = x%5 If we were to map the given data with the given hash function, we'll get the corresponding values H(0)-> 0%5 = 0 H(1)-> 1%5 = 1 H(2)-> 2%5 = 2 H(4)-> 4%5 = 4 H(5)-> 5%5 = 0 H(7)-> 7%5 = 2 20/11/2024 Data structures and their applications Hashing - Resolving collisions Clearly 0 and 5, as well as 2 and 7 will have the same hash value,. and in this case, we will simply append the colliding values to a list being pointed to by their hash keys. In practice, the table size can be significantly large and the hash function can be even more complex; also, the data being hashed would be more complex and non-primitive, but the idea remains the same. This is an easy way to implement hashing but has its own demerits. 1. The lookups/inserts/updates can become linear [O(N)] instead of constant time [O(1)] if the hash function results in too many collisions. 2. It doesn't account for any empty slots which can be leveraged for more efficient storage and lookups. 3. Ideally, we require a good hash function to guarantee even distribution of the values. 20/11/2024 Data structures and their applications Hashing - Resolving collisions Closed Hashing (Open Addressing). This collision resolution technique requires a hash table with fixed and known size. During insertion, if a collision is encountered, alternative cells are tried until an empty slot is found. These techniques require the size of the hash table to be supposedly larger than the number of objects to be stored. There are various methods to find these empty slots: 1. Liner Probing 2. Quadratic probing 3. Double hashing 20/11/2024 Data structures and their applications Hashing - Resolving collisions Linear Probing. The idea of linear probing is simple; we take a fixed sized hash table and every time we face a hash collision, we linearly traverse the table (in a cyclic manner) to find the next empty slot. In this case our hash function can be considered as: H(x, i) = (H(x) + i)%N where N is the size of the table and i represents the linearly increasing variable which starts from 1 (until empty slot is found). Despite being easy to compute, implement and deliver best cache performance, this suffers from the problem of clustering (many consecutive elements get grouped together, which eventually reduces the efficiency of finding elements or empty slots). 20/11/2024 Data structures and their applications Hashing - Resolving collisions Quadratic Probing. The general idea is similar to Linear Probing, the only difference is that we look at the Q(i) increment at each iteration when looking for an empty slot, where Q(i) is some quadratic expression of i. A simple expression of Q would be Q(i) = i2, in which case the hash function looks something like this: H(x, i) = (H(x) + i2)%N In general, H(x, i) = (H(x) + ((a*i2 + b * i + c)))%N, for some choice of constants a, b and c. Despite resolving the problem of clustering significantly, it may be the case that in some situations, this technique does not find any available slot, unlike linear probing which always finds an empty slot. 20/11/2024 Data structures and their applications Hashing - Resolving collisions Luckily, we can get good results from quadratic probing with the. right combination of probing function and hash table size which will guarantee that we will visit as many slots in the table as possible. In particular. If the hash table's size is a prime number and the probing function is H(x, i) = i^2, then at least 50% of the slots in the table will be visited. Thus, if the table is less than half full, we can be certain that a free slot will eventually be found. Alternatively, if the hash table size is a power of two and the probing function is H(x, i) = (i^2 + i)/2, then every slot in the table will be visited by the probing function. 20/11/2024 Data structures and their applications Hashing - Resolving collisions Double Hashing. This method is based upon the idea that in the event of a collision we use another hashing function with the key value as an input to find where in the open addressing scheme the data should actually be placed at. In this case we use two hashing functions, such that the final hashing function looks like: H(x, i) = (H1(x) + i*H2(x))%N Typically for H1(x) = x%N a good H2 is H2(x) = P - (x%P), where P is a prime number smaller than N. A good H2 is a function which never evaluates to zero and ensures that all the cells of a table are effectively traversed. 20/11/2024 Data structures and their applications Hashing - Resolving collisions Advantages of Double Hashing. It is one of the best forms of probing, producing a uniform 1. distribution of records throughout a hash table. 2. This technique does not yield any clusters. 20/11/2024 Data structures and their applications Hashing - Resolving collisions Rehashing. elements are inserted into a hashtable, the load factor As increases. If the load factor exceeds a certain threshold (often set to 0.75), the hashtable becomes inefficient as the number of collisions increases. To avoid this, the hashtable can be resized and the elements can be rehashed to new buckets, which decreases the load factor and reduces the number of collisions. This process is known as rehashing. Rehashing can be costly in terms of time and space, but it is necessary to maintain the efficiency of the hashtable. 20/11/2024 Data structures and their applications Hashing - Resolving collisions How is Rehashing done?. Rehashing can be done as follows: 1. For each addition of a new entry to the table, check the load factor. 2. If it is greater than its pre-defined value (or default value of 0.75 if not given), then Rehash. 3. For Rehash, make a new array of double the previous size and make it the new bucketarray. 4. Then traverse to each element in the old bucketArray and call the insert() for each so as to insert it into the new larger bucket array. 20/11/2024 Data structures and their applications URL decoding The idea here is to encode the URLs into tiny URLs.. Store a key value pair in a hash table where the key is the tiny URL and the value is the URL. Apply a relevant hashing function on the tiny URL to generate the index into the hash table. So, the item to be stored in the hash table would look like this: typedef struct url_Entry { char *tinyUrl; char *fullUrl; } URL_ENTRY; 20/11/2024 Data structures and their applications URL decoding Accept the full URL as an input from the user.. Convert it into a tiny url using your own function or any standard function available. Use the hashing function (your own) on the tiny url to generate the index into a hash table. Handle collisions, if any. 20/11/2024 THANK YOU M S Anand Department of Computer Science Engineering [email protected]

Use Quizgecko on...
Browser
Browser