Data Structures Using "C" PDF
Document Details
Uploaded by Deleted User
Biju Patnaik University of Technology
Dr. Subasish Mohapatra
Tags
Summary
This document provides lecture notes, syllabus, and related content for a computer science course on data structures using the C programming language. It covers topics including linear and non-linear data structures, algorithms, and their implementations.
Full Transcript
DATA STRUCTURES USING “C” DATA STRUCTURES USING “C” LECTURE NOTES Prepared by Dr. Subasish Mohapatra Department of Computer Science and Application College of Engineering and Technology, Bhubaneswar Biju Patnaik University...
DATA STRUCTURES USING “C” DATA STRUCTURES USING “C” LECTURE NOTES Prepared by Dr. Subasish Mohapatra Department of Computer Science and Application College of Engineering and Technology, Bhubaneswar Biju Patnaik University of Technology, Odisha SYLLABUS BE 2106 DATA STRUCTURE (3-0-0) Module – I Introduction to data structures: storage structure for arrays, sparse matrices, Stacks and Queues: representation and application. Linked lists: Single linked lists, linked list representation of stacks and Queues. Operations on polynomials, Double linked list, circular list. Module – II Dynamic storage management-garbage collection and compaction, infix to post fix conversion, postfix expression evaluation. Trees: Tree terminology, Binary tree, Binary search tree, General tree, B+ tree, AVL Tree, Complete Binary Tree representation, Tree traversals, operation on Binary tree-expression Manipulation. Module –III Graphs: Graph terminology, Representation of graphs, path matrix, BFS (breadth first search), DFS (depth first search), topological sorting, Warshall’s algorithm (shortest path algorithm.) Sorting and Searching techniques – Bubble sort, selection sort, Insertion sort, Quick sort, merge sort, Heap sort, Radix sort. Linear and binary search methods, Hashing techniques and hash functions. Text Books: 1. Gilberg and Forouzan: “Data Structure- A Pseudo code approach with C” by Thomson publication 2. “Data structure in C” by Tanenbaum, PHI publication / Pearson publication. 3. Pai: ”Data Structures & Algorithms; Concepts, Techniques & Algorithms ”Tata McGraw Hill. Reference Books: 1. “Fundamentals of data structure in C” Horowitz, Sahani & Freed, Computer Science Press. 2. “Fundamental of Data Structure” ( Schaums Series) Tata-McGraw-Hill. CONTENTS Lecture-01 Introduction to Data structure Lecture-02 Search Operation Lecture-03 Sparse Matrix and its representations Lecture-04 Stack Lecture-05 Stack Applications Lecture-06 Queue Lecture-07 Linked List Lecture-08 Polynomial List Lecture-09 Doubly Linked List Lecture-10 Circular Linked List Lecture-11 Memory Allocation Lecture-12 Infix to Postfix Conversion Lecture-13 Binary Tree Lecture-14 Special Forms of Binary Trees Lecture-15 Tree Traversal Lecture-16 AVL Trees Lecture-17 B+-tree Lecture-18 Binary Search Tree (BST) Lecture-19 Graphs Terminology Lecture-20 Depth First Search Lecture-21 Breadth First Search Lecture-22 Graph representation Lecture-23 Topological Sorting Lecture-24 Bubble Sort Lecture-25 Insertion Sort Lecture-26 Selection Sort Lecture-27 Merge Sort Lecture-28 Quick sort Lecture-29 Heap Sort Lecture-30 Radix Sort Lecture-31 Binary Search Lecture-32 Hashing Lecture-33 Hashing Functions Module-1 Lecture-01 Introduction to Data structures In computer terms, a data structure is a Specific way to store and organize data in a computer's memory so that these data can be used efficiently later. Data may be arranged in many different ways such as the logical or mathematical model for a particular organization of data is termed as a data structure. The variety of a particular data model depends on the two factors - Firstly, it must be loaded enough in structure to reflect the actual relationships of the data with the real world object. Secondly, the formation should be simple enough so that anyone can efficiently process the data each time it is necessary. Categories of Data Structure: The data structure can be sub divided into major types: Linear Data Structure Non-linear Data Structure Linear Data Structure: A data structure is said to be linear if its elements combine to form any specific order. There are basically two techniques of representing such linear structure within memory. First way is to provide the linear relationships among all the elements represented by means of linear memory location. These linear structures are termed as arrays. The second technique is to provide the linear relationship among all the elements represented by using the concept of pointers or links. These linear structures are termed as linked lists. The common examples of linear data structure are: Arrays Queues Stacks Linked lists Non linear Data Structure: This structure is mostly used for representing data that contains a hierarchical relationship among various elements. Examples of Non Linear Data Structures are listed below: Graphs family of trees and table of contents Tree: In this case, data often contain a hierarchical relationship among various elements. The data structure that reflects this relationship is termed as rooted tree graph or a tree. Graph: In this case, data sometimes hold a relationship between the pairs of elements which is not necessarily following the hierarchical structure. Such data structure is termed as a Graph. Array is a container which can hold a fix number of items and these items should be of the same type. Most of the data structures make use of arrays to implement their algorithms. Following are the important terms to understand the concept of Array. Element − Each item stored in an array is called an element. Index − Each location of an element in an array has a numerical index, which is used to identify the element. Array Representation:(Storage structure) Arrays can be declared in various ways in different languages. For illustration, let's take C array declaration. Arrays can be declared in various ways in different languages. For illustration, let's take C array declaration. As per the above illustration, following are the important points to be considered. Index starts with 0. Array length is 10 which means it can store 10 elements. Each element can be accessed via its index. For example, we can fetch an element at index 6 as 9. Basic Operations Following are the basic operations supported by an array. Traverse − print all the array elements one by one. Insertion − Adds an element at the given index. Deletion − Deletes an element at the given index. Search − Searches an element using the given index or by the value. Update − Updates an element at the given index. In C, when an array is initialized with size, then it assigns defaults values to its elements in following order. Data Type Default Value bool false char 0 int 0 float 0.0 double 0.0f void wchar_t 0 Insertion Operation Insert operation is to insert one or more data elements into an array. Based on the requirement, a new element can be added at the beginning, end, or any given index of array. Here, we see a practical implementation of insertion operation, where we add data at the end of the array − Algorithm Let LA be a Linear Array (unordered) with N elements and K is a positive integer such that K= K 5. Set LA[J+1] = LA[J] 6. Set J = J-1 7. Set LA[K] = ITEM 8. Stop Example Following is the implementation of the above algorithm − Live Demo #include main() { int LA[] = {1,3,5,7,8}; int item = 10, k = 3, n = 5; int i = 0, j = n; printf("The original array elements are :\n"); for(i = 0; i= k) { LA[j+1] = LA[j]; j = j - 1; } LA[k] = item; printf("The array elements after insertion :\n"); for(i = 0; idata != data){ if(current != NULL) { printf("%d ",current->data); //go to left tree if(current->data > data){ current = current->leftChild; } //else go to right tree else { current = current->rightChild; } //not found if(current == NULL){ return NULL; } } } return current; } Insert Operation Whenever an element is to be inserted, first locate its proper location. Start searching from the root node, then if the data is less than the key value, search for the empty location in the left subtree and insert the data. Otherwise, search for the empty location in the right subtree and insert the data. Algorithm void insert(int data) { struct node *tempNode = (struct node*) malloc(sizeof(struct node)); struct node *current; struct node *parent; tempNode->data = data; tempNode->leftChild = NULL; tempNode->rightChild = NULL; //if tree is empty if(root == NULL) { root = tempNode; } else { current = root; parent = NULL; while(1) { parent = current; //go to left of the tree if(data < parent->data) { current = current->leftChild; //insert to the left if(current == NULL) { parent->leftChild = tempNode; return; } } //go to right of the tree else { current = current->rightChild; //insert to the right if(current == NULL) { parent->rightChild = tempNode; return; } } } } } Module-3: Lecture-19 Graphs Terminology A graph consists of: A set, V, of vertices (nodes) A collection, E, of pairs of vertices from V called edges (arcs) Edges, also called arcs, are represented by (u, v) and are either: Directed if the pairs are ordered (u, v) u the origin v the destination Undirected if the pairs are unordered A graph is a pictorial representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by points termed as vertices, and the links that connect the vertices are called edges. Formally, a graph is a pair of sets (V, E), where V is the set of vertices and Eis the set of edges, connecting the pairs of vertices. Take a look at the following graph − In the above graph, V = {a, b, c, d, e} E = {ab, ac, bd, cd, de} Then a graph can be: Directed graph (di-graph) if all the edges are directed Undirected graph (graph) if all the edges are undirected Mixed graph if edges are both directed or undirected Illustrate terms on graphs End-vertices of an edge are the endpoints of the edge. Two vertices are adjacent if they are endpoints of the same edge. An edge is incident on a vertex if the vertex is an endpoint of the edge. Outgoing edges of a vertex are directed edges that the vertex is the origin. Incoming edges of a vertex are directed edges that the vertex is the destination. Degree of a vertex, v, denoted deg(v) is the number of incident edges. Out-degree, outdeg(v), is the number of outgoing edges. In-degree, indeg(v), is the number of incoming edges. Parallel edges or multiple edges are edges of the same type and end-vertices Self-loop is an edge with the end vertices the same vertex Simple graphs have no parallel edges or self-loops Properties If graph, G, has m edges then Σv∈G deg(v) = 2m If a di-graph, G, has m edges then Σv∈G indeg(v) = m = Σv∈G outdeg(v) If a simple graph, G, has m edges and n vertices: If G is also directed then m ≤ n(n-1) If G is also undirected then m ≤ n(n-1)/2 So a simple graph with n vertices has O(n2) edges at most More Terminology Path is a sequence of alternating vetches and edges such that each successive vertex is connected by the edge. Frequently only the vertices are listed especially if there are no parallel edges. Cycle is a path that starts and end at the same vertex. Simple path is a path with distinct vertices. Directed path is a path of only directed edges Directed cycle is a cycle of only directed edges. Sub-graph is a subset of vertices and edges. Spanning sub-graph contains all the vertices. Connected graph has all pairs of vertices connected by at least one path. Connected component is the maximal connected sub-graph of a unconnected graph. Forest is a graph without cycles. Tree is a connected forest (previous type of trees are called rooted trees, these are free trees) Spanning tree is a spanning subgraph that is also a tree. More Properties If G is an undirected graph with n vertices and m edges: If G is connected then m ≥ n - 1 If G is a tree then m = n - 1 If G is a forest then m ≤ n – 1 Graph Traversal: 1. Depth First Search 2. Breadth First Search Lecture-20 Depth First Search: Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a stack to remember to get the next vertex to start a search, when a dead end occurs in any iteration. As in the example given above, DFS algorithm traverses from S to A to D to G to E to B first, then to F and lastly to C. It employs the following rules. Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a stack. Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all the vertices from the stack, which do not have adjacent vertices.) Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty. Step Traversal Description 1 Initialize the stack. 2 Mark S as visited and put it onto the stack. Explore any unvisited adjacent node from S. We have three nodes and we can pick any of them. For this example, we shall take the node in an alphabetical order. 3 Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both Sand D are adjacent to A but we are concerned for unvisited nodes only. 4 Visit D and mark it as visited and put onto the stack. Here, we have B and C nodes, which are adjacent to D and both are unvisited. However, we shall again choose in an alphabetical order. 5 We choose B, mark it as visited and put onto the stack. Here Bdoes not have any unvisited adjacent node. So, we pop Bfrom the stack. 6 We check the stack top for return to the previous node and check if it has any unvisited nodes. Here, we find D to be on the top of the stack. 7 Only unvisited adjacent node is from D is C now. So we visit C, mark it as visited and put it onto the stack. As C does not have any unvisited adjacent node so we keep popping the stack until we find a node that has an unvisited adjacent node. In this case, there's none and we keep popping until the stack is empty. Lecture-21 Breadth First Search Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion and uses a queue to remember to get the next vertex to start a search, when a dead end occurs in any iteration. As in the example given above, BFS algorithm traverses from A to B to E to F first then to C and G lastly to D. It employs the following rules. Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it in a queue. Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue. Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty. Step Traversal Description 1 Initialize the queue. 2 We start from visiting S(starting node), and mark it as visited. 3 We then see an unvisited adjacent node from S. In this example, we have three nodes but alphabetically we choose A, mark it as visited and enqueue it. 4 Next, the unvisited adjacent node from S is B. We mark it as visited and enqueue it. 5 Next, the unvisited adjacent node from S is C. We mark it as visited and enqueue it. 6 Now, S is left with no unvisited adjacent nodes. So, we dequeue and find A. 7 From A we have D as unvisited adjacent node. We mark it as visited and enqueue it. At this stage, we are left with no unmarked (unvisited) nodes. But as per the algorithm we keep on dequeuing in order to get all unvisited nodes. When the queue gets emptied, the program is over. Lecture-22 Graph representation You can represent a graph in many ways. The two most common ways of representing a graph is as follows: Adjacency matrix An adjacency matrix is a VxV binary matrix A. Element Ai,j is 1 if there is an edge from vertex i to vertex j else Ai,jis 0. Note: A binary matrix is a matrix in which the cells can have only one of two possible values - either a 0 or 1. The adjacency matrix can also be modified for the weighted graph in which instead of storing 0 or 1 in Ai,j, the weight or cost of the edge will be stored. In an undirected graph, if Ai,j = 1, then Aj,i = 1. In a directed graph, if Ai,j = 1, then Aj,i may or may not be 1. Adjacency matrix provides constant time access (O(1) ) to determine if there is an edge between two nodes. Space complexity of the adjacency matrix is O(V2). The adjacency matrix of the following graph is: i/j : 1 2 3 4 1:0101 2:1010 3:0101 4:1010 The adjacency matrix of the following graph is: i/j: 1 2 3 4 1:0100 2:0001 3:1001 4:0100 Adjacency list The other way to represent a graph is by using an adjacency list. An adjacency list is an array A of separate lists. Each element of the array Ai is a list, which contains all the vertices that are adjacent to vertex i. For a weighted graph, the weight or cost of the edge is stored along with the vertex in the list using pairs. In an undirected graph, if vertex j is in list Ai then vertex i will be in list Aj. The space complexity of adjacency list is O(V + E) because in an adjacency list information is stored only for those edges that actually exist in the graph. In a lot of cases, where a matrix is sparse using an adjacency matrix may not be very useful. This is because using an adjacency matrix will take up a lot of space where most of the elements will be 0, anyway. In such cases, using an adjacency list is better. Note: A sparse matrix is a matrix in which most of the elements are zero, whereas a dense matrix is a matrix in which most of the elements are non-zero. Consider the same undirected graph from an adjacency matrix. The adjacency list of the graph is as follows: A1 → 2 → 4 A2 → 1 → 3 A3 → 2 → 4 A4 → 1 → 3 Consider the same directed graph from an adjacency matrix. The adjacency list of the graph is as follows: A1 → 2 A2 → 4 A3 → 1 → 4 A4 → 2 Lecture-23 Topological Sorting: Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed edge uv, vertex u comes before v in the ordering. Topological Sorting for a graph is not possible if the graph is not a DAG. For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be more than one topological sorting for a graph. For example, another topological sorting of the following graph is “4 5 2 3 1 0”. The first vertex in topological sorting is always a vertex with in-degree as 0 (a vertex with no in-coming edges). Algorithm to find Topological Sorting: In DFS, we start from a vertex, we first print it and then recursively call DFS for its adjacent vertices. In topological sorting, we use a temporary stack. We don’t print the vertex immediately, we first recursively call topological sorting for all its adjacent vertices, then push it to a stack. Finally, print contents of stack. Note that a vertex is pushed to stack only when all of its adjacent vertices (and their adjacent vertices and so on) are already in stack. Topological Sorting vs Depth First Traversal (DFS): In DFS, we print a vertex and then recursively call DFS for its adjacent vertices. In topological sorting, we need to print a vertex before its adjacent vertices. For example, in the given graph, the vertex ‘5’ should be printed before vertex ‘0’, but unlike DFS, the vertex ‘4’ should also be printed before vertex ‘0’. So Topological sorting is different from DFS. For example, a DFS of the shown graph is “5 2 3 1 0 4”, but it is not a topological sorting Dynamic Programming The Floyd Warshall Algorithm is for solving the All Pairs Shortest Path problem. The problem is to find shortest distances between every pair of vertices in a given edge weighted directed Graph. Example: Input: graph[][] = { {0, 5, INF, 10}, {INF, 0, 3, INF}, {INF, INF, 0, 1}, {INF, INF, INF, 0} } which represents the following graph 10 (0)------->(3) | /|\ 5| | | |1 \|/ | (1)------->(2) 3 Note that the value of graph[i][j] is 0 if i is equal to j And graph[i][j] is INF (infinite) if there is no edge from vertex i to j. Output: Shortest distance matrix 0 5 8 9 INF 0 3 4 INF INF 0 1 INF INF INF 0 Floyd Warshall Algorithm We initialize the solution matrix same as the input graph matrix as a first step. Then we update the solution matrix by considering all vertices as an intermediate vertex. The idea is to one by one pick all vertices and update all shortest paths which include the picked vertex as an intermediate vertex in the shortest path. When we pick vertex number k as an intermediate vertex, we already have considered vertices {0, 1, 2,.. k-1} as intermediate vertices. For every pair (i, j) of source and destination vertices respectively, there are two possible cases. 1) k is not an intermediate vertex in shortest path from i to j. We keep the value of dist[i][j] as it is. 2) k is an intermediate vertex in shortest path from i to j. We update the value of dist[i][j] as dist[i][k] + dist[k][j]. The following figure shows the above optimal substructure property in the all-pairs shortest path problem. Lecture-24 Bubble Sort We take an unsorted array for our example. Bubble sort takes Ο(n 2) time so we're keeping it short and precise. Bubble sort starts with very first two elements, comparing them to check which one is greater. In this case, value 33 is greater than 14, so it is already in sorted locations. Next, we compare 33 with 27. We find that 27 is smaller than 33 and these two values must be swapped. The new array should look like this − Next we compare 33 and 35. We find that both are in already sorted positions. Then we move to the next two values, 35 and 10. We know then that 10 is smaller 35. Hence they are not sorted. We swap these values. We find that we have reached the end of the array. After one iteration, the array should look like this − To be precise, we are now showing how an array should look like after each iteration. After the second iteration, it should look like this − Notice that after each iteration, at least one value moves at the end. And when there's no swap required, bubble sorts learns that an array is completely sorted. Now we should look into some practical aspects of bubble sort. Algorithm We assume list is an array of n elements. We further assume that swapfunction swaps the values of the given array elements. begin BubbleSort(list) for all elements of list if list[i] > list[i+1] swap(list[i], list[i+1]) end if end for return list end BubbleSort Pseudocode We observe in algorithm that Bubble Sort compares each pair of array element unless the whole array is completely sorted in an ascending order. This may cause a few complexity issues like what if the array needs no more swapping as all the elements are already ascending. To ease-out the issue, we use one flag variable swapped which will help us see if any swap has happened or not. If no swap has occurred, i.e. the array requires no more processing to be sorted, it will come out of the loop. Pseudocode of BubbleSort algorithm can be written as follows − procedure bubbleSort( list : array of items ) loop = list.count; for i = 0 to loop-1 do: swapped = false for j = 0 to loop-1 do: if list[j] > list[j+1] then swap( list[j], list[j+1] ) swapped = true end if end for if(not swapped) then break end if end for end procedure return list Lecture-25 Insertion Sort We take an unsorted array for our example. Insertion sort compares the first two elements. It finds that both 14 and 33 are already in ascending order. For now, 14 is in sorted sub-list. Insertion sort moves ahead and compares 33 with 27. And finds that 33 is not in the correct position. It swaps 33 with 27. It also checks with all the elements of sorted sub-list. Here we see that the sorted sub-list has only one element 14, and 27 is greater than 14. Hence, the sorted sub-list remains sorted after swapping. By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10. These values are not in a sorted order. So we swap them. However, swapping makes 27 and 10 unsorted. Hence, we swap them too. Again we find 14 and 10 in an unsorted order. We swap them again. By the end of third iteration, we have a sorted sub-list of 4 items. This process goes on until all the unsorted values are covered in a sorted sub-list. Now we shall see some programming aspects of insertion sort. Algorithm Now we have a bigger picture of how this sorting technique works, so we can derive simple steps by which we can achieve insertion sort. Step 1 − If it is the first element, it is already sorted. return 1; Step 2 − Pick next element Step 3 − Compare with all elements in the sorted sub-list Step 4 − Shift all the elements in the sorted sub-list that is greater than the value to be sorted Step 5 − Insert the value Step 6 − Repeat until list is sorted Pseudocode procedure insertionSort( A : array of items ) int holePosition int valueToInsert for i = 1 to length(A) inclusive do: valueToInsert = A[i] holePosition = i while holePosition > 0 and A[holePosition-1] > valueToInsert do: A[holePosition] = A[holePosition-1] holePosition = holePosition -1 end while A[holePosition] = valueToInsert end for end procedure Lecture-26 Selection Sort Consider the following depicted array as an example. For the first position in the sorted list, the whole list is scanned sequentially. The first position where 14 is stored presently, we search the whole list and find that 10 is the lowest value. So we replace 14 with 10. After one iteration 10, which happens to be the minimum value in the list, appears in the first position of the sorted list. For the second position, where 33 is residing, we start scanning the rest of the list in a linear manner. We find that 14 is the second lowest value in the list and it should appear at the second place. We swap these values. After two iterations, two least values are positioned at the beginning in a sorted manner. The same process is applied to the rest of the items in the array. Following is a pictorial depiction of the entire sorting process − Now, let us learn some programming aspects of selection sort. Algorithm Step 1 − Set MIN to location 0 Step 2 − Search the minimum element in the list Step 3 − Swap with value at location MIN Step 4 − Increment MIN to point to next element Step 5 − Repeat until list is sorted Pseudocode procedure selection sort list : array of items n : size of list for i = 1 to n - 1 min = i for j = i+1 to n if list[j] < list[min] then min = j; end if end for if indexMin != i then swap list[min] and list[i] end if end for end procedure Lecture-27 Merge Sort To understand merge sort, we take an unsorted array as the following − We know that merge sort first divides the whole array iteratively into equal halves unless the atomic values are achieved. We see here that an array of 8 items is divided into two arrays of size 4. This does not change the sequence of appearance of items in the original. Now we divide these two arrays into halves. We further divide these arrays and we achieve atomic value which can no more be divided. Now, we combine them in exactly the same manner as they were broken down. Please note the color codes given to these lists. We first compare the element for each list and then combine them into another list in a sorted manner. We see that 14 and 33 are in sorted positions. We compare 27 and 10 and in the target list of 2 values we put 10 first, followed by 27. We change the order of 19 and 35 whereas 42 and 44 are placed sequentially. In the next iteration of the combining phase, we compare lists of two data values, and merge them into a list of found data values placing all in a sorted order. After the final merging, the list should look like this − Now we should learn some programming aspects of merge sorting. Algorithm Merge sort keeps on dividing the list into equal halves until it can no more be divided. By definition, if it is only one element in the list, it is sorted. Then, merge sort combines the smaller sorted lists keeping the new list sorted too. Step 1 − if it is only one element in the list it is already sorted, return. Step 2 − divide the list recursively into two halves until it can no more be divided. Step 3 − merge the smaller lists into new list in sorted order. Merge sort works with recursion and we shall see our implementation in the same way. procedure mergesort( var a as array ) if ( n == 1 ) return a var l1 as array = a... a[n/2] var l2 as array = a[n/2+1]... a[n] l1 = mergesort( l1 ) l2 = mergesort( l2 ) return merge( l1, l2 ) end procedure procedure merge( var a as array, var b as array ) var c as array while ( a and b have elements ) if ( a > b ) add b to the end of c remove b from b else add a to the end of c remove a from a end if end while while ( a has elements ) add a to the end of c remove a from a end while while ( b has elements ) add b to the end of c remove b from b end while return c end procedure Lecture-28 Quick sort Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data into smaller arrays. A large array is partitioned into two arrays one of which holds values smaller than the specified value, say pivot, based on which the partition is made and another array holds values greater than the pivot value. Quick sort partitions an array and then calls itself recursively twice to sort the two resulting subarrays. This algorithm is quite efficient for large-sized data sets as its average and worst case complexity are of Ο(n2), where n is the number of items. Partition in Quick Sort Following animated representation explains how to find the pivot value in an array. The pivot value divides the list into two parts. And recursively, we find the pivot for each sub-lists until all lists contains only one element. Quick Sort Pivot Algorithm Based on our understanding of partitioning in quick sort, we will now try to write an algorithm for it, which is as follows. Step 1 − Choose the highest index value has pivot Step 2 − Take two variables to point left and right of the list excluding pivot Step 3 − left points to the low index Step 4 − right points to the high Step 5 − while value at left is less than pivot move right Step 6 − while value at right is greater than pivot move left Step 7 − if both step 5 and step 6 does not match swap left and right Step 8 − if left ≥ right, the point where they met is new pivot Quick Sort Pivot Pseudocode The pseudocode for the above algorithm can be derived as − function partitionFunc(left, right, pivot) leftPointer = left rightPointer = right - 1 while True do while A[++leftPointer] < pivot do //do-nothing end while while rightPointer > 0 && A[--rightPointer] > pivot do //do-nothing end while if leftPointer >= rightPointer break else swap leftPointer,rightPointer end if end while swap leftPointer,right return leftPointer end function Quick Sort Algorithm Using pivot algorithm recursively, we end up with smaller possible partitions. Each partition is then processed for quick sort. We define recursive algorithm for quicksort as follows − Step 1 − Make the right-most index value pivot Step 2 − partition the array using pivot value Step 3 − quicksort left partition recursively Step 4 − quicksort right partition recursively Quick Sort Pseudocode To get more into it, let see the pseudocode for quick sort algorithm − procedure quickSort(left, right) if right-left arr[mid]) 12 return binarySearch(arr, mid + 1, last, target); 13 // target < arr[mid] 14 return binarySearch(arr, first, mid - 1, target); 15 } Lecture-32 Hashing Introduction The problem at hands is to speed up searching. Consider the problem of searching an array for a given value. If the array is not sorted, the search might require examining each and all elements of the array. If the array is sorted, we can use the binary search, and therefore reduce the worse-case runtime complexity to O(log n). We could search even faster if we know in advance the index at which that value is located in the array. Suppose we do have that magic function that would tell us the index for a given value. With this magic function our search is reduced to just one probe, giving us a constant runtime O(1). Such a function is called a hash function. A hash function is a function which when given a key, generates an address in the table. The example of a hash function is a book call number. Each book in the library has a unique call number. A call number is like an address: it tells us where the book is located in the library. Many academic libraries in the United States, uses Library of Congress Classification for call numbers. This system uses a combination of letters and numbers to arrange materials by subjects. A hash function that returns a unique hash number is called a universal hash function. In practice it is extremely hard to assign unique numbers to objects. The later is always possible only if you know (or approximate) the number of objects to be proccessed. Thus, we say that our hash function has the following properties it always returns a number for an object. two equal objects will always have the same number two unequal objects not always have different numbers The precedure of storing objets using a hash function is the following. Create an array of size M. Choose a hash function h, that is a mapping from objects into integers 0, 1,..., M-1. Put these objects into an array at indexes computed via the hash function index = h(object). Such array is called a hash table. Collisions When we put objects into a hashtable, it is possible that different objects (by the equals() method) might have the same hashcode. This is called a collision. Here is the example of collision. Two different strings ""Aa" and "BB" have the same key:. "Aa" = 'A' * 31 + 'a' = 2112 "BB" = 'B' * 31 + 'B' = 2112 How to resolve collisions? Where do we put the second and subsequent values that hash to this same location? There are several approaches in dealing with collisions. One of them is based on idea of putting the keys that collide in a linked list! A hash table then is an array of lists!! This technique is called a separate chaining collision resolution. The big attraction of using a hash table is a constant-time performance for the basic operations add, remove, contains, size. Though, because of collisions, we cannot guarantee the constant runtime in the worst-case. Why? Imagine that all our objects collide into the same index. Then searching for one of them will be equivalent to searching in a list, that takes a liner runtime. However, we can guarantee an expected constant runtime, if we make sure that our lists won't become too long. This is usually implemnted by maintaining a load factor that keeps a track of the average length of lists. If a load factor approaches a set in advanced threshold, we create a bigger array and rehash all elements from the old table into the new one. Another technique of collision resolution is a linear probing. If we cannoit insert at index k, we try the next slot k+1. If that one is occupied, we go to k+2, and so on. Lecture-33 Hashing Functions Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions. If the probability that a key, k, occurs in our collection is P(k), then if there are m slots in our hash table, a uniform hashing function, h(k), would ensure: Sometimes, this is easy to ensure. For example, if the keys are randomly distributed in (0,r], then, h(k) = floor((mk)/r) will provide uniform hashing. Mapping keys to natural numbers Most hashing functions will first map the keys to some set of natural numbers, say (0,r]. There are many ways to do this, for example if the key is a string of ASCII characters, we can simply add the ASCII representations of the characters mod 255 to produce a number in (0,255) - or we could xor them, or we could add them in pairs mod 2 16-1, or... Having mapped the keys to a set of natural numbers, we then have a number of possibilities. 1. Use a mod function: h(k) = k mod m. When using this method, we usually avoid certain values of m. Powers of 2 are usually avoided, for k mod 2b simply selects the b low order bits of k. Unless we know that all the 2b possible values of the lower order bits are equally likely, this will not be a good choice, because some bits of the key are not used in the hash function. Prime numbers which are close to powers of 2 seem to be generally good choices for m. For example, if we have 4000 elements, and we have chosen an overflow table organization, but wish to have the probability of collisions quite low, then we might choose m = 4093. (4093 is the largest prime less than 4096 = 2 12.) 2. Use the multiplication method: o Multiply the key by a constant A, 0 < A < 1, o Extract the fractional part of the product, o Multiply this value by m. Thus the hash function is: h(k) = floor(m * (kA - floor(kA))) In this case, the value of m is not critical and we typically choose a power of 2 so that we can get the following efficient procedure on most digital computers: o Choose m = 2p. o Multiply the w bits of k by floor(A * 2w) to obtain a 2w bit product. o Extract the p most significant bits of the lower half of this product. It seems that: A = (sqrt(5)-1)/2 = 0.6180339887 is a good choice (see Knuth, "Sorting and Searching", v. 3 of "The Art of Computer Programming"). 3. Use universal hashing: A malicious adversary can always chose the keys so that they all hash to the same slot, leading to an average O(n) retrieval time. Universal hashing seeks to avoid this by choosing the hashing function randomly from a collection of hash functions (cf Cormen et al, p 229- ). This makes the probability that the hash function will generate poor behaviour small and produces good average performance.