DS-chapter4.pdf
Document Details
Uploaded by Deleted User
Full Transcript
Chapter 4 Trees Syllabus Trees 21 Introduction, Tree Terminologies, 22 Binary Tree, Binary Tree Representation, 23 Types of Binary Tree, 24 Binary Tree Traversals, 25 Binary Search Tree, 26 Operations on Binary Search Tree, , 27 Applications of Binary T...
Chapter 4 Trees Syllabus Trees 21 Introduction, Tree Terminologies, 22 Binary Tree, Binary Tree Representation, 23 Types of Binary Tree, 24 Binary Tree Traversals, 25 Binary Search Tree, 26 Operations on Binary Search Tree, , 27 Applications of Binary Tree-Expression Tree 28 Huffman Encoding, 29 Search Trees-AVL, rotations in AVL Tree, 30 operations on AVL Tree, 31 Introduction of B Tree, B+ Tree. Tree: Introduction Tree is a hierarchical data structure which stores the information naturally in the form of hierarchy style. It is a non-linear data structure compared to arrays, linked lists, stack and queue. It represents the nodes connected by edges. Introduction A tree data structure is defined as a collection of objects or entities known as nodes that are linked together to represent or simulate hierarchy. A tree data structure is a non-linear data structure because it does not store in a sequential manner. It is a hierarchical structure as elements in a Tree are arranged in multiple levels. In the Tree data structure, the topmost node is known as a root node. Each node contains some data, and data can be of any type. In the above tree structure, the node contains the name of the employee, so the type of data would be a string. Each node contains some data and the link or reference of other nodes that can be called children. Properties of Tree data structure Recursive data structure: The tree is also known as a recursive data structure. A tree can be defined as recursively because the distinguished node in a tree data structure is known as a root node. The root node of the tree contains a link to all the roots of its subtrees. The left subtree is shown in the yellow color in the below figure, and the right subtree is shown in the red color. The left subtree can be further split into subtrees shown in three different colors. Recursion means reducing something in a self-similar manner. So, this recursive property of the tree data structure is implemented in various applications. Number of edges: If there are n nodes, then there would n-1 edges. Each arrow in the structure represents the link or path. Each node, except the root node, will have atleast one incoming link known as an edge. There would be one link for the parent-child relationship. Depth of node x: The depth of node x can be defined as the length of the path from the root to the node x. One edge contributes one-unit length in the path. So, the depth of node x can also be defined as the number of edges between the root node and the node x. The root node has 0 depth. Height of node x: The height of node x can be defined as the longest path from the node x to the leaf node. Terminology Path − Path refers to the sequence of nodes along the edges of a tree. Root − The node at the top of the tree is called root. There is only one root per tree and one path from the root node to any node. Parent − Any node except the root node has one edge upward to a node called parent. Child − The node below a given node connected by its edge downward is called its child node. Leaf − The node which does not have any child node is called the leaf node. Subtree − Subtree represents the descendants of a node. Visiting − Visiting refers to checking the value of a node when control is on the node. Traversing − Traversing means passing through nodes in a specific order. Levels − Level of a node represents the generation of a node. If the root node is at level 0, then its next child node is at level 1, its grandchild is at level 2, and so on. keys − Key represents a value of a node based on which a search operation is to be carried out for a node. Types of Trees 1. General Tree 2. Binary Tree 3. Binary Search Tree 4. AVL Tree 5. Red Black Tree 6. Splay Tree 7. Treap 8. B-Tree Binary Tree Binary tree is a special tree data structure in which each node can have at most 2 children. Thus, in a binary tree, each node has either 0 child or 1 child or 2 children. struct node { int data; struct node *left; struct node *right; }; Properties of Binary Tree At each level of i, the maximum number of nodes is 2^i. The height of the tree is defined as the longest path from the root node to the leaf node. Consider a tree which has a height equal to 3. Therefore, the maximum number of nodes at height 3 is equal to (1+2+4+8) = 15. In general, the maximum number of nodes possible at height h is (2^0 + 2^1 + 2^2+….) = 2^(h+1) -1. The minimum number of nodes possible at height h is equal to h+1. If there are 'n' number of nodes in the binary tree. The minimum height can be computed as: Types of Binary Tree Full/ proper/ strict Binary tree Complete Binary tree Perfect Binary tree Degenerate Binary tree Balanced Binary tree Full/ proper/ strict Binary tree The tree can only be considered as the full binary tree if each node must contain either 0 or 2 children. The number of leaf nodes is equal to the number of internal nodes plus 1. The number of internal nodes is 2; therefore, the number of leaf nodes is equal to 3. The maximum number of nodes is the same as the number of nodes in the binary tree, i.e., 2^(h+1) -1. The minimum number of nodes in the full binary tree is 2*(h+1)-1 Complete Binary Tree The complete binary tree is a tree in which all the nodes are completely filled except the last level. In the last level, all the nodes must be as left as possible. In a complete binary tree, the nodes should be added from the left. Properties of Complete Binary Tree The maximum number of nodes in complete binary tree is 2^(h+1) - 1. The minimum number of nodes in complete binary tree is 2^h. The minimum height of a complete binary tree is log2(n+1) - 1. The maximum height of a complete binary tree is log n Perfect Binary Tree A tree is a perfect binary tree if all the internal nodes have 2 children, and all the leaf nodes are at the same level. Perfect Binary Tree Not Perfect Binary Tree Degenerate Binary Tree The degenerate binary tree is a tree in which all the internal nodes have only one child. Right Skewed Binary Tree left Skewed Binary Tree Balanced Binary Tree The balanced binary tree is a tree in which both the left and right trees differ by atmost 1. For example, AVL and Red-Black trees are balanced binary tree. The above tree is a balanced binary The above tree is not a balanced tree because the difference between binary tree because the difference the left subtree and right subtree is between the left subtree and the right one. subtree is greater than 1. Binary Tree Traversal Tree traversal means traversing or visiting each node of a tree. The following are the three different ways of traversal: Inorder traversal Preorder traversal Postorder traversal Inorder Tree Traversal An inorder traversal is a traversal technique that follows the policy, i.e., Left Root Right. Here, Left Root Right means that the left subtree of the root node is traversed first, then the root node, and then the right subtree of the root node is traversed. Here, inorder name itself suggests that the root node comes in between the left and the right subtrees. Inorder Tree Traversal If a binary tree is traversed in-order, the output will produce sorted key values in an ascending order. We start from A, and following in-order traversal, we move to its left subtree B. B is also traversed in-order. The process goes on until all the nodes are visited. The output of inorder traversal of this tree will be − D→ B →E →A→ F→ C →G Until all nodes are traversed − Step 1 − Recursively traverse left subtree. Step 2 − Visit root node. Step 3 − Recursively traverse right subtree. Pre-order Traversal A preorder traversal is a traversal technique that follows the policy, i.e., Root Left Right. Pre-order Traversal In this traversal method, the root node is visited first, then the left subtree and finally the right subtree. We start from A, and following pre-order traversal, we first visit A itself and then move to its left subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The output of pre-order traversal of this tree will be − A→B→D →E→C→F →G Until all nodes are traversed − Step 1 − Visit root node. Step 2 − Recursively traverse left subtree. Step 3 − Recursively traverse right subtree. Post-order Traversal A Postorder traversal is a traversal technique that follows the policy, i.e., Left Right Root. Here, Left Right Root means the left subtree of the root node is traversed first, then the right subtree, and finally, the root node is traversed. Post-order Traversal In this traversal method, the root node is visited last, hence the name. First we traverse the left subtree, then the right subtree and finally the root node. We start from A, and following Post-order traversal, we first visit the left subtree B. B is also traversed post-order. The process goes on until all the nodes are visited. The output of post-order traversal of this tree will be − D→E →B→F →G→C→A Until all nodes are traversed − Step 1 − Recursively traverse left subtree. Step 2 − Recursively traverse right subtree. Step 3 − Visit root node. Tree Traversal - Example Tree Traversal - Example Binary Search Tree Binary Search tree can be defined as a class of binary trees, in which the nodes are arranged in a specific order. This is also called ordered binary tree. In a binary search tree, the value of all the nodes in the left sub-tree is less than the value of the root. Similarly, value of all the nodes in the right sub-tree is greater than or equal to the value of the root. This rule will be recursively applied to all the left and right sub-trees of the root. Binary Search Tree A Binary search tree is shown in the above figure. As the constraint applied on the BST, we can see that the root node 30 doesn't contain any value greater than or equal to 30 in its left sub-tree and it also doesn't contain any value less than 30 in its right sub-tree. Advantages of using binary search tree Searching become very efficient in a binary search tree since, we get a hint at each step, about which sub-tree contains the desired element. The binary search tree is considered as efficient data structure in compare to arrays and linked lists. In searching process, it removes half sub-tree at every step. It also speed up the insertion and deletion operations as compare to that in array and linked list. Create the binary search tree BST for 43, 10, 79, 90, 12, 54, 11, 9, 50 Insert 43 into the tree as the root of the tree. Read the next element, if it is lesser than the root node element, insert it as the root of the left sub-tree. Otherwise, insert it as the root of the right of the right sub-tree. BST for 43, 10, 79, 90, 12, 54, 11, 9, 50 BST for 10,12,5,4,20,8,7,15 and 13 Searching in a BST Searching means finding or locating some specific element or node within a data structure. However, searching for some specific node in binary search tree is easy due to the fact that, element in BST are stored in a particular order. Compare the element with the root of the tree. If the item is matched then return the location of the node. Otherwise check if item is less than the element present on root, if so then move to the left sub-tree. If not, then move to the right sub-tree. Repeat this procedure recursively until match found. If element is not found then return NULL. Searching in BST Example Searching in BST Algorithm Algorithm: Search (ROOT, ITEM) Step 1: IF ROOT -> DATA = ITEM OR ROOT = NULL Return ROOT ELSE IF ROOT < ROOT -> DATA Return search(ROOT -> LEFT, ITEM) ELSE Return search(ROOT -> RIGHT,ITEM) [END OF IF] [END OF IF] Step 2: END Insertion in BST Insert function is used to add a new element in a binary search tree at appropriate location. Insert function is to be designed in such a way that, it must node violate the property of binary search tree at each value. Allocate the memory for tree. Set the data part to the value and set the left and right pointer of tree, point to NULL. If the item to be inserted, will be the first element of the tree, then the left and right of this node will point to NULL. Else, check if the item is less than the root element of the tree, if this is true, then recursively perform this operation with the left of the root. If this is false, then perform this operation recursively with the right sub- tree of the root. Insertion Example Insertion Algorithm Insert (TREE, ITEM) Step 1: IF TREE = NULL Allocate memory for TREE SET TREE -> DATA = ITEM SET TREE -> LEFT = TREE -> RIGHT = NULL ELSE IF ITEM < TREE -> DATA Insert(TREE -> LEFT, ITEM) ELSE Insert(TREE -> RIGHT, ITEM) [END OF IF] [END OF IF] Step 2: END Deletion in BST Delete function is used to delete the specified node from a binary search tree. However, we must delete a node from a binary search tree in such a way, that the property of binary search tree doesn't violate. There are three situations of deleting a node from binary search tree. The node to be deleted is a leaf node It is the simplest case, in this case, replace the leaf node with the NULL and simple free the allocated space. In the following image, we are deleting the node 85, since the node is a leaf node, therefore the node will be replaced with NULL and allocated space will be freed. Deletion in BST Deletion in BST The node to be deleted has only one child: In this case, replace the node with its child and delete the child node, which now contains the value which is to be deleted. Simply replace it with the NULL and free the allocated space. In the following image, the node 12 is to be deleted. It has only one child. The node will be replaced with its child node and the replaced node 12 (which is now leaf node) will simply be deleted. Deletion in BST The node to be deleted has two children: It is a bit complexed case compare to other two cases. However, the node which is to be deleted, is replaced with its in-order successor or predecessor recursively until the node value (to be deleted) is placed on the leaf of the tree. After the procedure, replace the node with NULL and free the allocated space. In the following image, the node 50 is to be deleted which is the root node of the tree. The in-order traversal of the tree given below. 6, 25, 30, 50, 52, 60, 70, 75. (successor of 50 is 52) replace 50 with its in-order successor 52. Now, 50 will be moved to the leaf of the tree, which will simply be deleted. Inorder successor of a node is the leftmost element of its right subtree. Inorder predecessor of a node is the rightmost element of its left subtree. Deletion in BST Deletion in BST Expression Trees Infix to postfix conversion Construct an Expression Tree To construct an Expression Tree for the given expression, we generally use Stack Data Structure. Initially we Iterate over the given postfix expression and follow the steps as given below - If we get an operand in the given expression, then push it in the stack. It will become the node of the expression Tree. If an operator gets two values in the expression, then add in the expression tree as its child, and push them in the current node. Repeat Step-1 and Step-2 until we do not complete over the given expression. Now check if every root node contains nothing but operands and every child node contains only values. Expression Trees 4 + ((7 + 9) * 2) 3 + ((5+9)*2) Postfix = 479+2*+ Postfix = 359+2*+ a + (b * c) + d * (e + f) = abc*+def+*+ Huffman Encoding Huffman tree or Huffman coding tree defines as a full binary tree in which each leaf of the tree corresponds to a letter in the given alphabet. The Huffman tree is treated as the binary tree associated with minimum external path weight that means, the one associated with the minimum sum of weighted path lengths for the given set of leaves. So the goal is to construct a tree with the minimum external path weight. It is used for the lossless compression of data. It uses variable length encoding. It assigns variable length code to all the characters. The code length of a character depends on how frequently it occurs in the given text. The character which occurs most frequently gets the smallest code. The character which occurs least frequently gets the largest code. Huffman Coding implements a rule known as a prefix rule. This is to prevent the ambiguities while decoding. It ensures that the code assigned to any character is not a prefix of the code assigned to any other character. Major steps in Huffman Encoding Building a Huffman Tree from the input characters. Assigning code to the characters by traversing the Huffman Tree. Huffman Tree Step-01: Create a leaf node for each character of the text. Leaf node of a character contains the occurring frequency of that character. Step-02: Arrange all the nodes in increasing order of their frequency value. Step-03: Considering the first two nodes having minimum frequency, Create a new internal node. The frequency of this new node is the sum of frequency of those two nodes. Make the first node as a left child and the other node as a right child of the newly created node. Step-04: Keep repeating Step-02 and Step-03 until all the nodes form a single tree. The tree finally obtained is the desired Huffman Tree. A file contains the following characters with the frequencies as shown. If Huffman Coding is used for data compression, determine- Huffman Code for each character Average code length Length of Huffman encoded message (in bits) We assign weight to all the edges of the constructed Huffman Tree. Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges. To write Huffman Code for any character, traverse the Huffman Tree from root node to the leaf node of that character. Following this rule, the Huffman Code for each character is- a = 111 e = 10 i = 00 o = 11001 u = 1101 s = 01 t = 11000 From here, we can observe- Characters occurring less frequently in the text are assigned the larger code. Characters occurring more frequently in the text are assigned the smaller code. Average Code Length- Using formula we have- Average code length = ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi ) = { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1) = 2.52 Length of Huffman Encoded Message- Using formula, we have- Total number of bits in Huffman encoded message = Total number of characters in the message x Average code length per character = 58 x 2.52 = 146.16 ≅ 147 bits Example: Huffman tree Huffman invented a greedy algorithm that creates an optimal prefix code called a Huffman Code. The algorithm builds the tree T analogous to the optimal code in a bottom-up manner. It starts with a set of |C| leaves (C is the number of characters) and performs |C| - 1 'merging' operations to create the final tree. In the Huffman algorithm 'n' denotes the quantity of a set of characters, z indicates the parent node, and x & y are the left & right child of z respectively. Huffman (C) 1. n=|C| 2. Q ← C 3. for i=1 to n-1 4. do 5. z= allocate-Node () 6. x= left[z]=Extract-Min(Q) 7. y= right[z] =Extract-Min(Q) 8. f [z]=f[x]+f[y] 9. Insert (Q, z) 10. return Extract-Min (Q) AVL Trees AVL Tree can be defined as height balanced binary search tree in which each node is associated with a balance factor which is calculated by subtracting the height of its right sub-tree from that of its left sub-tree. Tree is said to be balanced if balance factor of each node is in between -1 to 1, otherwise, the tree will be unbalanced and need to be balanced. Balance Factor (k) = height (left(k)) - height (right(k)) If balance factor of any node is 1, it means that the left sub-tree is one level higher than the right sub-tree. If balance factor of any node is 0, it means that the left sub-tree and right sub-tree contain equal height. If balance factor of any node is -1, it means that the left sub-tree is one level lower than the right sub-tree. IMPORTANCE: AVL tree controls the height of the binary search tree by not letting it to be skewed. balance factor associated with each node is in between -1 and +1. therefore, it is an example of AVL tree. LR Rotations Action A node has been inserted into the right subtree of the left subtree. This makes C an unbalanced node. These scenarios cause AVL tree to perform left-right rotation. We first perform the left rotation on the left subtree of C. This makes A, the left subtree of B. Node C is still unbalanced, however now, it is because of the left- subtree of the left-subtree. We shall now right-rotate the tree, making B the new root node of this subtree. C now becomes the right subtree of its own left subtree. The tree is now balanced. RL Rotations Action A node has been inserted into the left subtree of the right subtree. This makes A, an unbalanced node with balance factor 2. First, we perform the right rotation along C node, making C the right subtree of its own left subtree B. Now, B becomes the right subtree of A. Node A is still unbalanced because of the right subtree of its right subtree and requires a left rotation. A left rotation is performed by making B the new root node of the subtree. A becomes the left subtree of its right subtree B. The tree is now balanced. Operations on AVL Tree SN Operation Description 1 Insertion Insertion in AVL tree is performed in the same way as it is performed in a binary search tree. However, it may lead to violation in the AVL tree property and therefore the tree may need balancing. The tree can be balanced by applying rotations. 2 Deletion Deletion can also be performed in the same way as it is performed in a binary search tree. Deletion may also disturb the balance of the tree therefore, various types of rotations are used to rebalance the tree. Insertion in AVL Tree 63, 9, 19, 27, 18, 108, 99, 81 Deletion of a node in AVL tree Deleting a node from an AVL tree is similar to that in a binary search tree. Deletion may disturb the balance factor of an AVL tree and therefore the tree needs to be rebalanced in order to maintain the AVLness. For this purpose, we need to perform rotations. The two types of rotations are L rotation and R rotation. Here, we will discuss R rotations. L rotations are the mirror images of them. If the node which is to be deleted is present in the left sub-tree of the critical node, then L rotation needs to be applied else if, the node which is to be deleted is present in the right sub-tree of the critical node, the R rotation will be applied. Case 1 R0 rotation (Node B has balance factor 0 ) If the node B has 0 balance factor, and the balance factor of node A disturbed upon deleting the node X, then the tree will be rebalanced by rotating tree using R0 rotation. The critical node A is moved to its right and the node B becomes the root of the tree with T1 as its left sub-tree. The sub-trees T2 and T3 becomes the left and right sub- tree of the node A. the process involved in R0 rotation is shown in the following image. Delete the node 30 from the AVL tree shown in the following image. In this case, the node B has balance factor 0, therefore the tree will be rotated by using R0 rotation as shown in the following image. The node B(10) becomes the root, while the node A is moved to its right. The right child of node B will now become the left child of node A. Case 2 R1 Rotation (Node B has balance factor 1) R1 Rotation is to be performed if the balance factor of Node B is 1. In R1 rotation, the critical node A is moved to its right having sub-trees T2 and T3 as its left and right child respectively. T1 is to be placed as the left sub-tree of the node B. The process involved in R1 rotation is shown in the following image. Delete Node 55 from the AVL tree shown in the following image. Deleting 55 from the AVL Tree disturbs the balance factor of the node 50 i.e. node A which becomes the critical node. This is the condition of R1 rotation in which, the node A will be moved to its right (shown in the image below). The right of B is now become the left of A (i.e. 45). Case 3 R-1 Rotation (Node B has balance factor -1) R-1 rotation is to be performed if the node B has balance factor -1. This case is treated in the same way as LR rotation. In this case, the node C, which is the right child of node B, becomes the root node of the tree with B and A as its left and right children respectively. The sub-trees T1, T2 becomes the left and right sub-trees of B whereas, T3, T4 become the left and right sub-trees of A. The process involved in R-1 rotation is shown in the following image. Delete the node 60 from the AVL tree shown in the following image. node B has balance factor -1. Deleting the node 60, disturbs the balance factor of the node 50 therefore, it needs to be R-1 rotated. The node C i.e. 45 becomes the root of the tree with the node B(40) and A(50) as its left and right child. B Tree B Tree is a specialized m-way tree that can be widely used for disk access. A B-Tree of order m can have at most m-1 keys and m children. One of the main reason of using B tree is its capability to store large number of keys in a single node and large key values by keeping the height of the tree relatively small. A B tree of order m contains all the properties of an M way tree. In addition, it contains the following properties. Every node in a B-Tree contains at most m children. Every node in a B-Tree except the root node and the leaf node contain at least m/2 children. The root nodes must have at least 2 nodes. All leaf nodes must be at the same level. It is not necessary that, all the nodes contain the same number of children but, each node must have m/2 number of nodes. B tree of order 4 While performing some operations on B Tree, any property of B Tree may violate such as number of minimum children a node can have. To maintain the properties of B Tree, the tree may split or join. Searching Searching in B Trees is similar to that in Binary search tree. For example, if we search for an item 49 in the following B Tree. The process will something like following : 1. Compare item 49 with root node 78. since 49 < 78 hence, move to its left sub-tree. 2. Since, 40