DSA Finals PDF
Summary
This document covers the overview, objectives, and course materials for a unit on balanced trees in data structures. It discusses balanced trees, including heap trees, AVL trees, and B-trees. It also elaborates on the concept of a balanced tree and the necessary properties and their importance.
Unit 8 – Balanced Trees

Overview:
The search time in a binary search tree depends on the form of the tree, that is, on the order in which its nodes were inserted. In practice, however, one cannot always guarantee that binary search trees are built randomly. Binary search trees are thus only interesting if they are "relatively complete." We must therefore look for specializations of binary search trees whose worst-case performance on the basic tree operations can be guaranteed to be good, that is, O(log2 n) time. In this unit we discuss balanced trees as a solution to the problem of search time. In particular, we look at the balance properties of binary search trees and m-way search trees.

Module Objectives:
After successful completion of this unit, you should be able to:
- Describe what a balanced tree is
- Discuss the Heap Tree and its characteristics
- Discuss the AVL tree and its characteristics
- Discuss the B-tree and its characteristics

Course Materials:
The binary search tree has a serious deficiency for practical use as a search structure: it can easily become unbalanced, so that some nodes lie deep in the tree. In fact, it is possible for a BST with n nodes to have a depth of n, making it no faster to search in the worst case than a linked list. If we could keep the tree balanced in some way, then the search cost would be only O(log2 n), a huge improvement.

A balanced tree is a tree in which no leaf is much farther from the root than any other leaf. Different balancing schemes allow different definitions of "much farther" and require different amounts of work to keep the tree balanced:
1. A binary tree is height-balanced if, for each node, the heights of its subtrees differ by at most 1. (Remember, the height of an empty subtree is -1.)
2. A binary tree is weight-balanced if, for each node, the numbers of inner nodes in its left subtree and right subtree differ by at most 1.

The tree below is height-balanced.
It is not weight-balanced, because the left subtree of the root has 3 inner nodes but the right subtree of the root has only 1. It has been shown that a weight-balanced tree is also height-balanced.

Figure 63 Height-balanced Tree

The key term in both definitions is "for each node": the property being described must hold for all nodes, not just the root. To see the need for "for each node", consider the tree whose root has a copy of the tree to the right as its left subtree and another copy as its right subtree. It is not weight-balanced, even though the left and right subtrees of the root have the same number of inner nodes.

The important data structure called a Binary Search Tree, or BST, was invented to make searching for a value more efficient than, say, searching in an unordered array. However, if the BST is unbalanced, then search is not efficient, and it is difficult and inefficient to keep a BST balanced as values are added and deleted. Computer scientists therefore looked for ways to enhance the BST data structure to make balancing more efficient as values are added and deleted. This required modifying the BST, either by adding more information to each node or by allowing a node to have more than two children. The results include the AVL tree, the 2-3 tree, the B-tree, the red-black tree, and many more. With each such new tree data structure, the inventor had to state what property they meant by "balanced." The goal was the same, of course: if the tree was balanced according to the definition, then the inventors could prove that the height of the tree was O(log size-of-tree).

Heap Tree
Heaps are based on the notion of a complete tree, for which we gave an informal definition earlier. A heap is a binary tree with two special properties:

1. Ordering property
The value in a node is smaller than all the values in the node's subtrees (MinHeap property). This is very different from a binary search tree!
Note that this property makes no distinction between 'left' and 'right'. All of the following trees have this property, but only one is a heap:

Figure 64 Ordering property - the value in a node is smaller than all the values in the node's subtrees

The smallest value must always be in the root, but the largest value could be in any leaf. Similar values are not collected together in any convenient location. Incidentally, heaps can also be defined with the opposite ordering property: the value in each node is larger than all the values in its subtrees (MaxHeap property).

2. Structural property
All the levels are full, except perhaps the bottom level, and all the nodes in the bottom level are as far to the 'left' as possible. A level is full if there are no 'missing nodes' at that level - there are 2**L nodes in level L. This structural property could be rephrased: the only nodes that can have an empty subtree are the ones in the bottom level (by definition these have 2 empty subtrees) and the rightmost nodes in the next-to-bottom level. A tree that has this property is called a complete tree.

There are two motives for restricting ourselves to complete trees:
a. In a complete tree, Height = O(log N). As we'll see, our heap operations are O(Height), so by restricting ourselves to complete trees, we get O(log N) operations.
b. Complete trees have certain advantages over general binary trees in some implementations.

Although all 3 of the trees in Figure 64 have the ordering property, only one of them satisfies the structural requirement - the middle one.

Operations on Heaps
The common operations on heaps are:
Heapify → rearrange the heap in order to maintain the heap property.
Find-max (or Find-min) → find the maximum item of a max-heap, or the minimum item of a min-heap, respectively.
Insertion → add a new item to the heap.
Deletion → delete an item from the heap.
Extract Min-Max → return and delete the maximum or minimum element of a max-heap or min-heap, respectively.
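Because a complete tree stores neatly in an array (the children of index i sit at indices 2i+1 and 2i+2), the ordering property can be verified with a short sketch. This helper is illustrative, not part of the course code; it assumes a min-heap stored in a Python list:

```python
def is_min_heap(a):
    # In the array encoding of a complete tree, the children of
    # index i live at 2*i + 1 and 2*i + 2.
    n = len(a)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            # The min-heap ordering property: no child is smaller
            # than its parent.
            if child < n and a[child] < a[i]:
                return False
    return True
```

Note that the structural property comes for free with this encoding: any Python list of length n describes a complete tree, so only the ordering property needs checking.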
Heapify
Heapify is the process of rearranging the elements of the heap in order to maintain the heap property. It is done when a certain node causes an imbalance in the heap due to some operation on that node. Heapify can be done in two ways:

up_heapify() → follows the bottom-up approach. We check whether nodes satisfy the heap property while moving in the direction of the root node; if a node does not satisfy the heap property, we perform certain operations (swaps) to make the tree satisfy it.

down_heapify() → follows the top-down approach. We check whether nodes satisfy the heap property while moving in the direction of the leaf nodes; if a node does not satisfy the heap property, we perform certain operations (swaps) to make the tree satisfy it.

void down_heapify(int heap[], int parent, int size)
{
    largest = parent
    leftChild = 2*parent + 1
    rightChild = 2*parent + 2
    if (leftChild < size && heap[leftChild] > heap[largest])
        largest = leftChild
    if (rightChild < size && heap[rightChild] > heap[largest])
        largest = rightChild
    if (parent != largest)
    {
        swap(heap[parent], heap[largest])
        down_heapify(heap, largest, size)
    }
}

void up_heapify(int heap[], int child)
{
    if (child == 0) return        // reached the root; nothing above to compare
    parent = (child - 1) / 2
    if (heap[parent] < heap[child])
    {
        swap(heap[parent], heap[child])
        up_heapify(heap, parent)
    }
}

Insertion
Insertion into the heap follows these steps:
1. Insert the new element at the end of the heap.
2. Since the newly inserted element can violate the properties of the heap, perform the up_heapify() operation to restore the heap property in a bottom-up fashion.

Figure 65 Insertion process in Heaps

As an example, consider heap tree (a) in Figure 65, which satisfies the max-heap property. The new element to be inserted is 12. Performing step 1 (inserting the new element at the end) results in (b). The second step is to heapify the new element following the bottom-up approach. The final heap is (c).
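The two heapify routines can be rendered as a runnable sketch. This is a Python translation of the pseudocode above for a max-heap stored in a list; it is illustrative, not the course's reference code:

```python
def down_heapify(heap, parent, size):
    # Top-down: sift a node toward the leaves until the max-heap
    # property holds below it.
    largest = parent
    left, right = 2 * parent + 1, 2 * parent + 2
    if left < size and heap[left] > heap[largest]:
        largest = left
    if right < size and heap[right] > heap[largest]:
        largest = right
    if largest != parent:
        heap[parent], heap[largest] = heap[largest], heap[parent]
        down_heapify(heap, largest, size)

def up_heapify(heap, child):
    # Bottom-up: sift a node toward the root until its parent is
    # no smaller. The root itself has no parent, hence the guard.
    if child == 0:
        return
    parent = (child - 1) // 2
    if heap[parent] < heap[child]:
        heap[parent], heap[child] = heap[child], heap[parent]
        up_heapify(heap, parent)
```

For example, appending 12 to the max-heap [10, 9, 8, 7] and calling up_heapify on the last index bubbles 12 up to the root.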
void insert(int heap[], int size, int key)
{
    heap.append(key)              // the new element lands at index 'size'
    up_heapify(heap, size)
}

Deletion
The deletion operation follows these steps:
1. Replace the element to be deleted with the last element in the heap.
2. Delete the last item from the heap.
3. The last element has now been placed at some other position in the heap and may not satisfy the heap property, so perform the down_heapify() operation to restore the heap structure. The down_heapify() operation does the heapify in a top-down approach.

The standard deletion on a heap is to delete the element at the root node of the heap.

Figure 66 Deletion process in Heaps

For example, consider the heap tree (a) in Figure 66, which satisfies the max-heap property. The element to be deleted is 12. Replacing the root with the last element, 4, and deleting the last element results in (b). Since the resulting tree does not satisfy the heap property, we perform down_heapify() on the new root, producing the final heap tree (c).

void deleteRoot(int heap[], int size)
{
    swap(heap[0], heap[size-1])   // swap the first and last elements
    heap.pop_back()               // delete the last element
    down_heapify(heap, 0, size-1)
}

Find-max (or Find-min)
The maximum element of a max-heap (or the minimum element of a min-heap) is found at the root node of the heap.

int findMax(int heap[])
{
    return heap[0]
}

Extract Min-Max
This operation returns and deletes the maximum or minimum element of a max-heap or min-heap, respectively.
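Insertion and root deletion can be combined into one self-contained Python sketch of a max-heap. The names `_sift_up` and `_sift_down` are illustrative stand-ins for the up/down heapify helpers; this is a sketch, not a reference implementation:

```python
def _sift_up(heap, child):
    # Restore the max-heap property from a node up toward the root.
    if child == 0:
        return
    parent = (child - 1) // 2
    if heap[parent] < heap[child]:
        heap[parent], heap[child] = heap[child], heap[parent]
        _sift_up(heap, parent)

def _sift_down(heap, parent, size):
    # Restore the max-heap property from a node down toward the leaves.
    largest = parent
    for child in (2 * parent + 1, 2 * parent + 2):
        if child < size and heap[child] > heap[largest]:
            largest = child
    if largest != parent:
        heap[parent], heap[largest] = heap[largest], heap[parent]
        _sift_down(heap, largest, size)

def insert(heap, key):
    # Step 1: append at the end; step 2: sift the new element up.
    heap.append(key)
    _sift_up(heap, len(heap) - 1)

def delete_root(heap):
    # Swap the root with the last element, drop it, then sift down.
    heap[0], heap[-1] = heap[-1], heap[0]
    heap.pop()
    if heap:
        _sift_down(heap, 0, len(heap))
```

Inserting 4, 10, 3, 12, 7 in that order leaves 12 at the root; one delete_root then promotes 10.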
The maximum element is found at the root node.

int extractMax(int heap[], int size)
{
    ans = heap[0]
    deleteRoot(heap, size)
    return ans
}

Height Balanced Trees
A height-balanced tree (or self-balancing tree) is a binary tree that automatically maintains the height of the tree, and of its subtrees, on each insertion and deletion of a node. Unlike a heap, such a tree is not necessarily a complete tree. The AVL tree and the red-black tree are examples of height-balanced trees. The AVL tree uses a balance factor to balance the height of the tree; the balance factor is simply the difference between the heights of a node's left and right subtrees.

Figure 67 Example of an AVL tree

AVL Tree
The AVL tree is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information." As the name suggests, AVL trees are used for organizing information. They are not perfectly balanced, but they maintain O(log2 n) search, insertion, and deletion times, where n is the number of nodes in the tree. An AVL tree is a binary search tree in which the subtrees of every node differ in height by at most 1.

A tree is perfectly height-balanced if the left and right subtrees of every node are the same height, e.g.:

Figure 68 Perfectly height-balanced tree

It is clear that at every level there are twice as many nodes as at the previous level, so we do indeed get H = O(log N). However, perfect height balance is very rare: it is only possible if there are exactly 2^H-1 nodes. As a practical alternative, we use trees that are 'almost' perfectly height-balanced. We will say that a tree is height-balanced if the heights of the left and right subtrees of each node are within 1 of each other. The following tree fits this definition:

Figure 69 Height balanced tree

How can we tell if a tree is height-balanced? We have to check every node. The fastest way is to start at the leaves and work your way up.
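This leaves-first check can be sketched as a single post-order pass. The `Node` class and `check` helper below are illustrative, not part of the course material:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def check(node):
    # Post-order: by the time we look at a node we already know the
    # heights of both of its subtrees, so one pass answers both
    # "how tall?" and "is it balanced?".
    if node is None:
        return -1, True            # empty subtree: height -1, balanced
    lh, lok = check(node.left)
    rh, rok = check(node.right)
    return 1 + max(lh, rh), lok and rok and abs(lh - rh) <= 1
```

Each call returns the subtree's height for use by its parent, exactly as the worked example that follows describes.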
When you reach a node, you will know the heights of its two subtrees; from that you can tell whether it is height-balanced and also compute the node's height (for use by its parent). For example:

Figure 70 Checking for height balance

a. C and D are leaves. Their subtrees are all empty, so C and D are both perfectly balanced. Having finished D, we can compute the heights of B's subtrees.
b. B is not perfectly balanced, but the heights of its subtrees differ only by 1, so B is regarded as height-balanced. Now we can compute the balance of A.
c. Like B, A's two subtrees differ by 1 in height. We have now looked at every node; every node is height-balanced, so the tree as a whole is considered height-balanced.

Insertion in an AVL Tree
Insertion in an AVL tree is the same as in a binary search tree: it is always done by expanding an external node. To make sure that the tree remains AVL after every insertion, we must augment the standard BST insert operation to perform some rebalancing. The following are two basic operations that can be performed to rebalance a BST without violating the BST property (keys(left) < key(root) < keys(right)):
1) Left Rotation
2) Right Rotation

A tree rotation can be an intimidating concept at first. You end up in a situation where you are juggling nodes, and these nodes have trees attached to them, and it can all become confusing very fast. It helps to block out what is going on with any of the subtrees attached to the nodes you are working with.

Left Rotation (LL)
Imagine we have this situation:

Figure 71 Before Left Rotation (Single)

To fix this, we must perform a left rotation, rooted at A. This is done in the following steps:
1. B becomes the new root.
2. A takes ownership of B's left child as its right child (in this case, null).
3. B takes ownership of A as its left child.
The tree now looks like this:

Figure 72 After performing a Left Rotation (Single)

Right Rotation (RR)
A right rotation is a mirror of the left rotation operation described above. Imagine we have this situation:

Figure 73 Before Right Rotation (Single)

To fix this, we will perform a single right rotation, rooted at C. This is done in the following steps:
1. B becomes the new root.
2. C takes ownership of B's right child as its left child (in this case, that value is null).
3. B takes ownership of C as its right child.

The resulting tree:

Figure 74 After performing a Right Rotation (Single)

Left-Right Rotation (LR) or "Double left"
Sometimes a single left rotation is not sufficient to balance an unbalanced tree. Take this situation:

Figure 75 Unbalanced tree after a Left Rotation (Single)

Initially, with (a), we have a balanced tree. Into this tree we insert node B; the resulting tree, shown in (b), is unbalanced. An initial reaction here is to do a single left rotation, but the resulting tree (c) is still unbalanced. If we were to do a single right rotation in this situation, we would be right back where we started. What is causing this? The answer is that this is a result of the right subtree having a negative balance. In other words, because the right subtree was left-heavy, our rotation was not sufficient. What can we do? The answer is to perform a right rotation on the right subtree. Read that again: we will perform a right rotation on the right subtree. We are not rotating on our current root; we are rotating on our right child. Think of the right subtree, isolated from the main tree, and perform a right rotation on it:

Figure 76 Left-Right Rotation (or Double Left Rotation)

Image (a) is the unbalanced tree that results from adding B to a balanced tree. Performing a right rotation on the right subtree results in (b).
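The two single rotations can be sketched directly from the steps above. The minimal `Node` class here is an assumption for illustration, not part of the course code:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(a):
    # A's right child B becomes the new root; A adopts B's old left
    # child as its right child; B adopts A as its left child.
    b = a.right
    a.right = b.left
    b.left = a
    return b          # the caller re-attaches the new subtree root

def rotate_right(c):
    # Mirror image: C's left child B becomes the new root; C adopts
    # B's old right child as its left child; B adopts C as its right child.
    b = c.left
    c.left = b.right
    b.right = c
    return b
```

Returning the new subtree root, rather than mutating in place, keeps the "juggling" local: the parent simply replaces its child pointer with the returned node.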
After performing a rotation on the right subtree, we have prepared the root to be rotated left. The final tree is represented by (c).

Right-Left Rotation (RL) or "Double right"
A double right rotation, or right-left rotation (RL), is a rotation that must be performed when attempting to balance a tree that has a left subtree that is right-heavy. This is a mirror of the operation illustrated in the section on Left-Right Rotations, or double left rotations. Let's look at an example of a situation where we need to perform a Right-Left rotation.

Figure 77 Right-Left Rotation (or Double Right Rotation)

In this situation, we have a tree (a) that is unbalanced. The left subtree has a height of 2, and the right subtree has a height of 0. This makes the balance factor of our root node, C, equal to -2. What do we do? Some kind of right rotation is clearly necessary, but a single right rotation will not solve our problem. Performing a right rotation results in a tree (b) that has a balance of 2. It would appear that we did not accomplish much.

Let's go back to the original tree, before our pointless right rotation. The reason the right rotation did not work is that the left subtree, A, has a positive balance factor, and is thus right-heavy. Performing a right rotation on a tree whose left subtree is right-heavy will result in the problem we just witnessed. What do we do? The answer is to make the left subtree left-heavy. We do this by performing a left rotation on the left subtree. Doing so leaves us with (c). This is a tree that can now be balanced using a single right rotation; we can perform that right rotation rooted at C, resulting in (d).

Deciding when you need a tree rotation is usually easy, but determining which type of rotation you need requires a little thought. A tree rotation is necessary when you have inserted or deleted a node that leaves the tree in an unbalanced state.
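The decision procedure just described can be sketched as follows. This is an illustrative, self-contained Python sketch, not the course's code; it assumes the sign convention used above, i.e. balance = height(right) - height(left), so a left-heavy node has a negative balance factor:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def rotate_right(c):
    b = c.left
    c.left, b.right = b.right, c
    return b

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def balance(node):
    # Sign convention from the text: negative means left-heavy.
    return height(node.right) - height(node.left)

def rebalance(node):
    bf = balance(node)
    if bf > 1:                        # right-heavy
        if balance(node.right) < 0:   # right child left-heavy: double left
            node.right = rotate_right(node.right)
        return rotate_left(node)
    if bf < -1:                       # left-heavy
        if balance(node.left) > 0:    # left child right-heavy: double right
            node.left = rotate_left(node.left)
        return rotate_right(node)
    return node                       # already within the AVL bound
```

On the Figure 77 shape (C with left child A, which has right child B), `rebalance` first rotates A left and then rotates C right, ending with B at the root.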
An unbalanced state is defined as a state in which any subtree has a balance factor greater than 1 or less than -1. That is, any tree with a difference greater than 1 between the heights of its two subtrees is considered unbalanced.

Deletion in an AVL Tree
Deleting a node from an AVL tree is similar to deleting from a binary search tree. Deletion may disturb the balance factors of an AVL tree, and therefore the tree needs to be rebalanced in order to maintain the AVL property. The same rotation rules are followed as when inserting a node into an AVL tree. Note, however, that unlike insertion, fixing one node won't necessarily fix the complete AVL tree: after fixing an unbalanced subtree, we may have to fix its ancestors as well.

Weight Balanced Trees

M-way Search Trees
A binary search tree has one value in each node and two subtrees. This notion easily generalizes to an M-way search tree, which has (M-1) values per node and M subtrees. M is called the degree of the tree; a binary search tree therefore has degree 2. In an M-way tree a node can have anywhere from 1 to (M-1) values, and the number of (non-empty) subtrees can range from 0 (for a leaf) to 1 + (the number of values). M is thus a fixed upper limit on how much data can be stored in a node. The values in a node are stored in ascending order, V1 < V2 < ... Vk (k
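As a hypothetical illustration of the node layout just described, here is a minimal M-way search-tree node and lookup in Python. The names `MWayNode` and `search` are assumptions, and the sketch takes subtrees[i] to hold the keys below values[i], with the last subtree holding the keys above all of the node's values:

```python
import bisect

class MWayNode:
    # An M-way node holds up to M-1 sorted values and up to M subtrees.
    def __init__(self, values, subtrees=None):
        self.values = values                          # ascending order
        self.subtrees = subtrees or [None] * (len(values) + 1)

def search(node, key):
    while node is not None:
        # Find the first value >= key; if it equals key, we are done,
        # otherwise descend into the subtree between the bracketing values.
        i = bisect.bisect_left(node.values, key)
        if i < len(node.values) and node.values[i] == key:
            return True
        node = node.subtrees[i]
    return False
```

With sorted values in each node, a binary search (here via `bisect`) picks the single subtree that can contain the key, just as comparing against the one value in a BST node does.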