Unit 7 - Trees DSA PDF
Document Details
Uploaded by WealthyOrphism
Tags
Summary
This document introduces the concept of trees and related structures in computer science. It outlines the characteristics and representation of trees. The differences between trees and binary trees are highlighted.
Full Transcript
Unit 7 - Trees Overview: We have already considered two data structures, the stack and the queue. Both are linear in nature and are represented in the memory either by a sequentially allocated vector (array) or by a linked linear list. In this unit we now focus on two important non-linear structure...
Unit 7 - Trees Overview: We have already considered two data structures, the stack and the queue. Both are linear in nature and are represented in the memory either by a sequentially allocated vector (array) or by a linked linear list. In this unit we now focus on two important non-linear structures- the tree and the binary tree. Module Objectives: After successful completion of this unit, you should be able to: Discuss what a tree and its characteristics Discuss the different representation of a tree Discuss what a binary tree and its characteristics Differentiate between a tree and a binary tree Demonstrate the conversion between a tree and a binary tree Discuss the concept of searching using a binary tree Implement your own tree and binary tree application Course Materials: A tree is a nonlinear data structure and is a special form of directed graph. It consist of a finite set of elements called nodes (or vertices) and a finite set of directed arcs called branches (or edge) that connect pairs of nodes with the following properties: a) there is a specially designated node called the root b) the remaining nodes are partitioned into n >= 0 disjoint sets T1, …, Tn where each of these sets is a tree. T1, …, Tn are called the subtrees of the root. Vertices of a tree which have successors are called ‘nonterminal vertices,’ while vertices that have no successors are called ‘terminal vertices’ or ‘leaves.’ Figure 38 Illustrates a tree with root vertex a; two nonterminal vertices a, c; and four terminal vertices b, d, e, f. Trees in which each nonterminal vertex has at most n successors are called ‘n-ary’ trees. Trees in which nonterminal vertex has at most two successors are called ‘binary’ trees. The tree in Figure 38 is an example of an n-ary tree or a General Tree. 62 a b c d e f Figure 38 A tree showing terminal and nonterminal vertices Tree Representation Tree structures may be indicated by parentheses, nesting, or indentation as illustrated in Figure 39 which shows alternative representations of the tree of Figure 38. A B C (A(B)(C(D)(E)(F))) D E F Parentheses Indentation a B D E F b c C A d e f Nesting Tree Figure 39 Alternative representations of tree 63 Tree Vocabulary Node a component of the tree that holds data the line or connection between a parent and child (there may be a Edge/branch weight value associated with the edge) top element of the hierarchical structure; the starting element. There Root is only one root per tree a node's immediate successor; a node may have zero or more Child successors a node's predecessor: if node C is a child of node P, then P is a Parent parent of C; a node has exactly one parent Leaf /Terminal a node that has no children an extension of the child relationship: If C is a child of P, then C is a Descendant descendant of P, or if C is a child of a descendant D then C is a descendant of D an extension of the parent relationship: If D is a descendant of A then Ancestor A is an ancestor of D. Sibling child node that has the same parent as another node Internal node/ a node that has at least one child Non-terminal Path the sequence of edges between a node and one of its descendants Path length the number of edges in a path the length of the path connecting a node to the root. The root's depth Depth or level or level is 0. A level of a tree consists of all nodes at the same depth. The depth of the tree is equal to the height of a tree. the longest path length in the tree; the number of edges from the node Height to the deepest leaf; the height of a tree is the height of the root Subtree the tree made of all descendants using some non-root node Degree The number of subtrees of a node; the number of children of a node Degree of a The maximum degree of the nodes in the tree. tree A set of n >=0 disjoint trees. If you remove the root of a tree you get a Forest forest. If you add a root to a forest, you get a tree. N-ary Tree Traversals Many algorithms pertaining to tree structures usually involve a process in which each node of the tree is “visited”, or processed, exactly once. Such a process is called a traversal. A complete traversal of a tree yields a linear arrangement of the nodes of the tree. Such a linear ordering can be very useful in processing the information stored in the nodes of the tree. 64 There are various ways to visit (traverse or search) a general tree but the two most common ones will be discussed here. The simplest two search techniques are known as Depth- First Search (DFS) and Breadth-First Search (BFS). These two searches are described by looking at how the search tree (representing all the possible paths from the start) will be traversed. Depth-First Search with a Stack In depth-first search we go down a path until we get to a dead end; then we backtrack or back up (by popping a stack) to get an alternative path. Create a stack Create a new choice point Push the choice point onto the stack while (not found and stack is not empty) Pop the stack Find all possible choices after the last one tried Push these choices onto the stack Return Breadth-First Search with a Queue In breadth-first search we explore all the nearest possibilities by finding all possible successors and enqueue them to a queue. Create a queue Create a new choice point Enqueue the choice point onto the queue while (not found and queue is not empty) Dequeue the queue Find all possible choices after the last one tried Enqueue these choices onto the queue Return Binary Tree The binary tree is by far the most important non-linear data structure in computer algorithms. A binary tree is a finite set of nodes which is either empty or consists of a root node and maximum of two disjoint binary trees called left and right subtrees of the root. Note that while a tree must have at least one node, a binary tree may be empty. Likewise, a tree may have any number of subtrees; a binary tree can have at most two subtrees. Following are examples of binary trees. Figure 40 Different binary trees 65 Tree Vocabulary Left subtree The descendants to the left of a node Right subtree The descendants to the right of a node Similar Two binary trees are said to be similar if they have the same structure Two binary trees are said to be equivalent if they are similar and contain Equivalent the same information. Figure 41 Tree1, Tree2, Tree3 are all similar; Tree1, Tree2 are equivalent n-Ary Tree to Binary Tree Conversion One issue with n-ary traversal is that it is quite difficult to search or traverse as there can be uneven number of subtrees in the tree. A way to resolve this is to convert the n-ary tree into a binary tree. With a binary tree we can be assured of the search or traversal as there is only two possible paths to take. Following are the steps in converting a n-ary tree to a binary tree. Step 1 − The left child of a node in the n-ary tree becomes the left child of the node in the binary tree. Step 2 − The next sibling of each node in the n-ary tree becomes the right child of the node in the binary tree. A A B B C D C E D E F G F G n-ary tree binary tree Figure 42 Convert an N-ary Tree to a Binary Tree 66 Types of Binary (based on Structure) Rooted binary tree: It has a root node and every node has at most two children. Full binary tree: It is a tree in which every node in the tree has either 0 or 2 children. The number of nodes, n, in a full binary tree is at least n = 2h – 1, and at most n = 2h+1 – 1, where h is the height of the tree. The number of leaf nodes l, in a full binary tree is number, L of internal nodes + 1, i.e, l = L+1. Figure 43 Full Binary tree Perfect binary tree: It is a binary tree in which all interior nodes have two children and all leaves have the same depth or same level. A perfect binary tree with l leaves has n = 2l-1 nodes. In perfect full binary tree, l = 2h and n = 2h+1 - 1 where, n is number of nodes, h is height of tree and l is number of leaf nodes Figure 44 A perfect binary tree Complete binary tree: It is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. Figure 45 An example of a complete binary tree The number of internal nodes in a complete binary tree of n nodes is floor(n/2). 67 Balanced binary tree: A binary tree is height balanced if it satisfies the following constraints: a) The left and right subtrees' heights differ by at most one, AND b) The left subtree is balanced, AND c) The right subtree is balanced An empty tree is height balanced. Figure 46 Balanced binary tree The height of a balanced binary tree is O(Log n) where n is number of nodes. Further discussion about balanced trees is presented in the next unit. Degenarate tree: It is a tree where each parent node has only one child node. It behaves like a linked list. This type of tree is considered a skewed tree. A skewed tree is one with more levels of nodes in one subtree of a node than in the other subtree. Figure 47 A Degenerate tree (skewed to the right) Binary Tree Traversals Recall that tree traversal is a process to visit all the nodes of a tree and may print their values too. Because, all nodes are connected via edges (links) we always start from the root (head) node. That is, we cannot randomly access a node in a tree. Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the values it contains. 68 In a binary tree, there are three different entities involved: a root, its left subtree, and its right subtree. The sequence in which we process these entities defines a particular traversal method. We will consider three such traversal methods, namely: preorder traversal, inorder traversal, and postorder traversal. These three methods are defined recursively. When a binary tree is empty, it is traversed by doing nothing. 1. Preorder traversal In this traversal method, the root node is visited first, then the left subtree and finally the right subtree. Figure 48 Pre Order Traversal We start from A, and following pre-order traversal, we first visit A itself and then move to its left subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The output of pre-order traversal of this tree will be − A→B→D→E→C→F→G Algorithm Until all nodes are traversed − Step 1 − Visit root node. Step 2 − Recursively traverse left subtree. Step 3 − Recursively traverse right subtree. 2. Inorder (or symmetric order) traversal In this traversal method, the left subtree is visited first, then the root and later the right sub-tree. We should always remember that every node may represent a subtree itself. 69 If a binary tree is traversed in-order, the output will produce sorted key values in an ascending order. Figure 49 In Order Traversal We start from A, and following in-order traversal, we move to its left subtree B. B is also traversed in-order. The process goes on until all the nodes are visited. The output of inorder traversal of this tree will be − D→B→E→A→F→C→G Algorithm Until all nodes are traversed − Step 1 − Recursively traverse left subtree. Step 2 − Visit root node. Step 3 − Recursively traverse right subtree. 3. Postorder traversal In this traversal method, the root node is visited last, hence the name. First, we traverse the left subtree, then the right subtree and finally the root node. Figure 50 Postorder Traversal 70 We start from A, and following Post-order traversal, we first visit the left subtree B. B is also traversed post-order. The process goes on until all the nodes are visited. The output of post-order traversal of this tree will be − D→E→B→F→G→C→A Algorithm Until all nodes are traversed − Step 1 − Recursively traverse left subtree. Step 2 − Recursively traverse right subtree. Step 3 − Visit root node. Expression Trees An expression tree is a representation of expressions arranged in a tree-like data structure. In other words, it is a tree with leaves as operands of the expression and nodes contain the operators. Similar to other data structures, data interaction is also possible in an expression tree. Expression trees are mainly used for analyzing, evaluating and modifying expressions, especially complex expressions. Expression tree represents an expression as a binary tree. Operators are parent nodes to operands (leaf nodes) for example expression tree for 3 + ((5+9)*2) would be: Figure 51 Expression tree for 3 + ( (5 + 9) * 2) The specified order of tree traversal gives method for evaluating expression. Inorder traversal of expression tree produces infix version of given postfix expression (same with preorder traversal it gives prefix expression) 71 Evaluating the expression represented by expression tree: Let t be the expression tree If t is not null then If t.value is operand then Return t.value A = solve(t.left) B = solve(t.right) // calculate applies operator 't.value' // on A and B, and returns value Return calculate(A, B, t.value) Construction of Expression Tree: In constructing an expression tree we use a stack. We loop through the input expression and do the following for every character. a) If character is operand push that into stack b) If character is operator pop two values from stack make them its child and push current node again. At the end only element of stack will be root of expression tree. Binary Search Tree A "binary search tree" (BST) or "ordered binary tree" is a type of binary tree where the nodes are arranged in order: for each node, all elements in its left subtree are less-or-equal to the node (). Recursively, each of the subtrees must also obey the binary search tree constraint: in the (1, 3, 4) subtree (see Figure 52), the 3 is the root, the 1 3. Figure 52 Example of a Binary Search Tree (BST) 72 The above properties of Binary Search Tree provide an ordering among keys so that the operations like search, minimum and maximum can be done fast. If there is no ordering, then we may have to compare every key to search a given key. Tree Vocabulary The largest value in the left subtree. The immediate predecessor is located Immediate by choosing the left pointer and chasing right pointers until a node is predecessor reached with a nil right pointer. The smallest value in the right subtree. Located by choosing the right Immediate pointer of a node, then chase left pointers until a node is reached with a nil successor left pointer. Searching in BST Search operation in binary search tree will be very similar to that of binary search. Let’s say we want to search for number, what we’ll do is we’ll start at root and then we will compare the value to be searched with value of root if it’s equal we are done with the search if it’s lesser we know that we need to go to the left subtree because in a binary search tree all the elements in the left subtree are lesser and all the elements in right subtree are greater. Searching an element in the binary search tree is basically this traversal in which at each step we will go either towards left or right and hence in at each step we discard one of the sub- trees. If the tree is balanced, we call a tree balanced if for all nodes the difference between the heights of left and right subtrees is not greater than one, we will start with a search space of ‘n’ nodes and when we will discard one of the sub-trees we will discard ‘n/2’ nodes so our search space will be reduced to ‘n/2’ and then in the next step we will reduce the search space to ‘n/4’ and we will go on reducing like this till we find the element or till our search space is reduced to only one node. The search here is also a binary search and that’s why the name binary search tree. Algorithm Until an element is found or not − Step 1 – Start from the root Step 2 − Compare the inserting element with root, if less than root, then recurse for left, else recurse for right. Step 3 − If element to search is found anywhere, return true, else return false. Insertion and Deletion of Nodes in a BST A new key is always inserted at leaf. We start searching a key from root till we hit a leaf node. Once a leaf node is found, the new node is added as a child of the leaf node. 73 The worst-case time complexity of search and insert operations is O(h) where h is height of Binary Search Tree. In worst case, we may have to travel from root to the deepest leaf node. The height of a skewed tree may become n and the time complexity of search and insert operation may become O(n). Algorithm Inserting key as child of leaf node − Step 1 – Start from the root Step 2 − Compare the inserting element with root, if less than root, then recurse for left, else recurse for right. Step 3 − After reaching end, just insert that node at left(if less than current) else right. When we delete a node, three possibilities arise. 1. Node to be deleted is leaf: Simply remove from the tree. Figure 53 BST - node deleted as a leaf 2. Node to be deleted has only one child: Copy the child to the node and delete the child. Figure 54 BST - node deleted has one child 3. Node to be deleted has two children: Find inorder successor of the node. Copy contents of the inorder successor to the node and delete the inorder successor. Note that inorder predecessor can also be used. Figure 55 BST - node deleted has two children 74 The important thing to note is, inorder successor is needed only when right child is not empty. In this particular case, inorder successor can be obtained by finding the minimum value in right child of the node. Heap Tree Heaps are based on the notion of a complete tree, for which we gave an informal definition earlier. A heap is a binary tree with two special properties: 1. Ordering property The value in a node is smaller than all the values in the node's subtrees (MinHeap property). This is very different than a binary search tree! Note that this property makes no distinction between ‘left’ and ‘right’. All the following trees have this property but only one is a heap: Figure 56 Ordering property - node is smaller than all the values in the node's subtree The smallest value must always be in the root, but the largest value could be in any leaf. Similar values are not collected together in any convenient location. Incidentally, heaps can also be defined with the opposite ordering property: the value in each node is larger than all the values in its subtrees (MaxHeap property). 2. Structural property All the levels are full, except perhaps the bottom level, and all the nodes in the bottom level are as far to the ‘left’ as possible. A level is full if there are no ‘missing nodes’ at the level - there are 2**L nodes in level L. This structural property could be rephrased: the only nodes that can have an empty subtree are ones in the bottom level (by definition these have 2 empty subtrees) and the rightmost nodes in the next-to-bottom level. A tree that has this property is called a complete tree. There are two motives for restricting ourselves to complete trees: a. In a complete tree Height = O(logN). As we'll see, our Heap operations are O(Height), so by restricting ourselves to complete trees, we get O(logN) operations. b. complete trees have certain advantages over general binary trees in some implementations. 75 Although all 3 of the Figure 56 trees have the ordering property, only one of them satisfies the structural requirement – that is the middle one. Operations on Heaps The common operation involved using heaps are: Heapify → Process to rearrange the heap in order to maintain heap-property. Find-max (or Find-min) → find a maximum item of a max-heap, or a minimum item of a min-heap, respectively. Insertion → Add a new item in the heap. Deletion → Delete an item from the heap. Extract Min-Max → Returning and deleting the maximum or minimum element in max- heap and min-heap respectively. Heapify It is a process to rearrange the elements of the heap in order to maintain the heap property. It is done when a certain node causes an imbalance in the heap due to some operation on that node. The heapify can be done in two methodologies: up_heapify() → It follows the bottom-up approach. In this, we check if the nodes are following heap property by going in the direction of rootNode and if nodes are not following the heap property, we do certain operations to let the tree follows the heap property. down_heapify() → It follows the top-down approach. In this, we check if the nodes are following heap property by going in the direction of the leaf nodes and if nodes are not following the heap property, we do certain operations to let the tree follows the heap property. void down_heapify(int heap[], int parent, int size) { largest = parent leftChild = 2*parent + 1 rightChild = 2*parent + 2 if(leftChild < size && heap[leftChild] > heap[largest]) largest = leftChild if(rightChild < size && heap[rightChild] > heap[largest]) largest = rightChild if(parent != largest) { swap(heap[parent], heap[largest]) down_heapify(heap,largest,size) } } 76 void up_heapify(int heap[], int child) { parent = (child-1)/2; if(heap[parent] < heap[child]) { swap(heap[parent], heap[child]) up_heapify(heap,parent) } } Insertion The insertion in the heap follows the following steps Insert the new element at the end of the heap. Since the newly inserted element can distort the properties of the Heap. So, we need to perform up_heapify() operation, in order to keep the properties of the heap in a bottom-up approach. (a) (b) (c) Figure 57 Insertion process in Heaps As an example, consider heap tree (a) in Figure 57 which follows max-heap property. New element to be inserted is 12. Performing Step 1: Insert the new element at the end will result in (b). Second step is to heapify the new element following bottom-up approach. Final heap is (c). void up_heapify(int heap[], int child) { parent = (child-1)/2; if(heap[parent] < heap[child]) { swap(heap[parent], heap[child]) up_heapify(heap,parent) } } void insert(int heap[], int size, int key) { heap.append(key) up_heapify(heap,size+1,size) } 77 Deletion The deletion operations follow the following step: Replace the element to be deleted by the last element in the heap. Delete the last item from the heap. Now, the last element is placed at some position in heap, it may not follow the property of the heap, so we need to perform down_heapify() operation in order to maintain heap structure. The down_heapify() operation does the heapify in the top-bottom approach. The standard deletion on Heap is to delete the element present at the root node of the heap. (a) (b) (c) Figure 58 Deletion process in Heaps For example, consider the heap tree (a) in Figure 58 following a max-heap property. Element to be deleted is 12. Replacing the root with the last element ‘4’ and deleting it will result to (b). Since the resulting tree does not follow the property of the heap we perform down_heapify() to the new root resulting to the final heap tree (c). void down_heapify(int heap[], int parent, int size) { largest = parent leftChild = 2*parent + 1 rightChild = 2*parent + 2 if(leftChild < size && heap[leftChild] > heap[largest]) largest = leftChild if(rightChild < size && heap[rightChild] > heap[largest]) largest = rightChild if(parent != largest) { swap(heap[parent], heap[largest]) down_heapify(heap,largest,size) } } void deleteRoot(int heap[], int size) { swap(heap, heap[size-1]) //swap first and last element heap.pop_back() //delete the last element down_heapify(heap,0,size-1) } 78 Find-max (or Find-min) The maximum element and the minimum element in the max-heap and min-heap is found at the root node of the heap. int findMax(int heap[]) { return heap } Extract Min-Max This operation returns and deletes the maximum or minimum element in max-heap and min-heap respectively. The maximum element is found at the root node. int extractMax(int heap[], int size) { ans = heap deleteRoot(heap, size) return ans } Watch: Introduction to Trees By CS Dojo (https://youtu.be/1-l_UOFi1Xw) Introduction to Trees By mycodeschool (https://youtu.be/qH6yxkw0u78) Binary Search Tree By mycodeschool (https://youtu.be/pYT9F8_LFTM) Deleting a node in a BST By mycodeschool (https://youtu.be/gcULXE7ViZw) Binary tree traversal - breadth-first and depth-first strategies By mycodeschool (https://youtu.be/9RHO6jU--GU) Binary tree traversal – Preorder, Inorder, Postorder By mycodeschool (https://youtu.be/gm8DUJJhmY4) Binary Expression Trees By Miran Fattah (https://youtu.be/_LxbhLNRZkI) Introduction to Binary Heaps By Algorithms with Attitude (https://youtu.be/WCm3TqScBM8) Heaps (MinHeaps) By HackerRank (https://youtu.be/t0Cq6tVNRBA) Read: Chapter 8 Trees Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition 79 Chapter 4 – Trees (Sec. 4.1 – 4.3, pp. 121-144) Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition Section. 4.6 Tree Traversals (pp. 166-168) Weiss, Mark Allen (2014). Data Structures And Algorithm Analysis In C++ 4th Edition Section 9.3 Heaps (pp. 370-384) Goodrich, Michael T. (2014). Data Structures and Algorithms in Java 6th Edition Review: 1. What is a tree? 2. What is a binary tree? How is it different from a tree? 3. What is an advantage of a binary search tree over a binary tree? 4. How to check if a given Binary Tree is BST or not? 5. What is a heap? 6. Describe the difference between a Max Heap over a Min Heap. Activities/Assessments: 1. The following questions refer to the tree below. a. Which node is the root? b. What are the internal nodes? c. How many descendants does node cs016/ have? d. How many ancestors does node cs016/ have? e. What are the siblings of node homeworks/? f. Which nodes are in the subtree rooted at node projects/? g. What is the depth of node papers/? h. What is the height of the tree? 80 2. Draw a sketch of the logical representation for the following tree. 3. Draw a sketch of the logical representation for the following tree. 4. Draw a sketch of the logical representation for the following tree. 5. Draw a sketch of the logical representation for the following tree. 6. Draw the binary tree representation of the following arithmetic expression: (((5+2) ∗ (2−1))/((2+9)+((7−2)−1)) ∗8) 7. Draw the structure of a binary search tree a. after these values have been inserted: 19, 34, 23, 16, 54, 89, 24, 29, 15, 61, 27. b. after two delete operations (deleting the root) 8. Draw the structure of a binary heap (MinHeap): 81 a. after these priorities have been inserted: 19, 34, 23, 16, 54, 89, 24, 29, 15, 61, 27. b. after two deleteMin operations (using the tree in (a)). Programming: 1. Write a program that implements BINARY TREE TRAVERSALS using preorder traversal, inorder traversal, and postorder traversal to produce a linear arrangement of the nodes. Program input consists of the root nodes and its left and right subtrees. The output should list the following information: 1. the left and right subtrees of each node (show null if none) 2. the root of the tree 3. preorder traversal 4. inorder traversal 5. postorder traversal Example Input: (A,B,E) (this means A is the root, B is the left subtree, and E is the right subtree) (B,C,null) (this means B is the root, C is the left subtree, and there is no right subtree) (C,null,D) (E,F,H) (F,null,G) (H,I,J) Output: Node L-Subtree R-Subtree A B E B C null C null D E F H F null G H I J Root of the Tree: A Preorder Traversal: A B C D E F G H I J Inorder Traversal: C D B A F G E I H J Postorder Traversal: D C B G F I J H E A 82