CSE 408 Unit 2 Lecture Notes (PDF)

Summary

These lecture notes cover several fundamental algorithm design techniques: brute force (string matching, closest pair, convex hull, exhaustive search, Voronoi diagrams), divide and conquer (mergesort, quicksort, binary search, large-integer and matrix multiplication), methods for solving recurrence equations (substitution, recursion trees, the master method), and balanced binary search trees (AVL trees).

Full Transcript


CSE408 Brute Force (String Matching, Closest Pair, Convex Hull, Exhaustive Search, Voronoi Diagrams), Lecture #7 & 8

Brute Force
A straightforward approach, usually based directly on the problem's statement and definitions of the concepts involved.

Brute-Force String Matching
– pattern: a string of m characters to search for
– text: a (longer) string of n characters to search in
– problem: find a substring in the text that matches the pattern
Brute-force algorithm:
Step 1: Align the pattern at the beginning of the text.
Step 2: Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters match (successful search) or a mismatch is detected.
Step 3: While the pattern is not found and the text is not yet exhausted, realign the pattern one position to the right and repeat Step 2.

Examples of Brute-Force String Matching
1. Pattern: 001011
   Text: 10010101101001100101111010
2. Pattern: happy
   Text: It is never too late to have a happy childhood.

Closest-Pair Problem
Find the two closest points in a set of n points (in the two-dimensional Cartesian plane).
Brute-force algorithm: compute the distance between every pair of distinct points and return the indexes of the points for which the distance is the smallest.
Efficiency: Θ(n^2) multiplications (or square roots). How to make it faster? By using divide and conquer!

Convex Hull Problem
The convex-hull problem is the problem of constructing the convex hull for a given set S of n points. To solve it, we need to find the points that will serve as the vertices of the polygon in question. Mathematicians call the vertices of such a polygon "extreme points." By definition, an extreme point of a convex set is a point of this set that is not a middle point of any line segment with endpoints in the set.
How can we solve the convex-hull problem in a brute-force manner? There is a simple but inefficient algorithm based on the following observation about the line segments making up the boundary of a convex hull: a line segment connecting two points pi and pj of a set of n points is a part of the convex hull's boundary if and only if all the other points of the set lie on the same side of the straight line through these two points.

Brute-Force Strengths and Weaknesses
Strengths:
– wide applicability
– simplicity
– yields reasonable algorithms for some important problems (e.g., matrix multiplication, sorting, searching, string matching)
Weaknesses:
– rarely yields efficient algorithms
– some brute-force algorithms are unacceptably slow
– not as constructive as some other design techniques

Exhaustive Search
A brute-force solution to a problem involving search for an element with a special property, usually among combinatorial objects such as permutations, combinations, or subsets of a set.
Method:
– generate a list of all potential solutions to the problem in a systematic manner (see algorithms in Sec. 5.4)
– evaluate potential solutions one by one, disqualifying infeasible ones and, for an optimization problem, keeping track of the best one found so far
– when the search ends, announce the solution(s) found
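Before moving on to the TSP example, here is a minimal Python sketch of the brute-force string matcher described above (the function name and test call are mine, not from the notes):

def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):       # try every alignment (Steps 1 and 3)
        j = 0
        while j < m and text[i + j] == pattern[j]:   # left-to-right scan (Step 2)
            j += 1
        if j == m:
            return i                 # successful search
    return -1                        # text exhausted without a match

print(brute_force_match("10010101101001100101111010", "001011"))  # 15

In the worst case this makes about m comparisons at each of roughly n alignments, i.e., Θ(nm) character comparisons.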
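And a sketch of the brute-force closest-pair computation; comparing squared distances avoids the square root without changing which pair wins, consistent with the Θ(n^2) count above (again, the names are mine):

from itertools import combinations

def closest_pair_brute_force(points):
    """Return (squared distance, pair), minimized over all distinct pairs."""
    best = None
    for p, q in combinations(points, 2):              # every pair of distinct points
        d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2  # squared Euclidean distance
        if best is None or d2 < best[0]:
            best = (d2, (p, q))
    return best

print(closest_pair_brute_force([(0, 0), (3, 4), (1, 1), (5, 0)]))  # (2, ((0, 0), (1, 1)))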
Example 1: Traveling Salesman Problem
Given n cities with known distances between each pair, find the shortest tour that passes through all the cities exactly once before returning to the starting city. Alternatively: find the shortest Hamiltonian circuit in a weighted connected graph.
[Figure: a graph on four vertices a, b, c, d with edge weights ab = 2, ac = 8, ad = 5, bc = 3, bd = 4, cd = 7.]
How do we represent a solution (a Hamiltonian circuit)?

TSP by Exhaustive Search
Tour            Cost
a→b→c→d→a       2+3+7+5 = 17
a→b→d→c→a       2+4+7+8 = 21
a→c→b→d→a       8+3+4+5 = 20
a→c→d→b→a       8+7+4+2 = 21
a→d→b→c→a       5+4+3+8 = 20
a→d→c→b→a       5+7+3+2 = 17
Efficiency: Θ((n-1)!). Chapter 5 discusses how to generate permutations fast.

Final Comments on Exhaustive Search
Exhaustive-search algorithms run in a realistic amount of time only on very small instances. In some cases, there are much better alternatives:
– Euler circuits
– shortest paths
– minimum spanning tree
– assignment problem (the Hungarian method runs in O(n^3) time)
In many cases, however, exhaustive search or a variation of it is the only known way to get an exact solution.

Voronoi Diagram
The partitioning of a plane with points into convex polygons such that each polygon contains exactly one generating point, and every point in a given polygon is closer to its generating point than to any other. A Voronoi diagram is sometimes also known as a Dirichlet tessellation, and the cells are called Dirichlet regions, Thiessen polytopes, or Voronoi polygons. Voronoi diagrams were considered as early as 1644 by René Descartes and were used by Dirichlet (1850) in the investigation of positive quadratic forms. They were also studied by Voronoi (1907), who extended their investigation to higher dimensions. They find widespread applications in areas such as computer graphics, epidemiology, geophysics, and meteorology.

CSE408 Divide and Conquer, Lecture #9 & 10

Divide-and-Conquer
The most well-known algorithm design strategy:
1. Divide an instance of the problem into two or more smaller instances.
2. Solve the smaller instances recursively.
3. Obtain the solution to the original (larger) instance by combining these solutions.
[Figure: a problem of size n is split into subproblems, e.g., two of size n/2; the solutions to the subproblems are combined into a solution to the original problem.] In general, this leads to a recursive algorithm!

Divide-and-Conquer Examples
– Sorting: mergesort and quicksort
– Binary tree traversals
– Binary search (?)
– Multiplication of large integers
– Matrix multiplication: Strassen's algorithm
– Closest-pair and convex-hull algorithms

General Divide-and-Conquer Recurrence
T(n) = aT(n/b) + f(n), where f(n) ∈ Θ(n^d), d ≥ 0.
Master Theorem:
– If a < b^d, then T(n) ∈ Θ(n^d).
– If a = b^d, then T(n) ∈ Θ(n^d log n).
– If a > b^d, then T(n) ∈ Θ(n^(log_b a)).
Note: the same results hold with O instead of Θ.
Examples:
– T(n) = 4T(n/2) + n    ⇒ T(n) ∈ Θ(n^2)
– T(n) = 4T(n/2) + n^2  ⇒ T(n) ∈ Θ(n^2 log n)
– T(n) = 4T(n/2) + n^3  ⇒ T(n) ∈ Θ(n^3)

Mergesort
– Split array A[0..n-1] into about equal halves and make copies of each half in arrays B and C.
– Sort arrays B and C recursively.
– Merge sorted arrays B and C into array A as follows: repeat until no elements remain in one of the arrays: compare the first elements in the remaining unprocessed portions of the arrays and copy the smaller of the two into A, incrementing the index of the unprocessed portion of that array. Once all elements in one of the arrays are processed, copy the remaining unprocessed elements from the other array into A.
Time complexity of Merge: Θ(p+q) = Θ(n) comparisons.

Mergesort Example
[Figure: the array 8 3 2 9 7 1 5 4 is split recursively down to single elements, which are merged back up into 3 8 | 2 9 | 1 7 | 4 5, then 2 3 8 9 | 1 4 5 7, and finally 1 2 3 4 5 7 8 9.] The non-recursive version of Mergesort starts from merging single elements into sorted pairs.
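The notes mention the pseudocode of Mergesort and Merge without reproducing it; here is a minimal Python sketch of the scheme just described (a top-down version, in my own wording):

def mergesort(a):
    """Return a new sorted list (top-down mergesort)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    b = mergesort(a[:mid])    # sort the left half recursively
    c = mergesort(a[mid:])    # sort the right half recursively
    return merge(b, c)

def merge(b, c):
    """Merge two sorted lists using Θ(p+q) comparisons."""
    result, i, j = [], 0, 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            result.append(b[i]); i += 1
        else:
            result.append(c[j]); j += 1
    result.extend(b[i:])      # one list is exhausted;
    result.extend(c[j:])      # copy the remainder of the other
    return result

print(mergesort([8, 3, 2, 9, 7, 1, 5, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]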
Analysis of Mergesort
All cases have the same efficiency: Θ(n log n), from the recurrence T(n) = 2T(n/2) + Θ(n), T(1) = 0. The number of comparisons in the worst case is close to the theoretical minimum for comparison-based sorting: log2 n! ≈ n log2 n - 1.44n. Space requirement: Θ(n) (not in-place). Mergesort can be implemented without recursion (bottom-up).

Quicksort
– Select a pivot (partitioning element); here, the first element.
– Rearrange the list so that all the elements in the first s positions are smaller than or equal to the pivot and all the elements in the remaining n-s positions are larger than or equal to the pivot (see the partitioning algorithm below).
– Exchange the pivot with the last element in the first (i.e., ≤) subarray; the pivot is now in its final position.
– Sort the two subarrays recursively.

Partitioning Algorithm
Two scans move toward each other: index i scans from the left until it finds an element greater than or equal to the pivot (or i > r), index j scans from the right until it finds an element smaller than or equal to the pivot (or j = l), and each out-of-place pair is swapped until the scans cross. Time complexity: Θ(r-l) comparisons.

Quicksort Example
[Figure: partitioning 5 3 1 9 8 2 4 7 around the pivot 5 gives 2 3 1 4 5 8 9 7; recursive calls on the two subarrays produce 1 2 3 4 5 7 8 9.]

Analysis of Quicksort
– Best case: split in the middle, Θ(n log n).
– Worst case: an already sorted array! Θ(n^2), from T(n) = T(n-1) + Θ(n).
– Average case: random arrays, Θ(n log n).
Improvements:
– better pivot selection: median-of-three partitioning
– switching to insertion sort on small subfiles
– elimination of recursion
These combine to a 20-25% improvement. Quicksort is considered the method of choice for internal sorting of large files (n ≥ 10000).

Binary Search
A very efficient algorithm for searching in a sorted array: compare the search key K with the middle element A[m]. If K = A[m], stop (successful search); otherwise, continue searching by the same method in A[0..m-1] if K < A[m] and in A[m+1..n-1] if K > A[m].
l ← 0; r ← n-1
while l ≤ r do
    m ← ⌊(l+r)/2⌋
    if K = A[m] return m
    else if K < A[m] r ← m-1
    else l ← m+1
return -1

Analysis of Binary Search
Time efficiency: the worst-case recurrence is Cw(n) = 1 + Cw(⌊n/2⌋), Cw(1) = 1, with solution Cw(n) = ⌈log2(n+1)⌉. This is VERY fast: e.g., Cw(10^6) = 20. Binary search is optimal for searching a sorted array. Limitations: it requires a sorted array (not a linked list). It is a bad (degenerate) example of divide-and-conquer because only one of the sub-instances is solved. It has a continuous counterpart called the bisection method for solving equations in one unknown, f(x) = 0 (see Sec. 12.4).

Binary Tree Algorithms
A binary tree is a divide-and-conquer-ready structure!
Ex. 1: Classic traversals (preorder, inorder, postorder)
Algorithm Inorder(T)
    if T ≠ ∅
        Inorder(T_left)
        print(root of T)
        Inorder(T_right)
Efficiency: Θ(n). Why? Each node is visited/printed once.
Ex. 2: Computing the height of a binary tree
h(T) = max{h(T_L), h(T_R)} + 1 if T ≠ ∅, and h(∅) = -1.
Efficiency: Θ(n). Why?
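To close out this lecture, here are short Python sketches of the three algorithms above, starting with quicksort and its two-scan partition (a Hoare-style partition with the first element as pivot; all names are mine):

def quicksort(a, l=0, r=None):
    """In-place quicksort, using the first element of each subarray as pivot."""
    if r is None:
        r = len(a) - 1
    if l < r:
        s = partition(a, l, r)   # pivot lands in its final position s
        quicksort(a, l, s - 1)
        quicksort(a, s + 1, r)

def partition(a, l, r):
    """Two opposing scans around pivot a[l]; returns the pivot's final index."""
    p = a[l]
    i, j = l, r + 1
    while True:
        i += 1
        while i <= r and a[i] < p:   # scan right for an element >= pivot
            i += 1
        j -= 1
        while a[j] > p:              # scan left for an element <= pivot
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]      # swap the out-of-place pair
    a[l], a[j] = a[j], a[l]          # place the pivot in its final position
    return j

a = [5, 3, 1, 9, 8, 2, 4, 7]
quicksort(a)
print(a)  # [1, 2, 3, 4, 5, 7, 8, 9]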
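The binary search pseudocode above translates almost line for line into Python:

def binary_search(A, K):
    """Return an index of K in sorted list A, or -1 if K is absent."""
    l, r = 0, len(A) - 1
    while l <= r:
        m = (l + r) // 2      # middle element
        if K == A[m]:
            return m          # successful search
        elif K < A[m]:
            r = m - 1         # continue in A[l..m-1]
        else:
            l = m + 1         # continue in A[m+1..r]
    return -1                 # unsuccessful search

print(binary_search([1, 2, 3, 4, 5, 7, 8, 9], 7))  # 5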
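And the height formula h(T) = max{h(TL), h(TR)} + 1 with h(∅) = -1, using a small node class of my own for illustration:

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(t):
    """Height of a binary tree; the empty tree has height -1 by convention."""
    if t is None:
        return -1
    return max(height(t.left), height(t.right)) + 1

print(height(None))                  # -1 (empty tree)
print(height(Node(Node(), Node())))  # 1 (root plus two leaves)

Each node is visited exactly once, which is where the Θ(n) bound comes from.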
Multiplication of Large Integers
Consider the problem of multiplying two (large) n-digit integers represented by arrays of their digits, such as:
A = 12345678901357986429
B = 87654321284820912836
The grade-school algorithm multiplies every digit of A by every digit of B and adds the shifted rows of partial products. Efficiency: Θ(n^2) single-digit multiplications.

First Divide-and-Conquer Algorithm
A small example: A × B where A = 2135 and B = 4014.
A = (21·10^2 + 35), B = (40·10^2 + 14), so
A × B = (21·10^2 + 35) × (40·10^2 + 14)
      = 21×40·10^4 + (21×14 + 35×40)·10^2 + 35×14.
In general, if A = A1A2 and B = B1B2 (where A and B are n-digit numbers and A1, A2, B1, B2 are n/2-digit numbers),
A × B = A1×B1·10^n + (A1×B2 + A2×B1)·10^(n/2) + A2×B2.
Recurrence for the number of one-digit multiplications M(n): M(n) = 4M(n/2), M(1) = 1. Solution: M(n) = n^2.

Second Divide-and-Conquer Algorithm
A × B = A1×B1·10^n + (A1×B2 + A2×B1)·10^(n/2) + A2×B2.
The idea is to decrease the number of multiplications from 4 to 3:
(A1 + A2) × (B1 + B2) = A1×B1 + (A1×B2 + A2×B1) + A2×B2,
i.e., (A1×B2 + A2×B1) = (A1 + A2) × (B1 + B2) - A1×B1 - A2×B2,
which requires only 3 multiplications at the expense of 3 extra additions/subtractions.
Recurrence for the number of multiplications M(n): M(n) = 3M(n/2), M(1) = 1.
Solution: M(n) = 3^(log2 n) = n^(log2 3) ≈ n^1.585. (What if we count both multiplications and additions?)

Example of Large-Integer Multiplication
2135 × 4014 = (21·10^2 + 35) × (40·10^2 + 14)
            = (21×40)·10^4 + c1·10^2 + 35×14, where c1 = (21+35)×(40+14) - 21×40 - 35×14,
and 21×40 = (2·10 + 1) × (4·10 + 0) = (2×4)·10^2 + c2·10 + 1×0, where c2 = (2+1)×(4+0) - 2×4 - 1×0, etc.
This process requires 9 digit multiplications as opposed to 16.

Conventional Matrix Multiplication
Brute-force algorithm for two 2×2 matrices:
c00 = a00×b00 + a01×b10    c01 = a00×b01 + a01×b11
c10 = a10×b00 + a11×b10    c11 = a10×b01 + a11×b11
This uses 8 multiplications and 4 additions. Efficiency class in general: Θ(n^3).

Strassen's Matrix Multiplication
Strassen's algorithm for two 2×2 matrices (1969):
c00 = m1 + m4 - m5 + m7    c01 = m3 + m5
c10 = m2 + m4              c11 = m1 + m3 - m2 + m6
where
m1 = (a00 + a11) × (b00 + b11)
m2 = (a10 + a11) × b00
m3 = a00 × (b01 - b11)
m4 = a11 × (b10 - b00)
m5 = (a00 + a01) × b11
m6 = (a10 - a00) × (b00 + b01)
m7 = (a01 - a11) × (b10 + b11)
This uses 7 multiplications and 18 additions/subtractions. Strassen observed that the product of two matrices can be computed in general in the same way, with the blocks A00, ..., B11 in place of the scalar entries and the same formulas M1, ..., M7.

Analysis of Strassen's Algorithm
If n is not a power of 2, the matrices can be padded with zeros. Number of multiplications: M(n) = 7M(n/2), M(1) = 1. Solution: M(n) = 7^(log2 n) = n^(log2 7) ≈ n^2.807, vs. n^3 for the brute-force algorithm. (What if we count both multiplications and additions?) Algorithms with better asymptotic efficiency are known, but they are even more complex and not used in practice.
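The three-multiplication scheme above is commonly known as Karatsuba's algorithm (the notes do not name it). A hedged Python sketch on ordinary integers, splitting by decimal digits:

def karatsuba(a, b):
    """Multiply non-negative integers with 3 recursive multiplications."""
    if a < 10 or b < 10:
        return a * b                      # single-digit base case
    half = max(len(str(a)), len(str(b))) // 2
    a1, a2 = divmod(a, 10 ** half)        # a = a1*10^half + a2
    b1, b2 = divmod(b, 10 ** half)
    p1 = karatsuba(a1, b1)                # A1 x B1
    p2 = karatsuba(a2, b2)                # A2 x B2
    p3 = karatsuba(a1 + a2, b1 + b2)      # (A1+A2) x (B1+B2)
    # The cross term is p3 - p1 - p2, saving the fourth multiplication.
    return p1 * 10 ** (2 * half) + (p3 - p1 - p2) * 10 ** half + p2

print(karatsuba(2135, 4014))  # 8569890, matching 2135 * 4014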
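Strassen's seven products for 2×2 matrices transcribe directly; this numeric sketch is for scalar entries only, whereas a real implementation applies the same formulas recursively to n/2 × n/2 blocks:

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8."""
    (a00, a01), (a10, a11) = A
    (b00, b01), (b10, b11) = B
    m1 = (a00 + a11) * (b00 + b11)
    m2 = (a10 + a11) * b00
    m3 = a00 * (b01 - b11)
    m4 = a11 * (b10 - b00)
    m5 = (a00 + a01) * b11
    m6 = (a10 - a00) * (b00 + b01)
    m7 = (a01 - a11) * (b10 + b11)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 + m3 - m2 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]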
CSE408 Recurrence Equations, Lecture #12

Substitution Method
The most general method:
1. Guess the form of the solution.
2. Verify by induction.
3. Solve for constants.

Example of Substitution
EXAMPLE: T(n) = 4T(n/2) + n. [Assume that T(1) = Θ(1).]
Guess O(n^3). (Prove O and Ω separately.)
Assume that T(k) ≤ ck^3 for k < n. Prove T(n) ≤ cn^3 by induction:
T(n) = 4T(n/2) + n
     ≤ 4c(n/2)^3 + n
     = (c/2)n^3 + n
     = cn^3 - ((c/2)n^3 - n)   [desired - residual]
     ≤ cn^3
whenever the residual (c/2)n^3 - n ≥ 0, for example, if c ≥ 2 and n ≥ 1.

Example (continued)
We must also handle the initial conditions, that is, ground the induction with base cases.
Base: T(n) = Θ(1) for all n < n0, where n0 is a suitable constant. For 1 ≤ n < n0, we have "Θ(1)" ≤ cn^3, if we pick c big enough.
This bound is not tight!

Recursion-Tree Method
A recursion tree models the costs (time) of a recursive execution of an algorithm. The recursion-tree method can be unreliable, just like any method that uses ellipses (...). It promotes intuition, however, and it is good for generating guesses for the substitution method.

Example of Recursion Tree
Solve T(n) = T(n/4) + T(n/2) + n^2.
[Figure: the root of the recursion tree costs n^2; its children cost (n/4)^2 and (n/2)^2; the next level costs (n/16)^2, (n/8)^2, (n/8)^2, (n/4)^2; the leaves cost Θ(1). The level sums are n^2, (5/16)n^2, (25/256)n^2, ...]
Total = n^2 (1 + 5/16 + (5/16)^2 + (5/16)^3 + ...) = Θ(n^2)   [geometric series]

The Master Method
The master method applies to recurrences of the form T(n) = aT(n/b) + f(n), where a ≥ 1, b > 1, and f is asymptotically positive.

Three Common Cases
Compare f(n) with n^(log_b a):
1. f(n) = O(n^(log_b a - ε)) for some constant ε > 0: f(n) grows polynomially slower than n^(log_b a) (by an n^ε factor). Solution: T(n) = Θ(n^(log_b a)).
2. f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0: f(n) and n^(log_b a) grow at similar rates. Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).
3. f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0: f(n) grows polynomially faster than n^(log_b a) (by an n^ε factor), and f(n) satisfies the regularity condition that a·f(n/b) ≤ c·f(n) for some constant c < 1. Solution: T(n) = Θ(f(n)).
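For the special case f(n) = Θ(n^d) used earlier in these notes, the three cases reduce to comparing a with b^d. A tiny Python helper (my own, just for checking answers):

import math

def master_theorem(a, b, d):
    """Solve T(n) = a*T(n/b) + Theta(n^d), for a >= 1, b > 1, d >= 0."""
    if a < b ** d:
        return f"Theta(n^{d})"
    if a == b ** d:
        return f"Theta(n^{d} log n)"
    return f"Theta(n^{math.log(a, b):g})"   # exponent log_b a

print(master_theorem(4, 2, 1))  # Theta(n^2)
print(master_theorem(4, 2, 2))  # Theta(n^2 log n)
print(master_theorem(4, 2, 3))  # Theta(n^3)

These match the examples worked out next; the last example below, with f(n) = n^2/lg n, is exactly the kind of driving function this d-based form cannot express.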
Examples
EX. T(n) = 4T(n/2) + n: a = 4, b = 2, so n^(log_b a) = n^2; f(n) = n. CASE 1: f(n) = O(n^(2-ε)) for ε = 1, so T(n) = Θ(n^2).
EX. T(n) = 4T(n/2) + n^2: a = 4, b = 2, so n^(log_b a) = n^2; f(n) = n^2. CASE 2: f(n) = Θ(n^2 lg^0 n), that is, k = 0, so T(n) = Θ(n^2 lg n).
EX. T(n) = 4T(n/2) + n^3: a = 4, b = 2, so n^(log_b a) = n^2; f(n) = n^3. CASE 3: f(n) = Ω(n^(2+ε)) for ε = 1, and 4(n/2)^3 ≤ cn^3 (regularity condition) for c = 1/2, so T(n) = Θ(n^3).
EX. T(n) = 4T(n/2) + n^2/lg n: a = 4, b = 2, so n^(log_b a) = n^2; f(n) = n^2/lg n. The master method does not apply: in particular, for every constant ε > 0, we have n^ε = ω(lg n).

CSE408 Closest Pair & Convex Hull Problem, Insertion Sort, Lecture #13

Divide-and-Conquer (recap)
1. Divide an instance of the problem into two or more smaller instances.
2. Solve the smaller instances recursively.
3. Obtain the solution to the original (larger) instance by combining these solutions.
In general, this leads to a recursive algorithm!

Closest-Pair Problem by Divide-and-Conquer
Step 0: Sort the points by x (list one) and then by y (list two).
Step 1: Divide the given points into two subsets S1 and S2 by a vertical line x = c so that half the points lie to the left of or on the line and half the points lie to the right of or on the line.
Step 2: Find recursively the closest pairs for the left and right subsets.
Step 3: Set d = min{d1, d2}. We can limit our attention to the points in the symmetric vertical strip of width 2d as a possible closest pair. Let C1 and C2 be the subsets of points in the left subset S1 and of the right subset S2, respectively, that lie in this vertical strip. The points in C1 and C2 are stored in increasing order of their y coordinates, taken from the second list.
Step 4: For every point P(x, y) in C1, we inspect the points in C2 that may be closer to P than d. There can be no more than 6 such points (because d ≤ d2)!
[Figure depicting the worst-case scenario for Step 4 omitted.]

Efficiency of the Closest-Pair Algorithm
The running time of the algorithm (without sorting) is T(n) = 2T(n/2) + M(n), where M(n) ∈ Θ(n). By the Master Theorem (with a = 2, b = 2, d = 1), T(n) ∈ Θ(n log n). So the total time is Θ(n log n).

Convex Hull Algorithm
Convex hull: the smallest convex set that includes the given points. An O(n^3) brute-force algorithm is given in Levitin, Ch. 3.
– Assume the points are sorted by x-coordinate values.
– Identify the extreme points P1 and P2 (leftmost and rightmost).
– Compute the upper hull recursively: find the point Pmax that is farthest away from the line P1P2, compute the upper hull of the points to the left of the line P1Pmax, and compute the upper hull of the points to the left of the line PmaxP2.
– Compute the lower hull in a similar manner.

Efficiency of the Convex Hull Algorithm
Finding the point farthest away from the line P1P2 can be done in linear time. Time efficiency: T(n) = T(x) + T(y) + T(z) + T(v) + O(n), where x + y + z + v ≤ n.

Computing a Mode: Brute Force
(Assume the elements are integers > 0; what if they aren't?)
max ← max(A)
freq[1..max] ← 0
for each x ∈ A
    freq[x] += 1
mode ← 1
for i ← 2 to max
    if freq[i] > freq[mode]
        mode ← i
return mode
Runtime?
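A quick Python rendering of this frequency-array computation (assuming positive integers, as the pseudocode does; the helper name is mine):

def mode_brute_force(A):
    """Mode of a list of integers > 0 via a frequency array."""
    freq = [0] * (max(A) + 1)      # freq[1..max] <- 0
    for x in A:
        freq[x] += 1
    mode = 1
    for i in range(2, len(freq)):  # scan for the highest frequency
        if freq[i] > freq[mode]:
            mode = i
    return mode

print(mode_brute_force([5, 1, 5, 7, 6, 5, 7]))  # 5

Answering the question above: the runtime is Θ(n + max(A)), which can be arbitrarily bad when the values are large; the presort version next avoids this.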
Presort Computing of a Mode
Sort A
i ← 0
modefrequency ← 0
while i ≤ n-1
    runlength ← 1; runvalue ← A[i]
    while i + runlength ≤ n-1 and A[i + runlength] = runvalue
        runlength ← runlength + 1
    if runlength > modefrequency
        modefrequency ← runlength
        modevalue ← runvalue
    i ← i + runlength
return modevalue

AVL Animation Link
https://www.cs.usfca.edu/~galles/visualization/AVLtree.html

Binary Search Tree: Best Time
All BST operations are O(d), where d is the tree depth. The minimum d is ⌈log2 N⌉ for a binary tree with N nodes.
– What is the best-case tree?
– What is the worst-case tree?
So the best-case running time of BST operations is O(log N).

Binary Search Tree: Worst Time
The worst-case running time is O(N).
– What happens when you insert elements in ascending order? Insert 2, 4, 6, 8, 10, 12 into an empty BST.
– Problem: lack of "balance": compare the depths of the left and right subtrees.
– The result is an unbalanced, degenerate tree.
[Figure: examples of balanced and unbalanced BSTs on the keys 1..7, including a degenerate chain. Is this "balanced"?]

Approaches to Balancing Trees
– Don't balance: may end up with some nodes very deep.
– Strict balance: the tree must always be balanced perfectly.
– Pretty good balance: only allow a little imbalance.
– Adjust on access: self-adjusting.

Balancing Binary Search Trees
Many algorithms exist for keeping binary search trees balanced:
– Adelson-Velskii and Landis (AVL) trees (height-balanced trees)
– Splay trees and other self-adjusting trees
– B-trees and other multiway search trees

Perfect Balance
Want a complete tree after every operation: a tree that is full except possibly in the lower right. This is expensive.
[Figure: inserting 2 into a BST with root 6 forces a rebuild into the complete tree with root 5, children 2 and 8, and leaves 1, 4, 6, 9.]

AVL: Good but not Perfect Balance
AVL trees are height-balanced binary search trees. The balance factor of a node is height(left subtree) - height(right subtree). An AVL tree has the balance factor calculated at every node:
– For every node, the heights of the left and right subtrees can differ by no more than 1.
– Store the current heights in each node.

Height of an AVL Tree
Let N(h) = the minimum number of nodes in an AVL tree of height h.
Basis: N(0) = 1, N(1) = 2.
Induction: N(h) = N(h-1) + N(h-2) + 1.
Solution (recall the Fibonacci analysis): N(h) > φ^h, where φ ≈ 1.62.
Suppose we have n nodes in an AVL tree of height h. Then n > N(h) (because N(h) is the minimum), so n > φ^h, hence log n > h log φ (a relatively well-balanced tree!), and h < 1.44 log2 n (i.e., Find takes O(log n)).
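A few lines of Python (my own check, not from the slides) confirm the recurrence for N(h) and the resulting height bound h < 1.44 log2 n:

import math

def min_avl_nodes(h):
    """N(h): the minimum number of nodes in an AVL tree of height h."""
    if h == 0:
        return 1
    if h == 1:
        return 2
    return min_avl_nodes(h - 1) + min_avl_nodes(h - 2) + 1

for h in range(2, 11, 2):
    n = min_avl_nodes(h)
    print(h, n, round(1.44 * math.log2(n), 2))  # h stays below 1.44*log2(n)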
Node Heights
[Figures: two AVL trees with the height and the balance factor (hleft - hright) recorded at each node; an empty subtree has height -1. After inserting 7 into tree B, the node 6 gets balance factor 1 - (-1) = 2, so tree B is no longer AVL.]

Insert and Rotation in AVL Trees
An insert operation may cause the balance factor to become 2 or -2 for some node.
– Only nodes on the path from the insertion point to the root can have changed in height.
– So after the insert, go back up to the root node by node, updating heights.
– If a new balance factor (the difference hleft - hright) is 2 or -2, adjust the tree by a rotation around the node.
[Figure: a single rotation around the node 9 after inserting 7 turns the subtree 9-8-7 into 8 with children 7 and 9.]

Insertions in AVL Trees
Let the node that needs rebalancing be α. There are 4 cases:
Outside cases (require a single rotation):
1. Insertion into the left subtree of the left child of α.
2. Insertion into the right subtree of the right child of α.
Inside cases (require a double rotation):
3. Insertion into the right subtree of the left child of α.
4. Insertion into the left subtree of the right child of α.
The rebalancing is performed through four separate rotation algorithms.

AVL Insertion: Outside Case
Consider a valid AVL subtree with root j, left child k, and subtrees X and Y under k and Z under j, each of height h. Inserting into X destroys the AVL property at node j (X now has height h+1). Do a "right rotation": k becomes the root, with X as its left subtree and j as its right child, and j keeps Y and Z. The AVL property has been restored! ("Left rotation" is mirror symmetric.)

AVL Insertion: Inside Case
Again consider a valid AVL subtree with root j, left child k, and subtrees X, Y, Z of height h. Inserting into Y destroys the AVL property at node j. Does a "right rotation" restore balance? No: after the rotation, k is out of balance (X has height h, while Y now has height h+1). Consider the structure of subtree Y: let Y have root i and subtrees V and W (of height h or h-1). We will do a left-right "double rotation": first a left rotation around k, bringing i up, then a right rotation around j. Balance has been restored: i becomes the root, with children k (over X and V) and j (over W and Z).

Implementation
Each node stores: balance (1, 0, -1), key, left, right. There is no need to keep the height, just the difference in heights, i.e., the balance factor; this has to be modified on the path of insertion even if you don't perform rotations. Once you have performed a rotation (single or double), you won't need to go back up the tree.

Single Rotation
RotateFromRight(n : reference node pointer) {
    p : node pointer;
    p := n.right;
    n.right := p.left;
    p.left := n;
    n := p
}
You also need to modify the heights or balance factors of n and p.

Double Rotation
Implement double rotation in two lines:
DoubleRotateFromRight(n : reference node pointer) {
    ????
}
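The pseudocode passes n by reference; in Python one can instead return the new subtree root and let the caller relink it. A hedged sketch of both rotations (and the double rotation really is two lines, as the exercise hints):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_from_right(n):
    """Single left rotation: lift n.right; returns the new subtree root."""
    p = n.right
    n.right = p.left
    p.left = n
    return p          # caller relinks; balance factors must also be updated

def rotate_from_left(n):
    """Mirror image: single right rotation, lifting n.left."""
    p = n.left
    n.left = p.right
    p.right = n
    return p

def double_rotate_from_right(n):
    """Right-left double rotation, in two lines."""
    n.right = rotate_from_left(n.right)
    return rotate_from_right(n)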
Insertion in AVL Trees (recap)
Insert at a leaf, as for all BSTs. Only nodes on the path from the insertion point to the root can have changed in height, so after the insert, go back up to the root node by node, updating heights; if a new balance factor (the difference hleft - hright) is 2 or -2, adjust the tree by a rotation around the node.

Example of Insertions in an AVL Tree
[Figures: starting from the AVL tree with root 20, children 10 and 30, and leaves 25 and 35: inserting 5 and 40 keeps the tree balanced; inserting 45 creates an imbalance fixed by a single rotation (outside case) that replaces 35 with 40 over children 35 and 45; inserting 34 creates an imbalance at 30 fixed by a double rotation (inside case) that brings 35 up over 30 and 40.]

AVL Tree Deletion
Similar to but more complex than insertion:
– Rotations and double rotations are needed to rebalance.
– The imbalance may propagate upward, so many rotations may be needed.

Pros and Cons of AVL Trees
Arguments for AVL trees:
1. Search is O(log N) since AVL trees are always balanced.
2. Insertions and deletions are also O(log N).
3. The height balancing adds no more than a constant factor to the speed of insertion.
Arguments against AVL trees:
1. They are difficult to program and debug, and they need extra space for the balance factor.
2. They are asymptotically faster, but rebalancing costs time.
3. Most large searches are done in database systems on disk and use other structures (e.g., B-trees).
4. It may be OK to have O(N) for a single operation if the total run time for many consecutive operations is fast (e.g., splay trees).
