F04_MST_in_External_Memory_part_1 (1) PDF

Summary

This document covers trees in external memory, specifically M-ary search trees and B-trees. It details definitions, properties, algorithms, and variants of these data structures.

Full Transcript

Chapter 4 Trees in External Memory 1. Organizing Files Using M-ary Search Trees - Definitions - Algorithms : Search, Inorder, Insertion, and Deletion 2. Organizing Files Using B-Trees - Definition and Properties - Algorithms: Insertion, Deletion, Optimized Loading - Variants: B+-Tre...

Chapter 4 Trees in External Memory 1. Organizing Files Using M-ary Search Trees - Definitions - Algorithms : Search, Inorder, Insertion, and Deletion 2. Organizing Files Using B-Trees - Definition and Properties - Algorithms: Insertion, Deletion, Optimized Loading - Variants: B+-Trees, Prefix B+-Trees, B*-Trees Teaching Team 2024 / 2025 KERMI Adel BOURAS Rym ARTABAZ Saliha BOUKABENE Randa HIDOUCI Walid-Khaled SFSD / 2CP / ESI / 2025 1 1. Organizing Files Using M-ary Search Trees The blocks of a file can be organized as a tree of blocks. This is useful for representing M-ary search trees in Secondary (External) Memory (SM). Definitions An M-ary search tree of order N is a tree where each node can contain at most: N-1 ordered values (val1,val2,…,valN−1) and N children (Child1,Child2,…,ChildN) For a given node, the degree represents the current number of children. For a given node, the current number of values is always = degree - 1. The degree is at least = 2, and at most = N. val1 val2 val3 … valN-1 Properties F1 F2 F3 F4 … FN-1 FN a) All values in the subtree of Child1 are less than val1. b) All values in the subtree of Childj are greater than valj−1 and less than valj (with j in [2,degree−1]. 2 c) All values in the subtree of Child are greater than val. Example of a M-ary search tree with order = 5 i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 The root node : i Internal nodes : i,a,h,d,e,n Leaf nodes : b , g , c , k , j , f , m The tree depth (height)= 4 (the level of the farthest leave) degree( a ) = 5, degree( b ) = 5, degree( c ) = 3, degree( d ) = 5, degree( e ) = 5, degree( f ) = 3, degree( g ) = 3,… etc Top-Down Property: all internal nodes are 100% full (optionally verified). 3 Example of a M-ary search tree with order = 5 i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 block a block b block k val 10 | 16 | 18 | 20 deg 5 val 2 | 3 | 5 | 9 deg 5 val 37 | | | deg 2... child b | g | -1 | -1 | c child -1 | -1 | -1 | -1 | -1 child -1 | -1 | | | 4 Example of a M-ary search tree with order = 5 i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 a b c d e f g h i j k m n 10,16,18,20 2, 3, 5, 9 27,30, , 42,47,50,54 71,76,80,94 57,60, , 12,14, , 55,70,96,97 32,33,40,99 43,44, , 37, , , 82,83,84,88 89,90,92,93....................................... File organized in M-ary search trees: the blocks are linked using an M-ary tree structure → The Root block number is stored as a characteristic of the file (in the Header Block). 5 Uses of M-ary search trees … 595, ccc, 78,... 1- As an index in SM towards a data file a1...... Type Tval = struct (key , addr )...... a2 243, aaa, 78 , … … … Data File … … Index File … | …,. | (243,a2) |... … a3 187, bbb, 21, … …,. | (595,a1) | … | … … … … | …,. | (187,a3) |... … …... 2- As a data file organised using a tree in SM Type Tval = Trecord … | …,. | (243,aaa,78,...) |... Data File …,. | (595,ccc,78,...) | … |… … | …,. | (187,bbb,21,...) |... 6 Search Mechanism i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 The search for a value C starts from the root node P and go through all the branch : 1- If C exists in P Then the search stops successfully 2- If C does not exist in P then - let k be the actual position of C in P (values must remain ordered) - If Childk ≠ -1 (null) then P ← Childk ; goto 1 - Else the search stops with failure. Ex : Search( 47 ) → traversal of the branch : i , h , d (stop successfully : P = d and k = 2) Search( 15 ) → traversal of the branch : i , a , g (stop with failure : P = g and k = 3) 7 Search Algorithm i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 buf : TypeBlock m 82 83 84 88 Search( c ) ⇒ ( found , numB , pos ) open( F , … , ‘A’ ) numB ← getHeader( F , 1 ) // the Root node number found ← false ; i ← numB while ( i -1 and not found ) readBlock( F , i , buf ) ; numB ← i internal search for c in buf.val[ ] ⇒ ( pos , found ) IF (not found ) i ← buf.Child[ pos ] end_if end_while close( F ) 8 Keeping track of the visited blocks i (using a stack) 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 P : stack of < TypeBlock , integer , integer > buf : TypeBlock m 82 83 84 88 Search( c ) ⇒ ( found , numB , pos ) open( F , … , ‘A’ ) CreateStack( P ) numB ← getHeader( F , 1 ) // the root block number found ← false ; i ← numB while ( i -1and not found ) readBlock( F , i , buf ) ; numB ← i internal search for c in buf.val[ ] ⇒ ( pos , found ) IF (not found ) push( P , < buf , i, pos > ) ; i ← buf.child[ pos ] endif end_while close( F ) 9 Example : search fore 92 i Visited Blocks: 32 33 40 99 i h stack a k h e 10 16 18 20 37 55 70 96 97 n buf b g c d f e 2 3 5 9 12 14 27 30 42 47 50 54 57 60 71 76 80 94 j n 43 44 89 90 92 93 P : stack of < TypeBlock , integer , integer > buf : TypeBlock m 82 83 84 88 Search( c ) ⇒ ( found , numB , pos ) open( F , … , ‘A’ ) CreateStack( P ) numB ← getHeader( F , 1 ) // the root block number found ← false ; i ← numB while ( i -1and not found ) readBlock( F , i , buf ) ; numB ← i internal search for c in buf.val[ ] ⇒ ( pos , found ) IF (not found ) push( P , < buf , i, pos > ) ; i ← buf.child[ pos ] endif end_while close( F ) 10 Inorder Traversal i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 The values are visited (processed) in ascendent order : 2, 3, 5, 9, 10, 12, 14, 16, 18, 20, 27, 32, 33, 37, 40, 42, 43, 44, 47, 50, 54, 55, 57, 70, 71,76, 80, b a g a c i k i d j d h f h e 82, 83, 84, 88, 89, 90, 92, 93, 94, 96, 97, 99 m n e h i 11 Inorder Traversal i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 Inorder ( r:integer ) var buf:TypeBlock // local var i:integer IF ( r -1 ) readBlock( F , r , buf ) For (i = 1 , buf.deg – 1 ) Inorder( buf.child[ i ] ) process the value buf.val[ i ]... End_For Inorder( buf.child[ buf.deg ] ) End_IF 12 Inorder Traversal i 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 Inorder ( r:integer ) var buf:TypeBlock // local var i:integer The smallest value in this tree (2) se trouve IF ( r -1 ) ⇒ at the first position of the leftmost node. readBlock( F , r , buf ) For (i = 1 , buf.deg – 1 ) Inorder( buf.child[ i ] ) (The highest? ) process the value buf.val[ i ]... End_For Inorder( buf.child[ buf.deg ] ) How to locate the inorder successor of a End_IF value? 13 i Next Inorder 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 b g c Childk+1 == -1 j n 43 44 89 90 92 93 Next inorder of valk (valk is in node p) m 82 83 84 88 Ex : next of 2 ⇒ 3, (because the right child of 2 is -1) (i.e. the right child of valk is -1) (i.e. Childk+1 is -1) 14 i Next Inorder 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 Next of valk (valk is in node p) m 82 83 84 88 If ( k < degree-1 && Childk+1 == -1 ) Next ← valk+1 (remain in the same node p) Ex : next of 2 ⇒ 3, next of 3 ⇒ 5, next of 5 ⇒ 9, le next of 12 ⇒ 14, next of 16 ⇒ 18, next of 18 ⇒ 20, next of 32 ⇒ 33, …. etc 15 Next Inorder i 32 33 40 99 Childk+1 ≠-1 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 m 82 83 84 88 Ex : Next of 40 ⇒ 42 (because the right child of 40 is ≠ -1) (i.e. Childk+1 is ≠ -1) 16 i Next Inorder 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 Next of valk (valk is in node p) m 82 83 84 88 If ( k < degree-1 && Childk+1 == -1 ) Next ← valk+1 (in the same node p) Sinon If ( Childk+1 ≠ -1 ) The next is the smallest value of the subtreet rooted at Childk+1 Ex : next of 10 ⇒ 12, le next of 20 ⇒ 27, next of 33 ⇒ 37, next of 40 ⇒ 42, next of 42 ⇒ 43, next of 55 ⇒ 57, next of 70 ⇒ 71 and next of 80 ⇒ 82 17 i Next Inorder 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 Childdegree == -1 n 43 44 89 90 92 93 j m 82 83 84 88 Ex : next of 27 ⇒ 32 ( because 27 is the last value of this node and its right child = -1 ) ( i.e. k == degree-1 and Childdegree == -1 ) 18 i Next Inorder 32 33 40 99 a k h 10 16 18 20 37 55 70 96 97 b g c d f e 2 3 5 9 12 14 27 42 47 50 54 57 71 76 80 94 j n 43 44 89 90 92 93 Next ofvvalk (valk is in node p) m 82 83 84 88 If ( k < degree-1 && Childk+1 == -1 ) Next ← valk+1 (in the same node p) Else If ( Childk+1 ≠-1 ) The next is the smallest value of the subtree rooted by Childk+1 Else // i.e. : If ( k == degree-1 && Childdegree == -1 ) The next is found in the first ancestor where the traversal occurred through a child that is not the last one. ( let Childj with j < degree ). The next is then the value valj in this ancestor. Ex : next of 9 ⇒ 10, next of 14 ⇒ 16, next of 27 ⇒ 32, next of 37 ⇒ 40, next of 44 ⇒ 47, next of 54 ⇒ 55, next of 57 ⇒ 70, next of 88 ⇒ 89, next of 93 ⇒ 94, next of 94 ⇒ 96, next of 97 ⇒ 99. 19 Exercise Provide the algorithm for the following interval query : « find all values in a tree belonging to the interval [ a , b ] Hints : - Use the search algorithm with a stack, to locate the value a - use the stack to access the next elements in inorder traversal, continuing through the interval until you reach or exceed the value. 20 Insertion Mechanism If the last accessed (visited) node through the search algorithm is not full... a 24 40 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 g h i 10 47 48 63 65 66 Then insert the new value in this node (shifts within the block) a 24 40 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 Ex : Insertion g h i of 64 10 47 48 63 64 65 66 The last visited node 21 Insertion Mechanism If the last accessed (visited) node through the search algorithm is already full 100 %... a 24 40 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 g 10 h 47 48 i 63 64 65 66 Then allocate a new block, put the new value and link it to the tree a 24 40 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 Ex : Insertion g h 47 48 i 10 63 64 65 66 of 61 Le dernier nœud visité n 61 The new allocated node 22 Insertion Mechanism To insert a new value V in an m-ary search tree, you must : 1- Search V to check that it does not exist and locate the last visited node P. The search returns also k index that indicates where V is found to preserve values order in P. 2- IF P is not full, 2.1- Insert V in P, by shifts to preserve values order in the table. SINON 2.2- Allocate a new block Q, and put the unique value V and initialize its 2 children with value = -1 2.3- Link Q as Childk of P (necessarily Childk was -1) FSI Why ? Remarks - Preserve the Top-Down property - Can lead to a significant imbalance in the tree - → Therefore requires periodic reorganizations. 23 Deletion Mechanism a- Logical Deletion: Add a logical deletion indicator for each value i 32/f 33/f 40/f 99/f a k h 10/f 16/t 18/f 20/f 37/f 55/t 70/t 96/f 97/f b c d e 2/f 3/f 5/f 9/f 27/f 30/f 42/f 47/f 50/t 54/f 71/f 76/f 80/f 94/f f 12/f 14/f 57/t 60/f g j n 43/f 44/f 89/f 90/t 92/f 93/f m 82/t 83/f 84/f 88/t In this example, the values 16, 50, 55, 57, 70, 82, 88 and 90 has been deleted 24 Deletion Mechanism b- Physical Deletion of a value v : 1. Search for the node that contains the value v ⇒ let p be the found node and q its parent (or -1) 2. IF p is a leaf Delete v using shifts ; IF p becomes empty, free p (and update the parent q) End_IF stop 3. ELSE // p is an internal node 3.1 Find the next or the previous inorder ⇒ value : v’ and node : p’ 3.2 Replace v with v’ in p 3.3 v ← v’ ; p ← p’ ; Goto 2 a 24 40 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 g h i 10 47 48 63 65 66 25 - Physical Deletion of a value v : 1. Search for the node that contains the value v ⇒ let p be the found node and q its parent (or -1) 2. IF p is a leaf Delete v using shifts ; IF p becomes empty, free p (and update the parent q) End_IF stop 3. ELSE // p is an internal node 3.1 Find the next or the previous inorder ⇒ value : v’ and node : p’ 3.2 Replace v with v’ in p 3.3 v ← v’ ; p ← p’ ; Goto 2 Ex : del (40) 1 look for 40 ⇒ (a,2) // an internal node... a 24 40 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 g h i 10 47 48 63 65 66 26 - Physical Deletion of a value v : 1. Search for the node that contains the value v ⇒ let p be the found node and q its parent (or -1) 2. IF p is a leaf Delete v using shifts ; IF p becomes empty, free p (and update the parent q) End_IF stop 3. ELSE // p is an internal node 3.1 Find the next or the previous inorder ⇒ value : v’ and node : p’ 3.2 Replace v with v’ in p 3.3 v ← v’ ; p ← p’ ; Goto 2 Ex : del (40) 1 look for 40 ⇒ (a,2) // an internal node … 3 next inorder of 40 = 42 (d,1) and replacing 40 with 42 in a // d is also an internal node... a 24 42 69 82 b c d e f 2 5 12 20 27 30 42 50 55 60 71 80 90 95 97 99 g h i 10 47 48 63 65 66 27 - Physical Deletion of a value v : 1. Search for the node that contains the value v ⇒ let p be the found node and q its parent (or -1) 2. IF p is a leaf Delete v using shifts ; IF p becomes empty, free p (and update the parent q) End_IF stop 3. ELSE // p is an internal node 3.1 Find the next or the previous inorder ⇒ value : v’ and node : p’ 3.2 Replace v with v’ in p 3.3 v ← v’ ; p ← p’ ; Goto 2 Ex : del (40) 1 look for 40 ⇒ (a,2) // an internal node … 3 next inorder of 40 = 42 (d,1) and replacing 40 with 42 in a // d is also an internal node … 3 next inorder of 42 = 47 (h,1) and replacing 42 with 47 in d // h is a leaf node … a 24 42 69 82 b c d e f 2 5 12 20 27 30 47 50 55 60 71 80 90 95 97 99 g h i 10 47 48 63 65 66 28 - Physical Deletion of a value v : 1. Search for the node that contains the value v ⇒ let p be the found node and q its parent (or -1) 2. IF p is a leaf Delete v using shifts ; IF p becomes empty, free p (and update the parent q) End_IF stop 3. ELSE // p is an internal node 3.1 Find the next or the previous inorder ⇒ value : v’ and node : p’ 3.2 Replace v with v’ in p 3.3 v ← v’ ; p ← p’ ; Goto 2 Ex : del (40) 1 look for 40 ⇒ (a,2) // an internal node … 3 next inorder of 40 = 42 (d,1) and replacing 40 with 42 in a // d is also an internal node … 3 next inorder of 42 = 47 (h,1) and replacing 42 with 47 in d // h is a leaf node … 2 deletion of 47 using shifts in h. a 24 42 69 82 b c d e f 2 5 12 20 27 30 47 50 55 60 71 80 90 95 97 99 g h i 10 48 63 65 66 29

Use Quizgecko on...
Browser
Browser