CICS 210-02 Data Structures Lecture 16 PDF

Summary

This document is a lecture on data structures, focusing on multiway search trees and B-trees. It details insertion and deletion algorithms, and shows examples using various diagrams and figures to demonstrate these concepts.

Full Transcript


CICS 210-02 Data Structures, Lecture 16 (v2, 11/04/2024)
MJ Golin
Some material modified from textbook © 2014 Goodrich, Tamassia, Goldwasser
CICS Fall 2024

Outline
Multiway Search Trees
2-3-4 Trees
Bottom Up Insertion into 2-3-4 trees
Deletion from a 2-3-4 Tree
Top Down Insertion into a 2-3-4 tree
B trees

Multiway Search Trees

A multi-way search tree is an ordered tree such that:
Each internal node has at least two children and stores d − 1 key-object items (k_i, o_i), where d is the number of children the node has.
For a node with children v_1, v_2, …, v_d storing keys k_1, k_2, …, k_{d−1}:
  keys in the subtree of v_1 are less than k_1;
  keys in the subtree of v_i are between k_{i−1} and k_i (i = 2, 3, …, d − 1);
  keys in the subtree of v_d are greater than k_{d−1}.
The leaves store no items and serve as placeholders.

[Figure: multiway search tree with root (11 24); its children are (2 6 8), (15), and (27 32); node (30) hangs below (27 32), between 27 and 32.]

Note: just like with BSTs, when describing the data structure we will concentrate on the keys and ignore the objects/values associated with them.

Note: The leaves as drawn here are essentially nulls. They are convenient for the definition, but we will often leave them out later when drawing the trees and sometimes call the bottom nodes with data the leaves.

Each node needs to store an ordered list of keys and pointers. There are different ways of doing this.
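One possible way to store the ordered list of keys and pointers is sketched below. This is only an illustration, not the representation the lecture prescribes: the names `Node` and `is_search_tree` are hypothetical, keys are assumed to be distinct integers, and `None` stands in for an external (placeholder) leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One internal node: d - 1 sorted keys and d child pointers."""
    keys: List[int]
    # children[i] holds the keys between keys[i - 1] and keys[i];
    # None plays the role of an external (placeholder) leaf.
    children: List[Optional["Node"]] = field(default_factory=list)

    def __post_init__(self):
        if not self.children:
            self.children = [None] * (len(self.keys) + 1)


def is_search_tree(node, lo=float("-inf"), hi=float("inf")):
    """Check the multiway search-tree ordering on every node."""
    if node is None:
        return True
    if len(node.children) != len(node.keys) + 1:
        return False
    bounds = [lo] + node.keys + [hi]
    # Keys must be strictly increasing and lie strictly inside (lo, hi).
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        return False
    return all(is_search_tree(child, bounds[i], bounds[i + 1])
               for i, child in enumerate(node.children))


# The example tree from the figure.
tree = Node([11, 24], [
    Node([2, 6, 8]),
    Node([15]),
    Node([27, 32], [None, Node([30]), None]),
])
print(is_search_tree(tree))
```

Keeping keys and children in two parallel lists makes the "child i sits between key i − 1 and key i" rule a matter of list indices.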
We can extend the notion of inorder traversal from binary trees to multi-way search trees. Namely, we visit item k_i of node v between the recursive traversals of the subtrees of v rooted at children v_i and v_{i+1}. An inorder traversal of a multi-way search tree visits the keys in increasing order.

[Figure: the example tree with root (11 24), with every key and external node labeled 1 through 19 in inorder visiting order.]

Multi-way searching is similar to searching in a binary search tree. Start at the root. At each internal node with children v_1, v_2, …, v_d and keys k_1, k_2, …, k_{d−1}, when searching for a key k:
  if k = k_i for some i = 1, 2, …, d − 1, the search terminates successfully;
  if k < k_1, we continue the search in child v_1;
  if k_{i−1} < k < k_i (i = 2, 3, …, d − 1), we continue the search in child v_i;
  if k > k_{d−1}, we continue the search in child v_d.
Reaching an external node terminates the search unsuccessfully.

Example: successful search for 30. From the root (11 24) the search goes right to (27 32), then down between 27 and 32 to node (30).
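The traversal and search rules above can be sketched as follows. This is a minimal illustration under assumptions that are not in the lecture: the hypothetical `Node` class keeps keys and children in two lists, `None` marks an external node, and each node is scanned linearly (a binary search within the node would also work).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        # None marks an external (placeholder) node.
        self.children = children or [None] * (len(keys) + 1)


def inorder(node):
    """Yield the keys in increasing order."""
    if node is None:
        return
    for i, key in enumerate(node.keys):
        yield from inorder(node.children[i])   # subtree of v_i
        yield key                              # then key k_i
    yield from inorder(node.children[-1])      # finally the subtree of v_d


def search(node, k):
    """Return the node holding k, or None if an external node is reached."""
    while node is not None:
        i = 0
        while i < len(node.keys) and k > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == k:
            return node
        node = node.children[i]                # continue in child v_{i+1}
    return None


tree = Node([11, 24], [
    Node([2, 6, 8]),
    Node([15]),
    Node([27, 32], [None, Node([30]), None]),
])
print(list(inorder(tree)))
print(search(tree, 30) is not None, search(tree, 7) is not None)
```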
Example: unsuccessful search for 7. In the same tree, 7 < 11 leads to node (2 6 8); 7 lies between 6 and 8, and the child in that position is an external node, so the search terminates unsuccessfully.

2-3-4 trees

A 2-3-4 tree (also called a 2-4 tree or (2,4) tree) is a multi-way search tree with the following properties:
Node-Size Property: every internal node has at most four children (so each node contains 1, 2 or 3 keys).
Depth Property: all the external nodes have the same depth.
Depending on the number of children, an internal node of the tree is called a 2-node, 3-node or 4-node.

[Figure: 2-3-4 tree with root (10 15 24) and children (2 8), (12), (18), (27 32 35).]

Height of a 2-3-4 tree

Theorem: A 2-3-4 tree storing n items has height O(log n).
Proof: Let h be the height of a 2-3-4 tree containing n items. Since there are at least 2^i items at depth i = 0, 1, 2, …, h and no items at depth h + 1, we have
  n ≥ 1 + 2 + 2^2 + 2^3 + … + 2^h = 2^{h+1} − 1
  ⇒ h ≤ log_2(n + 1) − 1 ≤ log_2(2n) − 1 = log_2 n.
⇒ Searching in a 2-3-4 tree with n items only takes O(log n) time!

  depth    items (at least)
  0        1
  1        2
  h        2^h
  h + 1    0

More on 2-3-4 trees

Note that a lot in the 2-3-4 tree implementation has been left unspecified, e.g., how to implement a node and how to search for a key or child pointer in a node.
Without going into further details, we note that one way to implement 2-3-4 trees is via (restricted) binary search trees with a special type of edge coloring. These are known as red-black trees and are the balanced BSTs that Java uses in its implementation of TreeMap and to store a chain when chaining in hashing.

Bottom Up Insertion into a 2-3-4 tree
We insert a new item k into the leaf that should contain it, as determined by a search for k. This preserves the depth property, but we must also preserve the node-size property.

[Figure: tree with root (18); internal nodes (10) and (25 35); leaves (5), (13 15) below (10) and (20), (27 29), (40 50) below (25 35).]

Example: insert 30, then insert 28. Inserting 30 turns leaf (27 29) into (27 29 30). Inserting 28 then gives (27 28 29 30), which violates the node-size property.

If an overflow occurs in a node v (4 keys), we split (k_1, k_2, k_3, k_4) into two nodes (k_1) and (k_3, k_4) with a parent k_2. The (k_2) node is then inserted appropriately into the parent u of v.

[Figure: (27 28 29 30) ⇒ (27) and (29 30) with 28 above them; after 28 is inserted into the parent, the parent (25 35) becomes (25 28 35) with leaves (20), (27), (29 30), (40 50).]

Note: We chose to split into (k_1) and (k_3, k_4) with a parent k_2. We could have split into (k_1, k_2) and (k_4) with a parent k_3. Some implementations do this.
Both the depth property and the node-size property must be preserved. We handle an overflow at a node v with 4 keys with a split operation:
  let k_1, k_2, k_3, k_4 be the keys of v;
  node v is replaced by nodes v′ and v′′;
  v′ is a 2-node with key k_1;
  v′′ is a 3-node with keys k_3, k_4;
  key k_2 is inserted into the parent u of v.
Note that this split operation maintains the depth property!

[Figure: (27 28 29 30) ⇒ (27) and (29 30) with 28 moved up; the tree is now root (18), internal nodes (10) and (25 28 35), leaves (5), (13 15), (20), (27), (29 30), (40 50).]

So far the split happened only at the bottom level, though. What happens if we overflow at the next level? For example, suppose that we try to insert 31 and 32. Leaf (29 30) becomes (29 30 31 32), and we would split the node into (29) and (31, 32) with parent (30).

[Figure: (29 30 31 32) ⇒ (29) and (31 32) with 30 above them.]
The (30) now gets pushed up into the parent node, but its parent node is now overflowing: (25 28 30 35) has 4 keys. So we need to split the parent node as well, into (25) and (30 35), and push 28 up into the root.

[Figure: resulting tree with root (18 28); internal nodes (10), (25), (30 35); leaves (5), (13 15), (20), (27), (29), (31 32), (40 50).]

Overview

We handle an overflow at v with 4 keys using a split operation:
  let v_1, …, v_5 and k_1, k_2, k_3, k_4 be the children and keys of v;
  node v is replaced by nodes v′ and v′′;
  v′ is a 2-node with key k_1 and children v_1, v_2;
  v′′ is a 3-node with keys k_3, k_4 and children v_3, v_4, v_5;
  key k_2 is inserted into the parent u of v.
If v had no parent, a new root is created.
The overflow may propagate to the parent of v.

[Figure: u = (2 20) with children (1), v = (3 6 9 12), (23 27) ⇒ u = (2 6 20) with children (1), (3), (9 12), (23 27); the five subtrees v_1, …, v_5 of v are divided between (3) and (9 12).]

Creating a new root

[Figure: three snapshots. First: root v′ = (2 20 30) with children (1), v = (3 6 9 12), (23 27), (40). Second: v′ = (2 6 20 30) with children (1), (3), (9 12), (23 27), (40). Third: new root (6) with children (2) and (20 30), and leaves (1), (3), (9 12), (23 27), (40).]

9 was just inserted into node v, either at the bottommost level or because of splits propagating up. This pushed 6 up into node v′. This overflows v′, which splits and tries to push 6 up into its parent. But v′ had been the root and had no parent, so 6 now becomes the new root. Note that the depth property is still maintained!
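The whole bottom-up procedure, including propagation and creation of a new root, can be sketched as below. This is a minimal illustration and not the lecture's own code: `Node` and `insert` are hypothetical names, keys are assumed distinct, bottom-level nodes are represented by an empty child list, and the split follows the lecture's choice of (k_1), k_2, (k_3, k_4).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []     # empty list at the bottom level


def insert(root, k):
    """Insert k bottom-up into a 2-3-4 tree; return the (possibly new) root."""
    path = []                              # (node, child index) pairs going down
    node = root
    while node.children:
        i = sum(1 for key in node.keys if key < k)
        path.append((node, i))
        node = node.children[i]
    node.keys.append(k)
    node.keys.sort()

    while len(node.keys) == 4:             # overflow: split and push k_2 up
        k1, k2, k3, k4 = node.keys
        c = node.children
        left = Node([k1], c[:2])           # v'  gets children v_1, v_2
        right = Node([k3, k4], c[2:])      # v'' gets children v_3, v_4, v_5
        if not path:                       # v was the root: create a new root
            return Node([k2], [left, right])
        node, i = path.pop()               # continue at the parent u
        node.keys.insert(i, k2)
        node.children[i:i + 1] = [left, right]
    return root


def leaf(*keys):
    return Node(list(keys))


# The lecture's example tree, followed by its four insertions.
root = Node([18], [
    Node([10], [leaf(5), leaf(13, 15)]),
    Node([25, 35], [leaf(20), leaf(27, 29), leaf(40, 50)]),
])
for key in (30, 28, 31, 32):
    root = insert(root, key)
print(root.keys, [child.keys for child in root.children])
```

Recording the path on the way down avoids parent pointers; each split then costs O(1) work at one level, which is where the O(log n) insertion bound comes from.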
Running time of bottom-up insertion

Insertion of a key k uses only O(log n) time:
1. Finding the place to insert k takes O(log n) time.
2. One step of overflow handling takes O(1) time: checking for overflow in the current node, splitting, creating two new nodes, moving an item up, and appropriately modifying pointers.
This can propagate up the tree (of height O(log n)) from the leaf of the first insertion, so all of the extra work is only O(log n).

Bottom Up Deletion from a 2-3-4 tree

[Figure: root (10 15 24) with children (2 8), (12), (18), (27 32 35); boxes T_1, …, T_11 hang below the four children.]

To reduce the number of cases we need to analyze, we are going to combine leaf cases with non-leaf cases. To represent that, the blue boxes (T_1, …, T_11) represent subtrees. If the subtrees are null, i.e., we are at the bottom of the tree, the blue boxes are empty trees.

Deletion of a key in a leaf node is pretty simple (except if it is the only key in the node, which we discuss shortly). We just remove the key and its associated null pointers.
Example: to delete key 27, just remove it. Leaf (27 32 35) becomes (32 35).

Similar to BSTs, we reduce deletion of an entry to the case where the item is at a node with leaf children. If the entry to be deleted is not in a leaf, we replace the entry with its inorder successor (or, equivalently, with its inorder predecessor) and delete the latter entry.
Example: to delete key 24, we replace it with 27 (its inorder successor) and remove 27 from its old location. The root becomes (10 15 27) and the last leaf becomes (32 35).
Deleting an entry from a node v may cause an underflow, where node v becomes a 1-node with one child and no keys. To handle an underflow at node v with parent u, we consider two cases.

Case 1: v has some adjacent sibling (child of the same parent) w that is a 2-node.
Fusion operation: merge v with w and move an entry from u to the merged node v′.
After a fusion, the underflow may propagate to the parent u.

[Figure, no propagation: u = (9 14) with children (2 5 7), w = (10), and the empty node v ⇒ u = (9) with children (2 5 7) and v′ = (10 14). The procedure terminates.]
[Figure, underflow propagation: u = (14) with children w = (10) and the empty node v ⇒ u has no keys and the single child v′ = (10 14). The procedure continues at u, unless u was the root, in which case v′ becomes the root.]

Case 2: Not Case 1, so an adjacent sibling w of v is a 3-node or a 4-node.
Transfer operation:
1. Move a child of w to v (the child of w "closest" to v).
2. Move the appropriate key from u to v.
3. Move the appropriate key from w to u.
After a transfer, no underflow occurs and the process stops.

[Figure: u = (4 9) with children (2), w = (6 8), and the empty node v ⇒ u = (4 8) with children (2), w = (6), v = (9).]
[Figure, another Case 2 example: u = (4 5) with children (2), the empty node v, and w = (6 8) ⇒ u = (4 6) with children (2), v = (5), w = (8).]

Running time analysis of deletion

1. Finding the location of the item to remove (and possibly its inorder successor) requires O(log n) time.
2. One application of Case 1 or Case 2 requires O(1) time.
3. Case 1 might propagate up, but it can only propagate at most O(log n) times.
So the total running time is O(log n).

Top Down Insertion into a 2-3-4 tree

We previously saw how to insert items into a 2-3-4 tree from the bottom up. That is, the algorithm first inserted the new key at the bottom of the tree and then rebalanced the tree, if necessary, working from the bottom up, splitting nodes if needed and then pushing a key up.

We are now going to see how to perform insertion top down. This procedure walks down the tree, identifying the insertion path. At each step, it preemptively splits 4-nodes. This ensures that, when it reaches the bottom, there is always space to insert a key into the final node that is reached.

To insert a key k, walk down the insertion path. Let v be the current node (v could be a leaf).

Case 1: v is the root and v is a 4-node, i.e., has 4 children. Split v into a new root with two children.
[Figure: (a b c) with subtrees T_1, …, T_4 ⇒ root (b) with children (a), holding T_1, T_2, and (c), holding T_3, T_4.]

Case 2: v is not the root and v is a 4-node. Let u be v's parent. Move the middle key of v up to u. Because u has already been processed, u is not a 4-node, so there is room to insert this middle key.
[Figure: v = (a b c) below u ⇒ b is inserted into u, and v is replaced by v′ = (a), holding T_1, T_2, and v′′ = (c), holding T_3, T_4.]
Case 3: Node v is a leaf in the tree and is not a 4-node. Then there is space in v to insert k. Do so, placing k at the appropriate position in v.

Case 4: Node v is not a 4-node and is not a leaf in the tree. Then do nothing. Just walk down to the next level with no change in the tree.

The algorithm must always reach Case 3, so it always succeeds.
Each case requires O(1) time.
Case 1 can occur at most once. Case 3 occurs exactly once.
Cases 2 and 4 can occur at most O(log n) times (the height of the tree).
⇒ O(log n) time in total.

From https://www.cs.usfca.edu/~galles/visualization/BTree.html: inserting 1-18 into the tree, in order, using max-degree = 4 and the preemptive split/merge setting.
[Figures: three slides of snapshots from the visualization.]

B-trees

A B tree of order d is a multiway tree in which
1. Every node has at most d children.
2. Every non-root node has at least ⌈d/2⌉ children.
3. The root node has at least 2 children unless it's a leaf.
4. All leaves appear on the same level.
5. A non-leaf node with k children has k − 1 keys.

Put more compactly: every non-root node has between ⌈d/2⌉ and d children and one less key than children; the root has at least 2 children unless it's a leaf; all leaves appear on the same level.

This is actually a generalization of 2-3-4 trees! All 2-3-4 trees are B trees of order 4.
Note: The definition of B-trees is not fully standardized. All the definitions are almost equivalent, though.
For simplicity we'll often assume d is odd.

[Figure: a B tree of order 6.]

Theorem: An order d B-tree storing n items has height Θ(log_d n): at least about log_d n, and at most about log_{⌈d/2⌉} n.
Conclusion: The larger the order, the shallower the tree and the fewer nodes a search visits.
Why wouldn't we want to use the highest order B tree possible? If d is large, it can take a lot of time to search the keys in a large node.

Why is this even remotely interesting? Because B-trees (with various tweaks), with large orders, are the method of choice for storing large databases that don't fit into RAM. Most large database systems are based on them. Essentially, once your DB gets too large, it has to be stored on external disks (HDDs, SSDs, etc.).
When this happens, disk accesses (bringing data into memory from disk) are orders of magnitude more expensive than RAM operations and tend to dominate running times. Part of the game here is minimizing the number of disk page accesses made. B trees are really good for this.

Disk accesses don't return 1 byte. They return an entire page (a page is typically between 512 and 8192 bytes). The main idea behind using B-trees as an external-memory method for data storage is to set the B-tree order so that one node fits into one page of data. Modern database systems (from Oracle, Microsoft and others) have d ranging between 50 and 256 or so.

B trees: various small issues

The description above was a gross simplification. Modern computers have multi-level memory hierarchies, not just a 2-level one with RAM and disk drives. B trees still work well, though, with the tweaks mentioned earlier.

In practice the B-trees are actually maps, storing (key, value) pairs where the value is usually some record or object. The B-trees don't really store the value in the pair, though. Instead, the value field in the pair is usually a pointer to the location of the real value record/object (possibly another location on disk).

A well-known (and commercially available) variant of B trees is the B+ tree. In B+ trees the (key, value) pairs are only stored in the bottom (leaf) level of the tree. The internal nodes only store keys, which can be viewed as directional pointers to be followed down to the leaves. This technique leaves more space for the keys, allowing higher order trees and thus faster lookup times.
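To make the page-size argument concrete, the sketch below computes the largest order that fits in one page and the resulting worst-case number of node levels. Everything numeric here is an assumption for illustration (4096-byte pages, 8-byte keys, 8-byte pointers); `max_order` and `max_levels` are hypothetical helper names. The level bound comes straight from the B-tree definition: the emptiest legal tree has a root with 2 children and every other node with ⌈d/2⌉ children.

```python
def max_order(page_bytes, entry_bytes, child_ptr_bytes):
    """Largest d such that d - 1 entries and d child pointers fit in a page."""
    return (page_bytes + entry_bytes) // (entry_bytes + child_ptr_bytes)


def max_levels(n, d):
    """Most node levels a B-tree of order d can need to store n >= 1 keys."""
    m = -(-d // 2)             # ceil(d / 2): fewest children of a non-root node
    levels = 1
    # A tree with L levels stores at least 2 * m**(L - 1) - 1 keys.
    while 2 * m ** levels - 1 <= n:
        levels += 1
    return levels


PAGE, KEY, PTR = 4096, 8, 8    # assumed sizes in bytes, for illustration only

# B-tree node entry: key plus pointer to the value record.
b_tree_order = max_order(PAGE, KEY + PTR, PTR)
# B+ tree internal node entry: key only.
b_plus_internal_order = max_order(PAGE, KEY, PTR)
print("orders:", b_tree_order, b_plus_internal_order)

for d in (4, b_tree_order):
    print("order", d, "needs at most", max_levels(10**9, d), "levels for 10**9 keys")
```

Under these assumed sizes, dropping the value pointers from internal nodes raises the order that fits in a page, which is the effect the B+ tree description refers to.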
B+ tree

[Figure from https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html: a B+ tree whose subtrees cover the ranges [−∞, 7), [7, 13) and [13, ∞).]

Data is kept at the bottom (leaves). Internal node keys are only used for directional movement. Each internal node can be thought of as representing a range, with its subtree containing all leaves in that range. The leaves at the bottom can be linked together with pointers in sequential order.

B trees: Insertion

Recall that in a B tree of order d every non-root node has between ⌈d/2⌉ and d children and one less key than children, the root has at least 2 children unless it's a leaf, and all leaves appear on the same level. Note that if d is odd, ⌈d/2⌉ = (d + 1)/2.

Insertion is just a modification of 2-3-4 tree bottom-up insertion:
Insert the key into the appropriate (leaf) node v.
If v now has fewer than d keys (at most d children), stop.
If v now has d keys, split these keys into two nodes, each with (d − 1)/2 keys and (d + 1)/2 children, and a one-key node holding the middle key k_{(d+1)/2}. Push that middle key up into the parent u of v.

[Figure: v = (k_1, k_2, …, k_d) below u ⇒ u gains the key k_{(d+1)/2}, with the two nodes (k_1, …, k_{(d−1)/2}) and (k_{(d+3)/2}, …, k_d) below it. This maintains that all leaves are at the same height.]
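A minimal sketch of this insertion rule for odd d is given below, including the upward repetition of the split when the parent overflows in turn, exactly as for 2-3-4 trees. The names `BNode`, `btree_insert`, `walk` and `inorder` are hypothetical, keys are assumed distinct, and bottom-level nodes are represented by an empty child list.

```python
class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []        # empty list at the bottom level


def btree_insert(root, k, d):
    """Bottom-up insertion into a B-tree of odd order d; returns the root."""
    assert d % 2 == 1 and d >= 3
    path, node = [], root
    while node.children:
        i = sum(1 for key in node.keys if key < k)
        path.append((node, i))
        node = node.children[i]
    node.keys.append(k)
    node.keys.sort()

    while len(node.keys) == d:                # overflow: d keys
        mid = (d - 1) // 2                    # index of the middle key
        left = BNode(node.keys[:mid], node.children[:mid + 1])
        right = BNode(node.keys[mid + 1:], node.children[mid + 1:])
        up = node.keys[mid]
        if not path:                          # the root overflowed: new root
            return BNode([up], [left, right])
        node, i = path.pop()
        node.keys.insert(i, up)
        node.children[i:i + 1] = [left, right]
    return root


def walk(node, depth=0):
    """Yield (node, depth) for every node of the tree."""
    yield node, depth
    for child in node.children:
        yield from walk(child, depth + 1)


def inorder(node):
    """Return all keys in increasing order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


root = BNode([])
for key in range(1, 21):
    root = btree_insert(root, key, 5)
print(root.keys, sorted({depth for n, depth in walk(root) if not n.children}))
```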
[Figure: v = (k_1, k_2, …, k_d) below u ⇒ u gains the key k_{(d+1)/2}, with the two nodes (k_1, …, k_{(d−1)/2}) and (k_{(d+3)/2}, …, k_d) below it. This maintains that all leaves are at the same height.]

This split is legal because we assumed that d is odd: each of the two new nodes has (d + 1)/2 = ⌈d/2⌉ children, so it satisfies the lower bound on the number of children.
It is possible that pushing k_{(d+1)/2} up will cause an overflow in u, with u now having exactly d keys. If this happens, just recurse, the same as with 2-3-4 trees. This procedure must terminate with a legal B tree.

B trees: Deletion

Deletion is again just a modification of 2-3-4 tree deletion.
First, if the key is not in a leaf, swap it with its inorder successor. So we may assume the deletion removes a key from some leaf v.
If after this deletion v still has at least ⌈d/2⌉ children, do nothing.
If it has fewer than ⌈d/2⌉ children, i.e., ⌈d/2⌉ − 1, fix this using the same techniques as with 2-3-4 trees.

If v has an adjacent sibling w with exactly ⌈d/2⌉ children, we can modify the 2-3-4 fusion operation.
[Figure: u = (… x a y …) with adjacent children w and v separated by the key a ⇒ u = (… x y …) with a single merged child containing the keys of w, then a, then the keys of v.]
The merged node has ⌈d/2⌉ + (⌈d/2⌉ − 1) children, which is d when d is odd, so it is legal.
Note: This might propagate up if u now underflows!

If v does not have an adjacent sibling with exactly ⌈d/2⌉ children, we can modify the 2-3-4 transfer operation.
[Figure: u = (… x a y …) with children w = (k′_1, …, k′_t) and v = (k_1, k_2, …) ⇒ u = (… x k′_t y …) with children w = (k′_1, …, k′_{t−1}) and v = (a, k_1, k_2, …).]
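The fusion and transfer repairs can be sketched in one routine that works for any order d, with d = 4 giving the 2-3-4 case. This is an illustration under assumptions, not the lecture's code: `Node` and `fix_underflow` are hypothetical names, and the sketch tries a transfer first and falls back to fusion. The lecture lists fusion as Case 1; when v has siblings of both kinds, either repair produces a legal tree.

```python
import math


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []     # empty list at the bottom level


def fix_underflow(u, i, d=4):
    """Repair an underflow at child i of u; return the operation used."""
    min_keys = math.ceil(d / 2) - 1        # fewest keys a non-root node may hold
    v = u.children[i]
    left = u.children[i - 1] if i > 0 else None
    right = u.children[i + 1] if i + 1 < len(u.children) else None

    if left is not None and len(left.keys) > min_keys:
        v.keys.insert(0, u.keys[i - 1])    # key from u moves down into v
        u.keys[i - 1] = left.keys.pop()    # largest key of w moves up into u
        if left.children:                  # child of w closest to v moves over
            v.children.insert(0, left.children.pop())
        return "transfer"
    if right is not None and len(right.keys) > min_keys:
        v.keys.append(u.keys[i])
        u.keys[i] = right.keys.pop(0)
        if right.children:
            v.children.append(right.children.pop(0))
        return "transfer"

    # No sibling can spare a key: fuse children j and j + 1 of u.
    j = i - 1 if left is not None else i
    a, b = u.children[j], u.children[j + 1]
    a.keys += [u.keys.pop(j)] + b.keys     # separator key from u joins the merge
    a.children += b.children
    del u.children[j + 1]
    return "fusion"                        # u itself may now underflow


# The lecture's first fusion example: u = (9 14), children (2 5 7), (10), empty v.
u = Node([9, 14], [Node([2, 5, 7]), Node([10]), Node([])])
print(fix_underflow(u, 2), u.keys, [c.keys for c in u.children])
```

The caller is expected to check whether u underflows after a fusion and, if so, repeat the repair one level up (or drop an empty root).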
