Inf2003 Database Systems Indexing AY24/25 Trimester 1 Lecture Notes PDF

Document Details

ConscientiousDeciduousForest

Uploaded by ConscientiousDeciduousForest

Singapore Institute of Technology

2024

SIT

Zhang Wei

Tags

database systems indexing b+ tree data structures

Summary

These lecture notes cover database systems indexing, focusing on concepts like B+ trees and hashing, with examples and discussions, for the AY24/25 Trimester 1 student.

Full Transcript

INF2003: Database Systems Indexing AY24/25 Trimester 1 Zhang Wei SIT Oct. 22, 2024 About Me ▪ Email: [email protected] ▪ SIT Homepage: https://www.singaporetech.edu.sg/directory/faculty/wei-zhang ▪ Google Homepage: https://sites.google.com/view/sit-zhang-wei/ ▪ PhD in Computer Scienc...

INF2003: Database Systems Indexing AY24/25 Trimester 1 Zhang Wei SIT Oct. 22, 2024 About Me ▪ Email: [email protected] ▪ SIT Homepage: https://www.singaporetech.edu.sg/directory/faculty/wei-zhang ▪ Google Homepage: https://sites.google.com/view/sit-zhang-wei/ ▪ PhD in Computer Science from NTU. ▪ Research: applied AI + energy efficiency. ▪ Teaching database since 2019 ▪ Partly because I was involved in database tasks in A*STAR during 2017-19. ▪ Content was customized to diff. programmes, e.g., more in-depth for CS / SE. ▪ Now harmonized as INF2003; difficulty level lower. ▪ Looking after our Computing Science programme. ▪ One of SIT’s most popular programme and ~ half with 3.6+ GPA. ▪ Interested in inter-programme collaboration, activities, etc., email me. ▪ Interested in applied learning in my projects, welcome (but after this tri). 2nd Half Topics Indexing (mainly centralized) W8 Transactions and Concurrency W9 NoSQL W10 Blockchain W11 Data Warehouse W12 Files and File Organization ▪ 1 database, >= 1 tables. Data Files ▪ 1 table, >= 1 tuples, stored in data files. Page Page Page 1 2... n ▪ 1 file, >= data pages (a chunk of data file). ▪ Given a query requesting for certain data, DBMS figures out how to enable efficient access to the desired pages that contain the data. ▪ File organizations: ▪ Heap files (random order): scanning all records. ▪ Sorted files: good for order-sensitive data retrieval and range search. ▪ Indexes: additional data structures to optimize certain retrievals. ▪ Advantages (for a good file organization) – Fast. ▪ Speed up search. ▪ Fast updates. Discussion A. SELECT * FROM Students; CREATE INDEX index_name ON table_name (col_name) B. SELECT * FROM Students WHERE sid = 2001234; C. SELECT * FROM Students WHERE year >= 2013 AND year insert -> update. ▪ Insert the value into the leaf. Enough space? ▪ Ture: done. ▪ False: split the leaf into two new nodes. Redistribute evenly. Add parent node new key, i.e., duplicate the min value of the 2nd new node. ▪ Recursive execution. Level by level until no split is needed. ▪ Tree can grow, wider, deeper. Do not violate the min occupancy. ▪ Example: insert 8. ▪ Locate: which node? Insert: enough space? 14 20 27 33 root 2 3 5 7 14 16 20 24 27 29 33 34 38 39 B+ Tree: Insert 1) Split leaf -> insert 5 to parent. 14 20 27 33 root 2 3 5 7 8 14 16 20 24 27 29 33 34 38 39 2) Split parent. Done? 5 14 20 27 33 2 3 5 7 8 14 16 20 24 27 29 33 34 38 39 B+ Tree: Insert 3) New root. 20 root Tree grows. Wider. Deeper. 5 14 27 33 Be patient. 2 3 5 7 8 14 16 20 24 27 29 33 34 38 39 B+ Tree: Delete ▪ Locate -> remove -> low occupancy? ▪ True: done. ▪ False: try borrowing one from the nearest siblings; if cannot, merge. ▪ Recursive. Level by level (upwards) until no changes needed. Be patient. ▪ Tree can be smaller. 1) Delete 8. 20 root Will the tree be smaller? 5 14 27 33 2 3 5 7 8 14 16 20 24 27 29 33 34 38 39 B+ Tree: Delete 2) Delete 29. 20 root Will the tree be smaller? 5 14 27 33 2 3 5 7 14 16 20 24 27 29 33 34 38 39 Try to borrow if possible. 20 root Less structure changes. 5 14 27 34 2 3 5 7 14 16 20 24 27 33 34 38 39 B+ Tree: Delete 3) Delete 7. 20 root Will the tree be smaller? 5 14 27 34 2 3 5 7 14 16 20 24 27 33 34 38 39 Tree becomes smaller. 14 20 27 34 root Be patient until root. 2 3 5 14 16 20 24 27 33 34 38 39 A dynamic data structure that adjusts efficiently under inserts and deletes.

Use Quizgecko on...
Browser
Browser