Podcast
Questions and Answers
What is the primary goal of an index in the context of databases?
What is the primary goal of an index in the context of databases?
- To complicate the retrieval of data.
- To reorganize the physical storage of data.
- To slow down the retrieval of data.
- To provide speedy retrieval of data. (correct)
What are search keys also referred to as?
What are search keys also referred to as?
- Index Value
- Search Attribute
- Termed Value
- Termed Key (correct)
What does a search key consist of?
What does a search key consist of?
- Attributes from a different table
- The file path to the table
- Randomly generated values
- Attributes from the schema of the underlying table (correct)
Which type of index has a search key that is not the primary key for the underlying table?
Which type of index has a search key that is not the primary key for the underlying table?
Which of the following scenarios exemplifies a secondary-key index?
Which of the following scenarios exemplifies a secondary-key index?
What is a characteristic of a primary-key index?
What is a characteristic of a primary-key index?
Which of the following is NOT a choice of what to store in an index?
Which of the following is NOT a choice of what to store in an index?
In an index-based table, how many indexes can store the entire table?
In an index-based table, how many indexes can store the entire table?
When the table is sorted by the same attribute as the Search Key, this produces a comb-like shape which is an indication of what?
When the table is sorted by the same attribute as the Search Key, this produces a comb-like shape which is an indication of what?
What is a key characteristic of a clustered index?
What is a key characteristic of a clustered index?
What is a key difference between dense and sparse indexes?
What is a key difference between dense and sparse indexes?
If an un-clustered index is used for a range search, what is likely to occur?
If an un-clustered index is used for a range search, what is likely to occur?
In the context of query processing with un-clustered indexes, what is the advantage of collecting TIDs?
In the context of query processing with un-clustered indexes, what is the advantage of collecting TIDs?
What is a multi-dimensional index?
What is a multi-dimensional index?
What is the simple solution to realize a composite index?
What is the simple solution to realize a composite index?
What type of queries can a tree-based index on concatenated attributes A and B efficiently answer?
What type of queries can a tree-based index on concatenated attributes A and B efficiently answer?
Assume a table Enrolled (sid, cid, grade) and Scenario 2: Build two single-attribute indexes; one on A and one on B related to composite indexes vs. One-attribute indexes; which of the following is a strategy to execute the query that use the index of SID and CID to find the grade?
Assume a table Enrolled (sid, cid, grade) and Scenario 2: Build two single-attribute indexes; one on A and one on B related to composite indexes vs. One-attribute indexes; which of the following is a strategy to execute the query that use the index of SID and CID to find the grade?
When building single-attribute indexes for a composite query, which factor determines the best strategy?
When building single-attribute indexes for a composite query, which factor determines the best strategy?
What is a characteristic of ISAM?
What is a characteristic of ISAM?
Which operation does ISAM handle well?
Which operation does ISAM handle well?
In ISAM, what happens when data is skewed during inserts?
In ISAM, what happens when data is skewed during inserts?
What is a key feature of all nodes in a B-tree or B+-tree?
What is a key feature of all nodes in a B-tree or B+-tree?
What performance characteristic is associated with B-tree operations?
What performance characteristic is associated with B-tree operations?
What is the minimum occupancy guarantee for a B-tree node (except for the root)?
What is the minimum occupancy guarantee for a B-tree node (except for the root)?
Which of the following is specific to B+-trees compared to B-trees?
Which of the following is specific to B+-trees compared to B-trees?
Why are leaf nodes linked together in B+-trees?
Why are leaf nodes linked together in B+-trees?
How does B+-tree search begin?
How does B+-tree search begin?
After locating the leaf page in a B+-tree, how is data within the leaf page typically accessed?
After locating the leaf page in a B+-tree, how is data within the leaf page typically accessed?
In a B+-tree insert operation, what happens if the target leaf node is full?
In a B+-tree insert operation, what happens if the target leaf node is full?
What is the first consideration when deleting from B+-tree?
What is the first consideration when deleting from B+-tree?
What happens when a B+-tree leaf node is exactly half full during a deletion?
What happens when a B+-tree leaf node is exactly half full during a deletion?
During B+-tree deletion, what initiates a merge of leaf nodes?
During B+-tree deletion, what initiates a merge of leaf nodes?
What is the first step in bulk loading data into a B+-tree?
What is the first step in bulk loading data into a B+-tree?
What occupancy level should be used when bulk loading a B+-tree?
What occupancy level should be used when bulk loading a B+-tree?
What performance benefit does bulk loading provide over standard insertion into a B+-tree?
What performance benefit does bulk loading provide over standard insertion into a B+-tree?
What is Bulk Loading?
What is Bulk Loading?
Besides speeding up data retrieval, what other purposes can B+-tree indexes serve?
Besides speeding up data retrieval, what other purposes can B+-tree indexes serve?
Why are tree indexes popular?
Why are tree indexes popular?
For what usage scenarios is the B-tree particularly well-suited?
For what usage scenarios is the B-tree particularly well-suited?
What is the overall time complexity for operations performed on a B-tree?
What is the overall time complexity for operations performed on a B-tree?
What is the expected output of an index, given one or more values, or a range of values, as input?
What is the expected output of an index, given one or more values, or a range of values, as input?
If an index's search key consists of the primary key of the underlying table, how is the index classified?
If an index's search key consists of the primary key of the underlying table, how is the index classified?
Under what circumstance is the Search key of the index is NOT the primary key for the underlying table?
Under what circumstance is the Search key of the index is NOT the primary key for the underlying table?
Which of the following is true when considering what an index stores?
Which of the following is true when considering what an index stores?
In an index that contains (Key Value, Tuple-Identifier) pairs, what do the tuple identifiers do?
In an index that contains (Key Value, Tuple-Identifier) pairs, what do the tuple identifiers do?
What is indicated when comb-like connections are observed between the index and the table?
What is indicated when comb-like connections are observed between the index and the table?
In the context of sparse indexes, what does an index entry typically point to?
In the context of sparse indexes, what does an index entry typically point to?
Which of the following is true about index-based tables?
Which of the following is true about index-based tables?
How does a clustered index affect the physical ordering of a table?
How does a clustered index affect the physical ordering of a table?
In the context of database indexes, what is the primary difference between a dense index and a sparse index?
In the context of database indexes, what is the primary difference between a dense index and a sparse index?
What is a notable disadvantage of using un-clustered indexes?
What is a notable disadvantage of using un-clustered indexes?
How can the cost of query processing for un-clustered indexes be addressed?
How can the cost of query processing for un-clustered indexes be addressed?
What is the result of concatenating sid and cid?
What is the result of concatenating sid and cid?
If a database has a table 'Employees' with attributes 'department' and 'salary', which query would benefit most from a composite index on ('department', 'salary')?
If a database has a table 'Employees' with attributes 'department' and 'salary', which query would benefit most from a composite index on ('department', 'salary')?
What benefit can be derived from using a secondary key index that stores a Key Value and Set of Tuple-identifiers?
What benefit can be derived from using a secondary key index that stores a Key Value and Set of Tuple-identifiers?
When using single-attribute indexes for a composite query, which strategy is the best?
When using single-attribute indexes for a composite query, which strategy is the best?
How can building multiple one-attribute indexes be advantageous over building a composite index?
How can building multiple one-attribute indexes be advantageous over building a composite index?
What happens to index performance in an ISAM (Indexed Sequential Access Method) system when data is skewed during inserts?
What happens to index performance in an ISAM (Indexed Sequential Access Method) system when data is skewed during inserts?
What is the consequence of inserting a large amount of data into an ISAM index, leading to long chains of overflow pages?
What is the consequence of inserting a large amount of data into an ISAM index, leading to long chains of overflow pages?
What is a common feature exhibited by nodes in a B-tree and B+-tree?
What is a common feature exhibited by nodes in a B-tree and B+-tree?
How does performance scale with the number of disk pages at the leaf level in a B-tree?
How does performance scale with the number of disk pages at the leaf level in a B-tree?
Which aspect distinguishes a B+-tree from a B-tree?
Which aspect distinguishes a B+-tree from a B-tree?
Why does a B+-tree link leaf nodes together?
Why does a B+-tree link leaf nodes together?
In a B+-tree, where does data exist?
In a B+-tree, where does data exist?
What is one of the first considerations during the deletion of a B+-tree?
What is one of the first considerations during the deletion of a B+-tree?
In the context of bulk loading a B+-tree, what dictates the order for adding data entries?
In the context of bulk loading a B+-tree, what dictates the order for adding data entries?
What happens to the tree height if the root is full and root splits into two?
What happens to the tree height if the root is full and root splits into two?
What happens to the rest of the nodes in a B+-tree after deletion?
What happens to the rest of the nodes in a B+-tree after deletion?
In B+-tree indexes: Testing for Integrity Constraints, is it true or false that you could check if key of a table is unique, system can construct a B+-tree index to check quickly for this integrity constraint condition?
In B+-tree indexes: Testing for Integrity Constraints, is it true or false that you could check if key of a table is unique, system can construct a B+-tree index to check quickly for this integrity constraint condition?
Which of the following is true about ISAM?
Which of the following is true about ISAM?
What is the initial strategy for handling inserts?
What is the initial strategy for handling inserts?
In what usage scenarios is ISAM particularly well-suited?
In what usage scenarios is ISAM particularly well-suited?
Of the options, which one is the main feature of the B-tree family?
Of the options, which one is the main feature of the B-tree family?
Which of the following is the correct Big O representation for the B-tree family of indexes insert, deletes, updates, and search?
Which of the following is the correct Big O representation for the B-tree family of indexes insert, deletes, updates, and search?
Assume one performed operations such that there were an odd number of children, what would be the correct way to determine which key to copy to the parent?
Assume one performed operations such that there were an odd number of children, what would be the correct way to determine which key to copy to the parent?
Flashcards
Target of an index
Target of an index
The target of an index is to provide speedy retrieval of data from an underlying table.
Primary-key index
Primary-key index
The search key of the index is the primary key for the underlying table
Secondary-key index
Secondary-key index
Search key of the index is NOT the primary key for the underlying table
Clustered index
Clustered index
Signup and view all the flashcards
Un-clustered index
Un-clustered index
Signup and view all the flashcards
Dense index
Dense index
Signup and view all the flashcards
Sparse index
Sparse index
Signup and view all the flashcards
Composite index
Composite index
Signup and view all the flashcards
Tree-based index on AB
Tree-based index on AB
Signup and view all the flashcards
Hash-based index on AB
Hash-based index on AB
Signup and view all the flashcards
ISAM
ISAM
Signup and view all the flashcards
B-tree or B+-tree is balanced
B-tree or B+-tree is balanced
Signup and view all the flashcards
How many keys can a B-tree node store?
How many keys can a B-tree node store?
Signup and view all the flashcards
Searching in B+-tree
Searching in B+-tree
Signup and view all the flashcards
Invariant of B+-tree
Invariant of B+-tree
Signup and view all the flashcards
Study Notes
- Indexes target speedy data retrieval from a table.
- Inputs may be a single value, multiple values (search key), or a range of values.
- Outputs are tuples from the index with matching values, or within the specified range.
- Search Key refers to one or more attributes from the underlying table's schema.
Classifications and Properties
Primary vs. Secondary Indexes
- Primary-key index search key is the primary key for the data table.
- Secondary-key index differs as its search key is not the table's primary key.
- Multiple tuples in the table can share the same search key value.
- GPA index of students is an example of a secondary-key index.
- Multiple students can have the same GPA and multiple tids of students will match the same GPA.
- Student-id index is a primary-key index.
- Only one tuple exists for a given sid; one tid has that value.
Relationship Between Index and Underlying Table
- Three storage choices exist for the index: store the entire table (index-based tables), store key value with tuple-identifier pairs, or store key value with a set of tuple-identifiers.
- Index-based tables contain the entire table stored within the index and it generally uses hash and B-tree tables .
- Best used if the Search Key for the index corresponds to the primary key for the table, or is a unique key and tuples are stored only once inside the index.
- Can be accessed in log time
- Hash-based tables buckets contain the entire table and a hash function produces the bucket (disk page) containing the entire tuple from a tuple's key.
- Tables can store to one index, but multiple indexes can exist.
- If the table is sorted by the same attribute as the Search Key, it has comb-like connections between the index and the table, and indicates a clustered index.
- A leaf level contains a key and tuple-identifier pair.
- If an index contains a Key Value and Tuple-identifiers, a second key index, such as key on GPA, is suitable where values are repeated to save on storage.
- Variable-length records must have support.
- In contrast to fixed-length records of Key-value and tid, fixed length is an exception when the key-value is a string.
Clustered vs. Un-clustered Indexes
- Clustered index: Table is sorted based on the same attribute as that of the index.
- Un-clustered index: Table is sorted based on a different attribute compared to that of the index.
- Range Search operations will exhibit contiguous tuples within the range for a clustered index.
- Clustered indexes significantly reduce I/Os, whereas un-clustered indexes don't.
- Clustered index I/Os in Range Search= Ceiling(Nr/DC). Number of I/Os ~Nr for un-clustered indexes. Nr is number of tuples in range, DC is data page capacity.
Dense vs. Sparse Indexes
- Comb-like connections exist between index and table when a table is sorted on the same key as the index.
- Indexes can either have the index point to only the first tuple in the page, or every tuple.
- They give rise to sparse vs. dense indexes.
- Dense index: Has an entry in the index for every tuple in the table.
- Sparse index: Has an entry in the index per page in the table.
- Sparse indexes result in a smaller-size index.
Range Search Using Un-clustered Indexes
- Range Search operations will exhibit contiguous tuples within the range for a clustered index.
- I/Os are reduced.
- Inevitable problem: tables can have one clustered index, requiring others to be unclustered.
- If the number of tuples qualifying a range is small, e.g., 1, no clustering is needed.
- For one-tuple result, clustered and un-clustered operations costs are the same (1 I/O).
- Otherwise, collect set of tids qualifying the range query, without retrieving the tuples where tid = (Page-id, slot-id).
- Sort the tids based on Page-id.
- Retrieve each page once and retrieve all the qualifying tuples from corresponding slot-ids for some saving.
Multi-dimensional Indexes: Composite Attributes
- Question: Can an index be realized when the search key is more than one attribute?
- Composite or Multi-dimensional Index is a possibility.
- For example, the Enrolled table has (sid, cid, grade).
- An index can be made on (sid,cid) can be made.
- The index can answer queries on sid,cid, such as "find grade of sid=0111 in cid=CS580" .
- The index may also answer queries on sid only like finding courses taken by sid=0111.
- Depends on how the composite index is constructed and structured.
- How a composite index is constructed/realized is something which varies.
How to Realize Composite/Multi-Dimensional Indexes?
- There can be multiple ways to build a composite/Multi-Dimensional index with various costs and query capabilities.
- Simple solution: Concatenate the two attribute values together and build a hash or tree-based index. For example, concatenate the sid and cid.
- Predicates that can be answered "efficiently" depends on the type of index used.
- Tree-based index on AB: Can provide Equality on AB, Equality on A only, Range predicate on A and Range predicate on B, Range on only A.
- Hash-based index on AB: Only delivers equality on AB unless hash supports order-preserving property. It can only support ranges on A and B or on A only if order-preserving.
Composite indexes vs. One-attribute indexes
- Scenario 1: create a composite index on AB (the concatenation of A and B).
- Scenario 2: Build two single-attribute indexes; one on A and one on B.
- In Scenario 1, the composite index may be used on sid,cid.
- Scenario 2 offers multiple strategies to answer a query.
- Strategy 1 uses the index of sid.
- Find the qualifying tids.
- Given a tid, retrieve the corresponding tuple and check the cid value on the fly.
- If there is a match, report this tuple as output and otherwise discard it.
- Strategy 2 performs similar steps with the index of cid.
- Strategy 3: Use the index of sid to find the qualifying tids
- Store these tids in Set Ssid
- Use the index of cid to find the qualifying tids
- Store these tids in Set Scid
- Intersect both sets Ssid and Scid to find the tids of the tuples that qualify both predicates
- The strategy that ends up retrieving the least number of tuples should be used (the one with high selectivity).
Tree-based indexing
- ISAM: Indexed Sequential Access Method.
- The B-tree
Tree-based Index: Structure
- Three choices exist on what to store in the index: the entire table (Index-based Tables), Key value with Tuple-identifier pairs, or Key value with Set of Tuple-identifiers.
- Data is entirely sorted in the data level and Data pages are linked to each other.
- Extend to multiple levels/from Root level to leaf level/good for storing static data.
ISAM: Indexed Sequential Access Method
- Approach 1 is to leave gaps in pages while building the index. Eventually, gaps will fill up with data.
- Approach 2 employs overflow pages.
- Skewed data leads to long chains of overflow pages. Index performance degrades to that of a linked list (O(n)). Rebuild index periodically.
The B-tree Index
- Main features of the B-tree family of indexes include each node is a disk page.
- B-tree or B+-tree is balanced, each leaf node is at the same height starting from the root.
- Exhibits O(Log N) performance for inserts, deletes, updates, and searches (N = # of disk pages at the leaf level).
- Guarantee that each B-tree node (except root) is at least 50% full.
- Can hold up to 2D keys and 2D+1 Children, referred to as the Fanout.
- Each B-tree node can store up to m key ways, D <= m <= 2D keys except the root.
- Leaf nodes are linked together using a doubly linked list (to facilitate range search).
- Data is either anywhere in the tree or at the leaf level.
- Non-leaf nodes either are routing keys or have data.
- With data at the leaf level, the Nodes exhibit routing keys that are a copy of data.
- B+-tree is used by industrial-strength database engines.
- B+-trees are more suitable for range search and is the focus.
- Each node guarantees to be at least 50% full (except root).
The B+-Tree Node Structure
- Has an array of values, child subtree pointers.
- Leaf Nodes - are all doubly linked together to facilitate depending on the Choice and Key values are sorted within the page. Can contain Prev. Pg. Ptr., Next Pg. Ptr. and Free Space Ptr.
Operations on the B+-tree
- Search
- Insert
- Delete
B+-tree Search
- If given Key Value k, search retrievals are wanted if the Tuple(s) contains k.
- Perform Tree starting at the root and navigate to the leaf node should also contain that element
- Conduct the operations within leaf page, for Key k Data is sorted by key
B+-tree Insert
- If given Key Value k and Tuple exists (or k & tid) that search retrievals should insert the Element into B+- tree.
- Case 1, when element Search the tree starting at the root, until the leaf node is located
- Case 2 occurs if not enough capacity exists and insert into located place.
- This case leads to Q evenly splitting to 2 sublings
- Recursively split until the root happens where, the tree adds.
B+-tree Delete
- If given Key Value, Tuplet & tuple tid, remove it from the B+-tree
- Case 1: Delete based on having capacity, and otherwise, it needs to borrow
- Case 2 (Q is slightly off): Deleing the Tuple that makes it 50% full and node is needed
- Find Sublin, and borrow
- Case 3: merge Q with its Subling.
Bulk Loading of Data into the B+-tree
- Building the tree is done from bottom up.
- Sort the data at the leaf level, Partition the data in a way that will be compatible.
- Allowing is a way to help occupancy.
- Build Level next up (one at a time) where, Repeat for all levels, where results in Page levels that occur faster.
Other Uses of B+-tree Indexes: Testing for Integrity Constraints
- It can be checked to determine if Keys of a table are present.
- If keys of the foreign table of the first table exists.
Summary of Tree Indexing
- Tree indexes are popular
- ISAM predates the B-tree and mostly works well for indexing static data
- The B-tree supports indexing of dynamic data sets
- The B-tree is both theoretically and practically appealing
- The B-tree is the gold standard for tree indexing in industrial database systems
- The B-tree is theoretically optimal for external memory data structures
- Can perform search, insert, and delete operations in O(log N) time, where N is the number of pages/blocks
- Many adaptations, extensions, and uses of the B-tree exist to support
- Multi-dimensional data, high-dimensional multimedia data, variable length keys, bulk insertion, bulk deletion, query processing, and integrity constraints checking
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.