Indexes: Primary vs. Secondary Keys

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What is the primary goal of an index in the context of databases?

  • To complicate the retrieval of data.
  • To reorganize the physical storage of data.
  • To slow down the retrieval of data.
  • To provide speedy retrieval of data. (correct)

What are search keys also referred to as?

  • Index Value
  • Search Attribute
  • Termed Value
  • Termed Key (correct)

What does a search key consist of?

  • Attributes from a different table
  • The file path to the table
  • Randomly generated values
  • Attributes from the schema of the underlying table (correct)

Which type of index has a search key that is not the primary key for the underlying table?

<p>Secondary-key index (D)</p> Signup and view all the answers

Which of the following scenarios exemplifies a secondary-key index?

<p>Index on GPA over the students table. (B)</p> Signup and view all the answers

What is a characteristic of a primary-key index?

<p>Has only one tuple for a given search key. (C)</p> Signup and view all the answers

Which of the following is NOT a choice of what to store in an index?

<p>Links to external resources (A)</p> Signup and view all the answers

In an index-based table, how many indexes can store the entire table?

<p>Only one index (B)</p> Signup and view all the answers

When the table is sorted by the same attribute as the Search Key, this produces a comb-like shape which is an indication of what?

<p>Clustered index (B)</p> Signup and view all the answers

What is a key characteristic of a clustered index?

<p>Table is sorted based on the same attribute as that of the index (C)</p> Signup and view all the answers

What is a key difference between dense and sparse indexes?

<p>Dense index has an entry in the index per tuple, while sparse index has an entry per page (C)</p> Signup and view all the answers

If an un-clustered index is used for a range search, what is likely to occur?

<p>Tuples in the range will end up being in separate pages. (C)</p> Signup and view all the answers

In the context of query processing with un-clustered indexes, what is the advantage of collecting TIDs?

<p>It allows retrieving a page only once for multiple qualifying tuples and reduces cost. (B)</p> Signup and view all the answers

What is a multi-dimensional index?

<p>An index where the search key is more than one attribute. (D)</p> Signup and view all the answers

What is the simple solution to realize a composite index?

<p>Concatenate the two attribute values together and build a hash or tree-based index (A)</p> Signup and view all the answers

What type of queries can a tree-based index on concatenated attributes A and B efficiently answer?

<p>Equality on AB, Equality on A only, Range predicate on A and Range predicate on B (D)</p> Signup and view all the answers

Assume a table Enrolled (sid, cid, grade) and Scenario 2: Build two single-attribute indexes; one on A and one on B related to composite indexes vs. One-attribute indexes; which of the following is a strategy to execute the query that use the index of SID and CID to find the grade?

<p>Use the index of SID to find the qualifying TIDs and then retrieve the corresponding tuple and check the CID value on the fly (B)</p> Signup and view all the answers

When building single-attribute indexes for a composite query, which factor determines the best strategy?

<p>The one that ends up retrieving the least number of tuples (i.e., the one with high selectivity) (A)</p> Signup and view all the answers

What is a characteristic of ISAM?

<p>Extend to multiple levels and Good for static data (C)</p> Signup and view all the answers

Which operation does ISAM handle well?

<p>Range queries (B)</p> Signup and view all the answers

In ISAM, what happens when data is skewed during inserts?

<p>Index performance degrades to that of a linked list O(n) (B)</p> Signup and view all the answers

What is a key feature of all nodes in a B-tree or B+-tree?

<p>Each node is a disk page (C)</p> Signup and view all the answers

What performance characteristic is associated with B-tree operations?

<p>O(log N) (D)</p> Signup and view all the answers

What is the minimum occupancy guarantee for a B-tree node (except for the root)?

<p>At least 50% full (C)</p> Signup and view all the answers

Which of the following is specific to B+-trees compared to B-trees?

<p>Data only at the leaf level (A)</p> Signup and view all the answers

Why are leaf nodes linked together in B+-trees?

<p>To facilitate range search (C)</p> Signup and view all the answers

How does B+-tree search begin?

<p>At the root node (B)</p> Signup and view all the answers

After locating the leaf page in a B+-tree, how is data within the leaf page typically accessed?

<p>Binary search (A)</p> Signup and view all the answers

In a B+-tree insert operation, what happens if the target leaf node is full?

<p>The node is split evenly into two siblings (D)</p> Signup and view all the answers

What is the first consideration when deleting from B+-tree?

<p>Every node is guaranteed to continuously be at least 50% full (B)</p> Signup and view all the answers

What happens when a B+-tree leaf node is exactly half full during a deletion?

<p>The node borrows an item from a sibling node. (B)</p> Signup and view all the answers

During B+-tree deletion, what initiates a merge of leaf nodes?

<p>When adjacent sibling leaf nodes are both exactly half full (D)</p> Signup and view all the answers

What is the first step in bulk loading data into a B+-tree?

<p>Sort the data at the leaf level (D)</p> Signup and view all the answers

What occupancy level should be used when bulk loading a B+-tree?

<p>Any amount of desired occupancy (e.g., 80% full nodes) (C)</p> Signup and view all the answers

What performance benefit does bulk loading provide over standard insertion into a B+-tree?

<p>Allows page-level inserts and hence faster than tuple-level inserts. (C)</p> Signup and view all the answers

What is Bulk Loading?

<p>Propagate the key of each page in the leaf level to the Non-leaf level (B)</p> Signup and view all the answers

Besides speeding up data retrieval, what other purposes can B+-tree indexes serve?

<p>Testing for integrity constraints (C)</p> Signup and view all the answers

Why are tree indexes popular?

<p>The B-tree is both theoretically and practically appealing (A)</p> Signup and view all the answers

For what usage scenarios is the B-tree particularly well-suited?

<p>Indexing dynamic data sets (B)</p> Signup and view all the answers

What is the overall time complexity for operations performed on a B-tree?

<p>O(log N) (A)</p> Signup and view all the answers

What is the expected output of an index, given one or more values, or a range of values, as input?

<p>Tuples from the underlying table that match the given values or fall within the specified range. (C)</p> Signup and view all the answers

If an index's search key consists of the primary key of the underlying table, how is the index classified?

<p>Primary-key index. (B)</p> Signup and view all the answers

Under what circumstance is the Search key of the index is NOT the primary key for the underlying table?

<p>Secondary-key index (D)</p> Signup and view all the answers

Which of the following is true when considering what an index stores?

<p>Indexes can store the entire table, key-value tuple identifier pairs, or a key value set of tuple identifiers (D)</p> Signup and view all the answers

In an index that contains (Key Value, Tuple-Identifier) pairs, what do the tuple identifiers do?

<p>Point to the tuples in the underlying table. (C)</p> Signup and view all the answers

What is indicated when comb-like connections are observed between the index and the table?

<p>A clustered index is being used. (A)</p> Signup and view all the answers

In the context of sparse indexes, what does an index entry typically point to?

<p>The first tuple in each page of the table. (A)</p> Signup and view all the answers

Which of the following is true about index-based tables?

<p>A table can be conceptually indexed multiple times, but stored in only one index. (B)</p> Signup and view all the answers

How does a clustered index affect the physical ordering of a table?

<p>Sorts the table based on the same attribute as the index. (C)</p> Signup and view all the answers

In the context of database indexes, what is the primary difference between a dense index and a sparse index?

<p>A dense index has an entry for every record in the file, whereas a sparse index has entries for a subset of records. (C)</p> Signup and view all the answers

What is a notable disadvantage of using un-clustered indexes?

<p>Sorting the table by a different attribute can lead to non-contiguous range search results. (D)</p> Signup and view all the answers

How can the cost of query processing for un-clustered indexes be addressed?

<p>Collecting and sorting TIDs (Tuple Identifiers) to minimize page retrievals. (C)</p> Signup and view all the answers

What is the result of concatenating sid and cid?

<p>Multi-dimensional Index (D)</p> Signup and view all the answers

If a database has a table 'Employees' with attributes 'department' and 'salary', which query would benefit most from a composite index on ('department', 'salary')?

<p>SELECT * FROM Employees WHERE department = 'Sales' AND salary &gt; 50000; (C)</p> Signup and view all the answers

What benefit can be derived from using a secondary key index that stores a Key Value and Set of Tuple-identifiers?

<p>It can save on storage space when multiple records have the same key value. (D)</p> Signup and view all the answers

When using single-attribute indexes for a composite query, which strategy is the best?

<p>The one that ends up retrieving the least tuples (i.e., the one with high selectivity) (C)</p> Signup and view all the answers

How can building multiple one-attribute indexes be advantageous over building a composite index?

<p>Offers multiple strategies to answer the query. (B)</p> Signup and view all the answers

What happens to index performance in an ISAM (Indexed Sequential Access Method) system when data is skewed during inserts?

<p>Index performance degrades to that of a linked list. (A)</p> Signup and view all the answers

What is the consequence of inserting a large amount of data into an ISAM index, leading to long chains of overflow pages?

<p>The index becomes fragmented, and search performance degrades. (D)</p> Signup and view all the answers

What is a common feature exhibited by nodes in a B-tree and B+-tree?

<p>Every node is a disk page. (B)</p> Signup and view all the answers

How does performance scale with the number of disk pages at the leaf level in a B-tree?

<p>Logarithmic. (D)</p> Signup and view all the answers

Which aspect distinguishes a B+-tree from a B-tree?

<p>Data is located only at the leaf level. (D)</p> Signup and view all the answers

Why does a B+-tree link leaf nodes together?

<p>To allow sequential data retrieval. (C)</p> Signup and view all the answers

In a B+-tree, where does data exist?

<p>Data only goes at the leaf level (D)</p> Signup and view all the answers

What is one of the first considerations during the deletion of a B+-tree?

<p>Whether Q is greater than half full (D)</p> Signup and view all the answers

In the context of bulk loading a B+-tree, what dictates the order for adding data entries?

<p>Data entries need to be sorted first. (A)</p> Signup and view all the answers

What happens to the tree height if the root is full and root splits into two?

<p>Tree height will increase by 1. (A)</p> Signup and view all the answers

What happens to the rest of the nodes in a B+-tree after deletion?

<p>Every node is guaranteed to at least 50% full (except root) (B)</p> Signup and view all the answers

In B+-tree indexes: Testing for Integrity Constraints, is it true or false that you could check if key of a table is unique, system can construct a B+-tree index to check quickly for this integrity constraint condition?

<p>True (B)</p> Signup and view all the answers

Which of the following is true about ISAM?

<p>ISAM predates the B-tree and mostly works well for indexing static data (C)</p> Signup and view all the answers

What is the initial strategy for handling inserts?

<p>Leave gaps in pages while building the index (D)</p> Signup and view all the answers

In what usage scenarios is ISAM particularly well-suited?

<p>Data that is primarily static and requires fast read access. (A)</p> Signup and view all the answers

Of the options, which one is the main feature of the B-tree family?

<p>Each node in the B-tree or B+-tree is a disk page (B)</p> Signup and view all the answers

Which of the following is the correct Big O representation for the B-tree family of indexes insert, deletes, updates, and search?

<p>O(Log N) (B)</p> Signup and view all the answers

Assume one performed operations such that there were an odd number of children, what would be the correct way to determine which key to copy to the parent?

<p>Pick the key closest to the middle (C)</p> Signup and view all the answers

Flashcards

Target of an index

The target of an index is to provide speedy retrieval of data from an underlying table.

Primary-key index

The search key of the index is the primary key for the underlying table

Secondary-key index

Search key of the index is NOT the primary key for the underlying table

Clustered index

Table is sorted based on the same attribute as that of the index.

Signup and view all the flashcards

Un-clustered index

Table is sorted based on a different attribute than that of the index or not sorted at all.

Signup and view all the flashcards

Dense index

Entry in the index per tuple in the table.

Signup and view all the flashcards

Sparse index

Entry in the index per page in the table. Results in a smaller-size index

Signup and view all the flashcards

Composite index

An index when the search key is more than one attribute

Signup and view all the flashcards

Tree-based index on AB

Equality on AB, Equality on A only, Range predicate on A and Range predicate on B, Range on only A

Signup and view all the flashcards

Hash-based index on AB

Only equality on AB unless hash supports order-preserving property

Signup and view all the flashcards

ISAM

An indexing method that is good for static data

Signup and view all the flashcards

B-tree or B+-tree is balanced

The B-tree or B+-tree is balanced, i.e., each leaf node is at the same height starting from the root.

Signup and view all the flashcards

How many keys can a B-tree node store?

Each B-tree node can store up to m key ways, D <= m <= 2D keys except the root

Signup and view all the flashcards

Searching in B+-tree

Search the tree starting at the root and navigate down until we reach the leaf node that should contain k

Signup and view all the flashcards

Invariant of B+-tree

Every node is guaranteed to continuously be at least 50% full (except root)

Signup and view all the flashcards

Study Notes

  • Indexes target speedy data retrieval from a table.
  • Inputs may be a single value, multiple values (search key), or a range of values.
  • Outputs are tuples from the index with matching values, or within the specified range.
  • Search Key refers to one or more attributes from the underlying table's schema.

Classifications and Properties

Primary vs. Secondary Indexes

  • Primary-key index search key is the primary key for the data table.
  • Secondary-key index differs as its search key is not the table's primary key.
  • Multiple tuples in the table can share the same search key value.
  • GPA index of students is an example of a secondary-key index.
  • Multiple students can have the same GPA and multiple tids of students will match the same GPA.
  • Student-id index is a primary-key index.
  • Only one tuple exists for a given sid; one tid has that value.

Relationship Between Index and Underlying Table

  • Three storage choices exist for the index: store the entire table (index-based tables), store key value with tuple-identifier pairs, or store key value with a set of tuple-identifiers.
  • Index-based tables contain the entire table stored within the index and it generally uses hash and B-tree tables .
  • Best used if the Search Key for the index corresponds to the primary key for the table, or is a unique key and tuples are stored only once inside the index.
  • Can be accessed in log time
  • Hash-based tables buckets contain the entire table and a hash function produces the bucket (disk page) containing the entire tuple from a tuple's key.
  • Tables can store to one index, but multiple indexes can exist.
  • If the table is sorted by the same attribute as the Search Key, it has comb-like connections between the index and the table, and indicates a clustered index.
  • A leaf level contains a key and tuple-identifier pair.
  • If an index contains a Key Value and Tuple-identifiers, a second key index, such as key on GPA, is suitable where values are repeated to save on storage.
  • Variable-length records must have support.
  • In contrast to fixed-length records of Key-value and tid, fixed length is an exception when the key-value is a string.

Clustered vs. Un-clustered Indexes

  • Clustered index: Table is sorted based on the same attribute as that of the index.
  • Un-clustered index: Table is sorted based on a different attribute compared to that of the index.
  • Range Search operations will exhibit contiguous tuples within the range for a clustered index.
  • Clustered indexes significantly reduce I/Os, whereas un-clustered indexes don't.
  • Clustered index I/Os in Range Search= Ceiling(Nr/DC). Number of I/Os ~Nr for un-clustered indexes. Nr is number of tuples in range, DC is data page capacity.

Dense vs. Sparse Indexes

  • Comb-like connections exist between index and table when a table is sorted on the same key as the index.
  • Indexes can either have the index point to only the first tuple in the page, or every tuple.
  • They give rise to sparse vs. dense indexes.
  • Dense index: Has an entry in the index for every tuple in the table.
  • Sparse index: Has an entry in the index per page in the table.
  • Sparse indexes result in a smaller-size index.

Range Search Using Un-clustered Indexes

  • Range Search operations will exhibit contiguous tuples within the range for a clustered index.
  • I/Os are reduced.
  • Inevitable problem: tables can have one clustered index, requiring others to be unclustered.
  • If the number of tuples qualifying a range is small, e.g., 1, no clustering is needed.
  • For one-tuple result, clustered and un-clustered operations costs are the same (1 I/O).
  • Otherwise, collect set of tids qualifying the range query, without retrieving the tuples where tid = (Page-id, slot-id).
  • Sort the tids based on Page-id.
  • Retrieve each page once and retrieve all the qualifying tuples from corresponding slot-ids for some saving.

Multi-dimensional Indexes: Composite Attributes

  • Question: Can an index be realized when the search key is more than one attribute?
  • Composite or Multi-dimensional Index is a possibility.
  • For example, the Enrolled table has (sid, cid, grade).
  • An index can be made on (sid,cid) can be made.
  • The index can answer queries on sid,cid, such as "find grade of sid=0111 in cid=CS580" .
  • The index may also answer queries on sid only like finding courses taken by sid=0111.
  • Depends on how the composite index is constructed and structured.
  • How a composite index is constructed/realized is something which varies.

How to Realize Composite/Multi-Dimensional Indexes?

  • There can be multiple ways to build a composite/Multi-Dimensional index with various costs and query capabilities.
  • Simple solution: Concatenate the two attribute values together and build a hash or tree-based index. For example, concatenate the sid and cid.
  • Predicates that can be answered "efficiently" depends on the type of index used.
  • Tree-based index on AB: Can provide Equality on AB, Equality on A only, Range predicate on A and Range predicate on B, Range on only A.
  • Hash-based index on AB: Only delivers equality on AB unless hash supports order-preserving property. It can only support ranges on A and B or on A only if order-preserving.

Composite indexes vs. One-attribute indexes

  • Scenario 1: create a composite index on AB (the concatenation of A and B).
  • Scenario 2: Build two single-attribute indexes; one on A and one on B.
  • In Scenario 1, the composite index may be used on sid,cid.
  • Scenario 2 offers multiple strategies to answer a query.
  • Strategy 1 uses the index of sid.
  • Find the qualifying tids.
  • Given a tid, retrieve the corresponding tuple and check the cid value on the fly.
  • If there is a match, report this tuple as output and otherwise discard it.
  • Strategy 2 performs similar steps with the index of cid.
  • Strategy 3: Use the index of sid to find the qualifying tids
  • Store these tids in Set Ssid
  • Use the index of cid to find the qualifying tids
  • Store these tids in Set Scid
  • Intersect both sets Ssid and Scid to find the tids of the tuples that qualify both predicates
  • The strategy that ends up retrieving the least number of tuples should be used (the one with high selectivity).

Tree-based indexing

  • ISAM: Indexed Sequential Access Method.
  • The B-tree

Tree-based Index: Structure

  • Three choices exist on what to store in the index: the entire table (Index-based Tables), Key value with Tuple-identifier pairs, or Key value with Set of Tuple-identifiers.
  • Data is entirely sorted in the data level and Data pages are linked to each other.
  • Extend to multiple levels/from Root level to leaf level/good for storing static data.

ISAM: Indexed Sequential Access Method

  • Approach 1 is to leave gaps in pages while building the index. Eventually, gaps will fill up with data.
  • Approach 2 employs overflow pages.
  • Skewed data leads to long chains of overflow pages. Index performance degrades to that of a linked list (O(n)). Rebuild index periodically.

The B-tree Index

  • Main features of the B-tree family of indexes include each node is a disk page.
  • B-tree or B+-tree is balanced, each leaf node is at the same height starting from the root.
  • Exhibits O(Log N) performance for inserts, deletes, updates, and searches (N = # of disk pages at the leaf level).
  • Guarantee that each B-tree node (except root) is at least 50% full.
  • Can hold up to 2D keys and 2D+1 Children, referred to as the Fanout.
  • Each B-tree node can store up to m key ways, D <= m <= 2D keys except the root.
  • Leaf nodes are linked together using a doubly linked list (to facilitate range search).
  • Data is either anywhere in the tree or at the leaf level.
  • Non-leaf nodes either are routing keys or have data.
  • With data at the leaf level, the Nodes exhibit routing keys that are a copy of data.
  • B+-tree is used by industrial-strength database engines.
  • B+-trees are more suitable for range search and is the focus.
  • Each node guarantees to be at least 50% full (except root).

The B+-Tree Node Structure

  • Has an array of values, child subtree pointers.
  • Leaf Nodes - are all doubly linked together to facilitate depending on the Choice and Key values are sorted within the page. Can contain Prev. Pg. Ptr., Next Pg. Ptr. and Free Space Ptr.

Operations on the B+-tree

  • Search
  • Insert
  • Delete
  • If given Key Value k, search retrievals are wanted if the Tuple(s) contains k.
  • Perform Tree starting at the root and navigate to the leaf node should also contain that element
  • Conduct the operations within leaf page, for Key k Data is sorted by key

B+-tree Insert

  • If given Key Value k and Tuple exists (or k & tid) that search retrievals should insert the Element into B+- tree.
  • Case 1, when element Search the tree starting at the root, until the leaf node is located
  • Case 2 occurs if not enough capacity exists and insert into located place.
  • This case leads to Q evenly splitting to 2 sublings
  • Recursively split until the root happens where, the tree adds.

B+-tree Delete

  • If given Key Value, Tuplet & tuple tid, remove it from the B+-tree
  • Case 1: Delete based on having capacity, and otherwise, it needs to borrow
  • Case 2 (Q is slightly off): Deleing the Tuple that makes it 50% full and node is needed
  • Find Sublin, and borrow
  • Case 3: merge Q with its Subling.

Bulk Loading of Data into the B+-tree

  • Building the tree is done from bottom up.
  • Sort the data at the leaf level, Partition the data in a way that will be compatible.
  • Allowing is a way to help occupancy.
  • Build Level next up (one at a time) where, Repeat for all levels, where results in Page levels that occur faster.

Other Uses of B+-tree Indexes: Testing for Integrity Constraints

  • It can be checked to determine if Keys of a table are present.
  • If keys of the foreign table of the first table exists.

Summary of Tree Indexing

  • Tree indexes are popular
  • ISAM predates the B-tree and mostly works well for indexing static data
  • The B-tree supports indexing of dynamic data sets
  • The B-tree is both theoretically and practically appealing
    • The B-tree is the gold standard for tree indexing in industrial database systems
    • The B-tree is theoretically optimal for external memory data structures
    • Can perform search, insert, and delete operations in O(log N) time, where N is the number of pages/blocks
  • Many adaptations, extensions, and uses of the B-tree exist to support
    • Multi-dimensional data, high-dimensional multimedia data, variable length keys, bulk insertion, bulk deletion, query processing, and integrity constraints checking

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Use Quizgecko on...
Browser
Browser