BCSE302L DBMS Module 4 PDF
Document Details
Uploaded by CongratulatoryPluto
Vellore Institute of Technology
2024
BCSE302L
Dr. Shashank Mouli Satapathy
Tags
Summary
This document is an outline for a module on Physical Database Design and Query Processing. It covers topics like File Organization, Indexing, Hashing, and Relational Algebra. The document is from Vellore Institute of Technology, for BCSE302L.
Full Transcript
Module - 4 Physical Database Design and Query Processing LTPC BCSE302L - 3 0 0 3 BCSE302P - 0 0 2 1 Dr. Shashank Mouli Satapathy...
Module - 4 Physical Database Design and Query Processing LTPC BCSE302L - 3 0 0 3 BCSE302P - 0 0 2 1 Dr. Shashank Mouli Satapathy Professor, SCOPE School of Computer Science and Engineering Vellore Institute of Technology Vellore, Tamil Nadu - 632014, India September 23, 2024 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 1 Outline 1 File Organization - Indexing 2 Hashing Techniques 3 Relational Algebra 4 Query Optimization Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 2 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 3 Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 4 Indexing Many queries reference only a small proportion of the records in a file. Suppose you have a query, ➨ “Find all instructors in the Physics department” ➨ “Find the total number of credits earned by the student with ID 22201” References only a fraction of the student records. An index for a file in a database system works in much the same way as the index in a textbook. Database-system indices play the same role as book indices in libraries. Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed. The index is a type of data structure. It is used to locate and access the data in a database table quickly. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 5 Indexing It is defined based on the indexing attribute. There are two basic kinds of indices: ➨ Ordered indices: Based on a sorted ordering of the values ➨ Hash indices: Based on a uniform distribution of values across a range of buckets. The bucket to which a value is assigned is determined by a function, called a hash function. Indexing techniques evaluated on basis of: ➨ Access types: finding records with a specified attribute value ➨ Access time: The time it takes to find a particular data item ➨ Insertion time: The time it takes to insert a new data item ➨ Deletion time: The time it takes to delete a data item ➨ Space overhead: The additional space occupied by an index structure An attribute or set of attributes used to look up records in a file is called a search key. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 6 Indexing Ordered Indices Index structure - fast random access to records in a file. Each index structure is associated with a particular search key. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 7 Indexing Suppose, if we do not have index then for a keyword we have to do linear search. The index typically stores each value of the index field along with a list of pointers to all disk blocks that contain records with that field value. The values in the index are ordered so that we can do a binary search on the index. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 8 Types of Indexing Single-level Index ➨ Primary Indexes ➨ Clustering Indexes ➨ Secondary Indexes Multilevel Index Dynamic Multilevel Indexes using B-Tree and B+ Tree Indexes on Multiple Keys Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 9 Types of Indexing Single-Level Ordered Indexes A single level index is an auxiliary file that makes it more efficient to search a record in the data file. The index is applied on one field known as indexing field of the file. The index is a file with entries The value field in the index file are ordered. The index file usually occupies considerably less disk blocks than the data file because its entries are much smaller. The binary search on the index gives a pointer to the file record. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 10 Types of Indexing Primary Index An ordered data file having fixed length record with 2 fields(Primary key of data file, pointer to disk block) A Primary Index is specified on the ordering key field where each tuple has a unique value. The data file is ordered on a key field. Includes One index entry for each block of the data file. The index entry has the key field for the first record in the block called as block anchor. The total number of entries in the index file is same as number of disk blocks in the ordered data file. Indexes can also be characterized as dense or sparse ➨ A dense index has an index entry for every search key value ➨ A sparse index on the other hand has index entries for only some of the search values Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 11 Types of Indexing A primary index is a non-dense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 12 Types of Indexing Dense Index It has an index entry for every search key value (and hence every record) in the data file. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 13 Types of Indexing Sparse Index Sparse (or nondense) index, on the other hand, has index entries for only some of the search values. A sparse index has fewer entries than the number of records in the file. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 14 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 15 Types of Indexing Example - 1: Let an ordered file with r = 30,000 fixed size records with record length R= 100 bytes stored on a disk with block size B = 1024 byte. In the index file ordering key field value = 9 byte and block pointer P = 6 byte. Calculate the number of block access required without and with using primary indexing approach. Solution: Blocking factor (bfr) = ⌊(B/R)⌋ = ⌊(1024/100)⌋ = 10 records per block No of block required (b) = ⌈(r /bfr )⌉ = ⌈(30, 000/10)⌉ = 3000 blocks No of block access for binary search = ⌈log2 b⌉ = ⌈log2 3000⌉ = 12 block access The size of each index entry = 9 + 6 = 15 byte Blocking factor for index file = ⌊(1024/15)⌋ = 68 index per block No of index block = ⌈(3000/68)⌉ = 45 blocks No of index block access for binary search = ⌈log2 45⌉ = 6 block access Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 16 Types of Indexing To search an record one additional block access to the data file 6 + 1 = 7 block access. Hence, the indexing process saves 12 - 7 = 5 block access Disadvantage of Primary index Insertion and deletion of records for ordered file since moving of records may change some index entries as well as block anchor. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 17 Types of Indexing Practice Question: Let an ordered file with r = 300,000 fixed size records with record length R= 100 bytes stored on a disk with block size B = 4096 byte. In the index file ordering key field value = 9 byte and block pointer P = 6 byte. Calculate the number of block access required without and with using primary indexing approach. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 18 Types of Indexing Clustering Index Defined on ordered data file like primary index. The data file is ordered on a non-key field unlike primary index, which requires that the ordering field of the data file have a distinct value for each record. Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value. It is another example of non-dense index clustering index. If file records are physically ordered on a non-key field, which does not have a distinct value for each record - that field is called the clustering field and the data file is called a clustered file. Insertion and Deletion is relatively straightforward in clustering index. To alleviate problem of insertion a block is reserved for each value of the clustering field. All records with that value are placed in the block. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 19 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 20 Types of Indexing Example: Suppose that we consider the same ordered file with r = 300,000 records stored on a disk with block size B = 4,096 bytes. Imagine that it is ordered by the attribute Zipcode and there are 1,000 zip codes in the file (with an average of 300 records per zip code, assuming even distribution across zip codes.) In the index file ordering key field value (zip code) = 5 bytes and block pointer P = 6 bytes. Calculate the number of block access required using indexing approach and the space the index would occupy in the main memory after loading. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 21 Types of Indexing Solution: The index in this case has 1,000 index entries of 11 bytes each (5-byte Zipcode and 6-byte block pointer) Blocking factor for index file = ⌊(4096/11)⌋ = 372 index per block No of index block = ⌈(1000/372)⌉ = 3 blocks No of index block access for binary search = ⌈log2 3⌉ = 2 block access To search an record one additional block access to the data file 2 + 1 = 3 block access. Again, this index would typically be loaded in the main memory (occupies 11,000 or 11 Kbytes) and takes negligible time to search in memory. One block of access to the data file would lead to the first record with a given zip code. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 22 Types of Indexing Secondary Index The secondary index may be on a field that is a candidate key and has a unique value in every record, or a non-key with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some non-ordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer. There can be many secondary indexes (and hence, indexing fields) for the same file. Includes one entry for each record in the data file; hence, it is a dense index. Index record points to a bucket that contains pointers to all the actual records with that particular search-key value. A secondary index usually needs more storage space and longer search time than a primary index. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 23 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 24 Types of Indexing We can also create a secondary index on a non-key, non-ordering field of a file. In this case, numerous records in the data file can have the same value for the indexing field. There are several options for implementing such an index: ➨ Option 1: include duplicate index entries with the same K(i) value - one for each record. This would be a dense index. ➨ Option 2: have variable-length records for the index entries, with a re- peating field for the pointer. We keep a list of pointers in the index entry for K(i) - one pointer to each block that contains a record whose indexing field value equals K(i). ➨ Option 3: keep the index entries themselves at a fixed length and have a single entry for each index field value, but to create an extra level of indirection to handle the multiple pointers. In this non-dense scheme, the pointer P(i) in index entry points to a disk block, which contains a set of record pointers; each record pointer in that disk block points to one of the data file records with value K(i) for the indexing field. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 25 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 26 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 27 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 28 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 29 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 30 Types of Indexing Multilevel Index Because a single-level index is an ordered file, we can create a primary index to the index itself; In this case, the original index file is called the first-level index and the index to the index is called the second-level index. We can repeat the process, creating a third, fourth,... , top level until all entries of the top level fit in one disk block. The blocking factor for the index bfri is called fan-out (fo) of the multilevel index. Searching a multilevel index requires approximately logfo (bi) block accesses. A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 31 Types of Indexing If the first level has r1 entries and blocking factor = fo = bfri then no of blocks = r1/fo which is the no of entries needed for second level(r2= r1/fo). Similarly r3 = r2 /fo. Each level reduces the number of entries at the previous level by a factor of fo. t = logfo (r 1) where t is the top index level. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 32 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 33 Types of Indexing Example: Consider the first level index file with r = 30,000 fixed-length records of size R = 15 bytes stored on a disk with block size B = 1024 bytes. Calculate the number of block access required using the indexing approach. Solution: No. of blocking factor for index file = ⌊(1024/15)⌋ = 68 index per block (fo) No of first level index block = ⌈(30000/68)⌉ = 442 blocks No of second level index block = ⌈(442/68)⌉ = 7 blocks No of third level index block = ⌈(7/68)⌉ = 1 block Hence t = 3. Total number of block access = t + 1 = 4 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 34 Types of Indexing A multilevel index reduces no of blocks accessed when searching for a record given its indexing field value. However the problem of index insertion and deletion as all index level are ordered files. To alleviate the problem of insertion and deletion of indexes dynamic multilevel index is introduced using the concept of B-Tree and B+-Tree. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 35 Types of Indexing Dynamic Multilevel Indexes Using B-Trees and B+-Trees Most multi-level indexes use B-tree or B+-tree data structures because of the insertion and deletion problem. These data structures are variations of search trees that allow efficient insertion and deletion of new search values. Tree data structure terminology: ➨ Tree is formed of nodes ➨ Each node (except root) has one parent and zero or more child nodes ➨ Leaf node has no child nodes [Unbalanced if leaf nodes occur at different levels] ➨ Nonleaf node called internal node ➨ Subtree of node consists of node and all descendant nodes. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 36 Types of Indexing B-Tree B-Tree is a balanced search tree. It provides multi level access structure. Each node of the B-Tree is at least half full. In B-Tree and B+-Tree data structures, each node corresponds to a disk block Each node is kept between half-full and completely full. An insertion into a node that is not full is quite efficient. If a node is full the insertion causes a split into two nodes. Splitting may propagate to other tree levels A deletion is quite efficient if a node does not become less than half full If a deletion causes a node to become less than half full, it must be merged with neighboring nodes. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 37 Types of Indexing Definition: A B-tree of order m is an m-way tree in which: 1 the number of keys in each non-leaf node is one less than the number of its children. 2 all leaves are at the same level 3 all non-leaf nodes except the root have at least ⌈m/2⌉ children 4 the root is either a leaf node, or it has from 2 to m children 5 a leaf node contains no more than m – 1 keys 6 Keys are arranged in a defined order Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 38 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 39 Types of Indexing Constructing a B-Tree: Suppose we start with an empty B-tree and keys arrive in the following order:1 12 8 2 25 5 14 28 17 7 52 16 48 68 3 26 29 53 55 45. We want to construct a B-tree of order 5 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 40 Types of Indexing Solution: Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 41 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 42 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 43 Types of Indexing Exercise: Insert the following keys to a 5-way B-tree: 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 44 Types of Indexing Delete from a B-tree During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this: 1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted. 2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf – in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position. If (1) or (2) lead to a leaf node containing less than the minimum number of keys then we have to look at the siblings immediately adjacent to the leaf in question: 3: if one of them has more than the min. number of keys then we can promote one of its keys to the parent and take the parent key into our lacking leaf Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 45 Types of Indexing 4: if neither of them has more than the min. number of keys then the lacking leaf and one of its neighbours can be combined with their shared parent (the opposite of promoting a key) and the new leaf will have the correct number of keys; if this step leave the parent with too few keys then we repeat the process up to the root itself, if required Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 46 Types of Indexing Let k be the key to be deleted, x the node containing the key. Then the three cases are: Case - I: If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted. key k is in node x and x is a leaf, simply delete k from x. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 47 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 48 Types of Indexing Case - II: If key k is in node x and x is an internal node, there are three cases to consider: Case - II-a: If the child y that precedes k in node x has at least t keys (more than the minimum), then find the predecessor key k’ in the subtree rooted at y. Recursively delete k’ and replace k with k’ in x Case - II-b: Symmetrically, if the child z that follows k in node x has at least t keys, find the successor k’ and delete and replace as before. Note that finding k’ and deleting it can be performed in a single downward pass. Case - II-c: Otherwise, if both y and z have only t-1 (minimum number) keys, merge k and all of z into y, so that both k and the pointer to z are removed from x. y now contains 2t - 1 keys, and subsequently k is deleted. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 49 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 50 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 51 Types of Indexing Case - III: If key k is not present in an internal node x, determine the root of the appropriate subtree that must contain k. If the root has only t - 1 keys, execute either of the following two cases to ensure that we descend to a node containing at least t keys. Finally, recurse to the appropriate child of x. Case - III-a: If the root has only t-1 keys but has a sibling with t keys, give the root an extra key by moving a key from x to the root, moving a key from the roots immediate left or right sibling up into x, and moving the appropriate child from the sibling to x. Case - III-b: If the root and all of its siblings have t-1 keys, merge the root with one sibling. This involves moving a key down from x into the new merged node to become the median key for that node. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 52 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 53 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 54 Types of Indexing Exercise: Given 5-way B-tree created by these data (last exercise): 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56 Add these further keys: 2, 6,12 Delete these keys: 4, 5, 7, 3, 14 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 55 Types of Indexing B+-Tree: Variant of B trees. Two types of nodes: Internal nodes have no data pointers Leaf nodes have no in-tree pointers If m is the order of the tree Every internal node has at most m children. Every internal node (except root) has at least m ⁄ 2 children. The root has at least two children if it is not a leaf node. Every leaf has at most m - 1 keys An internal node with k children has k - 1 keys. All leaves appear in the same level Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 56 Types of Indexing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 57 Types of Indexing Difference between B-tree and B+-tree In a B-tree, pointers to data records exist at all levels of the tree. In a B+-tree, all pointers to data records exists at the leaf-level nodes A B+ tree can have less levels (or higher capacity of search values) than the corresponding B-tree The leaf nodes of the B+ tree are linked together to provide ordered access on the search field of the record. Leaf nodes are similar to the first level of an index. Internal nodes of the B+-tree correspond to the other levels of a multilevel index. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 58 Types of Indexing Constructing a B+-Tree: Suppose we start with an empty B+-tree and keys arrive in the following order:1 4 7 10 17 21 31 25 19 20 28 42. We want to construct a B+-tree of order 4 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 59 Types of Indexing Solution: (a) (b) (c) (d) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 60 Types of Indexing (e) (f) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 61 Types of Indexing (g) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 62 Types of Indexing B+-Tree Deletion: Consider the following B+-tree with order m = 4. Delete 21, 31, 20, 10, 7, 25, 42 from the tree. Display the resultant tree after each deletion. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 63 Types of Indexing Solution: Delete 21 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 64 Types of Indexing Delete 31 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 65 Types of Indexing Delete 20 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 66 Types of Indexing Delete 10 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 67 Types of Indexing Delete 7 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 68 Types of Indexing Delete 25 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 69 Types of Indexing Delete 42 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 70 Types of Indexing Practice Weblink B-Tree: Link B+-Tree: Link Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 71 Hashing For a huge database structure it is not sometime feasible to search index through all its level and then reach the destination data block to retrieve the desired data. Hashing is an effective technique to calculate direct location of data record on the disk without using index structure. It uses a function, called hash function and generates address when called with search key as parameters. Hash function computes the location of desired data on the disk. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 72 Hash Organization Bucket: Hash file stores data in bucket format. Bucket is considered a unit of storage. Bucket typically stores one complete disk block, which in turn can store one or more records. Hash Function: A hash function h, is a mapping function that maps all set of search-keys K to the address where actual records are placed. It is a function from search keys to bucket addresses. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 73 Static Hashing In static hashing, when a search-key value is provided the hash function always computes the same address Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 74 Static Hashing Operation Insertion - When a record is required to be entered using static hash, the hash function h computes the bucket address for search key K, where the record will be stored. Bucket address = h(K) Search - When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored. Delete - This is simply a search followed by a deletion operation. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 75 Static Hashing Bucket Overflow The condition of bucket-overflow is known as collision. Overflow Chaining - When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called Closed Hashing. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 76 Static Hashing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 77 Static Hashing Linear Probing - When a hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called Open Hashing. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 78 Dynamic Hashing The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on-demand. Dynamic hashing is also known as extended hashing. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 79 Dynamic Hashing Consider the key value in a table and insert in the order given: 16,4,6,22,24,10,31,7,9,20,26. Perform dynamic hashing with a maximum data bucket size of 3. Hash Function: Suppose the global depth is X. Then the Hash Function returns X LSBs. (After converting the numbers to its binary form) Solution: Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 80 Dynamic Hashing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 81 Dynamic Hashing Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 82 Dynamic Hashing Practice Exercise Suppose that extendable hashing is being used on a database file that contains records with the following search key values: 2, 3, 5, 7, 11, 17, 19, 23, 29, 31 a) Construct the extendable hash structure for this file if the hash function is h(x) = x mod 7 and each bucket can hold three records. b) Show how the structure from part a) changes after inserting a record with the search key value of 16 and then deleting the record with the search key value of 11.) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 83 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 84 Relational Algebra Query: A query is a statement requesting the retrieval of information. Query Languages: Language in which user requests information from the database. Allow manipulation and retrieval of data from a database. There are two types of formal query languages for the relational model. ➨ Procedural Query Language: Describe the operation how to get the desired information.(ex-Relational Algebra) ➨ Non-procedural Query Language: Describe what information the result should contain, rather than how to compute it. (ex-Relational Calculus) ⇒ Tuple Relational Calculus(TRC) ⇒ Domain Relational Calculus (DRC) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 85 Relational Algebra The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result. These operations enable a user to specify basic retrieval requests (or queries). It is used to tell the DBMS how to build a new relation from one or more existing relation. It forms the basis of Query language. Six Basic operators: Select Project Cartesian Product or Cross product Set difference Union Intersection Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 86 Relational Algebra Additional operators: Joins (natural, equi join, theta join, outer join) Division Renaming The relation algebra operators are classified into either Unary: Operate on a single relation(one operator) Binary: Operate on two relations(two operator) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 87 Relational Algebra Select Operation (σ) Unary operator Notation: σp (r ) where p is called section condition or predicate and r is a relation The selection condition acts as a filter ➨ Keeps only those tuples that satisfy the qualifying condition ➨ Tuples satisfying the condition are selected whereas the other tuples are discarded (filtered out) Selects a subset of rows or tuples from relation according to a given selection condition or predicate. It is used as an expression to choose tuples that meet the selection condition. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 88 Relational Algebra Select the EMPLOYEE tuples whose department number is 4. σDNO=4 (EMPLOYEE ) Select the EMPLOYEE tuples whose salary is greater than $30000. σSALARY >30000 (EMPLOYEE ) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 89 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 90 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 91 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 92 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 93 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 94 Relational Algebra SELECT operation properties Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 95 Relational Algebra Projection Operation (Π) Unary operator This operation keeps certain columns (attributes) from a relation and discards the other columns. PROJECT creates a vertical partitioning ➨ The list of specified columns (attributes) is kept in each tuple ➨ The other attributes in each tuple are discarded The general form of the project operation is: ΠA1 ,A2 ,...,Ak (R) The result is defined as relation of k columns obtained by eliminating duplicate rows from the result and erasing the columns that are not listed. Example: To list last name, first name and salary of each employee in the EMPLOYEE relation, the following is used: ΠLNAME ,FNAME ,SALARY (EMPLOYEE ) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 96 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 97 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 98 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 99 Relational Algebra PROJECT operation properties The number of tuples in the result of projection Π (R) is always less or equal to the number of tuples in R. If the list of attributes includes a key of R then the number of tuples in the result of PROJECT is equal to the number of tuples in R PROJECT is not commutative Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 100 Relational Algebra Combination of Selection and Projection Operator Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 101 Relational Algebra BASIC RELATIONAL ALGEBRA OPERATION FROM SET THEORY Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 102 Relational Algebra Cartesian Product or Cross Product Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 103 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 104 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 105 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 106 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 107 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 108 Relational Algebra JOIN EquiJoin Natural Join Theta join Outer Join Left Outer Join Right Outer Join Full Outer Join Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 109 Relational Algebra Equijoin Uses an equality operator (=) to join tables When the join condition c contains only comparison operator = then such type of join is called as EQUIJOIN. Ex.: R ▷◁c S = σc (R × S) Equijoin is also a theta join. Example Consider the following relation schema: Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 110 Relational Algebra Emp(eno , dno ,sal) Dept(deptno ,dname,dloc) Question: Write the relational algebra expression to find the name and salary of all the employees who belong to MCA department. 1 EMP ▷◁dno=deptno DEPT 2 σdname=′ MCA′ (EMP ▷◁dno=deptno DEPT) 3 Πename,sal (σdname=′ MCA′ (EMP ▷◁dno=deptno DEPT)) ename sal Smith 5000 Blake 7000 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 111 Relational Algebra Natural Join denoted by * or ▷◁ was created to get rid of the second (superfluous) attribute of an EQUIJOIN, because one of each pair of attributes with identical values is superfluous in equijoin. Natural join is an equi join of two relation R and S over all common attributes. The standard definition of natural join requires that the two join attributes, or each pair of corresponding join attributes, have the same name and domain in both relations. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 112 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 113 Relational Algebra Example branch (branch name, branch city, assets) customer (customer name, customer street, customer city) account (account number, branch name, balance) loan(loan number, branch name, amount) depositor(customer name, account number) borrower(customer name, loan number) Q1: Find all loans of amount over $1200 Q2: Find the loan number for each loan amount greater than $1200 Q3: Find the name, loan no and loan amount of all customers who have a loan at the bank Q4: Find names of all customers who have both a loan and deposit account at the bank. Q5: Find names of all customers who have a loan at ‘DUMDUM’ branch. Q6: Find names of all customers who have an account at ‘Redwood’ branch. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 114 Relational Algebra Q1: Find all loans of amount over $1200 σamount>1200 (loan) Q2: Find the loan number for each loan amount greater than $1200 Πloan number (σamount>1200 (loan)) Q3: Find the name, loan no and loan amount of all customers who have a loan at the bank Πcustomer name,loan number ,amount (borrower ▷◁borrower.loan number =loan.loan number loan) Q4: Find names of all customers who have both a loan and deposit account at the bank. Πcustomer name (borrower ▷◁borrower.customer name=depositor.customer name depositor) Q5: Find names of all customers who have a loan at ‘DUMDUM’ branch. Πcustomer name (σbranch name=′ DUMDUM ′ (borrower ▷◁borrower.loan number =loan.loan number loan)) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 115 Relational Algebra Q6: Find names of all customers who have an account at ‘Redwood’ branch. Πcustomer name (σbranch name=′ Redwood ′ (depositor ▷◁depositor.account number =account.account number account)) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 116 Relational Algebra Theta Join Operation The Theta join operation is an extension to the cross join operation that allows us to combine a selection and a cartesian product into a single operation. It defines a relation that contains tuples satisfying the predicate from the cartesian product of R and S. Ex.: r ▷◁θ s = σθ (r × s) where, θ may be any of the comparison operator , ≥, =, ̸= Equi join is special type of θ join where join condition uses only = as the comparison operator. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 117 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 118 Relational Algebra Outer Join In NATURAL JOIN and EQUIJOIN, tuples without a matching (or related ) tuple are eliminated from the join result. Tuples with null in the join attributes are also eliminated. This amounts to loss of information. Outer join is an extension of the join operation that avoids loss of information. Computes the join and then adds tuples from one relation that does not match tuples in the other relation to the results of the join. A set of operations, called OUTER joins, can be used when we want to keep all the tuples in R, or all those in S, or all those in both relations in the result of the join, regardless of whether or not they have matching tuples in the other relation. It uses NULL Values. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 119 Relational Algebra Left Outer Join ( ▷◁) Matching Tuple in natural join + Extra tuple in the left relation. Takes all the tuples in left relation that did not match with any tuple in the right relation. The attributes of the relation for the unmatched tuples filled with NULL Values. The left outer join operation keeps every tuple in the first or left relation R in R ▷◁ S; if no matching tuple is found in R, then the attributes of S in the natural join result are filled or “padded” with null values. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 120 Relational Algebra Right Outer Join (▷◁ ) Matching Tuple in natural join + Extra tuple in the right relation. Takes all the tuples in right relation that did not match with any tuple in the left relation. The attributes of the left relation for the unmatched tuples filled with NULL Values. The right outer join operation keeps every tuple in the second or right relation R in R ▷◁ S; if no matching tuple is found in S, then the attributes of R in the natural join result are filled or “padded” with null values. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 121 Relational Algebra Full Outer Join ( ▷◁ ) Matching Tuple in natural join + Extra tuple in both left and right relation. FULL OUTER JOIN = LEFT OUTER JOIN + RIGHT OUTER JOIN keeps all tuples in both the left and the right relations when no matching tuples are found, padding them with null values as needed. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 122 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 123 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 124 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 125 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 126 Relational Algebra UNION, INTERSECTION, SET DIFFERENCE All of these operations take two input relations, which must be union compatible: ➨ Same number of fields. ➨ Corresponding’ fields have the same type. Two relation R1(A1, A2,..., An) and R2(B1, B2,..., Bn ) are said to be union or type compatible, if they satisfy following two conditions: ➨ Both the relations , i.e. R1 and R2 should be of same arity , i.e. they should have same number of attributes ➨ Domain of the similar attributes, i.e. domain of ith attribute of R1 and domain of ith attribute of R2, should be same. (i.e. dom(Ai) = dom(Bi) for i = 1, 2,..., n). The resulting relation for R1 ∪ R2 (also for R1 ∩ R2, or R1 - R2) has the same attribute names as the first operand relation R1 (by convention) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 127 Relational Algebra Union If R and S are two union compatible relation then the resultant relation P = R ∪ S is a relation includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. It is defined as R ∪ S = { t | t ϵ R or t ϵ S} Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 128 Relational Algebra Q: Write the relational algebra expression to find name of those customer who have account or loan or both. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 129 Relational Algebra Q: Write the relational algebra expression to find name of those customer who have account or loan or both. Solution: ΠName (Account) ∪ ΠName (Loan) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 130 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 131 Relational Algebra Intersection The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that are in both R and S. It is defined as R ∩ S = { t | t ϵ R and t ϵ S} Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 132 Relational Algebra Q: Write the relational algebra expression to find name of those customer who have both an account and loan. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 133 Relational Algebra Q: Write the relational algebra expression to find name of those customer who have both an account and loan. Solution: ΠName (Account) ∩ ΠName (Loan) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 134 Relational Algebra Difference The result of this operation, denoted by R - S, is a relation that includes all tuples that are in R but not in S. It is defined as R - S = { t | t ϵ R or t ∈ / S} Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 135 Relational Algebra Q: Write the relational algebra expression to find name of those customer who have an account, but not taken a loan. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 136 Relational Algebra Q: Write the relational algebra expression to find name of those customer who have an account, but not taken a loan. Solution: ΠName (Account) - ΠName (Loan) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 137 Relational Algebra Properties of UNION, INTERSECTION and SET DIFFERENCE Notice that both union and intersection are commutative operations; that is R ∪ S = S ∪ R, and R ∩ S = S ∩ R Both union and intersection can be treated as n ary operations applicable to any number of relations as both are associative operations; that is R ∪ (S ∪ T) = (R ∪ S) ∪ T (R ∩ S) ∩ T = R ∩ (S ∩ T) The minus operation is not commutative ; that is, in general R - S ̸= S - R Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 138 Relational Algebra Example Assume a database with the following three relations: Sailors (sid, sname, rating) Boats (bid, bname, color) Reserve (sid, bid, date) Q1: Find the bid of red coloured boats Q2: Find the name of sailors who have reserved boat number 2 Q3: Find the names of sailors who have reserved a red boat. Q4: Find the colors of the boats reserved by sailor ‘xyz’. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 139 Relational Algebra Q1: Find the bid of red coloured boats Πbid (σcolor =′ Red ′ (Boats)) Q2: Find the name of sailors who have reserved boat number 2 Πsname (σbid=2 (Sailors * Reserve)) Q3: Find the names of sailors who have reserved a red boat. Πsname (σcolor =′ Red ′ (Boats ▷◁ Reserve ▷◁ Sailors)) Q4: Find the colors of the boats reserved by sailor ‘xyz’. Πcolor (σsname=′ xyz ′ (Sailors * Reserve * Boats)) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 140 Relational Algebra RENAME OPERATION (ρ) The RENAME operator is denoted by ρ (rho) The rename operation allows to name the results of a relational algebra expression and also individual column. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 141 Relational Algebra In some cases, we may want to rename the attributes of a relation or the relation name or both. Useful when a query requires multiple operations. Rename operation can be expressed by any of the following forms: ➨ ρS(B1,B2,...,Bn) (R) - Changes Both ➨ ρ(B1,B2,...,Bn) (R) - Changes the column names only to B1, B2,...Bn. ➨ ρS (R) - Changes the relation name only to S. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 142 Relational Algebra Relational Algebra Expressions We may want to apply several relational algebra operations one after the other Either we can write the operations as a single relational algebra expression by nesting the operations, or We can apply one operation at a time and create intermediate result relations In the latter case, we must give names to the relations that hold the intermediate results using assignment operator ←(left arrow) Ex.: To retrieve the first name, last name and salary of all employees who work in the department number 5. DEP5 EMPS ← σDNO=5 (EMPLOYEE) RESULT ← ΠFNAME ,LNAME ,SALARY (DEP5 EMPS) (OR) ΠFNAME ,LNAME ,SALARY (σDNO=5 (EMPLOYEE)) —- [Single Relation Expression] Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 143 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 144 Relational Algebra Practice Weblink Relational Algebra: Link Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 145 Relational Algebra Generalized Projection Extends the projection operation by allowing arithmetic functions to be used in the projection list. ΠF 1,F 2,...,Fn (E) where, - E is any relational-algebra expression - each of F1, F2,..., Fn are arithmetic expressions involving constants and attributes in the schema of E. Ex.: Given relation Employee (emp name, SSN, desgn, basic sal, TA) Find the total salary of each employee: Πemp name,(basic sal+TA) (Employee) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 146 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 147 Relational Algebra Aggregate Functions Aggregation function takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values Some books/articles use γ instead of G (Calligraphic G) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 148 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 149 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 150 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 151 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 152 Relational Algebra Modification of Database The content of database may be modified using following operations: Deletion Insertion Updating All these operations are expressed using the assignment operator. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 153 Relational Algebra Deletion r ← r - E, where r is a relation and E is a relational algebra expression. A delete is expressed similar to a query except that the result is removed from the database. Cannot remove single attributes. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 154 Relational Algebra Deletion Example account (account number, branch name, balance) loan(loan number, branch name, amount) Delete all account records of Perryridge branch. account ← account - σbranch name=”Perryridge” (account) Delete all loan records with amount in the range of 0 to 50. loan ← loan - σamount≥0andamount≤50 (loan) Delete emp smith record from emp(eno, ename, sal, dno ) relation. emp ← emp - σename=”smith” (emp) Delete all employees record whose salary ≥ 5000 from emp(eno, ename, sal, dno) relation. emp ← emp - σsal≥5000 (emp) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 155 Relational Algebra Insert It is used to insert a new record into a relation. r ← r ∪ E, where r is a relation and E is a relational algebra expression. Ex.: Write the relational algebra expression to insert a emp tuple (305, ”rama”, 10000, 10) into emp( eno, ename, sal, dno). emp ← emp ∪ {305, ”rama”, 10000, 10} Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 156 Relational Algebra Update A mechanism to change a value in a tuple without charging all values in the tuple r ← ΠF 1,F 2,....,Fn (r), where Fi is either the i th attribute or updated value of i th attribute of r. Ex.: Write a relational algebra expression to increase the salary of each employee( eno ,ename,sal,dno ) by 30% employee ← Πeno,ename,sal+0.3∗sal,dno (employee) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 157 Relational Algebra Division Operation Denoted by ÷ Notation: r ÷ s It is a binary operator. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 158 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 159 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 160 Relational Algebra Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 161 Relational Algebra Practice Exercise - 1 Consider the following relation schema: employee (person-name, street, city) works (person-name, company-name, salary) company (company-name, city) manages (person-name, manager-name) Q1: Find the names of all employees who work for First Bank Corporation Q2: Find the names and cities of residence of all employees who work for First Bank Corporation. Q3: Find the names, street address, and cities of residence of all employees who work for First Bank Corporation and earn more than $10,000. Q4: Find the names of all employees in this database who live in the same city as the company for which they work. Q5: Give all employees of First Bank Corporation a 10 percent salary raise. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 162 Relational Algebra Practice Exercise - 2 Consider the following relation schema: Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 163 Relational Algebra Q1: Query 1. Retrieve the name and address of all employees who work for the ‘Research’ department. Q2: For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address, and birth date. Q3: Query 5. Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 164 Query Processing and Optimization Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 165 Query Processing A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated. The scanner identifies the query tokens—such as SQL keywords, attribute names, and relation names—that appear in the text of the query, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language. The query must also be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried. An internal representation of the query is then created, usually as a tree data structure called a query tree. It is also possible to represent the query using a graph data structure called a query graph, which is generally a directed acyclic graph (DAG). Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 166 Query Processing The DBMS must then devise an execution strategy or query plan for retrieving the results of the query from the database files. A query has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 167 Query Optimization Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 168 Query Optimization Query Tree It corresponds to a relational algebra expression. Leaf: relation Internal node: Relational algebra operator Root: Result Ex.: SELECT Name FROM Customer CU, Checkedout CH, Film F WHERE Title =’Terminator’ AND F. FilmID = CH. FilmID AND CU.CustomerID = CH. CustomerID AND CU.Street = ’Elm’ Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 169 Query Optimization Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 170 Query Optimization Initial (canonical) query tree for SQL query Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 171 Query Optimization Query tree corresponding to the relational algebra expression Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 172 Query Optimization Query graph Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 173 Heuristic Optimization of Query Tree Transformation Algorithm Outline Transform a query represented in relational algebra to an equivalent one (generates the same result) Step 1: Decompose σ operations. Step 2: Move σ as far down the query tree as possible. Step 3: Rearrange leaf nodes to apply the most restrictive σ operation first. Step 4: Form joins from × and subsequent σ operations. Step 5: Decompose Π and move down the query tree as far as possible. Step 6: Identify candidates for combined operations. Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 174 Heuristic Optimization of Query Tree Practice Example Consider the following relational schema Employee (SSN, Name, Bdate, Address) Works On (ESSN, Pno) Project (Pnumber, Pname) Q:- Find the names of Employees born after 1957 who work on a project named “Aquarius”. SQL: Select name from Employee, Works-on, Project where Pname=’Aquarius’ and Pno= Pnumber and ESSN = SSN and Bdate > ’31-Dec-1957’; Relational Algebra: Πname (σPname=′ Aquarius ′ AND Pno=Pnumber AND ESSN=SSN AND Bdate>′ 31−Dec−1957′ (Employee × Works On × Project)) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 175 Heuristic Optimization of Query Tree Draw the initial query tree (canonical tree) Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 176 Heuristic Optimization of Query Tree Moving selection operation down the tree Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 177 Heuristic Optimization of Query Tree Applying the most restrictive selection operation first Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 178 Heuristic Optimization of Query Tree Replacing Cartesian Product and selection with Join Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 179 Heuristic Optimization of Query Tree Moving Projection operation down the tree Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 180 Heuristic Optimization of Query Tree Practice Exercise Consider the following relation schema: Employee (eno, ename, Mgrsrno, dno) Department ( dno, dname, mgrsrno) Dept loc (dnumber, dloc) Project (Pno, Pname, Ploc, dnum) Works on (eno, pno) Q1: Select ename, dname from employee e, department d where e.eno=d.mgrsrno; Q2: Select ename from employee e, department d where d.name= “academic” and e.dno = d.dno; Q3: Select pname, ename from employee e, project p, works on w where e.eno=w.eno and p.pno=w.pno Q4: Display the ename of employees which have salary>5000 and dname is ‘RESEARCH’ Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 181 Heuristic Optimization of Query Tree General Transformation Rules / Equivalence Rules Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 182 Heuristic Optimization of Query Tree General Transformation Rules / Equivalence Rules Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 183 Heuristic Optimization of Query Tree General Transformation Rules / Equivalence Rules Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 184 Dr. Shashank Mouli Satapathy VIT - SCOPE - Database Systems BCSE302L - Module 4 September 23, 2024 185