8. Relational Algebra.pdf
Document Details
Tags
Full Transcript
Relational Algebra September 3, 4, 9, 2024 Relational Algebra ❖ Collection of operations on relations ❖ Relations – Sets ❖ Tuple of a relation – elements of a set ❖ Each operation takes one/more relations as its operand(s) and produces another relation as its result ❖ The...
Relational Algebra September 3, 4, 9, 2024 Relational Algebra ❖ Collection of operations on relations ❖ Relations – Sets ❖ Tuple of a relation – elements of a set ❖ Each operation takes one/more relations as its operand(s) and produces another relation as its result ❖ The result is a relation, that can be subjected to further algebraic operations Relational Algebra Operators ❖ Traditional Set Operators ▪ Union, ▪ Intersection, ▪ Difference, and ▪ Cartesian product ❖ Special Relational Operators ▪ Select, ▪ Project, ▪ Rename, ▪ Join, and ▪ Division Traditional Set Operators ❖ Binary operators - will work on two relations. ❖ Can only be used on union compatible sets ▪ R (A1, A2, …, AN) and S(B1, B2, …, BN) are union compatible if: degree (R) = degree (S) = N domain (Ai) = domain (Bi) for all i Union (⋃) ❖ Assuming R & S are union compatible… ▪ union: R ⋃ S is the set of tuples in either R or S or both. ▪ since it is a set, there are no duplicate tuples Intersection (⋂) ❖ Assuming that R & S are union compatible: ▪ intersection: R ⋂ S is the set of tuples in both R and S ❖ Note that R ⋂ S = S ⋂ R ❖ Example: use R and S from before… Difference (-) ❖ Difference: R – S is the set of tuples that appear in R but do not appear in S ❖ Is (R – S) = (S – R) ??? Cartesian Product (X) ❖ Also called “cross product” ❖ Sets do not have to be union compatible (usually not) ❖ In general, the result of: R(A1, A2, …, AN) X S(B1, B2, …, BM) is Q (A1, A2, …, AN, B1, B2, …, BM) ❖ If R has C tuples and S has D tuples, the result is C*D tuples. ❖ Note that union, intersection and cross product are commutative and associative R1 sid bid day Example Instances 22 101 10/10/96 58 103 11/12/96 S1 sid sname rating age ❖ Sailors (sid, sname, rating, age) 22 dustin 7 45.0 ❖ Reserves (sid, bid, day) 31 lubber 8 55.5 ❖ “Sailors” and “Reserves” 58 rusty 10 35.0 relations for our examples. “bid”= boats. “sid”: sailors sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 S2 58 rusty 10 35.0 Union, Intersection, Set-Difference S1 sid sname rating age sid sname rating age 22 dustin 7 45.0 22 dustin 7 45.0 31 lubber 8 55.5 31 lubber 8 55.5 58 rusty 10 35.0 58 rusty 10 35.0 44 guppy 5 35.0 S2 sid sname rating age 28 yuppy 9 35.0 28 yuppy 9 35.0 S1 S2 31 lubber 8 55.5 sid sname rating age 44 guppy 5 35.0 31 lubber 8 55.5 58 rusty 10 35.0 58 rusty 10 35.0 sid sname rating age S1 S2 22 dustin 7 45.0 S1− S2 Cross-Product ❖ Each row of S1 is paired with each row of R1. ❖ Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. S1 sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 R1 58 rusty 10 35.0 sid bid day 22 101 10/10/96 58 103 11/12/96 Cross-Product S1 sid sname rating age R1 sid bid day 22 dustin 7 45.0 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 (sid) sname rating age (sid) bid day ❖ Conflict: Both 22 dustin 7 45.0 22 101 10/10/96 S1 and R1 22 dustin 7 45.0 58 103 11/12/96 have a field 31 lubber 8 55.5 22 101 10/10/96 called sid. 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 ▪ Use Renaming operator Renaming Operator S (B1, B2...,Bn)(R) ❖ Renames both, relation and its attribute S (R) ❖ Renames relation only (B1, B2..., Bn)(R) ❖ Renames the attribute only S is the new relation name and B1, B2, …,Bn are new attribute names. If the attribute of R are A1, A2, …,An in that order, then each Ai is renamed as Bi Selection ❖ Selects rows that satisfy selection condition. ❖ Schema of result identical to schema of (only) input relation. ❖ Result relation can be the input for another relational algebra operation! (Operator composition.) Selection S2 sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 ❖ selection condition 58 rusty 10 35.0 ❖ Find the sailors who have rating higher than 8. sid sname rating age 28 yuppy 9 35.0 rating 8(S2) 58 rusty 10 35.0 Projection ❖ Deletes attributes that are not in projection list. ❖ Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation. sname rating S2 sid sname rating age yuppy 9 28 yuppy 9 35.0 lubber 8 31 lubber 8 55.5 guppy 5 44 guppy 5 35.0 rusty 10 58 rusty 10 35.0 sname,rating(S2) Projection S2 sid sname rating age age 28 yuppy 9 35.0 35.0 31 lubber 8 55.5 55.5 44 guppy 5 35.0 58 rusty 10 35.0 age(S2) ❖ Projection operator has to eliminate duplicates! (Why??, what are the consequences?) ▪ Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. Selection + Projection S2 sid sname rating age sid sname rating age 28 yuppy 9 35.0 28 yuppy 9 35.0 31 lubber 8 55.5 58 rusty 10 35.0 44 guppy 5 35.0 58 rusty 10 35.0 rating 8(S2) sname rating yuppy 9 sname,rating( rating 8(S2)) rusty 10 Join ❖ Condition Join: R c S = c ( R S) R1 sid bid day S1 sid sname rating age 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 58 rusty 10 35.0 S1 R1 S1. sid R1. sid (sid) sname rating age (sid) bid day 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 58 103 11/12/96 Join ❖ Condition Join: R c S = c ( R S) (sid) sname rating age (sid) bid day 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 58 103 11/12/96 S1 R1 S1. sid R1. sid ❖ Result schema same as that of cross-product. ❖ Fewer tuples than cross-product. Filters tuples not satisfying the join condition. ❖ theta-join. Si θ Rj ; θ={=, , ≤, ≥, } ❖ Use Rename operator if two join attributes have same name Joins ❖ Theta-Join: Si θ Rj ; θ = {=, , ≤, ≥, } ❖ Equi-Join: A special case of condition join where the condition c contains only equalities. ▪ Applied when join attributes have different names ▪ Still two columns with same values ❖ Natural Join (*): Equijoin on all common fields. ▪ Same as Equijoin except that if join attribute have the same names, they do not have to be specified at all. ▪ Requires that two join attributes have the same name in both relations. If not, then use rename operation first ❖ Inner Join Joins ❖ Natural Join (*): Equijoin on all common fields. ▪ If join attribute have the same names, they do not have to be specified at all. R1 sid bid day S1 sid sname rating age 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 58 rusty 10 35.0 ( S1 S1. sid = R1. sid R1 ) sid sname rating age bid day (𝑆1 ∗ 𝑅1) 22 dustin 7 45.0 101 10/10/96 58 rusty 10 35.0 103 11/12/96 Division ❖ Not supported as a primitive operator, but useful for expressing queries like: Find sailors who have reserved all boats. ❖ Precondition: in A/B, the attributes in B must be included in the schema for A. Also, the result has attributes A-B. ▪ SALES (supId, prodId); ▪ PRODUCTS (prodId); ▪ Relations SALES and PRODUCTS must be built using projections. ▪ SALES/PRODUCTS: the ids of the suppliers supplying ALL products. Examples of Division A/B sno pno pno pno pno s1 p1 p2 p2 p1 s1 p2 p4 p2 s1 p3 B1 p4 B2 s1 p4 s2 p1 sno B3 s2 p2 s1 s3 p2 s2 sno s4 p2 s3 s1 sno s4 p4 s4 s4 s1 A A/B1 A/B2 A/B3 Relational Algebra Queries – Example 1 Consider the following schema ❖ Sailors (sid, sname, rating, age) ❖ Boats (bid, bname, color) ❖ Reserves (sid, bid, day) Relational Algebra Queries – Example 1a Sailors (sid, sname, rating, age) Single Table Queries Boats (bid, bname, color) Reserves (sid, bid, day) 1. Find names of sailors above 30 years. 𝜋𝑠𝑛𝑎𝑚𝑒 (𝜎𝑎𝑔𝑒>30 𝑆𝑎𝑖𝑙𝑜𝑟𝑠) 2. Find names of sailors above 30 years with a rating of more than 7. 𝜋𝑠𝑛𝑎𝑚𝑒 (𝜎𝑎𝑔𝑒>30 𝐴𝑁𝐷 𝑟𝑎𝑡𝑖𝑛𝑔>7 𝑆𝑎𝑖𝑙𝑜𝑟𝑠) 3. Find names of all red colored boat. 𝜋𝑏𝑛𝑎𝑚𝑒 (𝜎𝑐𝑜𝑙𝑜𝑟=′𝑅𝑒𝑑′ 𝐵𝑜𝑎𝑡𝑠) Relational Algebra Queries – Example 1b 1. Find names of sailors who’ve reserved boat #103. 2. Find names of sailors who’ve reserved a red boat. 3. Find sailors who’ve reserved a red or a green boat. 4. Find sailors who’ve reserved a red and a green boat. 5. Find the names of sailors who’ve reserved all boats. 6. Find the number of boats reserved on Monday. 7. Find the name of the sailor having the highest rating. 1. Find names of sailors who’ve reserved boat #103 Sailors (sid, sname, rating, age) Boats (bid, bname, color) Reserves (sid, bid, day) ❖ Solution 1: sname(( Re serves)*Sailors) bid =103 Solution 2: sname( (Re serves*Sailors)) ❖ bid =103 2. Find names of sailors who’ve reserved a red boat Sailors (sid, sname, rating, age) Boats (bid, bname, color) Reserves (sid, bid, day) ❖ Information about boat color only available in Boats; so need an extra join: sname(( Boats)*Re serves*Sailors) color ='red ' ❖Other solutions: 𝜋𝑠𝑛𝑎𝑚𝑒 (𝜋𝑠𝑖𝑑 ((𝜋𝑏𝑖𝑑 𝜎𝑐𝑜𝑙𝑜𝑟=′𝑟𝑒𝑑′ 𝐵𝑜𝑎𝑡𝑠) ∗ Re 𝑠𝑒𝑟𝑣𝑒𝑠) ∗ 𝑆𝑎𝑖𝑙𝑜𝑟𝑠) 𝜋𝑠𝑛𝑎𝑚𝑒 (𝜎𝑐𝑜𝑙𝑜𝑟=′𝑟𝑒𝑑′ (𝐵𝑜𝑎𝑡𝑠 ∗ Re 𝑠𝑒𝑟𝑣𝑒𝑠 ∗ 𝑆𝑎𝑖𝑙𝑜𝑟𝑠) 3. Find names of sailors who’ve reserved a red or a green boat Sailors (sid, sname, rating, age) Boats (bid, bname, color) Reserves (sid, bid, day) ❖ Can identify all red or green boats, then find sailors who’ve reserved one of these boats: (Tempboats,( Boats)) color ='red 'ORcolor =' green' sname(Tempboats*Re serves*Sailors) ❖ Can also define Tempboats using union! (How?) ❖ What happens if is replaced by in this query? 4. Find names of sailors who’ve reserved a red and a green boat Sailors (sid, sname, rating, age) Boats (bid, bname, color) Reserves (sid, bid, day) ❖ Previous approach won’t work! Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors): (Tempred, (( Boats) * Re serves)) sid color =' red ' (Tempgreen, (( Boats) * Re serves)) sid color =' green' sname((Tempred Tempgreen) * Sailors) 5. Find the names of sailors who’ve reserved all boats Sailors (sid, sname, rating, age) Boats (bid, bname, color) Reserves (sid, bid, day) ❖ Uses division; schemas of the input relations to / must be carefully chosen: (Tempsids, ( Re serves) / ( Boats)) sid, bid bid sname (Tempsids * Sailors) ❖ To find sailors who’ve reserved all ‘Interlake’ boats: / ( Boats)..... bid bname =' Interlake' Additional relational Operations 6. Find the number of boats reserved on Monday. 7. Find the name of the sailor having the highest rating. ❖ Aggregate Functions and Grouping ❖ Outer Join Aggregate Functions and Grouping ❖ To specify mathematical aggregate functions on collections of values from the database. ❖ Consider the queries like : ▪ Find average rating of sailors. ▪ How many sailors are there? ▪ Find the sailor having the highest rating ❖ Common functions: ▪ COUNT, SUM, AVERAGE, MAXIMUM, and MINIMUM Aggregate Functions Operations ❖ Use of the Aggregate Functional operation ℱ ▪ ℱMAX rating (Sailors) retrieves the maximum rating value from the Sailors relation ▪ ℱMIN rating (Sailors) retrieves the minimum rating value from the Sailors relation ▪ ℱSUM rating (Sailors) retrieves the sum of the rating from the Sailors relation ▪ ℱCOUNT sid, AVERAGE rating (Sailors) computes the count (number) of sailors and their average rating ❖ Note: count just counts the number of rows, without removing duplicates Using Grouping with Aggregation ❖ Grouping can be combined with Aggregate Functions ❖ Consider Reserves (sid, bid, day) and find the number of sailors reserved boats on each weekday ❖ A variation of aggregate operation ℱ allows this: ▪ Grouping attribute placed to left of symbol ▪ Aggregate functions to right of symbol ▪ day ℱCOUNT sid (Reserves) ▪ The operation groups Reserves by day and computes the count of sailors ❖ Consider Sailors (sid, sname, rating, age) ▪ rating ℱCOUNT sid, AVERAGE age (Sailors) Examples of applying aggregate functions and grouping Outer Join ❖ In Natural Join and Equijoin, tuples without a matching (or related) tuple are eliminated from the join result ▪ Tuples with null in the join attributes are also eliminated ▪ This amounts to loss of information. ❖ Outer joins are used when we want to keep all the tuples in R, or all those in S, or all those in both relations in the result of the join, regardless of whether or not they have matching tuples in the other relation. Outer Join ❖ left outer join: R S ❖ right outer join: R S ❖ full outer join: R S Outer Join - Example Relational Algebra Queries – Example 2 Consider the following schema (Suppliers-Parts-Projects Database) ❖ Suppliers (s#, sname, status, city) ❖ Parts (p#, pname, color, weight, city) ❖ Projects (pr#, prname, city) ❖ Shipments (s#, p#, pr#, qty) Queries on Example 2 (1) 1. Get names of suppliers who supply in project J1 2. Get all shipments where the quantity is in the range 300 to 750 inclusive. 3. Get full details of parts supplied by a supplier in Bombay. 4. Get the total numbers of projects supplied by supplier S1. 5. Get the total quantity of part P1 supplied by supplier S1. Queries on Example 2 (2) 6. Get project names for projects supplied by supplier S1. 7. Get colors of parts supplied by supplier S5. 8. Get part numbers for parts supplied to any project in Bombay. 9. Get project numbers for projects using at least one part supplied by supplier S2. 10. Get project names for projects using at least one part supplied by supplier S2. 11. Get all cities in which at least one supplier, part, or project is located. Summary ❖ The relational model has rigorously defined query languages that are simple and powerful. ❖ Relational algebra is more operational; useful as internal representation for query evaluation plans. ❖ Several ways of expressing a given query; a query optimizer should choose the most efficient version. Reference ❖ Slides of CSCD343 - Introduction to databases - A. Vaisman