DBS Chapter Three PDF
Document Details
Uploaded by NonViolentVenus
Tags
Summary
This document presents an overview of relational algebra and relational calculus, crucial concepts in database management systems. It discusses the fundamentals of relational algebra, including set operations like union, intersection, and difference, as well as specialized operations like selection, projection, and join. The document also details the principles of relational calculus, a non-procedural approach to querying databases.
Full Transcript
Chapter Three Relational Algebra and Relational Calculus 1 Database Management Systems In mathematics, a formal language is normally defined by an alphabet and formation rules. The alphabet of a formal l...
Chapter Three Relational Algebra and Relational Calculus 1 Database Management Systems In mathematics, a formal language is normally defined by an alphabet and formation rules. The alphabet of a formal language is a set of symbols on which this language is built. The formation rules specify which strings of symbols count as well-formed. The well-formed strings of symbols are also called words, expressions, formulas, or terms. Two mathematical formal languages form the basis for “real” Query Languages (e.g. SQL), and for implementation Relational Algebra: More operational, very useful for representing execution plans. Relational Calculus: Lets users describe what they want, rather than how to compute it. (Nonoperational, declarative.) A query is applied to relation instances, and the result of a query is also a relation instance. Schemas of input relations for a query are fixed.The schema for the result of a given query is also fixed which is determined by definition of query language constructs. 2 Database Management Systems Relational Algebra An algebra is a formal language consisting of sets and operations on those sets. Relational algebra defines the basic set of operations of relational database model. These operations enable a user to specify basic retrieval requests as relational algebra expressions. A sequence of relational algebra operations forms a relational algebra expression. The result of this expression represents the result of a database query. Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It uses operators to perform queries. An operator can be either unary or binary. They accept relations as their input and yield relations as their output. Relational algebra can be performed recursively on a relation and intermediate results are also considered relations. Relational algebra is a formal system for manipulating relations. Operands and result of this algebra are relations. Operations of this algebra include the usual set operations (since relations are sets of tuples), and special operations defined for relations like selection, projection , join and the like 3 Database Management Systems Relational algebra provides a formal foundation for relational model operations and it is used as a basis for implementing and optimizing queries in the query processing and optimization modules that are integral parts of relational database management systems (RDBMSs) Operations of relational algebra can be divided into two groups. One group includes set of operations from mathematical set theory; these are applicable because each relation is defined to be a set of tuples in the formal relational model. Set operations include UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT ( CROSS PRODUCT). The other Fundamental operations developed specifically for relational databases including SELECT, PROJECT, and JOIN. 4 Database Management Systems Set Operations on Relations Union Operation (∪) It performs binary union between two given relations and is defined as R1 ∪ R2 = { r | r ∈ R1 or r ∈ R2} R1 U R2 (union) is the relation containing all tuples that appear in R1, R2, or both. Where R1 and R2 are either database relations or relation result set (temporary relation). For a union operation to be valid the relations should be Union compatible. To be Relations R1 and R2 are union compatible, the following conditions must be satisfied R1 and R2 must have the same number of attributes. each corresponding pair of attributes has the same domain Duplicate tuples are automatically eliminated. 5 Database Management Systems Intersection Operation ∩ It performs binary Intersection between two given relations and is defined as R1 ∩ R2 = { r | r ∈ R1 and r ∈ R2} R1 ∩ R2(intersection) is the relation containing all tuples that are in both R1 and R2. R1 and R2 must be union-compatible. Note : Both UNION and INTERSECTION are commutative operations; Both UNION and INTERSECTION can be treated as n-ary operations 6 Database Management Systems Set Difference (-) It performs binary Difference between two given relations and is defined as R1 – R2 = { r | r ∈ R1 and r ∉ R2} The result of R1 – R2, is a relation which includes all tuples that are in R1 but not in R2. R1 - R2 (set difference) is the relation containing all tuples of R1 that do not appear in R2. The attribute name of R1 has to match with the attribute name in R2. The two-operand relations R1 and R2 should be either compatible or Union compatible. It should be defined relation consisting of the tuples that are in relation R1, but not in R2. Set Difference is not commutative; that is, in general Note : In SQL, there are three operations—UNION, INTERSECT, and EXCEPT— that correspond to the set operations described here. In addition, there are multiset operations (UNION ALL, INTERSECT ALL, and EXCEPT ALL) that do not eliminate duplicates 7 Database Management Systems CARTESIAN PRODUCT(CROSS PRODUCT) It performs Cartesian product between two given relations and is defined as R1 ✕ R2 = { r1r2| r1 ∈ R1 and r2 ∈ R2} Cartesian product takes every tuple one by one from the left set(relation) and will pair it up with all the tuples in the right set(relation). The Cartesian product of two relation R1(A1, A2, A3, …, An) with degree n, and R2(a1, a2, a3, …, am) with degree m, is a relation R1✕ R2(A1, A2, A3, …, An, a1, a2, a3, …, am) with degree n + m attributes. Cartesian product is a binary set operation The relations on which it is applied do not have to be union compatible. Cartesian product produces new tuples by concatenating all possible combinations of tuples from n underlying relations. Cartesian product operation applied by itself is generally meaningless. It is mostly useful when followed by a selection that matches values of attributes coming from the component relations. 8 Database Management Systems Fundamental operations Projection (Project Operation) (π) Projection operation is a unary operations that displays a subset of fields of a table. It projects column(s) that satisfy a given condition., The result of the PROJECT operation can be visualized as a vertical partition that chooses a subset of the columns in a relation, and discards the rest. The general form of the PROJECT operation is π(R) where π (pi) is the symbol used to represent the PROJECT operation, Notation − ∏ (R) Where attribute list attribute names of relation R. The result of the PROJECT operation has only the attributes specified in in the same order as they appear in the list. Hence, its degree is equal to the number of attributes in attribute list Duplicate rows are automatically eliminated, as relation is a set. The number of tuples in a relation resulting from a PROJECT operation is always less than or equal to the number of tuples in R. If the projection list includes some key of R the resulting relation has the same number of tuples as R. 9 Database Management Systems Selection (Select Operation) (σ) The SELECT operator is unary; that is, it is applied to a single relation The SELECT operation is used for selecting a subset of the tuples according to a given selection condition. SELECT operation filter and keeps only those tuples that satisfy a qualifying condition. The SELECT operation can be visualized as a horizontal partition of the relation into two sets of tuples—those tuples that satisfy the condition and are selected, and those tuples that do not satisfy the condition and are filtered out Notation − σ (R) where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is a Boolean expression (condition) specified on the attributes of relation R. The relation resulting from the SELECT operation has the same attributes as R. Note : Boolean expression are expressions whose operators are the logical connectives (and, or, not) and arithmetic comparisons (=, =, ≠), and whose operands are either domain names or domain constants. operation—its number of attributes—is the same as the degree of R. The number of tuples in the resulting relation is always less than or equal to the number of tuples in R. 10 Database Management Systems Relational Algebra Operations 11 Database Management Systems 8/18/2024 examples 1. ∏subject, author (Books) projects columns named subject and author from the relation Books. 2. σsubject = "database"(Books) Output − Selects tuples from books where subject is 'database'. 3. σsubject = "database" and price = "450"(Books) Output − Selects tuples from books where subject is 'database' and 'price' is 450. 12 Database Management Systems Sequences of Operations and the RENAME Operation Several relational algebra operations can be applied one after the other. It is possible to write the operations as a single relational algebra expression by nesting the operations, or by applying one operation at a time and creating intermediate result relations. In the latter case, names should be given to the relations that hold the intermediate results. For example, to retrieve the subject, author, of all books whose subject is database , first we have to apply a SELECT and then a PROJECT operation. We can write a single relational algebra expression, also known as an in-line expression, as follows ∏subject, author(σsubject = "database"(Books)) shows the result of this in-line relational algebra expression. It is sometimes simpler to break down a complex sequence of operations by specifying intermediate result relations than to write a single relational algebra expression. we can explicitly show the sequence of operations, giving a name to each intermediate relation, and using the assignment operation, denoted by ← (left arrow), as follows: SUBDB_BOOKS ← σsubject = "database"(Books) RESULT ← ∏subject, author (SUBDB_BOOKS ) 13 Database Management Systems A formal RENAME operation can rename either the relation name or the attribute names, or both—as a unary operator. The general RENAME operation when applied to a relation R of degree n is denoted by any of the following three forms: ρS(B1, B2,... , Bn)(R) or ρS(R) or ρ(B1, B2,... , Bn)(R where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1, B2, … , Bn are the new attribute names. The first expression renames both the relation and its attributes, the second renames the relation only, and the third renames the attributes only. If the attributes of R are (A1, A2, … , An) in that order, then each Ai is renamed as Bi. To rename the attributes in a relation, we simply list the new attribute names in parentheses, If no renaming is applied, the names of the attributes in the resulting relation of a SELECT operation are the same as those in the original relation and in the same order. For a PROJECT operation with no renaming, the resulting relation has the same attribute names as those in the projection list and in the same order in which they appear in the list. 14 Database Management Systems The JOIN Operation JOIN operation is a Binary Relational Operation, denoted by ⋈ is used to combine related tuples from two relations into single “longer” tuples. This operation is very important for any relational database with more than a single relation because it allows us to process relationships among relations. Notion of a JOIN operation on two relations R1(A1, A2, … , An) and R2(a1, a2, … , am) is R1 ⋈ R2 The result of the JOIN is a relation R1 ⋈ R2 with n + m attributes R1 ⋈ R2 (A1, A2, … , An, a1, a2, … , am) in that order R1 ⋈ R2 has one tuple for each combination of tuples one from R1 and one from R2 whenever the combination satisfies the join condition. In JOIN, only combinations of tuples satisfying the join condition appear in the result, whereas in the CARTESIAN PRODUCT all combinations of tuples are included in the result. The join condition is specified on attributes from the two relations R1 and R2 and is evaluated for each combination of tuples. Each tuple combination for which the join condition evaluates to TRUE is included in the resulting relation R1 ⋈ R2 as a single combined tuple. 15 Database Management Systems Example : Given the following two relations DEPARTMENT(Did, Dname,Mgr_ssn, other attributes) EMPLOYEE(Ssn , Lname ,Fname,other attributes) Where Mgr_ssn is a foreign key of the DEPARTMENT relation that references Ssn, the primary key of the EMPLOYEE relation. to retrieve the name of the manager of each department. each department tuple need to be combined with the employee tuple whose Ssn value matches the Mgr_ssn value in the department tuple. To do this the JOIN operation is applied between DEPARTMENT and EMPLOYEE and then projecting the result over the necessary attributes, as follows: DEPT_MGR ← DEPARTMENT ⋈ Mgr_ssn=Ssn EMPLOYEE RESULT ← πDname, Lname, Fname(DEPT_MGR) Note :The JOIN operation can be specified as a CARTESIAN PRODUCT operation followed by a SELECT operation. However, JOIN is very important because it is used frequently when specifying database queries. Join operation = select operation + Cartesian product operation 16 Database Management Systems Types of Joins Join operation combines relations R1 and R2 with respect to a condition( R1 ⋈ R2 ) where Condition is of the form Ai θ aj , Ai is an attribute of R1, aj is an attribute of R2, Ai and aj have the same domain, and θ (theta) is one of the comparison operators {=,, ≥, ≠}. Theta join A JOIN operation with such a general join condition is called a THETA JOIN. Tuples whose join attributes are NULL or for which the join condition is FALSE do not appear in the result. Can rewrite Theta join using basic Selection and Cartesian product operations. Where F is the predicate (Join condition) R1 FR2 = F(R1 R2) Note; If predicate F (Join condition) contains only equality (=), the term Equijoin is used. Degree of a Theta join is sum of degrees of the operand relations R1 and R2 17 Database Management Systems Equijoin When Theta join uses only equality comparison operator, it is said to be equijoin. Equijoin is a special case of conditional join where only equality condition holds between a pair of attributes. As values of two attributes will be equal in result of equijoin, only one attribute will be appeared in result. Natural join R1 R2 An Equijoin of the two relations R1 and R2 over all common attributes x. One occurrence of each common attribute is eliminated from the final result. Note Equi Join is used when both tables are needed in the output without missing any column. Use Natural Join is used when only the unique columns are needed without repetition. 18 Database Management Systems Inner joins : An inner join is a type of match-and combine operation defined formally as a combination of CARTESIAN PRODUCT and SELECTION on two relations so that related information can be presented in a single table If inner join is applied on R1 and R2 then only tuples from R1 that have matching tuples in R2 and vice versa appear in the result and tuples without a matching (or related) tuple are eliminated from the JOIN result. Tuples with NULL values in the join attributes are also eliminated. 19 Database Management Systems Outer joins An outer join operations keep all the tuples in R1, or all those in R2, or all those in both relations in the result of the JOIN, regardless of whether or not they have matching tuples in the other relation. This satisfies the need of queries in which tuples from two tables are to be combined by matching corresponding rows, but without losing any tuples for lack of matching values. LEFT OUTER JOIN( ) The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R1 in R1 R2; if no matching tuple is found in R2, then the attributes of R2 in the join result are filled or padded with NULL values. RIGHT OUTER JOIN( ) The RIGHT OUTER JOIN keeps every tuple in the second, or right, relation R2 in the result of R1 R2. FULL OUTER JOIN( ) FULL OUTER JOIN keeps all tuples in both the left and the right relations when no matching tuples are found, padding them with NULL values as needed. Full outer join=left outer join U right outer join. 20 Database Management Systems Aggregate Operations GAAL(R) Groups tuples of R by grouping attributes, GA, and then applies aggregate function list, AL, to define a new relation. AL contains one or more (, ) pairs. Resulting relation contains the grouping attributes, GA, along with results of each of the aggregate functions. Applies aggregate function list, AL, to R to define a relation over the aggregate list. Main aggregate functions are: COUNT, SUM, AVG, MIN, and MAX. The resulting relation has the grouping attributes plus one attribute for each pair in the function list Example Find the number of staff working in each branch and the sum of their salaries. R(branchNo, myCount, mySum) branchNo COUNT staffNo, SUM salary (Staff) 21 Database Management Systems Write relational algebra expression for the following queries use the following schema for question 1-5 Suppliers(sID, sName, address) Parts(pID, pName, colour) Catalog(sID, pID, price) 1 Find the names of all red parts. 2. Find all prices for parts that are red or green. (A part may have different prices from different manufacturers.) 3. Find the sIDs of all suppliers who supply a part that is red or green. 4.Find the names of all suppliers who supply a part that is red or green. 5.Find the sIDs of all suppliers who supply a part that is red and green. Use the following schema for question 6 Clients(cID, name, phone) Staff(sID, name) Appointments(cID, date, time, service, sID) 6. Find the appointment time and client name of all appointments for staff member Giuliano on Feb14. (assume that = can be used also to compare date) 22 Database Management Systems Answer: 1. ΠpName(σcolour=“red”Parts) 2. Πprice((σcolour=“red”∨colour=“green”Parts) ⋈ Catalog) 3. ΠsID((σcolour=“red”∨colour=“green”P arts) ⋈ Catalog) 4.ΠsName((ΠsID((σcolour=“red”∨colour=“green”P arts) ⋈ Catalog)) ⋈ Suppliers) 5.no soln 6.Πname,time((ΠcID,sID,timeσdate=“F eb14”Appointments) ⋈ (ΠsIDσname=“Giuliano”Staff) ⋈ Clients) 23 Database Management Systems Relational Calculus Relational calculus is a non-procedural query language that tells the system what data to be retrieved but doesn’t tell how to retrieve it. Instead of algebra, it uses mathematical predicate calculus. Note :In predicate calculus or first-order logic , a predicate is a truth-valued function with arguments. When arguments are replaced with values , the function yields an expression, called a proposition, which will be either true or false. Relational calculus only focusses on what to do, and not on how to do it.Relational Calculus exists in two forms: 1.Tuple Relational Calculus (TRC) 2.Domain Relational Calculus (DRC) Tuple Relational Calculus (TRC) Tuple relational calculus is used for selecting those tuples that satisfy the given condition. Tuples are filtered based on the given condition. A simple tuple relational calculus query is of the form: {t | COND(t)} where t is a tuple variable and COND(t) is a conditional (Boolean) expression involving t that evaluates to either TRUE or FALSE for different assignments of tuples to the variable t. The result of such a query is the set of all tuples t that evaluate COND(t) to TRUE. These tuples are said to satisfy COND(t). 24 Database Management Systems or {t| P(t)} where t = resulting tuples, P(t) = known as Predicate and these are the conditions that are used to fetch t Thus, it generates set of all tuples t, such that Predicate P(t) is true for t. There are two parts separated by | symbol. In the first part the fields that needs to be displayed for the selected tuples are specified and the second part is where the condition is defined. Column name can be specified using a. dot operator, with the tuple variable to only get a certain attribute(column) in result. P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT(¬). It also uses quantifiers: Existential quantifier (∃) ∃ t ∈ r (Q(t)) = ”there exists” a tuple in t in relation r such that predicate Q(t) is true. Universal quantifier (∀) ∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples in relation r. 25 Database Management Systems Expressions and Formulas in Tuple Relational Calculus A general expression of the tuple relational calculus is of the form {t1.Aj , t2.Ak,... , tn.Am | COND(t1, t2,..., tn, tn+1, tn+2,..., tn+m)} where t1, t2, … , tn, tn+1, … , tn+m are tuple variables, each Ai is an attribute of the relation on which ti ranges, and COND is a condition or formula of the tuple relational calculus. A formula is made up of predicate calculus atoms, which can be one of the following: 1. An atom of the form R(ti), where R is a relation name and ti is a tuple variable. This atom identifies the range of the tuple variable ti as the relation whose name is R. It evaluates to TRUE if ti is a tuple in the relation R, and evaluates to FALSE otherwise. 2. An atom of the form ti.A op tj.B, where op is one of the comparison operators in the set {=,,≥, ≠}, ti and tj are tuple variables, A is an attribute of the relation on which ti ranges, and B is an attribute of the relation on which tj ranges. 3. An atom of the form ti.A op c or c op tj.B, where op is one of the comparison operators in the set {=,,≥, ≠}, ti and tj are tuple variables, A is an attribute of the relation on which ti ranges, B is an attribute of the relation on which tj ranges, and c is a constant value. 26 Database Management Systems Each of the preceding atoms evaluates to either TRUE or FALSE for a specific combination of tuples; this is called the truth value of an atom. A formula (Boolean condition) is made up of one or more atoms connected via the logical operators AND, OR, and NOT and is defined recursively by Rules 1 and 2 as follows: Rule 1: Every atom is a formula. Rule 2: If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2), NOT (F1), and NOT (F2). The truth values of these formulas are derived from their component formulas F1 and F2 as follows: (F1 AND F2) is TRUE if both F1 and F2 are TRUE; otherwise, it is FALSE. (F1 OR F2) is FALSE if both F1 and F2 are FALSE; otherwise, it is TRUE. NOT (F1) is TRUE if F1 is FALSE; it is FALSE if F1 is TRUE. NOT (F2) is TRUE if F2 is FALSE; it is FALSE if F2 is TRUE. 27 Database Management Systems The Existential and Universal Quantifiers Truth values for formulas with quantifiers If F is a formula, then so is (∃t)(F), where t is a tuple variable. The formula (∃t)(F) is TRUE if the formula F evaluates to TRUE for some (at least one) tuple assigned to free occurrences of t in F; otherwise, (∃t)(F) is FALSE. If F is a formula, then so is (∀t)(F), where t is a tuple variable. The formula (∀t)(F) is TRUE if the formula F evaluates to TRUE for every tuple (in the universe) assigned to free occurrences of t in F; otherwise, (∀t)(F) is FALSE. The (∃) quantifier is called an existential quantifier because a formula (∃t)(F) is TRUE if there exists some tuple that makes F TRUE. For the universal quantifier, (∀t)(F) is TRUE if every possible tuple that can be assigned to free occurrences of t in F is substituted for t, and F is TRUE for every such substitution. It is called the universal or for all quantifier because every tuple in the universe of tuples must make F TRUE to make the quantified formula TRUE 28 Database Management Systems Domain Relational Calculus (DRC) Domain relational calculus is much similar to the tuple relational calculus the only difference is that the variables it uses in the formula are the domain variables. The domain variable is the variable whose value ranges over the domain of an attribute. Domain Relational Calculus uses domain Variables to get the column values required from the database based on the predicate expression or condition. In domain relational calculus, filtering is done based on the domain of the attributes and not based on the tuple values. Syntax: { c1, c2, c3,..., cn | F(c1, c2, c3,... ,cn)} where, c1, c2... etc represents domain of attributes(columns) and F defines the formula including the condition for fetching the data. 29 Database Management Systems Example TRC For the schema Employee( Other attributes,Salary) find all employees whose salary is above 50,000, {t | EMPLOYEE(t) AND t.Salary>50000 DRC For the given schema Customer (Customerid,CustomerName,CustomerAdress) Loan(LNumber,Branchname,amount ) Borrower(Customerid, Lnumber) a. Find the loan number, branch, amount of loans of greater than or equal to 100 amount. {≺l, b, a≻ | ≺l, b, a≻ ∈ loan ∧ (a ≥ 100)} b. Find all customers having a loan at the “Main” branch and find the loan amount. {≺ci, a≻ | ∃ l (≺ci, l≻ ∈ borrower ∧ ∃ b (≺l, b, a≻ ∈ loan ∧ (b = “Main”)))} 30 Database Management Systems