Relational Algebra and Relational Calculus PDF
Document Details
Uploaded by ExcellentAbstractArt
Tags
Summary
This chapter introduces relational algebra and relational calculus, which are formal languages for manipulating data in relational databases.
Full Transcript
Chapter 5 Relational Algebra and Relational Calculus Chapter Objectives In this chapter you will learn: The meaning...
Chapter 5 Relational Algebra and Relational Calculus Chapter Objectives In this chapter you will learn: The meaning of the term “relational completeness.” How to form queries in the relational algebra. How to form queries in the tuple relational calculus. How to form queries in the domain relational calculus. The categories of relational Data Manipulation Languages (DMLs). In the previous chapter we introduced the main structural components of the relational model. As we discussed in Section 2.3, another important part of a data model is a manipulation mechanism, or query language, to allow the underlying data to be retrieved and updated. In this chapter we examine the query languages associated with the relational model. In particular, we concentrate on the rela- tional algebra and the relational calculus as defined by Codd (1971) as the basis for relational languages. Informally, we may describe the relational algebra as a (high-level) procedural language: it can be used to tell the DBMS how to build a new relation from one or more relations in the database. Again, informally, we may describe the relational calculus as a nonprocedural language: it can be used to formulate the definition of a relation in terms of one or more database relations. However, the relational algebra and relational calculus are formally equivalent to one another: for every expression in the algebra, there is an equivalent expression in the calculus (and vice versa). Both the algebra and the calculus are formal, non-user-friendly languages. They have been used as the basis for other, higher-level Data Manipulation Languages (DMLs) for relational databases. They are of interest because they illustrate the basic operations required of any DML and because they serve as the standard of comparison for other relational languages. The relational calculus is used to measure the selective power of relational languages. A language that can be used to produce any relation that can be derived using the relational calculus is said to be relationally complete. Most relational query languages are relationally complete but have more expressive 167 M05_CONN3067_06_SE_C05.indd 167 06/06/14 5:01 PM 168 | Chapter 5 Relational Algebra and Relational Calculus power than the relational algebra or relational calculus, because of additional operations such as calculated, summary, and ordering functions. Structure of this Chapter In Section 5.1 we examine the relational algebra and in Section 5.2 we examine two forms of the relational calculus: tuple relational calculus and domain relational calculus. In Section 5.3 we briefly discuss some other relational languages. We use the DreamHome rental database instance shown in Figure 4.3 to illustrate the operations. In Chapters 6–9 we examine SQL, the formal and de facto standard language for RDBMSs, which has constructs based on the tuple relational calculus. In Appendix M we examine QBE (Query-By-Example), another highly popular visual query language for RDBMSs, which is based in part on the domain rela- tional calculus. 5.1 The Relational Algebra The relational algebra is a theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s). Thus both the operands and the results are relations, and so the output from one operation can become the input to another operation. This ability allows expres- sions to be nested in the relational algebra, just as we can nest arithmetic opera- tions. This property is called closure: relations are closed under the algebra, just as numbers are closed under arithmetic operations. The relational algebra is a relation-at-a-time (or set) language in which all tuples, possibly from several relations, are manipulated in one statement without looping. There are several variations of syntax for relational algebra commands and we use a common symbolic notation for the commands and present it informally. The interested reader is referred to Ullman (1988) for a more formal treatment. There are many variations of the operations that are included in relational alge- bra. Codd (1972a) originally proposed eight operations, but several others have been developed. The five fundamental operations in relational algebra—Selection, Projection, Cartesian product, Union, and Set difference—perform most of the data retrieval operations that we are interested in. In addition, there are also the Join, Intersection, and Division operations, which can be expressed in terms of the five basic operations. The function of each operation is illustrated in Figure 5.1. The Selection and Projection operations are unary operations, as they operate on one relation. The other operations work on pairs of relations and are therefore called binary operations. In the following definitions, let R and S be two relations defined over the attributes A 5 (a1, a2,... , aN) and B 5 (b1, b2,... , bM), respectively. 5.1.1 Unary Operations We start the discussion of the relational algebra by examining the two unary opera- tions: Selection and Projection. M05_CONN3067_06_SE_C05.indd 168 06/06/14 5:01 PM 5.1 The Relational Algebra | 169 Figure 5.1 Illustration showing the function of the relational algebra operations. Selection (or Restriction) The Selection operation works on a single relation R and defines a spredicate(R) relation that contains only those tuples of R that satisfy the specified condition (predicate). M05_CONN3067_06_SE_C05.indd 169 06/06/14 5:01 PM 170 | Chapter 5 Relational Algebra and Relational Calculus Example 5.1 Selection operation List all staff with a salary greater than £10000. ssalary. 10000(Staff) Here, the input relation is Staff and the predicate is salary. 10000. The Selection operation defines a relation containing only those Staff tuples with a salary greater than £10000. The result of this operation is shown in Figure 5.2. More complex predicates can be generated using the logical operators Ù (AND), Ú (OR), and ~ (NOT). Figure 5.2 Selecting salary. 10000 from the Staff relation. Projection The Projection operation works on a single relation R and defines a Pa1 … , an(R) relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. Example 5.2 Projection operation Produce a list of salaries for all staff, showing only the staffNo, fName, IName, and salary details. PstaffNo, fName, IName, salary(Staff) In this example, the Projection operation defines a relation that contains only the desig- nated Staff attributes staffNo, fName, IName, and salary, in the specified order. The result of this operation is shown in Figure 5.3. Figure 5.3 Projecting the Staff relation over the staffNo, fName, IName, and salary attributes. M05_CONN3067_06_SE_C05.indd 170 06/06/14 5:01 PM 5.1 The Relational Algebra | 171 5.1.2 Set Operations The Selection and Projection operations extract information from only one rela- tion. There are obviously cases where we would like to combine information from several relations. In the remainder of this section, we examine the binary opera- tions of the relational algebra, starting with the set operations of Union, Set differ- ence, Intersection, and Cartesian product. Union The union of two relations R and S defines a relation that contains all the RS tuples of R, or S, or both R and S, duplicate tuples being eliminated. R and S must be union-compatible. If R and S have I and J tuples, respectively, their union is obtained by concatenating them into one relation with a maximum of (I 1 J) tuples. Union is possible only if the schemas of the two relations match, that is, if they have the same number of attributes with each pair of corresponding attributes having the same domain. In other words, the relations must be union-compatible. Note that attributes names are not used in defining union-compatibility. In some cases, the Projection opera- tion may be used to make two relations union-compatible. Example 5.3 Union operation List all cities where there is either a branch office or a property for rent. Pcity(Branch) Pcity(PropertyForRent) Figure 5.4 To produce union-compatible relations, we first use the Projection operation to project Union based on the Branch and PropertyForRent relations over the attribute city, eliminating duplicates the city attribute where necessary. We then use the Union operation to combine these new relations to from the Branch produce the result shown in Figure 5.4. and Property- ForRent relations Set difference The Set difference operation defines a relation consisting of the tuples that R–S are in relation R, but not in S. R and S must be union-compatible. Example 5.4 Set difference operation List all cities where there is a branch office but no properties for rent. Figure 5.5 Set difference Pcity(Branch) 2 Pcity(PropertyForRent) based on the city attribute from As in the previous example, we produce union-compatible relations by projecting the the Branch and Branch and PropertyForRent relations over the attribute city. We then use the Set differ- PropertyForRent ence operation to combine these new relations to produce the result shown in Figure 5.5. relations. M05_CONN3067_06_SE_C05.indd 171 06/06/14 5:01 PM 172 | Chapter 5 Relational Algebra and Relational Calculus Intersection The Intersection operation defines a relation consisting of the set of all RS tuples that are in both R and S. R and S must be union-compatible. Example 5.5 Intersection operation Figure 5.6 List all cities where there is both a branch office and at least one property for rent. Intersection based on city Pcity(Branch) Pcity(PropertyForRent) attribute from the Branch and As in the previous example, we produce union-compatible relations by projecting the PropertyForRent Branch and PropertyForRent relations over the attribute city. We then use the Intersection relations. operation to combine these new relations to produce the result shown in Figure 5.6. Note that we can express the Intersection operation in terms of the Set difference operation: R S 5 R 2 (R 2 S) Cartesian product The Cartesian product operation defines a relation that is the concatena- R3S tion of every tuple of relation R with every tuple of relation S. The Cartesian product operation multiplies two relations to define another relation consisting of all possible pairs of tuples from the two relations. Therefore, if one relation has I tuples and N attributes and the other has J tuples and M attributes, the Cartesian product relation will contain (I * J) tuples with (N 1 M) attributes. It is possible that the two relations may have attributes with the same name. In this case, the attribute names are prefixed with the relation name to maintain the uniqueness of attribute names within a relation. Example 5.6 Cartesian product operation List the names and comments of all clients who have viewed a property for rent. The names of clients are held in the Client relation and the details of viewings are held in the Viewing relation. To obtain the list of clients and the comments on properties they have viewed, we need to combine these two relations: PclientNo, fName, IName (Client)) 3 (PclientNo, propertyNo, comment(Viewing)) The result of this operation is shown in Figure 5.7. In its present form, this relation contains more information than we require. For example, the first tuple of this rela- tion contains different clientNo values. To obtain the required list, we need to carry out a Selection operation on this relation to extract those tuples where Client.clientNo = Viewing.clientNo. The complete operation is thus: sClient.clientNo = Viewing. clientNo((PclientNo, fName, IName(Client)) 3 (PclientNo, propertyNo, comment(Viewing)) M05_CONN3067_06_SE_C05.indd 172 06/06/14 5:01 PM 5.1 The Relational Algebra | 173 The result of this operation is shown in Figure 5.8. Figure 5.7 Cartesian product of reduced Client and Viewing relations. Figure 5.8 Restricted Cartesian product of reduced Client and Viewing relations. Decomposing complex operations The relational algebra operations can be of arbitrary complexity. We can decom- pose such operations into a series of smaller relational algebra operations and give a name to the results of intermediate expressions. We use the assignment operation, denoted by d, to name the results of a relational algebra operation. This works in a similar manner to the assignment operation in a programming language: in this case, the right-hand side of the operation is assigned to the left-hand side. For instance, in the previous example we could rewrite the operation as follows: TempViewing(clientNo, propertyNo, comment) d PclientNo, propertyNo, comment(Viewing) TempClient(clientNo, fName, lName) d PclientNo, fName, lName(Client) Comment(clientNo, fName, lName, vclientNo, propertyNo, comment) d TempClient 3 TempViewing Result d sclientNo 5 vclientNo(Comment) M05_CONN3067_06_SE_C05.indd 173 06/06/14 5:01 PM 174 | Chapter 5 Relational Algebra and Relational Calculus Alternatively, we can use the Rename operation r (rho), which gives a name to the result of a relational algebra operation. Rename allows an optional name for each of the attributes of the new relation to be specified. rS(E) or The Rename operation provides a new name S for the expression rS(a1, a2,... , an)(E) E, and optionally names the attributes as a1, a2,... , an. 5.1.3 Join Operations Typically, we want only combinations of the Cartesian product that satisfy certain conditions and so we would normally use a Join operation instead of the Cartesian product operation. The Join operation, which combines two relations to form a new relation, is one of the essential operations in the relational algebra. Join is a derivative of Cartesian product, equivalent to performing a Selection operation, using the join predicate as the selection formula, over the Cartesian product of the two operand relations. Join is one of the most difficult operations to implement efficiently in an RDBMS and is one of the reasons why relational systems have intrinsic performance problems. We examine strategies for implementing the Join operation in Section 23.4.3. There are various forms of the Join operation, each with subtle differences, some more useful than others: Theta join Equijoin (a particular type of Theta join) Natural join Outer join Semijoin Theta join (-join) The Theta join operation defines a relation that contains tuples satisfy- ing the predicate F from the Cartesian product of R and S. The predi- R 1F S cate F is of the form R.ai u S.bi, where u may be one of the comparison operators (,, #,., $, 5, ). We can rewrite the Theta join in terms of the basic Selection and Cartesian product operations: R 1 F S 5 sF(R 3 S) As with a Cartesian product, the degree of a Theta join is the sum of the degrees of the operand relations R and S. In the case where the predicate F contains only equality (5), the term Equijoin is used instead. Consider again the query of Example 5.6. M05_CONN3067_06_SE_C05.indd 174 06/06/14 5:01 PM 5.1 The Relational Algebra | 175 Example 5.7 Equijoin operation List the names and comments of all clients who have viewed a property for rent. In Example 5.6 we used the Cartesian product and Selection operations to obtain this list. However, the same result is obtained using the Equijoin operation: (PclientNo, fName, lName(Client)) 1 Client.clientNo 5 Viewing.clientNo (PclientNo, propertyNo, comment(Viewing)) or Result ¬ TempClient 1 TempClient.clientNo 5 TempViewing.clientNo TempViewing The result of these operations was shown in Figure 5.8. Natural join The Natural join is an Equijoin of the two relations R and S over all R1S common attributes x. One occurrence of each common attribute is eliminated from the result. The Natural join operation performs an Equijoin over all the attributes in the two relations that have the same name. The degree of a Natural join is the sum of the degrees of the relations R and S less the number of attributes in x. Example 5.8 Natural join operation List the names and comments of all clients who have viewed a property for rent. In Example 5.7 we used the Equijoin to produce this list, but the resulting relation had two occurrences of the join attribute clientNo. We can use the Natural join to remove one occurrence of the clientNo attribute: (PclientNo, fName, lName(Client)) 1 (PclientNo, propertyNo, comment(Viewing)) or Result ¬ TempClient 1 TempViewing The result of this operation is shown in Figure 5.9. Figure 5.9 Natural join of restricted Client and Viewing relations. M05_CONN3067_06_SE_C05.indd 175 06/06/14 5:01 PM 176 | Chapter 5 Relational Algebra and Relational Calculus Outer join Often in joining two relations, a tuple in one relation does not have a matching tuple in the other relation; in other words, there is no matching value in the join attributes. We may want tuples from one of the relations to appear in the result even when there are no matching values in the other relation. This may be accomplished using the Outer join. The (left) Outer join is a join in which tuples from R that do not have R5 S matching values in the common attributes of S are also included in the result relation. Missing values in the second relation are set to null. The Outer join is becoming more widely available in relational systems and is a specified operator in the SQL standard (see Section 6.3.7). The advantage of an Outer join is that information is preserved; that is, the Outer join preserves tuples that would have been lost by other types of join. Example 5.9 Left Outer join operation Produce a status report on property viewings. In this example, we want to produce a relation consisting of the properties that have been viewed with comments and those that have not been viewed. This can be achieved using the following Outer join: (PpropertyNo, street, city(PropertyForRent)) 5 Viewing The resulting relation is shown in Figure 5.10. Note that properties PL94, PG21, and PG16 have no viewings, but these tuples are still contained in the result with nulls for the attributes from the Viewing relation. Figure 5.10 Left (natural) Outer join of PropertyForRent and Viewing relations. Strictly speaking, Example 5.9 is a Left (natural) Outer join, as it keeps every tuple in the left-hand relation in the result. Similarly, there is a Right Outer join that keeps every tuple in the right-hand relation in the result. There is also a Full Outer join that keeps all tuples in both relations, padding tuples with nulls when no matching tuples are found. M05_CONN3067_06_SE_C05.indd 176 06/06/14 5:01 PM 5.1 The Relational Algebra | 177 Semijoin The Semijoin operation defines a relation that contains the tuples of R R 2F S that participate in the join of R with S satisfying the predicate F. The Semijoin operation performs a join of the two relations and then projects over the attributes of the first operand. One advantage of a Semijoin is that it decreases the number of tuples that need to be handled to form the join. It is particularly useful for computing joins in distributed systems (see Sections 24.4.2 and 25.6.2). We can rewrite the Semijoin using the Projection and Join operations: R 2 F S 5 PA(R 1 F S) A is the set of all attributes for R This is actually a Semi-Theta join. There are variants for the Semi-Equijoin and the Semi-Natural join. Example 5.10 Semijoin operation List complete details of all staff who work at the branch in Glasgow. If we are interested in seeing only the attributes of the Staff relation, we can use the fol- lowing Semijoin operation, producing the relation shown in Figure 5.11. Staff 2 (scity 5 ‘Glasgow’ (Branch)) Staff branchNo 5 Branch branchNo Figure 5.11 Semijoin of Staff and Branch relations. 5.1.4 Division Operation The Division operation is useful for a particular type of query that occurs quite frequently in database applications. Assume that relation R is defined over the attribute set A and relation S is defined over the attribute set B such that B A (B is a subset of A). Let C 5 A 2 B, that is, C is the set of attributes of R that are not attributes of S. We have the following definition of the Division operation. The Division operation defines a relation over the attributes C that R4S consists of the set of tuples from R that match the combination of every tuple in S. We can express the Division operation in terms of the basic operations: T1 ¬ PC(R) T2 ¬ PC((T1 3 S) 2 R) T ¬ T1 2 T2 M05_CONN3067_06_SE_C05.indd 177 06/06/14 5:01 PM 178 | Chapter 5 Relational Algebra and Relational Calculus Example 5.11 Division operation Identify all clients who have viewed all properties with three rooms. We can use the Selection operation to find all properties with three rooms followed by the Projection operation to produce a relation containing only these property numbers. We can then use the following Division operation to obtain the new relation shown in Figure 5.12. (PclientNo, propertyNo(Viewing)) 4 (PpropertyNo(srooms 5 3(PropertyForRent))) Figure 5.12 Result of the Division operation on the Viewing and PropertyForRent relations. 5.1.5 Aggregation and Grouping Operations As well as simply retrieving certain tuples and attributes of one or more relations, we often want to perform some form of summation or aggregation of data, similar to the totals at the bottom of a report, or some form of grouping of data, similar to subtotals in a report. These operations cannot be performed using the basic rela- tional algebra operations considered earlier. However, additional operations have been proposed, as we now discuss. Aggregate operations Applies the aggregate function list, AL, to the relation R to define a ÁAL(R) relation over the aggregate list. AL contains one or more (, ) pairs. The main aggregate functions are: COUNT – returns the number of values in the associated attribute. SUM – returns the sum of the values in the associated attribute. AVG – returns the average of the values in the associated attribute. MIN – returns the smallest value in the associated attribute. MAX – returns the largest value in the associated attribute. Example 5.12 Aggregate operations (a) How many properties cost more than £350 per month to rent? We can use the aggregate function COUNT to produce the relation R shown in Figure 5.13(a): M05_CONN3067_06_SE_C05.indd 178 06/06/14 5:01 PM 5.1 The Relational Algebra | 179 Figure 5.13 Result of the Aggregate operations: (a) finding the number of properties whose rent is greater than £350; (b) finding the minimum, maximum, and average staff salary. rR(myCount) Á COUNT propertyNo (srent. 350 (PropertyForRent)) (b) Find the minimum, maximum, and average staff salary. We can use the aggregate functions—MIN, MAX, and AVERAGE—to produce the rela- tion R shown in Figure 5.13(b) as follows: rR(myMin, myMax, myAverage) Á MIN salary, MAX salary, AVERAGE salary (Staff) Grouping operation Groups the tuples of relation R by the grouping attributes, GA, and then applies the aggregate function list AL to define a new relation. AL Á (R) GA AL contains one or more (, ) pairs. The resulting relation contains the grouping attributes, GA, along with the results of each of the aggregate functions. The general form of the grouping operation is as follows: a1, a2,... , an Á ,Apap., ,Aqaq.,... , ,Azaz. (R) where R is any relation, a1, a2,... , an are attributes of R on which to group, ap, aq,... , az are other attributes of R, and Ap, Aq,... , A2 are aggregate functions. The tuples of R are partitioned into groups such that: all tuples in a group have the same value for a1, a2,... , an; tuples in different groups have different values for a1, a2,... , an. We illustrate the use of the grouping operation with the following example. Example 5.13 Grouping operation Find the number of staff working in each branch and the sum of their salaries. We first need to group tuples according to the branch number, branchNo, and then use the aggregate functions COUNT and SUM to produce the required relation. The rela- tional algebra expression is as follows: rR(branchNo, myCount, mySum) branchNo Á COUNT staffNo, SUM salary (Staff) The resulting relation is shown in Figure 5.14. Figure 5.14 Result of the grouping operation to find the number of staff working in each branch and the sum of their salaries. M05_CONN3067_06_SE_C05.indd 179 06/06/14 5:01 PM 180 | Chapter 5 Relational Algebra and Relational Calculus 5.1.6 Summary of the Relational Algebra Operations The relational algebra operations are summarized in Table 5.1. Table 5.1 Operations in the relational algebra. OPERATION NOTATION FUNCTION Selection spredicate(R) Produces a relation that contains only those tuples of R that satisfy the specified predicate. Projection Pa1,... , an(R) Produces a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. Union RS Produces a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. R and S must be union-compatible. Set difference R−S Produces a relation that contains all the tuples in R that are not in S. R and S must be union-compatible. Intersection RS Produces a relation that contains all the tuples in both R and S. R and S must be union-compatible. Cartesian R3S Produces a relation that is the concatenation of every tuple of product relation R with every tuple of relation S. Theta join R 1F S Produces a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. Equijoin R 1F S Produces a relation that contains tuples satisfying the predicate F (which contains only equality comparisons) from the Cartesian product of R and S. Natural join R1S An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated. (Left) Outer R5S A join in which tuples from R that do not have matching values join in the common attributes of S are also included in the result relation. Semijoin R 2F S Produces a relation that contains the tuples of R that participate in the join of R with S satisfying the predicate F. Division R4S Produces a relation that consists of the set of tuples from R defined over the attributes C that match the combination of every tuple in S, where C is the set of attributes that are in R but not in S. Aggregate ÁAL(R) Applies the aggregate function list, AL, to the relation R to define a relation over the aggregate list. AL contains one or more (, ) pairs. Grouping Á (R) GA AL Groups the tuples of relation R by the grouping attributes, GA, and then applies the aggregate function list AL to define a new relation. AL contains one or more (, ) pairs. The resulting relation contains the grouping attributes, GA, along with the results of each of the aggregate functions. M05_CONN3067_06_SE_C05.indd 180 06/06/14 5:01 PM 5.2 The Relational Calculus | 181 5.2 The Relational Calculus A certain order is always explicitly specified in a relational algebra expression and a strategy for evaluating the query is implied. In the relational calculus, there is no description of how to evaluate a query; a relational calculus query specifies what is to be retrieved rather than how to retrieve it. The relational calculus is not related to differential and integral calculus in mathematics, but takes its name from a branch of symbolic logic called predicate calculus. When applied to databases, it is found in two forms: tuple relational cal- culus, as originally proposed by Codd (1972a), and domain relational calculus, as proposed by Lacroix and Pirotte (1977). In first-order logic or predicate calculus, a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false. For example, the sentences, “John White is a member of staff” and “John White earns more than Ann Beech” are both propositions, because we can determine whether they are true or false. In the first case, we have a function, “is a member of staff,” with one argu- ment (John White); in the second case, we have a function, “earns more than,” with two arguments (John White and Ann Beech). If a predicate contains a variable, as in “x is a member of staff,” there must be an associated range for x. When we substitute some values of this range for x, the proposition may be true; for other values, it may be false. For example, if the range is the set of all people and we replace x by John White, the proposition “John White is a member of staff” is true. If we replace x by the name of a person who is not a member of staff, the proposition is false. If P is a predicate, then we can write the set of all x such that P is true for x, as: {x | P(x)} We may connect predicates by the logical connectives Ù (AND), Ú (OR), and ~ (NOT) to form compound predicates. 5.2.1 Tuple Relational Calculus In the tuple relational calculus, we are interested in finding tuples for which a pred- icate is true. The calculus is based on the use of tuple variables. A tuple variable is a variable that “ranges over” a named relation: that is, a variable whose only permit- ted values are tuples of the relation. (The word “range” here does not correspond to the mathematical use of range, but corresponds to a mathematical domain.) For example, to specify the range of a tuple variable S as the Staff relation, we write: Staff(S) To express the query “Find the set of all tuples S such that F(S) is true,” we can write: {S | F(S)} F is called a formula (well-formed formula, or wff in mathematical logic). For example, to express the query “Find the staffNo, fName, IName, position, M05_CONN3067_06_SE_C05.indd 181 06/06/14 5:01 PM 182 | Chapter 5 Relational Algebra and Relational Calculus sex, DOB, salary, and branchNo of all staff earning more than £10,000,” we can write: {S | Staff(S) Ù S.salary > 10000} S.salary means the value of the salary attribute for the tuple variable S. To retrieve a particular attribute, such as salary, we would write: {S.salary | Staff(S) Ù S.salary > 10000} The existential and universal quantifiers There are two quantifiers we can use with formulae to tell how many instances the predicate applies to. The existential quantifier $ (“there exists”) is used in formu- lae that must be true for at least one instance, such as: Staff(S) Ù ($B) (Branch(B) Ù (B.branchNo 5 S.branchNo) Ù B.city 5 ‘London’) This means, “There exists a Branch tuple that has the same branchNo as the branchNo of the current Staff tuple, S, and is located in London.” The universal quantifier " (“for all”) is used in statements about every instance, such as: ("B) (B.city ‘Paris’) This means, “For all Branch tuples, the address is not in Paris.” We can apply a gen- eralization of De Morgan’s laws to the existential and universal quantifiers. For example: ($X)(F(X)) º ~ ("X)(~(F(X))) ("X)(F(X)) º ~($X)(~(F(X))) ($X)(F1(X) Ù F2(X)) º ~("X)(~(F1(X)) Ú ~(F2(X))) ("X)(F1(X) Ù F2(X)) º ~($X)(~(F1(X)) Ú ~(F2(X))) Using these equivalence rules, we can rewrite the previous formula as: ~($B) (B.city 5 ‘Paris’) which means, “There are no branches with an address in Paris.” Tuple variables that are qualified by "or $ are called bound variables; the other tuple variables are called free variables. The only free variables in a relational calculus expression should be those on the left side of the bar (|). For example, in the following query: {S.fName, S.lName | Staff(S) Ù ($B) (Branch(B) Ù (B.branchNo 5 S.branchNo) Ù B.city 5 ‘London’)} S is the only free variable and S is then bound successively to each tuple of Staff. Expressions and formulae As with the English alphabet, in which some sequences of characters do not form a correctly structured sentence, in calculus not every sequence of formulae is accept- able. The formulae should be those sequences that are unambiguous and make sense. An expression in the tuple relational calculus has the following general form: M05_CONN3067_06_SE_C05.indd 182 06/06/14 5:01 PM 5.2 The Relational Calculus | 183 {S1.a1, S2.a2,... , Sn.an | F(S1, S2,... , Sm)} m$n where S1, S2,... , Sn... , Sm are tuple variables; each ai is an attribute of the relation over which Si ranges; and F is a formula. A (well-formed) formula is made out of one or more atoms, where an atom has one of the following forms: R(Si), where Si is a tuple variable and R is a relation. S1.a1 u Sj.a2, where Si and Sj are tuple variables, a1, is an attribute of the relation over which Si ranges, a2 is an attribute of the relation over which Sj ranges, and is one of the comparison operators (,, #,., $, 5, ); the attributes a1 and a2 must have domains whose members can be compared by u. Si.a1 u c, where Si is a tuple variable, a1 is an attribute of the relation over which Si ranges, c is a constant from the domain of attribute a1, and u is one of the com- parison operators. We recursively build up formulae from atoms using the following rules: An atom is a formula. lf F1 and F2 are formulae, so are their conjunction F1 Ù F2, their disjunction F1 Ú F2, and the negation ~F1. If F is a formula with free variable X, then ($X)(F ) and ("X)(F ) are also formulae. Example 5.14 Tuple relational calculus (a) List the names of all managers who earn more than £25,000. {S.fName, S.lName | Staff(S) Ù S.position 5 ‘Manager’ Ù S.salary. 25000} (b) List the staff who manage properties for rent in Glasgow. {S | Staff(S) Ù ($P) (PropertyForRent(P) Ù (P.staffNo 5 S.staffNo) Ù P.city 5 ‘Glasgow’)} The staffNo attribute in the PropertyForRent relation holds the staff number of the member of staff who manages the property. We could reformulate the query as: “For each member of staff whose details we want to list, there exists a tuple in the relation PropertyForRent for that member of staff with the value of the attribute city in that tuple being Glasgow.” Note that in this formulation of the query, there is no indication of a strategy for executing the query—the DBMS is free to decide the operations required to fulfil the request and the execution order of these operations. On the other hand, the equivalent relational algebra formulation would be: “Select tuples from PropertyForRent such that the city is Glasgow and perform their join with the Staff relation,” which has an implied order of execution. (c) List the names of staff who currently do not manage any properties. {S.fName, S.lName | Staff(S) Ù (~($P) (PropertyForRent(P) Ù (S.staffNo 5 P.staffNo)))} Using the general transformation rules for quantifiers given previously, we can rewrite this as: {S.fName, S.lName | Staff(S) Ù (("P) (~PropertyForRent(P) Ú ~(S.staffNo 5 P.staffNo)))} (d) List the names of clients who have viewed a property for rent in Glasgow. {C.fName, C.lName | Client(C) Ù (($V) ($P) (Viewing(V) Ù PropertyForRent(P) Ù (C.clientNo 5 V.clientNo) Ù (V.propertyNo 5 P.propertyNo) Ù P.city 5 ‘Glasgow’))} M05_CONN3067_06_SE_C05.indd 183 06/06/14 5:01 PM 184 | Chapter 5 Relational Algebra and Relational Calculus To answer this query, note that we can rephrase “clients who have viewed a property in Glasgow” as “clients for whom there exists some viewing of some property in Glasgow.” (e) List all cities where there is either a branch office or a property for rent. {T.city | ($B) (Branch(B) Ù (B.city 5 T.city)) Ú ($P) (PropertyForRent(P) Ù (P.city 5 T.city))} Compare this with the equivalent relational algebra expression given in Example 5.3. (f) List all the cities where there is a branch office but no properties for rent. {B.city | Branch(B) Ù (~($P) (PropertyForRent(P) Ù (B.city 5 P.city)))} Compare this with the equivalent relational algebra expression given in Example 5.4. (g) List all the cities where there is both a branch office and at least one property for rent. {B.city | Branch(B) Ù (($P) (PropertyForRent(P) Ù (B.city 5 P.city)))} Compare this with the equivalent relational algebra expression given in Example 5.5. Safety of expressions Before we complete this section, we should mention that it is possible for a calculus expression to generate an infinite set. For example: {S | ~ Staff(S)} would mean the set of all tuples that are not in the Staff relation. Such an expression is said to be unsafe. To avoid this, we have to add a restriction that all values that appear in the result must be values in the domain of the expression E, denoted dom(E). In other words, the domain of E is the set of all values that appear explicitly in E or that appear in one or more relations whose names appear in E. In this example, the domain of the expression is the set of all values appearing in the Staff relation. An expression is safe if all values that appear in the result are values from the domain of the expression. The previous expression is not safe, as it will typically include tuples from outside the Staff relation (and so outside the domain of the expression). All other examples of tuple relational calculus expressions in this sec- tion are safe. Some authors have avoided this problem by using range variables that are defined by a separate RANGE statement. The interested reader is referred to Date (2000). 5.2.2 Domain Relational Calculus In the tuple relational calculus, we use variables that range over tuples in a relation. In the domain relational calculus, we also use variables, but in this case the vari- ables take their values from domains of attributes rather than tuples of relations. An expression in the domain relational calculus has the following general form: {d1, d2,... , dn | F(d1, d2, … , dm)} m $ n where d1, d2,... , dn,... , dm represent domain variables and F(d1, d2,... , dm) represents a formula composed of atoms, where each atom has one of the following forms: R(d1, d2,... , dn), where R is a relation of degree n and each di is a domain variable. M05_CONN3067_06_SE_C05.indd 184 06/06/14 5:01 PM 5.2 The Relational Calculus | 185 di u dj, where di and dj are domain variables and u is one of the comparison opera- tors (,, #,., $, 5, ); the domains di and dj must have members that can be compared by u. di u c, where di is a domain variable, c is a constant from the domain of di, and u is one of the comparison operators. We recursively build up formulae from atoms using the following rules: An atom is a formula. If F1 and F2 are formulae, so are their conjunction F1 Ù F2, their disjunction F1 Ú F2, and the negation ~F1. If F is a formula with domain variable X, then ($X)(F) and ("X)(F) are also formulae. Example 5.15 Domain relational calculus In the following examples, we use the following shorthand notation: ($d1, d2,... , dn) in place of ($d1), ($d2),... , ($dn) (a) Find the names of all managers who earn more than £25,000. {fN, lN | ($sN, posn, sex, DOB, sal, bN) (Staff(sN, fN, lN, posn, sex, DOB, sal, bN) Ù posn 5 ‘Manager’ Ù sal. 25000)} If we compare this query with the equivalent tuple relational calculus query in Example 5.14(a), we see that each attribute is given a (variable) name. The condition Staff(sN, fN,... , bN) ensures that the domain variables are restricted to be attributes of the same tuple. Thus, we can use the formula posn = ‘Manager’, rather than Staff.position 5 ‘Manager’. Also note the difference in the use of the existential quantifier. In the tuple relational calculus, when we write $posn for some tuple variable posn, we bind the vari- able to the relation Staff by writing Staff(posn). On the other hand, in the domain rela- tional calculus posn refers to a domain value and remains unconstrained until it appears in the subformula Staff(sN, fN, IN, posn, sex, DOB, sal, bN), when it becomes constrained to the position values that appear in the Staff relation. For conciseness, in the remaining examples in this section we quantify only those domain variables that actually appear in a condition (in this example, posn and sal). (b) List the staff who manage properties for rent in Glasgow. {sN, fN, lN, posn, sex, DOB, sal, bN | ($sN1, cty) (Staff(sN, fN, lN, posn, sex, DOB, sal, bN) Ù PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1) Ù (sN 5 sN1) Ù cty 5 ‘Glasgow’)} This query can also be written as: {sN, fN, IN, posn, sex, DOB, sal, bN | (Staff(sN, fN, IN, posn, sex, DOB, sal, bN) Ù PropertyForRent(pN, st, ‘Glasgow’, pc, typ, rms, rnt, oN, sN, bN1))} In this version, the domain variable cty in PropertyForRent has been replaced with the constant “Glasgow” and the same domain variable sN, which represents the staff num- ber, has been repeated for Staff and PropertyForRent. (c) List the names of staff who currently do not manage any properties for rent. {fN, IN | ($sN) (Staff(sN, fN, IN, posn, sex, DOB, sal, bN) Ù (~($sN1) (PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN1, bN1) Ù (sN 5 sN1))))} M05_CONN3067_06_SE_C05.indd 185 06/06/14 5:01 PM 186 | Chapter 5 Relational Algebra and Relational Calculus (d) List the names of clients who have viewed a property for rent in Glasgow. {fN, IN | ($cN, cN1, pN, pN1, cty) (Client(cN, fN, IN, tel, pT, mR, eM) Ù Viewing(cN1, pN1, dt, cmt) Ù PropertyForRent(pN, st, cty, pc, typ, rms, rnt, oN, sN, bN) Ù (cN 5 cN1) Ù (pN 5 pN1) Ù cty 5 ‘Glasgow’)} (e) List all cities where there is either a branch office or a property for rent. {cty | (Branch(bN, st, cty, pc) Ú PropertyForRent(pN, st1, cty, pc1, typ, rms, rnt, oN, sN, bN1))} (f ) List all the cities where there is a branch office but no properties for rent. {cty | (Branch(bN, st, cty, pc) Ù (~($cty1)(PropertyForRent(pN, st1, cty1, pc1, typ, rms, rnt, oN, sN, bN1) Ù (cty 5 cty1))))} (g) List all the cities where there is both a branch office and at least one property for rent. {cty | (Branch(bN, st, cty, pc) Ù ($cty1) (PropertyForRent(pN, st1, cty1, pc1, typ, rms, rnt, oN, sN, bN1) Ù (cty 5 ctyl)))} These queries are safe. When the domain relational calculus is restricted to safe expressions, it is equivalent to the tuple relational calculus restricted to safe expres- sions, which in turn is equivalent to the relational algebra. This means that for every relational algebra expression there is an equivalent expression in the relational calculus, and for every tuple or domain relational calculus expression there is an equivalent relational algebra expression. 5.3 Other Languages Although the relational calculus is hard to understand and use, it was recognized that its nonprocedural property is exceedingly desirable, and this resulted in a search for other easy-to-use nonprocedural techniques. This led to another two categories of relational languages: transform-oriented and graphical. Transform-oriented languages are a class of nonprocedural languages that use relations to transform input data into required outputs. These languages provide easy-to-use structures for expressing what is desired in terms of what is known. SQUARE (Boyce et al., 1975), SEQUEL (Chamberlin et al., 1976), and SEQUEL’s offspring, SQL, are all transform-oriented languages. We discuss SQL in Chapters 6–9. Graphical languages provide the user with a picture or illustration of the struc- ture of the relation. The user fills in an example of what is wanted and the system returns the required data in that format. QBE (Query-By-Example) is an example of a graphical language (Zloof, 1977). We demonstrate the capabilities of QBE in Appendix M. Another category is fourth-generation languages (4GLs), which allow a complete customized application to be created using a limited set of commands in a user- friendly, often menu-driven environment (see Section 2.2). Some systems accept a form of natural language, a restricted version of natural English, sometimes called a fifth-generation language (5GL), although this development is still at an early stage. M05_CONN3067_06_SE_C05.indd 186 06/06/14 5:01 PM