Week 5_The Relational Model and Relational Algebra PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document is about the relational model and relational algebra, from notes in a database concepts course. It explains different aspects of the model, including schemas, structures, and mathematical definitions. It's likely lecture notes or study material.
Full Transcript
The Relational Model and Relational Algebra Database System Concepts, 7th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Outline Structure of Relational Databases...
The Relational Model and Relational Algebra Database System Concepts, 7th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Outline Structure of Relational Databases Database Schema Keys Schema Diagrams Relational Query Languages The Relational Algebra Database System Concepts - 7th Edition 2.2 ©Silberschatz, Korth and Sudarshan The Relational Model Database System Concepts, 7th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Basic Structure Account Let D1 denote the set of all account numbers, D2 the set of all branch names, and D3 the set of all balances. Any row of account must consist of a tuple (v1, v2, v3), where v1 is an account number (that is, v1 is in domain D1), v2 is a branch name (that is, v2 is in domain D2), and v3 is a balance (that is, v3 is in domain D3). In general, account will contain only a subset of the set of all possible rows. Therefore, account is a subset of D1 × D2 × D3 In general, a table of n attributes must be a subset of D1 × D2 × · · · × Dn−1 × Dn 4 Database System Concepts - 7 Edition th 2.4 ©Silberschatz, Korth and Sudarshan Structure of Relational Databases A relational database consists of a collection of tables, each of which is assigned a unique name. A row in a table represents a relationship among a set of values. Since a table is a collection of such relationships, there is a close correspondence between the concept of table and the mathematical concept of relation, from which the relational data model takes its name. Mathematically, a relation is defined as a subset of a Cartesian product of a list of domains. Because tables are essentially relations, we shall use the mathematical terms relation and tuple in place of the terms table and row. 5 Database System Concepts - 7 Edition th 2.5 ©Silberschatz, Korth and Sudarshan Example of a Instructor Relation attributes (or columns) tuples (or rows) Database System Concepts - 7th Edition 2.6 ©Silberschatz, Korth and Sudarshan Relation Schema and Instance A1, A2, …, An are attributes R = (A1, A2, …, An ) is a relation schema Example: instructor = (ID, name, dept_name, salary) A relation instance r defined over schema R is denoted by r (R). The current values in a relation are specified by a table An element t of relation r is called a tuple and is represented by a row in a table Database System Concepts - 7th Edition 2.7 ©Silberschatz, Korth and Sudarshan Tuple Variable A tuple variable is a variable that stands for a tuple; in other words, a tuple variable is a variable whose domain is the set of all tuples. Let the tuple variable t refer to the first tuple of the relation. We use the notation t[ID] to denote the value of t on the ID attribute. Thus, t[ID] = 10101, and t[name] = “Srinivasan”. Alternatively, we may write t to denote the value of tuple t on the first attribute (ID), t to denote name, and so on. Since a relation is a set of tuples, we use the mathematical notation of t ∈ r to denote that tuple t is in relation r. Database System Concepts - 7th Edition 2.8 ©Silberschatz, Korth and Sudarshan Attributes The set of allowed values for each attribute is called the domain of the attribute Attribute values are (normally) required to be atomic; that is, indivisible The special value null is a member of every domain. Indicated that the value is “unknown” A null value signifies that the value is unknown or does not exist. The null value causes complications in the definition of many operations For all relations r, it is required that the domains of all attributes of r must be atomic. A domain is atomic if elements of the domain are considered to be indivisible units. It is possible for several attributes to have the same domain. Database System Concepts - 7th Edition 2.9 ©Silberschatz, Korth and Sudarshan Relations are Unordered Order of tuples is irrelevant (tuples may be stored in an arbitrary order) Example: An instance of instructor relation with unordered tuples Database System Concepts - 7th Edition 2.10 ©Silberschatz, Korth and Sudarshan Database Schema Database schema -- is the logical structure of the database. Database instance -- is a snapshot of the data in the database at a given instant in time. Example: schema: instructor (ID, name, dept_name, salary) Instance: Database System Concepts - 7th Edition 2.11 ©Silberschatz, Korth and Sudarshan Keys Let K R K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R) Example: {ID} and {ID,name} are both superkeys of instructor. Superkey K is a candidate key if K is minimal Example: {ID} is a candidate key for Instructor One of the candidate keys is selected to be the primary key. Which one? Foreign key constraint: Value in one relation must appear in another Referencing relation Referenced relation Example: dept_name in instructor is a foreign key from instructor referencing department Database System Concepts - 7th Edition 2.12 ©Silberschatz, Korth and Sudarshan Relational Schema Diagram for University Database Database System Concepts - 7th Edition 2.13 ©Silberschatz, Korth and Sudarshan Relational Query Languages Procedural versus non-procedural, or declarative “Pure” languages: Relational algebra Tuple relational calculus Domain relational calculus The above 3 pure languages are equivalent in computing power We will concentrate on relational algebra Not Turing-machine equivalent Consists of 6 basic operations Database System Concepts - 7th Edition 2.15 ©Silberschatz, Korth and Sudarshan Relational Algebra A procedural language consisting of a set of operations that take one or two relations as input and produce a new relation as their result. Six basic operators select: project: union: set difference: – Cartesian product: x rename: Unary operators: select, project, and rename operations Binary operators: union, set difference and Cartesian product operations. In addition to the fundamental operations, there are several other operations—namely, set intersection, natural join, division, and assignment. Database System Concepts - 7th Edition 2.16 ©Silberschatz, Korth and Sudarshan Select Operation The select operation selects tuples that satisfy a given predicate. Notation: the lowercase Greek letter sigma (σ). p (r) p is called the selection predicate and appears as a subscript to σ. Example: select those tuples of the instructor relation where the instructor is in the “Physics” department. Query dept_name=“Physics” (instructor) Result Database System Concepts - 7th Edition 2.17 ©Silberschatz, Korth and Sudarshan Select Operation (Cont.) We allow comparisons using =, , >, . 90,000 (instructor) The select predicate may include comparisons between two attributes. Example, find all departments whose name is the same as their building name: (department) dept_name=building Database System Concepts - 7th Edition 2.18 ©Silberschatz, Korth and Sudarshan Project Operation A unary operation that returns its argument relation, with certain attributes left out. Projection is denoted by the uppercase Greek letter pi (). We list those attributes that we wish to appear in the result as a subscript to . The argument relation follows in parentheses (). Notation: A1,A2,A3 ….Ak (r) where A1, A2, …, Ak are attribute names and r is a relation name. The result is defined as the relation of k columns obtained by erasing the columns that are not listed Duplicate rows are removed from result, since relations are sets Database System Concepts - 7th Edition 2.19 ©Silberschatz, Korth and Sudarshan Project Operation Example Example: eliminate the dept_name attribute of instructor Query: ID, name, salary (instructor) Result: Database System Concepts - 7th Edition 2.20 ©Silberschatz, Korth and Sudarshan Composition of Relational Operations The result of a relational-algebra operation is a relation and therefore of relational-algebra operations can be composed together into a relational-algebra expression. Consider the query -- Find the names of all instructors in the Physics department. name( dept_name =“Physics” (instructor)) Instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation. Database System Concepts - 7th Edition 2.21 ©Silberschatz, Korth and Sudarshan Cartesian-Product Operation The Cartesian-product operation (denoted by X) allows us to combine information from any two relations. We write the Cartesian product of relations r1 and r2 as r1 × r2. Recall the definition of a relation……. Example: the Cartesian product of the relations instructor and teaches is written as: instructor X teaches We construct a tuple of the result out of each possible pair of tuples: one from the instructor relation and one from the teaches relation (see next slide) Since the instructor ID appears in both relations we distinguish between these attribute by attaching to the attribute the name of the relation from which the attribute originally came. instructor.ID teaches.ID Database System Concepts - 7th Edition 2.22 ©Silberschatz, Korth and Sudarshan A Simple Example Boys Table Girls Table Name State Name State Adamu Kogi Kemi Kwara Obi Imo Hadiza Kaduna Chioma Rivers Find the Cartesian product between Boys Table and Girls Table. Database System Concepts - 7th Edition 2.23 ©Silberschatz, Korth and Sudarshan Cartesian Product Boys.Name Boys.state Girls.name Girls.state Adamu Kogi Kemi kwara Adamu Kogi Hadiza kaduna Adamu Kogi Chioma rivers Obi Imo Kemi Kwara Obi Imo Hadiza Kaduna Obi Imo Chioma Rivers 24 Database System Concepts - 7 Edition th 2.24 ©Silberschatz, Korth and Sudarshan The instructor X teaches table Database System Concepts - 7th Edition 2.25 ©Silberschatz, Korth and Sudarshan Join Operation The Cartesian-Product instructor X teaches associates every tuple of instructor with every tuple of teaches. Most of the resulting rows have information about instructors who did NOT teach a particular course. To get only those tuples of “instructor X teaches “ that pertain to instructors and the courses that they taught, we write: instructor.id = teaches.id (instructor x teaches )) We get only those tuples of “instructor X teaches” that pertain to instructors and the courses that they taught. The result of this expression, shown in the next slide Database System Concepts - 7th Edition 2.26 ©Silberschatz, Korth and Sudarshan Join Operation (Cont.) The table corresponding to: instructor.id = teaches.id (instructor x teaches)) Database System Concepts - 7th Edition 2.27 ©Silberschatz, Korth and Sudarshan Join Operation (Cont.) The join operation allows us to combine a select operation and a Cartesian-Product operation into a single operation. Consider relations r (R) and s (S) Let “theta” be a predicate on attributes in the schema R “union” S. The join operation r s is defined as follows: Thus instructor.id = teaches.id (instructor x teaches )) Can equivalently be written as instructor Instructor.id = teaches.id teaches. Database System Concepts - 7th Edition 2.28 ©Silberschatz, Korth and Sudarshan Union Operation The union operation allows us to combine two relations Notation: r s For r s to be valid. 1. r, s must have the same arity (same number of attributes) 2. The attribute domains must be compatible (i.e. the domains of the ith attribute of r and the ith attribute of s must be the same, for all i. For example: 2nd column of r deals with the same type of values as does the 2nd column of s) Example: to find all courses taught in the Fall 2017 semester, or in the Spring 2018 semester, or in both course_id ( semester=“Fall” Λ year=2017 (section)) course_id ( semester=“Spring” Λ year=2018 (section)) Database System Concepts - 7th Edition 2.29 ©Silberschatz, Korth and Sudarshan Union Operation (Cont.) Result of: course_id ( semester=“Fall” Λ year=2017 (section)) course_id ( semester=“Spring” Λ year=2018 (section)) Database System Concepts - 7th Edition 2.30 ©Silberschatz, Korth and Sudarshan Set-Intersection Operation The set-intersection operation allows us to find tuples that are in both input relations. Notation: r s Assume: r, s have the same arity attributes of r and s are compatible Example: Find the set of all courses taught in both the Fall 2017 and the Spring 2018 semesters. course_id ( semester=“Fall” Λ year=2017 (section)) course_id ( semester=“Spring” Λ year=2018 (section)) Result Database System Concepts - 7th Edition 2.31 ©Silberschatz, Korth and Sudarshan Set Difference Operation The set-difference operation allows us to find tuples that are in one relation but are not in another. Notation r – s Set differences must be taken between compatible relations. r and s must have the same arity attribute domains of r and s must be compatible Example: to find all courses taught in the Fall 2017 semester, but not in the Spring 2018 semester course_id ( semester=“Fall” Λ year=2017 (section)) − course_id ( semester=“Spring” Λ year=2018 (section)) Database System Concepts - 7th Edition 2.32 ©Silberschatz, Korth and Sudarshan The Assignment Operation It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables. The assignment operation is denoted by and works like assignment in a programming language. Example: Find all instructor in the “Physics” and Music department. Physics dept_name=“Physics” (instructor) Music dept_name=“Music” (instructor) Physics Music With the assignment operation, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as the result of the query. Database System Concepts - 7th Edition 2.33 ©Silberschatz, Korth and Sudarshan The Rename Operation The results of relational-algebra expressions do not have a name that we can use to refer to them. The rename operator, , is provided for that purpose Given a relational-algebra expression E, the expression : x (E) returns the result of expression E under the name x Another form of the rename operation: x(A1,A2,.. An) (E) renames the attributes of the new relation, A1, A2,…,An. Database System Concepts - 7th Edition 2.34 ©Silberschatz, Korth and Sudarshan Formal Definition of the Relational Algebra A basic expression in the relational algebra consists of either one of the following: A relation in the database A constant relation A constant relation is written by listing its tuples within { }, for example { (20101, Bob, Law, 5000) (22215, Mary, Medicine. 7000) }. A general expression in relational algebra is constructed out of smaller sub- expressions. Let E1 and E2 be relational-algebra expressions. Then, these are all relational algebra expressions: E1 ∪ E2 E1 − E2 E1 × E2 σP (E1), where P is a predicate on attributes in E1 ∏S(E1), where S is a list consisting of some of the attributes in E1 ρx (E1), where x is the new name for the result of E1 Database System Concepts - 7th Edition 2.35 ©Silberschatz, Korth and Sudarshan Equivalent Queries There is more than one way to write a query in relational algebra. Example: Find information about courses taught by instructors in the Physics department with salary greater than 90,000 Query 1 dept_name=“Physics” salary > 90,000 (instructor) Query 2 dept_name=“Physics” ( salary > 90.000 (instructor)) The two queries are not identical; they are, however, equivalent -- they give the same result on any database. Database System Concepts - 7th Edition 2.36 ©Silberschatz, Korth and Sudarshan Equivalent Queries There is more than one way to write a query in relational algebra. Example: Find information about courses taught by instructors in the Physics department Query 1 dept_name=“Physics” (instructor instructor.ID = teaches.ID teaches) Query 2 (dept_name=“Physics” (instructor)) instructor.ID = teaches.ID teaches The two queries are not identical; they are, however, equivalent -- they give the same result on any database. Database System Concepts - 7th Edition 2.37 ©Silberschatz, Korth and Sudarshan Aggregate Functions Aggregate functions take a collection of values and return a single value as a result. For example, the aggregate function sum takes a collection of values and returns the sum of the values. Thus, the function sum applied on the collection {1, 1, 3, 4, 4, 11}? The symbol G (Calligraphic G) is the letter G in calligraphic font. 38 Database System Concepts - 7 Edition th 2.38 ©Silberschatz, Korth and Sudarshan Aggregate Functions G sum(salary) (Employee) G avg(salary) (Employee) G min(salary) (Employee) G max(salary) (Employee) G count(branch-name) (Employee) G count(employee-name) (Employee) 39 Database System Concepts - 7 Edition th 2.39 ©Silberschatz, Korth and Sudarshan Null Values A null value indicates a value non-existent or unknown and any arithmetic operations involving null values will return a null result, likewise all comparisons involving null values will return an unknown result. Null values are special values, hence operations and comparisons with null values should be avoided if possible. In Boolean operations (and, or, not), the particular application specifies what to do with the resultant unknown value, but in relational operations they are dealt with differently, depending on the operation. We shall defer the relational operations treatments for now. 40 Database System Concepts - 7 Edition th 2.40 ©Silberschatz, Korth and Sudarshan Select and Project Operation Examples: Student Table 1. Course = “SEN314” (Student) S_ID Name Course Score 2. Score > 50 (Student) 17056 Kenny SEN314 90 3. Course = “SEN314” ^ Score > 50 (Student) 18805 Mary MAT325 45 4. S_ID, Course, Score (Student) 45813 Bala CPT313 70 5. Name,Course (Student) 62586 Henry CPT317 23 6. Name ( course = “SEN314” (Student)) 75184 Richard SEN314 66 98235 Ibrahim IFT314 30 17056 Kenny IFT314 40 Database System Concepts - 7th Edition 2.41 ©Silberschatz, Korth and Sudarshan The Union Operation IFT Table SEN Table S_ID Name Course Score S_ID Name Course Score 17056 Kenny IFT214 90 17056 Kenny SEN314 90 18805 Mary MAT225 45 18805 Mary MAT325 45 98813 Zubairu CPT313 70 45813 Bala CPT313 70 62586 Henry CPT217 23 62586 Henry CPT317 23 75184 Richard SEN414 66 75184 Richard SEN314 66 98235 Ibrahim IFT314 30 98235 Ibrahim IFT314 30 17906 Chidi IFT314 40 17056 Kenny IFT314 40 Name ( IFT) ∪ Name (SEN)? 42 Database System Concepts - 7 Edition th 2.42 ©Silberschatz, Korth and Sudarshan Further Reading Chapters 2 and 6 Database System Concepts - 7th Edition 2.43 ©Silberschatz, Korth and Sudarshan