Database Management System (DBMS) AY-2024-25 PDF
Document Details
Uploaded by AmicableCello251
MIT World Peace University
2024
Tags
Summary
This document contains a set of notes for a Database Management System (DBMS) course from MIT World Peace University for the academic year 2024-2025 detailing various aspects of Relational Database Design and Normalization.
Full Transcript
AID2PM02A Database Management System SY AIDS SEM-3 AY-2024-25 School of Computer Science & Engineering Unit 2: Relational Database Design and Normalisation Unit 2: Relational Database Design and Normalisation Relational Model: Attributes, Tuple, Domain, CODD’s rule Relationa...
AID2PM02A Database Management System SY AIDS SEM-3 AY-2024-25 School of Computer Science & Engineering Unit 2: Relational Database Design and Normalisation Unit 2: Relational Database Design and Normalisation Relational Model: Attributes, Tuple, Domain, CODD’s rule Relational Integrity, Referential Integrities, Enterprise Constraints, Normalization: 1NF, 2NF, 3NF, BCNF, Functional dependency, Decomposition Query Processing: Overview, Measures of Query cost, Selection and Join operations, Evaluation of Expressions. 2 Relational Database Design: Relational Model Relational model is simple model in which database is represented as a collection of “relations” where each relation is represented by two-dimensional table. The relational model was founded by E. F. Codd of the IBM in 1972. The basic concept in the relational model is that of a relation. Properties: – It is column homogeneous. In other words, in any given column of a table, all items are of the same kind. – Each item is a simple number or a character string. That is a table must be in first normal form. – All rows of a table are distinct. – The ordering of rows with in a table is immaterial. – The column of a table are assigned distinct names and the ordering of these columns is immaterial 3 Relational Database Design: Relational Model Tuple: – Each row in a table represents a record and is called a tuple.A table containing ‘n’ attributes in a record is called is called n-tuple. Attributes: – The name of each column in a table is used to interpret its meaning and is called an attribute. Each table is called a relation. In the above table, account_number, branch name, balance are the attributes. Domain: – A domain is a set of values that can be given to an attributes. So every attribute in a table has a specific domain. Values to these attributes can not be assigned outside their domains. 4 Relational Database Design: Relational Model Relation: A relation consist of – Relational schema – Relation instance Relational Schema: A relational schema specifies the relation’s name, its attributes and the domain of each attribute. If R is the name of a relation and A1, A2,…An is a list of attributes representing R then R(A1,A2,…,An) is called a Relational Schema. Each attribute in this relational schema takes a value from some specific domain called domain(Ai). Example: PERSON (PERSON_ID:INTEGER, NAME:STRING, AGE:INTEGER, ADDRESS:STRING) Total number of attributes in a relation denotes the degree of a relation since the PERSON relation scheme contains four attributes, so this relation is of degree 4. 5 Relational Database Design: Relational Model Relation Instance: A relational instance denoted as r is a collection of tuples for a given relational schema at a specific point of time. A relation state r to the relations schema – R(A1, A2…, An) also denoted by r(R) is a set of n-tuples R{t1,t2,…tm} – Where each n-tuple is an ordered list of n values T= Where each vi belongs to domain (Ai) or contains null values. Eg: – Relation schema for Student STUDENT(rollno:string, name:string, city:string, age:integer) 6 Referential integrity Referential integrity requires that a foreign key must have a matching primary key or it must be null. This constraint is specified between two tables (parent and child); it maintains the correspondence between rows in these tables. It means the reference from a row in one table to another table must be valid. Examples of referential integrity constraint in the Customer/Order database of the Company: – Customer(CustID, CustName) – Order(OrderID, CustID, OrderDate) The referential integrity constraint states that the customer ID (CustID) in the Order table must match a valid CustID in the Customer table. 7 Enterprise Constraints Enterprise constraints – sometimes referred to as semantic constraints – are additional rules specified by users or database administrators and can be based on multiple tables. Here are some examples. – A class can have a maximum of 30 students. – A teacher can teach a maximum of four classes per semester. – An employee cannot take part in more than five projects. – The salary of an employee cannot exceed the salary of the employee’s manager. 8 Codd’s Rule A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables. E.F. Codd, the famous mathematician has introduced 12 rules (0-12)for the relational model for databases commonly known as Codd's rules. The rules mainly define what is required for a DBMS for it to be considered relational, i.e., an RDBMS. 9 Rule-0 This rule states that all subsequent rules are based on the notation that in order for a database to be considered relational, it must use it’s relational facilities exclusively to manage the database. 10 Rule 1: Information Rule All information in the database is to be represented in one and only one way. All information in an RDB is represented as values in the tables. This is achieved by values in column and rows of tables. All information including table names, column names and column data types should be available in same table within the database. The basic requirement of the relational model. The data stored in a database, may it be user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format. Rule 2: Guaranteed Access Rule Each unique piece of data should be accessible by: table name + primary key(row) + attribute(column). All data are uniquely identified and accessible via this identity. Most RDBMS do not make the definition of the primary key mandatory and are deficient to that extent. 12 The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as Rule 3 : Systematic one the following − data is missing, data is not known, or data is not applicable. treatment of null values "Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." NULL may mean: Missing data, Not applicable Should be handled consistently - Not Zero or Blank Primary keys — Not NULL Expressions on NULL should give NULL. Separate handling of missing and/or non applicable data. This is distinct to zero or empty strings 13 Rule 4:Database Description Rule The data base description is represented at the logical level in the same way as-ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data. The authorized users can access the database structure by using common language i.e. SQL The structure description of the entire database must be stored in an online catalog, known as data dictionary, which can be accessed by authorized users. Users can use the same query language to access the catalog which they use to access the database itself. 14 Rule 5: Comprehensive Data Sublanguage A relational system may support several languages and various modes of terminal use.However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items : Data Definition (create,insert,update) View Definition Data Manipulation (alter,delete,truncate) Integrity Constraints (primary key,foreign key,null values) Authorization (GRANT , REVOKE) Transaction boundaries (begin,commit,rollbacketc) Every RDBMS should provide a language to allow the user to query the contents of the RDBMS and also manipulate the contents of the RDBMS. 15 Rule 6: View Updating All the views of a database, which can theoretically be Rule updated, must also be updatable by the system View = ”Virtual table”, temporarily derived from base tables. Example: If a view is formed as join of 3 tables, changes to view should be reflected in base tables. Not updatable: View does not have NOT-NULL attribute of base table. It allow the update of simple theoretically updatable views, but disallow attempts to update complex views. A TABLE NAMED ‘STUDENT‘: RDBMS GIVES US THE FACILITY TO VIEW ONLY SOME PARTICULAR FIELDS ACCORDING TO OUR NEED WHICH ARE DIRECTLY ACCESSED FROM BASE TABLES WHEN REQUIRED. 16 Rule 7 : High-level Insert , Update And Delete This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table. It also perform the operation on multiple row simultaneously. There must be delete, updating and insertion at the each level of operation. Set operation like union, all union , insertion and minus should also supported. EXAMPLE: Suppose if we need to change ID then it will reflect everywhere automatically. 17 Rule 8:Physical Data Independence The ability to change the physical schema without changing the logical schema is called physical data independence. This is saying that users shouldn’t be concerned about how the data is stored or how it’s accessed. In fact, users of the data need only be able to get the basic definition of the data they need. EXAMPLE: A change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas. 18 Rule 9 :Logical Data Independence Rule What is independence? The ability to modify schema definition in on level without affecting schema definition in the next higher level is called data independence The ability to change the logical (conceptual) schema without changing the External schema (User View) is called logical data independence. EXAMPLE: The addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas or having to rewrite existing application programs. 19 Rule 10 : Integrity Independence Rule Data integrity refers to maintaining assuring the accuracy and consistency of data over its entire life cycle. 1. First insure that correct data type is used. 2. Check constraints: these allow column value to be checked agenized other column before insertion is allowed 20 Rule 11 : Distribution Independence Rule “THE RELATION DATA BASE MANAGEMENT HAS DISTRIBUTION INDEPENDENCE” Distribution independence implies that user should not have to be aware of whether a database is distributed at different sites or not. Application program and adhoc request are not affected by the change in distribution of physical data. Application program will work even if the programs and data are moved on different site The RDBMS may spread across the more one system or several networks 21 Rule 12 : Non-subversion Rule There should be no way to modify to database structure other then through the multiple row data base language(SQL). If a system has an interface that provides access to low-level records, then the interface must not be able to subvert the system and bypass security and integrity constraints. Example: A relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).” 22 Relational Constraints There are three types of constraints on relational database that include 1. Key Constraints: – This constraints states that the key attribute value in each tuple msut be unique.i.e, no two tuples contain the same value for the key attribute.(null values can allowed) Emp(empcode,name,address). here empcode can be unique 1. DOMAIN CONSTRAINTS: – It specifies that each attribute in a relation an atomic value from the corresponding domains. – The data types associated with commercial RDBMS domains include: o Standard numeric data types for integer o Real numbers o Characters o Fixed length strings and variable length strings Thus, domain constraints specifies the condition that we to put on each instance of the relation. 23 Integrity Constraints There are two types of integrity constraints: – Entity Integrity Constraints – Referential Integrity constraints Entity Integrity Constraints: It states that no primary key value can be null and unique. This is because the primary key is used to identify individual tuple in the relation. So we will not be able to identify the records uniquely containing null values for the primary key attributes. This constraint is specified on one individual relation. Referential Integrity Constraints: It states that the tuple in one relation that refers to another relation must refer to an existing tuple in that relation. This constraints is specified on two relations. If a column is declared as foreign key that must be primary key of another table 24 Normalization 25 Functional Dependency ▪ Redundancy in relational databases is often caused by a functional dependency ▪ A functional dependency (FD) : a link between two sets of attributes in a relation ▪ We can normalize a relation by removing undesirable FD ▪ A set of attributes, A, functionally determines another set, B, or: there exists a functional dependency between A and B (A ->B) B is functionally A B dependent on A Determinant Refers to the attribute or group of attributes on the left- hand side of the arrow of a functional dependency ▪ DATABASE MANAGEMENT SYSTEMS 26 Functional Dependencies Constraints on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. A functional dependency is a generalization of the notion of a key. 27 Functional Dependencies (Cont.) Let R be a relation schema α ⊆ R and β ⊆ R The functional dependency α→β holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2 [α] ⇒ t1[β ] = t2 [β ] Example: Consider r(A,B ) with the following instance of r. 1 4 1 5 3 7 On this instance, A → B does NOT hold, but B → A does hold. Functional Dependencies (Cont.) K is a superkey for relation schema R if and only if K → R K is a candidate key for R if and only if – K → R, and – for no α ⊂ K, α → R Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan = (customer_id, loan_number, amount ). We expect this functional dependency to hold: loan_number → amount but would not expect the following to hold: amount → customer_name Functional Dependencies Continued Set of FDs : 1. {ID} - >{First,Last} Example 2. {ID,modCode}->{First,Last,modName} 3. {modCode}->{modName} Represented by an arrow sign (→) that is, X→Y, where X functionally determines Y. The left-hand side attributes determine the values of attributes on the right-hand side. 30 Functional Dependency Types ▪ Trivial Functional Dependency : ⮚Ifa functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold. Non-trivial Functional Dependency : If an FD X → Y holds, where Y is ▪ not a subset of X, then it is called a non-trivial FD. ▪AB->BC Non-Trivial FD AB->CD ▪ Completely Non-trivial : If an FD X → Y holds, where x intersect Y = Φ, it is said to be a completely non-trivial FD. 31 Database Normalization 32 Database Normalization : Need What is an Anomaly? Anything we try to do with a database that leads to unexpected and/or unpredictable results. Relations that have redundant data may have problems called update anomalies ⮚Three types of Update Anomaly to guard against: ▪ insert ▪ delete ▪ update ⮚Need to check your database design carefully: ▪ the only good database is an anomaly free database. 33 Question. Consider the database given below. Suppose we want to insert a new staff in the StaffBranch relation. What can be the problem for inserting new staff details? StaffBranch staffNo sName position salary branchNo bAddress SL21 John White Manager 30000 B005 22 Deer Rd, London SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London 34 1.Insert Anomaly Answer : The attempt to insert staff details will be prevented/not allowed unless and until it has been associated with some branch. ⮚When we want to enter a value into a data cell but the attempt is prevented, as another value is not known it leads to Insert Anomaly staffNo sName position salary branchNo bAddress SL21 John White Manager 30000 B005 22 Deer Rd, London SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London ? ? 35 2. Delete Anomaly ▪ When a value we want to delete also means we will delete values we wish to keep. ▪Example : To delete a tuple that represents the last member of staff located at a branch B007. ▪If we delete the staff details the branch also will be deleted. staffNo sName position salary branchNo bAddress SL21 John White Manager 30000 B005 22 Deer Rd, London SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London 36 3.Update Anomaly ▪ When we want to change a single data item value, but must update multiple entries ⮚e.g. To change the address of B003.If we update it in one tuple it is to be updated for all staff associated with that branch staffNo sName position salary branchNo bAddress SL21 John White Manager 30000 B005 22 Deer Rd, London SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London 37 Database Normalization Forms Normalization:a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes. Normal Forms : ▪ 1 NF (Atomicity) Single Column-Single Value ▪ 2 NF (Remove Partial Dependency) ▪ 3NF (Remove Transitive Dependency) ▪Boyce-Code Normal Form (BCNF) ▪Fourth Normal Form (4NF) 38 1 NF : First Normal Form A method to remove all these anomalies and bring the database to a consistent state. Conversion to 1NF Consider the relation Course_info Rules : ⮚All the attributes in a relation must have atomic domains. ▪ Each attribute must contain only a single ⮚The values in an atomic domain must be value from its pre-defined domain. indivisible units. 39 1 NF : First Normal Form Consider the relation ClientRental Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName) ClientNo cName propertyNo pAddress rentStart rentFinish rent ownerNo oName 6 lawrence Tina 1-Jul-00 31-Aug-01 350 CO40 PG4 St,Glasgow Murphy John CR76 kay PG16 5 Novar Dr, Tony Shaw 1-Sep-02 1-Sep-02 450 CO93 Glasgow 6 lawrence PG4 1-Sep-99 10-Jun-00 350 CO40 Tina St,Glasgow Murphy Aline 2 Manor Rd, CR56 PG36 10-Oct-00 1-Dec-01 370 CO93 Tony Shaw Stewart Glasgow Tony Shaw 5 Novar Dr, PG16 1-Nov-02 1-Aug-03 450 CO93 Glasgow ▪ Each attribute must contain only a single value from its pre-defined domain. 40 1 NF : First Normal Form There are two approaches to removing repeating groups from unnormalized tables: 1. Removes the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.\ 2. Removes the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation. 41 1 NF : First Normal Form Continued 1.With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row. ClientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName John 6 lawrence Tina CR76 PG4 1-Jul-00 31-Aug-01 350 CO40 Kay St,Glasgow Murphy John 5 Novar Dr, Tony CR76 PG16 1-Sep-02 1-Sep-02 450 CO93 Kay Glasgow Shaw Aline 6 lawrence Tina CR56 PG4 1-Sep-99 10-Jun-00 350 CO40 Stewart St,Glasgow Murphy Tony Aline 2 Manor Rd, CR56 PG36 10-Oct-00 1-Dec-01 370 CO93 Shaw Stewart Glasgow Tony Aline 5 Novar Dr, CR56 PG16 1-Nov-02 1-Aug-03 450 CO93 Shaw Stewart Glasgow 1NF ClientRental relation with the first approach 42 1 NF : First Normal Form Continued 2.With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation. ClientRental ClienNo propertyNo pAddress rentStart rentFinish rent ownerNo oName 6 lawrence Tina CR76 PG4 1-Jul-00 31-Aug-01 350 CO40 St,Glasgow Murphy 5 Novar Dr, Tony CR76 PG16 1-Sep-02 1-Sep-02 450 CO93 Glasgow Shaw Client CR56 PG4 6 lawrence 1-Sep-99 10-Jun-00 350 CO40 Tina St,Glasgow Murphy 2 Manor Rd, Tony CR56 PG36 10-Oct-00 1-Dec-01 370 CO93 ClientNo cName Glasgow Shaw CR76 John Kay 5 Novar Dr, Tony CR56 PG16 1-Nov-02 1-Aug-03 450 CO93 Glasgow Shaw CR56 Aline Stewart 1NF ClientRental relation with the second approach 43 2 NF : Second Normal Form Full functional dependency indicates Second Normal Form : a relation that is in first normal form and every non-primary-key attribute is that if A and B are fully functionally dependent on the primary key. attributes of a relation, B is fully functionally dependent on A if B is The normalization of 1NF relations to 2NF involves functionally dependent on A, but not the removal of partial dependencies. If a partial on any proper subset of A. dependency exists, we remove the function dependent attributes from the relation by placing them in a new A functional dependency A🡪B is relation along with a copy of their determinant. partially dependent if there is some attributes that can be removed from A and the dependency still holds. 44 2 NF : Second Normal Form 2 NF Conversion to 2 NF Prime attribute − An attribute, which is a part From example , we find that Stu_Name can be identified of the prime-key, is known as a prime attribute. by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which Non-prime attribute − An attribute, which is is not allowed in Second Normal Form. not a part of the prime-key, is said to be a non- prime attribute. Therefore, we can convert the relation into as shown below Consider the relation student_project Student {Stud_id,Stud-name} and {Stud_id, Proj_id, Stud_name, Project_Title} Stud_id Stud_name Rule : Every non-prime attribute should be Project {Proj_id,Project_Title} fully functionally dependent on prime key attribute. That is, if X → A holds, then there Prod_id Project_Title should not be any proper subset Y of X, for which Y → A also holds true. 45 2 NF: Second Normal Form Continued The ClientRental relation has the following functional dependencies: fd1 clientNo, propertyNo 🡪 rentStart, rentFinish (Primary Key) fd2 clientNo 🡪 cName (Partial dependency) fd3 propertyNo 🡪 pAddress, rent, ownerNo, oName (Partial dependency) fd4 ownerNo 🡪 oName (Transitive Dependency) fd5 clientNo, rentStart 🡪 propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key) fd6 propertyNo, rentStart 🡪 clientNo, cName, rentFinish (Candidate key) 46 2 NF: Second Normal Form Continued After removing the partial ClientNo cName Client CR76 John Kay dependencies, the creation of the CR56 Aline Stewart three ClientNo propertyNo rentStart rentFinish CR76 PG4 1-Jul-00 31-Aug-01 new relations called Client, Rental CR76 PG16 1-Sep-02 1-Sep-02 CR56 PG4 1-Sep-99 10-Jun-00 Rental, and PropertyOwner CR56 PG36 10-Oct-00 1-Dec-01 CR56 PG16 1-Nov-02 1-Aug-03 propertyNo pAddress rent ownerNo oName Tina PG4 6 lawrence St,Glasgow 350 CO40 Murphy PG16 5 Novar Dr, Glasgow 450 CO93 Tony Shaw PG36 2 Manor Rd, Glasgow 370 CO93 Tony Shaw PropertyOwner 47 3 NF : Third Normal Form Third normal form (3NF) Transitive dependency A relation that is in first and second A condition where A, B, and C are normal form, and in which no non- attributes of a relation such that primary-key attribute is transitively dependent on the primary key. ifA 🡪 B and B 🡪 C, then C is transitively dependent on A via B The normalization of 2NF relations to 3NF involves the removal of (providedthat A is not transitive dependencies by placing functionally dependent on B or C). the attribute(s) in a new relation along with a copy of the determinant. 48 3 NF : Third Normal Form Rules : ▪ For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy − ▪ No non-prime attribute is transitively dependent on prime key attribute. ▪ For any non-trivial functional dependency, X → A, then either – X is a super key or, – A is prime attribute. Consider relation Stud_info Stud_info {rno,name,marks,zip,city,dob} Reflection Spot 3 rno name marks zip city dob Question. Does the above relation satisfy the 3 NF criteria??If no convert it to 3 NF. 49 3 NF : Third Normal Form Ans : No. The given relation does not satisfy the 3NF as it contains following transitive dependency: We have : rno->zip but zip->city Therefore rno->city (Transitive Dependency) Conversion to 3NF : Stud_info {rno,name,marks,dob,zip} rno name marks dob zip Zip_city {zip,city} zip city 50 3 NF : Third Normal Form Continued Thefunctional dependencies for the Client, Rental and PropertyOwner relations are as follows: Client fd2 clientNo 🡪 cName (Primary Key) Rental Agent Company Product fd1 clientNo,Ford Smith 🡪 propertyNocar rentStart, rentFinish (Primary Key) fd5 Smith clientNo,Ford rentStart 🡪truck propertyNo, rentFinish (Candidate key) Smith GM car fd6 propertyNo, rentStart 🡪 clientNo, rentFinish (Candidate key) Smith GM Truck PropertyOwner Jones Ford Car fd3 propertyNo 🡪 pAddress, rent, ownerNo, oName (Primary Key) fd4 ownerNo 🡪 oName (Transitive Dependency) DATABASE MANAGEMENT SYSTEMS 51 3 NF : Third Normal Form Continued ClientNo cName CR76 John Kay Client The resulting 3NF relations have CR56 Aline Stewart the forms: Rental ClientNo propertyNo rentStart rentFinish Client (clientNo, cName) CR76 PG4 1-Jul-00 31-Aug-01 CR76 PG16 1-Sep-02 1-Sep-02 Rental(clientNo, propertyNo, CR56 CR56 PG4 PG36 1-Sep-99 10-Oct-00 10-Jun-00 1-Dec-01 rentStart, rentFinish) CR56 PG16 1-Nov-02 1-Aug-03 PropertyOwner (propertyNo, Owner PropertyOwner pAddress, rent, ownerNo) property ren owner ownerNo oName pAddress No t No Owner(ownerNo, oName) PG4 6 lawrence 350 CO40 CO40 Tina Murphy St,Glasgow CO93 Tony Shaw 5 Novar Dr, PG16 450 CO93 Glasgow 2 Manor Rd, PG36 370 CO93 Glasgow DATABASE MANAGEMENT SYSTEMS 52 Boyce-Code Normal Form (BCNF) A relationship is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. These could be same situation when a 3NF relation may not be in BCNF the following conditions are found true. – The candidate keys are composite. – There are more than one candidate keys in the relation. – There are some common attributes in the relation 53 Boyce-Code Normal Form (BCNF) contd. Consider, as an example, the above relation. It is assumed that: The given relation is in 3NF. Observe, however, that 1. A professor can work in more than one department the names of Dept. and Head of Dept. are duplicated. 2. The percentage of the time he spends in each department is given. Further, if Professor P2 resigns, rows 3 and 4 are 3. Each department has only one Head of Department deleted. We lose the information that Rao is the Head of Department of Chemistry. The normalized relations are shown in the following The relation diagram for the above relation is given as the following: 54 Fourth Normal Form (4NF) When attributes in a relation have multi-valued dependency, further Normalization to 4NF and 5NF are required. Let us first find out what multi-valued dependency is. A multi-valued dependency is a typical kind of dependency in which each and every attribute within a relation depends upon the other, yet none of them is a unique primary key. We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions: 1. A vendor is capable of supplying many items. 2. A project uses many items. 3. A vendor supplies to many projects. 4. An item may be supplied by many vendors. A multi valued dependency exists here because all the attributes depend upon the other and yet none of them is a primary key having unique value. 55 Fourth Normal Form (4NF) contd. The given relation has a number of The table can be expressed as the two 4NF problems. For example: relations given as following. The fact that 1. If vendor V1 has to supply to project vendors are capable of supplying certain P2, but the item is not yet decided, then a items and that they are assigned to supply row with a blank for item code has to be for some projects in independently specified introduced. in the 4NF relation. 2. The information about item I1 is stored twice for vendor V3. 3. Observe that the relation given is in 3NF and also in BCNF 56 Exercise : Is it 1NF? Take the following table. Create new rows so each cell contains only one value But now look – is the studentID primary key still valid? So. We now have 1NF. Is it 2NF? Create new tables for Normalization Create a new table for each primary key field Give each new table its own primary key Move columns from the original table to the new table that matches their primary key. Step 1 STUDENT TABLE (key = StudentID,Subject) Student_ID,Subject- >StudentName,Address,HouseName,HouseColor,Subject,SubjectCost,Grade Student_ID->StudentName,Address,HouseName,HouseColor HouseName->HouseColor Subject->SubjectCost,Grade StudentID,subject->Grade Step 2 STUDENT TABLE (key = StudentID) SUBJECTS TABLE (key = Subject) Step 3 STUDENT TABLE (key = StudentID) SUBJECTS TABLE (key = Subject) RESULTS TABLE (key = StudentID+Subject) Step 4 - relationships STUDENT TABLE (key = StudentID) SUBJECTS TABLE (key = Subject) RESULTS TABLE (key = StudentID+Subject) Step 4 - cardinality STUDENT TABLE (key = StudentID) 1 Each student can only appear ONCE in the student table SUBJECTS TABLE (key = Subject) RESULTS TABLE (key = StudentID+Subject) Step 4 - cardinality STUDENT TABLE (key = StudentID) 1 SUBJECTS TABLE (key = Subject) 1 Each subject can only appear ONCE in the subjects table RESULTS TABLE (key = StudentID+Subject) Step 4 - cardinality STUDENT TABLE (key = StudentID) 1 SUBJECTS TABLE (key = Subject) A subject can be listed MANY times in the results table (for different students) 8 1 RESULTS TABLE (key = StudentID+Subject) Step 4 - cardinality STUDENT TABLE (key = StudentID) 1 SUBJECTS TABLE (key = Subject) 1 A student can be listed MANY times in the results table (for different subjects) 8 8 RESULTS TABLE (key = StudentID+Subject) Step 4 - cardinality STUDENT TABLE (key = StudentID) 1 SUBJECTS TABLE (key = Subject) 1 A student can be listed MANY times in the results table (for different subjects) 8 8 RESULTS TABLE (key = StudentID+Subject) A 3NF fix 8 1 1 SUBJECTS TABLE (key = Subject) 8 8 1 RESULTS TABLE (key = StudentID+Subject) A 3NF win! 8 1 1 8 8 1 SUBJECTS TABLE (key = Subject) RESULTS TABLE (key = StudentID+Subject) Or… QUERY PROCESSING 72 QUERY PROCESSING Query processing includes translation of high- level queries into low-level expressions that can be used at the physical level of the file system, query optimization and actual execution of the query to get the result. It is a three-step process that consists of parsing and translation, optimization and execution of the query submitted by the user. A query is processed in four general steps: 1. Scanning and Parsing 2. Query Optimization or planning the execution strategy 3. Query Code Generator (interpreted or compiled) 4. Execution in the runtime database processor 73 Optimizer 74 75 76 Optimizer with Transformer and Estimator 77 Scanning and Parsing Scanning and Parsing When a query is first submitted (via an applications program), it must be scanned and parsed to determine if the query consists of appropriate syntax. Scanning is the process of converting the query text into a tokenized representation. The tokenized representation is more compact and is suitable for processing by the parser. This representation may be in a tree form. The Parser checks the tokenized representation for correct syntax. In this stage, checks are made to determine if columns and tables identified in the query exist in the database and if the query has been formed correctly with the appropriate keywords and structure. If the query passes the parsing checks, then it is passed on to the Query Optimizer. 78 Query Optimization or Planning the Execution Strategy For any given query, there may be a number of different ways to execute it. Each operation in the query (SELECT, JOIN, etc.) can be implemented using one or more different Access Routines. For example, an access routine that employs an index to retrieve some rows would be more efficient that an access routine that performs a full table scan. The goal of the query optimizer is to find a reasonably efficient strategy for executing the query (not quite what the name implies) using the access routines. Optimization typically takes one of two forms: – Heuristic Optimization or Cost Based Optimization In Heuristic Optimization, the query execution is refined based on heuristic rules for reordering the individual operations. – With Cost Based Optimization, the overall cost of executing the query is systematically reduced by estimating the costs of executing several different execution plans. 79 Query Code Generator Query Code Generator (interpreted or compiled) Once the query optimizer has determined the execution plan (the specific ordering of access routines), the code generator writes out the actual access routines to be executed. With an interactive session, the query code is interpreted and passed directly to the runtime database processor for execution. It is also possible to compile the access routines and store them for later execution. 80 Execution in the runtime database processor Execution in the runtime database processor At this point, the query has been scanned, parsed, planned and (possibly) compiled. The runtime database processor then executes the access routines against the database. The results are returned to the application that made the query in the first place. Any runtime errors are also returned 81 Query optimization The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. There are broadly two ways a query can be optimized: – Analyze and transform equivalent relational expressions: Try to minimize the tuple and column counts of the intermediate and final query processes (discussed here). – Using different algorithms for each operation: These underlying algorithms determine how tuples are accessed from the data structures they are stored in, indexing, hashing, data retrieval and hence influence the number of disk and block accesses (discussed in query processing). 82 Evaluation of Expressions Evaluation of Expressions: Alternatives for evaluating an entire expression tree – Materialization: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat. – Pipelining: pass on tuples to parent operations even as an operation is being executed 83 Measure of Query Cost Measure of Query Cost: Cost is generally measured as total elapsed time for answering the query. There are many factors that contribute to time cost. These are disk accesses, CPU time, or even network communication. Typically disk access is the predominant cost as disk transfer is a very slow event and is also relatively easy to estimate. It is measured by taking into account the following activities: – Number of seeks – average-seek-cost Number of blocks read – average-block-read-cost Number of blocks written – average-block-written-cost. 84 Selection and Join Operation File Scan: These are the search algorithms that locate and retrieve records that fulfil a selection condition in a file. The following are the two basic files scan algorithms for selection operation: 1. Linear search: 1. This algorithm scans each file block and tests all records to see whether they satisfy the selection condition. 2. The cost of this algorithm (in terms of block transfer) is defined as: 3. Cost of searching records satisfying a condition = Number of blocks in a database = Nb. 4. Cost for searching a key attribute value = Average number of block transfer for locating the value (on an average, half of the file needs to be traversed) so it is = Nb/2 2. Binary search: 1. It is applicable when the selection is an equality comparison on the attribute on which file is ordered. 2. Assume that the blocks of a relation are stored continuously then, cost can be estimated as: 3. Cost = Cost of locating the first tuple by a binary search on the blocks + Sequence of other blocks that continue to satisfy the condition. 4. = [log2 (Nb)] + Average number of tuples with the same value/ Blocking factor (Number of tuples in a block) of the relation 5. These two values may be calculated from the statistics of the database. 85 Index Scan Index Scan Search algorithms that use an index are restricted because the selection condition must be on the search-key of the index. 1. Primary index-scan for equality: This search retrieves a single record that satisfies the corresponding equality condition. The cost here can be calculated as: Cost = Height traversed in index to locate the block pointer +1(block of the primary key is transferred for access). 2. Hash key: It retrieves a single record in a direct way thus, cost in hash key may also be considered as lock transfer needed for finding hash target +1 86 Join Operation Join Operation: There are number of algorithms that can be used to implement joins 1. Nested-loop join 2. Block nested-loop join 3. Indexed nested-loop join 87 References 1.Ramakrishnan, R. and Gherke, J., “Database Management Systems”, 3rd Ed., McGraw-Hill. 2. Connally T, Begg C.,”Database Systems”,Pearson Education 3. Codd's Rules in DBMS - Webeduclick.com 4. INTRODUCTION TO DBMS (cet.edu.in) 5. Chapter 9 Integrity Rules and Constraints – Database Design – 2nd Edition (opentextbc.ca) 6. INTRODUCTION TO DBMS (cet.edu.in) 7. Query Optimizer Concepts (oracle.com) 88