Relational Database Design PDF
Document Details
Uploaded by GreatestIrrational8477
2024
Tags
Summary
These lecture notes cover relational database design concepts, focusing on normalization techniques, including functional dependencies and various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF, and DKNF).
Full Transcript
Relational Database Design Functional Dependencies and Normalization for Relational Databases October 18, 21, 2024 Normalization of Relations Normalization: – The process of decomposing unsatisfactory "bad" relations by breaking up their attributes i...
Relational Database Design Functional Dependencies and Normalization for Relational Databases October 18, 21, 2024 Normalization of Relations Normalization: – The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations. – Concept built around the concept of normal form. Normal form: – A relation is said to be in a particular normal form if it satisfies some condition. Normalized Design Produce a set of relations which: – Eliminate redundancy of non-key attributes. – Uses foreign keys to express relationships. – Has good update properties. – Satisfy referential and entity integrity. Created through “normalization” process – Successive decomposition of relations. – Create relations exhibiting certain properties. Normal Forms Normal forms are based on FDs. Data normalized to, – Minimize redundancies – Minimize anomalies Relations are decomposed to normalize. Concerns with decomposition; – Loss-less join or nonadditive join property – Dependency preserving property Normal Forms First Normal Form – Domain of an attribute must include only atomic (simple, indivisible) values – Value of any attribute in a tuple must be a single value from domain of that attribute. – Now considered part of definition of a relation in relational model. 2NF, 3NF, BCNF – based on keys and FDs of a relation schema Normalization of Relations 4NF - – based on keys, multi-valued dependencies 5NF – based on keys, join dependencies Denormalization: – The process of storing the join of higher normal form relations as a base relation — which is in a lower normal form Remember: Definitions of Keys A superkey of a relation schema R = {A1, A2,...., An} is a set of attributes S subset- of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S] A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more. Remember: Definitions of Keys If a relation schema has more than one key, each is called a candidate key. – One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. A Prime attribute must be a member of some candidate key. A Nonprime attribute is not a prime attribute— that is, it is not a member of any candidate key. First Normal Form Disallows – composite attributes – multivalued attributes – nested relations; attributes whose values for an individual tuple are non-atomic Considered to be part of the definition of relation Normalization into 1NF Normalization of nested relations into 1NF Second Normal Form Uses the concepts of primary key, FDs. A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key. R can be decomposed into 2NF relations via the process of 2NF normalization. Two Relational Schemas Suffering from Update Anomalies Normalizing into 2NF - Example Every non-prime attribute A in R is fully functionally dependent on the primary key. Third Normal Form Transitive functional dependency: – A FD X → Z that can be derived from two FDs X → Y and Y → Z. Examples: – SSN -> DMGRSSN is a transitive FD SSN → DNUMBER and DNUMBER → DMGRSSN – SSN -> ENAME is non-transitive Since there is no set of attributes X where SSN → X and X → ENAME Normalizing into 3NF - Example No non-prime attribute A in relation schema R is transitively dependent on the primary key. Third Normal Form A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key R can be decomposed into 3NF relations via the process of 3NF normalization NOTE: – In X → Y and Y → Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. – When Y is a candidate key, there is no problem with the transitive dependency. Normal Forms Defined Informally 1st normal form – All attributes depend on the key 2nd normal form – All attributes depend on the whole key 3rd normal form – All attributes depend on nothing but the key General Normal Form Definitions (For Multiple Keys) Previous definitions consider the primary key only More general definitions take into account relations with multiple candidate keys A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R Another Example: 2NF The LOTS relation with its FDs FD3 violates the condition of 2NF Decomposition into the 2NF relations LOTS1 and LOTS2 FD4 does not violates the condition of 2NF and is carried over to LOTS1 General Normal Form Definitions (For Multiple Keys) A relation schema R is in third normal form (3NF) if every non-prime attribute of R is – Fully functionally dependent on every key of R; and – Nontransitively dependent on every key of R. Another Example: 2NF and 3NF The LOTS relation with its FDs Decomposition into the 2NF relations LOTS1 and LOTS2 FD4 violates 3NF Decomposing LOTS1 into the 3NF relations and LOTS1A and LOTS1B Progressive normalization of lots LOTS1B Boyce-Codd Normal Form An FD is trivial if RHS is subset of LHS. A relation schema R is in Boyce-Codd Normal Form (BCNF) iff every nontrivial, left-irreducible FD has a candidate key as its determinant. Less formal definition – A relation is in BCNF iff the only determinants are candidate keys. In other words, there will always be arrows out of candidate keys. BCNF- Example LOTS1A with additional FD (FD5) LOTS1A is still in 3NF because county_name is a prime attribute A relation schema R is in third normal form (3NF) if every non-prime attribute of R is fully functionally dependent on every key of R; and nontransitively dependent on every key of R. It does not talk about prime attributes. BCNF- Example LOTS1A with additional FD (FD5) LOTS1A is still in 3NF because county_name is a prime attribute Boyce-Codd Normal Form Each normal form is strictly stronger than the previous one – Every 2NF relation is in 1NF – Every 3NF relation is in 2NF – Every BCNF relation is in 3NF There exist relations that are in 3NF but not in BCNF BCNF - Another Example A relation that is in 3NF but not in BCNF Two FDs exist in the relation TEACH: – fd1: { student, course} → instructor – fd2: instructor → course {student, course} : candidate key BCNF - Another Example fd1: { student, course} -> instructor fd2: instructor -> course Which decomposition should be followed ? Three possible decompositions for this relation – {student, instructor} and {student, course} – {course, instructor } and {course, student} – {instructor, course } and {instructor, student} BCNF - Another Example fd1: { student, course} -> instructor fd2: instructor -> course See that all three decompositions will lose fd1 and only the 3rd decomposition satisfies lossless join property Three possible decompositions for this relation – {student, instructor} and {student, course} – {course, instructor } and {course, student} – {instructor, course } and {instructor, student} Here we have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice lossless join property Designing a Set of Relations Goals: – Lossless join/nonadditive property (a must) – Dependency preservation property Algorithms are available – To tests for general losslessness. – To decompose a relation into BCNF components by sacrificing the dependency preservation. Multivalued Dependencies In many cases relations have constraints that can not be specified as FDs. Multivalued dependencies (MVDs) are a consequence of 1NF, which disallows an attribute in a tuple to have a set of values. A multivalued dependency occurs when a determinant determines a particular set of values Consider the problem when we have two/more multivalued independent attributes in the same relation schema …. Multivalued Dependencies The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME. This constraint is specified as a multivalued dependency on the EMP An employee may work on several projects and may have several dependents. The employee’s projects and dependents are independent of one another. Constraint: a separate tuple to represent every combination of an employee’s dependent and an employee’s project. Need to repeat every value of one of the attributes with every value of the other attribute Multivalued Dependencies An MVD A —>> B in R is a trivial MVD if – B is a subset of A, or AυB = R. A trivial MVD will hold in any relation state r of R – It is called trivial because it does not specify any significant or meaningful constraint on R. If we have nontrivial MVD in a relation, we may have to repeat values redundantly in the tuples – This redundancy is clearly undesirable. EMP schema is in BCNF because no functional dependencies hold in EMP Need to define a fourth normal form Relations containing nontrivial MVDs tend to be all - key relations Multivalued Dependencies The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME. Whenever two independent 1:N relationships A:B and A:C are mixed in the same relation, R(A, B, C) an MVD may arise MVDs are a generalization of FDs, in the sense that every FD is an MVD but the converse is not true. Multivalued Dependencies and Fourth Normal Form - Example (a) The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS. Fourth Normal Form Fagin’s Theorem – A relation R with attributes A, B, and C, can be nonloss-decomposed into its two projections R1(A, B) and R2 (A, C) iff the MVD A —>> B|C holds in R. A relation R is in 4NF iff, whenever there exist an MVD in R, say A —>> B, then all attributes of R are also functionally dependent on A (i.e., A->x for all attributes x of R) Two Projections Always? So far every relation was non-loss decomposable into two projections – is this always possible? n-decomposable relations Example: Relation CTL Courses - Tutors - Levels (CTL) Course Tutor Level Databases Usha Level3 Databases Mahesh Level2 Programming Usha Level2 Databases Usha Level2 Example: Relation CTL Courses - Tutors - Levels (CTL) Course Tutor Level Databases Usha Level3 Databases Mahesh Level2 Programming Usha Level2 Databases Usha Level2 Two Attribute Projections of CTL Courses - Tutors - Levels (CTL) Course Tutor Level Databases Usha Level3 Databases Mahesh Level2 Programming Usha Level2 Databases Usha Level2 CT TL Course Tutor Tutor Level Databases Usha Usha Level3 Databases Mahesh Mahesh Level2 Programming Usha Usha Level2 Two Attribute Projections of CTL CT Course Tutor Tutor Level TL Databases Usha Usha Level3 Databases Mahesh Mahesh Level2 Programming Usha Usha Level2 Join(CT, TL) Course Tutor Level Databases Usha Level3 The join of Databases Usha Level2 any two projections Databases Mahesh Level2 is not CTL Programming Usha Level3 Programming Usha Level2 CTL is a 3-decomposable relation Two Attribute Projections of CTL Courses - Tutors - Levels (CTL) CT TL Course Tutor Tutor Level Databases Usha Usha Level3 Databases Mahesh Mahesh Level2 Programming Usha Usha Level2 CL Course Level Databases Level3 Databases Level2 Programming Level2 3-Decomposable Relation Constraint: Let R be a degree 3 relation. IF (a, b, x) R AND (a, y, c) R AND (z, b, c) R THEN (a, b, c) R Constraint illustrated on the CTL relation IF tutor t1 teaches subject s1 AND level l1 studies subject s1 AND tutor t1 teaches level l1 THEN tutor t1 teaches subject s1 for level l1 Note: this constraint is not expressed in CTL Join Dependencies and 5NF Very peculiar semantic constraint that is very difficult to detect in practice – Normalization into 5NF - rarely done in practice. Definition of Join Dependency: – A join dependency denoted by JD(R1, R2,.., Rn), specified on relation schema R, specifies a constraint on the states of R. – The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2,.., Rn – i.e., for every such r we have *(πR1(r), πR2(r), …πRn(r)) = r Join Dependencies and 5NF MVD is a special case of a JD where n = 2. A relation schema R is in fifth normal form (5NF) iff every join dependency in R is implied by the candidate keys of R. Let R be a relation. Let A, B,..., Z be arbitrary subsets of R’s attributes. R satisfies the * (R1, R2,.., Rn) if and only if R is equal to the join of its projections on (R1, R2,.., Rn) Also called projection-join form (PJ/NF) Domain-Key Normal Form Proposed by Fagin DKNF is not defined in terms of FDs, MVDs, or JDs at all A relation schema is said to be in DKNF if all constraints and dependencies that should hold on the valid relation states can be enforced simply by enforcing the domain constraints and key constraints on the relation. “if every constraint on the relation is a logical consequence of the definition of keys and domains” Domain-Key Normal Form For a relation in DKNF, it becomes very straightforward to enforce all db constraints by simply checking that each attribute value in a tuple is of the appropriate domain and that every key constraint is enforced. Fagin shows that any DKNF is necessarily in 5NF (and hence in 4NF..etc.) However, DKNF is not always achievable, nor has the question “exactly when can it be achieved ?” been answered. Let us take some problems…. Consider a relation R(A,B,C,D,E) with functional dependencies: A→B, BC→E, and D→A Find the key for R. A→B, BC→E; so AC→E AC→E and D→A; so CD→E A→B, and D→A so D→B and so CD→B D→A; so CD→A CD is the key. Let us take some problems…. Consider a relation R(A,B,C,D,E) with functional dependencies: A→D, C→AB, and DB→E Find the key for R. A→D, DB→E; so AB→E C→AB and AB→E so C→E C→A and A→D; so C→D C is the key. Let us take some problems…. Consider the following relation: CARSALE(car#, date_sold, salesman#, commission%, discount) Assume that a car may be sold by multiple salesmen, and hence {car#, salesman#} is the primary key. Additional dependencies are date_sold → discount Salesman# → commission%. Based on the given primary key, is the relation in 1NF, 2NF, or 3NF? Why or why not? How would you successively normalize it completely? Let us take some problems…. CARSALE (car#, date_sold, salesman#, commission%, discount) date_sold → discount and salesman# → commission%. Relation is in the 1NF according to the criteria. 2NF: R1(car#, date_sold, salesman#, discount) R2(salesman#, commission%) 3NF: R1(car#, date_sold, salesman#) R2(salesman#, commission%) R3(date_sold, discount) You are supposed to write criteria and show the decomposition based on dependencies. More practice…. Given a relation R = {A, B, C, D, E, H} and having the following FDs F = {{A → BC}, {CD → E}, {E → C}, {D → A, E, H}, {A, B, H → B, D}, {D, H → B, C}}. Find the key for relation R with FD F. From 1 and 4: D → A; A → B; then D → B Similarly D → A; A → C; then D → C Therefore D → A, D → B, D → C, D → E. Hence D is the key. More practice…. Consider the following relation: TRIP (trip_id, startdate, cities_visited, cards_used) This relation refers to business trips made by salesmen in a company. Suppose the trip has a single startdate but involves many cities and one may use multiple credit cards for that trip. a. Discuss what FDs and / or MVDs exist in this relation. b. Show how you will go about normalizing it. More practice…. TRIP (trip_id, startdate, cities_visited, cards_used) Suppose the trip has a single startdate but involves many cities and one may use multiple credit cards for that trip. a. Discuss what FDs and / or MVDs exist in this relation. trip_id → startdate trip_id →→ cities_visited trip_id →→ cards_used More practice…. TRIP (trip_id, startdate, cities_visited, cards_used) Suppose the trip has a single startdate but involves many cities and one may use multiple credit cards for that trip. b. Show how you will go about normalizing it. Because there are no interdependencies, this relation can be trivially decomposed to conform to 4NF: TRIP_DATE (trip_id, startdate) TRIP_CITIES (trip_id, cities_visited) TRIP_CARDS (trip_id, cards_used)