CS 4002 Unit 5 PDF
Document Details
Uploaded by Deleted User
Ankur Verma
Tags
Summary
This document covers different aspects of database normalization, such as functional dependencies and types of normal forms. It includes tables and an index of topics covered.
Full Transcript
CS 4002 Unit 5 Ankur Verma 3/28/22 Normalization CS 4002 Unit 3 - Normalization Table of Contents 1) FUNCTIONAL DEPENDENCIES...............................
CS 4002 Unit 5 Ankur Verma 3/28/22 Normalization CS 4002 Unit 3 - Normalization Table of Contents 1) FUNCTIONAL DEPENDENCIES........................................................................................................................ 1 What does functionally dependent mean?......................................................................................................... 1 Types of Functional Dependencies..................................................................................................................... 2 Trivial Dependency.......................................................................................................................................... 2 Non-Trivial Dependency.................................................................................................................................. 2 Transitive Dependency.................................................................................................................................... 2 Multivalued Dependency................................................................................................................................. 3 Join Dependency.............................................................................................................................................. 3 2) CONCPET OF DECOMPOSITION..................................................................................................................... 4 Lossless Join Decomposition............................................................................................................................... 4 Lossy Join Decomposition................................................................................................................................... 5 Lossless v/s Lossy Decomposition....................................................................................................................... 6 Dependency Preserving....................................................................................................................................... 6 3) WHAT IS NORMALIZATION?........................................................................................................................... 7 Definition.............................................................................................................................................................. 7 Need for Normalization....................................................................................................................................... 7 Advantages of Normalization.............................................................................................................................. 7 Disadvantages of Normalization......................................................................................................................... 7 Types of Normal Forms....................................................................................................................................... 8 1NF.................................................................................................................................................................... 8 2NF.................................................................................................................................................................... 8 3NF.................................................................................................................................................................... 9 BCNF............................................................................................................................................................... 10 4NF.................................................................................................................................................................. 10 5NF (PJNF)..................................................................................................................................................... 12 ---------------------------------------------------------------------------------------------------------------- 1) FUNCTIONAL DEPENDENCIES Functional Dependency is a constraint between two sets of attributes in relation to a database. A functional dependency is denoted by an arrow (→). A B 1 3 If an attribute A functionally determines B, then it is written as 2 3 A → B 4 0 What does functionally dependent mean? 1 3 o A function dependency A → B means for all instances of a particular 4 0 value of A, there is the same value of B. Page 1 CS 4002 Unit 3 - Normalization o Example: in the table, the FD A → B is true, but B → A is not true as there are different values of A for B = 3. Types of Functional Dependencies o Trivial FD. o Non-Trivial FD. o Transitive FD. o Multivalued Dependency. o Join Dependency. ------------------------------------------------------------------------------------------------------------ Trivial Dependency o Definition:- ▪ In Trivial Functional Dependency, a dependent is always a subset of the determinant. o Notation:- ▪ If X → Y and Y is the subset of X, then it is called trivial functional dependency o Example:- roll_no name age ▪ Here, {roll_no, name} → name is a trivial 42 abc 17 functional dependency, since the dependent 43 pqr 18 name is a subset of determinant set {roll_no, name} 44 xyz 18 ▪ Similarly, roll_no → roll_no is also an example of trivial functional dependency. ------------------------------------------------------------------------------------------------------------ Non-Trivial Dependency o Definition:- ▪ In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. o Notation:- ▪ i.e. If X → Y and Y is not a subset of X, then it roll_no name age is called Non-trivial functional dependency. o Example:- 42 abc 17 ▪ Here, roll_no → name is a non-trivial 43 pqr 18 functional dependency, since the dependent 44 xyz 18 name is not a subset of determinant roll_no ▪ Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name} ------------------------------------------------------------------------------------------------------------ Transitive Dependency o Definition:- Page 2 CS 4002 Unit 3 - Normalization ▪ A transitive dependency in a database is an indirect relationship between values in the same table that causes a functional dependency. o Notation:- ▪ If P → Q and Q → R is true, then P → R is a transitive dependency. o Example:- Movie_ID Listing_ID Listing_Type DVD_Price ($) M08 L09 Crime 180 M03 L05 Drama 250 M05 L09 Crime 180 ▪ Here, the following functional dependency exists Movie_ID → Listing_ID Listing_ID → Listing_Type ------------------------------------------------------------------------------------------------------------ Multivalued Dependency o Definition:- ▪ In Multivalued functional dependency, entities of the dependent set are not dependent on each other. o Notation:- ▪ If a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency. o Example:- ▪ Here, roll_no → {name, age} is a roll_no name age multivalued functional dependency, since the dependents name & age are not dependent on 42 abc 17 each other. 43 pqr 18 ▪ i.e. name → age or age → name doesn’t exist. 44 xyz 18 ------------------------------------------------------------------------------------------------------------ Join Dependency o Definition:- ▪ If a table can be recreated by joining multiple tables and each of this table have a subset of the attributes of the table, then the table is in Join EmpName EmpSkills EmpJob Dependency. ▪ It is a generalization of Tom Networking EJ001 Multivalued Dependency. Harry Web EJ002 o Example:- Development Employee Katie Programming EJ002 Page 3 CS 4002 Unit 3 - Normalization EmployeeSkills EmployeeJob JobSkills EmpName EmpSkills EmpName EmpJob EmpSkills EmpJob Tom Networking Tom EJ001 Networking EJ001 Harry Web Harry EJ002 Web EJ002 Development Development Katie EJ002 Katie Programming Programming EJ002 ▪ The above relations have join dependency. ▪ That would mean that a join relation of the above three relations is equal to our original relation Employee. 2) CONCPET OF DECOMPOSITION Decomposition means dividing a relation R into {R1, R2,......Rn}. The decomposition is said to be dependency preserving and lossless. Decomposition in DBMS removes redundancy, anomalies, and inconsistencies from a database by dividing the table into multiple tables. Lossless Join Decomposition o Definition:-. ▪ Decomposition is lossless if it is feasible to reconstruct relation R from decomposed tables using Joins. ▪ The information will not lose from the relation when decomposed. ▪ The join would result in the same original relation. o Notation:-. ▪ Consider there is a relation R which is decomposed into sub relations R1, R2, …, Rn. ▪ For lossless join decomposition, we always have- R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R where ⋈ is a natural join operator o Example :-. ▪ Consider the following relation R(A, B, C). ▪ Now, the relation R is split into :- R R1 R2 A B C A B B C 1 2 1 1 2 2 1 2 5 3 2 5 5 3 3 3 3 3 3 3 3 Page 4 CS 4002 Unit 3 - Normalization ▪ Now, let us check whether this decomposition is lossless or not. ▪ For lossless decomposition, we must have- R1 ⋈ R2 = R ▪ Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get relation is same as the original relation R. ▪ Thus, we conclude that the above decomposition is lossless join decomposition. Lossy Join Decomposition o Definition:-. ▪ In this, the join of the sub relations does not result in the same relation R that was decomposed. ▪ The natural join of the sub relations is always found to have some extraneous tuples. o Notation:-. ▪ Consider there is a relation R which is decomposed into sub relations R1, R2, …, Rn. ▪ For lossless join decomposition, we always have- R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R where ⋈ is a natural join operator o Example :-. ▪ Consider the following relation R(A, B, C). ▪ Now, the relation R is split into :- R R1 R2 A B C A C B C 1 2 1 1 1 2 1 2 5 3 2 3 5 3 3 3 3 3 3 3 3 ▪ Now, let us check whether this decomposition is lossless or A B C not. 1 2 1 ▪ For lossy decomposition, we must have- 2 5 3 R1 ⋈ R2 ⊃ R 2 3 3 ▪ Now, if we perform the natural join ( ⋈ ) of the sub 3 5 3 relations R1 and R2 , we get relation which is not the same 3 3 3 as the original relation R. ▪ Thus, we conclude that the above decomposition is lossy decomposition. Page 5 CS 4002 Unit 3 - Normalization Lossless v/s Lossy Decomposition Lossless Lossy The decompositions R1, R2, R2…Rn for a The decompositions R1, R2, R2…Rn for a relation schema R are said to be Lossless if relation schema R are said to be Lossy if there natural join results the original relation there natural join results into addition of R. extraneous tuples with the original relation R. Formally, Let R be a relation and R1, R2, R3 Formally, Let R be a relation and R1, R2, R3 … Rn be it’s decomposition, the … Rn be it’s decomposition, the decomposition is lossless if – decomposition is lossy if – R1 ⨝ R2 ⨝ R3.... ⨝ Rn = R R ⊂ R1 ⨝ R2 ⨝ R3.... ⨝ Rn There is no loss of information as the relation There is loss of information as extraneous obtained after natural join of decompositions tuples are added into the relation after natural is equivalent to original relation. Thus, it is join of decompositions. Thus, it is also also referred to as non-additive join referred to as careless decomposition. decomposition The common attribute of the sub relations is a The common attribute of the sub relation is superkey of any one of the relation. not a superkey of any of the sub relation. Dependency Preserving o Definition:- ▪ Let R is decomposed into {R1, R2,....,Rn} with projected FD set {F1, F2,......Fn}. ▪ This decomposition is dependency preserving if F+ ={F1 U F2 U......... Fn}+. o Example:- ▪ Let the relation R{A,B,C,D} F:{AB → C, C → D, AB → D} R is decomposed to R1(A,B,C), R2(C,D). Let F1 = {AB → C} and F2 = {C → D} => (F1 u F2) = {AB → C, C → D} AB+ under (F1 U F2) = {A,B,C,D} => AB → D holds true is under (F1 U F2) F+ = (F1 U F2)+ => Decomposition is dependency preserving. Page 6 CS 4002 Unit 3 - Normalization 3) WHAT IS NORMALIZATION? Definition o Normalization is a process to eliminate the flaws of a database with bad design. A poorly designed database is inconsistent and create issues while adding, deleting or updating information. or o Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data but at different places). or o In DBMS, database normalization is a process of making the database consistent by - ▪ Reducing the redundancies. ▪ Ensuring the integrity of data through lossless decomposition. Need for Normalization o Resolving the database anomalies o Eliminate Redundancy of Data o Data Dependency o Isolation of Data o Data Consistency Advantages of Normalization o Normalization helps to minimize data redundancy. o Greater overall database organization. o Data consistency within the database. o Much more flexible database design. o Enforces the concept of relational integrity. Disadvantages of Normalization o You cannot start building the database before knowing what the user needs. o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF. o It is very time-consuming and difficult to normalize relations of a higher degree. o Careless decomposition may lead to a bad database design, leading to serious problems. Page 7 CS 4002 Unit 3 - Normalization Types of Normal Forms ------------------------------------------------------------------------------------------------------------ 1NF o Definitions: ▪ A relation is in 1NF if it contains an atomic value. o Associated Functional Dependency: Nil, it is associated to the atomicity of the attributes of the given relation. o Example: ▪ The given relation is not in 1NF, because the values of the attributes are not atomic. (Contains multiple values in the cells of Attribute Subject) Student_id Name Subjects Computer Networks, 100 Akshay Designing Database 101 Aman Management System Automata, 102 Anjali Compiler Design o NOTE: ▪ By default, every relation is in 1NF. ▪ This is because formal definition of a relation states that value of all the attributes must be atomic. ------------------------------------------------------------------------------------------------------------ 2NF o Definitions: ▪ A relation will be in 2NF if :- It is in 1NF, and No partial dependency exists in the relation. Page 8 CS 4002 Unit 3 - Normalization o Associated Functional Dependency: ▪ No Partial Dependency Should exist. ▪ Partial Dependency :- Partial Dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key. or All non-key attributes are fully functional dependent on the primary key. Notation:- A → B is called a partial dependency if and only if- A is a subset of some candidate key B is a non-prime attribute. o Example: ▪ Consider a relation- R (V, W, X, Y, Z) with functional dependencies- VW → XY Y → V WX → YZ ▪ The possible candidate keys for this relation are- VW , WX , WY ▪ From here, Prime attributes = {V, W, X, Y} Non-prime attributes = {Z} ▪ Now, if we observe the given dependencies- There is no partial dependency. ▪ Thus, we conclude that the given relation is in 2NF. ------------------------------------------------------------------------------------------------------------ 3NF o Definitions: ▪ A relation will be in 3NF if :- it is in 2NF, and There exist no transitive dependency. o Associated Functional Dependency: ▪ There exist no Transitive dependency in the relation. ▪ Notation :- A → B is called a transitive dependency if and only if- o A is not a super key. o B is a non-prime attribute. If any one condition fails, then it is not a transitive dependency. ▪ NOTE :- Transitive dependency must not exist for non-prime attributes. However, transitive dependency can exist for prime attributes. ▪ Another Interpretation of 3NF by transitive:- A relation is called in Third Normal Form (3NF) if and only if- Any one condition holds for each non-trivial functional dependency A → B A is a super key Page 9 CS 4002 Unit 3 - Normalization B is a prime attribute o Example: ▪ Consider a relation- R (A, B, C, D, E) with functional dependencies- A → BC CD → E B → D E → A ▪ The possible candidate keys for this relation are- A, E, CD, BC ▪ From here, Prime attributes = {A, B, C, D, E} There are no non-prime attributes ▪ Now, ▪ It is clear that there are no non-prime attributes in the relation. ▪ In other words, all the attributes of relation are prime attributes. ▪ Thus, all the attributes on RHS of each functional dependency are prime attributes. ▪ Thus, we conclude that the given relation is in 3NF. ------------------------------------------------------------------------------------------------------------ BCNF o Definitions: ▪ A stronger definition of 3NF is known as Boyce Codd's normal form. ▪ A given relation is called in BCNF if and only if- Relation already exists in 3NF. For each non-trivial functional dependency A → B A is a super key of the relation. o Associated Functional Dependency: ▪ LHS of the functional dependency should be a super key. o Example: ▪ Consider a relation- R (A, B, C) with the functional dependencies- A → B B → C C → A ▪ The possible candidate keys for this relation are- A+ = {A, B, C} B+ = {A, B, C} C+ = {A, B, C} ▪ Now, we can observe that, LHS of every FD is a superkey. ▪ Thus, we conclude that the given relation is in BCNF. ------------------------------------------------------------------------------------------------------------ 4NF o Definitions: Page 10 CS 4002 Unit 3 - Normalization ▪ A relation will be in 4NF if:- It is in Boyce Codd's normal form, and Has no non-trivial multivalued dependencies other than a candidate key. o Associated Functional Dependency: ▪ No Multi-valued Dependency Should exist. o Example: ▪ Consider the following Relation :- STUDENT ID COURSE HOBBY 21 Computer Dancing 21 Math Singing 34 Chemistry Dancing 74 Biology Cricket 59 Physics Hockey ▪ The following Functional Depending exist :- ID → Course, Hobby ▪ The given STUDENT table is in 3NF, but the COURSE and HOBBY are independent. ▪ Hence, there is no relationship between COURSE and HOBBY. ▪ In the STUDENT relation, a student with ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. ▪ So there is a Multi-valued dependency on STU_ID ▪ Therefore, to make the above table into 4NF, we can decompose it into two tables. o Decomposition: STUDENT_COURSE STUDENT_HOBBY STU_ID COURSE STU_ID HOBBY 21 Computer 21 Dancing 21 Math 21 Singing 34 Chemistry 34 Dancing 74 Biology 74 Cricket 59 Physics 59 Hockey ------------------------------------------------------------------------------------------------------------ Page 11 CS 4002 Unit 3 - Normalization 5NF (PJNF) o Definitions: ▪ It is also known as Project Join Normal Form (PJNF) ▪ A relation is in 5NF if :- It is in 4NF, and does not contain any join dependency, joining should be lossless. ▪ 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy. o Associated Functional Dependency: ▪ Join Dependency. o Example: SUBJECT LECTURER SEMESTER Computer Vishal Semester 1 Computer Shivendra Semester 1 Math Shivendra Semester 1 Math Mohit Semester 2 Chemistry Shubham Semester 1 ▪ In the above table, Shivendra takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. ▪ In this case, combination of all these fields required to identify a valid data. ▪ Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. ▪ But all three columns together acts as a primary key, so we can't leave other two columns blank. ▪ So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3: P1 P2 P3 SUBJECT SEMESTER SUBJECT LECTURER SEMESTER LECTURER Computer Semester 1 Computer Vishal Semester 1 Vishal Computer Semester 1 Computer Shivendra Semester 1 Shivendra Math Semester 1 Math Shivendra Semester 1 Shivendra Math Semester 2 Math Mohit Semester 2 Mohit Chemistry Semester 1 Chemistry Semester 1 Chemistry Semester 1 Page 12