DBMS Module 4 Part 2 PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document provides a detailed overview of database refinement, specifically focusing on functional dependencies and normal forms (1NF, 2NF, 3NF, BCNF). It outlines the importance of clear attribute semantics, avoiding redundancy and NULL values to create a robust relational database design.
Full Transcript
CONTEN Database Refinement : TS – Functional Dependencies – Features of good relational database design – Atomic domain and Normalization (1NF, 2NF, 3NF, BCNF)1NF Introduction How to decide whether your relation schema is good or b...
CONTEN Database Refinement : TS – Functional Dependencies – Features of good relational database design – Atomic domain and Normalization (1NF, 2NF, 3NF, BCNF)1NF Introduction How to decide whether your relation schema is good or bad? Use 2 types of guidelines – Informal Design Guidelines – Functional Dependency: is the formal guideline Informal Design Guidelines for Relation Schemas Informal Design Guidelines – May be used as measures to determine the quality of relation schema design: 1. Making sure that the semantics of the attributes is clear in the schema 2. Reducing the redundant information in tuples 3. Reducing the NULL values in tuples 4. Disallowing the possibility of generating spurious tuples Imparting Clear Semantics to Attributes in Relations The semantics of a relation – refers to meaning of relation from the interpretation of attribute values in a tuple. – To interpret the correct meaning of relation analyzed relation as – a set of facts Using concepts like domains, attributes, tuples, relations ,implicit constraints, explicit constraints etc.. – used conceptual design ER diagrams and mapping to relation schema. Imparting Clear Semantics to Attributes in Relations Examples of Violation of Guideline 1 Guideline 1: Imparting Clear Semantics to Attributes in Relations Design relation schema, so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation – Conclusion: if a relation schema corresponds to one entity type or one relationship type, it will be easy to interpret and to explain its meaning. Reducing the redundant information in tuples Redundant Information in Tuples and Update Anomalies Grouping attributes into relation schemas – Significant effect on storage space – Storing natural joins of base relations leads to an additional problem referred to as update anomalies. – These can be classified into insertion anomalies deletion anomalies modification anomalies Guideline 2: Reducing the redundant information in tuples These three anomalies are undesirable and cause difficulties to maintain consistency of data as well as require unnecessary updates that can be avoided; Guideline 2: Design base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present: – Note them clearly – Make sure that the programs that update the database will operate correctly NULL Values in Tuples If wrong schema design is used, may group many attributes together into a “fat” relation – Can end up with many NULLs Problems with NULLs – Wasted storage space – Problems understanding meaning how to account for them when aggregate operations such as COUNT or SUM are applied. SELECT and JOIN operations involve comparisons. NULLs can have multiple interpretations (doesn’t apply, unknown, known but absent). – Having the same representation for all NULLs compromises the different meanings they may have. Guideline 3:Avoid NULL Values in Tuples Guideline 3: Avoid placing attributes in a base relation whose values may frequently be NULL If NULLs are unavoidable: – Make sure that they apply in exceptional cases only, not to a majority of tuples. Note: – Whether to include Columns that may have NULLs in a relation or to have a separate relation for those columns (with the appropriate key columns). – For example, if only 15 percent of employees have individual offices, there is little justification for including an attribute Office_number in the EMPLOYEE relation; rather, a relation EMP_OFFICES(Essn, Office_number) can be created to include Generation of Spurious Tuples Guideline 4: Avoid Generation of Spurious Tuples Design relation schemas to be joined with equality conditions on attributes that are appropriately related – Guarantees that no spurious tuples are generated Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations. Functional Dependencies Functional Dependency is a formal tool for analysis of relational schemas to detect schema design related problems. Functional Dependency is a constraint between two sets of attributes from the database. Definition: A functional dependency, denoted by X -> Y (between two sets of attributes X and Y that are subsets of R) specifies a constraint on the possible tuples for every state r of relation R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X]= t2[X], they must also have t1[Y]=t2[Y]. – There is a functional dependency from X to Y (or the value of Y is determined by the value of X (or X uniquely determines the value of Y) (or Y is functionally dependent on X) – The abbreviation for functional dependency is FD or f.d. – The set of attributes X is called the left-hand side of the FD, and Y is called the right- hand side of the FD. An FD is to determine the relation ship between attributes in a relation R is denoted by x->y X is determinant and Y is dependant Which means y is functionally dependent on x Or X is functionally determining y FD= LHS -> RHS LHS of FD is called determinant and RHS of FD is called dependant For one value of LHS there should be only one value of RHS( If we get more than one value then it is not an FD) Functional Dependencies Ename is functionally determined by (or functionally dependent on) Ssn, Functional Typically, the schema Dependencies designer specifies the functional dependencies that are semantically obvious; Usually, however, numerous other functional dependencies hold in all legal relation instances among sets of attributes. Those other dependencies can be inferred or deduced from the FDs in F. Super keys : {Bookid}, {Bookid, Bookname}, {Bookid, Author}, {Bookname, Author}, {Bookid, bookname, Author} Candidate keys : {Bookid}, {Bookname, Author} Primary key : {Bookid} Functional Dependencies The following FDs may hold because the four tuples in the current extension have no violation of these constraints: B→ C; C → B; {A, B} → C; {A, B} → D; and Inferences Rules For Functional Dependencies If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F (or that can be logically inferred from F). F = {FD1,FD2,FD3} F+ = {FD1,FD2,FD3,FD4,FD5} Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are the following, usually called Armstrong's axioms. 1. IR1. (Reflexive rule) 2. IR2. (Augmentation rule) 3. IR3. (Transitive rule) Inferences Rules For Functional Dependencies Armstrong's Axioms are a set of rules, that when applied repeatedly, generates a closure of functional dependencies. Armstrong's axioms are used to conclude functional dependencies on a relational database. Using the inference rule, derive additional functional dependency from the initial set. Reflexivity: If Y is a subset of X, then X → Y Augmentation: If X → Y, then XZ → YZ Transitivity: If X → Y and Y → Z, then X → Z Armstrong's Axioms: Inferences Rules 1. IR1. (Reflexive For rule) FD :- If Y is subset-of X, then X -> Y 2.IR2. (Augmentation rule) :- The augmentation is also called as a partial dependency. In augmentation, if X determines Y, then XZ determines YZ for any Z. That is adding attributes in dependencies, does not change the basic dependencies. If X-> Y, then XZ -> YZ or X ->Y = ╞ XZ -> YZ meaning: [X-> Y is equivalent to XZ -> YZ] Example: 3. IR3. (Transitive rule) If X -> Y and Y -> Z, then X -> Z or {X -> Y, Y -> Z} ╞ X -> Z meaning: [X-> Y and Y -> Z is equivalent to X -> Z] Additional Inference Rules The following rules can be derived using Armstrong's axioms: Union : If 𝑋→𝑌 and 𝑋→𝑍, then X→YZ. Example: If {𝐴}→{𝐵} and {𝐴}→{𝐶}{A}→{C}, then {𝐴}→{𝐵,𝐶} Decomposition : If 𝑋→𝑌Z, then 𝑋→𝑌 and 𝑋→𝑍 Example: If {𝐴}→{𝐵,𝐶}, then {𝐴}→{𝐵}and {𝐴}→{𝐶} Pseudotransitivity : If 𝑋→𝑌 and WY→Z, then 𝑊𝑋→𝑍. Example: If {A}→{B} and {𝐶,𝐵}→{𝐷}, then {𝐴,𝐶}→{𝐷}. Example Problem Given 𝐹={𝐴→𝐵,𝐵→𝐶} derive 𝐴→𝐶. Using Armstrong's Axioms:From 𝐴→𝐵and 𝐵→𝐶, apply transitivity:𝐴→𝐶. Thus, 𝐴→𝐶 is derived. Trivial Functional Dependency Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD..Explanation: The dependency is considered trivial because 𝑌 is already included in 𝑋, so the dependency is alway ttrue. Example:Consider 𝑋={𝐴,𝐵} and 𝑌={𝐴}.Here, {𝐴,𝐵}→{𝐴}is trivial because 𝐴 is already a part of {𝐴,𝐵}. Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non- trivial FD. Example:Consider 𝑋={𝐴} and 𝑌={𝐵} Here, {𝐴}→{𝐵} is non-trivial because 𝐵 is not part of {𝐴}. Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is said to be a completely non-trivial FD. Semi-Trivial 𝑌∩𝑋≠∅ but 𝑌⊈𝑋 Example :{𝐴,𝐵}→{𝐵,𝐶} is semi-trivial because 𝐵 is part of 𝑋, but 𝐶 is not. Functional Dependencies Example :Given set of functional dependencies FD1: {SSN, PNUMBER} -> HOURS FD2: SSN -> ENAME FD3: PNUMBER -> {PNAME, PLOCATION} Closure sets with respect to F {SSN} -> {SSN, ENAME} {PNUMBER} -> {PNUMBER, PNAME, PLOCATION} Normalization on Relational Data Base Normalization of data is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties which are as follows: 1.Minimizing redundancy 2.Minimizing the insertion , deletion , and update anomalies An unsatisfactory relation schema that don’t meet certain conditions for a normal form test, are decomposed into smaller relation schemas that meet the tests and hence posses the desirable properties. It can be considered as a “fi ltering” or “purifi cation” process to make the design have successively better quality. Normalization- Outcomes and Normalization is a systematic approach to organizing a relational database in a way that reduces Definition redundancy and improves data integrity. It involves breaking down a database into smaller, related tables and defining relationships between them. The normalization procedure provides database designers with the following: A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes.(Keys: A key uniquely identifies a row in a table. For example, a "Student ID" might serve as a primary key in a "Students" table. Functional Dependencies: These define relationships between attributes. For instance, if a "Student ID" uniquely determines a "Student Name," then the name is functionally dependent on the ID.) By analyzing keys and functional dependencies, designers can identify problems like data duplication or anomalies. A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree. Definition of Normal Form: The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized. Eg: A table in 3NF has also passed the rules for 1NF and 2NF, meaning it is better normalized. Achieving higher normal forms reduces data redundancy and prevents anomalies, such as update anomalies (where changes to data must be repeated in multiple places). Normalization on Relational Data Base The process of normalization through decomposition must also confi rm the existence of two additional properties that the relational schemas (together) should possess. – The lossless join or nonadditive join property It guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition. It is extremely critical and must be achieved at any cost. – The dependency preservation property It ensures that each functional dependency is represented in some individual relation resulting after decomposition. Sometimes sacrificed for higher performance. Consider a relation R(A,B,C) with the FDs:𝐴→𝐵,B→C If 𝑅 is decomposed into 𝑅1(𝐴,𝐵) and 𝑅2(𝐵,𝐶) the decomposition is dependency-preserving because:𝐹1={𝐴→𝐵} ,𝐹2={𝐵→𝐶} 𝐹1∪𝐹2=𝐹, so no FD is lost. Decomposition in DBMS is the process of breaking a relation (table) into two or more smaller relations in such a way that the decomposition is lossless (no information is lost) and preserves dependencies. It is an essential concept in relational database design to improve database structure, eliminate redundancy, and resolve anomalies. Types of Decomposition 1.Lossless Decomposition: 1. Ensures that no information is lost when the original table is decomposed into two or more tables. 2. The natural join of the decomposed tables must result in the original table. 2.Dependency-Preserving Decomposition: 1. Ensures that all functional dependencies are preserved in the decomposed relations. 2. Useful for maintaining consistency without the need to reconstruct the original relation. Original Table 𝑅(𝐴,𝐵,𝐶): A B C 1 X P 2 Y Q 3 Z R Decomposed Tables:Table 𝑅1(𝐴,𝐵) A B 4 X 5 Y 6 Z Table 𝑅2(𝐴,𝐶) A C 7 P 8 Q 9 R To verify the lossless property, perform a natural join on R 1 and 𝑅2 using the common attribute 𝐴 𝑅1𝑅 2= 𝑅(𝐴,𝐵,𝐶) This matches the original table, confirming the decomposition is Definitions of Keys and Attributes Participating in Keys Definition: Prime Attribute: An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. Non-Prime Attribute: An attribute is called nonprime if Key point: If anitattribute is not aisprime part of attribute—that is, itif isit aisprime any candidate key, not aattribute. member of any candidate key. Ex: Both Ssn and Pnumber are prime attributes of WORKS_ON, whereas other attributes of WORKS_ON are nonprime. Why Normalize? Eliminates Redundancy: By dividing data into related tables, repeated information is minimized. 1N F It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. In other words, 1NF disallows relations within relations or relations as attribute values within tuples. The only attribute values permitted by 1NF are single atomic (or indivisible) values. A relation schema which is not in 1 NF Dlocations is not an atomic attribute. 1N A relation schema which is not in 1 NF F The above relation is not in 1NF because, Dlocations is not an atomic attribute. There are two ways can look at the Dlocations attribute: The domain of Dlocations contains atomic values, but some tuples can have a set of these values. In this case, Dlocations is not functionally dependent on the primary key Dnumber. The domain of Dlocations contains sets of values and hence is non atomic. In this case, Dnumber → Dlocations because each set is 1NF : First There are three main techniques to achieve first normal form for such a relation: Technique First technique: – Remove the attribute DLOCATIONS that violates 1NF and place it in a – separate relation The primary key DEPT_LOCATIONS of relation along is with the primary key the combination DNUMBER of DEPARTMENT. DEPT_LOCATIONS {DNUMBER, DLOCATION}. – This method is best method. 1NF: Second Second Technique: Technique – Expand the key so that there will be a separate tuple in the original DEPARTMENT for each location of a DEPARTMENT. – Then Primary Key becomes {DNUMBER, DLOCATION} and redundancy is introduced. 1NF : Domain of an Conclusion attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. Does not allow nested relations – Nested relation means, each tuple can have a relation within it. To change to 1NF: – Remove nested relation attributes into a new relation – Propagate the primary key into it – Unnest relation into a set of 1NF relations Functional Dependency: Full and Partial Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; That is, for any attribute A ε X, (X – {A}) does not functionally determine Y. A functional dependency X → Y is a partial dependency if some attribute A ε X can be removed from X and the dependency still holds; That is, for some A ε X, (X – {A}) → Y. Functional Dependency: Full and Partial In Figure 15.3(b), the primary keys are {SSN,PNUMBER} Example for full functional dependency is FD1: – {Ssn, Pnumber} → Hours is a full dependency – Because, neither Ssn → Hours nor Pnumber → Hours, holds). Functional Dependency: Full and Partial In Figure 15.3(b), the primary keys are {SSN,PNUMBER} Example for partial functional dependencies are FD2 and FD3: 1.The dependency {Ssn, Pnumber} → Ename - because Ssn → Ename [FD2] holds it is partial 2.Similarly the functional dependency {Ssn,Pnumber} -> {Pname, Plocation} - because Pnumber -> Pname, Plocation [FD3] holds it is partial Means, if we have more than one primary key, then none of the functional dependencies 2NF : Definition A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. Consider the following relation schema EMP_PROJ The EMP_PROJ relation in Figure 15.3(b) is in 1NF but is not in 2NF. The nonprime attribute Ename violates 2NF because of FD2, as do the nonprime attributes Pname and Plocation because of FD3. The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test. If Primary Key contains single attribute, the test need not be applied. 2NF Normalization If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent. Therefore, the functional dependencies FD1, FD2, and FD3 in Figure 15.3(b) lead to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure 15.11(a), General Definition of Second Normal Form Defi nition. A relation schema R is in second normal form (2NF) if every non- prime attribute A in R is not partially dependent on any key of R. General Definition of Second Normal Form- Example Consider the relation schema LOTS shown in Fig, which describes parcels of land for sale in various counties(towns) of a state. Suppose that there are two candidate keys: Property_id# and {County_name, Lot#}; That is, lot numbers are unique only within each county, but Property_id# numbers are unique across counties for the entire state. Based on the two candidate keys Property_id# and {County_name, Lot#}, there are two functional dependencies FD1 and FD2. Suppose that the following two additional functional dependencies hold in LOTS: FD3: County_name → Tax_rate FD4: Area → Price General Definition of Second Normal Form- Example 3 NF Third normal form(3NF) is based on the concept of transitive dependency. Transitive Dependency – A functional dependency X → Y in a relation schema R is a transitive dependency if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold. – [Z is a non-prime attribute, a non-prime attribute should not determine another non-prime attribute] 3 NF FD1: Ssn -> Ename, Bdate, Address, Dnumber FD2: Dnumber -> Dname, Dmgr_ssn The dependency SSN -> DMGRSSN is a transitive FD since – because both the dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and Dnumber is neither a key itself nor a subset of the key of EMP_DEPT. Intuitively, we can see that the dependency of Dmgr_ssn on Dnumber is undesirable in EMP_DEPT since Dnumber is not a key of EMP_DEPT. 3 NF: Definition Defi nition: A relation schema R is in 3NF if it satisfi es 2NF and no nonprime attribute of R is transitively dependent on any key of R. Every non-key column is non-transitively dependent on the primary key.No non-key column should depend on another non-key column. The relation schema EMP_DEPT , as above is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber. Normalizing to 3 NF Normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 shown below. Intuitively, we see that ED1 and ED2 represent independent entity facts about employees and departments. A NATURAL JOIN operation on ED1 and ED2 will recover the original relation EMP_DEPT without generating spurious tuples. General Definition of Third Normal Form Defi nition. A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R. How to check for 3 N.F is – L.H.S must be a candidate key or superkey OR – R.H.S is a prime attribute. General Definition of Third Normal Form Defi nition. A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R. According to this definition, LOTS2 is in 3NF. However, FD4 in LOTS1 violates 3NF because Area is not a superkey and Price is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and LOTS1B shown in Figure 15.12(c). We construct LOTS1A by removing the attribute Price that violates 3NF from LOTS1 and placing it with Area (the left- hand side of FD4 that causes the transitive dependency) into another relation LOTS1B. Both LOTS1A and LOTS1B are in 3NF. General Definition of Third Normal Form- Example General Definition of Third Normal Form Two points are worth noting about this example and the general definition of 3NF: LOTS1 violates 3NF because Price is transitively dependent on each of the candidate keys of LOTS1 via the nonprime attribute Area. This general definition can be applied directly to test whether a relation schema is in 3NF; it does not have to go through 2NF first. If we apply the above 3NF definition to LOTS with the dependencies FD1 through FD4, we find that both FD3 and FD4 violate 3NF(county_name alone not a key attribute). Therefore, decompose LOTS into LOTS1A, LOTS1B, and LOTS2 directly. Hence, the transitive and partial dependencies that violate 3NF can be removed in any order. Boyce-Codd Normal Form Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. Every relation in BCNF is also in 3NF, but a relation in 3NF not necessarily be in BCNF. A relation schema R is in BCNF if – It is in 3NF and – For every functional dependency X -> A, the determinant X is a superkey of R. Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency X → A holds in R, then X is a superkey of R. Relation Schema in 3NF but not in BCNF Decomposition of Relations not in BCNF