Database Systems Normalization PDF
Document Details
Uploaded by GracefulMossAgate
Carlos Coronel, Steven Morris
Tags
Summary
This document provides information about database normalization. It explains different normal forms (1NF, 2NF, and 3NF), functional dependencies, and how to convert a database from one form to another. It also explores surrogate keys, data modeling checklists, and the different aspects of database management.
Full Transcript
Chapter 5 Normalization of Database Tables Learning Objectives • After completing this chapter, you will be able to: • Explain normalization and its role in the database design process • Identify and describe each of the normal forms: 1NF, 2NF, and 3NF • Explain how normal forms can be transforme...
Chapter 5 Normalization of Database Tables Learning Objectives • After completing this chapter, you will be able to: • Explain normalization and its role in the database design process • Identify and describe each of the normal forms: 1NF, 2NF, and 3NF • Explain how normal forms can be transformed from lower normal forms to higher normal forms • Apply normalization rules to evaluate and correct table structures • Identify situations that require denormalization to generate information efficiently • Use a data-modeling checklist to check that the ERD meets a set of minimum requirements © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 2 Database Tables and Normalization (1 of 2) • Normalization: evaluating and correcting table structures to minimize data redundancies • Reduces data anomalies • Assigns attributes to tables based on determination • Normal forms • First normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 3 Database Tables and Normalization (2 of 2) • Structural point of view of normal forms • Higher normal forms are better than lower normal forms • Denormalization: produces a lower normal form • Results in increased performance and greater data redundancy © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 4 The Need for Normalization • Used while designing a new database structure • Analyzes the relationship among the attributes within each entity • Determines if the structure can be improved through normalization • Improves the existing data structure and creates an appropriate database design © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 5 The Normalization Process (1 of 5) • Objective is to ensure that each table conforms to the concept of well-formed relations • • • • • Each table represents a single subject Each row/column intersection contains only one value and not a group of values No data item will be unnecessarily stored in more than one table All nonprime attributes in a table are dependent on the primary key Each table has no insertion, update, or deletion anomalies © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 6 The Normalization Process (2 of 5) • Ensures that all tables are in at least 3NF • Higher forms are not likely to be encountered in business environment • Works one relation at a time • Identifies the dependencies of a relation (table) • Progressively breaks the relation up into a new set of relations © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 7 The Normalization Process: Functional Dependence Consider a table called "Employees" with attributes "EmployeeID," "EmployeeName," and "Department." In this table, "EmployeeName" is functionally dependent on "EmployeeID." For each unique "EmployeeID," there is a unique "EmployeeName." However, "EmployeeName" may also depend on the "Department" attribute because employees with the same department can have the same name. In this example, "EmployeeName" depends on "EmployeeID," but it is not fully functionally dependent because "EmployeeName" can also be determined by the "Department" attribute. EmployeeID EmployeeName Department 101 John Sales 102 Mary Marketing 103 David Sales © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 8 The Normalization Process: Fully Functional Dependence Now, consider a modified version of the "Employees" table where we have eliminated the possibility of the "EmployeeName" being determined by any attribute other than "EmployeeID.“ In this case, "EmployeeName" is fully functionally dependent on "EmployeeID." There are no other attributes within the table that can determine "EmployeeName." Removing any other attribute, such as "Department" or "Salary," does not change the dependency between "EmployeeID" and "EmployeeName.“ In summary, in the first example, "EmployeeName" was functionally dependent on "EmployeeID" but also partially dependent on "Department." In the second example, we ensured that "EmployeeName" is fully functionally dependent on "EmployeeID" by removing any other potential sources of dependency. EmployeeID EmployeeName 101 John 102 Mary 103 David © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 9 The Normalization Process: Functional Dependence Consider a table called "Students" with attributes "StudentID," "StudentName," "StudentMajor," and "StudentAdvisor." In this table, "StudentAdvisor" is functionally dependent on "StudentID." For each unique "StudentID," there is a unique "StudentAdvisor." However, "StudentAdvisor" may also depend on the "StudentMajor" because students with the same major might have the same advisor. In this example, "StudentAdvisor" depends on "StudentID," but it is not fully functionally dependent because "StudentAdvisor" can also be determined by the "StudentMajor" attribute. . StudentID StudentName StudentMajor StudentAdvisor 101 Alice Biology Dr. Smith 102 Bob Chemistry Dr. Johnson 103 Carol Biology Dr. Smith © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 10 The Normalization Process: Fully Functional Dependence Now, consider a modified version of the "Students" table where we have eliminated the possibility of "StudentAdvisor" being determined by any attribute other than "StudentID.“ In this case, "StudentAdvisor" is fully functionally dependent on "StudentID." There are no other attributes within the table that can determine "StudentAdvisor." Removing any other attribute, such as "StudentMajor," does not change the dependency between "StudentID" and "StudentAdvisor." In summary, in the first example, "StudentAdvisor" was functionally dependent on "StudentID" but also partially dependent on "StudentMajor." In the second example, we ensured that "StudentAdvisor" is fully functionally dependent on "StudentID" by removing any other potential sources of dependency. StudentID StudentName 101 Alice 102 Bob 103 Carol © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 11 Partial dependency • Functional dependence in which the determinant is only part of the primary key • A partial dependency occurs when an attribute depends on only part of the primary key rather than the entire primary key. StudentID CourseID Grade Professor 1 C1 A Prof. X 1 C2 B Prof. Y 2 C1 C Prof. X 2 C2 A Prof. Y ProfessorID DepartmentID Salary Department Name P1 D1 5000 Computer Science P2 D1 6000 Computer Science P3 D2 7000 Mathematic s In this table, the primary key is a composite key (StudentID, CourseID). The Grade depends on both StudentID and CourseID, but the Professor only depends on the CourseID. This is a partial dependency. In this table, the primary key is a composite key (ProfessorID, DepartmentID). The Salary depends on both ProfessorID and DepartmentID, but the DepartmentName only depends on the DepartmentID. This is another example of partial dependency. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 12 Partial dependency OrderID ProductID Quantity ProductName O1 P1 2 Product A O1 P2 1 Product B O2 P1 3 Product A O2 P3 2 Product C CourseID Semester RoomNumb Building er C1 Fall 101 Science C2 Spring 202 Arts C1 Spring 101 Science C2 Fall 202 Arts In the table, the primary key is a composite key (OrderID, ProductID). The Quantity depends on both OrderID and ProductID, but the ProductName only depends on the ProductID. This is another example of partial dependency. In this table, the candidate key is a composite key (CourseID, Semester). The RoomNumber depends on both CourseID and Semester, but the Building only depends on the RoomNumber. This is another example of partial dependency. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 13 Transitive dependency • Attribute is dependent on another attribute that is not part of the primary key • More difficult to identify among a set of data • Occur only when a functional dependence exists among nonprime attributes StudentID MajorID MajorName Department S1 M1 Computer Science Engineering In this table, the primary key is StudentID. The MajorName is dependent on the MajorID, and the Department is dependent on the MajorName. S2 M2 English Literature Humanities S3 M1 Computer Science Engineering CourseID InstructorID InstructorName Office C1 I1 Prof. A Room 101 C2 I2 Prof. B Room 202 This is a transitive dependency because the Department is indirectly dependent on the StudentID through the MajorName. In this table, the primary key is CourseID. The InstructorName is dependent on the InstructorID, and the Office is dependent on the InstructorName. C3 I1 Prof. A Room 101 This is a transitive dependency because the Office is indirectly dependent on the CourseID through the InstructorName. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 14 The Normalization Process (3 of 5) Table 6.3: Functional Dependence Concepts Concept Definition Functional dependence The attribute B is fully functionally dependent on the attribute A if each value of A determines one and only one value of B. Example: PROJ_NUM S PROJ_NAME (read as PROJ_NUM functionally determines PROJ_NAME) In this case, the attribute PROJ_NUM is known as the determinant attribute, and the attribute PROJ_NAME is known as the dependent attribute. Functional dependence (generalized definition) Attribute A determines attribute B (that is, B is functionally dependent on A) if you look at all the rows in a table and see that they have the same value for attribute A, they will also have the same value for attribute B. In simpler terms, when you know the value of A, you can be sure about the value of B in that table. Fully functional dependence (composite key) If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 15 The Normalization Process (4 of 5) Table 6.2: Normal Forms Normal Form Characteristic Section First normal form (1NF) Table format, no multi-valued, and PK identified , identify partial and transitive dependencies 6-3a Second normal form (2NF) 1NF and no partial dependencies 6-3b Third normal form (3NF) 2NF and no transitive dependencies 6-3c Boyce-Codd normal form (BCNF) Every determinant is a candidate key (special case of 3NF) 6-6a Fourth normal form (4NF) 3NF and no independent multivalued dependencies 6-6b © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 16 Original Dataset (Unnormalized) • Consider a dataset with information about library's book inventory: BookID Title Author Genre ISBN Publisher 1 "Book 1" "Author 1, Author 2" "Fiction" "9781234567890" "Publisher A" 2 "Book 2" "Author 3" "Mystery" 3 "Book 3" "Author 2" "Fiction" "9789876543210" "9781111111111" "Publisher B" "Publisher A" • This dataset is not in 1NF because the "Author" column contains multiple authors separated by commas. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 17 Conversion to First Normal Form (1NF) (1 of 3) • Repeating group: group of multiple entries of same type can exist for any single key attribute occurrence • Reduces data redundancies • Three step procedure • Eliminate the multi-valued attributes • Identify the primary key • Identify all dependencies • Dependency diagram: depicts all dependencies found within given table structure • Helps to get an overview of all relationships among table’s attributes • Makes it less likely that an important dependency will be overlooked © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 18 Conversion to First Normal Form (1NF) (2 of 3) • 1NF describes tabular format in which: • All key attributes are defined • There are no multi-valued in the table • All attributes are dependent on the primary key • All relational tables satisfy 1NF requirements • Some tables contain partial dependencies • Update, insertion, or deletion BookID 1 1 2 3 Title "Book 1" "Book 1" "Book 2" "Book 3" Author "Author 1" "Author 2" "Author 3" "Author 2" Genre "Fiction" "Fiction" "Mystery" "Fiction" ISBN "978-1234567890" "978-1234567890" "978-9876543210" "978-1111111111" Publisher "Publisher A" "Publisher A" "Publisher B" "Publisher A" • Now, the dataset is in 1NF because each column contains atomic values, and the multi-valued "Author" attribute has been split into separate rows. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 19 Conversion to Second Normal Form (2NF) (1 of 2) • Conversion to 2NF occurs only when the 1NF has a composite primary key • If the 1NF has a single-attribute primary key, then the table is automatically in 2NF • The 1NF-to-2NF conversion is simple • Make new tables to eliminate partial dependencies - New table name - New table attributes • Reassign corresponding dependent attributes • Table is in 2NF when it: • Is in 1NF • Includes no partial dependencies © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 20 Conversion to Second Normal Form (2NF) (1 of 2) • "BookID" is the primary key. However, "Author" depends on both "BookID" and "Title." To achieve 2NF, we create two tables: "Books" and "Authors.“ • Now, "Author" depends on "AuthorID," and "Title" depends on "BookID." This eliminates partial dependencies. Authors Table (with AuthorID as the Primary Key): AuthorID Author 1 "Author 1" 2 "Author 2" 3 "Author 3" Books Table (with BookID as the Primary Key): BookID Title Genre 1 "Book 1" "Fiction" 2 "Book 2" "Mystery" 3 "Book 3" "Fiction" ISBN "9781234567890" "9789876543210" "9781111111111" Publisher AuthorID "Publisher A" 1 "Publisher B" 3 "Publisher A" 2 © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 21 Conversion to Third Normal Form (3NF) (1 of 2) • The data anomalies created by the database organization shown in Figure 6.4 are easily eliminated • Make new tables to eliminate transitive dependencies • Reassign corresponding dependent attributes • Table is in 3NF when it: • Is in 2NF • Contains no transitive dependencies © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 22 Conversion to Third Normal Form (3NF) (2 of 2) • "Publisher" depends on "ISBN," and there is a transitive dependency because "ISBN" is not the primary key of the table. We want "Publisher" to depend only on the primary key. To achieve this, we'll create a separate "Publishers" table. Let's create a "Publishers" table. Publishers Table (with PublisherID as the Primary Key): PublisherID Publisher 1 "Publisher A" 2 "Publisher B" Books Table (After 3NF Conversion): • BookID Title Genre ISBN PublisherID 1 "Book 1" "Fiction" "978-1234567890" 1 2 "Book 2" "Mystery" "978-9876543210" 2 3 "Book 3" "Fiction" "978-1111111111" 1 Now, "Publisher" depends solely on "PublisherID," which is the primary key of the "Publishers" table. This eliminates the transitive dependency and brings the dataset into 3NF. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 23 Class Practice • • • • The below data reflects the assignment of employees to projects. Apparently, an employee can be assigned to more than one project. For example, Darlene Smithson (EMP_NUM = 112) has been assigned to two projects: Amber Wave and Starflight. Given the structure of the dataset, each project includes only a single occurrence of any one employee. Therefore, knowing the PROJ_NUM and EMP_NUM values will let you find the job classification and its hourly charge. HOURS attribute represents the number of hours an employee worked on a specific project and CHG_HOUR is the amount charge per hour. © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 24 Improving the Design • Normalization is valuable because its use helps eliminate data redundancies • Evaluate PK assignments and naming conventions • Refine attribute atomicity - Atomic attribute: cannot be further subdivided - Atomicity: characteristic of an atomic attribute • Identify new attributes and new relationships • Refine primary keys as required for data granularity - Granularity: Level of detail represented by the values stored in a table’s row • Maintain historical accuracy and evaluate using derived attributes © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 30 Surrogate Key Considerations • Used by designers when the primary key is considered to be unsuitable • System-defined attribute • Created an managed via the DBMS • Have a numeric value which is automatically incremented for each new row © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 31 Normalization and Database Design (1 of 6) • Normalization should be part of the design process • Proposed entities must meet required the normal form before table structures are created • Principles and normalization procedures to be understood to redesign and modify databases • ERD is created through an iterative process • Normalization focuses on the characteristics of specific entities © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 38 Normalization and Database Design (2 of 6) © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 39 Normalization and Database Design (3 of 6) © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 40 Normalization and Database Design (4 of 6) © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 41 Normalization and Database Design (5 of 6) © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 42 Normalization and Database Design (6 of 6) © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 43 Data-Modeling Checklist (1 of 6) • Business rules • Properly document and verify all business rules with the end users • Ensure that all business rules are written precisely, clearly, and simply - The business rules must help identify entities, attributes, relationships, and constraints • Identify the source of all business rules, and ensure that each business rule is justified, dated, and signed off by an approving authority © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 46 Data-Modeling Checklist (2 of 6) • Data modeling • Naming conventions: all names should be limited in length (database-dependent size) • Entity names: • • • • Should be nouns that are familiar to business and should be short and meaningful Should document abbreviations, synonyms, and aliases for each entity Should be unique within the model For composite entities, may include a combination of abbreviated names of the entities linked through the composite entity © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 47 Data-Modeling Checklist (3 of 6) • Attribute names: • • • • • • Should be unique within the entity Should use the entity abbreviation as a prefix Should be descriptive of the characteristic Should use suffixes such as _ID, _NUM, or _CODE for the PK attribute Should not be a reserved word Should not contain spaces or special characters such as @, !, or & • Relationship names: • Should be active or passive verbs that clearly indicate the nature of the relationship © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 48 Data-Modeling Checklist (4 of 6) • Entities: • Each entity should represent a single subject • Each entity should represent a set of distinguishable entity instances • All entities should be in 3NF - Any entities below 3NF should be justified • Granularity of the entity instance should be clearly defined • PK should be clearly defined and support the selected data granularity © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 49 Data-Modeling Checklist (5 of 6) • Attributes: • • • • Should be simple and single-valued (atomic data) Should document default values, constraints, synonyms, and aliases Derived attributes should be clearly identified and include source(s) Should not be redundant unless this is required for transaction accuracy, performance, or maintaining a history • Nonkey attributes must be fully dependent on the PK attribute • Relationships: • Should clearly identify relationship participants • Should clearly define participation, connectivity, and document cardinality © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 50 Data-Modeling Checklist (6 of 6) • ER model: • • • • • Should be validated against expected processes: inserts, updates, and deletions Should evaluate where, when, and how to maintain a history Should not contain redundant relationships except as required (see attributes) Should minimize data redundancy to ensure single-place updates Should conform to the minimal data rule: All that is needed is there, and all that is there is needed © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 51 Summary (1 of 2) • Normalization is a technique used to design tables in which data redundancies are minimized • A table is in 1NF when all key attributes are defined and all remaining attributes are dependent on the primary key • A table is in 2NF when it is in 1NF and contains no partial dependencies • A table is in 3NF when it is in 2NF and contains no transitive dependencies • A table that is not in 3NF may be split into new tables until all of the tables meet the 3NF requirements • Normalization is an important part of database design © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 52 Summary (2 of 2) • The larger the number of tables, the more additional I/O operations and processing logic you need to join them • The data-modeling checklist provides a way for the designer to check that the ERD meets a set of minimum requirements © 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. 53