Database Normalization in the Logical Phase PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document explains the concept and importance of database normalization. It describes different normalization forms (1NF, 2NF, and 3NF) and how to implement them, using examples for a Hospital Patient Records System and a University Course Enrollment System.
Full Transcript
Database Normalization in the Logical Phase In the logical phase of database design, normalization is an essential process for structuring data to eliminate redundancy, prevent anomalies, and ensure consistency across the database. Normalization systematically organizes data to reduce duplication a...
Database Normalization in the Logical Phase In the logical phase of database design, normalization is an essential process for structuring data to eliminate redundancy, prevent anomalies, and ensure consistency across the database. Normalization systematically organizes data to reduce duplication and dependency, resulting in efficient, consistent, and easy-to-maintain databases. Importance of Normalization Normalization is important because it: Reduces Redundancy: Breaking down complex tables into simpler, related tables minimizes repeated information, conserving storage space and simplifying data updates. Prevents Data Anomalies: o Insert Anomalies: Difficulty adding new data due to missing associated data (e.g., adding a department without staff). o Update Anomalies: Inconsistencies due to changes in only some instances of duplicated data (e.g., updating a department name in one record but not others). o Delete Anomalies: Loss of valuable information when deleting a record (e.g., deleting an employee record that also removes all department info). Improves Query Efficiency: Smaller, organized tables allow faster queries, especially in large databases. Normalization is achieved in stages, known as normal forms, each addressing specific data organization issues. Below are the steps of normalization, with definitions for each form. While there are additional steps in database normalization—such as Fourth Normal Form (4NF) and Fifth Normal Form (5NF)—which address more complex dependencies, we will focus on learning the first three normal forms (1NF, 2NF, and 3NF) as they cover the essential principles of data structuring for most practical applications. Mastering these three forms provides a strong foundation in database design, ensuring data integrity, consistency, and efficiency. Steps of Normalization 1. First Normal Form (1NF) o Definition: Ensures that each column contains only atomic (indivisible) values and eliminates multi-valued attributes by separating repeating groups into distinct rows. Each row must represent a unique instance with no multi-valued fields. o Objective: To simplify data by making each value in the database atomic, or indivisible, and to remove any repeating groups or arrays within a single field. 2. Second Normal Form (2NF) o Definition: Builds on 1NF by removing partial dependencies, where non-key attributes depend only on part of a composite primary key. In 2NF, every non- key attribute must depend on the entire primary key, not just a portion of it. o Objective: To eliminate partial dependencies, ensuring that every non-key attribute is functionally dependent on the entire primary key, which simplifies the structure of the database. 3. Third Normal Form (3NF) o Definition: Builds on 2NF by removing transitive dependencies, where a non- key attribute depends on another non-key attribute instead of directly on the primary key. o Objective: To ensure that every non-key attribute is fully dependent on the primary key alone, thereby simplifying relationships and avoiding indirect dependencies. Each step in normalization reduces redundancy, enhances data integrity, and results in a more efficient database structure. By adhering to these normal forms, the logical phase of database design achieves a clean, reliable, and scalable database that effectively serves its intended functions. Case 1: Hospital Patient Records System A hospital needs to organize its database to manage information about patients, doctors, diagnoses, and treatments. Initially, the data is unstructured, containing redundancy and multi- valued attributes. Initial Unnormalized Data PatientID PatientName DoctorID DoctorName Diagnosis Treatment TreatmentCost P001 John Doe D101 Dr. Smith Flu Medication A 100 P001 John Doe D101 Dr. Smith Flu Check-up 50 P002 Jane Roe D102 Dr. Johnson Hypertension Medication B 150 P002 Jane Roe D102 Dr. Johnson Hypertension Blood Test 75 P003 Alice King D103 Dr. Clark Diabetes Insulin Injection 200 Step 1: First Normal Form (1NF) To achieve 1NF, we remove repeating groups by separating data into two tables, ensuring atomic values in each cell. 1. Patient Table: PatientID PatientName P001 John Doe P002 Jane Roe P003 Alice King 2. MedicalRecord Table: PatientID DoctorID DoctorName Diagnosis Treatment TreatmentCost P001 D101 Dr. Smith Flu Medication A 100 P001 D101 Dr. Smith Flu Check-up 50 P002 D102 Dr. Johnson Hypertension Medication B 150 P002 D102 Dr. Johnson Hypertension Blood Test 75 P003 D103 Dr. Clark Diabetes Insulin Injection 200 Step 2: Second Normal Form (2NF) In 2NF, we remove partial dependencies by creating a separate Doctor table. 1. Patient Table: PatientID PatientName P001 John Doe P002 Jane Roe P003 Alice King 2. Doctor Table: DoctorID DoctorName D101 Dr. Smith D102 Dr. Johnson D103 Dr. Clark 3. MedicalRecord Table: PatientID DoctorID Diagnosis Treatment TreatmentCost P001 D101 Flu Medication A 100 P001 D101 Flu Check-up 50 P002 D102 Hypertension Medication B 150 P002 D102 Hypertension Blood Test 75 P003 D103 Diabetes Insulin Injection 200 Step 3: Third Normal Form (3NF) In 3NF, we remove transitive dependencies by separating Treatment and TreatmentCost into their own table. 1. Patient Table: PatientID PatientName P001 John Doe P002 Jane Roe P003 Alice King 2. Doctor Table: DoctorID DoctorName D101 Dr. Smith D102 Dr. Johnson D103 Dr. Clark 3. Treatment Table: Treatment TreatmentCost Medication A 100 Check-up 50 Medication B 150 Blood Test 75 Insulin Injection 200 4. MedicalRecord Table: PatientID DoctorID Diagnosis Treatment P001 D101 Flu Medication A PatientID DoctorID Diagnosis Treatment P001 D101 Flu Check-up P002 D102 Hypertension Medication B P002 D102 Hypertension Blood Test P003 D103 Diabetes Insulin Injection Normalized relational Schema: First Normal Form (1NF) 1. Patient: PatientID, PatientName 2. MedicalRecord: PatientID, DoctorID, DoctorName, Diagnosis, Treatment, TreatmentCost Second Normal Form (2NF) 1. Patient: PatientID, PatientName 2. Doctor: DoctorID, DoctorName 3. MedicalRecord: PatientID, DoctorID, Diagnosis, Treatment, TreatmentCost Third Normal Form (3NF) 1. Patient: PatientID, PatientName 2. Doctor: DoctorID, DoctorName 3. Treatment: Treatment, TreatmentCost 4. MedicalRecord: PatientID, DoctorID, Diagnosis, Treatment Case Study: University Course Enrollment System A university needs to organize its database to track information about students, courses, grades, and lecturers. Initially, the data is unstructured, with redundant entries and multi-valued attributes. Initial Unnormalized Data StudentID StudentName CourseID CourseName LecturerID LecturerName Grade 2001 Emma C101 Biology L201 Dr. Green 88 2001 Emma C102 Chemistry L202 Dr. Ray 92 2001 Emma C103 Physics L203 Dr. White 85 2002 Liam C101 Biology L201 Dr. Green 90 StudentID StudentName CourseID CourseName LecturerID LecturerName Grade 2002 Liam C104 Math L204 Dr. Black 95 Normalized relational Schema: Step 1: First Normal Form (1NF) In 1NF, we eliminate repeating groups by creating two tables, ensuring atomic values in each cell. Relational Schema: 1. Student: StudentID, StudentName 2. Enrollment: StudentID, CourseID, CourseName, LecturerID, LecturerName, Grade Step 2: Second Normal Form (2NF) In 2NF, we remove partial dependencies by creating a separate Course table. Relational Schema: 1. Student: StudentID, StudentName 2. Course: CourseID, CourseName, LecturerID, LecturerName 3. Enrollment: StudentID, CourseID, Grade Step 3: Third Normal Form (3NF) In 3NF, we remove transitive dependencies by separating LecturerName into a Lecturer table, as it depends on LecturerID rather than directly on CourseID. Relational Schema: 1. Student: StudentID, StudentName 2. Course: CourseID, CourseName, LecturerID 3. Lecturer: LecturerID, LecturerName 4. Enrollment: StudentID, CourseID, Grade This final 3NF schema ensures all attributes depend solely on the primary keys without redundancy or transitive dependencies, creating a clear and efficient structure for the university’s course enrollment database. Case Study: Employee Project Management System A company needs to organize its database to manage information about employees, projects, and assignments. Initially, the data is unstructured, containing redundancy and multi-valued attributes. Initial Unnormalized Data EmployeeID EmployeeName ProjectID ProjectName Role HoursWorked E001 John Smith P101 Website Redesign Developer 40 E001 John Smith P102 Mobile App Tester 20 E002 Alice Brown P101 Website Redesign Designer 35 E002 Alice Brown P103 Data Analysis Analyst 30 E003 Bob Johnson P102 Mobile App Developer 25 Normalized relational Schema: Step 1: First Normal Form (1NF) To achieve 1NF, we remove repeating groups by creating two tables, ensuring atomic values in each cell. Relational Schema: 1. Employee: EmployeeID, EmployeeName 2. Assignment: EmployeeID, ProjectID, ProjectName, Role, HoursWorked Step 2: Second Normal Form (2NF) In 2NF, we remove partial dependencies by creating a separate Project table, ensuring all non-key attributes depend on the entire primary key. Relational Schema: 1. Employee: EmployeeID, EmployeeName 2. Project: ProjectID, ProjectName 3. Assignment: EmployeeID, ProjectID, Role, HoursWorked Step 3: Third Normal Form (3NF) In 3NF, we remove transitive dependencies by ensuring non-key attributes depend only on the primary key. In this case, we don’t have any further transitive dependencies to address, so our schema is already in 3NF after the 2NF adjustments. Final Relational Schema: 1. Employee: EmployeeID, EmployeeName 2. Project: ProjectID, ProjectName 3. Assignment: EmployeeID, ProjectID, Role, HoursWorked Summary This final schema for the Employee Project Management System in 3NF ensures that all attributes are dependent solely on the primary keys, without redundancy or transitive dependencies. This structure supports efficient data storage and query performance for the company’s project management needs.