Database Anomalies and normalization.pdf
Document Details
Uploaded by Deleted User
Tags
Full Transcript
Part 1 Database Anomalies and Normalization Objectives Define database anomaly, and its frequency of occurrence. Identify different types of data anomalies. Discss how to address different types of anomalies. Discuss the fundamental types of Database Normalization. What is a database anom...
Part 1 Database Anomalies and Normalization Objectives Define database anomaly, and its frequency of occurrence. Identify different types of data anomalies. Discss how to address different types of anomalies. Discuss the fundamental types of Database Normalization. What is a database anomaly? Anomaly means inconsistency in the pattern from the normal form. In a database management system (DBMS), anomaly means an inconsistency that occurs in the relational table during the operations performed on the relational table. Anomalies in the relational model refer to inconsistencies or errors that can arise when working with relational databases, specifically in data insertion, deletion, and modification. What are the causes of database anomalies? Database anomalies are the faults in the database caused due to poor management of storing everything in the database. If a table is constructed poorly, then there is a chance of database anomaly. Due to database anomalies, the integrity of the database suffers. What are the causes of database anomalies? The other reason for the database anomalies is that all the data is stored in a single table. For example, if there is a lot of redundant data in our database, then DBMS anomalies can occur. Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. Anomalies in Relational Model 3 Types of Anomalies Insertion Anomalies Deletion Anomalies Update Anomalies Insertion Anomalies Insertion anomalies happen when vital data is inserted into the database, which is not possible because other data is not already there. For example, if a system is designed to require that a customer be on file before a sale can be made to that customer, but you cannot add a customer until they have bought something, then you have an insert anomaly. Insertion Anomaly Deletion Anomalies This anomaly happens when the deletion of unwanted information causes desired information to be deleted as well. For example, if a single database record contains information about a particular product along with information about a salesperson for the company and the salesperson quits, then information about the product is deleted along with the salesperson's information. Sample Table Deletion Anomaly- Sample Data Deletion Anomaly Update Anomalies Update Anomalies happen when the person charged with keeping all the records current and accurate is asked, for example, to change an employee’s title due to a promotion. If the data is stored redundantly in the same table, and the person misses any of them, then multiple titles will be associated with the employee. The end-user has no way of knowing which is the correct title. Update Anomaly Sample Data Update Anomaly How do we address database anomalies? Generally, database anomalies are removed by the process of NORMALIZATION which is performed by splitting/joining of tables. What is Database Normalization? Normalization is the process of organizing data in a database. It includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency. Example If Row 2 is updated, row 4 is also updated as they have the same values. This will make the data of row 4 incorrect. Remember that we are only updating row 2 and not row 4. Addressing Anomalies through Normalization Addressing Anomalies through Normalization Add New Department without adding new employees. Is insert anomaly addressed? Addressing Anomalies through Normalization Deleting Employees: If Employee smg leaves the company, will the department where is smg is assigned will also be deleted? Addressing Anomalies through Normalization Update anomaly: If department head pqr is changed to mpr, only records of dept_id 7 will be affected. The importance of normalization If a table has data redundancy and is not properly normalized, then it will be difficult to handle and update the database, without facing data loss. It will also eat up extra memory space and Insertion, Update, and Deletion Anomalies are very frequent if the database is not normalized. Levels of Normalization First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF) First Normal Form What is First Normal Form? A relation is in first normal form if every attribute in that relation is single-valued attribute. ✓No Composite attribute ✓No Multi-valued Attribute First Normal Form First Normal Form Is this an optimal solution? First Normal Form Is this an optimal solution compared to the first solution? First Normal Form Note: A database design is considered as a bad design if it is not even in the First Normal Form (1NF). First Normal Form What is the significance of 1NF in database design? The implementation of 1NF is essential for the preservation of data integrity and productivity. It eliminates the need for redundant data and ensures that the data is structured in a manner that facilitates simple queries and operations. Second Normal Form What is 2nd Normal Form? The second Normal Form (2NF) is based on fully functional dependency. The second Normal Form applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes. A relation with a single-attribute primary key is automatically in at least 2NF. A relation that is not in 2NF may suffer from the update anomalies. What is 2nd Normal Form? What is a Functional Dependency? A functional dependency is a relationship between two sets of attributes in a database, where one set (the determinant) determines the values of the other set (the dependent). For example, in a database of employees, the employee ID number (determinant) would determine the employee’s name, address, and other personal information (dependent). This means that, given an employee ID number, we can determine the corresponding employee’s name and other personal information, but not vice versa. What is a Functional Dependency? Functional dependencies can also be represented using mathematical notation. Employee ID → Employee Name, Address, etc. It’s important to note that functional dependencies only apply to the individual tuples in the table, and not to the table as a whole. Functional Dependency Functional Dependency Functional Dependency Functional Dependency Functional Dependency Functional Dependency 4 cases of functional dependency: Important points on 1 and 2 Normal Form st nd 1NF is centered on the removal of repeating groups, and 2NF is associated with the issue of partial dependencies. With the aid of these normal forms, the databases yield high efficiency and can withstand peculiarities viewed as anomalies, thereby helping in the proper organization of databases. Partial Dependency Partial Dependency Second Normal Form Partial Dependency Convert to 2 NF nd Business Rule: In a university database, each course can be taught by multiple professors, and a professor can teach multiple courses. The Course table stores the course code, professor ID, and semester. Why can Professor ID not be a foreign key in the course table? Why can Professor ID not be a foreign key in the course table? Making Professor_ID a foreign key in the Course table violates the Second Normal Form (2NF) due to partial dependency. Why can Professor ID not be a foreign key in the course table? How it Violates 2NF: In the Course table, if you include Professor_ID as a foreign key, you create a partial dependency: Professor_ID depends only on Course_Code, not on the full composite key (Course_Code, Semester). Therefore, attributes like Professor_Name (if added) would also depend only on Professor_ID (a part of the key), not on the full key. This violates the principle that all non-prime attributes should depend on the entire primary key. Why can Professor ID not be a foreign key in the course table? Correct Approach (Junction Table): Instead of storing Professor_ID in the Course table, you separate it into a Junction Table (Teaching Assignment) where Course_Code and Professor_ID form a composite primary key. This way: Professor_ID depends on the entire key in the Teaching Assignment Table (which is a composite of Course_Code and Professor_ID). No partial dependency exists, thus preserving 2NF. The Role of the Junction Table (Bridge) A. Represents a Many-to-Many Relationship: A single course can be taught by multiple professors. A single professor can teach multiple courses. The Teaching Assignment table acts as the intermediary that connects these two entities, allowing for this many-to-many relationship. The Role of the Junction Table (Bridge) B. Stores Unique Combinations: The table holds unique combinations of Course_Code and Professor_ID. This ensures that each professor-course assignment is recorded only once. The Role of the Junction Table (Bridge) C. Composite Primary Key: The primary key of the table is the combination of Course_Code and Professor_ID. No professor can be assigned the same course twice, preventing duplicates. The Role of the Junction Table (Bridge) D. Maintains Data Integrity: Course_Code is a foreign key that references the Course table, ensuring the course exists. Professor_ID is a foreign key that references the Professor table, ensuring the professor exists.