Summary

This document explains normalisation terminology and database concepts like entity, database schema, and repeating groups. It discusses practical situations like normalizing data stored within tables.

Full Transcript

NORMALISATION TERMINOLOGY Normalisation Normalisation is a process that organizes the data in a way that each piece of information is stored only once. This makes managing the data easier and helps keep it accurate and consistent. Database A database is...

NORMALISATION TERMINOLOGY Normalisation Normalisation is a process that organizes the data in a way that each piece of information is stored only once. This makes managing the data easier and helps keep it accurate and consistent. Database A database is defined as an organised collection of data that is stored electronically, allowing for efficient access, management, and updating of that data. Table A table is a structured collection of related data organised in a two- dimensional format, consisting of rows and columns. Record A record refers to a single entry within a table, represented by a row. Fields Fields are synonymous with columns in a table. Each field represents a specific attribute or property of the data being stored. For instance, in a student table, fields might include Student ID, Name, Age, and Course. The field names are used to identify and access the data contained within each record Relation In relational databases, a relation is a math idea that means the same thing as a table. Although people often use these words to mean the same thing, a relation is actually a group of rows where each row is different and doesn’t have a specific order. This means that in a true relation, you can’t have any duplicate records, unlike tables that might have some rows that are the same. Primary key A primary key is a special column in a database table that helps identify each record uniquely. This means no two records can have the same value in this column. For example, in a student database, the student ID could be the primary key because each student has a different ID. A table can only have one primary key, and it cannot have empty (null) values. Super key A super key is a set of one or more columns that can uniquely identify a record in a table. This means that any combination of these columns can help find a specific record. For example, if you have both the student ID and email address in a table, together they form a super key because they can identify students uniquely. Candidate key A candidate key is a specific type of super key that is minimal, meaning it has no extra columns that aren’t needed to identify records. In other words, if you remove any column from it, it will no longer be able to uniquely identify records. For example, if the student ID alone can identify students, then it is a candidate key. NORMALISATION TERMINOLOGY Composite key A composite key is a type of primary key made up of two or more columns. It is used when one column alone cannot uniquely identify records. For instance, if you have a table for class enrollments where neither student ID nor class ID alone can identify a record, you can combine both to create a composite key that does. Entity Same as a table Entity instance Same as a record Property Same as a field Database schema A database schema is like a blueprint that shows how data is organised in a database. It describes the tables, which are like boxes that hold information, and the fields, which are the specific pieces of information within those tables. The schema also explains how different tables are connected to each other and what rules apply to the data. Repeating group A repeating group in database normalisation refers to a situation where a table has multiple fields that store the same type of information. This can happen when one record includes several similar pieces of data that should actually be separate. For example, if a table for students includes multiple columns for phone numbers, like "Phone 1" and "Phone 2," this creates a repeating group. Repeating groups can cause problems because they make the data harder to manage and can lead to mistakes, like having different phone numbers for the same student in different rows. To fix this, the repeating information should be moved to a new table, which helps keep the database organised and makes it easier to find and update information. Data redundancy Data redundancy in a database happens when the same information is stored in more than one place. This can create problems because it uses up extra space and can make it confusing to keep everything up to date. For example, if a student's address is saved in multiple tables, and the address changes, you have to update it in every spot where it's stored. If you forget one, it can lead to mistakes. NORMALISATION TERMINOLOGY Update anomaly An update anomaly occurs in a database when you change some information but forget to update it everywhere it appears. This can lead to confusion and mistakes because the same piece of information might be different in different places. For example, imagine a table that lists students and their addresses. If a student moves and you only update their address in one row but not in others, there will be two different addresses for the same student. This inconsistency can cause problems when trying to find the correct information. Insert anomaly An insert anomaly happens in a database when you cannot add new information because of certain rules or requirements in the table. This usually occurs when the table is not organized properly. For example, imagine you have a table that lists teachers and their departments. If you want to add a new department, like "Math," but there isn't a teacher assigned to that department yet, you might not be able to add the "Math" department at all. This is because the table requires both a department and a teacher's information to be entered together. Insert anomalies can cause problems because they make it hard to add new data without having other related information ready. Delete anomaly A delete anomaly occurs in a database when deleting a record also removes important information that you want to keep. This can happen when related data is stored together in the same table, and removing one piece of information accidentally deletes something else that is needed. For example, imagine you have a table that lists students and their clubs. If you decide to delete the row for a club that has no members left, you might also delete all the details of the students who were part of that club. This means you lose important information about those students just because the club was removed. 1st Normal form Atomic Values: Each cell in a table must contain only one value. This means you cannot have lists or multiple values in a single cell. No Repeating Groups: A table should not have columns that repeat the same kind of information. Each piece of information should be stored in its own column. Unique Column Names: Every column must have a unique name to avoid confusion. Unique Identifier: Each row must have a unique identifier, known as a primary key, to distinguish it from other rows. NORMALISATION TERMINOLOGY 2nd Normal form Must Be in 1NF: The table must already be in First Normal Form, meaning each cell contains only one value, and there are no repeating groups. No Partial Dependencies: All non-key attributes (columns that are not part of the primary key) must depend on the entire primary key, not just part of it. This means if a primary key is made up of more than one column, every other column must rely on all parts of that key. 3rd Normal form Must Be in 2NF: The table must already be in Second Normal Form, which means it has no partial dependencies. No Transitive Dependencies: Non-key attributes (columns that are not part of the primary key) should not depend on other non-key attributes. This means that if one non-key attribute depends on another non-key attribute, it creates a transitive dependency, which should be eliminated. Partial Dependency Partial dependency is a concept in database design that occurs when a non-key attribute (a piece of information that is not part of the primary key) depends on only part of a composite primary key (a primary key made up of more than one column). This can lead to problems like data redundancy and inconsistency. Transitive Transitive dependency is a concept in database design that can cause Dependency problems when organising data. It happens when one piece of information indirectly depends on another piece of information through a third piece. This can lead to unnecessary duplication and make the database harder to manage. Derived Data Derived data is a type of information in a database that is calculated or created from other existing data rather than being stored directly. This means that instead of keeping the derived data in a table, it can be generated using formulas or calculations based on the original data. Duplicate Data Duplicate data in a database refers to the same piece of information being stored more than once. This can happen when the same values appear in multiple rows or tables. Having duplicate data can lead to several problems, making it important to organise and manage data correctly through a process called normalisation. Atomic Data Atomic data is an important concept in database normalisation. It refers to the idea that each piece of information in a database should be stored as a single, indivisible value. This means that every cell in a table should contain only one piece of data, making it easier to manage and analyse. NORMALISATION TERMINOLOGY Referential Integrity Referential integrity is a crucial concept in database design that helps ensure the accuracy and consistency of data across different tables. It involves rules that maintain the relationships between tables, making sure that data is linked correctly and that there are no orphaned records (records that reference non-existent data).

Use Quizgecko on...
Browser
Browser