Database Normalization: 1NF, 2NF, 3NF

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson
Download our mobile app to listen on the go
Get App

Questions and Answers

Which statement best describes the process of normalization in database design?

  • A method of combining all data into a single table to simplify data access
  • A technique for encrypting sensitive data within a database
  • A process of increasing data redundancy to improve query performance
  • A process for evaluating and correcting table structures to minimize data redundancy and anomalies (correct)

Denormalization consistently results in a higher normal form.

False (B)

What is the primary goal of normalization in database design?

to eliminate data anomalies

A table is in 2NF if it is in 1NF and has no ______ dependencies.

<p>partial</p> Signup and view all the answers

Match the following normal forms with their descriptions:

<p>1NF = Table format, no repeating groups, and a primary key is identified 2NF = 1NF plus no partial dependencies 3NF = 2NF plus no transitive dependencies</p> Signup and view all the answers

In which normal form does every determinant have to be a candidate key?

<p>BCNF (Boyce-Codd Normal Form) (D)</p> Signup and view all the answers

Normalization should not be part of the database design process.

<p>False (B)</p> Signup and view all the answers

What is the term for an attribute that cannot be further subdivided into meaningful components?

<p>atomic attribute</p> Signup and view all the answers

The level of detail represented by the values stored in a table's row is known as ______.

<p>granularity</p> Signup and view all the answers

Match the following concepts with their descriptions:

<p>Partial Dependency = Attribute is dependent on only part of the primary key Transitive Dependency = Attribute is dependent on another attribute that is not part of the primary key Multivalued Dependency = Occurs when one key determines multiple values of two other attributes, and those attributes are independent of each other</p> Signup and view all the answers

Which of the following is a characteristic of unnormalized data?

<p>Existence of several multivalued data elements (B)</p> Signup and view all the answers

Higher levels of normalization are always more desirable for every database design.

<p>False (B)</p> Signup and view all the answers

What is the term for the process of increasing speed in a database, typically by producing a lower normal form?

<p>denormalization</p> Signup and view all the answers

According to good DB design, the optimal relational DB is in ______ form.

<p>3NF</p> Signup and view all the answers

Match the situations to when normalization would be used

<p>After design completion = Used to analyze relationships among attributes in each entity to determine if the structure can be improved Asked to modify existing data structures = Used to improve the existing data structure &amp; create an appropriate DB design</p> Signup and view all the answers

Which of the following best describes a 'repeating group' in the context of normalization?

<p>A group of multiple entries of the same or multiple types for any single key attribute occurence. (B)</p> Signup and view all the answers

Fifth Normal Form (5NF) can have further lossless decompositions

<p>False (B)</p> Signup and view all the answers

What term describes occurs when one key determines multiple values of 2 other attributes & those attributes are independent of each other?

<p>Multivalve dependency</p> Signup and view all the answers

A surrogate key is a(n) ______ PK w/ the purpose of simplifying assignment of Pks to table

<p>artificial</p> Signup and view all the answers

Match the following Normal Form Rules:

<p>4NF Rules = All attributes must be dependent on PK, independent of each other. No row may contain 2 or more multivalued facts about an entity.</p> Signup and view all the answers

Flashcards

Normalization

Process for evaluating and correcting table structures to minimize data redundancies and anomalies, assigning attributes to entities.

Data Redundancy Reduction

A state where attributes are assigned to entities so that data redundancies are reduced or eliminated.

Denormalization

A design approach that considers end-users' demand for fast performance by producing a lower normal form.

Main Goal of Normalization

Eliminating data anomalies and unnecessary redundancies by using functional dependency to identify attribute relationships.

Signup and view all the flashcards

Unnormalized Data

Raw data in its original state, characterized by the existence of several multivalued data elements.

Signup and view all the flashcards

Table Characteristics

Each relation (table) represents a single subject, each row/column intersection contains a single value, and data is updated in one place.

Signup and view all the flashcards

1NF (First Normal Form)

Table format with no repeating groups, and a primary key (PK) is identified.

Signup and view all the flashcards

2NF (Second Normal Form)

A table in 1NF with no partial dependencies.

Signup and view all the flashcards

3NF (Third Normal Form)

Table is in 2NF and contains no transitive dependencies.

Signup and view all the flashcards

Partial Dependency

Functional dependence in which the determinant is only part of the primary key (PK).

Signup and view all the flashcards

Transitive Dependency

Attribute is dependent on another attribute that is not part of the primary key.

Signup and view all the flashcards

Repeating Group

A group of multiple entries of the same or multiple types that can exist for any single key attribute occurrence.

Signup and view all the flashcards

Update Anomalies

Occurs when modifying data requires changing multiple rows, leading to potential inconsistencies.

Signup and view all the flashcards

Insertion Anomalies

Occurs when adding new data requires adding data to multiple tables, potentially leading to inconsistencies.

Signup and view all the flashcards

Deletion Anomalies

Occurs when deleting data from one table unintentionally deletes related data in another table.

Signup and view all the flashcards

Composite Primary Key

A primary key made up of more than one attribute.

Signup and view all the flashcards

Determinant

An attribute whose value determines other values within a row.

Signup and view all the flashcards

Surrogate Key

Artificial primary key used to simplify the assignment of primary keys to a table.

Signup and view all the flashcards

Atomic Attribute

An attribute that cannot be further subdivided to produce meaningful components.

Signup and view all the flashcards

Boyce-Codd Normal Form (BCNF)

A special type of 3NF where every determinant is a candidate key.

Signup and view all the flashcards

Study Notes

Normalization Overview

  • Normalization evaluates and corrects table structures to minimize data redundancies and anomalies.
  • It assigns attributes to entities, reducing or eliminating data redundancies.
  • The process is based on determination and functional dependency.
  • There are three stages of normal forms, with 3NF being the best, but that doesn't mean always most desirable

Normal Forms Stages:

  • 1NF (First Normal Form): Considered the worst.
  • 2NF (Second Normal Form).
  • 3NF (Third Normal Form): Considered the best.
  • Higher normal forms require more relational join operations for specified output and resources.
  • Design success balances end-user demands and fast performance.
  • Denormalization produces lower normal forms, usually to increase speed, but can introduce data anomalies and redundancy (e.g., 3NF to 2NF).

Need for Normalization

  • Normalization uses entity-relationship modeling.
  • Normalization is applied in two main situations:
  • After design completion, database designers use normalization to analyze relationships among attributes in each entity to determine if a structure can be improved.
  • Database designers use normalization to modify existing data structures to improve the database design and create an appropriate structure.
  • The process is the same whether designing new or existing structures.
  • It is rare to design a completely new database using just normalization.

Goals and Steps of Normalization

  • The main goal is to eliminate data anomalies by eliminating unnecessary or unwanted data redundancies.
  • Functional dependency uses to identify which attribute (or set) determines other attributes exists.
  • The steps include:
  • Identifying business roles.
  • Identifying and defining business and data constraints.
  • Defining functional dependencies.
  • Identifying entities, attributes, and relationships.
  • Eliminating multivalued attributes.

Unnormalized Data and Table Characteristics

  • Unnormalized data is raw data in its original state.
  • It exists with several multivalued data elements.
  • Table characteristics:
  • Each relation/table represents a single subject.
  • Each row/column intersection contains a single value, not a group of values.
  • No data item is unnecessarily stored in more than one table.
  • Ensures data is updated in one place.
  • Non-prime attributes depend on the primary key (PK).
  • The entire key and nothing but the PK. Ensures that the data is uniquely identifiable by PK.
  • Prevents insertion, update, or deletion anomalies.

Normal Forms Explained

  • 1NF: Table format with no repeating groups, and primary keys are identified.
  • 2NF: 1NF plus no partial dependencies.
  • 3NF: 2NF plus no transitive dependencies.
  • Norms are presented from the perspective of the candidate key, assuming each table has one candidate key that is the primary key.
  • A data modeler's objective is ensuring tables are in at least 3NF.

Dependencies: Partial and Transitive

  • Norm starts by identifying dependencies and progressively breaking up the table into a new set of relations based on dependencies.
  • Partial Dependencies: Functionality where the determinant is only part of the primary key; the attribute can identify the value of attribute C
  • Straightforward and easy to identify.
  • Transitive Dependencies: When the attribute is dependent on another attribute that's not part of the primary key where X determines the value of Z via Y.
  • More difficult to identify, and they often occur among nonprime attributes.

###First Normal Form (1NF)

  • A repeating group is a group of multiple entries of the same or multiple types in any single key attribute.
  • Repeating must be eliminated by ensuring each row defines a single entity instance.
  • Each row-column intersection has only a single value.
  • Identifying the normal forms lets you know where you are in the normalization process.
  • A three-step removal procedure:
  • Eliminate repeating groups by changing the table's focus from project to assignment and creating separate rows.
  • Identify the primary key.
  • Identify all dependencies.
  • Dependency diagrams, showing all data dependencies, a birdseye view of the relationship between entities

Anomalies

  • Update anomalies: Modifying
  • Insertion anomalies: Adding
  • Deletion Anomalies

Conversion to Second Normal Form (2NF)

  • A table is in 2NF if it is in 1NF and has no partial dependencies, meaning no attribute depends on only a portion of the primary key.
  • Happens only when INF has a composite pk. if INF has single-attribute pk its automatically in 2NF
  • This occurs only when 1NF has a composite primary key.
  • If the 1NF has a single-attribute primary key, it's automatically in 2NF.
  • Steps:
  • Make a new table and eliminate partial dependencies where each component of primary key acts as a determinant.
  • The new table contains a copy of the primary key component and this primary key component remains foreign keys in new table

Conversion to Third Normal Form (3NF)

  • dependent in partial dep. are removed from OG table t placed in new table w/ dep. determinant
  • foreign Keys created
  • Partial dependency can exist only when tables Pk has several attributes, a table w/ PK of a single attribute is automatically INF
  • A table is in 3NF when it is in 2NF and has no transitive dependencies.
  • Steps:
  • Make a new table to eliminate transitive dependencies.
  • A determinant is any attribute whose value within a row determines other values.
  • For every transitive dependency, write a copy of its determinant in a new table.
  • Reassign corresponding dependent attributes.
  • Place dependent attributes in new tables with their determinants and remove them from their original tables.
  • Draw a new dependency diagram to show all defined tables.
  • Name the table to reflect its contents and functions.

Improving Design

  • Improving the database's ability to provide information and enhancing its operational characteristics.
  • Normalization itself doesn't guarantee good designs; it helps eliminate data redundancies.
  • Steps:
  • Minimalize data entry errors
  • Evaluates naming conventions
  • Refines attribute atomicity

Errors and Keys

  • Surrogate keys are artificial primary keys used to simplify the assignment of primary keys to tables.
  • These keys are usually numeric, generated automatically by DBMS, are free of semantic content, are unique, and are hidden from the end user.
  • When the design allows for descriptive entries in determinant attributes, it's easy to make data-entry errors that lead to referential integrity violations, so add an attribute as a unique identifier.

Atomic Attributes

  • Atomic attributes cannot be further subdivided to produce meaningful components (e.g., last name).
  • Parts of a transaction treated as a single, invisible, logical unit (Opposite of atomic),
    • Cannot be divided into smaller units but needs single values.
  • Improving atomicity gains querying flexibility.
  • Designers prefer single-valued attributes based on business rules and processing requirements.
  • Identify new attributes and relationships.
  • Refine primary key as required for data granularity and Designers ensure a suitable primary (Evaluate need for surrogate Key)

Further Refinement of Normalization

  • Storing attribute tables derive use by making it easy to right app software to produce desired results
  • Evaluate how to properly implement using derived attributes
  • Granularity is the level of detail represented by the values stored in a table's row.
  • Data stored at the lowest level of granularity is atomic data (e.g., daily, weekly).
  • Designers also ensure candidate key uniqueness.
  • Designers must identify granularity levels and evaluate the need for a surrogate key.

Boyce-Codd Normal Form (BCNF)

  • A special type of 3NF where every determinant is a candidate key is a candidate Key
  • A table has to have more than Key, and 3NF and BCNF are equal
  • BCNF is violated only when a table has more than one candidate key is not equal

Additional Normal Forms

  • Fourth Normal Form (4NF): When in 3NF with no multivalued dependencies.
  • Multivalue dependency occurs when one key determines multiple values of two other attributes. Those attributes are independent of each other and if tables 1 and 2 are implemented separately, they are likely to contain hulls.
  • Solution: Create a new table for components of multi-valued dependency.
  • All attributes must depend on the primary key and be independent of each other.
  • No row may contain two or more multivalued facts about an entity.

Fifth Normal Form (5NF)

  • When in 4NF, cannot have further lossless decompositions.
  • Project Join NF (PJNF) means that a table cannot be decomposed anymore without losing data or creating incorrect information.
  • Lossless decomposition: When, even if it decomposes it will recreate the original table after joining together again.
    • It contains no missing data or new erroneous data.
    • The relationship between decomposed tables is represented by a relational diagram.
  • While higher forms can provide value, the value is limited by the additional processing to work with the data for normalization.

Normalization and DB Design

  • The normalization should be part of the design process.
  • ERD provides big picture, micro view, of orgs data requirements and operations
  • Which is also an iterative process, Identifying entities, attributes, & relationships
  • Normalization in micro view of entities in ERD
  • Iterative and incremental normalization process

Denormalization and Normalization Purity

  • Denormalization is used to increase processing speeds for Optimal relational DB is in 3NF.
  • The database design considers processing and reporting requirements, and processing speed.
  • If enough storage space is available, designers have limited choices:
    • Storing data in a permanently denormalized table (not recommended) only if performance is an issue.
    • Temporary denormalized table from permanent normalized table to make report as disappears.
    • Only implements viable processing options, if performance is not an issue.
  • Normalization purity is difficult to sustain in a modern database environment.

Warehouse Data defects

  • Data warehouses routinely use 2NF.
  • Unnormalized data defects include:
  • Data updates are less efficient because programs must deal with larger tables.
  • Indexing is more cumbersome and not practical.
  • Unnormalized tables yield no simple strategies for creating virtual tables (called views).

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Use Quizgecko on...
Browser
Browser