Database Normalization

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson
Download our mobile app to listen on the go
Get App

Questions and Answers

Which of these anomalies are associated with database design?

  • Insertion
  • Deletion
  • Modification
  • All of the above (correct)

Insertion anomalies occur when certain information cannot be stored unless other, unrelated information is also stored.

True (A)

What type of anomaly occurs when deleting a record results in the loss of other related but necessary information?

deletion anomaly

A database suffers from a(n) __________ anomaly when the same attribute has different values across multiple tuples.

<p>modification</p> Signup and view all the answers

Match each database anomaly type with its corresponding description:

<p>Insertion Anomaly = Inability to add certain data without adding other, unrelated data. Deletion Anomaly = Loss of related information when deleting a record. Modification Anomaly = Inconsistent data across multiple tuples.</p> Signup and view all the answers

What design objective do the informal guidelines for relation schema primarily aim to achieve?

<p>All of the above (D)</p> Signup and view all the answers

Combining attributes from multiple entity types into a single relation always simplifies the interpretation of the database.

<p>False (B)</p> Signup and view all the answers

What kind of problem do NULL values in tuples introduce when performing COUNT or SUM aggregate operations?

<p>unpredictable results</p> Signup and view all the answers

A relation schema should be designed to avoid generating __________ tuples, which are extra rows appearing after a join that do not accurately represent the data.

<p>spurious</p> Signup and view all the answers

Match the informal guidelines for relation schema design with their respective aims:

<p>Clear Semantics = Ensuring attributes are easily interpretable. Reduced Redundancy = Minimizing insertion, deletion, and update anomalies. Minimized NULL Values = Reducing space waste and complexity in aggregate operations. Avoid Spurious Tuples = Preventing the generation of incorrect data after joins.</p> Signup and view all the answers

What does a functional dependency X → Y mean?

<p>The value of Y is determined by the value of X. (A)</p> Signup and view all the answers

If X→Y, it automatically implies that Y→X.

<p>False (B)</p> Signup and view all the answers

In the context of functional dependencies, what is the term for the set of inference rules?

<p>Armstrong's Axioms</p> Signup and view all the answers

According to the reflexivity rule in Armstrong's axioms, if B is a subset of A, then A→__________.

<p>b</p> Signup and view all the answers

Match the Armstrong's Axioms rules with their descriptions:

<p>Reflexivity Rule = If B is a subset of A, then A→B Augmentation Rule = If A→B holds, then AY→BY also holds. Transitivity Rule = If A→B and B→C holds, then A→C also holds.</p> Signup and view all the answers

What is the term used to describe the set of all functional dependencies that can be inferred from a given set of functional dependencies, F?

<p>Closure of F (B)</p> Signup and view all the answers

Functional dependency equivalence means that two sets of functional dependencies can infer the exact same dependencies.

<p>True (A)</p> Signup and view all the answers

What term describes a set of functional dependencies that covers another set if every dependency in the second set can be inferred from the first?

<p>cover</p> Signup and view all the answers

A minimal __________ of functional dependencies is a simplified, non-redundant set that logically implies all dependencies in the original set.

<p>cover</p> Signup and view all the answers

Match the concept with its definition:

<p>Functional Dependency = A constraint where the values of one attribute determine the values of another. Closure of F = The set of all functional dependencies inferred from a given set F. Minimal Cover = A minimal, non-redundant set of functional dependencies equivalent to the original.</p> Signup and view all the answers

What is the primary goal of normalization in database design?

<p>All of the above (D)</p> Signup and view all the answers

Normalization strictly focuses on optimizing query performance, and not on data integrity.

<p>False (B)</p> Signup and view all the answers

Attributes participating in the key are termed what?

<p>prime attributes</p> Signup and view all the answers

The domain of an attribute must only include single valued attributes in __________ normal form.

<p>first</p> Signup and view all the answers

Match the normal form name with the related attribute to it:

<p>First Normal Form = Single valued attribute Second Normal Form = Every non-prime attribute of R is fully functionally dependent on the primary key of R. Third Normal Form = Transitive dependency Boyce-Codd Normal Form = Every non-trivial FD X→Y that holds in R, then X is a super key of R</p> Signup and view all the answers

What is the characteristic of a table in First Normal Form (1NF)?

<p>Both A and B (D)</p> Signup and view all the answers

A table that is in 1NF is automatically in 2NF.

<p>False (B)</p> Signup and view all the answers

What condition related to non-prime attributes must be met for a table to be in Second Normal Form (2NF)?

<p>fully functionally dependent</p> Signup and view all the answers

A table satisfies the Third Normal Form (3NF) if it is in 2NF and has no __________ dependency.

<p>transitive</p> Signup and view all the answers

Match each normal form with its characteristic criterion:

<p>1NF = Atomic values in each attribute. 2NF = No partial dependencies. 3NF = No transitive dependencies.</p> Signup and view all the answers

What is the key requirement for a relation schema to be in Boyce-Codd Normal Form (BCNF)?

<p>Both A and B (A)</p> Signup and view all the answers

Any relation that is in BCNF is also automatically in 3NF.

<p>True (A)</p> Signup and view all the answers

BCNF is violated when a non-trivial functional dependency exists where what condition isn't met?

<p>superkey</p> Signup and view all the answers

BCNF is also referred to as __________ Normal Form.

<p>3.5</p> Signup and view all the answers

Match the dependencies with their definitions:

<p>Candidate Key = A minimal set of attributes that uniquely identifies each tuple in a relation. Primary Key = A candidate key chosen as the principal means of identifying tuples. Foreign Key = An attribute in one that refers to a primary key in another relation.</p> Signup and view all the answers

What principle does a lossless-join decomposition uphold?

<p>Both B and C (A)</p> Signup and view all the answers

A dependency-preserving decomposition ensures that each functional dependency in the original relation can be directly enforced in one of the resulting relations.

<p>False (B)</p> Signup and view all the answers

What is the goal of relational decomposition w.r.t normal forms?

<p>decompose a relation scheme in order to achieve higher normal forms</p> Signup and view all the answers

The attribute __________ condition requires that each attribute in the original relation must appear in at least one relation schema in the decomposition.

<p>preservation</p> Signup and view all the answers

Match database decomposition with the correct descriptions:

<p>Attribute Preservation = Each attribute in the original relation appears in at least one resulting relation. Dependency Preservation = Functional dependencies can be directly enforced or inferred. Lossless Join = The natural join recovers the original relation.</p> Signup and view all the answers

Flashcards

Insertion Anomalies

Anomalies that occur when inserting data into a database table.

Deletion Anomalies

Anomalies that occur when deleting data from a database table results in unintended data loss.

Modification Anomalies

Anomalies that occur when updating data in a database table leads to inconsistencies.

Informal Design Guidelines

A set of guidelines to determine the quality of a relation schema.

Signup and view all the flashcards

Clear Semantics

Ensuring the intended meaning of attributes is clear within the database schema.

Signup and view all the flashcards

Data Redundancy

Condition in which the same data is stored multiple times within a database.

Signup and view all the flashcards

Schema Design Guideline

Design a schema to avoid insertion, deletion, and update anomalies.

Signup and view all the flashcards

NULL Values

Values in a database indicating absence of data.

Signup and view all the flashcards

Spurious Tuples

Rows created when joining tables, that do not accurately represent relationships.

Signup and view all the flashcards

Functional Dependency (FD)

A constraint between two sets of attributes in a database.

Signup and view all the flashcards

Armstrong's Axioms

A set of axioms used to test logical implications of functional dependencies.

Signup and view all the flashcards

Reflexivity Rule

If A is a set of attributes and B is a subset of A, then A determines B.

Signup and view all the flashcards

Augmentation Rule

If A determines B, then adding an attribute to both sides still holds.

Signup and view all the flashcards

Transitivity Rule

If A determines B and B determines C, then A determines C.

Signup and view all the flashcards

Closure of Functional Dependencies

The set of all functional dependencies that can be inferred from a given set.

Signup and view all the flashcards

Cover of FDs

A set of dependencies that can infer all dependencies in another set.

Signup and view all the flashcards

Equivalence of FDs

Two sets of functional dependencies where each can infer all dependencies of the other.

Signup and view all the flashcards

Minimal Cover

A minimal set of functional dependencies that is equivalent to the original set.

Signup and view all the flashcards

Normalization

Decomposing relations to minimize redundancy & anomalies.

Signup and view all the flashcards

First Normal Form (1NF)

Multivalued attributes are not allowed

Signup and view all the flashcards

Second Normal Form (2NF)

Fully functionally dependent on the primary key

Signup and view all the flashcards

Third Normal Form (3NF)

Not transitively dependent on primary key

Signup and view all the flashcards

Boyce-Codd Normal Form (BCNF)

Every non-trivial FD X→Y, X is a super key of R

Signup and view all the flashcards

Superkey

A set of attributes that uniquely identifies tuples in a relation.

Signup and view all the flashcards

Candidate Key

A minimal superkey; no proper subset is a superkey.

Signup and view all the flashcards

Primary Key

The candidate key chosen to identify tuples.

Signup and view all the flashcards

Prime Attribute

An attribute that is part of any candidate key.

Signup and view all the flashcards

Nonprime Attribute

An attribute that is not a member of any candidate key.

Signup and view all the flashcards

Decomposition

The process of breaking relations into smaller relations

Signup and view all the flashcards

Attribute Preservation

Guarantees that no attributes are lost in the new relations.

Signup and view all the flashcards

Dependency Preservation

Each dependency is either present or can be inferred.

Signup and view all the flashcards

Lossless Join

No additional tuples are generated when relations are rejoined.

Signup and view all the flashcards

Study Notes

Module 4: Normalization

  • Module covers database design anomalies, normalization concepts, functional dependencies, Armstrong's Axioms, normal forms (1NF, 2NF, 3NF, BCNF), and lossless join and dependency preserving decomposition.

Database Design Anomalies

  • Anomalies in database design include insertion, deletion, and modification issues.

Insertion Anomalies

  • Insertion anomalies occur when certain information can't be stored unless other, related information is also stored.
  • In an EMP_DEPT relation with attributes Ename, Ssn, Bdate, Address, Dnumber, Dname, and Dmgr_ssn, adding a new employee requires assigning them to a department or using NULLs.
  • Adding a new department with no employees requires using NULLs for the employee Ssn, which should be the primary key.
  • In an EMP_PROJ relation with attributes Emp#, Proj#, Ename, Pname, and No_hours, a project can't be inserted without assigning an employee, and vice versa.

Deletion Anomalies

  • Deletion anomalies occur when deleting information results in the loss of other, related information.
  • In an EMP_DEPT relation, deleting the last employee record from a department results in losing information about that department.
  • Deleting Borg, James record, leads to losing data about Head Quarters dept.

Modification Anomalies

  • Modification anomalies occur when inconsistencies arise due to the need to update multiple copies of repeated data.
  • For example, in the EMP_DEPT relation, changing the manager of department 5 requires updating multiple tuples.
  • If updates aren't applied consistently across all tuples, the database becomes inconsistent, showing different values for the same department.

Informal Design Guidelines for Relation Schema

  • There are four informal guidelines to determine the quality of relation schema design.
  • Semantics: ensure the meaning of attributes is clear.
  • Redundancy Reduction: design schemas to avoid insertion, deletion, and update anomalies.
  • Null value reduction: to limit NULL values in tuples
  • Avoidance of Spurious Tuples: disallow the creation of false tuples from joins.

Clear Semantics

  • Attributes within a relation schema should have a clear, real-world meaning and proper interpretation.
  • Semantics refers to the meaning resulting from interpreting attribute values within a tuple.

Guideline 1

  • Design relation schemas to ensure ease of explanation of meaning.
  • Avoid combining attributes from multiple entity and relationship types into a single relation.
  • Relations should represent a single entity or relationship type for straightforward interpretation.
  • Combining multiple entities and relationships can result in semantic ambiguities.

Guideline 1 Violation

  • Combining Employee and Department attributes into a single table loses meaning because tuples do not clearly refer to individual relations and their attributes.

Redundancy and Update Anomalies

  • Data redundancy exists when the same data is stored in multiple places within a database.
  • Redundancy leads to wasted storage space and update anomalies (insertion, deletion, and modification).

Guideline 2

  • Design database schemas so that insertion, deletion, and update anomalies do not occur.

Null Values

  • Null values indicate attributes that are not applicable, values that are unknown, or values that exist but are unavailable.
  • Null can waste storage at the storage level.
  • Problems happen with understanding of attributes and with specifying JOIN operations at the logical level.
  • NULL values can lead to unpredictable results with COUNT or SUM aggregate functions.

Guideline 3

  • Avoid placing attributes that may frequently be NULL in a base relation.
  • Design relations to minimize NULL values.
  • Attributes with frequent NULLs can be placed in separate relations with the primary key.

Spurious Tuples

  • Consider EMP_LOCS(EName, PLocation) and EMP_PROJ1(SSN, PNumber, Hours, PName, PLocation) versus EMP_PROJ(SSN, PNumber, Hours, EName, PName, PLocation).
  • The former, when joined, does not give any information about EMP_PROJ
  • Natural joins can produce many rows that are not in the initial EMP_PROJ relation
  • In a natural join the extra rows are called spurious tuples
  • Another design guideline is that to design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys
  • This will stop spurious tuples from being generated.

Natural Join & Spurious Tuples

  • Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because when they are joined using NATURAL JOIN, we do not get correct information from the original attributes.
  • Plocation relates EMP_LOCS and EMP_PROJ1 but is neither a primary nor foreign key in either.

Guideline 4

  • Design relation schemas to ensure joins with equality conditions on appropriate attributes (primary key, foreign key) prevent spurious tuples.
  • Avoid relations containing matching attributes that are not primary/foreign keys, as joins may produce spurious tuples.

Summary of Design Guidelines

  • Design guidelines address anomalies causing redundant work during insertion and modification causing data loss during deletion.
  • Design guidelines address waste of storage space from NULLs and difficulty of selections, aggregations, and joins caused by NULL values.
  • The guidelines address the generation of invalid and spurious data during joins on base relations with mismatched attributes.

Functional Dependencies

  • Functional Dependency (FD) is a constraint between two attribute sets in a database.
  • Given schema R with attributes A1, A2, ..., An, functional dependency is denoted by X → Y.
  • In any two tuples t1 and t2, if t1[X] = t2[X], then it must also be that t1[Y] = t2[Y].
  • X→Y means Y values depend on X values.
  • Y is functionally dependent on X.

Key Points on Functional Dependencies

  • X→Y indicates Y's value is determined by X's value, but it doesn't imply Y→X.

Functional Dependency Examples

  • In EMP_PROJ with attributes SSN, PNO, ENAME, PNAME, PLOC, HOURS,
  • SSN (Social Security Number) determines ENAME (Employee Name). -SSN→ENAME
  • PNO (Project Number) determines PNAME (Project Name) and PLOCATION.
  • {SSN, PNUMBER} determine HOURS. -{SSN, PNUMBER}→HOURS
  • {X,Y}→Z can be written as XYZ

Determining Validity

  • A→B
  • a1 = b1
  • a2 = b3 This is determined to be invalid as only unique matching can be used
  • B→A
  • b1 = a1
  • b3= a2
  • b2 = a1 Determined to be valid.

Armstrong's Axioms

  • Armstrong's axioms provide inference rules for functional dependencies, introduced by William W Armstrong.
  • Armstrong test the logical implication of functional dependencies.
  • If F is functional dependencies closure of F, denoted as F+ represents all dependencies logically implied by F
  • Armstrong's Axioms are rules which, used serially, create closure of functional dependancies

Axiom Rules

  • IR1: Reflexivity Rule: If A includes B, then A holds B (A→B).
  • IR2: Augmentation Rule: If A→B holds and Y is an attribute set, then AY→BY also holds, indicating adding attributes to dependencies does not change basic dependencies.
  • IR3: Transitivity Rule: If A→B and B→C holds, then A→C also holds.

Secondary Rules

  • IR4: Union is where A→B, A→C holds, then A→BC holds.
  • IR5: Decomposition is where A→BC holds then A→B, A→C hold.
  • IR6: composition: If A→B and X→Y holds, then AX→BY holds.
  • IR7: Pseudo Transitivity If A→B holds and BC→D holds, then AC→D holds.

Why Armstrong Axioms are Sound and Complete

  • Soundness means that, given functional dependencies F on relation schema R, any dependency inferred from F using Armstrong's rules holds in every relation state r of R that satisfies dependencies in F.
  • Completeness means using primary rules of Armstrong axioms repeatedly to infer dependencies to the end and the complete set of all possible dependencies can be inferred from F.

Closure Set

  • Represent the set of FDs specified on R.
  • Closure of F (F+) is the set of all FDs, F, as well as all dependencies that can be inferred/deduced from F.
  • If department and department phone number and department social security depends on the respective info, then department depends on department phone number
  • It is then stated this inferred number need not be stated as well

Closure Algorithm

  • Closure is algorithm to see FDs
  • A set F of FDs on a relation schema R, and a set of attributes X, which is a subset of R.
  • The sets are used to determine further attributes and so on.

Examples

  • Questions given with certain set R and other sets, students required to find the sets under the closure algorithm.

Covers & Equivalencies

  • A set of functional dependencies F is said to cover another set of functional dependencies E if every FD in E is also in F+;
  • A way used to find relation dependencies.
  • Two sets of functional dependencies E and F are equivalent if E+ = F+.

Minimal Cover

  • Minimal cover happens when a set of function dependencies and every dependency in E is in the closure of F

Key Facts regarding Minimal Cover

  • Every dependency has a single attribute RHS with proper subsets of X & equivalent sets in F and dependencies that are equivalent to F

Key Steps for Minimal Set Algorithm

  • Set it to E
  • Replace each with a new liner FD
  • Ensure each FD is liner
  • Perform the if statement to determine new FD
  • Perform it again to finish the FD

Example

  • An example has an FD set and the question shows the steps to find the correct set.
  • Step 1 is All above dependencies are in canonical form and only have one attribute
  • Step 2 is to determine what attribute if needing to be changed & has redundancy
  • Final is to find the reduced transitive to remove the redundancy

Normalization

  • A methodology to ensure that the the attributes into smaller relations are broken-up into more simplistic terms

Main Keys with Normalization

  • Normalization consists of mainly keys consisting of FDs, primary etc
  • Helps minimise modifications and insertions
  • Ensures 1NF, 2NF 3NF and Boyce Codd Normal form +
  • Also applies to 4-5NF dependancies and mutli-valued elements.

Keys

  • It is subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have equivalent closures

Normal Attributes

  • Is a set of attributes that are candidate keys but called secondary.
  • A Prime attribute must be a member of some candidate key.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Use Quizgecko on...
Browser
Browser