Advanced Data Modeling and Normalization

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What does EERM stand for?

Extended Entity Relationship Model

What is the purpose of the Entity Supertype?

Generic entity type related to one or more entity subtypes, and contains common characteristics.

All entity subtypes inherit their primary key attribute from their supertype

True (A)

At the implementation level, a supertype and its subtype(s) maintain a 1:1 relationship.

<p>True (A)</p> Signup and view all the answers

What does the attribute in the supertype entity determine?

<p>The attribute determines which entity subtype the supertype occurence is related to.</p> Signup and view all the answers

What is the difference between disjoint and overlapping constraints?

<p>Disjoint subtypes contain a unique subset of the supertype entity set, while overlapping subtypes contain nonunique subsets of the supertype entity set.</p> Signup and view all the answers

What is the difference between specialization and generalization?

<p>Specialization is a top-down process that identifies lower-level, more specific entity subtypes from a higher-level entity supertype, while generalization is a bottom-up process that identifies a higher-level, more generic entity supertype from lower-level entity subtypes</p> Signup and view all the answers

What is an entity cluster?

<p>A “Virtual” entity type used to represent multiple entities and relationships in ERD.</p> Signup and view all the answers

What is the purpose of primary keys?

<p>Single attribute or a combination of attributes that uniquely identifies each entity instance, guarantees entity integrity, and works with foreign keys to implement relationships.</p> Signup and view all the answers

What is a natural key?

<p>Real-world identifier used to uniquely identify real-world objects</p> Signup and view all the answers

Desirable primary key characteristics include being intelligent, changing over time, and being preferably non-numeric.

<p>False (B)</p> Signup and view all the answers

When are composite primary keys used?

<p>Identifiers of composite entities.</p> Signup and view all the answers

When are surrogate primary keys used?

<p>To simplify the identification of entity instances.</p> Signup and view all the answers

What is time-variant data?

<p>Data whose values change over time and for which a history of the data changes must be retained.</p> Signup and view all the answers

What is a design trap?

<p>Occurs when a relationship is improperly or incompletely identified</p> Signup and view all the answers

What is normalization?

<p>Evaluating and correcting table structures to minimize data redundancies</p> Signup and view all the answers

What are Normal Forms?

<p>First normal form (1NF), Second normal form (2NF), Third normal form (3NF)</p> Signup and view all the answers

What is the objective of the Normalization process?

<p>To ensure that each table conforms to the concept of well-formed relations</p> Signup and view all the answers

Match the normal form with its characteristic:

<p>First normal form (1NF) = Table format, no repeating groups, and PK identified Second normal form (2NF) = 1NF and no partial dependencies Third normal form (3NF) = 2NF and no transitive dependencies Boyce-Codd normal form (BCNF) = Every determinant is a candidate key (special case of 3NF) Fourth normal form (4NF) = 3NF and no independent multivalued dependencies</p> Signup and view all the answers

A table is in _____ when all key attributes are defined and all remaining attributes are dependent on the primary key?

<p>1NF (D)</p> Signup and view all the answers

What is the disadvantage to joining a larger number of tables?

<p>Takes additional input/output (I/O) operations and processing logic (A)</p> Signup and view all the answers

Data updates are more efficient because tables are larger (in denormalized tables).

<p>False (B)</p> Signup and view all the answers

Flashcards

Extended Entity Relationship Model (EERM)

An enhanced entity relationship model incorporating more semantic constructs.

Entity Supertype

Generic entity type related to one or more entity subtypes, containing common characteristics.

Entity Subtype

Entity with unique characteristics of each entity subtype.

Inheritance

Enables a subtype to inherit attributes and relationships from its supertype.

Signup and view all the flashcards

Subtype Discriminator

Attribute in the supertype that determines the related subtype.

Signup and view all the flashcards

Disjoint Subtypes

Constraint where subtypes contain a unique subset of the supertype entity set.

Signup and view all the flashcards

Overlapping Subtypes

Constraint where subtypes contain nonunique subsets of the supertype entity set.

Signup and view all the flashcards

Completeness Constraint

Specifies whether supertype occurrence must be a member of at least one subtype

Signup and view all the flashcards

Specialization

Top-down process identifying specific subtypes from a supertype.

Signup and view all the flashcards

Generalization

Bottom-up process identifying a generic supertype from subtypes.

Signup and view all the flashcards

Entity Clustering

"Virtual" entity type used to represent multiple entities and relationships in ERD by combines interrelated entities into a single, abstract entity object

Signup and view all the flashcards

Primary Key

Single attribute or combination that uniquely identifies each entity instance.

Signup and view all the flashcards

Natural Key

Real-world identifier used to uniquely identify real-world objects.

Signup and view all the flashcards

Optimal Primary Keys

Primary key characteristics that include being unintelligent, unchanging, single-attribute, numeric, and security compliant.

Signup and view all the flashcards

Composite Primary Key

Used for composite entities and identifiers of weak entities.

Signup and view all the flashcards

Surrogate Primary Key

A primary key used to simplify and avoid the use of natural keys.

Signup and view all the flashcards

Implementing 1:1 Relationships

When implementing 1:1 relationships between tables, you can place the primary key of the parent entity on the dependant entitiy as a foreign key.

Signup and view all the flashcards

Time-Variant Data

Used for data with values changing over time, requiring a history of changes.

Signup and view all the flashcards

Design Trap

Occurs when a relationship is improperly identified.

Signup and view all the flashcards

Fan Trap

Occurs when one entity is in two 1:M relationships to other entities.

Signup and view all the flashcards

Redundant Relationships

Occur with multiple relationship paths between related entities.

Signup and view all the flashcards

Normalization

Assessment and rectification of table structures to decrease data redundancies.

Signup and view all the flashcards

Normal forms advantage

Structural perspective of normal forms, where higher forms are preferred.

Signup and view all the flashcards

Denormalization

Process that generates a normal form of lower degree and helps increasing performance data redundancy.

Signup and view all the flashcards

The Need for Normalization

Used in designing a new database structure to analyse the relationship among the attributes within each entity.

Signup and view all the flashcards

Normalization Objective

Ensures that all tables are in at least 3NF

Signup and view all the flashcards

First Normal Form (1NF)

Table format, no repeating groups, and PK identified.

Signup and view all the flashcards

Second Normal Form (2NF)

1NF and no partial dependencies on a composite primary key.

Signup and view all the flashcards

Third Normal Form (3NF)

2NF and no transitive dependencies; non-key attributes do not depend on other non-key attributes

Signup and view all the flashcards

Functional Dependence

B is fully functionally dependent on A if each A value determines one B value.

Signup and view all the flashcards

Partial Dependency

Functional dependence in which the determinant is only part of the primary key.

Signup and view all the flashcards

Transitive Dependency

Attribute depends on another non-key attribute, not part of the primary key.

Signup and view all the flashcards

Surrogate Key Considerations

Simplifies identification of entity instances where no natural key is available

Signup and view all the flashcards

Boyce-Codd Normal Form (BCNF)

Every determinant in the table should be a candidate key.

Signup and view all the flashcards

Fourth Normal Form (4NF)

All attributes are dependent on the primary key but independent of each other

Signup and view all the flashcards

The reason for Denormalization

Denormalized cases such as redundant values or information requirements

Signup and view all the flashcards

Data-modeling check list

To properly document business rules and verifiy with the end users

Signup and view all the flashcards

Study Notes

  • The session discusses advanced data modeling and normalization of database tables.
  • The topic is Business Analytics and Advanced Business Intelligence.

Learning Objectives

  • Describe extended entity relationship (EER) model constructs and their representation in ERDs and EERDs.
  • Use entity clusters to represent multiple entities and relationships in an entity relationship diagram (ERD).
  • Describe the characteristics of good primary keys and how to select them.
  • Apply flexible solutions for special data-modeling cases.
  • Explain normalization and its role in the database design process.
  • Identify and describe normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF.
  • Explain how normal forms can be transformed from lower to higher normal forms.
  • Apply normalization rules to evaluate and correct table structures.
  • Identify situations where denormalization is needed to efficiently generate information.
  • Employ a data-modeling checklist to verify if an ERD meets minimum requirements.

The Extended Entity Relationship Model (EERM)

  • An enhanced entity relationship model is created by adding more semantic constructs to the original entity relationship (ER) model.
  • Employ the EER model in EER diagrams (EERDs).

Entity Supertypes and Subtypes

  • An entity supertype is a generic entity type related to one or more entity subtypes and contains common characteristics.
  • An entity subtype contains unique characteristics of each entity subtype.
  • Usage Criteria: there must be different, identifiable kinds of entities in the user's environment with unique attributes for each kind of instance.

Inheritance

  • Allows an entity subtype to inherit attributes and relationships of the supertype.
  • All entity subtypes inherit their primary key attribute from their supertype.
  • At the implementation level, the supertype and its subtype(s) maintain a 1:1 relationship.
  • Entity subtypes inherit all relationships in which the supertype entity participates.
  • Lower-level subtypes inherit all attributes and relationships from their upper-level supertypes.

Subtype Discriminator

  • An attribute in the supertype entity that determines to which entity subtype the supertype occurrence is related.
  • A default comparison condition is the 'equality comparison'.
  • In some cases, the subtype discriminator is not based on an equality comparison.

Disjoint and Overlapping Constraints

  • Disjoint subtypes contain unique subsets of the supertype entity set and are known as nonoverlapping subtypes, implemented based on the subtype discriminator attribute value.
  • Overlapping subtypes contain nonunique subsets of the supertype entity set with a subtype discriminator attribute for each subtype.

Completeness Constraint

  • Specifies whether each supertype occurrence must also be a member of at least one subtype.
  • Partial completeness: Not every supertype occurrence is a member of a subtype.
  • Total completeness: Every supertype occurrence must be a member of at least one subtype.

Specialization and Generalization

  • Specialization is a top-down process that identifies lower-level, specific entity subtypes from a higher-level entity supertype, and it is based on grouping unique characteristics and relationships.
  • Generalization is a bottom-up process that identifies a higher-level, more generic entity supertype from lower-level entity subtypes, based on grouping common characteristics and relationships.

Entity Clustering

  • A virtual entity type is used to represent multiple entities and relationships in an ERD.
  • Created by combining multiple interrelated entities into a single, abstract entity object.
  • A general rule is to avoid the display of attributes to eliminate complications when inheritance rules change.

Entity Integrity: Selecting Primary Keys

  • Primary keys are single attributes or a combination of attributes that uniquely identify each entity instance.
  • Primary keys guarantee entity integrity.
  • Primary keys work with foreign keys to implement relationships.

Natural Keys and Primary Keys

  • Natural keys (or natural identifiers) are real-world identifiers used to uniquely identify real-world objects.
  • They are familiar to end-users, part of their day-to-day business vocabulary, and used as the primary key of the entity being modeled.

Primary Key Guidelines

  • Desirable primary key characteristics include: non-intelligent, no change over time, preferably single-attribute and numeric, and security-compliant.

When to Use Composite Primary Keys

  • Identifiers of composite entities, where each primary key combination is allowed once in an M:N relationship should be composite primary keys.
  • Identifiers of weak entities should be composite primary keys.
  • Key points include strong identifying relationship with the parent entity, real-world object existence dependent on another real-world object, and how the data model uses two separate entities in a strong identifying relationship.

When to Use Surrogate Primary Keys

  • Use surrogate primary keys to simplify the identification of entity instances when there's no natural key and if the selected candidate key has embedded semantic content or is too long.
  • Ensure the candidate key of the entity in question performs properly with "unique index" and "not null" constraints.

Implementing 1:1 Relationships

  • Foreign keys work with primary keys to implement relationships in the relational model.
  • The primary key of the parent entity is placed on the dependent entity as a foreign key.
  • In selecting and placing the foreign key look at placing in both entities or just one.
  • When one side is mandatory and the other is optional, place the PK of the entity on the mandatory side in the entity on the optional side as a FK, and make the FK mandatory.
  • When both sides are optional, select the FK that causes the fewest nulls or place the FK in the entity in which the relationship role is played.
  • When both sides are mandatory, revise the model to ensure that the two entities do not belong together in a single entity or see case two.

Maintaining History of Time-Variant Data

  • Time-variant data refers to data whose values change over time.
  • It requires creating a new entity in a 1:M relationship with the original entity.
  • The new entity contains the new value, the date of the change, and any other pertinent attribute.

Design Trap

  • Occurs when a relationship is improperly or incompletely identified.
  • The design trap may not be represented in a correct way with the real world.
  • A 'fan trap' occurs when one entity is in two 1:M relationships to other entities.
  • Fan traps produce an association among other entities not expressed in the model.

Redundant Relationships

  • Redundant relationships occur when there are multiple relationship paths between related entities.
  • Redundant relationships must remain consistent across the model and help simplify the design.

Database Tables and Normalization

  • Normalization is often evaluating and correcting table structures to minimize data redundancies.
  • Normalization reduces data anomalies.
  • Normal forms include first normal form (1NF), second normal form (2NF) and third normal form (3NF)
  • Higher normal forms are better than lower normal forms, and Properly designed 3NF structures meet the requirement of fourth normal form (4NF).
  • Denormalization produces a lower normal form but results in increased performance and greater data redundancy.

The Need for Normalization

  • Used while designing a new database structure.
  • Analyzes the relationship among the attributes within each entity.
  • Determines if the structure can be improved through normalization.
  • Improves the existing data structure and creates an appropriate database design.

The Normalization Process

  • The objective is to ensure each table conforms to the concept of well-formed relations.
  • Each table represents a single subject.
  • Each row/column intersection contains only one value, not a group of values.
  • No data item will be unnecessarily stored in more than one table.
  • All nonprime attributes in a table are dependent on the primary key.
  • Each table has no insertion, update, or deletion anomalies.
  • Ensures all tables are at least in 3NF.
  • Higher forms are not likely to be encountered in a business environment.
  • Works one relation at a time.
  • Identifies the dependencies of a relation (table).
  • Progressively breaks the relation up into a new set of relations.

Normal Form Characteristics

  • First normal form (1NF) indicates basic table format, no repeating groups, and PK identified.
  • Second normal form (2NF) encompasses 1NF and no partial dependencies.
  • Third normal form (3NF) requires 2NF and no transitive dependencies.
  • Boyce-Codd normal form (BCNF) suggests every determinant is a candidate key (special case of 3NF).
  • Fourth normal form (4NF) needs 3NF and no independent multivalued dependencies.

Normalization Concepts

  • Functional dependence occurs when the a value on attribute A determines exactly one value for some attribute B.
  • Full functional dependency occurs when you have a composite primary key consisting of Attributes A & B, attribute C is functionally dependent on C, but not attribute A, the attribute C if fully functionally dependent on Attributes A & B

Normalization Dependencies

  • Partial dependency is functional dependence in which the determinant is only part of the primary key, assuming one candidate key, is straightforward, and easy to identify.
  • Transitive dependency is when an attribute is dependent on another attribute that is not part of the primary key, is more difficult to identify among a set of data, and occurs only when a functional dependence exists among nonprime attributes.

Conversion to First Normal Form (1NF)

  • A repeating group is a group of multiple entries of the same type that can exist for any single key attribute occurrence, so, reducing data redundancies can remove this
  • Eliminating the repeating groups requires identifying the primary key and all dependencies.
  • A dependency diagram depicts all dependencies found within the given table structure.
  • A dependency diagram helps to get an overview of all relationships among table's attributes and makes it less likely that an important dependency will be overlooked.
  • 1NF describes tabular format in which:
    • All key attributes are defined
    • There are no repeating groups in the table
    • All attributes are dependent on the primary key
  • All relational tables satisfy 1NF requirements, and some tables contain partial dependencies related to update, insertion, or deletion.

Conversion to Second Normal Form (2NF)

  • Conversion to 2NF occurs only when the 1NF has a composite primary key.
  • If the 1NF has a single-attribute primary key, then the table is automatically in 2NF.
  • The 1NF-to-2NF conversion is simple with these 2 steps.
    • Make new tables to eliminate partial dependencies
    • Reassign corresponding dependent attributes requirements
  • Table is in 2NF when it:
    • Is in 1NF
    • Includes no partial dependencies

Conversion to Third Normal Form (3NF)

  • The data anomalies created by the database organization shown in Figure 6.4 are easily eliminated so the following steps can be done..
    • Make new tables to eliminate transitive dependencies
    • Reassign corresponding dependent attributes
  • Table is in 3NF when it:
    • Is in 2NF
    • Contains no transitive dependencies

Improving Design

  • Normalization is valuable because its use helps eliminate data redundancies.
  • Evaluate PK assignments and naming conventions.
  • Refine attribute atomicity. Atomic Attribute: can't be subdvidied.
  • Atomicity: Characteristic of a attribute.
  • Identify new variables and new conditions.
  • Refine primary Keys as required for data Granularity. Details are Level displayed by values stored in data rows.
  • Maintain actual/evaluate historicals depending on derived variables.

Surrogate Key Considerations

  • Designers often use surrogate keys when the established primary key is considered unsuitable.
  • The surrogate keys are system-defined attributes automatically managed by the DBMS.
  • These have a numeric value that's automatically incremented for each new row.

Boyce-Codd Normal Form (BCNF)

  • Every determinant in the table should be a candidate key.
  • A candidate key shares the same characteristics as a primary key, but was not chosen to be the primary key.
  • BCNF is equivalent to 3NF when the table contains only one candidate key.
  • BCNF is violated only when the table contains more than one candidate key.
  • BCNF is considered to be a special case of 3NF.

Fourth Normal Form (4NF)

  • All attributes must be dependent on the primary key, but they must be independent of each other.
  • No row may contain two or more multivalued facts about an entity.
  • A table is in 4NF when it: Is in 3NF and has no multivalued dependencies.

Normalization and Database Design

  • Normalization should be part of the design process, and proposed entities must meet the required normal form before table structures are created.
  • Principles and normalization procedures should be understood to redesign and modify databases.
  • ERD is created through an iterative process.
  • Normalization focuses on the characteristics of specific entities.

Denormalization

  • Design goals are Creation of normalized relations and Processing requirements and speed.
  • The Number of database tables expands.
  • Tables are decomposed to conform to normalization requirements.
  • Defects in unnormalized tables
    • Data updates are less efficient because tables are larger.
    • Indexing is more cumbersome.
    • No simple strategies for creating view tables
  • Example Cases
    • Redundant Data is Storing ZIP and CITY attributes in the AGENT table when ZIP determines CITY to avoid additional joins to program
    • Programmed to validate city based on an entered zip code.
    • Derived data is Storing Student classification data to Lookup student data, avoid joins.
    • Preaggregated data is storing student grade point averages, and it is updated every term to avoid computing constantly.

Data-Modeling Checklist

  • Business rules: all business rules are properly documented and verified with end users for clarity and accurate identification of entities, attributes, relationships, and constraints; the source of each business rule is identified, justified, dated, and officially approved.
  • Data Modeling:
    • Naming conventions: all names must conform to standards established for length and context, which is size dependent. -Entity names: are nouns that are familiar to business, short, and meaningful uniquely within the model and aliases are noted for any synonyms -Attribute names: are unique within the entity, follow the entity abbreviation as a prefix, describe characteristic of the variable, and suffix variable with primary key.
  • Relationship names: use clearly defined active/passive verbs. Entities in 3NF or higher represent single entities with a clearly defined and justified primary key.
  • A data model has been normalized to the fullest possible to minimize null values or data Redundancy.
  • Attributes
    • All should be the single valued and single valued, Default values - All should be documented constraints, synonyms, include source documentation, exclude unneeded values. -Relationships - Relationship constraints. - Evaluate how to minimize data relationships, and maximize data integrity.

Check ER Models

  • Check it against expected process updates, deletions etc
  • Check should evaluate where, when and how to manage a set of history processes, updates , deletions etc.

Summary Points

  • The extended entity relationship (EER) model adds semantics to the ER model via entity supertypes, subtypes, and clusters.
  • A specialization hierarchy depicts the arrangement and relationships between entity supertypes and entity subtypes.
  • An entity cluster is a “virtual" entity type used to represent multiple entities and relationships in the ERD.
  • Natural keys are identifiers that exist in the real world.
  • Composite keys are useful to represent M:N relationships and weak (strong identifying) entities.
  • Surrogate primary keys are useful when there is no natural key, when the primary key is a composite primary key with multiple data types, or when the primary key is too long to be usable.
  • Time-variant data refers to data whose values change over time, necessitating a history of data changes.
  • A fan trap occurs when one entity has two 1:M relationships to other entities, causing an unexpressed association among the other entities.
  • Normalization is a technique used to design tables in which data redundancies are minimized.
  • A table is in 1NF when all key attributes are defined and all remaining attributes are dependent on the primary key.
  • A table is in 2NF when it is in 1NF and contains no partial dependencies.
  • A table is in 3NF when it is in 2NF and contains no transitive dependencies.
  • A table that is not in 3NF may be split into new tables until all of the tables meet the 3NF requirements.
  • Normalization is an important part—but only a part—of the design process.
  • A table in 3NF might contain multivalued dependencies that produce either numerous null values or redundant data.
  • The larger the number of tables, the more additional I/O operations and processing logic you need to join them.
  • The data-modeling checklist provides a way for the designer to check that the ERD meets a set of minimum requirements.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Use Quizgecko on...
Browser
Browser