Podcast
Questions and Answers
What does EERM stand for?
What does EERM stand for?
Extended Entity Relationship Model
What is the purpose of the Entity Supertype?
What is the purpose of the Entity Supertype?
Generic entity type related to one or more entity subtypes, and contains common characteristics.
All entity subtypes inherit their primary key attribute from their supertype
All entity subtypes inherit their primary key attribute from their supertype
True (A)
At the implementation level, a supertype and its subtype(s) maintain a 1:1 relationship.
At the implementation level, a supertype and its subtype(s) maintain a 1:1 relationship.
What does the attribute in the supertype entity determine?
What does the attribute in the supertype entity determine?
What is the difference between disjoint and overlapping constraints?
What is the difference between disjoint and overlapping constraints?
What is the difference between specialization and generalization?
What is the difference between specialization and generalization?
What is an entity cluster?
What is an entity cluster?
What is the purpose of primary keys?
What is the purpose of primary keys?
What is a natural key?
What is a natural key?
Desirable primary key characteristics include being intelligent, changing over time, and being preferably non-numeric.
Desirable primary key characteristics include being intelligent, changing over time, and being preferably non-numeric.
When are composite primary keys used?
When are composite primary keys used?
When are surrogate primary keys used?
When are surrogate primary keys used?
What is time-variant data?
What is time-variant data?
What is a design trap?
What is a design trap?
What is normalization?
What is normalization?
What are Normal Forms?
What are Normal Forms?
What is the objective of the Normalization process?
What is the objective of the Normalization process?
Match the normal form with its characteristic:
Match the normal form with its characteristic:
A table is in _____ when all key attributes are defined and all remaining attributes are dependent on the primary key?
A table is in _____ when all key attributes are defined and all remaining attributes are dependent on the primary key?
What is the disadvantage to joining a larger number of tables?
What is the disadvantage to joining a larger number of tables?
Data updates are more efficient because tables are larger (in denormalized tables).
Data updates are more efficient because tables are larger (in denormalized tables).
Flashcards
Extended Entity Relationship Model (EERM)
Extended Entity Relationship Model (EERM)
An enhanced entity relationship model incorporating more semantic constructs.
Entity Supertype
Entity Supertype
Generic entity type related to one or more entity subtypes, containing common characteristics.
Entity Subtype
Entity Subtype
Entity with unique characteristics of each entity subtype.
Inheritance
Inheritance
Signup and view all the flashcards
Subtype Discriminator
Subtype Discriminator
Signup and view all the flashcards
Disjoint Subtypes
Disjoint Subtypes
Signup and view all the flashcards
Overlapping Subtypes
Overlapping Subtypes
Signup and view all the flashcards
Completeness Constraint
Completeness Constraint
Signup and view all the flashcards
Specialization
Specialization
Signup and view all the flashcards
Generalization
Generalization
Signup and view all the flashcards
Entity Clustering
Entity Clustering
Signup and view all the flashcards
Primary Key
Primary Key
Signup and view all the flashcards
Natural Key
Natural Key
Signup and view all the flashcards
Optimal Primary Keys
Optimal Primary Keys
Signup and view all the flashcards
Composite Primary Key
Composite Primary Key
Signup and view all the flashcards
Surrogate Primary Key
Surrogate Primary Key
Signup and view all the flashcards
Implementing 1:1 Relationships
Implementing 1:1 Relationships
Signup and view all the flashcards
Time-Variant Data
Time-Variant Data
Signup and view all the flashcards
Design Trap
Design Trap
Signup and view all the flashcards
Fan Trap
Fan Trap
Signup and view all the flashcards
Redundant Relationships
Redundant Relationships
Signup and view all the flashcards
Normalization
Normalization
Signup and view all the flashcards
Normal forms advantage
Normal forms advantage
Signup and view all the flashcards
Denormalization
Denormalization
Signup and view all the flashcards
The Need for Normalization
The Need for Normalization
Signup and view all the flashcards
Normalization Objective
Normalization Objective
Signup and view all the flashcards
First Normal Form (1NF)
First Normal Form (1NF)
Signup and view all the flashcards
Second Normal Form (2NF)
Second Normal Form (2NF)
Signup and view all the flashcards
Third Normal Form (3NF)
Third Normal Form (3NF)
Signup and view all the flashcards
Functional Dependence
Functional Dependence
Signup and view all the flashcards
Partial Dependency
Partial Dependency
Signup and view all the flashcards
Transitive Dependency
Transitive Dependency
Signup and view all the flashcards
Surrogate Key Considerations
Surrogate Key Considerations
Signup and view all the flashcards
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF)
Signup and view all the flashcards
Fourth Normal Form (4NF)
Fourth Normal Form (4NF)
Signup and view all the flashcards
The reason for Denormalization
The reason for Denormalization
Signup and view all the flashcards
Data-modeling check list
Data-modeling check list
Signup and view all the flashcards
Study Notes
- The session discusses advanced data modeling and normalization of database tables.
- The topic is Business Analytics and Advanced Business Intelligence.
Learning Objectives
- Describe extended entity relationship (EER) model constructs and their representation in ERDs and EERDs.
- Use entity clusters to represent multiple entities and relationships in an entity relationship diagram (ERD).
- Describe the characteristics of good primary keys and how to select them.
- Apply flexible solutions for special data-modeling cases.
- Explain normalization and its role in the database design process.
- Identify and describe normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF.
- Explain how normal forms can be transformed from lower to higher normal forms.
- Apply normalization rules to evaluate and correct table structures.
- Identify situations where denormalization is needed to efficiently generate information.
- Employ a data-modeling checklist to verify if an ERD meets minimum requirements.
The Extended Entity Relationship Model (EERM)
- An enhanced entity relationship model is created by adding more semantic constructs to the original entity relationship (ER) model.
- Employ the EER model in EER diagrams (EERDs).
Entity Supertypes and Subtypes
- An entity supertype is a generic entity type related to one or more entity subtypes and contains common characteristics.
- An entity subtype contains unique characteristics of each entity subtype.
- Usage Criteria: there must be different, identifiable kinds of entities in the user's environment with unique attributes for each kind of instance.
Inheritance
- Allows an entity subtype to inherit attributes and relationships of the supertype.
- All entity subtypes inherit their primary key attribute from their supertype.
- At the implementation level, the supertype and its subtype(s) maintain a 1:1 relationship.
- Entity subtypes inherit all relationships in which the supertype entity participates.
- Lower-level subtypes inherit all attributes and relationships from their upper-level supertypes.
Subtype Discriminator
- An attribute in the supertype entity that determines to which entity subtype the supertype occurrence is related.
- A default comparison condition is the 'equality comparison'.
- In some cases, the subtype discriminator is not based on an equality comparison.
Disjoint and Overlapping Constraints
- Disjoint subtypes contain unique subsets of the supertype entity set and are known as nonoverlapping subtypes, implemented based on the subtype discriminator attribute value.
- Overlapping subtypes contain nonunique subsets of the supertype entity set with a subtype discriminator attribute for each subtype.
Completeness Constraint
- Specifies whether each supertype occurrence must also be a member of at least one subtype.
- Partial completeness: Not every supertype occurrence is a member of a subtype.
- Total completeness: Every supertype occurrence must be a member of at least one subtype.
Specialization and Generalization
- Specialization is a top-down process that identifies lower-level, specific entity subtypes from a higher-level entity supertype, and it is based on grouping unique characteristics and relationships.
- Generalization is a bottom-up process that identifies a higher-level, more generic entity supertype from lower-level entity subtypes, based on grouping common characteristics and relationships.
Entity Clustering
- A virtual entity type is used to represent multiple entities and relationships in an ERD.
- Created by combining multiple interrelated entities into a single, abstract entity object.
- A general rule is to avoid the display of attributes to eliminate complications when inheritance rules change.
Entity Integrity: Selecting Primary Keys
- Primary keys are single attributes or a combination of attributes that uniquely identify each entity instance.
- Primary keys guarantee entity integrity.
- Primary keys work with foreign keys to implement relationships.
Natural Keys and Primary Keys
- Natural keys (or natural identifiers) are real-world identifiers used to uniquely identify real-world objects.
- They are familiar to end-users, part of their day-to-day business vocabulary, and used as the primary key of the entity being modeled.
Primary Key Guidelines
- Desirable primary key characteristics include: non-intelligent, no change over time, preferably single-attribute and numeric, and security-compliant.
When to Use Composite Primary Keys
- Identifiers of composite entities, where each primary key combination is allowed once in an M:N relationship should be composite primary keys.
- Identifiers of weak entities should be composite primary keys.
- Key points include strong identifying relationship with the parent entity, real-world object existence dependent on another real-world object, and how the data model uses two separate entities in a strong identifying relationship.
When to Use Surrogate Primary Keys
- Use surrogate primary keys to simplify the identification of entity instances when there's no natural key and if the selected candidate key has embedded semantic content or is too long.
- Ensure the candidate key of the entity in question performs properly with "unique index" and "not null" constraints.
Implementing 1:1 Relationships
- Foreign keys work with primary keys to implement relationships in the relational model.
- The primary key of the parent entity is placed on the dependent entity as a foreign key.
- In selecting and placing the foreign key look at placing in both entities or just one.
- When one side is mandatory and the other is optional, place the PK of the entity on the mandatory side in the entity on the optional side as a FK, and make the FK mandatory.
- When both sides are optional, select the FK that causes the fewest nulls or place the FK in the entity in which the relationship role is played.
- When both sides are mandatory, revise the model to ensure that the two entities do not belong together in a single entity or see case two.
Maintaining History of Time-Variant Data
- Time-variant data refers to data whose values change over time.
- It requires creating a new entity in a 1:M relationship with the original entity.
- The new entity contains the new value, the date of the change, and any other pertinent attribute.
Design Trap
- Occurs when a relationship is improperly or incompletely identified.
- The design trap may not be represented in a correct way with the real world.
- A 'fan trap' occurs when one entity is in two 1:M relationships to other entities.
- Fan traps produce an association among other entities not expressed in the model.
Redundant Relationships
- Redundant relationships occur when there are multiple relationship paths between related entities.
- Redundant relationships must remain consistent across the model and help simplify the design.
Database Tables and Normalization
- Normalization is often evaluating and correcting table structures to minimize data redundancies.
- Normalization reduces data anomalies.
- Normal forms include first normal form (1NF), second normal form (2NF) and third normal form (3NF)
- Higher normal forms are better than lower normal forms, and Properly designed 3NF structures meet the requirement of fourth normal form (4NF).
- Denormalization produces a lower normal form but results in increased performance and greater data redundancy.
The Need for Normalization
- Used while designing a new database structure.
- Analyzes the relationship among the attributes within each entity.
- Determines if the structure can be improved through normalization.
- Improves the existing data structure and creates an appropriate database design.
The Normalization Process
- The objective is to ensure each table conforms to the concept of well-formed relations.
- Each table represents a single subject.
- Each row/column intersection contains only one value, not a group of values.
- No data item will be unnecessarily stored in more than one table.
- All nonprime attributes in a table are dependent on the primary key.
- Each table has no insertion, update, or deletion anomalies.
- Ensures all tables are at least in 3NF.
- Higher forms are not likely to be encountered in a business environment.
- Works one relation at a time.
- Identifies the dependencies of a relation (table).
- Progressively breaks the relation up into a new set of relations.
Normal Form Characteristics
- First normal form (1NF) indicates basic table format, no repeating groups, and PK identified.
- Second normal form (2NF) encompasses 1NF and no partial dependencies.
- Third normal form (3NF) requires 2NF and no transitive dependencies.
- Boyce-Codd normal form (BCNF) suggests every determinant is a candidate key (special case of 3NF).
- Fourth normal form (4NF) needs 3NF and no independent multivalued dependencies.
Normalization Concepts
- Functional dependence occurs when the a value on attribute A determines exactly one value for some attribute B.
- Full functional dependency occurs when you have a composite primary key consisting of Attributes A & B, attribute C is functionally dependent on C, but not attribute A, the attribute C if fully functionally dependent on Attributes A & B
Normalization Dependencies
- Partial dependency is functional dependence in which the determinant is only part of the primary key, assuming one candidate key, is straightforward, and easy to identify.
- Transitive dependency is when an attribute is dependent on another attribute that is not part of the primary key, is more difficult to identify among a set of data, and occurs only when a functional dependence exists among nonprime attributes.
Conversion to First Normal Form (1NF)
- A repeating group is a group of multiple entries of the same type that can exist for any single key attribute occurrence, so, reducing data redundancies can remove this
- Eliminating the repeating groups requires identifying the primary key and all dependencies.
- A dependency diagram depicts all dependencies found within the given table structure.
- A dependency diagram helps to get an overview of all relationships among table's attributes and makes it less likely that an important dependency will be overlooked.
- 1NF describes tabular format in which:
- All key attributes are defined
- There are no repeating groups in the table
- All attributes are dependent on the primary key
- All relational tables satisfy 1NF requirements, and some tables contain partial dependencies related to update, insertion, or deletion.
Conversion to Second Normal Form (2NF)
- Conversion to 2NF occurs only when the 1NF has a composite primary key.
- If the 1NF has a single-attribute primary key, then the table is automatically in 2NF.
- The 1NF-to-2NF conversion is simple with these 2 steps.
- Make new tables to eliminate partial dependencies
- Reassign corresponding dependent attributes requirements
- Table is in 2NF when it:
- Is in 1NF
- Includes no partial dependencies
Conversion to Third Normal Form (3NF)
- The data anomalies created by the database organization shown in Figure 6.4 are easily eliminated so the following steps can be done..
- Make new tables to eliminate transitive dependencies
- Reassign corresponding dependent attributes
- Table is in 3NF when it:
- Is in 2NF
- Contains no transitive dependencies
Improving Design
- Normalization is valuable because its use helps eliminate data redundancies.
- Evaluate PK assignments and naming conventions.
- Refine attribute atomicity. Atomic Attribute: can't be subdvidied.
- Atomicity: Characteristic of a attribute.
- Identify new variables and new conditions.
- Refine primary Keys as required for data Granularity. Details are Level displayed by values stored in data rows.
- Maintain actual/evaluate historicals depending on derived variables.
Surrogate Key Considerations
- Designers often use surrogate keys when the established primary key is considered unsuitable.
- The surrogate keys are system-defined attributes automatically managed by the DBMS.
- These have a numeric value that's automatically incremented for each new row.
Boyce-Codd Normal Form (BCNF)
- Every determinant in the table should be a candidate key.
- A candidate key shares the same characteristics as a primary key, but was not chosen to be the primary key.
- BCNF is equivalent to 3NF when the table contains only one candidate key.
- BCNF is violated only when the table contains more than one candidate key.
- BCNF is considered to be a special case of 3NF.
Fourth Normal Form (4NF)
- All attributes must be dependent on the primary key, but they must be independent of each other.
- No row may contain two or more multivalued facts about an entity.
- A table is in 4NF when it: Is in 3NF and has no multivalued dependencies.
Normalization and Database Design
- Normalization should be part of the design process, and proposed entities must meet the required normal form before table structures are created.
- Principles and normalization procedures should be understood to redesign and modify databases.
- ERD is created through an iterative process.
- Normalization focuses on the characteristics of specific entities.
Denormalization
- Design goals are Creation of normalized relations and Processing requirements and speed.
- The Number of database tables expands.
- Tables are decomposed to conform to normalization requirements.
- Defects in unnormalized tables
- Data updates are less efficient because tables are larger.
- Indexing is more cumbersome.
- No simple strategies for creating view tables
- Example Cases
- Redundant Data is Storing ZIP and CITY attributes in the AGENT table when ZIP determines CITY to avoid additional joins to program
- Programmed to validate city based on an entered zip code.
- Derived data is Storing Student classification data to Lookup student data, avoid joins.
- Preaggregated data is storing student grade point averages, and it is updated every term to avoid computing constantly.
Data-Modeling Checklist
- Business rules: all business rules are properly documented and verified with end users for clarity and accurate identification of entities, attributes, relationships, and constraints; the source of each business rule is identified, justified, dated, and officially approved.
- Data Modeling:
- Naming conventions: all names must conform to standards established for length and context, which is size dependent. -Entity names: are nouns that are familiar to business, short, and meaningful uniquely within the model and aliases are noted for any synonyms -Attribute names: are unique within the entity, follow the entity abbreviation as a prefix, describe characteristic of the variable, and suffix variable with primary key.
- Relationship names: use clearly defined active/passive verbs. Entities in 3NF or higher represent single entities with a clearly defined and justified primary key.
- A data model has been normalized to the fullest possible to minimize null values or data Redundancy.
- Attributes
- All should be the single valued and single valued, Default values - All should be documented constraints, synonyms, include source documentation, exclude unneeded values. -Relationships - Relationship constraints. - Evaluate how to minimize data relationships, and maximize data integrity.
Check ER Models
- Check it against expected process updates, deletions etc
- Check should evaluate where, when and how to manage a set of history processes, updates , deletions etc.
Summary Points
- The extended entity relationship (EER) model adds semantics to the ER model via entity supertypes, subtypes, and clusters.
- A specialization hierarchy depicts the arrangement and relationships between entity supertypes and entity subtypes.
- An entity cluster is a “virtual" entity type used to represent multiple entities and relationships in the ERD.
- Natural keys are identifiers that exist in the real world.
- Composite keys are useful to represent M:N relationships and weak (strong identifying) entities.
- Surrogate primary keys are useful when there is no natural key, when the primary key is a composite primary key with multiple data types, or when the primary key is too long to be usable.
- Time-variant data refers to data whose values change over time, necessitating a history of data changes.
- A fan trap occurs when one entity has two 1:M relationships to other entities, causing an unexpressed association among the other entities.
- Normalization is a technique used to design tables in which data redundancies are minimized.
- A table is in 1NF when all key attributes are defined and all remaining attributes are dependent on the primary key.
- A table is in 2NF when it is in 1NF and contains no partial dependencies.
- A table is in 3NF when it is in 2NF and contains no transitive dependencies.
- A table that is not in 3NF may be split into new tables until all of the tables meet the 3NF requirements.
- Normalization is an important part—but only a part—of the design process.
- A table in 3NF might contain multivalued dependencies that produce either numerous null values or redundant data.
- The larger the number of tables, the more additional I/O operations and processing logic you need to join them.
- The data-modeling checklist provides a way for the designer to check that the ERD meets a set of minimum requirements.
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.