CPSC 332 Lecture4 PDF
California State University, Fullerton
Bhavya Meghana Chippada
Summary
This document is a lecture on enhanced entity-relationship modeling (EER) for database systems. It outlines concepts like subclasses, superclasses, and specialization. The lecture focuses on how to design databases in a way that is accurate and organized.
CPSC 332: File Structure and Database Systems
Enhanced Entity-Relationship (EER) Modeling
Instructor: Bhavya Meghana Chippada

Contents: EER

EER Model Concepts:
Includes all modeling concepts of basic ER.
Additional concepts:
- Subclasses/Superclasses
- Specialization/Generalization
- Categories (UNION types)
- Attribute and Relationship Inheritance
- Constraints on Specialization/Generalization
- Knowledge Representation and Ontology Concepts
Reference: Prof. Shawn Wang

Enhanced Entity Relationship Model:
The Enhanced Entity-Relationship (EER) Model is an extension of the traditional ER (Entity-Relationship) model, adding more features to model complex database structures more accurately. It introduces concepts such as subclasses, superclasses, specialization, generalization, aggregation, and categorization to represent more detailed relationships between entities.

Superclass: A superclass is the more general entity type that contains the attributes and relationships common to all its subclasses. It represents a higher level of abstraction.

Subclass: A subclass is a specialized version of a superclass. It inherits the attributes and relationships of its superclass but can also introduce additional attributes or relationships that are unique to the subclass.

Subclasses are defined based on specific attributes or properties of the superclass's entities. A subclass represents a subset of the entities of a superclass. Each subclass has its own attributes but also inherits the attributes of its superclass. This mechanism helps organize the data when certain entities in a general group (superclass) need to be specialized into more specific groups (subclasses).
Example: The EMPLOYEE entity type may be further subdivided into specific job roles such as:
- SECRETARY
- ENGINEER
- TECHNICIAN

Subclasses and Super Classes:
The EMPLOYEE entity type is the superclass, representing the general concept of an employee. The diagram shows three subclasses of EMPLOYEE: SECRETARY, TECHNICIAN, and ENGINEER.
These subclasses represent more specialized roles within the general category of employees. The relationship between the superclass and its subclasses is known as specialization. It indicates that an instance of a subclass is also an instance of the superclass.
Each entity type (superclass and subclasses) has its own attributes, which are represented by ovals. For example, EMPLOYEE has attributes like Fname, Minit, Lname, Name, Ssn, Birth_date, and Address. Subclasses may have additional attributes specific to their roles, such as Typing_speed for SECRETARY or Eng_type for ENGINEER.

Subclasses and Super Classes:
Each subclass is a specialized entity type that represents a more specific role or type of employee, such as SECRETARY, TECHNICIAN, or MANAGER. These subclasses may have additional attributes specific to their roles. For example, TECHNICIAN may have an attribute like Certification, while MANAGER may have a Department attribute.
A member of the superclass (e.g., EMPLOYEE) can optionally be a member of one or more subclasses (e.g., MANAGER or TECHNICIAN). An entity that belongs to a subclass represents the same real-world entity as the one in the superclass.
The IS-A relationship is a key feature of subclass/superclass hierarchies in EER models. It expresses that the subclass is a specific version of the superclass:
- SECRETARY IS-A EMPLOYEE
- TECHNICIAN IS-A EMPLOYEE
- MANAGER IS-A EMPLOYEE
This means that every entity in a subclass (e.g., SECRETARY) is also an entity of the superclass (e.g., EMPLOYEE).
In disjoint subclasses, an entity can belong to only one subclass at a time (e.g., an employee can be either a SECRETARY or a TECHNICIAN, but not both). In overlapping subclasses, an entity can belong to multiple subclasses (e.g., an employee can be both a MANAGER and a TECHNICIAN).

Representing Specialization in EER Diagrams:
The EMPLOYEE entity type is the superclass, representing the general concept of an employee.
The diagram shows three subclasses of EMPLOYEE: SECRETARY, TECHNICIAN, and ENGINEER. These subclasses are defined based on the values of the Job_type attribute. This type of specialization uses an attribute (in this case, Job_type) to determine subclass membership. The values of the attribute (e.g., 'Secretary', 'Engineer', 'Technician') correspond to the different subclasses.
Each entity type (superclass and subclasses) has its own attributes. For example, EMPLOYEE has attributes like Fname, Minit, Lname, Name, Ssn, Birth_date, and Address. Subclasses may have additional attributes specific to their roles, such as Typing_speed for SECRETARY or Eng_type for ENGINEER.
The diagram can also show relationships between entities. In this case, no explicit relationships are shown, but it is implied that SECRETARY, TECHNICIAN, and ENGINEER would have their own relationships based on their specific roles.

Attribute Inheritance in Superclass / Subclass Relationships:
In an Enhanced Entity-Relationship (EER) model, an entity that is a member of a subclass inherits all the attributes and relationships of its superclass.
Attribute Inheritance: When an entity becomes a member of a subclass, it automatically inherits all attributes of the superclass. This means that every instance of the subclass will have values for these inherited attributes.
Relationship Inheritance: Similarly, the entity also inherits all relationships defined for the superclass. This ensures that the subclass can participate in the same associations as the superclass.
Subclass entities inherit all the attributes of the superclass. This means that any SECRETARY, TECHNICIAN, or ENGINEER entity will automatically have the attributes defined for the EMPLOYEE superclass.
Subclasses also inherit all the relationships defined for the superclass.
For example, if the EMPLOYEE superclass has a relationship with the DEPARTMENT entity type (i.e., each employee works in a department), then every SECRETARY, TECHNICIAN, and ENGINEER will also have that relationship.
Inheritance:
- Ensures that subclass entities participate in the relationships that the superclass has established.
- Ensures that every subclass entity has the required information (attributes and relationships) defined in the superclass, maintaining data consistency.
- Reduces redundancy by allowing shared attributes and relationships to be defined once in the superclass rather than repeatedly for each subclass.

Specialization:
Specialization refers to the process of creating more specific entity types (called subclasses) from a general entity type (called a superclass). It is used to model entities that share common characteristics but have additional distinguishing features or roles.
A superclass can have multiple specializations based on different criteria. For example, one specialization of EMPLOYEE might have the single subclass MANAGER, while another might have the subclasses SALARIED_EMPLOYEE and HOURLY_EMPLOYEE.
Subclasses can have their own unique attributes, which are not present in the superclass. These are called specific or local attributes. For example, SECRETARY may have TypingSpeed.
Subclasses can also participate in specific relationships that are relevant to their specialized roles. For example, HOURLY_EMPLOYEE might have a relationship BELONGS_TO, representing the union or department they are part of.
In EER diagrams, specialization is typically represented with a triangle or tree structure connecting the superclass to its subclasses, showing the hierarchical relationship.

Specialization:
The EMPLOYEE entity type is the superclass, representing the general concept of an employee. It has three subclasses: SECRETARY, TECHNICIAN, and ENGINEER.
These subclasses represent more specialized roles within the general category of employees. EMPLOYEE has attributes like Fname, Minit, Lname, Name, Ssn, Birth_date, and Address. Subclasses may have additional attributes specific to their roles, such as Typing_speed for SECRETARY or Eng_type for ENGINEER.
Subclasses also take part in their own relationships: the MANAGES relationship indicates that a MANAGER can manage multiple PROJECTs, and the BELONGS_TO relationship indicates that an HOURLY_EMPLOYEE can belong to multiple TRADE_UNIONs.

Generalization:
Generalization is the process of identifying commonalities among several classes and creating a more general superclass to represent them. It is the reverse of specialization.
Classes with shared attributes and relationships are grouped into a superclass. The goal is to abstract and unify the common features of several entity types into a single superclass, reducing redundancy and simplifying the data model.
A superclass is created to represent these common features. The original entity types (now subclasses) are considered more specific versions of the new superclass.
By defining a common superclass, you can reuse attributes and relationships across multiple subclasses. This:
- Ensures data consistency by enforcing common attributes and relationships.
- Provides flexibility in modeling complex relationships and hierarchies.
- Simplifies the data model by grouping related classes into a single superclass.
Example: Superclass: VEHICLE. Subclasses: CAR, TRUCK. In this example, CAR and TRUCK share common features (e.g., make, model, year, color) and can be generalized into a VEHICLE superclass.

Generalization:
a) Shows two separate entity types, CAR and TRUCK, each with its own attributes:
CAR: Vehicle_id, Price, License_plate_no, No_of_passengers, Max_speed
TRUCK: Vehicle_id, Price, License_plate_no, No_of_axles, Tonnage
b) Represents the generalization of CAR and TRUCK into a single superclass, VEHICLE.
The VEHICLE superclass holds the attributes that CAR and TRUCK have in common: Vehicle_id, Price, and License_plate_no. The subclasses, CAR and TRUCK, keep their own specific attributes:
CAR: No_of_passengers, Max_speed
TRUCK: No_of_axles, Tonnage
In this case, the common attributes Vehicle_id, Price, and License_plate_no are moved up into the VEHICLE superclass. The subclasses, CAR and TRUCK, inherit these common attributes from the superclass and also have their own specific attributes that differentiate them.

Notations in Generalization and Specialization:
Specialization and generalization can be represented visually without using arrows to avoid subjectivity.
In some cases, arrows are used to distinguish generalization from specialization:
- An arrow pointing toward the superclass represents generalization.
- Arrows pointing toward the subclasses represent specialization.
However, these notations can be subjective, because it is not always clear which process (generalization or specialization) is more appropriate for a given situation.

Data Modelling:
In an EER diagram, a superclass or subclass represents a collection or set of entities, which can also be viewed as a type of entity. Both superclasses and subclasses are depicted using rectangles, just like regular entity types. All these entity types (whether they are superclasses or subclasses) can be collectively referred to as classes.

Constraints on Specialization and Generalization:
In Enhanced Entity-Relationship (EER) modeling, the membership of entities in subclasses can be determined by certain conditions or operations. Subclasses are classified as predicate-defined, attribute-defined, or user-defined based on how subclass membership is specified.
A predicate-defined subclass is one where we can explicitly determine which entities from the superclass will become members of the subclass based on a specific condition or predicate.
This condition acts as a constraint that determines the membership of entities in the subclass. In EER diagrams, the predicate condition is displayed by writing it next to the line connecting the subclass to its superclass.
In an attribute-defined specialization, all subclasses in the specialization are determined based on the value of the same attribute in the superclass. This attribute is called the defining attribute of the specialization. The value of this attribute dictates the subclass to which an entity belongs. In EER diagrams, the defining attribute is usually shown next to the line connecting the superclass and its subclasses.
In a user-defined subclass, there is no predefined condition or attribute value that determines membership in the subclass. Instead, membership is determined manually by the database users, who apply an operation to add specific entities from the superclass to the subclass. This allows more flexibility, as entities are assigned to subclasses on an individual basis, based on the users' decision, rather than by an automatic condition.

[Figures: predicate-defined class; attribute-defined class]

Displaying an attribute-defined specialization in EER diagrams:
The provided EER diagram illustrates an attribute-defined specialization based on the Job_type attribute. The EMPLOYEE entity type is specialized into three subclasses based on the values of the Job_type attribute:
- Employees with a Job_type of 'Secretary' have additional attributes like Typing_speed.
- Employees with a Job_type of 'Technician' have additional attributes like Tgrade.
- Employees with a Job_type of 'Engineer' have additional attributes like Eng_type.
The diamond shape connected to the EMPLOYEE entity type represents the specialization. The lines connecting the diamond to the subclasses indicate the specialization relationship. The Job_type attribute is placed next to the diamond to show that it is the defining attribute for the specialization.
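The attribute-defined specialization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Employee` record, the `SUBCLASS_BY_JOB_TYPE` mapping, the `specialize` helper, and the sample data are hypothetical names introduced here, not part of the lecture. The sketch shows that subclass membership is computed entirely from the value of the defining attribute `job_type`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Hypothetical EMPLOYEE record; job_type is the defining attribute."""
    ssn: str
    name: str
    job_type: str


# Assumed mapping from Job_type values to subclass names.
SUBCLASS_BY_JOB_TYPE = {
    "Secretary": "SECRETARY",
    "Technician": "TECHNICIAN",
    "Engineer": "ENGINEER",
}


def specialize(employees):
    """Return {subclass name: set of Ssn values} derived from job_type."""
    members = {name: set() for name in SUBCLASS_BY_JOB_TYPE.values()}
    for e in employees:
        subclass = SUBCLASS_BY_JOB_TYPE.get(e.job_type)
        if subclass is not None:
            members[subclass].add(e.ssn)
        # Employees with any other job_type remain members of EMPLOYEE only.
    return members


# Made-up sample data for illustration.
employees = [
    Employee("111", "Ana", "Secretary"),
    Employee("222", "Ben", "Engineer"),
    Employee("333", "Cy", "Technician"),
    Employee("444", "Dee", "Accountant"),
]
members = specialize(employees)
```

Because `job_type` holds a single value per employee, no employee can land in two subclasses, so a specialization defined this way is automatically disjoint. The employee with job type 'Accountant' belongs to no subclass, which makes this particular specialization partial.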
Constraints on Specialization and Generalization:
Constraints are used to control how entities are categorized into subclasses (specialization) or how subclasses combine into a superclass (generalization). These constraints define the structure and behavior of the superclass-subclass relationships in the model. The main types of constraints are disjointness and completeness.

1. Disjointness Constraint:
The disjointness constraint defines whether an entity in the superclass can belong to more than one subclass.
Disjoint: An entity in the superclass can belong to at most one subclass. Example: If Employee is specialized into Manager and Engineer using disjoint specialization, a particular employee can be either a Manager or an Engineer, but not both.
Overlapping: An entity in the superclass can belong to more than one subclass. Example: If Employee is specialized into Manager and Project_Lead, an employee can be both a Manager and a Project_Lead simultaneously.

2. Completeness Constraint:
The completeness constraint defines whether every entity in the superclass must be a member of at least one subclass.
Total: Every entity in the superclass must be a member of at least one subclass. Example: In a specialization of Vehicle into Car and Truck, if this is a total specialization, every vehicle must be either a car or a truck. In EER diagrams, total specialization is represented by a double line connecting the superclass to the specialization.
Partial: An entity in the superclass may or may not be a member of any subclass. Example: In a specialization of Person into Student and Employee, if this is a partial specialization, there can be persons who are neither students nor employees. In EER diagrams, partial specialization is represented by a single line connecting the superclass to the specialization.

Types of Specialization/Generalization:
1. Disjoint, Total:
Disjoint: No entity can belong to more than one subclass.
Total: All entities in the superclass must belong to one of the subclasses.
2. Disjoint, Partial:
Disjoint: No entity can belong to more than one subclass.
Partial: Not all entities in the superclass need to belong to a subclass.
3. Overlapping, Total:
Overlapping: An entity can belong to multiple subclasses.
Total: All entities in the superclass must belong to at least one subclass.
4. Overlapping, Partial:
Overlapping: An entity can belong to multiple subclasses.
Partial: Not all entities in the superclass need to belong to a subclass.

Total generalization: Every instance of the superclass must also be an instance of at least one subclass. (The converse always holds: every subclass instance is an instance of the superclass.) Example: If Person is the generalization of Student and Employee and the generalization is total, every Person is a Student, an Employee, or both.
Partial generalization: Some instances of the superclass need not belong to any subclass. Example: A Vehicle can be specialized into Car and Motorcycle; however, not all Vehicles are necessarily Cars or Motorcycles.

[Figure: example of disjoint partial specialization]

Example of overlapping total specialization:
The PART entity type is specialized into two subclasses based on the source of the part:
- MANUFACTURED_PART: Parts that are manufactured internally.
- PURCHASED_PART: Parts that are purchased from suppliers.
A PART can belong to both subclasses, meaning a part can be both manufactured and purchased. This is indicated by the overlapping lines connecting the PART entity type to the subclasses.

Specialization/Generalization Hierarchies and Lattices:
A subclass may itself have further subclasses. This forms a hierarchy or a lattice.
Hierarchy: In a hierarchy, each subclass has only one superclass, which is known as single inheritance. This structure resembles a tree. Each subclass inherits the attributes of its immediate superclass and of all its predecessor superclasses up the hierarchy.
Lattice: A lattice allows a subclass to have more than one superclass, which is known as multiple inheritance.
In this structure, a subclass can inherit attributes from multiple superclasses, creating a more complex relationship structure. A subclass with more than one superclass is called a shared subclass. In a lattice, lines can connect subclasses to multiple superclasses, forming a more interconnected web rather than a strict tree structure.
Both specialization and generalization can result in either hierarchies or lattices, depending on how the entities and relationships are modeled. We often use specialization as a general term to represent the result of either process, focusing on refining entity types into subclasses.

[Figures: hierarchy; lattice]

Shared Subclass "Engineering_Manager":
The EMPLOYEE entity type is the superclass, representing a general class of all employees. It has been specialized into several subclasses based on job roles and pay types.
The d inside the circle at the top of the diagram signifies that the job-role specialization is disjoint. This means that each employee can belong to at most one of the job-role subclasses (e.g., SECRETARY, TECHNICIAN, or ENGINEER); employees cannot be classified into several of these job roles at the same time. MANAGER has to come from a separate specialization of EMPLOYEE: if ENGINEER and MANAGER were disjoint, no entity could belong to their shared subclass ENGINEERING_MANAGER.
This structure forms a lattice rather than a hierarchy because ENGINEERING_MANAGER has multiple superclasses (ENGINEER and MANAGER), indicating multiple inheritance. In a hierarchy, each subclass would have only one direct superclass, but in a lattice, a subclass can inherit from more than one superclass.

Specialization/Generalization Shared Subclass:
Shared Subclasses: A shared subclass is a subclass that has multiple superclasses and hence participates in multiple inheritance.
Specialization follows a top-down approach, starting with a general entity type and breaking it down into more specific subclasses based on distinguishing characteristics. This is called a conceptual refinement process.
Example: Starting with the Person entity type and defining subclasses like Employee and Student is a specialization process.
Generalization follows a bottom-up approach, where we start with many specific entity types and combine them into a superclass that holds their common attributes. This is called a conceptual synthesis process.
Example: Starting with Engineer, Technician, and Manager, and generalizing them into an Employee superclass is a generalization process.
In practice, top-down specialization and bottom-up generalization are often employed together to model complex systems.

Categories (UNION TYPES):
Categories, also known as union types, are a construct for modeling a single collection of entities drawn from several different entity types.
A category (or union type) is a subclass of the union of two or more distinct superclasses. A member of the category does not inherit from all superclasses simultaneously; instead, it belongs to exactly one of the superclasses at any given time.
The superclasses of a category may represent different entity types, not necessarily specializations of a common superclass.
Example: In a vehicle registration database, the entity type OWNER can represent a PERSON, a BANK (holding a lien on the vehicle), or a COMPANY. The OWNER category is a union of these three distinct superclasses (PERSON, BANK, and COMPANY), and a member of OWNER can be a PERSON, a BANK, or a COMPANY, but not more than one at the same time.
This is different from a shared subclass because the OWNER category represents a union of its superclasses, not their intersection.
Categories are typically represented using a diamond shape connected to the entity type. Lines connect the diamond to the categories, indicating the possible memberships.
A shared subclass is a subset of the intersection of multiple superclasses, requiring an entity to belong to all of them.
A category is a subset of the union of its superclasses, meaning an entity can belong to one of the superclasses, but not all at the same time.

[Figure: two categories (UNION types): OWNER and REGISTERED_VEHICLE]

Formal Definitions of EER Model:
Class: A class represents a type of entity. It can be an entity type, a subclass, a superclass, or a category. In ER/EER models, using "class" instead of "entity type" can better capture relationships among various classes.
Subclass: A subclass is a subset of a superclass. It inherits all attributes and relationships from its superclass but is more specific. A subclass S of a class C inherits all attributes and relationships of C. The set of entities in S is always a subset of the set of entities in C (i.e., S ⊆ C). C is the superclass of S, and the relationship between S and C is referred to as a superclass/subclass relationship.
Superclass: The superclass C is the more general class that encompasses the subclass S. A superclass/subclass relationship exists between S and C, indicating that S is a specialized version of C.
Specialization: Specialization is when a superclass is divided into subclasses. A specialization Z = {S1, S2, ..., Sn} is a set of subclasses with the same superclass G. This means that G/Si is a superclass/subclass relationship for i = 1, ..., n. The superclass G is called a generalization of the subclasses {S1, S2, ..., Sn}.
- A specialization Z is total if the union of all the subclasses' sets of entities equals the set of entities of the superclass (S1 ∪ S2 ∪ ... ∪ Sn = G).
- A specialization Z is partial if the union of all the subclasses' sets of entities is not equal to the set of entities of the superclass, i.e., it is a proper subset of G.
- A specialization Z is disjoint if the sets of entities of any two distinct subclasses have no overlap (Si ∩ Sj = ∅ for i ≠ j).
- A specialization Z is overlapping if the sets of entities of at least two distinct subclasses have some overlap.
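The set conditions in these definitions can be checked mechanically. The following Python sketch models the superclass G and each subclass Si as a plain set of entity identifiers; the function names and the sample identifiers are assumptions made for illustration only. It classifies a specialization as disjoint or overlapping and as total or partial, directly from the definitions.

```python
from itertools import combinations


def is_total(superclass, subclasses):
    """Total: the union S1 ∪ S2 ∪ ... ∪ Sn equals G."""
    return set().union(*subclasses) == superclass


def is_disjoint(subclasses):
    """Disjoint: Si ∩ Sj is empty for every pair with i ≠ j."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))


def classify(superclass, subclasses):
    """Return (disjointness, completeness) for a specialization of G."""
    for s in subclasses:
        if not s <= superclass:
            raise ValueError("every subclass must be a subset of the superclass")
    disjointness = "disjoint" if is_disjoint(subclasses) else "overlapping"
    completeness = "total" if is_total(superclass, subclasses) else "partial"
    return disjointness, completeness


# PART example: one part (p2) is both manufactured and purchased.
part = {"p1", "p2", "p3"}
manufactured_part = {"p1", "p2"}
purchased_part = {"p2", "p3"}
part_result = classify(part, [manufactured_part, purchased_part])

# EMPLOYEE example: e4 has none of the three job roles.
employee = {"e1", "e2", "e3", "e4"}
employee_result = classify(employee, [{"e1"}, {"e2"}, {"e3"}])

print(part_result)      # ('overlapping', 'total')
print(employee_result)  # ('disjoint', 'partial')
```

With this sample data, the PART specialization comes out overlapping and total, and the EMPLOYEE job-role specialization comes out disjoint and partial, matching the two example types named earlier in the lecture.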
Superclass/subclass relationships have several benefits:
- Subclasses inherit properties and behaviors from their superclass, promoting reuse and modularity.
- Objects of different subclasses can be treated as objects of their common superclass, allowing for flexible and dynamic programming.
- Superclasses capture the commonalities among subclasses, providing a higher-level view of the domain.
- Superclass/subclass relationships are essential for modeling hierarchical structures and inheritance in database design.

Formal Definitions of EER Model:
Predicate-Defined Subclass: A subclass defined by a condition or predicate applied to attributes of the superclass. A subclass S of a class C is defined by a predicate p if membership in S is determined by whether entities in C satisfy p. Formally, S = C[p], where C[p] includes all entities in C that meet the condition p.
User-Defined Subclass: A subclass that is not defined by a predicate. Its members are chosen individually by the database users, according to the design or requirements of the system.
Attribute-Defined Specialization: Uses a specific attribute of the superclass to define subclasses. This is a specific form of specialization where each subclass Si is defined by a predicate involving an attribute A of the general class G. The predicate is of the form A = ci, where ci is a constant from the domain of A.
Category (UNION Type): A category is a class that represents the union of multiple superclasses. It includes entities that belong to any of the defining superclasses. A category or UNION type T is a class that is a subset of the union of n defining superclasses D1, D2, ..., Dn, where n > 1. This can be expressed as:
T ⊆ (D1 ∪ D2 ∪ ... ∪ Dn)
A predicate pi can be specified on the attributes of each Di to further specify which entities of Di are members of T. If a predicate is specified on every Di, then T can be defined as:
T = (D1[p1] ∪ D2[p2] ∪ ... ∪ Dn[pn])

Alternative Diagrammatic Notations:
ER Diagrams: The most common way to represent database schemas, particularly in relational database design.
Many other notations exist in the literature and are employed in various database design and modeling tools.
Alternative Notations:
- Information Flow Diagrams (IFDs): Focus on the flow of data within a system.
- Data Flow Diagrams (DFDs): Emphasize the transformation of data as it moves through a system.
- Semantic Data Models: Provide a more formal and precise representation of data, often used in knowledge representation and reasoning.
- Relational Algebra: A formal language used to describe database operations.

[Figure: UML example for displaying specialization/generalization]

[Figure: other alternative diagrammatic notations. (a) Entity type/class, attribute, and relationship symbols. (b) Displaying attributes. (c) Displaying cardinality ratios. (d) Various (min, max) notations. (e) Specialization/generalization notations.]

Knowledge Representation:
Knowledge Representation (KR): KR is concerned with modeling and representing a domain of knowledge. It aims to encode knowledge in a way that enables computers to understand, reason, and make decisions based on that knowledge.
Ontology: An ontology in KR is a formal representation of a set of concepts within a domain and the relationships between those concepts. It serves as a schema or framework for organizing and interpreting knowledge in a knowledge base.
Common Features with Data Models:
- Both KR and data models use similar abstractions such as classification (defining types), aggregation (grouping), generalization (hierarchies), and identification (unique identifiers).
- Both provide concepts, relationships, constraints, operations, and languages to represent knowledge and model data.
Differences from Data Models:
- KR has a broader scope, dealing with incomplete or missing knowledge, common-sense reasoning, and defaults.
- KR schemes typically include rules and mechanisms for inferencing (deducing new knowledge from existing knowledge).
- KR often involves both data and metadata and tends to mix the two rather than keep them apart.
In contrast, data modeling typically distinguishes between data and metadata.
- KR is often used in artificial intelligence systems for decision support and more complex reasoning tasks.

Data Modeling:
Data modeling focuses on structuring and organizing data within a system to ensure it is efficiently stored, retrieved, and managed.
- Entities and Attributes: Define what data will be stored (e.g., customers, orders) and the characteristics of that data (e.g., customer name, order date).
- Relationships: Describe how different entities are related (e.g., customers place orders).
- Constraints: Ensure data integrity and validity (e.g., a customer cannot place an order if they do not exist).

Additional Slides:
General Basis for Conceptual Modeling:
Conceptual modeling involves creating a high-level representation of a domain or system. It focuses on the essential concepts, relationships, and rules that define the domain. This representation is typically expressed using diagrams, such as Entity-Relationship (ER) diagrams or Unified Modeling Language (UML) diagrams.

Types of Data Abstractions:
Classification & Instantiation:
- Classification: Organizing entities into classes based on common characteristics (e.g., "Employee").
- Instantiation: Creating specific instances of classes (e.g., "John Doe").
Aggregation & Association:
- Aggregation: Whole-part relationship (e.g., "Library" contains "Books").
- Association: General relationships between entities (e.g., "Student" enrolls in "Course").
Generalization & Specialization:
- Generalization: Creating a generalized class from multiple classes (e.g., "Vehicle" from "Car" and "Truck").
- Specialization: Defining more specific subclasses (e.g., "SportsCar" from "Car").
Identification: Defining unique identifiers for instances (e.g., unique ID for "Employee").
Constraints: Rules ensuring data integrity (e.g., unique keys, domain constraints).
Cardinality:
- Min Cardinality: Minimum number of associations (e.g., "Professor" must teach at least one "Course").
- Max Cardinality: Maximum number of associations (e.g., "Student" can enroll in up to 5 "Courses").
Coverage:
- Total vs. Partial: Whether all superclass instances are represented in subclasses.
- Disjoint vs. Overlapping: Whether subclasses are mutually exclusive or can overlap (e.g., "Employee" and "Student" overlap when the same person can be both).

Thank you! Questions?