GUC Information & Communication Architecture 1 Lecture 4 PDF
Document Details
German University in Cairo
Ayman Al-Serafi
Tags
Summary
Lecture notes on database modelling. These lecture notes cover fundamental concepts and design, including logical designs and relational models. The lecture also discusses the importance of using entity-relationship diagrams (ERDs) as effective communication tools for database design.
Full Transcript
Information & Communication Architecture 1 [BINF 303] Information Management [INSY 711] GUC - Winter 2024 – Lecture 4 Database Modelling: Fundamental Concepts and Design II Dr. Ayman Al-Serafi TAs: Ali Sul...
Information & Communication Architecture 1 [BINF 303] Information Management [INSY 711] GUC - Winter 2024 – Lecture 4 Database Modelling: Fundamental Concepts and Design II Dr. Ayman Al-Serafi TAs: Ali Suleiman* (lead), Farida Kamel, Mariam Elemary, Hams Dorgham, Hagar Hany, Mohab Gehad Outline 1. Database Design (recap) 2. Conceptual Database Design: Entity – Relationship Model 3. Logical Database Design: The Relational Model 4. The seven rules to map an ER schema into a relational one (database schema) 5. Physical Database Design 6. A Query Language for Relational DBMSs (SQL) Q&A breaks 7. Conclusion between sections Urgent Qs only in between! 4-2 ICA I - GUC - Dr. Ayman Alserafi The Three-Level Database Model Level 1: Conceptual model The highest-level and most abstract model (blueprint) Think of: main entities, their attributes and their relationships Level 2: Logical model A more detailed “Technical” view of all corporate data in terms of the relational model Think of: tables, fields, primary keys, & foreign keys Level 3: Physical or storage level model Specifies the most detailed way data is physically stored in a specific DBMS Think of: data types of fields, database constraints, indexes, etc. 4-3 ICA I - GUC - Dr. Ayman Alserafi Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Database Design Designing a database system is a complex undertaking typically divided into four phases: 1. Requirements specification 2. Conceptual design 3. Logical design 4. Physical design 28/09/24 4 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Requirements specification Collects information about the users’ needs with respect to the database system and what information they need? Goal to elicit necessary and desirable system properties from prospective users, to homogenize requirement During this phase, the active participation of users will increase their satisfaction with the delivered system and avoid errors, which can be very expensive to correct if the subsequent phases have already been carried out, and to assign priorities to them. This is commonly called Business analysis / System analysis. 28/09/24 5 Step 1: Business Requirements We need to create a database for an application that stores movies, their actors, their cinema locations (and schedule of shows), and the online viewer ratings A movie can have many actors A movie is shown in many cinemas at different times A movie can only be rated once by a viewer, but can be rated by many viewers 4-6 ICA I - GUC - Dr. Ayman Alserafi Outline 1. Database Design (recap) 2. Conceptual Database Design: Entity – Relationship Model 3. Logical Database Design: The Relational Model 4. The seven rules to map an ER schema into a relational one (database schema) 5. Physical Database Design 6. Conclusion Q&A 4-7 ICA I - GUC - Dr. Ayman Alserafi The Importance of A Conceptual Data Model Blue print: official documentation E.g., like the blue print sketch of a house Business users w/o DB knowledge can understand a data model diagram vs. a list of tables Used as an effective Communication Tool Improve interaction among the managers, the designers, and the end users Independence from a particular logical data model or DBMS Relational DB, Object-oriented DB, etc. 4-8 ICA I - GUC - Dr. Ayman Alserafi STEP 2: Conceptual Database Design - Entity-relationship (ER) The entity-relationship (ER) model is one of the most often used conceptual models for designing database applications. A design (conceptual model) created using this approach is called Entity-Relationship Diagram (ERD) Show the main entities (object types modelled) as boxes, the attributes (fields) describing each entity as ovals, and the relationships between entities as diamonds / rhombus. A visual representation of the database design REMEMBER: An ERD conceptual design must match the business requirements (that is logical and makes sense), and different ERD designs can match different business needs and the same data can be modelled using different ERD designs depending on the business rules and applications! 4-9 ICA I - GUC - Dr. Ayman Alserafi Conceptual Data Modeling A database can be modeled as: a collection of entities, relationship among entities. An entity is an object that exists and is distinguishable from other objects. Example: specific person, company, event, plant All the instances / objects of this entity type share the same properties (attributes) Entities have attributes Attributes describe properties of the entity / object Example: people have names and addresses 4-10 ICA I - GUC - Dr. Ayman Alserafi Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Entity types Entity types are used to represent a set of real-world objects of interest to an application. Regular Employees, Orders, and Customers are examples of entity types. A specific object belonging to an entity type is called an Weak entity or an instance (e.g., employee John Smith) o Tangible: exist physically like: Person, Employee, Car, Product, Associative etc. o Intangible: exist only logically with no physical existence like: Bank Account, Sales Transaction, etc. 28/09/24 11 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Entity types Regular: normal entity that exists independent of any other entity Regular Weak: an entity that can only exist if another regular entity Weak exists Associative: an entity that is only conceptually linking 2 other Associative entities in a relationship, therefore exists only if 2 other regular entities take place in a many-to-many relationship. 28/09/24 12 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Example of ERD diagram Northwind conceptual Model Entity (objects) as boxes Attribute (fields) as circles Relationship between entities as diamonds 28/09/24 13 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Crow’s Foot notation and cardinalities (NOTATION USED IN COURSE!) 28/09/24 14 Information & communication Architecture I German University Faculty of Management We canTechnology underline(MNGT) BI the attribute (field) that is the primary key. An entity can have many attributes TOGETHER forming the in Cairo - GUC Department of Business Informatics (BI) primary key, underlining many attributes (e.g. EmployeeID can be replaced with FirstName and LastName BOTH as Crow’s foot notations primary key but both together must be unique and not repeating between employee instances in the database! Northwind conceptual Model Dir: 2 Dir: 1 E.g., Direction 1: A Supplier supplies 0 or many products and Direction 2: a product can have one and only one supplier (i.e., one and only 1 supplier) 28/09/24 16 Attributes on Relationships Sometimes it is useful to attach an attribute to a relationship. Think of this attribute as a property of tuples / instances in the relationship. 4-17 ICA I - GUC - Dr. Ayman Alserafi Example: Attribute on Relationship Cafe Sells Product price Price is a function of both the cafe and the product, not of one alone. 4-18 ICA I - GUC - Dr. Ayman Alserafi Alternative Diagrams Without Attributes on Relationships Create an entity representing values of the attribute. Make that entity participate in the relationship. Not the best way to do it if the attribute describes the relationship rather than a separate entity! 4-19 ICA I - GUC - Dr. Ayman Alserafi BAD Example: Removing an Attribute from a Relationship Cafe Sells Product Prices price 4-20 ICA I - GUC - Dr. Ayman Alserafi Conceptual E-R Design Decisions The use of an attribute or entity to represent an object. Whether a real-world concept is best expressed by an entity (associative?) or a relationship. The use of a strong or weak entity. Etc … The same business problem can be modelled in different ways (using different ERD diagrams). All of them can be correct if they satisfy the business requirements! Yet, one might be preferred over the other for performance reason (e.g. reduce redundancy), understandability, or future flexibility of the model! 4-23 ICA I - GUC - Dr. Ayman Alserafi Design Decision Hints 1. Avoid redundancy. Don’t repeat the same information with more than one entity / attribute 2. Limit the use of weak entity sets. Not very important in this course and in many practical situations! 3. Don’t use an entity when an attribute will do. 4-24 ICA I - GUC - Dr. Ayman Alserafi Avoiding Redundancy Redundancy = saying the same thing in two (or more) different ways. Information should appear only once in the ERD diagram and not be repeated in many places! Wastes space and (more importantly) encourages inconsistency. Two representations of the same fact become inconsistent if we change one and forget to change the other. 4-25 ICA I - GUC - Dr. Ayman Alserafi Example: Good name name addr Product ManfBy Manfs This design gives the address of each manufacturer exactly once. 4-26 ICA I - GUC - Dr. Ayman Alserafi Example: Bad name name addr Product ManfBy Manfs manf This design states the manufacturer of a product twice: as an attribute and as a related entity. 4-27 ICA I - GUC - Dr. Ayman Alserafi Example: Bad name manf manfAddr Product This design is bad because: 1. Repeats the manufacturer’s address once for each product (redundant) 2. Loses the address if there are temporarily no product for a manufacturer. 3. There can also be inconsistency of the same manufacturer address by storing different addresses in different product instances! Therefore, manufacturer should clearly be made as an independent / separate entity ICA I - GUC - Dr. Ayman Alserafi 4-28 Entity Versus Attributes An entity should be created if it satisfies at least one of the following conditions: It is more than the name of something; it has at least one non-key attribute. or It is the “many” in a many-one or many-many relationship. 4-29 ICA I - GUC - Dr. Ayman Alserafi Example: Good name name addr Product ManfBy Manfs Manfs deserves to be an entity because of the nonkey attribute addr. Product deserves to be an entity because it is the “many” of the many-one relationship ManfBy. 4-30 ICA I - GUC - Dr. Ayman Alserafi Example: Good name manf Product There is no need to make the manufacturer an entity, because we record nothing about manufacturers besides their name. 4-31 ICA I - GUC - Dr. Ayman Alserafi Example: Bad name name Product ManfBy Manfs Since the manufacturer is nothing but a name, and is not at the “many” end of any relationship, it should not be a separate entity. 4-32 ICA I - GUC - Dr. Ayman Alserafi Step 2: Conceptual Design List the main entities Movie Rating Actor Movie Viewer Location Cinema 4-34 ICA I - GUC - Dr. Ayman Alserafi Step 2: Conceptual Design List the main entities and relationships between them Movie Rating Makes HasA ActsIn Actor Movie Viewer ShownIn Location IsAt Cinema 4-35 ICA I - GUC - Dr. Ayman Alserafi Step 2: Conceptual Design Add the cardinalities Movie Rating Makes HasA ActsIn Actor Movie Viewer ShownIn Location IsAt Cinema 4-36 ICA I - GUC - Dr. Ayman Alserafi Step 2: Conceptual Design Add the Attributes Expiry Rating Creation Date Date Country of Birth First Name Movie Rating Salary Surname Makes Name Birthdate HasA ActsIn Production Year Actor Movie Duration Viewer Start End Genre Date Date Latitude Name Username Password ShownIn Date Longitude Location IsAt Cinema Time Zip Code Name 4-37 ICA I - GUC - Dr. Ayman Alserafi Outline 1. Databases: an introduction 2. Database Design 3. Conceptual Database Design: Entity – Relationship Model 4. Logical Database Design: The Relational Model 5. The seven rules to map an ER schema into a relational one (database schema) 6. Physical Database Model 7. A Query Language for Relational DBMSs (SQL) 8. Conclusion Q&A 4-38 ICA I - GUC - Dr. Ayman Alserafi Step 3: Logical design Aims at translating the conceptual representation of the database obtained in the previous phase into the relational model common to several DBMSs. Tables + Columns + Primary Keys + Foreign Keys + Relationships Currently, the most common logical model is the relational model. To ensure an adequate logical representation, we specify a set of suitable mapping rules that transform the constructs in the conceptual model to appropriate structures of the logical model. 4-39 ICA I - GUC - Dr. Ayman Alserafi Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) The relational Model Stores entities as tables Stores attributes as column fields Links entities having a relationship 28/09/24 40 The seven rules to map an ER schema into a relational one Rule 1: A regular strong entity type E is mapped to a table T containing the simple monovalued attributes and the simple components of the monovalued complex attributes of E. The identifier of E defines the primary key of T. For example, the strong entity type Employees is mapped to the table Employess as shown in the figure with key EmployeeID. T also defines non-null constraints for the mandatory attributes. Note that additional attributes will be added to this table by subsequent rules. ICA I - GUC - Dr. Ayman Alserafi 4-41 Step 3: Logical Design Rule 1 4-42 ICA I - GUC - Dr. Ayman Alserafi NOT VERY IMPORTANT FOR THIS COURSE! Rule 2: Let us consider a weak entity type W, with owner (strong) entity type O. Assume Wid is the partial identifier of W, and Oid is the identifier of O. W is mapped in the same way as a strong entity type, that is, to a table T. In this case, T must also include Oid as an attribute, with a referential integrity constraint to attribute O.Oid. Moreover, the identifier of T is the union of Wid and Oid. 4-43 ICA I - GUC - Dr. Ayman Alserafi Step 3: Logical Design Rule 2 4-44 ICA I - GUC - Dr. Ayman Alserafi A foreign key is the primary key attribute(s) of an entity (table) referenced inside another entity (table) that is linked to it in a relationship, e.g., StudentID in “Contact Info table” Rule 3: A regular binary one-to-one relationship type R between two entity types E1 and E2, which are mapped, respectively, to tables T1 and T2 is mapped embedding the identifier of T1 in T2 as a foreign key. In addition, the simple monovalued attributes and the simple components of the monovalued complex attributes of R are also included in T2. This table also defines non-null constraints for the mandatory attributes. 4-45 ICA I - GUC - Dr. Ayman Alserafi Step 3: Logical Design Rule 3 4-46 ICA I - GUC - Dr. Ayman Alserafi Rule 4: Consider a regular binary one-to-many relationship type R relating entity types E1 and E2, where T1 and T2 are the tables resulting from the mapping of these entities. R is mapped embedding the key of T2 in table T1 as a foreign key. In addition, the simple monovalued attributes and the simple components of the monovalued complex attributes of R are included in T1, defining the corresponding non-null constraints for the mandatory attributes. SupplierID 4-47 ICA I - GUC - Dr. Ayman Alserafi Step 3: Logical Design Rule 4 4-48 ICA I - GUC - Dr. Ayman Alserafi Rule 5: Consider a regular binary many-to-many relationship type R between entity types E1 and E2, such that T1 and T2 are the tables resulting from the mapping of the former entities. R is mapped to a table T containing the keys of T1 and T2, as foreign keys. The key of T is the union of these keys. Alternatively, the relationship identifier, if any, may define the key of the table. T also contains the simple monovalued attributes and the simple components of the monovalued complex attributes of R and also defines non-null constraints for the mandatory attributes. 4-49 ICA I - GUC - Dr. Ayman Alserafi Example 4-50 ICA I - GUC - Dr. Ayman Alserafi Step 3: Logical Design Rule 5 4-51 ICA I - GUC - Dr. Ayman Alserafi A multivalued attribute (with 2 round circles) is a special attribute type. It means that the attribute can exist many times for the same entity object, e.g., an Employee can have many skills like 1. making telephone calls, 2. presentation skill, 3. using an Rule 6: A multivalued ERP system function, … etc. attribute of an entity or relationship type E is mapped to a table T , which also includes the identifier of the entity or relationship type. A referential integrity constraint relates this identifier to the table associated with E. The primary key of T is composed of all of its attributes. 4-52 ICA I - GUC - Dr. Ayman Alserafi Step 3: Logical Design Rule 6 4-53 ICA I - GUC - Dr. Ayman Alserafi Outline 1. Databases: an introduction 2. Database Design 3. Conceptual Database Design: Entity – Relationship Model 4. Logical Database Design: The Relational Model 5. The seven rules to map an ER schema into a relational one (database schema) 6. Physical Database Design 7. Conclusion Q&A 4-54 ICA I - GUC - Dr. Ayman Alserafi Physical design Aims at customizing the logical representation of the database obtained in the previous phase to a physical model targeted to a particular DBMS platform. Specific data types of each column defined according to DBMS supported datatypes The objective of physical database design is to specify how database records are stored, accessed, and related in order to ensure adequate performance of a database application. Mainly focus on consuming less storage space while meeting business requirements! 4-55 ICA I - GUC - Dr. Ayman Alserafi Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Physical Database Design 28/09/24 56 Step 4: Physical Design 4-57 ICA I - GUC - Dr. Ayman Alserafi Outline 1. Databases: an introduction 2. Database Design 3. Conceptual Database Design: Entity – Relationship Model 4. Logical Database Design: The Relational Model 5. The seven rules to map an ER schema into a relational one (database schema) 6. Physical Database Design 7. A Query Language for Relational DBMSs (SQL) 8. Conclusion Q&A 4-58 ICA I - GUC - Dr. Ayman Alserafi Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) A Query Language for Relational DBMSs SQL (structured query language) is the most common language for creating, manipulating, and retrieving data from relational DBMSs. SQL is composed of several sublanguages. o The data definition language (DDL) is used to define the schema of a database (CREATE / ALTER / DROP tables). o The data manipulation language (DML) is used to modify the content of a database (i.e., to insert, update, and delete data). o The data query language (DQL) is used to query the database to retrieve desired records (SELECT data records). Can learn more at: https://www.w3schools.com/sql/ 28/09/24 59 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) DDL: CREATE TABLE, DROP TABLE & ALTER TABLE The basic DDL statement is CREATE TABLE, which creates a relation and defines” o data types of the attributes, o the primary and foreign keys, o and the constraints. SQL provides a DROP TABLE statement for deleting a table and an ALTER TABLE statement for modifying the structure of a table. 28/09/24 60 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) SQL DDL: CREATE TABLE - Example CREATE TABLE Movie_Rating ( Movie_Name CHARACTER VARYING (100) NOT NULL PRIMARY KEY, Movie_Production_Year DATE NOT NULL PRIMARY KEY, Viewer_Username CHARACTER VARYING (50) NO NULL PRIMARY KEY, Rating INTEGER NOT NULL, Creation_Date DATE NOT NULL, Expiry_Date DATE, FOREIGN KEY (Movie_Name) REFERENCES Movie(Name), FOREIGN KEY (Movie_Production_Year) REFERENCES Movie(Production_Year), FOREIGN KEY (Viewer_Username) REFERENCES Viewer(Username), CHECK (Creation_Date < Expiry_Date) ); 28/09/24 61 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) SQL: The DML The DML part of SQL is used to insert, update, and delete tuples from the database tables. For example, the following INSERT statement INSERT INTO Movie (Name, Production_Year, Duration) VALUES (‘Men In Black', 1997, 98) ; Example for update: This tuple is modified by the following UPDATE statement: UPDATE Movie_Genre SET Genre=‘Sci-Fi' WHERE Genre=‘Science Fiction' AND Movie_Name = ‘Men In Black’ AND Movie_Production_Year = 1997 ; Finally, a record is removed in the following DELETE statement DELETE FROM Viewer WHERE Username=‘TheRealJunky997' 28/09/24 62 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) SELECT: The DQL SQL also provides statements for retrieving data from the database. The basic structure of an SQL expression is: SELECT ⟨ list of attributes ⟩ FROM ⟨ list of tables ⟩ WHERE ⟨ condition ⟩ o ⟨ list of attributes ⟩ indicates the attribute names whose values are to be retrieved by the query, o ⟨ list of tables ⟩ is a list of the relation names that will be included in the query, o and ⟨ condition ⟩ is a Boolean expression that must be satisfied by the tuples in the result. The semantics of an SQL expression E.g., SELECT R.A, S.B FROM R, S WHERE R.B = S.A 28/09/24 63 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Examples SELECT AVG(Rating) FROM Movie_Rating WHERE Movie_Name LIKE 'Men in Black' AND CreationDate >= '2021-10-01' AND ExpiryDate IS NULL ; 28/09/24 64 Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) JOIN: From multiple tables (alternative to multiple tables in From clause) SELECT Name, Genre FROM Movie M, Movie_Genre G WHERE M.name = G.Movie_name AND M.Production_Year = G.Movie_Production_Year ; An alternative formulation of this query is as follows: SELECT Name, Genre FROM Movie M JOIN Movie_Genre G ON M.name = G.Movie_name AND M.Production_Year = G.Movie_Production_Year ; 28/09/24 65 Outline 1. Databases: an introduction 2. Database Design 3. Conceptual Database Design: Entity – Relationship Model 4. Logical Database Design: The Relational Model 5. The seven rules to map an ER schema into a relational one (database schema) 6. Physical Database Design 7. Conclusion Q&A 4-66 ICA I - GUC - Dr. Ayman Alserafi Conclusion Database Design Fundamentals Conceptual design: ERD Logical design: Relational Model Physical design: Physical DB schema definition A Query Language for Relational DBMSs (SQL) Information & communication Architecture I German University Faculty of Management Technology (MNGT) BI in Cairo - GUC Department of Business Informatics (BI) Source: Data-Centric Systems and Applications - Data Warehouse Systems - Design and Implementation Chapter 2 Database Concepts Download: https://link.springer.com/book/10. 1007%2F978-3-642-54655-6 Alejandro Vaisman Esteban Zimanyi 28/09/24 68 THANK YOU FOR YOUR ATTENTION NEXT WEEK: Data warehouse Modelling: Design TUTORIAL: Multidimensional Modelling