FDM - Week 1 - Day 3 - Data Modeling and Design.pdf
Document Details
Full Transcript
Applied A.I. Solutions Foundations of Data Management Professor Daniel Vitaver, B.Sc., EMBA [email protected] 1 DATA MODELING AND DESIGN 2 The DAMA Wheel...
Applied A.I. Solutions Foundations of Data Management Professor Daniel Vitaver, B.Sc., EMBA [email protected] 1 DATA MODELING AND DESIGN 2 The DAMA Wheel Data Data Data Modeling& Architecture Modeling &Design Design Data Quality Data Storage & Operations Data Metadata Governance Data Data Governance Security Data Integration DW & BI & Inter- operability Ref & Doc & Master Content Data Mgmt. 1 Main source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 3 INTRODUCTION Data Models are a critical component of DM and enable organizations to understand its data assets Common schema are relational, dimensional, object-Oriented, NoSQL Models exist at a conceptual, logical, and physical level. Each model contains a set of components: entities, relationships, facts, keys, and attributes 1 Main source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 4 DM&D Framework Definition Data Modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the Data Model. 1 Main source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 5 Principles Formalization (data model documents ) Scope Definition (boundaries for data context ) Knowledge retention / documentation 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 6 Goals To confirm and document an understanding of different perspectives, which leads to applications that align with business requirements Business Driver Inputs Activities Deliverables Data architecture Existing data models 1. Plan for Data Modeling Data standards 2. Build the Data Models 1. Data Models Data sets (conceptual, logical, – conceptual Initial data physical) – logical requirements 3. Review the Data Models – physical Original data 4. Manage the Data Model requirements 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 7 Inputs Activities Deliverables Suppliers Participants Consumers Data Architects Business Analysts DBAs, and Developers DBAs, Developers Data Modelers Business Analysts Business Analysts Data Modelers SME Software Developers Data Stewards Data Stewards Metadata Admin Data Quality Analysts Technical Drivers Techniques Tools Metrics Naming conventions Data modeling tools Data model validation Database design Lineage tools measurement Database type selection Metadata repository Data model patterns Industry data models 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 8 Data Models Provide a common vocabulary around data Capture and document explicit knowledge about an organization’s data and systems Serve as primary communication tool during projects Data Models contains 4 main building blocks: o Entities o Relationships o Attributes o Domains 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 9 Entities Entity is a thing that exists “per se”, separate from other things It is a thing about which the organization collects information They are referred to as nouns The entity represents the answer to: o Who o What o When, Where, Why o How 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 10 Relationship It captures the high-level interactions or association between conceptual entities (Chen, 1976) The detailed interactions between logical entities The constraints between physical entities 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 11 Relationship Cardinality It captures how many entity-instances participate in a relationship with how many of the other entity The most common form of notation is Information Engineering (IE) syntax (crow’s foot) Crow’s foot notation 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 12 Attributes An attribute is a property that identifies, describes, or measures an entity. The physical correspondent of an attribute in an entity is a column, field or tag, or node in a table, view, document, graph or file Keys Are a set of one or more attributes that uniquely defined an instance of an entity 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 13 Primary Key It uniquely identifies a record in the relational database table A primary key is used to ensure data in the specific column is unique Only one primary key is allowed in a table It does not allow NULL Values cannot be deleted from the parent table. Examples of primary keys include Unique last name, Social security number, online username 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 14 Secondary Key A secondary key provides a secondary reference point for objects whose primary keys do not adequately distinguish them for reference purposes It is used for identification of rows but not usually unique We can have multiple secondary key per table Allows NULL values. Examples of secondary keys include street address number, phone number, middle name 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 15 Foreign Key A foreign key is used in physical and sometimes logical relational data modeling schemes to represent relationships among entities A foreign key is created implicitly when a relationship is defined between entities, depending on the database technology or the data modeling tools, and whether the two entities have mutual dependencies 16 Domain A complete set of possible values that an attribute can be assigned. Provides a means of standardizing the characteristics of the attributes All values inside a domain are valid values, outside are invalid Attributes should not contain values outside its domain Domain constraints are specific rules that restrict a domain Domains can be defined by data type, format, list, range, rule-based 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 17 Logical Level Entity PK = FK1 + FK2 attribute Student Registration Course Student Number (PK) Student Number (FK1) register Course Code (PK) Student First Name Course Code (FK2) Course Name Student Last Name Registration Date Student Birth Date attended by Business Rules Each student may register to none or many courses Independent Entity Each course may be attended by none or many students Dependent Entity 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 18 Relational Scheme Relational Theory provides a systemic way to organize data so that they reflected their meaning (Codd, 1970) The design objective is to have an exact expression of business data and to have one fact in one place (with no redundancies) It is ideal for the design of operational systems (transactional database) 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 19 Dimensional Scheme Data is structured to optimize the query and analysis Dimensional models focus on a particular business process Geography Country Region Zone Conceptual Level Admissions Calendar Year Semester Name Level School Yes/No Financial Aid 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 20 Normalization Group-study and Self-study 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 21 Normalization Rules to organize business complexity into stable data structures Keeps attributes in only one place to eliminate redundancy, inconsistencies Rules sort attributes according to the PK and FKs 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 22 Pros and Cons Increases data consistency as it avoids the duplicity Helps in grouping related data under the same schema Improves searching faster as indexes can be created faster (OLTP) Delays the retrieving of data as more table joins are needed (performance) Normalization is not a good option in OLAP transactions 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 23 1NF: each entity has a valid PK, every attribute depends on the PK 2NF: each entity has the minimal PK, every attribute depends on the complete PK 3NF: each entity has no hidden PKs; each attribute depends on no attributes outside the whole key 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 24 Boyce / Codd normal form (BCNF): resolves overlapping composite candidate keys 4NF: resolves many-to-many-to-many relationships (and beyond) in pairs until entities cannot be broken down into smaller pieces 5NF: resolves inter-entity dependencies into basic pairs, and all join dependencies use parts of the PK 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 25 Normalization – cont’d 1NF EmployeeNumber LastName FirstName AreaName AreaCity AreaCountry 1111 Andrews Jack Accounting Toronto Canada 1115 Smith Mike Technology Toronto Canada 1220 Jones Harry HR New York USA 1250 Harvey John Admin London UK 1250 Harvey John HR London UK 1 Source: Copyright © 2023 DAMA International – DMBOK2 - Technics Publications, Basking Ridge, New Jersey, USA 26 2NF EmployeeNumber LastName FirstName EmpAreaID EmployeeNumber AreaNumber 1111 Andrews Jack 1 1111 10 1115 Smith Mike 2 1115 20 1220 Jones Harry 3 1220 30 1250 Harvey John 4 1250 40 1250 Harvey John 5 1250 30 Table A Table C AreaNumber AreaName AreaCity AreaCountry 10 Accounting Toronto Canada 20 Technology Toronto Canada Table B 30 HR New York USA 40 Admin London UK 27 3NF Customer Table Transitive dependency CustomerID CustomerZIP 1111 10110 CustomerID CustomerZIP CustomerCity 1115 15000 1111 10110 New York 1115 15000 San Diego CustZIP Table CustomerZIP CustomerCity dependent dependent 10110 New York Customer Table 15000 San Diego 1 Source: Software Testing Help https://www.softwaretestinghelp.com/ 28 29