Data Management Course - MSc Data Analytics 2024-2025 Bordeaux PDF
KEDGE Business School
Milad Poursoltan
Summary
This document contains lecture notes from a data management course in the MSc Data analytics for business program at KEDGE Business School. It covers topics such as introduction to data, data and information, data modeling, data governance and several exercises and questions regarding these topics.
Full Transcript
Data Management
MSc Data analytics for business, 2024-2025, Bordeaux
Dr. Milad Poursoltan

Dr. Milad POURSOLTAN
- Scientific Officer, Digital Twin
- Researcher, Digital Twin
- Ph.D. Cyber-Physical and Human Systems
- M.Sc. Computer Science
- M.Sc. Industrial Engineering
- Bachelor Industrial Engineering

8 sessions
- Written exam: 50% (individual)
- Practical exam: 40% (group of two)
- Presentation: 10% (group of three)

And you?
Collect data about your classmates:
- What is her/his first name?
- In what field did he/she study?
- What is his/her favourite job in data science?
- To what extent does he/she know about databases and data management?
You only have 30 minutes to do it. You should protect your data, and you are not allowed to check your data with others.

- How many students are in the class?
- How many different jobs are students interested in?
- What field of study have most students done?
- What percentage of students have knowledge about data management?
- What is the most common first name?
You only have 10 minutes to do it.

Form groups of four. Discuss and propose solutions to overcome these challenges, e.g., data collection protocols, defining data quality standards, etc. You only have 20 minutes to do it.

Content
- Introduction to data and data management
- Data Modeling & Relational Database
- Structured Query Language (SQL)
- MariaDB and MySQL software
- Emerging techniques and methods

Part 1: Data and Data Management

1.1. Data, information, knowledge, and wisdom
Data represent facts about the world. But ‘facts’ are not always simple. Data has been called the “raw material of information” and information has been called “data in context”.
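To make the idea of "data in context" concrete, the five questions from the classroom exercise can be answered from a small list of survey records. The sketch below is illustrative only: the records, the field names (`first_name`, `field`, `job`, `knows_dm`) and the `normalize` helper are assumptions, not part of the course material. It also hints at why a collection protocol matters: without normalization, "Alice" and "alice " would be counted as two different names.

```python
from collections import Counter

# Hypothetical survey records; names, fields and values are made up for illustration.
records = [
    {"first_name": "Alice", "field": "Economics", "job": "Data Analyst", "knows_dm": True},
    {"first_name": "alice ", "field": "economics", "job": "Data Engineer", "knows_dm": False},
    {"first_name": "Bob", "field": "Computer Science", "job": "Data Analyst", "knows_dm": True},
    {"first_name": "Chen", "field": "Economics", "job": "Data Scientist", "knows_dm": False},
]


def normalize(text: str) -> str:
    """Trim whitespace and unify letter case so equal answers compare equal."""
    return text.strip().title()


def summarize(rows):
    """Answer the five exercise questions for a non-empty list of records."""
    names = Counter(normalize(r["first_name"]) for r in rows)
    fields = Counter(normalize(r["field"]) for r in rows)
    jobs = {normalize(r["job"]) for r in rows}
    return {
        "students": len(rows),
        "distinct_jobs": len(jobs),
        "top_field": fields.most_common(1)[0][0],
        "pct_knows_dm": 100 * sum(r["knows_dm"] for r in rows) / len(rows),
        "top_first_name": names.most_common(1)[0][0],
    }


summary = summarize(records)
print(summary)
```

The raw records are data; the computed summary is information, because each number answers a question someone actually asked.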
Information has also been defined by information scientists and professionals in other ways, such as:
- Synonym for data;
- Near synonym for fact;
- Something one did not know before, news;
- Something that changes someone's beliefs or knowledge;
- Something that changes someone's expectations;
- Something that changes someone's uncertainty about a particular situation.

When we apply information to achieve our goals, we turn it into knowledge. Wisdom is knowledge applied in action.

Example:
- Data: 50 is numerical data.
- Information: 50 km/h is the speed limit on the streets.
- Knowledge: For our safety, we should not exceed the speed limit of 50 km/h on the streets.
- Wisdom: We drive at a speed of less than 50 km/h on the streets.

1.2 Organization and Data Management
Within an organization, it may be helpful to draw a line between information and data for purposes of clear communication about the requirements and expectations of different uses by different stakeholders.
An asset is an economic resource that can be owned or controlled to produce value. Assets can be converted to money. Data is widely recognized as an enterprise asset.

1.2 Organization and Data Management: Data Asset
Physical assets can be pointed to, touched, and moved around. They can be in only one place at a time. Data is different:
- Data is not tangible.
- The value of data often changes as it ages.
- Data is easy to copy and transport.
- Data is not easy to reproduce if it is lost or destroyed.
- Because it is not consumed when used, it can even be stolen without being gone.
- Data is dynamic and can be used for multiple purposes.
- The same data can even be used by multiple people at the same time.

1.3 The data life cycle
- Collecting
- Processing
- Storing and securing
- Using
- Sharing
- Archiving, Reusing, Destroying
1.4 Data management
Data management is the practice of collecting, organizing, and accessing data to support productivity, efficiency, and decision-making.

1.5 Data Management Challenges
- Data Differs from Other Assets
- Data Valuation
- Data Quality
- Data Handling Ethics
- ...

1.6 Data Management Frameworks
- DMBOK (Data Management Body of Knowledge)
- Strategic alignment model
- DCAM (Data Management Capability Assessment Model)
- ...
The DMBOK is published by the Data Management Association (DAMA). The first edition (DAMA-DMBOK) appeared in 2009 and the second edition (DMBOK2) in 2017. DAMA also provides a professional data management certification for individuals, known as Certified Data Management Professional (CDMP).

1.7 Data Management Framework
[Figure: data management framework. Source: DAMA International, DMBOK2, 2017]

1.4 Data management and data governance
Data Governance (DG) is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets. The purpose of DG is to ensure that data is managed properly according to policies and best practices (Ladley, 2012).
[Figure: data management and data governance, contrasting controlling and monitoring activities with executing activities]

1.5.1 Data Architecture
In ISO/IEC 42010, architecture is defined as “the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution.”
Enterprise Architecture: providing a visual blueprint of the organization and showing key interrelationships between data, processes, applications, technologies, and more.
Data Architecture: identifying the data needs of the enterprise (regardless of structure), and designing and maintaining the master blueprints to meet those needs.
[Figure: Data Architecture within Enterprise Architecture]

1.5.1 Data Architecture: Data lineage
[Figures: data lineage example]

1.5.2 Data Modeling and Design
Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model.
Data models are built at three levels: conceptual (CDM), logical (LDM), and physical (PDM).

1.5.2 Data modeling: Data modeling schemes
The six most common schemes used to represent data:
- Relational
- Dimensional
- Object-Oriented
- Object Role Modeling (a fact-based scheme)
- Time-Based (Anchor Modeling)
- NoSQL

1.5.3 Data Storage and Operations
Data Storage and Operations includes the design, implementation, and support of stored data, to maximize its value throughout its lifecycle.
Database: any collection of stored data, regardless of structure or content. Some large databases refer to instances and schemas.
[Figure: database architecture types]
[Figure: the three basic requirements for a distributed architecture (CAP theorem: consistency, availability, and partition tolerance)]
1.5.4 Data Security
Data Security includes the planning, development, and execution of security policies and procedures to provide proper authentication, authorization, access, and auditing of data and information assets.
[Figure: sources of data security requirements (stakeholder concerns, government regulation, legitimate business concerns, necessary business access needs) and the four A's (access, audit, authentication, authorization)]
Example of requirements: General Data Protection Regulation (GDPR).

1.5.5 Data Integration and Interoperability
Data Integration and Interoperability describes processes related to the movement and consolidation of data within and between data stores, applications, and organizations.
- Data Integration consolidates data into consistent forms.
- Data Interoperability is the ability of multiple systems to communicate.

1.5.6 Document and Content Management
Document and Content Management entails controlling the capture, storage, access, and use of data and information stored outside relational databases (unstructured data).
- Content refers to the data and information inside the file, document, or website.
- Document management: storage, inventory, and control of electronic and paper documents ("documents and records").
- Content management: the processes, techniques, and technologies for organizing, categorizing, and structuring access to information content, resulting in effective retrieval and reuse.
Standards: e.g., ISO 9001:2015, GARP, ...
1.5.7 Reference and Master Data
Reference and Master Data: managing shared data to meet organizational goals, reduce risks associated with data redundancy, ensure higher quality, and reduce the costs of data integration.
Reference Data is any data used to characterize or classify other data, or to relate data to information external to an organization.
[Figure: reference data example]
Master Data is data about the business entities (e.g., employees, customers, products, financial structures, assets, and locations) that provide context for business transactions and analysis.
[Figure: master data example]

1.5.8 Data Warehousing and Business Intelligence
Data Warehousing and Business Intelligence: planning, implementation, and control processes to provide decision support data and support decision makers in reporting, query, and analysis.
A data warehouse is designed to support business intelligence (BI) activities by providing a platform for querying, reporting, and data analysis.
The term Business Intelligence (BI) has two meanings. First, it refers to a type of data analysis aimed at understanding organizational activities and opportunities. Results of such analysis are used to improve organizational success. Second, Business Intelligence refers to a set of technologies that support this kind of data analysis. An evolution of decision support tools, BI tools enable querying, data mining, statistical analysis, reporting, scenario modeling, data visualization, and dashboarding.
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.
- Subject-oriented implies that the data are organized around subjects such as customers, products, sales, etc.
- Integrated means that the data warehouse integrates data from a variety of operational sources and a variety of formats.
- Time-variant refers to the fact that the data warehouse essentially stores a time series of periodic snapshots.
- Non-volatile implies that the data are primarily read-only, and will thus not be frequently updated or deleted over time.

Types of data store:
- Database: transactional processing, storing and retrieving data.
- Data Lake: stores raw data in its native format, regardless of structure or type.
- Data Warehouse: designed for analytical processing, providing a centralized repository for integrating data from various sources for reporting and analysis.
- Data Mart: a subset of a data warehouse, focusing on a specific business department or function.

1.4.8 Metadata Management
Metadata is “data about data”. Business Intelligence tools produce various types of metadata. Metadata is a kind of data, and it should be managed as such.
Metadata is typically categorized into three types:
- descriptive (describes the content of the data);
- structural or technical (information about the technical details and systems that store the data);
- administrative or operational (details about the processing and accessing of data).
Example: a digital image of a painting might have the following metadata:
- Descriptive: Title: "Mona Lisa"; Author: Leonardo da Vinci; Subject: Portrait, Renaissance art.
- Structural: File Format: JPEG; File Size: 2.5 MB; Dimensions: 780x570 pixels; Resolution: 96 PPI.
- Administrative: Creator: Leonardo da Vinci; Owner: Louvre Museum; Access Rights: Public; Copyright: Public domain.

1.4.9 Data quality
The term data quality refers both to the characteristics associated with high-quality data and to the processes used to measure or improve the quality of data.
- The quality of data should be managed across the data lifecycle.
- Data quality management applies quality management techniques to data in order to ensure that it is fit for consumption and meets the needs of data consumers.
- Formal data quality management is similar to continuous quality management for other products.
- A Data Quality program should focus on the data most critical to the enterprise and its customers.
- The focus of a Data Quality program should be on preventing data errors.

In 2013, DAMA UK produced a white paper describing six core dimensions of data quality:
- Completeness: the proportion of data stored against the potential for 100%.
- Uniqueness: no entity is recorded more than once, based on how that entity is identified.
- Timeliness: the degree to which data represent reality from the required point in time.
- Validity: data is valid if it conforms to the syntax (format, type, range) of its definition.
- Accuracy: the degree to which data correctly describes the ‘real world’ object or event being described.
- Consistency: the absence of difference when comparing two or more representations of a thing against a definition.
[Figures: data quality]

Group exercise N.2
Objective: To gain a practical understanding of data governance concepts and the ability to analyze related problems in various scenarios.
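As a worked illustration of the data quality dimensions discussed under Data quality, the sketch below scores a tiny dataset on four of them. Everything in it is an assumption made for illustration: the customer records, the simplified e-mail pattern, and the 90-day freshness threshold used as a rough proxy for timeliness. Uniqueness is taken here to mean that no record is stored more than once under the same identifier. Accuracy and consistency are left out because they require a trusted reference or a second representation to compare against.

```python
import re
from datetime import date

# Hypothetical customer records; None marks a missing value.
rows = [
    {"id": 1, "email": "ana@example.com", "updated": date(2024, 9, 1)},
    {"id": 2, "email": None, "updated": date(2024, 9, 3)},
    {"id": 2, "email": "bo@example.com", "updated": date(2023, 1, 15)},
    {"id": 4, "email": "not-an-email", "updated": date(2024, 8, 20)},
]

# Simplified syntax rule for an e-mail address (an assumption, not a full validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Number of distinct identifiers divided by the number of rows."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that match the declared syntax."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows refreshed within max_age_days of the reference date."""
    return sum((as_of - r[field]).days <= max_age_days for r in rows) / len(rows)


report = {
    "completeness": completeness(rows, "email"),
    "uniqueness": uniqueness(rows, "id"),
    "validity": validity(rows, "email", EMAIL_PATTERN),
    "timeliness": timeliness(rows, "updated", date(2024, 9, 30), 90),
}
print(report)
```

Each score is a ratio between 0 and 1, so the same report can be recomputed over time to check whether preventive measures are actually reducing data errors.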
Group exercise N.2
Since there is interaction between the knowledge areas, you should now consider the information from the other knowledge areas and update your own knowledge area. For example, some major data entities may be identified in data architecture but not in data models, and vice versa, or some security or privacy rules may be identified or established in data governance but not in data security, and vice versa. To do this, everyone should find someone from another group to share information about their knowledge areas.

Abstract of Knowledge Areas
1. Data Governance provides direction and oversight for data management by establishing a system of decision rights over data that accounts for the needs of the enterprise.
2. Data Architecture defines the blueprint for managing data assets by aligning with organizational strategy to establish strategic data requirements and designs to meet these requirements.
3. Data Modeling and Design is the process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model.
4. Data Storage and Operations includes the design, implementation, and support of stored data to maximize its value. Operations provide support throughout the data lifecycle, from planning through disposal of data.
5. Data Security ensures that data privacy and confidentiality are maintained, that data is not breached, and that data is accessed appropriately.
6. Data Integration and Interoperability includes processes related to the movement and consolidation of data within and between data stores, applications, and organizations.
7. Document and Content Management includes planning, implementation, and control activities used to manage the lifecycle of data and information found in a range of unstructured media, especially documents needed to support legal and regulatory compliance requirements.
8. Reference and Master Data includes ongoing reconciliation and maintenance of core critical shared data to enable consistent use across systems of the most accurate, timely, and relevant version of truth about essential business entities.
9. Data Warehousing and Business Intelligence includes the planning, implementation, and control processes to manage decision support data and to enable knowledge workers to get value from data via analysis and reporting.
10. Metadata includes planning, implementation, and control activities to enable access to high-quality, integrated metadata, including definitions, models, data flows, and other information critical to understanding data and the systems through which it is created, maintained, and accessed.
11. Data Quality includes the planning and implementation of quality management techniques to measure, assess, and improve the fitness of data for use within an organization.

Context diagram of knowledge areas
[Figures: context diagrams]
Job Titles of Data Scientists
Data Scientist Chief Actuary of GeoSpatial Analytics and Modeling Director, Business Planning & Analytics Data Analyst Chief Strategy & Analytics Officer Assistant Professor Research & Analytics Director Database Manager Machine Learning Engineer Business Analyst Customer Analytics & Pricing Python Developer Project Coordinator Data Visualization Analyst Analytics Officer Director - Advanced Analytics Assistant Vice President Executive Director Chief Credit & Analytics Officer Research Analyst Director, Big Data Analytics and Segmentation Director, Business Intelligence and Analytics Director of Technology Data Engineer Chief Analytic Officer Chief Analytics & Algorithms Officer Database Administrator Data Learning Engineer Data Architect Strategic Data Analytics Analyst Chief Analytic Officer Statistician Data and Analytics Manager Director of Risk Analytics and Policy AI Product Manager Director, Data Warehousing & Analytics GIS Analyst Information Security Analyst AI Architect Data Visualizers Research Analyst Data Science Director Chief Technology Officer Statistical Modeling and Analytics Data Ecologists Health Analytics Principal Big Data Architect Forensic Data Analytics Director Marketing Analytics Customer Analytics Data Manager Big Data Developer Web Analytics Director, Database Marketing & Analytics Data Developer Risk and Business Analytics Director of Analytics Clinical Analytics Geospatial Data Scientist Reporting/Analytics Big Data Architect R&D Engineer Data Scientist Predictive Analytics Python Data Developer