Introduction To Analytics 2023-2024 PDF
Document Details
Uploaded by UnboundGradient1686
2023
Summary
This document is a presentation on Introduction to Analytics, providing information about the data life cycle. It covers concepts like data quality, XML, and business intelligence. The document was created for the 2023-2024 academic year.
Full Transcript
INTRODUCTION TO ANALYTICS 2023 – 2024
LESSON 2. DATA LIFE CYCLE

Learning Objectives
  Name and understand the phases of the data lifecycle
  Identify the processes and activities of each phase
  Recognize DAMA Framework knowledge areas
  Interpret a simple context diagram
  Describe how analytics fits into the DAMA framework
  Discuss good and bad data
  Interpret XML data format

Agenda
  1. Data lifecycle phases and activities
  2. Context diagram example
  3. DAMA DMBOK knowledge areas
  4. Qualities of good data; five C’s
  5. XML data format

Does the data have a life cycle?
  Discuss the article given out as home assignment.
  What happens to the data? Where does it come from? Where does it go?

DATA LIFE CYCLE
Module 2

Data Life Cycle
  Sourcing
  Storage & Preparation
  Protection & Usage
  Sharing
  Archiving
  Destruction

Data Life Cycle
  Sourcing: Collecting and capturing data values from various sources. A.k.a. data capture / data acquisition
  Storage & preparation: Storing, maintaining and preparing data for usage. A.k.a. storage & maintenance
  Protection & usage: Application of data to the tasks needed to operate the enterprise while protecting the data. A.k.a. permitted use of data
  Sharing: Sending data to users or entities that require the data for certain purposes, both inside and outside the enterprise. A.k.a. “publication”
  Archiving: Archiving data that is no longer actively used for a defined retention period.
  Destruction: Removal of every copy of a data item from the enterprise. A.k.a.
Purging / Permanently destroying

Data Life Cycle – processes
  Sourcing
    Obtain data externally
    Create or enter data
    Receive and capture data signals
  Storage & Preparation
    Move and store data
    Cleanse and enrich data
    Transform and synthesise data
    Integrate data from multiple sources
  Protection & Usage
    Apply data to enterprise tasks
    Protect, monitor and audit usage
    Search, classify and explore data
    Model and analyse data
  Sharing
    Data publication
    Visualization
    Data sharing, moving and copying
    Delivering data products to customers
  Archiving
    Copying data into archive
    Removing archived data from active environments
  Destruction
    Permanently destroying data

Data life cycle: group discussions
  Why do enterprises purge (destroy) data?

Later in the course
  Module 7: Analytics project basics
    Phases in analytics projects – how do they relate to the data life cycle
  Module 8: Legislative & security issues
    Permitted uses of data
    Data protection
  Module 9: Ethical issues in analytics
    Ethical sharing of data

Data Life Cycle – processes
  What knowledge and skills are needed to manage data through its lifecycle?
  Sourcing
  Storage & Preparation
  Protection & Usage
  Sharing
  Archiving
  Destruction

DMBOK KNOWLEDGE AREAS
Module 2

DAMA and DMBOK
  DAMA International is a not-for-profit, vendor-independent, global association of technical and business professionals dedicated to advancing the concepts and practices of information and data management.
  DAMA DMBOK ®: Data Management Association (DAMA) Data Management Body of Knowledge
  https://dama.org/content/body-knowledge

DMBOK Data Management Knowledge Areas
  Data Management is an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data. These processes overlap and interact within each data management knowledge area.

DAMA DMBOK Framework

Data Governance
  DMBOK definition
    Planning, oversight, and control over management of data and the use of data and data-related resources.
  Processes & activities
    Enforce:
      Consistent definitions
      Rules
      Business metrics
      Policies and procedures on how to use data
      Reference data
      Data ownership

Data Architecture
  DMBOK definition
    The overall structure of data and data-related resources as an integral part of the enterprise architecture
  Processes & activities
    Define:
      Data needed to meet business needs
      Data, facts and dimensions
      Logical data models
      Enterprise data flows
    Examine:
      Completeness and correctness of the source systems needed to obtain data

Context Diagram – example
  [Diagram. Labels recovered: Self-Service App, Customer Information, Customer Transaction History, Customer Relationship Management, Order Management, Service requests, Request status & notifications, Customer transactions, Customer address, Order status]

Data Modeling & Design
  DMBOK definition
    Analysis, design, building, testing, and maintenance of data structures
  Processes & activities
    Design and build:
      Conceptual, logical and physical data modeling
      Master data modeling
      Modeling and design for different architectures (data warehouse, data lake, cloud data storage etc.)

Data Storage & Operations
  DMBOK definition
    Deployment and management of structured physical data assets storage
  Processes & activities
    Manage:
      Building and operating data storage solutions
      Performance management, back-up and recovery of data assets
      Monitoring, archiving and purging of data assets

Data Security
  DMBOK definition
    Ensuring privacy, confidentiality and appropriate access to data
  Processes & activities
    Define:
      Privacy and security
      Access management
      Security governance (monitoring, audit, breach responses)
      Data protection (encryption)

Data Integration & Interoperability
  DMBOK definition
    Acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization and operational support of data assets
  Processes & activities
    Manage:
      Data acquisition and movement
      Transformation
      Interoperability and integration
      Data migration and conversion

Documents & Content
  DMBOK definition
    Storing, protecting, indexing, and enabling access to data found in unstructured sources (electronic files and physical records), and making this data available for integration and interoperability with structured (database) data
  Processes & activities
    Govern:
      Content management (classification, tagging, indexing)
      Managing physical documents
      Managing electronic records (documents, images, scans, multimedia)

Reference & Master Data
  DMBOK definition
    Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values
  Processes & activities
    Govern:
      Establishing and managing systems of record
      Acquiring or creating systems of reference (business, spatial, market data)
      Data business rules

Data Warehousing & Business Intelligence
  DMBOK definition
    Managing analytical data processing and enabling access to decision support data for reporting and analysis
  Processes & activities
    Govern:
      Data profiling and warehousing
      Data discovery, searching and querying
      Operational and analytical reporting
      Analytics

Metadata
  DMBOK definition
    Collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata
  Processes & activities
    Manage:
      Business glossary / data dictionary
      Data classification

Describing data: metadata
  Image credit: John O’Gorman

Metadata: information about data
  Metadata: description of the data as it is created, stored, transformed, accessed and consumed by the enterprise.
  Business metadata: description of the data from business perspective
    Business definition
    Meaning
    Source of the data
  Technical metadata: description of the data as it is processed by software tools
    Format
    Size
    Mapping
  Sources: Textbook Chapter 4

Metadata - example

Data Quality
  DMBOK definition
    Defining, monitoring, maintaining data integrity, and improving data quality
  Processes & activities
    Govern:
      Planning data quality
      Implementing data quality measures
      Monitoring data quality

Business Insights & Analytics: how does it fit in?
  Sourcing
  Storage & Preparation
  Protection & Usage
  Sharing
  Archiving
  Destruction

GOOD AND BAD DATA
Module 2

The five C’s of data
  Clean: Clean data must be accurate, have no missing data points, conform to the format and contain no invalid entries
  Consistent: Consistent data must follow the same standard, definitions and use the same codes and ranges of values to reflect the same meaning
  Conformed: Conformed data must be shareable across the same dimensions with the same business meaning
  Current: Current data must be as recent as required for business purposes
  Comprehensive: Comprehensive data must be sufficient and complete for the purpose that this data is to be used for
  Sources: Textbook Chapter 1

Can data be bad?
  Where can bad data come from?
  Provide an example of bad data from your personal or professional life.
  https://www.dataquest.io/blog/advanced-data-cleaning-r-course/

XML DATA FORMAT
Module 2

Structured/Semi-Structured/Unstructured Examples
  Structured
    Numbers
    Categories
    Codes
    Dates
    Character strings
    Binary (True/False)
    Rectangular datasets (spreadsheets, database tables)
  Unstructured
    Text
    Social media
    Satellite images
    Presentations
    PDFs
    Audio recordings
    Video
  Semi-Structured
    XML files
    Email
    JSON messages
    Digital photo files
    Accessible PDFs
    Website content

XML Basics
  XML (eXtensible Markup Language):
    Text-based format used to share data
    Markup language – uses tags to describe pieces of data
    Metalanguage – allows users to define their own markup languages
    A specification for storing information
    A specification for describing the structure of that information
    Has a well-defined structure – must follow a set of rules
  Example: https://learning-oreilly-com.ezproxy.humber.ca/library/view/xml-visual-quickstart/9780321602589/ch02.html

XML example
  XML Basics by S. Banzal

XML structure
  A root element is required
    Every XML document must contain one, and only one, root element. This root element contains all the other elements in the document.
  All data (values) must be enclosed within tags
    Every piece of data must have a defined place in an XML file within a starting and a closing tag. The closing tag has the same name as the starting tag, with ‘/’ in front.
  Tags can have any names, but should describe the content
    A user can pick any name for a tag; however, it should describe the element’s purpose and contents.
  Closing tags are required
    Every element must have a closing tag.

XML structure
  Tags can have attributes (zero to many)
    Information contained in an attribute is considered metadata: information about the data in the element, as opposed to the data itself. An element can have as many attributes as desired, as long as each has a unique name.
    [XML example; element tags not preserved. Surviving values: How to train your dragon, How to Speak Dragonese, 125.00]
  Indentation
    It is a good practice to indent child elements relative to their parents to make XML documents easier for a human to read and interpret (see examples in the source).
  Nesting
    Elements must be properly nested. If you start element A, then start element B, you must first close element B before closing element A.
    [Diagram; labels recovered: Root element, Child element (three times), Grandchild element (three times); example value: Toopy]

XML syntax
  XML declaration
    Should be included at the beginning of each XML file, for example <?xml version="1.0"?>
  Case matters
    XML is case sensitive. Starting and closing tags must use the same capitalization.
  Tag names
    Names must begin with a letter, underscore, or colon, and may contain letters, digits, underscores, dashes, periods, and colons. Spaces are not allowed. Although valid, it is recommended to avoid including colons, dashes, and periods within your names. Names that begin with the letters xml, in any combination of upper- and lowercase, are reserved and should not be used.
  Tag content does not require any additional formatting
    Everything within the starting and closing tags is considered the tag content.
  Source: XML: Visual QuickStart Guide, Second Edition, by Kevin Howard Goldberg. Published by Peachpit Press, 2008

XML syntax
  Attribute values must be enclosed in quotation marks
    An attribute’s value must always be enclosed in either matching single or double quotation marks. By convention, no spaces are placed between the attribute name, the equals sign and the value.
  White Space
    You can add extra white space, including line breaks, around the elements in your XML code to make it easier to edit and view. While extra white space is visible in the file and when passed to other applications, it is ignored by the XML processor.
  Language support
    Tag and element names do not need to be in English; they can be in any language supported by the software used.
  Comments
    Comments can be inserted anywhere, enclosed in <!-- and --> (double hyphen)

Special characters in XML
  Special character    XML replacement
  <                    &lt;
  >                    &gt;
  &                    &amp;
  "                    &quot;
  '                    &apos;
  Example: Dun & Bradstreet is written as Dun &amp; Bradstreet

XML example – dates
  Three ways to record a date: using a date attribute, using a <date> element (2008-01-10), and using an expanded <date> element (2008, 01, 10).
  [XML examples; element tags not preserved. Surviving values: Tove, Jani, Hello there, 2008-01-10, 2008, 01, 10]
  https://www.w3schools.com/xml/xml_attributes.asp

XML example
  [XML example; element tags not preserved. Surviving values: Yulia; Lucy, 7/7/2005, female; Matt, 7/12/2002; Preetika, 7/7/2007]

XML vs JSON example
  JSON:
    {
      "student": [
        { "id": "01", "name": "Tom", "lastname": "Price" },
        { "id": "02", "name": "Nick", "lastname": "Thameson" }
      ]
    }
  XML (element tags not preserved): 01, Tom, Price; 02, Nick, Thameson
  JSON vs XML: What’s the Difference?
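
The XML vs JSON comparison above can be tried out with a short Python sketch. It is only an illustration, not part of the original slides: the JSON text follows the slide, while the XML tag names (root, student, id, name, lastname) are assumptions that mirror the JSON keys, because the slide's own tags are not available here. The helper names students_from_json, students_from_xml and is_well_formed are hypothetical. The last helper shows how a parser enforces the structure rules from this lesson (one root element, proper nesting, matching case).

```python
import json
import xml.etree.ElementTree as ET

# JSON text as shown in the "XML vs JSON example" slide.
JSON_TEXT = """
{
  "student": [
    {"id": "01", "name": "Tom", "lastname": "Price"},
    {"id": "02", "name": "Nick", "lastname": "Thameson"}
  ]
}
"""

# Assumed XML equivalent: tag names are hypothetical and mirror the JSON keys.
XML_TEXT = """<?xml version="1.0"?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>
"""


def students_from_json(text):
    """Return the list of student records from the JSON text."""
    return json.loads(text)["student"]


def students_from_xml(text):
    """Return one dict per <student> element, keyed by child tag name."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in student}
        for student in root.findall("student")
    ]


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


json_students = students_from_json(JSON_TEXT)
xml_students = students_from_xml(XML_TEXT)

# Both formats carry the same records once parsed.
print(json_students == xml_students)

# The parser rejects documents that break the structure rules.
print(is_well_formed("<a><b></a></b>"))    # improper nesting
print(is_well_formed("<Name>Tom</name>"))  # case mismatch in closing tag
print(is_well_formed("<a/><b/>"))          # more than one root element
```

Note that XML element content is always read as text, which is why the ids are kept as the strings "01" and "02" on the JSON side too; a JSON number 1 would not compare equal to the XML text "01".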