Introduction to Analytics 2023-2024 PDF

Summary

This document provides an introduction to analytics for the 2023-2024 academic year. It covers core concepts, sources, and types of data, along with the importance of data and analytics. The document outlines the learning objectives, agenda, and learning principles related to the course. It also touches upon the various aspects to consider when studying data analysis.

Full Transcript


INTRODUCTION TO ANALYTICS 2023 – 2024
LESSON 1. CORE ANALYTICS CONCEPTS. SOURCES & TYPES OF DATA

Learning Objectives
- Understand core concepts in data domain
- Distinguish data, information and knowledge
- Discuss sources of data and how it is used
- Identify structured and unstructured data
- Explain characteristics of data
- Define big data

Agenda
1. Learning principles
2. Icebreaker
3. Why study data? Why is data important?
4. What is data, information and knowledge
5. Key concepts
6. Types and sources of data
7. Big data

Learning Principles
1. There is more than one way to define a concept – use of multiple sources
2. Different modes of learning – use words, pictures, discussions and independent research
3. Examples reinforce theory – your own examples are most valuable
4. Co-operation and group discussions encourage sharing
5. This is an emerging discipline – the field is changing at high speed
6. Explore the links used in class materials for extra reading material (optional)

Let’s break some ice

WHY IS DATA AND ANALYTICS IMPORTANT?
Module 1

Data is…
- A set of values of qualitative or quantitative variables about one or more persons or objects
- Facts and statistics collected together for reference or analysis
- Things known or assumed as facts, making the basis of reasoning or calculation
- Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
- Information in digital form that can be transmitted or processed
Source: Wikipedia, Merriam-Webster

Data is…
Scott Mackey, https://www.adlibsoftware.com/blog/authors/scott-mackey.aspx
https://www.ceotodaymagazine.com/2018/04/is-data-the-new-gold/

The value of data and analytics
- Study and better understand needs and motivations of: customers, competitors, partners, employees, stockholders
- Create knowledge
- Reduce uncertainty
- Improve business processes
- Increase automation
- Find weak spots, resource drains and problem root causes
- Exploit competitive advantage
- Make just-in-time decisions
- Satisfy regulatory and audit requirements

DATA, INFORMATION, AND KNOWLEDGE
Module 1

Aren’t these interchangeable?
Knowledge
Information
Data

How do we use data?
The temperature is 20 degrees. How should you dress?

Putting data into context
- 20 Celsius
- 20 Fahrenheit ≈ -7 Celsius
- 20 Kelvin ≈ -253 Celsius

Raw data is…
- Unsuitable
- Inconsistent
- Unformatted
- Outdated
- Has errors
- Incomplete
- Has too much volume

From data to knowledge
Light rays (DATA) → Electrical impulses (INFORMATION) → Recognizing a lion: "I see a lion" (KNOWLEDGE)
source: www.theeyewearboutique.co.za
https://www.youtube.com/watch?v=i3_n3Ibfn1c

Raw data requires…
- integration
- design
- architecting
- modelling
…to become useful information

Knowledge (Generalized | Understanding | Insights): Analyzed information; insights drawn from the analysis of the information; understanding the information; ability to make predictions
Information (Organized | Structured | Processed): Data captured, processed and categorised so that it can be stored, consumed and shared
Data (Raw | Random | Unorganized): Objective facts, symbols and observations of the real world

Data vs Information vs Knowledge

Data
- Objective facts of the world around us; Facts that exist whether we observe them or not
- Facts and observations without interpretation; can be perceived with our senses

Information
- Information is created by humans that observe and collect data with a purpose; Stored and shared via human created media (clay, paper, silicon chips)
- Makes data relevant to a particular context; Data processed to become useful; Answers questions What, When, Where, Which

Knowledge
- Insights gained from analysis of information to: Gain experience; Make connections; Compare information; Make predictions
- Application of information to achieve results and make decisions; Understand How and Why

Later in the course
Module 2: Data life cycle. How is data collected, stored and managed
Module 3: Types of analytics. How different types of analytics reflect stages of knowledge: From What happened? To Why did it happen? To What will happen?
Module 4: Analytics life cycle. How does data become information through data architecting, modelling, integration and design; How do we extract knowledge from information by generating analytical insights

KEY CONCEPTS
Module 1

Data Management: Encompasses the policies, procedures, and standards used to design and manage the information of the enterprise, to meet the data consumption requirements of all applications and business processes.
https://managementmania.com/en/three-tier-architecture
Sources: Textbook Chapter 5; Gartner IT Glossary

Data Science: a broad field that refers to the collective processes, theories, concepts, tools and technologies that enable the review, analysis and extraction of valuable knowledge and information from raw data.
https://www.edureka.co/blog/what-is-data-science/
https://www.techopedia.com/

Data Mining: the exploration and analysis of large datasets to discover meaningful patterns and rules using data science principles

Reporting: The process of organizing data into informational summaries in order to monitor how different areas of a business are performing
https://www.annualreports.com/HostedData/AnnualReportArchive/a/NASDAQ_AMZN_2018.pdf

Analytics: The examination of information to uncover insights that give a businessperson the knowledge to make informed decisions.
Source: Textbook Chapter 1

Insights: Knowledge, experience, and predictions gained from analysis of information to: Make connections; Compare information; Make decisions and change actions
https://www.adp.com/spark/articles/2018/11/from-data-to-insight-overview.aspx

SOURCES AND TYPES OF DATA
Module 1

Sources of data
- External
- Internal
- Open-source
- Third-party data

Types of data
What data about yourself can be found on social media?

Structured vs Unstructured Data

Structured
- a.k.a. “traditional”, “quantitative”, “transactional”
- numbers, symbols and categories
- well-defined format
- easy to search and organize
- easily stored in table format (spreadsheets, relational databases)
- Usually managed using Structured Query Language (SQL)

Unstructured
- a.k.a. “non-traditional”, “qualitative”
- Free form, unorganized, variable nature of data
- lack of structure or very complex structure
- stored in data lakes or NoSQL solutions, cannot be stored in table format
- Processed by machine learning and artificial intelligence (AI) tools

https://lawtomated.com/structured-data-vs-unstructured-data-what-are-they-and-why-care/

What is Semi-Structured?
Like Structured
- Contains some structured data: properties and tags that allow for partial categorization
- Examples: XML, RSS feeds, JSON (JavaScript Object Notation)

Like Unstructured
- Properties and tags are combined with unstructured data
- Example: digital camera image, tagged with Date, Time, Longitude, Latitude, Aperture, Resolution

Structured/Semi-Structured/Unstructured Examples

Structured
- Numbers
- Categories
- Codes
- Dates
- Character strings
- Binary (True/False)

Unstructured
- Text
- Social media
- Satellite images
- Presentations
- PDFs
- Audio recordings
- Video

Semi-Structured
- Email
- XML files
- JSON messages
- Digital photo files
- Accessible PDFs
- Website content

Rectangular datasets (spreadsheets, database tables)
Next lesson

Rectangular Dataset
Table ~ dataset ~ data array ~ tabular set ~ rectangular dataset
Row ~ record ~ entity ~ case ~ instance ~ observation
Column ~ field ~ attribute ~ variable ~ feature ~ data element

Types of Data

Quantitative Data (Numerical): Observations that can be measured and expressed as numbers
- Continuous (Analog): Measurements on a scale; infinite values in any interval; any value can be subdivided into finer increments. Examples: 0.0001, 124.569, 2*10^5
- Discrete: Can be counted; possible values can be listed; values cannot be meaningfully divided. Examples: 0, 1, 2, 3, 4…

Qualitative Data (Categorical): Observations that cannot be measured but can be described or categorised
- Nominal: Limited set of values with no meaningful order or ranking. Examples: White, Blue, Yellow
- Ordinal: Limited set of values with a meaningful order. Examples: Low, Medium, High
- Boolean (Binary): Nominal with two mutually exclusive categories. Examples: True/False; Yes/No

Quantitative Data
Type | Description | Examples
Integer | Whole numbers. The range of numbers is defined by the length of the data field | 1, 10, 20000, -345, 0
Decimal | Decimal numbers. The range is defined by a combination of: precision (total number of digits) and scale (number of digits to the right of the decimal point) | 0.1234; -908.00001; 2000.00; 0.00
Floating (Engineering notation) | Decimal numbers with a floating point. Defined as a mantissa (significand) plus an exponent (order of magnitude, power of 10) | 3.56E6 = 3.56 * 10^6; -5.701E-12 = -5.701 * 10^-12
Date | Short and long dates | Mar/05/2020; 2019-01-31
Time | Time; may be combined with the date | 20:03:00; 11:54; 23:59:59

Qualitative Data
Type | Description | Examples
Boolean (a.k.a. Binary) | Categorical with only two possible values | 0/1, T/F, Y/N
Character | One character | A, x, N, b
String | Sequence of characters of limited length | “This is a string”
Code / Category | Categories or codes where each value has a discrete meaning | 01/02/03; A/B/C; CAD/USD/EUR

Data Types Examples
What | Example | Data type?
Phone number | 4166548899 |
Hexadecimal number | 1F5D |
Pain level | From 1 (No pain) to 10 (The worst pain) |
Clothes sizes | S/M/L/XL |
Shoe sizes | https://canadianfootwear.com/size-fit-guide |

Later in the course
Module 5: Data modelling. How can structured data be modelled
Module 6: Business Intelligence Architecture. More on data sources of structured and unstructured data; Approaches to storing, processing and transforming structured and unstructured data

BIG DATA
Module 1

What is big data
Big data: an accumulation of data that is too large and complex for processing by traditional database management tools.

Characteristics of big data:
Volume | Enormously large volumes of data are being generated, in particular by internet-connected devices.
Velocity | Increased demand for current and real-time data. More data is used for reporting and analytics as soon as it is generated.
Variety | Continued expansion of the sources of data in a large variety of formats, in particular unstructured data that requires new methods of processing.
Veracity | Variability in the accuracy, quality and trustworthiness of data generated from a wide range of sources
Value | The potential value from using big data to support business goals and objectives
Sources: Merriam-Webster; Textbook Chapter 1

Sources of big data
https://www.smartdatacollective.com/big-data-20-free-big-data-sources-everyone-should-know/

Big vs. Small Data
Aspect | Small Data | Big Data
Objectives | Specific, pre-defined | Broad or undefined
Structure | Structured | Semi-structured or unstructured
Volumes | Small to medium (Gigabytes to terabytes) | Huge (petabytes +)
Storage | One computer or server | Multiple servers, cloud
Sources | Traditional (enterprise systems) | Include non-traditional sources (social media, Internet of Things, media)
Velocity | Near real-time or batch | Often real time
Usage | Business intelligence and reporting | Advanced and predictive analytics
Analysis | Easy to analyze and visualize, analysis can be done manually or with traditional tools e.g. SQL | Difficult to get the information and analyze, requires sophisticated and specialized tools

Data Storage Units
http://www.itutility.net/data-measurement-abbreviations-refresher/

How big is a Terabyte?
1 byte | A single character (e.g. ASCII code for ‘B’ is 01000010)
1 Kilobyte | A very short story
5 Megabytes | The complete works of Shakespeare, a high-resolution photo or 30 seconds of high-quality video
1 Gigabyte | A symphony in high-fidelity sound or 10 meters of shelved books
1 Terabyte | 50,000 trees made into paper and printed
200 Petabytes | All printed materials ever
5 Exabytes | All words ever spoken by human beings
https://www.zmescience.com/science/how-big-data-can-get/

Some Big Data Stats
1. Approximately 80-90% of the data produced in the present time is in an unstructured format.
2. The combined volume of global data is projected to increase from 33 zettabytes in 2023 to 175 zettabytes by 2025.
3. The ratio between original and duplicated data will be 1:10 by 2024.
4. 95% of businesses see unstructured data as a problem for their business.
5. According to a study by BCG, the internet is responsible for roughly 2% of world carbon emissions.
https://techreport.com/statistics/big-data-statistics/
https://www.sciencefocus.com/science/what-is-the-carbon-footprint-of-the-internet

It’s not the size of the data – it’s how we use it
Textbook Chapter 1 Figure 1.1
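
The storage-unit ladder behind "How big is a Terabyte?" can be sketched in a few lines of Python. This is only an illustration, assuming the decimal (SI) convention in which each unit is 1,000 times the previous one; some tools use the binary convention (1,024) instead. The names UNITS, to_bytes and convert are hypothetical helpers, not part of the course material.

```python
# Assumption: decimal (SI) units, where each step up is a factor of 1,000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a quantity between any two units listed in UNITS."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)


terabyte_in_gigabytes = convert(1, "terabyte", "gigabyte")  # 1000.0
exabytes_in_petabytes = convert(5, "exabyte", "petabyte")   # 5000.0

# Growth factor implied by the projection quoted in the stats list
# (33 zettabytes rising to 175 zettabytes).
growth_factor = to_bytes(175, "zettabyte") / to_bytes(33, "zettabyte")

print(terabyte_in_gigabytes, exabytes_in_petabytes, round(growth_factor, 1))
```

Under these assumptions, the quoted projection amounts to roughly a fivefold increase in global data volume.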

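
The quantitative types listed in the "Quantitative Data" table earlier in this lesson map fairly directly onto Python's built-in and standard-library types. The sketch below uses the table's own example values; precision_and_scale is a hypothetical helper written for this illustration, and storing the phone number as a string is one common reading of the "Data Types Examples" exercise, not the course's stated answer.

```python
from datetime import date, time
from decimal import Decimal


def precision_and_scale(number):
    """Return (total digits, digits right of the decimal point) for a finite Decimal."""
    _sign, digits, exponent = number.as_tuple()
    scale = max(-exponent, 0)
    precision = max(len(digits), scale)
    return precision, scale


whole = int("-345")                      # Integer: a whole number
amount = Decimal("2000.00")              # Decimal: fixed precision and scale
large = float("3.56E6")                  # Floating point: 3.56 * 10^6
tiny = float("-5.701E-12")               # Floating point: -5.701 * 10^-12
day = date.fromisoformat("2019-01-31")   # Date
clock = time.fromisoformat("20:03:00")   # Time

# Assumption: a phone number is made of digits but is never used in
# arithmetic, so it is kept as a string here.
phone = "4166548899"

print(precision_and_scale(amount))       # (6, 2)
print(large)                             # 3560000.0
print(day.year, clock.hour)              # 2019 20
```

The Decimal example shows the table's definition in action: 2000.00 has a precision of 6 digits in total and a scale of 2 digits to the right of the decimal point.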