ITCT101 Computer Technologies Module 2: Data Science - Lecture Slides PDF
Mahidol University
Asst.Prof.Dr. Preecha Tangworakitthaworn
Summary
These lecture slides cover the introduction to data science, data and information, and data analytics. The document discusses data vs information, data repositories, and the progression of data. This forms part of the ITCT101 Computer Technologies course, module 2.
Full Transcript
Bachelor of Arts and Science (B.A.Sc.) Major in Creative Technology
ITCT101 Computer Technologies
Module 2: AI, ML, and Data Science
Introduction to Data Science
Asst.Prof.Dr. Preecha Tangworakitthaworn
Email: [email protected]
Faculty of ICT, Mahidol University

Content
- Data and Database Concept
- Principle of Data Science
- Data Analytics and Data-Driven Decision Making
- Data Science Life Cycle

ITCT101 Computer Technologies, Module 2: Principle of Data Science

Introduction to Data Science

Data vs. Information
Data:
- Raw facts
- Raw data: not yet processed to reveal their meaning
- Building blocks of information
- Data management: generation, storage, and retrieval of data
Information:
- Produced by processing data
- Requires context to reveal the meaning of data
- Enables knowledge creation
- Should be accurate, relevant, and timely to enable good decision making

Data in Repositories
Data are kept in repositories so that they are:
- in a machine-processable and machine-understandable format,
- searchable and retrievable using a database language (SQL),
- in a human-understandable format.
In practice, we store data in a file system, in a database management system (DBMS), or in both.

Progression of Data
- WISDOM: applied knowledge
- KNOWLEDGE: actionable information
- INFORMATION: processed data
- DATA: raw fact
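The step from data to information, and the idea of keeping data in a DBMS that is queried with SQL, can be sketched with Python's standard sqlite3 module. This is a minimal illustration under stated assumptions: the `sales` table, its columns, and the rows are hypothetical, and an in-memory SQLite database stands in for a real DBMS. The stored rows are raw facts; the aggregate query processes them into information (revenue per product) that could support a decision.

```python
import sqlite3

# Hypothetical raw facts: individual sales records with no interpretation yet.
raw_sales = [
    ("2024-01-05", "coffee", 3, 45.0),
    ("2024-01-05", "tea", 1, 30.0),
    ("2024-01-06", "coffee", 2, 45.0),
    ("2024-01-06", "cake", 4, 60.0),
]


def revenue_by_product(rows):
    """Store raw data in an in-memory DBMS, then derive information with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (sale_date TEXT, product TEXT, qty INTEGER, unit_price REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        # Processing step: aggregate the raw records into revenue per product.
        cursor = conn.execute(
            "SELECT product, SUM(qty * unit_price) AS revenue "
            "FROM sales GROUP BY product ORDER BY revenue DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for product, revenue in revenue_by_product(raw_sales):
        print(f"{product}: {revenue:.2f}")
```

The individual rows say little on their own; only after grouping and summing do they answer a question such as "which product brings in the most revenue?".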
DATA, Information, Knowledge, Wisdom
(Figures illustrating each level of the progression. Source: https://gmsisuccess.com/progression-of-data-from-data-to-information-to-knowledge-to-insight-to-action/)

Data vs. Information
(Figure)

Nature of Data Object (Database Design Perspective)
A data object consists of a data instance and a data schema.
Data instance:
- Raw fact or raw data
- Structured or unstructured data
- Defined as a data record
- For example: Mr. Somboon Sae-tae, 6288000, Male
Data schema:
- Skeleton structure of the data
- Properties or characteristics of the data
- Defined as the attributes of the data
- For example: Student_Name, Student_ID, Gender

Nature of Data Object (Database Implementation Perspective)
Person
Name    Born             Twitter      (data schema)
Alice   4 August 1989    @Alice       (data instance)
Bob     1 June 1988      @Bob         (data instance)

Database Architecture
(Figure)
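The schema/instance distinction in the Person example can be mirrored in Python, as an analogy rather than a database implementation: a dataclass plays the role of the schema (attribute names and types), and each object created from it is an instance (a data record). The lowercase attribute names and the helper `schema_of` are hypothetical choices for this sketch.

```python
from dataclasses import dataclass, fields
from datetime import date


# The schema: the skeleton structure (attribute names and types) of a Person record.
@dataclass(frozen=True)
class Person:
    name: str
    born: date
    twitter: str


# The instances: concrete data records that conform to the schema.
people = [
    Person(name="Alice", born=date(1989, 8, 4), twitter="@Alice"),
    Person(name="Bob", born=date(1988, 6, 1), twitter="@Bob"),
]


def schema_of(record_type):
    """Return the attribute names defined by a record type's schema."""
    return [f.name for f in fields(record_type)]


if __name__ == "__main__":
    print("Schema:", schema_of(Person))
    for person in people:
        print("Instance:", person.name, person.born.isoformat(), person.twitter)
```

Changing the schema (for example, adding an `email` attribute) affects every instance, while adding a new person only adds another instance; this mirrors how a table definition differs from the rows stored in it.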
From Data to Data Science
Data, and the capability to extract useful knowledge from data, are the key factors in data science.
(Figure: the DATA (raw fact) → INFORMATION → KNOWLEDGE → WISDOM pyramid, annotated with "Knowledge Discovery from data".)

Introduction to Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from noisy, structured, and unstructured data, and applies knowledge and actionable insights from data across a broad range of application domains (Provost and Fawcett, 2013).
Reference: F. Provost and T. Fawcett (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O'Reilly Media.

Data science is a field of study concerned with the collection, cleaning, and (if required) anonymization of large quantities of diverse data relevant to solving real-life problems, and with analyzing them to initiate meaningful actions (Rajaraman, 2016).
Reference: V. Rajaraman (2016). Big Data Analytics. General Article, Resonance, August 2016, pp. 695-716.

Data science involves principles, processes, and techniques for understanding phenomena via the automated analysis of data (Provost and Fawcett, 2013).
Fig. 1. Data science in the context of various data-related processes in the organization (Provost and Fawcett, 2013).
Data-Driven Decision Making
Data-driven decision making (DDD) refers to the practice of basing decisions on the analysis of data rather than purely on intuition. For example, a marketer could select advertisements based on her long experience in the field and her eye for what will work, or she could base her selection on the analysis of data about how consumers react to different ads.

Data Processing and Big Data
Data engineering and processing are critical to supporting data science. Data science needs access to data, and it often benefits from the sophisticated data engineering that data processing technologies may facilitate.

Big Data
Big data means datasets that are too large for traditional data processing systems and therefore require new processing technologies (Provost and Fawcett, 2013). Big data relies on several key technologies, such as Hadoop, HDFS, NoSQL, MapReduce, MongoDB, Cassandra, Pig, Hive, and HBase, that work together to achieve end goals such as extracting value from data that would be previously considered (Zakir, Seymour, and Berg, 2015).
Reference: J. Zakir, T. Seymour, and K. Berg (2015). Big Data Analytics. Issues in Information Systems, Vol. 16, Issue II, pp. 81-90.

Big Data
(Figure. Source: http://www.ibmbigdatahub.com/blog/changing-face-business-intelligence)

What is Data Analytics?
Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance (Wikipedia).

Data Analytics Knowledge
(Figure)

Analyzing Big Data
Data analytics is concerned with the extraction of actionable knowledge and insights from big data. This is done by formulating hypotheses, often based on conjectures gathered from experience, and by discovering correlations among variables (Rajaraman, 2016).

Levels of Analytics (Provost and Fawcett, "Data Science for Business")
- Prescriptive analytics: What is the "best" course of action?
- Predictive analytics: What will happen?
- Diagnostic analytics: Why did it happen?
- Descriptive analytics: What happened?

Examples of Business Questions (Provost and Fawcett, "Data Science for Business")
- Simple (descriptive) statistics: "Who are the most profitable customers?"
- Hypothesis testing: "Is there a difference in value to the company between these customers?"
- Segmentation/classification: "What are the common characteristics of these customers?"
- Prediction: "Will this new customer become a profitable customer? If so, how profitable?"

Applying Techniques
Most business questions are causal: what would happen if (e.g., I show this ad)? But it is easier to ask correlational questions (what happened in the past when I showed this ad?).
- Supervised learning: classification and regression
- Unsupervised learning: clustering and dimensionality reduction
Note: Unsupervised learning is often used inside a larger supervised learning problem, e.g., autoencoders in neural networks for image recognition.

Applying Techniques
Supervised learning:
- kNN (k-nearest neighbors)
- Naïve Bayes
- Logistic regression
- Support vector machines
- Random forests
Unsupervised learning:
- Clustering
- Factor analysis
- Latent Dirichlet allocation

Data Analytics Challenges
(Figure. Source: https://bcourse.berkeley.edu/)

Data Science Life Cycle
(Figure. Source: D. Ezer and K. Whitaker (2019). Data science for the scientific life cycle. eLife.)

Data Science Life Cycle
(Figure. Source: K. Vassakis, E. Petrakis, and L. Kopanakis (2017). Big Data Analytics: Applications, Prospects and Challenges. Mobile Big Data, Vol. 10, pp. 3-20.)

Challenges in Data Life Cycle
(Figure. Source: K. Vassakis, E. Petrakis, and L. Kopanakis (2017), as above.)
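The first supervised technique on the "Applying Techniques" slide, k-nearest neighbors (kNN), can be sketched from scratch for the prediction question "Will this new customer become a profitable customer?". This is a teaching sketch only: the two features (orders per year, average basket size), the training examples, and the function name `knn_predict` are all hypothetical, and a real analysis would use an established machine-learning library and real data.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (orders_per_year, avg_basket_size) -> label.
training_data = [
    ((2.0, 10.0), "not profitable"),
    ((3.0, 12.0), "not profitable"),
    ((1.0, 8.0), "not profitable"),
    ((12.0, 40.0), "profitable"),
    ((15.0, 35.0), "profitable"),
    ((10.0, 45.0), "profitable"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    Distance is Euclidean (math.dist); k is odd so a two-class vote cannot tie.
    """
    nearest = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    new_customer = (11.0, 38.0)
    print(knn_predict(training_data, new_customer))  # prints: profitable
```

Because kNN compares raw distances, features measured on very different scales should normally be standardized first; the sketch skips this to keep the voting logic visible.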