FYE110: Reasoning with Data Fall 2024 PDF
Summary
This document is an outline for a data analysis course, FYE110: Reasoning with Data, Fall 2024, likely from a university or college. It covers topics such as the role of data, data types (qualitative vs. quantitative), information sources, converting information to data, and data formats.
Department of Mathematics and Statistics
FYE110: Reasoning with Data, Fall 2024

Outline

1. Role of Data
   - Statistics and Data
   - Process of Data Science
2. Data Type
   - Qualitative vs Quantitative
   - Levels of Measurement
3. Data Formation/Development
   - Information Sources
   - Converting Information to Data
   - Data Formation

Objectives

In this chapter, you learn:
- What the fundamental concepts of data are, and what role data plays in decision-making across various fields and industries.
- What the PPDAC cycle is.
- What the data types and their levels of measurement are.
- How to use information sources.
- How to convert information to data for analysis.
- What the different data formats are.

Statistics and Data

Statistics is everywhere: on social media, in newspapers, business reports, weather forecasts, medical studies, sports updates, surveys, and food labels. Researchers, journalists, students, and others in all disciplines rely on statistics to convey their work to their readers and audience.

Statistics and data are used by professionals in many disciplines. Provide a few examples of data in the following areas:
- Health
- Sport
- Meteorology
- Education
- Environment
- Sustainability
- Economics

Data in the News

S&P published a report showing the growth of non-oil businesses in the UAE compared to other regional businesses (The National, October 04, 2023).
Figure: Purchasing Managers' Index since January 2019

What is Statistics All About?

Statistics is the art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data. [1]

Using this definition, the three main components of statistics for answering a statistical question are:
1. Design
2. Description
3. Inference

[1] Agresti, A., Franklin, C., & Klingenberg, B. (2018). Statistics: The Art and Science of Learning from Data. Pearson Education Limited.

Example: Online Habits among UAE Students

A research study published in the Intercontinental Journal of Social Sciences investigated the internet habits of university students in the UAE, based on a sample of 600 participants. The study found that students in the UAE spend almost the equivalent of a working day online (The National, June 01, 2024).

Figure: Online habits of UAE university students

Example: Online Habits among UAE Students, continued

In this example:
1. A survey was designed and data were collected from 600 participants.
2. The data were analyzed and showed that:
   - 84% of the 600 students were online for seven or more hours daily.
   - Prolonged use of internet applications has a negative effect on the academic performance of university students.
3. The study recommends that:
   - Workshops be arranged within universities to show students how using the internet for long periods harms their academic achievement.
   - Students be introduced to educational and academic websites that may be helpful for their studies.

Activity: Can Technology Solve Global Warming?

A survey by the duke+mir communications firm ahead of COP28 found that 75% of people in the UAE think that humans will find technology to solve global warming (Sustainability Middle East News, Aug 15, 2023). Read the article and answer the following:
1. What is the objective of this study?
2. How was the data collected?
3. What are the major findings of the study?

Types of Statistics

Using the distinction between samples and populations, we can now elaborate on the use of description and inference in statistical analyses. There are two very general applications of statistics: descriptive statistics and inferential statistics. The focus of this course is descriptive statistics.

Descriptive Statistics and Inferential Statistics

Descriptive statistics involves summarizing and organizing the given information, graphically and/or numerically. Inferential statistics refers to methods of making decisions or predictions about a population, based on data obtained from a sample of that population.

Definitions: Population and Sample

- A population is the entire group of individuals about which we want information.
- A sample is the part of the population we actually observe.
- A parameter is a numerical description of a population characteristic.
- A statistic is a numerical description of a sample characteristic.
Figure: Population and Sample (labels: Population, Sample, Statistics, Parameters)

Individuals and Variables in Data

- An individual is a person or object that you are interested in finding out information about.
- A variable is the measurement or observation of the individual.

Notice that who you want to measure is the individual, and what you want to measure is the variable. When data are in table form, variables are usually arranged in columns and individuals in rows.

Figure: Variable and Individual in a dataframe

Example: Online Habits among UAE Students

Returning to the previous study, we can identify the following:
- Population: UAE university students.
- Sample: the 600 female and male students who were surveyed.
- Parameter: the proportion of all UAE university students who are online for seven or more hours daily, calculated from the population.
- Statistic: 84%, the proportion of students who are online for seven or more hours daily, calculated from the sample.
- Individual: any student who participated in the survey.
- Variables: the collected responses to the survey questions, for example age, major, or the response to the question "How much time do you spend online daily?".
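To make the sample-side quantities concrete, here is a minimal Python sketch. The figures 600 and 84% come from the study described above; the five-row table is an invented toy sample (not survey data), used only to show individuals as rows, variables as columns, and a statistic computed from a sample. The function name `proportion_at_least` is hypothetical.

```python
# Figures reported for the study described above.
SAMPLE_SIZE = 600           # students surveyed (the sample)
REPORTED_STATISTIC = 0.84   # share online for seven or more hours daily

# Number of sampled students implied by the reported statistic.
implied_count = round(SAMPLE_SIZE * REPORTED_STATISTIC)

# Toy sample, invented for illustration only (NOT data from the study).
# Each row is an individual; each key is a variable.
toy_sample = [
    {"age": 19, "major": "Biology", "hours_online": 8},
    {"age": 20, "major": "Finance", "hours_online": 5},
    {"age": 18, "major": "Physics", "hours_online": 9},
    {"age": 21, "major": "History", "hours_online": 7},
    {"age": 19, "major": "Nursing", "hours_online": 3},
]


def proportion_at_least(rows, variable, threshold):
    """Return the share of rows whose `variable` is >= threshold."""
    if not rows:
        raise ValueError("sample is empty")
    hits = sum(1 for row in rows if row[variable] >= threshold)
    return hits / len(rows)


# A statistic: a numerical description computed from the sample.
toy_statistic = proportion_at_least(toy_sample, "hours_online", 7)

print(f"Implied count in the reported sample: {implied_count} of {SAMPLE_SIZE}")
print(f"Statistic from the toy sample: {toy_statistic:.0%}")
```

The parameter, the proportion among all UAE university students, cannot be computed this way because the whole population is not observed; the sample statistic only estimates it.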
Activity: Can Technology Solve Global Warming?

For the previous survey, "Do People Think Humans Will Find Technology to Solve Global Warming", identify the following:
1. Individual/Subject:
2. Variable:
3. Population:
4. Sample:
5. Parameter:
6. Statistic:

Activity 2: Understanding the Reading Index Report

On August 17, 2024, the Ministry of Culture unveiled the results of the 2023 UAE National Reading Index. Of a national sample of around 3,900 citizens and residents from across the UAE, 90.4% use social media sites.

Explain what is wrong with each of the following statements:
1. The population is the 3,900 citizens and residents contacted by the Ministry of Culture.
2. The sample is the 90.4% who use social media sites.
3. The variable is the 90.4% who use social media sites.
4. The subjects in this survey are the social media sites.
5. The parameter consists of all citizens and residents in the UAE.
6. The statistic is the average number of citizens and residents who use social media sites.

PPDAC Cycle

This model shows the process of abstracting and solving a statistical problem to help solve a larger real problem. A knowledge-based solution to the real problem requires a better understanding of how some things work.
The framework that will be featured frequently in this course is "Problem, Plan, Data, Analysis and Conclusion" (PPDAC).

Figure: Cycle of Data Processing

PPDAC Cycle

The PPDAC cycle comprises five stages:
1. Problem: Define the question or issue to be investigated, identifying the objectives and scope of the study.
2. Plan: Design the approach to gather the necessary data, considering methods, tools, and procedures.
3. Data: Collect the data according to the plan, ensuring accuracy and relevance.
4. Analysis: Process and analyze the data to uncover patterns, trends, or insights related to the problem.
5. Conclusion: Interpret the analysis results, draw conclusions, and make recommendations or decisions based on the findings.

Figure: Description of PPDAC Cycle

PPDAC Cycle: Height and Arm Span

1. Problem: I wonder if there is a relationship between the height and arm span of students in my class.
2. Plan: We are going to record the height of each student in my class in centimetres, using a tape measure. We are also going to record each student's arm span. Each student will stand facing a whiteboard with both arms stretched out as far as possible, one hand touching the left edge of the board, while a partner reads off how far the other hand reaches on the measuring tape.
3. Data: Here is the collected data:

   Height (cm)    Arm span (cm)
   159            160
   163            159
   173            177
   165            167
   ...            ...
   163            163
   180            176

4. Analysis: I will graph the data with Excel.

   Figure: Height vs Arm-span Graph

5. Conclusion: This means that if a student in our class is tall, I would expect them to have a large arm span, and if they are short, I would expect them to have a small arm span.

Data Types: Qualitative vs Quantitative

- Qualitative (categorical) variables are those that take on categories or label values. The categories or labels are mutually exclusive, meaning that an observation cannot be placed in two different categories or given two different labels at the same time.
  Examples: blood type (A, B, AB, or O), gender (male or female), education level (High School, Bachelor, Masters, etc.).
- Quantitative (numerical) variables are those that take on quantities, on which we can meaningfully perform arithmetic operations such as adding and averaging.
  Examples: number of hours students spend online, height, number of media accounts, body temperature (in °C), ages (in years) of subjects enrolled in a clinical trial.

Exercise

Classify each of the following variables as either qualitative or quantitative.
a. Time spent online daily
b. Recycling habit (Never, Rarely, Sometimes, Always)
c. Favorite social media platform
d. Height measurement (in cm)
e. Arm span (in cm)
f. Quality of medical care at a hospital (low, medium, high)
g. Satisfaction level with support services (1 = lowest, 5 = highest)
h. Operating system on a tablet
i. The temperature in cities throughout the UAE
j. The birth weights of babies

Classifications of Data Types

Among categorical variables, there are generally two sub-types:
- An ordinal variable is a categorical variable where there is some natural ordering, and numbers can be used to represent the ordering.
- A nominal variable is a categorical variable where there is no fundamental ordering.

Among quantitative variables, there are also generally two sub-types:
- A discrete quantitative variable is one where there are gaps in the set of possible numbers taken on by the variable.
- A continuous quantitative variable is one that can take on all possible numerical values in a given range or interval.

Figure: Data Types and Their Levels of Measurement

Exercises

The table below is inspired by the results in the UAE National Health Survey Report 2017-2018. For each item, identify the type of data (qualitative or quantitative), then specify its classification (nominal, ordinal, continuous, or discrete).

   Survey Results                                             Type    Classification
   Age
   Gender
   Education level
   Body Mass Index (BMI)
   Number of servings of vegetables per day
   Intensity of physical exercise (No, Moderate, Intense)
   Glucose level
   Health insurance coverage (Governmental, Private, No)

Independent vs Dependent Variables

- Independent variables (IV) are those that may be adjusted, either deliberately or spontaneously, in a study.
- Dependent variables (DV) are those that are hypothesised to change depending on how the independent variable is adjusted in the study.

Information Sources

Most methods of data collection can be used in both qualitative and quantitative research. The distinction is mainly due to the restrictions imposed on the flexibility, structure, sequential order, depth, and freedom that a researcher has in using them during the research process.

Major sources of information:
1. Primary data: information obtained firsthand by the researcher from original sources for the purpose of the study.
2. Secondary data: information that was previously collected by someone else for a different purpose and is used by researchers for their study.
Examples of Information Sources

Examples of collecting primary data:
- Interviews: direct, in-depth questioning to gather detailed information.
- Observation: systematic watching and recording of behaviors or events.
- Questionnaires: structured sets of questions distributed to gather responses from participants.

Examples of collecting secondary data:
- Existing records and documents, such as books, articles, and reports.
- Official statistics, such as government and organizational records.
- Databases, including data repositories and online databases.

Converting Information to Data

Converting information to data means transforming qualitative or unstructured information into a structured format that can be easily analyzed, processed, and interpreted by computers. This involves several steps:
- Identification: Determine what information needs to be converted into data.
- Extraction: Extract relevant details from the information.
- Structuring: Organize the extracted details into a structured format such as tables, spreadsheets, or databases.
- Quantification: Where possible, convert the information into numerical data (e.g., rating satisfaction on a scale of 1 to 10).
- Encoding: Convert the structured and quantified information into a format suitable for computer processing.

Examples of Converting Information to Data

- Text to Data: Social media text analytics involves extracting meaningful information from text data across various social media platforms. It goes beyond mere word analysis, examining the text to gain insights that support decisions.
- Image to Data: Farmers have recently begun using image processing tools to collect real-time data on soil conditions, crop health, and weather patterns. Image recognition software converts what it identifies in the images into labeled data points, enhancing decision-making and resource management in agriculture.
- Audio to Data: Transcribing spoken words into text and then analyzing the text for specific keywords or patterns.

Different Types of Data Formats

A data format is a specific structure for organizing and encoding information so that it can be stored, processed, and interpreted by computers. Different types of data formats are used depending on the nature of the data and the requirements of the system handling the data.

Figure: Samples of File Formats Used to Store Data

Examples of data formats:
- Text formats: These include plain text without formatting, tabular data stored as CSV files, and JSON, a format that is easy for humans to read and for machines to parse and generate.
- Binary formats: Images, audio, and video are all stored in binary formats and can be processed with statistical tools.
- Document formats: Documents, PDF, and HTML files are examples of document formats.
- Database formats: For large data, SQL databases and NoSQL databases are common ways to store data.

Exercise 1

The Household Expenditure and Income Survey 2014-2015 indicated that the annual average household income of Emirati families is 866,890 Dhs. Interestingly, 59% of this income is from salaries and benefits. The results are based on a survey of 1,800 families selected from a list of 141,000 families in Dubai.
a. Describe the population of interest.
b. Describe the sample that was collected.
c. Is 866,890 a parameter or a statistic? Explain.
d. Is 59% a parameter or a statistic?
e. What type of statistics has been used: inferential or descriptive?

Exercise 2

Use the PPDAC framework to evaluate whether students' academic performance can be improved through the use of a mobile application for studying. Consider the following:
a. Problem: What specific aspect of academic performance are you focusing on (e.g., grades, study habits, retention of information)?
b. Plan: How will you investigate the impact of the mobile application on students' academic performance? What methods will you use (e.g., surveys, performance metrics, experiment)?
c. Data: What types of data will you collect to measure academic performance and the usage of the mobile application?
d. Analysis: How will you analyze the data to determine if there is an improvement in academic performance?
e. Conclusion: What do you expect to find regarding the effectiveness of the mobile application in enhancing students' academic performance?
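For the Analysis step of Exercise 2, one possible descriptive approach is to compare summary statistics between students who used the application and students who did not. The Python sketch below assumes that two-group design, which is only one of several plans a student might choose. The scores are invented placeholders rather than results, and the names `summarize`, `app_users`, and `non_users` are hypothetical.

```python
from statistics import mean, median

# Placeholder exam scores, invented for illustration only (NOT real results).
app_users = [72, 85, 78, 90, 66]   # students who studied with the app
non_users = [70, 75, 68, 82, 65]   # students who did not


def summarize(scores):
    """Return basic descriptive statistics for a list of scores."""
    return {"n": len(scores), "mean": mean(scores), "median": median(scores)}


users_summary = summarize(app_users)
others_summary = summarize(non_users)

# Descriptive comparison: how far apart are the two group means?
mean_difference = users_summary["mean"] - others_summary["mean"]

print("App users:", users_summary)
print("Non-users:", others_summary)
print(f"Difference in means: {mean_difference:.1f}")
```

A difference in means only describes these two groups. Whether the application caused any difference depends on the design chosen in the Plan step, for example an experiment rather than a survey.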