Data Management: Statistics (2024)

Summary

This document provides a concise explanation of fundamental concepts in Data Management (Statistics). It defines and illustrates different types of data, approaches for organizing and presenting data, and statistical descriptors.

Full Transcript


DATA MANAGEMENT: (Statistics)

Definition of Statistics

Statistics is used in everyday life, often without people realizing it. It is the science of classifying and manipulating data in order to draw inferences. The word "statistics" is derived from the Latin word "status", meaning state.

Two basic meanings of the word statistics:
1. It refers to actual numbers derived from the data.
2. It refers to a method of analysis.

Statistics is a collection of quantitative data, such as statistics of crimes, statistics of enrolment, and statistics of unemployment. Statistics is also the study of how to collect, organize, analyze, and interpret numerical information from data.

Importance of Statistics
• It simplifies a mass of data (condensation).
• It helps to get concrete information about any problem.
• It helps in reliable and objective decision making.
• It presents facts in a precise and definite form.
• It facilitates comparison (measures of central tendency and measures of dispersion).
• It facilitates prediction (time series and regression analysis are the most commonly used methods for prediction).
• It helps in the formulation of suitable policies.
• In engineering: the engineer samples a product quality characteristic along with various controlled process variables to help locate the important variables related to product quality.
• In manufacturing: newly manufactured fuses are sampled before shipping to decide whether to ship or hold individual lots.
• In quality control: determining techniques for evaluating quality through adequate sampling, in-process control, consumer surveys, experimental design in product development, etc.

Two Kinds of Statistics
• Descriptive Statistics: deals with the methods of organizing, summarizing, and presenting a mass of data so as to yield meaningful information.
• Inferential Statistics: deals with making generalizations about a body of data where only part of it is examined. This comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.

POPULATION AND SAMPLE

VARIABLE
◦ Refers to any characteristic of interest measurable on each and every individual in the population.
◦ Others also refer to this as data.

Quantitative Variable
Data that is numerical, counted, or compared on a scale.
◦ Demographic data
◦ Answers to closed-ended survey items
◦ Scores on standardized instruments

Classification of Quantitative Variable
• Discrete quantitative variable: results from either a finite number of possible values or a countable number of possible values.
• Continuous quantitative variable: results from infinitely many possible values that can be associated with points on a continuous scale.

Qualitative Variable
Narratives, logs, experience.
◦ Focus groups
◦ Interviews
◦ Open-ended survey items
◦ Diaries and journals
◦ Notes from observations
◦ Photographs or video recordings

Levels of Measurement

Level 1. Nominal is characterized by data that consist of names, labels, or categories only.
Examples: name, civil status, sex, religion, address, degree program.

Level 2. Ordinal involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Examples: military rank, job position, year level.

Level 3. Interval is like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined. However, there is no inherent (natural) zero starting point.
Examples: IQ score, temperature (in °C).

Level 4. Ratio is the interval level modified to include the inherent zero starting point. For values at this level, differences and ratios are meaningful.
Examples: height, area, weekly allowance.

METHODS OF DATA PRESENTATION
• Textual Method
• Tabular Method
• Graphical Method

Textual Method

The textual method uses a narrative description of the data gathered. Example:

"The survey returns showed that 54 percent of the respondents indicated they believe their weight is ideal for their height, and they also think that they are in overall good health. However, when the respondents gave their actual height and weight, it appeared that only 45% fit the normal category (applying the Asian body mass index range, or BMI)."

Tabular Method

The tabular method is a systematic arrangement of information into columns and rows.

Frequency Distribution Table (FDT)
1. Qualitative FDT
2. Quantitative FDT

FDT for Qualitative or Categorical Data: the data are grouped according to some qualitative characteristics / non-numerical categories.

[Table 2: Frequency Distribution of the Brands of Gas Range Used. The slide labels the parts of the table: table heading, caption, stubs / classes, and body.]

Graphical Method

Qualities of a Good Graph
1. It is accurate.
2. It is clear.
3. It is simple.
4. It has a good appearance.

Common Types of Graph

Qualitative Data
1. Pie Chart
2. Column or Bar Graph

Quantitative Data
1. Scatter Graph
2. Line Chart
3. Frequency Histogram
4. Relative Frequency Histogram
5. Frequency Polygon
6. Ogives

[Figures: Common Types of Graph]

[Figures: Graphical Presentation of the Frequency Distribution Table]

Measure of Central Tendency

Any single value that is used to identify the "center" of the data or typical value.
• MEAN
• MEDIAN
• MODE

MEAN
Definition: sum of all observed values divided by the number of observations.
Applies to: quantitative data.
• Most popular measure of central location.
• Affected by extreme values.
• It is unique: there is only one answer.
• Useful when comparing sets of data.

MEDIAN
Definition: the positional middle value when observations are ordered from smallest to largest (or vice versa).
Applies to: quantitative data.
• Extreme values do not affect the median as strongly as they do the mean.
• Useful when comparing sets of data.
• It is unique: there is only one answer.

MODE
Definition: the observed value that occurs most frequently. The data is said to be unimodal if there is only one mode, bimodal if there are two modes, and trimodal if there are three modes.
Applies to: quantitative and qualitative data.
• May not exist.
• May also not be unique.
• Extreme values do not affect the mode.
• Not necessarily unique: may have more than one value.
• When no values repeat in the data set, the mode is every value and is useless.
• When there is more than one mode, it is difficult to interpret and/or compare.

Mean, Median, and Mode of Ungrouped Data

The Arithmetic Mean

Formula for getting the mean of ungrouped data:

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the number of observations.

EXAMPLE #1: MEAN
Data: 4 6 5 7 3 4 5 4

EXAMPLE #2: MEAN
Data: Scores of 14 students in the Math122a Midterm exam
72 83 84 82 72 80 79 80 76 80 85 79 90 91
• What is the mean?
• What if a 15th student took the Midterm exam just by guessing and got a score of 10?
• What happens to the mean?

The Median

How to get the median of ungrouped data: Arrange the scores in ascending or descending order.
If n is odd, the median is the middle score; if n is even, the median is the average of the two middlemost scores (n is the number of observations).

For the ordered values $X_i$, for i = 1, 2, 3, …, n:

$Md = X_{\frac{n+1}{2}}$ for n that is odd

$Md = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2}$ for n that is even

EXAMPLE #1: MEDIAN
Data: 4 6 5 7 3 4 5 4

EXAMPLE #2: MEDIAN
Data: 4 6 5 3 4 5 4

The Mode

How to find the mode of ungrouped data:
✓ Simply find the score or the value that occurs most often.

EXAMPLE #1: MODE
Data: 4 6 5 7 3 4 5 4

EXAMPLE #2: MODE
Data: 72 83 64 82 71 60 79

EXAMPLE #3: MODE
Data: Blood type of 20 patients in UMC
A, A, AB, O, O, B, A, O, O, O, A, A, A, B, B, O, B, B, B, AB

Examples:
1. A statistics class of 60 students took a quiz. In this class, 18 students scored 4 points, 15 students scored 3 points, 9 students scored 2 points, 12 students scored 1 point, and 6 students scored 0.
   a. Find the mean.
   b. Find the mode.
   c. Find the median.
2. If the mean of five values is 8.2 and four of the values are 6, 10, 7, and 12, find the fifth value.
3. The lives of car batteries (in years) were obtained by a manufacturing company, and the following information was taken: 3.2, 1.8, 2.1, 3.5, 4.0, 2.4, 2.7, 3.1, 3.6, and 4.2. What is the median?
4. The numbers of buses observed on ten major roads in Metro Manila are 667, 705, 645, 705, 800, 759, 724, 759, 769, and 750.
   a. Find the mean.
   b. Find the median.
5. What happens to the median if two observations are included with counts of 150 and 1,500 buses? Will it increase, decrease, or stay the same?

The Weighted Mean

The weighted mean of the n numbers $x_1, x_2, x_3, \ldots, x_n$ with the respective assigned weights $w_1, w_2, w_3, \ldots, w_n$ is

$\text{Weighted mean} = \frac{\sum (x \cdot w)}{\sum w}$

where $\sum (x \cdot w)$ is the sum of the products of each number and its assigned weight, and $\sum w$ is the sum of all the weights.

Examples:
The table below shows Vincent's first semester course grades. Use the weighted mean formula to find Vincent's GPA for the semester.
Course grade    Grade point
A               4
B               3
C               2
D               1
F               0

Course       Course grade    Course units
MMW          A               3
Calculus     B               4
Chemistry    C               3
P.E.         D               2

Examples:
Find the mean, the median, and all modes for the data in the given table.

Ages of Science Fair Contestants
Age    Frequency
7      3
8      4
9      6
10     15
11     11
12     7
13     1

Measure of Location

Values below which a specified fraction or percentage of the observations in a given set must fall.
• PERCENTILE
• DECILE
• QUARTILE

Measure of Dispersion

Indicates the extent to which individual items in a series are scattered about an average.
• Absolute Dispersion: range, variance, standard deviation
• Relative Dispersion: coefficient of variation, standard score

Measures of Dispersion

[Figure: two distributions of scores, each plotted over the values 1 to 10.] Which of the distributions of scores has the larger dispersion?

Measures of dispersion indicate the extent to which individual items in a series are scattered about an average.
◦ The more similar the scores are to each other, the lower the measure of dispersion will be.
◦ The less similar the scores are to each other, the higher the measure of dispersion will be.
◦ In general, the more spread out a distribution is, the larger the measure of dispersion will be.

Measures of Absolute Dispersion
• Measures of absolute dispersion are expressed in the units of the original observations.
• There are three main measures of absolute dispersion:
  ◦ The range
  ◦ The semi-interquartile range (SIR)
  ◦ Variance / standard deviation

The Range

The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, $X_L - X_S$.

The range is used when
◦ you have ordinal data, or
◦ you are presenting your results to people with little or no knowledge of statistics.

What is the range of the following data: 4 8 1 6 6 2 9 3 6 9

Two very different sets of data can have the same range: 1 1 1 1 9 vs 1 3 5 7 9

The Standard Deviation and the Variance
• Variance is the mean of the squared deviation scores.
• The larger the variance is, the more the scores deviate, on average, away from the mean.
• The smaller the variance is, the less the scores deviate, on average, from the mean.

When the deviation scores are squared in variance, their unit of measure is squared as well.
◦ E.g. if people's weights are measured in pounds, then the variance of the weights would be expressed in pounds² (or squared pounds).

Since squared units of measure are often awkward to deal with, the square root of variance is often used instead.
◦ The standard deviation is the square root of variance.

Notation
                      Sample    Population
Standard deviation    s         σ
Variance              s²        σ²
Size                  n         N

Computational Formula Example

The mean is 42 / 6 = 7, which gives the deviations below.

x_i       x_i − mean    (x_i − mean)²
9         2             4
8         1             1
6         −1            1
5         −2            4
8         1             1
6         −1            1
Σ = 42    Σ = 0         Σ = 12

Measures of Relative Dispersion
• Measures of relative dispersion are unit-less and are used when one wishes to compare the scatter of one distribution with another distribution.
• Some measures of relative dispersion:
  ◦ Coefficient of Variation
  ◦ Standard Score

Coefficient of Variation

The coefficient of variation, CV, is the ratio of the standard deviation (SD) to the mean and is usually expressed as a percentage. It is computed as

$CV = \frac{SD}{\text{mean}} \times 100\% = \frac{\sigma}{\mu} \times 100\%$

It answers the question: how big is the SD of the distribution relative to the mean of the distribution?

Example: A laboratory technician studied recent measurements made with two different instruments. The first measured the diameter of a ball bearing and obtained a mean of 4.96 mm with an SD of 0.022 mm. The second measured the diameter of a metal rod and obtained a mean of 6.48 mm with an SD of 0.032 mm. Which of the two was relatively more precise?

Solution:

Instrument #1: $CV_1 = \frac{0.022\ \text{mm}}{4.96\ \text{mm}} \times 100\% = 0.44\%$

Instrument #2: $CV_2 = \frac{0.032\ \text{mm}}{6.48\ \text{mm}} \times 100\% = 0.49\%$

∴ Instrument #1 is relatively more precise.

Standard Score

It measures how many standard deviations an observation is above or below the mean. It is computed as

$z = \frac{x - \mu}{\sigma}$

and the sample counterpart is

$z = \frac{x - \bar{x}}{s}$

It is not really a measure of relative dispersion, but it is related. It is useful for comparing two values from different series, especially when the two series differ with respect to the mean or SD or both, or are expressed in different units.

Example: Mario got a grade of 75% in English and a grade of 90% in History. The mean grade in English is 65% and the SD is 10%, whereas in History the mean grade is 80% and the SD is 20%. In which subject did Mario perform better?

$z_{History} = \frac{x - \bar{x}}{s} = \frac{90 - 80}{20} = 0.5$

$z_{English} = \frac{x - \bar{x}}{s} = \frac{75 - 65}{10} = 1.0$

∴ Mario performed better in English.
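The ungrouped-data examples for the mean, median, and mode given earlier in this transcript can be checked with a short Python sketch. It uses only the standard library's statistics module; the variable names are chosen here for illustration and are not part of the slides.

```python
from statistics import mean, median, multimode

# EXAMPLE #1 data used on the mean, median, and mode slides.
data = [4, 6, 5, 7, 3, 4, 5, 4]
data_mean = mean(data)        # 38 / 8 = 4.75
data_median = median(data)    # n is even: average of the 4th and 5th ordered values, (4 + 5) / 2 = 4.5
data_modes = multimode(data)  # [4], so this data set is unimodal

# Median EXAMPLE #2: n = 7 is odd, so the median is the 4th ordered value.
odd_median = median([4, 6, 5, 3, 4, 5, 4])  # ordered: 3 4 4 4 5 5 6 -> 4

# Mode EXAMPLE #2: no value repeats, so every value comes back as a mode.
no_repeat_modes = multimode([72, 83, 64, 82, 71, 60, 79])

# Mode EXAMPLE #3: the mode also works for qualitative data.
blood_types = ["A", "A", "AB", "O", "O", "B", "A", "O", "O", "O",
               "A", "A", "A", "B", "B", "O", "B", "B", "B", "AB"]
blood_modes = multimode(blood_types)  # A, O, and B each occur 6 times (trimodal)

# Mean EXAMPLE #2: effect of one extreme value on the mean and on the median.
scores = [72, 83, 84, 82, 72, 80, 79, 80, 76, 80, 85, 79, 90, 91]
with_guesser = scores + [10]  # the 15th student, who scored 10

print("mean before/after:", round(mean(scores), 2), round(mean(with_guesser), 2))
print("median before/after:", median(scores), median(with_guesser))
```

The last two lines illustrate the comparison the slides make: the single score of 10 pulls the mean down from about 80.93 to 76.2, while the median stays at 80.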
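The frequency-table exercises and the weighted mean (the 60-student quiz, the ages of the science fair contestants, and Vincent's GPA) can be worked the same way. This is a minimal sketch, assuming the flattened grade tables read as grade points A=4, B=3, C=2, D=1, F=0 and course units MMW=3, Calculus=4, Chemistry=3, P.E.=2. The helper names weighted_mean and expand are hypothetical, introduced only for this illustration.

```python
from statistics import median, multimode


def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


def expand(freq_table):
    """Turn {value: frequency} into the full list of observations."""
    return [value for value, freq in freq_table.items() for _ in range(freq)]


# Quiz scores of the 60 students, stored as {score: number of students}.
quiz = {4: 18, 3: 15, 2: 9, 1: 12, 0: 6}
quiz_mean = weighted_mean(list(quiz.keys()), list(quiz.values()))  # 147 / 60
quiz_median = median(expand(quiz))
quiz_modes = multimode(expand(quiz))

# Ages of Science Fair Contestants, stored as {age: frequency}.
ages = {7: 3, 8: 4, 9: 6, 10: 15, 11: 11, 12: 7, 13: 1}
ages_mean = weighted_mean(list(ages.keys()), list(ages.values()))  # 475 / 47
ages_median = median(expand(ages))
ages_modes = multimode(expand(ages))

# Vincent's GPA: grade points weighted by course units (assumed table reading).
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
courses = [("MMW", "A", 3), ("Calculus", "B", 4), ("Chemistry", "C", 3), ("P.E.", "D", 2)]
gpa = weighted_mean([grade_points[grade] for _, grade, _ in courses],
                    [units for _, _, units in courses])  # 32 / 12

print(quiz_mean, quiz_median, quiz_modes)
print(round(ages_mean, 2), ages_median, ages_modes)
print(round(gpa, 2))
```

Treating the frequencies as weights is the same computation as the weighted mean formula, which is why one helper covers both the frequency tables and the GPA.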
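The dispersion examples (the range, the computational example for the variance, the two instruments, and Mario's grades) can be reproduced with the sketch below. The transcript does not preserve the variance formulas, so the code assumes the common convention of dividing by N for a population and by n − 1 for a sample; the slides may have used a different denominator. The function names are hypothetical.

```python
from math import sqrt

# Computational example: x_i = 9, 8, 6, 5, 8, 6
x = [9, 8, 6, 5, 8, 6]
n = len(x)
x_bar = sum(x) / n                       # 42 / 6 = 7
deviations = [xi - x_bar for xi in x]    # these sum to 0
squared = [d ** 2 for d in deviations]   # these sum to 12

# Assumption: divide by N for a population and by n - 1 for a sample.
population_variance = sum(squared) / n       # 12 / 6
sample_variance = sum(squared) / (n - 1)     # 12 / 5
population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)


def value_range(data):
    """Largest score minus smallest score."""
    return max(data) - min(data)


def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


print(value_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))
print(value_range([1, 1, 1, 1, 9]), value_range([1, 3, 5, 7, 9]))
print(population_variance, sample_variance)
print(round(coefficient_of_variation(0.022, 4.96), 2),
      round(coefficient_of_variation(0.032, 6.48), 2))
print(z_score(75, 65, 10), z_score(90, 80, 20))
```

The smaller coefficient of variation belongs to the first instrument, and the larger z-score belongs to English, which matches the conclusions stated in the two worked examples.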
