Data Analysis PDF
Western Mindanao State University
Dr. Rica Rose May A. Rubio
Summary
This document provides an overview of data analysis techniques, focusing on descriptive and inferential statistics. It explains measures of central tendency (mean, median, mode) and measures of dispersion (range, average deviation, standard deviation), with example calculations. It is a helpful resource for understanding basic statistical concepts and is suitable for undergraduate-level study.
# DATA ANALYSIS
## Western Mindanao State University
## Data Analysis
* Data analysis is the process of organizing collected data in order to draw helpful conclusions from it.
* The main purpose of data analysis is to find meaning in data so that the derived knowledge can be used to make informed decisions.

## Descriptive & Inferential Data Analysis
* **Descriptive Data Analysis** - used to describe, show, or summarize data in a meaningful way, leading to a simple interpretation of the data. The commonly used statistics are frequency, percentage, measures of central tendency (e.g., mean, median, mode), and measures of dispersion (e.g., standard deviation, SD).
* **Inferential Data Analysis** - tests hypotheses about a set of data to reach conclusions or make generalizations beyond merely describing the data. It includes tests of significance of difference, such as the t-test, analysis of variance (ANOVA), and the chi-square test.

## Descriptive Data Analysis (Measures of Central Tendency)
* **A measure of central tendency** is a single value that attempts to describe a set of data by identifying the central position within that set of data.
* **The 3 main measures are:**
  - **Mean** - the average score
  - **Median** - the value that lies in the middle after ranking all the scores
  - **Mode** - the most frequently occurring score

**The measure you choose should give you a good indication of the typical score in the sample or population.**

## Measures of Central Tendency
**Mean**

...is the most frequently used measure, but it is sensitive to extreme scores.

The mean is the sum of the observed values in the distribution divided by the number of observations. The formula is:

**Mean** = **Sum of all data values** / **Number of data values**

**Symbolically,**

$\bar{x} = \frac{\sum x}{n}$

where $\bar{x}$ (read as "x bar") is the mean of the set of *x* values, $\sum x$ is the sum of all the *x* values, and *n* is the number of *x* values.
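The mean formula translates directly into a few lines of Python. This is a minimal sketch: the function name `sample_mean` is our own choice, and the data are the six scores used in the sample-mean example for ungrouped data.

```python
def sample_mean(values):
    """Return x-bar = (sum of all x values) / n."""
    if not values:
        # With n = 0 the formula would divide by zero, so the mean is undefined.
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Scores from the sample-mean example: 57, 86, 42, 38, 90, 66
scores = [57, 86, 42, 38, 90, 66]
print(round(sample_mean(scores), 3))  # 63.167
```

Rounding to three decimal places matches the hand calculation of 379 ÷ 6.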
## Sample Mean for Ungrouped Data
**Find the mean:** 57, 86, 42, 38, 90, 66

$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$

$= \frac{57 + 86 + 42 + 38 + 90 + 66}{6}$

$= \frac{379}{6}$

$\approx 63.167$

## Sample exercise:
* **Find the mean of the following scores:** 24, 55, 77, 36, 68, 89, 52, 45, 28, 72

## Exercise 2
* **Find the mean of the following scores:** 25, 54, 77, 34, 66, 89, 50, 45, 29, 71

## Measures of Central Tendency
**Median**
* ...is the midpoint of the distribution.
* ...is not sensitive to extreme scores.
* ...should be used when you are unable to use the mean because of extreme scores.
* ...is found by ordering the set from lowest to highest and finding the exact middle.

## Example 1: For odd n (number of observations)
Imagine that a top running athlete in a typical 200-metre training session records the following times: 26.1, 25.6, 25.7, 25.2, and 25.0 seconds. How would you calculate his median time?

First, put the values in ascending order: 25.0, 25.2, 25.6, 25.7, 26.1.

Then, use the following formula to find the position of the middle value. Remember that *n* represents the number of values in the data set.

**Median = $\left(\frac{n + 1}{2}\right)$th value**

$= \frac{5 + 1}{2} = \frac{6}{2} = 3$

The third value in the ordered data set is the median. Since 25.6 is the third value, the median time is 25.6 seconds.

## Sample exercise for odd n:
* A top running athlete in a typical 200-metre training session records the following times:
* **25.4, 25.8, 25.7, 25.2, and 26.0 seconds.**
* **How would you calculate his median time?**

## Measures of Central Tendency
**Mode**
* ...does not involve any calculation or ordering of data.
* ...is the most frequently occurring value in a set of observations.

**Ex.
16, 18, 18, 25, 25, 25, 30, 34, 36, 38.**

**The mode is 25** (it occurs three times, more often than any other value).

## Exercise 4: Find the mode
| Scores in the National Achievement Test | | | | |
| :---: | :---: | :---: | :---: | :---: |
| 90 | 95 | 96 | 87 | 110 |
| 102 | 95 | 98 | 87 | 117 |
| 115 | 96 | 91 | 95 | 95 |
| 93 | 105 | 86 | 103 | 106 |

**Mode is _**

# DATA ANALYSIS (Measures of Dispersion)
## By Dr. Rica Rose May A. Rubio
* **A measure of dispersion** indicates the scattering of data. It describes how much the data values differ from one another, giving a precise view of their distribution, and shows how individual items vary around the central value.
* **In other words,** dispersion is the extent to which values in a distribution differ from the average of the distribution. It tells us how much individual items vary from one another and from the central value.

## Measures of Dispersion
**Measures that indicate the spread of scores:**
- Range
- Average (Mean) Deviation
- Standard Deviation

**While measures of central tendency are used to estimate the "typical" values of a data set, measures of dispersion describe the spread of the data, or its variation around a central value.**

## Range
* **It is the difference between the largest and the smallest values in a set of data.**
* **Formula:** Max score - Min score = Range
* **It is a crude indication of the spread of the scores** because it depends only on the two most extreme values and does not tell us much about the shape of the distribution or how much the scores vary from the mean.

## Example:
* **Consider the following scores obtained by 10 students participating in a mathematics contest:** 6, 10, 12, 15, 18, 18, 20, 23, 25, 28
* **Formula: Max. score - Min. score = Range**
* **Range = 28 - 6 = 22**

## Sample exercise:
**Find the range of the set of observations:** 22, 60, 75, 85, 98.
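The median, mode, and range calculations worked above can be checked with a short Python sketch. The helper names (`median_odd`, `modes`, `value_range`) are our own. `median_odd` deliberately implements only the $(n + 1)/2$ position rule for an odd number of values, as in Example 1, and `modes` returns a list so that a tie between equally frequent values is not silently dropped.

```python
from collections import Counter


def median_odd(values):
    """Median via the ((n + 1) / 2)th-value rule; assumes n is odd."""
    n = len(values)
    if n == 0 or n % 2 == 0:
        raise ValueError("this position rule needs an odd number of values")
    ordered = sorted(values)     # step 1: put the values in ascending order
    position = (n + 1) // 2      # step 2: 1-based position of the middle value
    return ordered[position - 1] # convert the 1-based position to a 0-based index


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)           # value -> number of occurrences
    highest = max(counts.values())     # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)


def value_range(values):
    """Range = max score - min score."""
    return max(values) - min(values)


times = [26.1, 25.6, 25.7, 25.2, 25.0]
print(median_odd(times))  # 25.6

print(modes([16, 18, 18, 25, 25, 25, 30, 34, 36, 38]))  # [25]

contest = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
print(value_range(contest))  # 22
```

The name `value_range` avoids shadowing Python's built-in `range`.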
## Average (Mean) Deviation for Ungrouped Data
* **This measure of spread is defined as the sum of the absolute differences (deviations) between each value in a set of data and the mean,** divided by the total number of values in the set of data.
* **Formula:** $AD = \frac{\sum |x - \bar{x}|}{n}$

**Where:**
* *x* is a value in the set
* $\bar{x}$ is the mean
* *n* is the total number of values in the data set

## Example:
* **Consider a set of values which consists of 20, 25, 35, 40, 45. Find the average deviation.**

Formula: $AD = \frac{\sum |x - \bar{x}|}{n}$

* **First, get $\bar{x}$: 20 + 25 + 35 + 40 + 45 = 165, and 165 ÷ 5 = 33.**
* **Then substitute into the formula:**

$AD = \frac{|20 - 33| + |25 - 33| + |35 - 33| + |40 - 33| + |45 - 33|}{5}$

$= \frac{|-13| + |-8| + |2| + |7| + |12|}{5}$

$= \frac{13 + 8 + 2 + 7 + 12}{5} = \frac{42}{5} = 8.4$

**Thus, on average, each value is 8.4 units from the mean.**

## Exercise 1:
a. The following scores were obtained by students participating in a mathematics contest: 7, 10, 12, 15, 18, 18, 20, 23, 25, 26. **Find the range and average deviation.**

b. A set of observations consists of 32, 46, 60, 75, 85, 90, 98. **Find the range and average deviation.**

## Standard Deviation (SD)
* **SD is a measure of the spread or variation of data about the mean.**
* **It tells us what is happening between the minimum and maximum scores.**
* **It summarizes how far the data values typically deviate, or differ, from the mean.**
* **It is useful when we need to compare groups measured on the same scale.**

## Example
* **Consider the following
scores obtained by 10 students participating in a mathematics contest:** 6, 10, 12, 15, 18, 18, 20, 23, 25, 28

**Steps:**
1. **Compute the mean ($\bar{x}$).**
2. **Subtract the mean ($\bar{x}$) from each score (*x*), or (*x* - $\bar{x}$).**
3. **Square the difference from step 2, or (*x* - $\bar{x}$)².**
4. **Sum all the squares from step 3.**
5. **Divide the number in step 4 by *n* - 1.**
6. **Compute the SD using the formula.**

| Score (x) | (x - $\bar{x}$) | (x - $\bar{x}$)² |
| :---: | :---: | :---: |
| 6 | -11.5 | 132.25 |
| 10 | -7.5 | 56.25 |
| 12 | -5.5 | 30.25 |
| 15 | -2.5 | 6.25 |
| 18 | 0.5 | 0.25 |
| 18 | 0.5 | 0.25 |
| 20 | 2.5 | 6.25 |
| 23 | 5.5 | 30.25 |
| 25 | 7.5 | 56.25 |
| 28 | 10.5 | 110.25 |
| **n = 10** | **Σ(x - $\bar{x}$) = 0** | **Σ(x - $\bar{x}$)² = 428.5** |

**1. Compute the mean ($\bar{x}$).** 6, 10, 12, 15, 18, 18, 20, 23, 25, 28

**6 + 10 + 12 + 15 + 18 + 18 + 20 + 23 + 25 + 28 = 175, and 175 ÷ 10 = 17.5**

**$\bar{x}$ = 17.5**

**Formula:**

$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

$= \sqrt{\frac{428.5}{10 - 1}} = \sqrt{\frac{428.5}{9}} \approx \sqrt{47.61}$

**SD ≈ 6.9**

**The students have a mean score of 17.5 in the mathematics contest and a standard deviation of 6.9.**

## Exercise 2:
a. **A set of scores consists of 15, 20, 32, 46, 60, 75, 85, 90, 93, 98. Find the mean and standard deviation.**

b. **A set of data consists of 10, 20, 30, 40, 50. Compute the mean and standard deviation.**
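The average deviation formula and the six standard-deviation steps can be sketched in Python as follows. This is a minimal illustration: the function names are our own, and the SD divides by *n* - 1 exactly as in the formula used in these notes (the sample standard deviation). Each commented line maps to one of the numbered steps.

```python
import math


def mean(values):
    """x-bar = sum of x / n."""
    return sum(values) / len(values)


def average_deviation(values):
    """AD = sum(|x - x_bar|) / n."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


def sample_standard_deviation(values):
    """SD = sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    x_bar = mean(values)                      # step 1: compute the mean
    deviations = [x - x_bar for x in values]  # step 2: x - x_bar for each score
    squares = [d ** 2 for d in deviations]    # step 3: square each difference
    total = sum(squares)                      # step 4: sum the squares
    variance = total / (n - 1)                # step 5: divide by n - 1
    return math.sqrt(variance)                # step 6: take the square root


print(average_deviation([20, 25, 35, 40, 45]))  # 8.4

contest = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
print(round(sample_standard_deviation(contest), 1))  # 6.9
```

Python's standard library function `statistics.stdev` also divides by *n* - 1, so it should agree with this sketch; `statistics.pstdev` divides by *n* instead and gives a slightly smaller value.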