Descriptive Statistics PDF

Descriptive Statistics: The first step towards statistical analysis

Descriptive Statistics
• Descriptive statistics is the branch of statistics that focuses on summarizing and presenting data in a meaningful way.
• It provides tools and techniques for organizing, analyzing, and interpreting data so that people can make sense of the information at hand.

Types of Descriptive Statistics
1. Measures of Central Tendency (mean, median, mode)
2. Measures of Position (percentiles, deciles, quartiles, Z-scores)
3. Measures of Variability (range, average deviation, variance, standard deviation)

Measures of Central Tendency
• Central tendency is defined as “the statistical measure that identifies a single value as representative of an entire distribution.”
• It aims to describe the entire data set with the single value that is most typical or representative of the collected data.
Mean: The sum of all values divided by the number of values.
Median: The middle value when the data are ordered (the average of the two middle values when the count is even).
Mode: The value that appears most frequently.

Example of Measures of Central Tendency
Data set: 35, 15, 22, 40, 25, 18, 28, 35
• Find the mean, median, and mode of this data set.
Mean: (35 + 15 + 22 + 40 + 25 + 18 + 28 + 35) ÷ 8 = 218 ÷ 8 = 27.25
Median: the ordered data are 15, 18, 22, 25, 28, 35, 35, 40, so the median is (25 + 28) ÷ 2 = 26.5
Mode: 35 (it appears twice)

Measures of Position
• Measures of position are statistical tools that describe the relative standing of data points within a data set.
• They tell us how a specific data value compares to the other values in the set and help us understand the distribution of the data.

Percentiles
• Percentiles divide the data into 100 equal parts.
• Each percentile indicates the value below which a certain percentage of the data points fall. Percentiles are widely used to rank and compare data points within a data set.
• The 75th percentile is the value below which 75% of the data fall. For example, if the 75th percentile of the scores in a class of 60 students is 82, then 75% of the scores (45 students) are below 82.
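As a quick check of the worked example and the percentile idea above, here is a minimal Python sketch that uses only the standard library. The helper `percent_below` is a hypothetical name introduced for illustration, and counting values strictly below the cutoff is an assumption; other percentile conventions exist.

```python
import statistics

# Data set from the central tendency example.
scores = [35, 15, 22, 40, 25, 18, 28, 35]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # average of the two middle values (even count)
mode = statistics.mode(scores)      # most frequent value


def percent_below(data, value):
    """Percentage of data points strictly below `value` (illustrative helper)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


print(mean, median, mode)         # 27.25 26.5 35
print(percent_below(scores, 35))  # 62.5 (5 of the 8 values are below 35)

# Class-of-60 example: 75% of 60 students is 45 students.
students_below = 75 * 60 // 100
print(students_below)             # 45
```

In this small data set the value 35 sits above 62.5% of the scores, which is the same kind of statement the 75th percentile makes about the score of 82 in the class example.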
Quartiles
• Quartiles are a measure of position that divide a data set into four equal parts, each containing 25% of the data.
Q1 (First Quartile): The 25th percentile.
Q2 (Second Quartile or Median): The 50th percentile.
Q3 (Third Quartile): The 75th percentile.
• The difference between the third and first quartiles is called the interquartile range (IQR).
• Quartiles are useful for identifying outliers and for understanding the central tendency and variability of a data set.

Interquartile Range
• The IQR measures the spread of the middle 50% of the data, calculated as the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 − Q1.
• The IQR is especially useful for identifying outliers because it gives a more robust measure of spread than the range, which can be distorted by extreme values.

Why use IQR?
• The IQR is resistant to outliers because it looks only at the middle 50% of the data, ignoring the lowest 25% and the highest 25%.
• It is commonly used with box plots to show the spread visually and to detect outliers: any data point below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is considered an outlier.

Z-score
• A Z-score (also called a standard score) measures how many standard deviations a data point is from the mean of a data set.
• It locates a value within a distribution: its sign tells us whether the data point is below or above the mean, and its magnitude tells us by how much.
$z = \dfrac{x - \bar{x}}{s}$
where $z$ = z-score (standard score), $x$ = observed value, $\bar{x}$ = mean of the sample, $s$ = standard deviation of the sample

Z-score Example
The table below shows Liza's test scores in three subjects, together with the mean and the standard deviation of the scores of the section where she belongs. In which subject did she perform best?
[Table not reproduced in this transcript.]
Liza's score in Science is 1.29 standard deviations above the class mean, her score in Mathematics is 1.25 standard deviations below the mean, and her score in English is 0.96 standard deviations above the mean. Thus, she performed best in Science.

Measures of Variability
• Who among the students is the most consistent, that is, the student whose scores are most tightly clustered?
• Measures of variability (also called measures of dispersion) describe how spread out or scattered the data points in a data set are. These measures help us understand how much the data points differ from each other and from the central tendency (mean, median).

Range
• The range is the difference between the highest and lowest values in the data set.
Range = Maximum value − Minimum value
Example: If the highest score in a class is 95 and the lowest is 55, the range is 95 − 55 = 40.

Variance
• Variance measures the average squared deviation of the data points from the mean. It gives a sense of how far the numbers are from the mean, but in squared units.

Standard Deviation
• Standard deviation is a measure of the amount of variation or dispersion in a set of data points.
• It tells us how much the individual data points tend to differ from the mean (average) of the data set. In simple terms, it indicates how spread out the data are.

Standard Deviation Interpretation
• A low standard deviation means the data points are clustered close to the mean, indicating less variability.
• A high standard deviation means the data points are spread out, indicating more variability.

Rule of Thumb
As a rule of thumb, a standard deviation of less than 10% of the mean is often considered small in relative terms.
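The measures of position and variability described above can be sketched in Python with the standard library. This is an illustration under stated assumptions rather than the method used in the slides: it reuses the data set from the central tendency example, asks `statistics.quantiles` for the "inclusive" quartile convention (other conventions give slightly different quartiles), treats the data as a whole population, and uses a hypothetical helper named `z_score`. The three z-scores at the end are the values quoted in the Z-score example.

```python
import statistics

# Data set from the central tendency example.
scores = [35, 15, 22, 40, 25, 18, 28, 35]

# Quartiles and IQR ("inclusive" convention is an assumption).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]
print(q1, q2, q3, iqr)  # 21.0 26.5 35.0 14.0
print(outliers)         # [] (no value lies outside the fences)

# Range, population variance, and population standard deviation.
data_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)  # average squared deviation from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance
print(data_range, variance)              # 25 68.9375


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean (illustrative helper)."""
    return (x - mean) / sd


mean = statistics.mean(scores)
z_max = z_score(40, mean, std_dev)  # positive, because 40 is above the mean

# Rule of thumb: is the standard deviation below 10% of the mean?
relative_spread = std_dev / mean
is_small = relative_spread < 0.10
print(is_small)                     # False (the spread is roughly 30% of the mean)

# Comparing z-scores across subjects: the largest z-score marks the best relative standing.
subject_z = {"Science": 1.29, "Mathematics": -1.25, "English": 0.96}
best_subject = max(subject_z, key=subject_z.get)
print(best_subject)                 # Science
```

Comparing z-scores instead of raw scores is what makes the subjects comparable, since each subject has its own mean and standard deviation.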
Uses of Standard Deviation: Interpreting Teacher's Performance
Example: The table below presents the test results of Grade 5 in Mathematics, where the teacher taught his three classes using the same teaching strategy. In which section was the teacher's strategy most effective, given that the three classes are homogeneously grouped?
[Table not reproduced in this transcript.]
• Section A performed the BEST among the three sections, having obtained the highest mean. It also obtained the highest standard deviation (s = 8.50), which suggests a wide (heterogeneous) spread of abilities. In this case, the teacher failed to reduce the differences among individual students.
• Section C performed the POOREST among the three sections, having obtained the lowest mean (x̄ = 33.55). It also had the lowest standard deviation (s = 3.34), which implies that Section C is a homogeneous class (students of almost the same ability). In this case, the teacher successfully closed the gap between good and poor performers.
• The teaching strategy was most effective in reducing the gap between good and poor performers in Section C.

Excel Commands: Measures of Position
Percentile: =PERCENTILE.EXC(A1:A41, 0.7) (0.7 means the 70th percentile)
Percentile Rank: =PERCENTRANK(A1:A50, 75) (75 is the value whose rank is wanted)
First Quartile (Q1): =QUARTILE(A1:A50, 1)
Third Quartile (Q3): =QUARTILE(A1:A50, 3)

Excel Commands: Measures of Variability
Range: =MAX(A1:A50) - MIN(A1:A50)
Variance: =VAR.P(A1:A50) (or =VAR.S(A1:A50) for sample variance)
Standard Deviation: =STDEV.P(A1:A50) (or =STDEV.S(A1:A50) for a sample)
Z-score: =(X - AVERAGE(A1:A50)) / STDEV.P(A1:A50)

THANK YOU
