Descriptive Statistics PDF

Summary

This document provides an overview of descriptive statistics, including different measurement scales (nominal, ordinal, interval, ratio), measures of central tendency (mean, median, mode), and measures of dispersion (standard deviation, interquartile range). It also explains how to calculate these measures and their advantages and disadvantages.

Full Transcript

Descriptive Statistics
Wednesday, October 02, 2024 10:40 PM

Why do we collect data?
To describe
To infer
To make predictions

Measurement scales
There are 4 types of measurement scales: nominal, ordinal, interval and ratio. They determine what can be said about the data and how the data can be described.

Nominal
Labelled data.
Not meaningfully scaled.
E.g. political views, gender

Ordinal
Data ordered along a continuum.
Differences between two data points are not interpretable.
E.g. different navy ranks: commander, captain

Interval
Differences between data points are meaningful.
There is no true 0 value.
E.g. temperature in degrees Celsius; 0 degrees is not a true 0 value

Ratio
Differences between data points are meaningful.
There is a true 0 value.
E.g. income

Data is usually taken from a sample as it is impractical to measure an entire population.

Histograms and boxplots are used to inspect the data before computing statistics.
Histograms show the distribution of a continuous variable. The arithmetic mean and standard deviation can be estimated from a histogram.
Boxplots show how data is distributed (IQR, median), as well as any outliers in the data.

Data is described using measures of central tendency and measures of dispersion.

Central tendency: mean, median, mode

Mean
Formula: Sum of data points divided by the number of data points.
Takes all data into account.
Adv: accounts for every value.
Dis: sensitive to outliers and asymmetry.

Median
What is it? The middle value of a list of ordered data.
Formula: Data must be sorted by magnitude first.
If N is odd: the median is the value at position (N + 1)/2.
If N is even: the median is the sum of both middle values/2.
Takes 1-2 data points into account.
Adv: not affected by outliers and non-normal data.
Can be used on various types of descriptive statistics.

Mode
What is it? Most frequently occurring data point.
Takes a few data points into account.
Dis: might not be identifiable if all values are unique.

Dispersion

Standard deviation
What is it? Shows how far data points typically lie from the mean. For normally distributed data, about 68 percent of the data lies within one standard deviation of the mean.
Formula:
1. Subtract the mean from each data point.
2. Square the differences.
3. Sum the squared differences.
4. Divide the sum by N-1 (number of samples-1).
5. Take the square root of the fraction.

Interquartile range
What is it? Shows the spread of the middle half of data / where most values lie.
Formula: Sort the data by magnitude and take the ¼ and the ¾ data point; if the position is not an integer (whole number), round. Subtract the ¼ data point from the ¾ data point.

Variance
What is it? Shows the spread between numbers in a set.

Sum of squares
What is it? Shows the variability around the mean.
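
As a rough illustration of the calculations described above, here is a minimal Python sketch. The sample values and the helper names sample_std and iqr_by_position are made up for this example and are not part of the original notes. The notes only say to round a non-integer quartile position, so rounding up with 1-based positions is an assumption here; statistical packages often interpolate instead and may give a different IQR.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation, following the five steps in the notes."""
    mean = sum(values) / len(values)
    diffs = [x - mean for x in values]              # 1. subtract the mean
    squared = [d ** 2 for d in diffs]               # 2. square the differences
    sum_of_squares = sum(squared)                   # 3. sum the squared differences
    variance = sum_of_squares / (len(values) - 1)   # 4. divide by N - 1
    return math.sqrt(variance)                      # 5. take the square root


def iqr_by_position(values):
    """Interquartile range from the 1/4 and 3/4 data points.

    Assumption: positions are 1-based and a non-integer position is
    rounded up. The notes do not specify the rounding direction.
    """
    ordered = sorted(values)                        # sort by magnitude first
    n = len(ordered)
    q1 = ordered[math.ceil(n / 4) - 1]              # the 1/4 data point
    q3 = ordered[math.ceil(3 * n / 4) - 1]          # the 3/4 data point
    return q3 - q1


# Made-up illustrative sample
data = [2, 4, 4, 5, 7, 9, 11]

print("mean:", statistics.mean(data))
print("median:", statistics.median(data))
print("mode(s):", statistics.multimode(data))
print("standard deviation:", sample_std(data))
print("interquartile range:", iqr_by_position(data))
```

The standard library's statistics.stdev also divides by N-1, so it can be used to cross-check sample_std. statistics.multimode returns every most-frequent value, which makes the case where all values are unique easy to spot: it returns the whole list.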
