Unit II Descriptive Statistics PDF
Document Details
Uploaded by HandsDownAntigorite4242
Summary
These notes cover descriptive statistics, including measures of position (mean, median, mode), measures of spread (range, standard deviation), and measures of shape (skewness, kurtosis). The document also touches on outliers, graphical displays, and different distributions.
Full Transcript
UNIT II DESCRIPTIVE STATISTICS

Why do we Need Statistics?
To find why a process behaves the way it does.
To find why it produces defective goods or services.
To center our processes on ‘Target’ or ‘Nominal’.
To check the accuracy and precision of the process.
To prevent problems caused by assignable causes of variation.
To reduce variability and improve process capability.
To know the truth about the real world.

Descriptive Statistics
Methods of describing the characteristics of a data set. This includes calculating things such as the average of the data, its spread and the shape it produces.
Descriptive statistics involves describing, summarizing and organizing the data so it can be easily understood.
Graphical displays are often used along with the quantitative measures to make communication clearer. When analyzing a graphical display, you can draw conclusions based on several characteristics of the graph.

Outlier
A data point that is significantly greater or smaller than the other data points in a data set.
The easiest way to detect outliers is by graphing the data or using graphical methods such as:
Histograms.
Boxplots.
Normal probability plots.

The following measures are used to describe a data set:
Measures of position (also referred to as central tendency or location measures).
Measures of spread (also referred to as variability or dispersion measures).
Measures of shape.

Measures of Position
Position statistics measure the central tendency of the data.
Despite the common use of the word ‘average’, there are different statistics by which we can describe the average of a data set:
Mean
Median
Mode

Mean
The total of all the values divided by the size of the data set.
It is the most used statistic of position.
It is easy to understand and calculate.
It works well when the distribution is symmetric and there are no outliers.
The mean of a sample is denoted by ‘x-bar’.
The mean of a population is denoted by ‘μ’.
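The definition of the mean, and the remark that it works well only when there are no outliers, can be illustrated with a minimal sketch. The data values are invented for illustration and are not from these notes. The helper names sample_mean and iqr_outliers are hypothetical, and the 1.5 × IQR fence used to flag outliers is the common boxplot convention, which is an assumption here since the notes only name boxplots as a graphical method.

```python
from statistics import quantiles


def sample_mean(values):
    """Total of all the values divided by the size of the data set."""
    return sum(values) / len(values)


def iqr_outliers(values, k=1.5):
    """Flag points lying more than k * IQR outside the quartiles.

    Assumption: this is the usual boxplot whisker convention,
    not a rule stated in the notes.
    """
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


# Hypothetical measurements, invented for illustration only.
data = [10, 12, 11, 13, 12]
data_with_outlier = data + [60]

x_bar = sample_mean(data)
x_bar_outlier = sample_mean(data_with_outlier)
outliers = iqr_outliers(data_with_outlier)

print(x_bar)                    # 11.6
print(round(x_bar_outlier, 2))  # 19.67
print(outliers)                 # [60]
```

A single extreme value moves the mean from 11.6 to about 19.67, which is why the mean is a poor summary when outliers are present.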
Median
The middle value, where exactly half of the data values are above it and half are below it.
Median Calculation: Sort the values. With an odd number of values the median is the middle one; with an even number it is the average of the two middle values.
Why can the mean and median be different? The mean is pulled toward extreme values, so skewed data or outliers shift it, while the median depends only on the middle of the ordered data.

Mode
The value that occurs most often in a data set.
It is rarely used as a central tendency measure.

Measures of Spread
The spread refers to how the data deviates from the position measure.
There are different statistics by which we can describe the spread of a data set:
Range
Standard deviation

Range
The difference between the highest and the lowest values.
The simplest measure of variability.
Often denoted by ‘R’.

Standard Deviation
Roughly the average distance of the data points from their own mean. More precisely, it is the square root of the variance.
The standard deviation of a sample is denoted by ‘s’.
The standard deviation of a population is denoted by ‘σ’.

Measures of Shape
Data can be plotted in a histogram to get a general idea of its shape, or distribution.
The shape can reveal a lot of information about the data.
Data will often follow some known distribution.
Examples of symmetrical distributions include the normal distribution.
Two common statistics that measure the shape of the data:
Skewness
Kurtosis

Skewness
Describes whether the data is distributed symmetrically around the mean.
A skewness value of zero indicates perfect symmetry.
A negative value implies left-skewed data.
A positive value implies right-skewed data.

Kurtosis
Measures the degree of flatness (or peakedness) of the shape.
When the data values are clustered around the middle, the distribution is more peaked (a greater kurtosis value).
When the data values are spread out more evenly, the distribution is flatter (a smaller kurtosis value).

Variance
It is a measure of the variation around the mean.
It measures how far a set of data points is spread out from its mean.
It is the square of the standard deviation.
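The measures above can be sketched with Python's standard statistics module. The sample values are invented for illustration. The notes give no formulas for skewness or kurtosis, so the moment-based definitions used here (third and fourth central moments scaled by the variance, with 3 subtracted for excess kurtosis) are an assumption, and the helper names are hypothetical.

```python
from statistics import median, mode, stdev, variance

# Hypothetical right-skewed sample, invented for illustration only.
data = [2, 3, 3, 4, 5, 6, 19]


def data_range(values):
    """Difference between the highest and the lowest values (R)."""
    return max(values) - min(values)


def central_moment(values, k):
    """Average of the k-th powers of the deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 if right-skewed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; larger means more peaked."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


print(median(data))      # 4
print(mode(data))        # 3
print(data_range(data))  # 17
print(variance(data))    # sample variance, the square of s
print(stdev(data))       # sample standard deviation s
print(skewness(data))    # positive, since the long tail is on the right

# Values clustered around the middle versus values spread out evenly.
peaked = [1, 3, 3, 3, 5]
flat = [1, 2, 3, 4, 5]
print(excess_kurtosis(peaked) > excess_kurtosis(flat))  # True
```

In this sample the mean is 6 while the median is 4: the single large value 19 pulls the mean upward but leaves the median unchanged, which matches the positive skewness.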