Statistics Chapter on Central Tendency and Spread

Study Notes

Mode: The value with the highest frequency in a dataset; a dataset can have more than one mode. It can be found for any variable type and is the only one of these three measures that applies to nominal variables.
Median: The middle value that separates data into two equal parts; can only be calculated for ordered variables (ordinal, interval, ratio).
Mean: The average value, calculated by dividing the sum of all values by the number of values; meaningful only for interval and ratio variables.
Comparison: The mean is sensitive to outliers; the median is largely unaffected by them, which makes it a more robust measure of center for skewed distributions.
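
The comparison above can be checked on a small example. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration and are not from the notes.

```python
import statistics

# Illustrative data (an assumption, not from the notes)
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # add one extreme value

mean_before = statistics.mean(data)               # 17 / 5 = 3.4
median_before = statistics.median(data)           # middle value = 3
mean_after = statistics.mean(with_outlier)        # 117 / 6 = 19.5
median_after = statistics.median(with_outlier)    # (3 + 4) / 2 = 3.5

# A dataset may have several modes; multimode returns all of them
modes = statistics.multimode([1, 1, 2, 2, 3])     # [1, 2]

print(mean_before, mean_after)      # mean jumps from 3.4 to 19.5
print(median_before, median_after)  # median moves only from 3 to 3.5
print(modes)
```

One extreme value moves the mean by about 16 units but the median by only 0.5.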

Skewness: Measures asymmetry in the data distribution; positive skewness indicates a long right tail, negative skewness indicates a long left tail, and a symmetric distribution has zero skewness.
Kurtosis: Measures the "tailedness" of the distribution; high kurtosis indicates heavy tails (more extreme values, often accompanied by a sharper peak), while low kurtosis indicates light tails and a flatter shape.
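
As a rough illustration of the two shape measures above, the sketch below computes them as standardized moments with the population standard deviation. The helper name `standardized_moment` and the sample data are assumptions made for this example; statistical packages often apply additional bias corrections, so their results can differ slightly.

```python
import statistics

def standardized_moment(values, k):
    """k-th standardized moment: mean of ((x - mean) / sd) ** k (population sd)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** k for x in values) / len(values)

symmetric = [1, 2, 3, 4, 5]      # illustrative symmetric data
right_tailed = [1, 1, 1, 2, 10]  # illustrative data with a long right tail

skew_symmetric = standardized_moment(symmetric, 3)   # approximately 0
skew_right = standardized_moment(right_tailed, 3)    # positive

# "Excess" kurtosis subtracts 3, the kurtosis of a normal distribution
excess_kurt_symmetric = standardized_moment(symmetric, 4) - 3  # 1.7 - 3 = -1.3

print(skew_symmetric, skew_right, excess_kurt_symmetric)
```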

Quantiles: Cut-off points that divide datasets into sections with equal data counts; quartiles (4 sections) and percentiles (100 sections) are specific types of quantiles.
Variance: A measure of data spread calculated as the average of the squared distances from the mean; a higher variance indicates wider spread.
Standard Deviation (SD): The square root of the variance; expressed in the same units as the data, it indicates the typical distance of values from the mean.
Z-Score: Indicates how many standard deviations a data point is from the mean; a Z-score over 3 or below -3 suggests an outlier.
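
The spread measures above can be computed directly with the standard `statistics` module. This is a sketch on made-up data; it uses the population formulas (dividing by n), whereas sample formulas divide by n - 1, and the "inclusive" quantile method is only one of several conventions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values (an assumption)

mean = statistics.fmean(data)          # 40 / 8 = 5.0
variance = statistics.pvariance(data)  # average squared distance from the mean = 4
sd = statistics.pstdev(data)           # square root of the variance = 2.0

# Z-score: distance from the mean in units of standard deviation
z_scores = [(x - mean) / sd for x in data]

# Quartiles: three cut points that split the sorted data into four sections
quartiles = statistics.quantiles(data, n=4, method="inclusive")

print(mean, variance, sd)
print(z_scores[-1])   # z-score of the value 9 is (9 - 5) / 2 = 2.0
print(quartiles)      # the middle cut point is the median
```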

Outliers: Can arise from inaccuracies in data collection, methodological issues, or truly extreme values; identified through Z-score and Interquartile Range (IQR) methods.
Under the IQR method, a point is commonly flagged as an outlier if it lies below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR, where IQR = Q3 - Q1; under the Z-score method, a point is flagged if its distance from the mean exceeds 3 standard deviations.
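
Both detection methods mentioned above can be sketched in a few lines. The function names, the data, and the fence multiplier `k=1.5` are assumptions for illustration (1.5 is the commonly used convention for the IQR method, not something the notes specify).

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

def z_outliers(values, threshold=3.0):
    """Flag points more than `threshold` population SDs from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [x for x in values if abs((x - mu) / sd) > threshold]

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # illustrative values

print(iqr_outliers(data))  # [102]
print(z_outliers(data))    # []
```

In this small sample the extreme value inflates the standard deviation so much that its own Z-score stays just under 3, so only the IQR method flags it. The two methods do not always agree, especially when there are few data points.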

Box Plots: Visual tools that represent the distribution of data based on quartiles and highlight potential outliers; they show both central tendency (the median) and spread (the interquartile range).

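Histogram: Represents the frequency distribution of data grouped into bins; the height of each bar indicates how many values fall within that bin. The number or width of bins can be chosen with different rules (for example, Sturges' rule or the Freedman-Diaconis rule).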
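
To make the binning step concrete, here is a minimal sketch that counts values into equal-width bins. It assumes Sturges' rule (bins = ceil(log2 n) + 1) as the default bin rule; the notes do not say which rule is intended, and the function name and data are hypothetical.

```python
import math

def histogram_counts(values, bins=None):
    """Count values into equal-width bins; defaults to Sturges' rule."""
    n = len(values)
    if bins is None:
        bins = math.ceil(math.log2(n)) + 1  # Sturges' rule (assumed default)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes the values are not all identical
    counts = [0] * bins
    for x in values:
        # The maximum value would land one past the end, so clamp it to the last bin
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # illustrative values
counts = histogram_counts(data)
print(counts)  # [3, 5, 1, 0, 1]
```

The empty fourth bin followed by a single count in the last bin is how an isolated extreme value typically shows up in a histogram.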

1st Moment: Mean (first raw moment) represents the balance point of the data values.
2nd Moment: Variance (second central moment) quantifies the extent of data spread around the mean.
3rd Moment: Skewness (third standardized moment) reveals the degree and direction of asymmetry in the data.
4th Moment: Kurtosis (fourth standardized moment) assesses the thickness of the tails, often loosely described as the sharpness of the peak.
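
For reference, the four moments can be written out as formulas. The notation is an assumption (n values x_1, ..., x_n, population forms dividing by n); sample versions use slightly different denominators.

```latex
\begin{align*}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{(mean)} \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
  && \text{(variance)} \\
\text{skewness} &= \frac{1}{n}\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^{3}
  && \text{(third standardized moment)} \\
\text{kurtosis} &= \frac{1}{n}\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^{4}
  && \text{(fourth standardized moment)}
\end{align*}
```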

Binomial Test: Used for hypothesis testing when the outcome has two categories, based on sample data; utilizes sample size (N), significance level (Q), and observed outcomes (K).
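Hypothesis testing involves calculating a p-value and comparing it with a chosen significance level, conventionally 0.05; if the p-value is less than 0.05, the null hypothesis is rejected; otherwise, the null hypothesis cannot be rejected (which is not the same as showing it is true).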
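
The decision rule above can be sketched for a two-category outcome, assuming the test in question is an exact binomial test. The names `n`, `k`, `p_null`, and `alpha` are chosen for this example and do not map onto the notes' N, K, and Q with certainty. The two-sided p-value here sums the probabilities of all outcomes that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_test_two_sided(k, n, p_null=0.5):
    """Exact two-sided p-value: total probability of outcomes
    no more likely than the observed count k under the null."""
    observed = binom_pmf(k, n, p_null)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = 0.0
    for i in range(n + 1):
        pmf = binom_pmf(i, n, p_null)
        if pmf <= observed * tolerance:
            total += pmf
    return min(1.0, total)

alpha = 0.05  # conventional significance level
p_value = binomial_test_two_sided(k=15, n=20)  # e.g. 15 successes in 20 trials
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)  # 0.0414 True
```

With 14 successes instead of 15, the same function gives a p-value of about 0.115, so the null hypothesis would not be rejected at the 0.05 level.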