Trimmed Mean and Outliers

18 Questions

The ______ is a value around which the data is centered. 'Centre' can mean many things; here it means the average value.

Mean

The most common value is known as the ______.

Mode

In statistics, the ______ is the 'middle data point': 50% of the data points are smaller than or equal to it.

Median

For successful data preprocessing, it is essential to have an overall picture of your data by looking at central tendency and variation/spread. This includes measuring ______.

dispersion

Basic statistical descriptions can be used to identify properties of the data and highlight which data values should be treated as ______.

noise

Multivariate Summary Statistics provide an ______ of the relationships among multiple variables in a dataset.

overview

The trimmed mean is the mean obtained after chopping off values at the ______ and low extremes.

high

We should avoid trimming too large a portion (such as 20%) at both ends, as this can result in the loss of valuable ______.

information
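
As a rough illustration of trimming, the Python sketch below drops the same proportion of values from each end of the sorted data before averaging. The function name `trimmed_mean` and its `proportion` parameter are made up for this example, and the figures are the salary values (in thousands of rand) from the median question in this set.

```python
def trimmed_mean(values, proportion):
    """Mean after cutting `proportion` of the values from each end (hypothetical helper)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

plain_mean = sum(salaries) / len(salaries)   # 58.0, pulled upward by the 110
trimmed_10 = trimmed_mean(salaries, 0.10)    # drops 30 and 110 -> 55.6
trimmed_20 = trimmed_mean(salaries, 0.20)    # drops 30, 36, 70, 110 -> 56.25
```

Trimming 10% removes only the two extremes, while 20% also discards 36 and one of the 70s, which are ordinary observations rather than outliers.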

For skewed (asymmetric) data, a better measure of the centre of data is the ______.

median

The median is the middle value in a set of ordered values. It separates the higher half of a data set from the lower half. The median generally applies to ______ data.

numeric

Let’s find the median of the data in the previous example. The data is already in ascending order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. There is an even number of values (12), so the median lies between the two middle values, 52 and 56; take their average. The median salary is R54 000. ∴ median = (52 + 56) / 2 = ______.

54
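
The same calculation can be checked in a few lines of Python. This is only a sketch: the variable names are illustrative, and `statistics.median` from the standard library is used as a cross-check on the hand calculation.

```python
import statistics

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

ordered = sorted(salaries)
n = len(ordered)  # 12, an even count

# With an even count, the median is the average of the two middle values.
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])  # (52, 56)
median_by_hand = sum(middle_pair) / 2                 # 54.0

median_stdlib = statistics.median(salaries)           # 54.0
```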

Given an odd number of values, the median is the middlemost value. If the list held only its first 11 values, this would be the sixth value, which is R52 000. The ______ is another measure of central tendency.

mode
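
A short Python sketch of both ideas, assuming the odd-count case refers to the first 11 salary values. `statistics.multimode` (Python 3.8+) returns every most-frequent value, which matters here because two values tie.

```python
import statistics

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Odd count: the median is simply the middlemost (sixth) value.
first_eleven = salaries[:11]
median_odd = statistics.median(first_eleven)   # 52

# Mode: 52 and 70 each occur twice, so this data set has two modes.
modes = statistics.multimode(salaries)         # [52, 70]
```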

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles; it gives the range covered by the ______ half of the data.

middle
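
For a concrete feel, the sketch below computes quartiles and the interquartile range for the salary values used in the median question. It relies on `statistics.quantiles` with its default interpolation method; other quartile conventions exist and can give slightly different Q1 and Q3, so treat the exact cut points as one convention among several.

```python
import statistics

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(salaries, n=4)

iqr = q3 - q1  # width of the range covered by the middle half of the data
```

The second cut point, `q2`, is the median (54 for this data).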

Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the ______.

median
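
The variance formula itself is not reproduced in this set. Assuming the usual sample definition (sum of squared deviations from the mean divided by n - 1), a minimal Python sketch on the salary values from the median question looks like this; `statistics.variance` and `statistics.stdev` are the standard-library sample versions.

```python
import math
import statistics

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

mean = sum(salaries) / len(salaries)  # 58.0

# Assumed sample formula: squared deviations divided by (n - 1).
squared_devs = [(x - mean) ** 2 for x in salaries]
sample_var_by_hand = sum(squared_devs) / (len(salaries) - 1)
sample_sd_by_hand = math.sqrt(sample_var_by_hand)

# Standard-library equivalents for a sample.
sample_var = statistics.variance(salaries)
sample_sd = statistics.stdev(salaries)
```

The single value 110 contributes more than half of the total squared deviation, which is why the standard deviation is sensitive to outliers.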

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles; it gives the ______ covered by the middle half of the data.

range

Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the ______ splits the data into equidistant halves.

median

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles; it gives the range covered by the middle half of the data. Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into ______ halves.

equidistant

Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the median. However, Q1, the median, and Q3 together contain no information about the endpoints (e.g., tails) of the ______.

data
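
The five-number summary fills that gap by adding the minimum and maximum to Q1, the median, and Q3. Below is a small Python sketch on the salary values from the median question; the helper name `five_number_summary` is hypothetical, and the quartiles follow the default method of `statistics.quantiles`, which is only one of several conventions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a non-empty data set."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
summary = five_number_summary(salaries)

minimum, q1, median, q3, maximum = summary
# The upper tail (maximum - q3) is much longer than the lower tail (q1 - minimum),
# which hints at the right skew caused by the 110 value.
upper_tail = maximum - q3
lower_tail = q1 - minimum
```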

Learn about the concept of trimmed mean as a way to reduce the impact of extreme values on the mean calculation. Discover how trimming values at the high and low extremes can provide a more accurate representation of the data.
