Trimmed Mean and Outliers

The ______ is a value around which data is centred. 'Centre' can mean many things; here it means the average value.

Mean

The most common value is known as the ______.

Mode

In statistics, the ______ is the 'middle data point': 50% of the data points are smaller than or equal to it.

Median

For successful data preprocessing, it is essential to have an overall picture of your data by looking at central tendency and variation/spread. This includes measuring ______.

dispersion

Basic statistical descriptions can be used to identify properties of the data and highlight which data values should be treated as ______.

noise

Multivariate Summary Statistics provide an ______ of the relationships among multiple variables in a dataset.

overview

The trimmed mean is the mean obtained after chopping off values at the ______ and low extremes.

high

We should avoid trimming too large a portion (such as 20%) at both ends, as this can result in the loss of valuable ______.

information
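As a rough illustration of the trimmed mean, the sketch below drops a chosen proportion of values from each end before averaging. The function name `trimmed_mean`, the 10% proportion, and the use of the salary values from the median question (in thousands of rand) are assumptions made for this example, not part of the original material.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Salary values in thousands of rand (assumed example data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

plain = sum(salaries) / len(salaries)
trimmed = trimmed_mean(salaries, 0.10)  # drops one value from each end
print(plain, trimmed)
```

Trimming 10% removes the extreme values 30 and 110, so the result is pulled less strongly towards the high outlier than the plain mean is.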

For skewed (asymmetric) data, a better measure of the centre of data is the ______.

median

The median is the middle value in a set of ordered values. It separates the higher half of a data set from the lower half. The median generally applies to ______ data.

numeric

Let’s find the median of the data in the previous example. The data is already in ascending order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. There is an even number of values (12), so the median lies between the sixth and seventh values; take the average of these middle two. The median salary is R54 000. ∴ median = (52 + 56) / 2 = ______.

54
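The calculation above can be checked with a short sketch. The helper name `manual_median` is hypothetical; `statistics.median` from the Python standard library is used only as a cross-check.

```python
import statistics

def manual_median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

# Salary values in thousands of rand, as listed in the question.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

median_by_hand = manual_median(salaries)
median_stdlib = statistics.median(salaries)
print(median_by_hand, median_stdlib)
```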

Suppose instead that only the first 11 values of the list were given. Given an odd number of values, the median is the middlemost value. This is the sixth value in this list, which has a value of R52 000. The ______ is another measure of central tendency.

mode
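A minimal sketch of finding the mode, again assuming the salary values from the median question. It uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest count, and also checks the odd-length case by taking the first 11 values as an assumed shorter list.

```python
import statistics

# Salary values in thousands of rand (assumed example data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# All values that share the highest frequency, in order of first appearance.
modes = statistics.multimode(salaries)

# With an odd number of values the median is the single middle value.
first_eleven = salaries[:11]
odd_median = statistics.median(first_eleven)
print(modes, odd_median)
```

In this data set two values occur twice each, so there is more than one mode.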

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles, which gives the range covered by the ______ half of the data.

middle
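The quartiles and interquartile range can be sketched with `statistics.quantiles` from the Python standard library. Note that several conventions exist for locating quartiles; this sketch uses the function's default ("exclusive") method, so a textbook that simply picks the nearest data values may report slightly different quartiles. The salary values are assumed example data.

```python
import statistics

# Salary values in thousands of rand (assumed example data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(salaries, n=4)

# The interquartile range covers the middle half of the data.
iqr = q3 - q1
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```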

Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the ______.

median
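To make the sample variance and standard deviation concrete, the sketch below computes them by hand (dividing by n - 1, the usual sample form) and compares the result with `statistics.variance` and `statistics.stdev`. The salary values are assumed example data, and the variable names are illustrative.

```python
import math
import statistics

# Salary values in thousands of rand (assumed example data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

n = len(salaries)
mean = sum(salaries) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sample_var = sum((x - mean) ** 2 for x in salaries) / (n - 1)
sample_std = math.sqrt(sample_var)

# The standard library uses the same n - 1 divisor for variance() and stdev().
print(sample_var, statistics.variance(salaries))
print(sample_std, statistics.stdev(salaries))
```

The single high value of 110 contributes more than half of the total squared deviation, which shows how sensitive these measures are to outliers.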

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles, which gives the ______ covered by the middle half of the data.

range

Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the ______ splits the data into equidistant halves.

median

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the range covered by the middle half of the data. Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into ______ halves.

equidistant

Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the median. However, Q1, the median, and Q3 together contain no information about the endpoints (e.g., tails) of the ______.

data
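Adding the smallest and largest observations to Q1, the median, and Q3 gives the five-number summary. The sketch below is one possible way to compute it; the function name `five_number_summary` is hypothetical, the quartiles follow the default convention of `statistics.quantiles`, and the salary values are assumed example data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))

# Salary values in thousands of rand (assumed example data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

summary = five_number_summary(salaries)
print(summary)
```

The gap between Q3 and the maximum is much wider than the gap between the minimum and Q1, which reveals the long upper tail that the quartiles alone would hide.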
