Trimmed Mean and Outliers

Questions and Answers

The ______ is the average value, a value around which the data is centred. ('Centre' can mean many things.)

Mean

The most common value is known as the ______.

Mode

In statistics, the ______ is the 'middle data point': 50% of the data points are smaller than or equal to it.

Median

For successful data preprocessing, it is essential to have an overall picture of your data by looking at central tendency and variation/spread. This includes measuring ______.

dispersion

Basic statistical descriptions can be used to identify properties of the data and highlight which data values should be treated as ______.

noise

Multivariate Summary Statistics provide an ______ of the relationships among multiple variables in a dataset.

overview

The trimmed mean is the mean obtained after chopping off values at the ______ and low extremes.

high

We should avoid trimming too large a portion (such as 20%) at both ends, as this can result in the loss of valuable ______.

information
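The trimmed mean described above can be sketched in a few lines of Python. This is only an illustration: the function name `trimmed_mean`, the 10% trimming proportion, and the choice to round the cut count down are assumptions, and the data are the twelve salary figures (in thousands of rand) from this lesson's median example.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    Hypothetical helper for illustration; the cut count is rounded down.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values removed at each extreme
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

plain = sum(salaries) / len(salaries)
trimmed = trimmed_mean(salaries, 0.10)  # drops 30 and 110

print(plain)    # 58.0
print(trimmed)  # 55.6
```

Removing the single lowest and highest value pulls the mean from 58 down to 55.6, which shows how strongly the outlier 110 influences the ordinary mean.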

For skewed (asymmetric) data, a better measure of the centre of data is the ______.

median

The median is the middle value in a set of ordered values. It separates the higher half of a data set from the lower half. The median generally applies to ______ data.

numeric

Let’s find the median of the data in the previous example. The data (salaries in thousands of rand) is already in ascending order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. There is an even number of values, so the median lies between the two middle values, 52 and 56; take their average. ∴ median = (52 + 56) / 2 = ______, so the median salary is R54 000.

54
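The median calculation can be checked with a short Python sketch. The helper name `median` is an assumption made for illustration, and the odd-length case simply drops the largest salary to produce an eleven-value list, which is a guess at how an odd-length version of the data might look.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle
    values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

even_case = median(salaries)       # 12 values: average of 52 and 56
odd_case = median(salaries[:-1])   # 11 values: the sixth value

print(even_case)  # 54.0
print(odd_case)   # 52
```

Python's standard library offers the same calculation as `statistics.median`.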

Given an odd number of values, the median is the middlemost value. This is the sixth value in this list, which has a value of R52 000. The ______ is another measure of central tendency.

mode
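As a rough illustration of the mode, the sketch below counts how often each of the lesson's salary figures occurs. It assumes Python 3.8 or later for `statistics.multimode`; the variable names are illustrative only.

```python
from collections import Counter
import statistics

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

counts = Counter(salaries)              # value -> number of occurrences
modes = statistics.multimode(salaries)  # every value with the top count

print(counts[52], counts[70])  # 2 2
print(modes)                   # [52, 70]
```

In this data set 52 and 70 both occur twice, so there are two modes rather than one.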

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the range covered by the ______ half of the data.

middle
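A minimal sketch of quartiles and the interquartile range, using the salary figures from the median example. It relies on `statistics.quantiles` from Python's standard library (3.8+) with its default settings; several conventions exist for computing quartiles, so the cut points may differ slightly from those obtained by hand or with other tools.

```python
import statistics

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(salaries, n=4)

iqr = q3 - q1  # range covered by the middle half of the data

print("Q1:", q1)
print("Q2 (median):", q2)
print("Q3:", q3)
print("IQR:", iqr)
```

The second quartile is the median, so it can be cross-checked against the value of 54 found earlier.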

Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the ______.

median
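The following sketch computes the variance and standard deviation of the lesson's salary figures. It assumes the usual sample formula, which divides the sum of squared deviations by n - 1, in line with the remark that the measure is for a sample; the helper name `sample_variance` is illustrative.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

variance = sample_variance(salaries)
std_dev = math.sqrt(variance)

print(round(variance, 2))  # 413.64
print(round(std_dev, 2))   # 20.34
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 convention and can serve as a cross-check.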

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the ______ covered by the middle half of the data.

range

Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the ______ splits the data into equidistant halves.

median

A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the range covered by the middle half of the data. Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into ______ halves.

equidistant

Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the median. However, Q1, the median, and Q3 together contain no information about the endpoints (e.g., tails) of the ______.

data
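A five-number summary adds the minimum and maximum to Q1, the median, and Q3, which supplies the missing information about the endpoints. The sketch below is one possible way to compute it; the function name `five_number_summary` is hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several conventions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for the data."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

summary = five_number_summary(salaries)
labels = ("min", "Q1", "median", "Q3", "max")
for label, value in zip(labels, summary):
    print(label, value)
```

For the salary data the maximum of 110 sits much further from Q3 than the minimum of 30 sits from Q1, which is the kind of skew the quartiles alone would not reveal.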
