Questions and Answers
______ is a value around which data is centred. 'Centre' can mean many things; here it means the average value.
Mean
The most common value is known as the ______.
Mode
In statistics, the ______ is the 'middle data point': 50% of the data points are smaller than or equal to it.
Median
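As a small sketch of the mean and the mode, the snippet below uses Python's standard `statistics` module on a list of salaries in thousands of rand. The list is assumed here purely for illustration. `multimode` is used rather than `mode` because this list happens to have two equally common values.

```python
from statistics import mean, multimode

# Illustrative salaries in thousands of rand (assumed data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

average = mean(salaries)      # sum of the values divided by their count
modes = multimode(salaries)   # every value that occurs most often

print("mean:", average)
print("mode(s):", modes)
```

A data set with two modes, as here, is called bimodal.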
For successful data preprocessing, it is essential to have an overall picture of your data by looking at central tendency and variation/spread. This includes measuring ______.
Basic statistical descriptions can be used to identify properties of the data and highlight which data values should be treated as ______.
Multivariate Summary Statistics provide an ______ of the relationships among multiple variables in a dataset.
The trimmed mean is the mean obtained after chopping off values at the ______ and low extremes.
We should avoid trimming too large a portion (such as 20%) at both ends, as this can result in the loss of valuable ______.
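One possible sketch of a trimmed mean is shown below. The helper name `trimmed_mean`, the 10% trimming proportion, and the salary values (in thousands of rand) are assumptions chosen for illustration, not a prescribed method.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Illustrative salaries in thousands of rand (assumed data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

plain_mean = sum(salaries) / len(salaries)
trimmed = trimmed_mean(salaries, 0.10)   # drops one value from each end here

print("plain mean:", plain_mean)
print("10% trimmed mean:", trimmed)
```

Dropping only the single lowest and highest value already pulls the mean down, because the extreme value 110 no longer contributes.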
For skewed (asymmetric) data, a better measure of the centre of data is the ______.
The median is the middle value in a set of ordered values. It separates the higher half of a data set from the lower half. The median generally applies to ______ data.
Let’s find the median of the data in the previous example. The data is already in ascending order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. With an even number of values, the median lies between the two middle values (52 and 56), so take their average. The median salary is R54 000. ∴ median = (52 + 56) / 2 = ______.
Given an odd number of values, the median is the middlemost value. This is the sixth value in this list, which has a value of R52 000. The ______ is another measure of central tendency.
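The even and odd cases can be sketched in a few lines of Python. The helper `median_by_hand` is a hypothetical name, and the odd-length list is an assumption: it is obtained here by dropping the largest salary so that eleven values remain.

```python
from statistics import median

# Salaries in thousands of rand, already in ascending order.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

def median_by_hand(values):
    """Middle value for odd counts, average of the two middle values otherwise."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = median_by_hand(salaries)       # 12 values: average of 6th and 7th
odd_case = median_by_hand(salaries[:-1])   # assumed 11-value list: the 6th value

# Cross-check against the standard library.
assert even_case == median(salaries)
assert odd_case == median(salaries[:-1])

print("even case:", even_case)
print("odd case:", odd_case)
```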
A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the range covered by the ______ half of the data.
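A minimal sketch of the quartiles and the interquartile range, using `statistics.quantiles` from the Python standard library. Several conventions exist for computing quartiles and they can give slightly different values; the `"inclusive"` method and the salary data (in thousands of rand) are assumptions made for this illustration.

```python
from statistics import quantiles

# Illustrative salaries in thousands of rand (assumed data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(salaries, n=4, method="inclusive")

iqr = q3 - q1   # range covered by the middle half of the data

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
print("IQR:", iqr)
```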
Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the ______.
A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the ______ covered by the middle half of the data.
Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the ______ splits the data into equidistant halves.
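The sample variance and sample standard deviation mentioned in this card can be sketched as follows. The sample formula divides by n - 1 rather than n. The salary data (in thousands of rand) is assumed for illustration, and the result is cross-checked against `statistics.variance` and `statistics.stdev`, which also use the sample formula.

```python
from math import sqrt
from statistics import stdev, variance

# Illustrative salaries in thousands of rand (assumed data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

n = len(salaries)
avg = sum(salaries) / n

# Squared distance of each observation from the mean.
squared_deviations = [(x - avg) ** 2 for x in salaries]

sample_variance = sum(squared_deviations) / (n - 1)   # divide by n - 1 for a sample
sample_sd = sqrt(sample_variance)

# Cross-check against the standard library (tolerance for floating point).
assert abs(sample_variance - variance(salaries)) < 1e-9
assert abs(sample_sd - stdev(salaries)) < 1e-9

print("sample variance:", sample_variance)
print("sample standard deviation:", sample_sd)
```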
A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is a distance between the first and third quartiles that gives the range covered by the middle half of the data. Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into ______ halves.
Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the median. However, Q1, the median, and Q3 together contain no information about the endpoints (e.g., tails) of the ______.
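A five-number summary adds the smallest and largest observations to Q1, the median, and Q3. The sketch below is one way to compute it; the helper name `five_number_summary`, the `"inclusive"` quartile convention, and the salary data (in thousands of rand) are assumptions for illustration.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, med, q3, ordered[-1])

# Illustrative salaries in thousands of rand (assumed data).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

summary = five_number_summary(salaries)
print(summary)
```

In this data the maximum sits much further from Q3 than the minimum does from Q1, which is the kind of tail information the three middle values alone cannot show.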