Questions and Answers
______ is a value around which the data is centred. 'Centre' can mean many things; here it means the average value.
Mean
The most common value is known as the ______.
Mode
In statistics, the ______ is the 'middle data point': 50% of the data points are smaller than or equal to it.
Median
For successful data preprocessing, it is essential to have an overall picture of your data by looking at central tendency and variation/spread. This includes measuring ______.
Basic statistical descriptions can be used to identify properties of the data and highlight which data values should be treated as ______.
Multivariate Summary Statistics provide an ______ of the relationships among multiple variables in a dataset.
The trimmed mean is the mean obtained after chopping off values at the ______ and low extremes.
High
We should avoid trimming too large a portion (such as 20%) at both ends, as this can result in the loss of valuable ______.
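As a rough illustration of the trimmed mean described above, the sketch below sorts the values, drops the same fraction from each end, and averages what is left. The function name `trimmed_mean`, its `proportion` parameter, and the sample values are assumptions made for this example; they do not come from the original material.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values at each end (hypothetical helper)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # number of values cut from each end
    kept = ordered[k:len(ordered) - k]      # middle portion that survives trimming
    return sum(kept) / len(kept)


# Hypothetical data with one very low and one very high value.
data = [2, 40, 42, 44, 45, 46, 48, 50, 52, 300]

plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.1)           # drops the single lowest and highest value

print("plain mean:  ", plain_mean)
print("trimmed mean:", trimmed)
```

Trimming 10% at each end removes the two extreme values here, so the result sits much closer to the bulk of the data than the plain mean does. A larger proportion would start discarding ordinary values as well.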
For skewed (asymmetric) data, a better measure of the centre of data is the ______.
Median
The median is the middle value in a set of ordered values. It separates the higher half of a data set from the lower half. The median generally applies to ______ data.
Let’s find the median of the data in the previous example. The data is already in ascending order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. There are 12 values, so the median lies between the sixth and seventh values (52 and 56); take the average of these middle two values. The median salary is R54 000. ∴ median = (52 + 56) / 2 = ______.
54
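The median calculation above can be checked with Python's standard `statistics` module. This is a minimal sketch using the twelve salary values from the question (taken here to be in thousands of rand); the variable names are illustrative only.

```python
import statistics

# Salary values from the question, assumed to be in thousands of rand.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

ordered = sorted(salaries)
n = len(ordered)

# With an even number of values, average the two middle ones.
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
median_by_hand = sum(middle_pair) / 2

median_value = statistics.median(salaries)   # same rule, done by the library
mean_value = statistics.mean(salaries)
modes = statistics.multimode(salaries)       # every value tied for most frequent

print("middle pair:", middle_pair)
print("median:", median_value)
print("mean:", mean_value)
print("modes:", modes)
```

The mean (58) is pulled above the median (54) by the single large value of 110, and the data has two modes, 52 and 70, since each occurs twice.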
Given an odd number of values, the median is the middlemost value. This is the sixth value in this list, which has a value of R52 000. The ______ is another measure of central tendency.
A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles; it gives the range covered by the ______ half of the data.
Middle
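To make quartiles and the interquartile range concrete, here is a small sketch using `statistics.quantiles` on the salary values quoted in the median question. Several conventions exist for computing quartiles, so other tools may give slightly different Q1 and Q3 values; this example assumes the module's default "exclusive" method.

```python
import statistics

# Salary values from the median question, assumed to be in thousands of rand.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(salaries, n=4)

# The interquartile range is the distance between the first and third quartiles.
iqr = q3 - q1

print("Q1:", q1)
print("Q2 (median):", q2)
print("Q3:", q3)
print("IQR:", iqr)
```

The second quartile coincides with the median, and the IQR describes the spread of the middle half of the data without being affected by the extreme value of 110.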
Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the ______.
Median
A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles; it gives the ______ covered by the middle half of the data.
Range
Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the ______ splits the data into equidistant halves.
Median
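The remark that the measure "is for a sample" can be illustrated with a short sketch. It assumes the usual convention that the sample variance divides the sum of squared deviations by n - 1, while the population variance divides by n; the data is the salary list from the median question, and the variable names are illustrative.

```python
import statistics

# Salary values from the median question, assumed to be in thousands of rand.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

n = len(salaries)
mean_value = sum(salaries) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean_value) ** 2 for x in salaries)

sample_variance = squared_deviations / (n - 1)    # divisor n - 1 for a sample
sample_std = sample_variance ** 0.5

# The standard library offers the same sample statistics directly.
lib_variance = statistics.variance(salaries)
lib_std = statistics.stdev(salaries)

# Population versions divide by n instead and are therefore slightly smaller.
population_variance = statistics.pvariance(salaries)

print("sample variance:", sample_variance)
print("sample standard deviation:", sample_std)
print("population variance:", population_variance)
```

A standard deviation of roughly 20 against a mean of 58 reflects how far the value 110 sits from the rest of the salaries.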
A percentile is a measure indicating the value below which a given percentage of observations falls. A quartile is a type of quantile which divides the observations into four more or less equal parts, or quarters. The interquartile range is the distance between the first and third quartiles; it gives the range covered by the middle half of the data. Variance and Standard Deviation indicate how spread out the distribution of the data is. A low standard deviation means that the data observations tend to be very close to the mean, while a high standard deviation indicates that the data are spread out over a large range of values. This is for a sample, which is what most data is based on. Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into ______ halves.
Equidistant
Five-Number Summary: No single numeric measure of spread (e.g., IQR) is very useful for describing the shape of skewed distributions. Recall for a symmetric distribution, the median splits the data into equidistant halves. This does not occur for skewed distributions. Therefore, it is more informative to also look at the two quartiles, Q1 and Q3, along with the median. However, Q1, the median, and Q3 together contain no information about the endpoints (e.g., tails) of the ______.
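Since the quartiles and median say nothing about the endpoints, the five-number summary adds the smallest and largest observations. The sketch below is one possible way to compute it; the helper name `five_number_summary` is hypothetical, the data is the salary list from the median question, and the quartiles follow the default convention of `statistics.quantiles`, which may differ from other textbooks or tools.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for the values (hypothetical helper)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


# Salary values from the median question, assumed to be in thousands of rand.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

summary = five_number_summary(salaries)
labels = ("minimum", "Q1", "median", "Q3", "maximum")

for label, value in zip(labels, summary):
    print(f"{label}: {value}")
```

The gap between Q3 and the maximum is far larger than the gap between the minimum and Q1, which points to a longer upper tail. That is exactly the kind of information Q1, the median, and Q3 cannot show on their own.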