Document Details
Uploaded by momogamain
Full Transcript
Question,A,B,C,D,CorrectAnswer,Explanation
"If a dataset's whiskers in a box plot are of equal length, what does this suggest about the distribution of the data?",The data is normally distributed,The data is skewed,The data is bimodal,The data is heavily skewed to the left,A,"Equal whisker lengths suggest that the data is approximately symmetric, which is consistent with (though not proof of) a normal distribution."
"When analyzing a right-skewed distribution, which of the following measures will be most affected by the skew?",Median,Mode,Mean,IQR,C,The mean is more affected by extreme values on the right and is typically greater than the median in a right-skewed distribution.
You are analyzing two datasets with the same mean but different standard deviations. What does this tell you about the datasets?,The datasets have the same variability,One dataset has more variability than the other,The datasets have the same shape,Both datasets are skewed in the same way,B,"Different standard deviations mean the datasets have different amounts of variability, even if the means are the same."
"A data scientist is analyzing the heights of trees in a forest. Most trees are between 5 and 10 meters, but there are a few that are over 20 meters tall. Which measure of central tendency should they report?",Mean,Median,Mode,Range,B,The median is a better measure because it is not affected by the few extremely tall trees.
"If a dataset has an IQR of 30 and Q1 is 40, what is the upper boundary for identifying outliers?",85,115,70,55,B,"The upper boundary is Q3 + 1.5 × IQR. First, calculate Q3: Q3 = Q1 + IQR = 40 + 30 = 70. Then, 70 + 1.5 × 30 = 70 + 45 = 115."
A distribution of exam scores is left-skewed. Which statement about the mean and median is most likely true?,The mean is greater than the median,The mean is less than the median,The mean is equal to the median,The mean and mode are equal,B,"In a left-skewed distribution, the mean is pulled to the left (lower values) and is less than the median."
What does a high standard deviation in a dataset imply about the spread of values?,The values are closely packed around the mean,The values are widely spread out around the mean,There are no outliers in the data,The data is perfectly symmetrical,B,A high standard deviation indicates that data points are spread out widely from the mean.
"When analyzing a dataset of monthly sales, you find several extreme values. What should you do first before making any decisions about these outliers?",Immediately remove the outliers from the dataset,Investigate the outliers to understand why they occurred,Assume the outliers are errors,Replace the outliers with the mean value,B,You should investigate outliers to determine if they are data errors or meaningful events.
A dataset of house prices is highly variable. Which of the following measures is most appropriate for understanding the overall spread of house prices?,Mean,Range,Median,Standard Deviation,D,"Standard deviation is useful for understanding the overall spread, especially when the data has high variability."
"If you are using the IQR method to detect outliers and you have Q1 = 30 and Q3 = 80, what is the lower boundary for identifying outliers?",10,5,-45,0,C,"The lower boundary is Q1 - 1.5 × IQR. First, calculate IQR: 80 - 30 = 50. Then, 30 - 1.5 × 50 = 30 - 75 = -45."
What does a box plot reveal about a dataset?,The average value of the dataset,"The distribution and spread, including the presence of outliers",The exact frequency of each data point,The correlation between two variables,B,A box plot shows the distribution and spread of the data and identifies outliers.
A company's annual revenue data has an IQR of $10 million and several outliers on the high end. Which measure of central tendency would be most appropriate to report?,Mean,Median,Mode,Range,B,The median is more appropriate in the presence of high outliers because it is not affected by extreme values.
What does a positive skew in a dataset indicate about the data distribution?,The data is uniformly distributed,The tail is on the left side of the distribution,The data has a long tail on the right side,The mean and median are equal,C,"A positive skew means the data has a long tail on the right side, with the mean being greater than the median."
You calculate the range of a dataset to be 45. What does this tell you about the data?,The middle 50% of the data is spread across 45 units,The difference between the maximum and minimum values is 45,The standard deviation is 45,The mean is 45,B,The range is the difference between the maximum and minimum values.
A data analyst uses a scatter plot and notices a strong positive correlation between advertising budget and sales. What should be the next step in their analysis?,Assume a causal relationship between budget and sales,Test for causation using additional statistical methods,Reduce the advertising budget to test the relationship,Ignore the correlation,B,Correlation does not imply causation. Further analysis is needed to establish a causal relationship.
"In a dataset with a symmetric distribution, which measures of central tendency and spread are most appropriate to use?",Median and IQR,Mean and Standard Deviation,Mode and Range,Median and Range,B,"For a symmetric distribution, the mean and standard deviation are appropriate measures of central tendency and spread."
When would you use the median over the mean to describe a dataset?,When the data is normally distributed,When the data has extreme outliers,When there are no outliers,When the data has a small range,B,"The median is used when there are extreme outliers because it is not affected by them, unlike the mean."
"If you want to determine the consistency of test scores, which measure should you use?",Range,Mode,Standard Deviation,Median,C,Standard deviation measures the consistency or variability of data around the mean.
Which of the following scenarios would be best suited for using the IQR as a measure of spread?,A dataset with no outliers and a normal distribution,A dataset with several extreme values,A dataset where all values are similar,A categorical dataset,B,"The IQR is best used when the dataset has extreme values, as it focuses on the middle 50% of the data."
What can be inferred if a scatter plot shows no clear pattern between two variables?,There is a strong correlation between the variables,There is no relationship between the variables,The variables are dependent on each other,The mean of both variables is equal,B,No clear pattern in a scatter plot suggests there is no apparent relationship or correlation between the two variables.
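
The IQR boundary arithmetic and the mean-versus-median behavior asked about in the questions above can be checked with a short Python sketch. The helper name `iqr_fences` and the list of tree heights are hypothetical and chosen only for illustration; the 1.5 multiplier is the one used in the explanations.

```python
from statistics import mean, median


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier boundaries for the IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# IQR = 30 and Q1 = 40, so Q3 = Q1 + IQR = 70.
print(iqr_fences(40, 40 + 30))  # (-5.0, 115.0)

# Q1 = 30 and Q3 = 80, so IQR = 50.
print(iqr_fences(30, 80))       # (-45.0, 155.0)

# Hypothetical tree heights in meters (illustrative values, not source data):
# most fall between 5 and 10, and two exceed 20.
heights = [5, 6, 7, 8, 9, 10, 22, 25]
print(mean(heights), median(heights))  # 11.5 8.5
```

In this made-up sample the two tall trees pull the mean above every typical height, while the median stays inside the 5 to 10 meter range.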