Statistics and Data Classification Quiz

Questions and Answers

What does the null hypothesis represent?

No difference or no association (correct)
A specific probability distribution
There is a difference or association
An alternative hypothesis

A smaller p-value indicates insufficient evidence to reject the null hypothesis.

False (B)

The p-value is the probability of observing a difference at least as extreme as what was observed, assuming the _______ is true.

null hypothesis

Match the following terms with their definitions:

Null hypothesis = Indicates no difference or association
Alternative hypothesis = Suggests there is a difference or association
Z statistic = Measures distance from the null value
P-value = Probability of observing a difference assuming null is true

Which of the following is NOT a type of categorical variable?

Continuous (A)

A box plot is best used for displaying normally distributed data.

False (B)

What is the purpose of using the interquartile range (IQR) in data analysis?

To measure dispersion and identify the middle 50% of values.
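
As a rough illustration of the answer above, the IQR can be computed as the third quartile minus the first. This is a minimal sketch using Python's `statistics.quantiles` with its default "exclusive" method; other quartile conventions give slightly different values, and the sample data is made up.

```python
import statistics

def iqr(values):
    """Return (q1, q3, q3 - q1) using the statistics module's default quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q1, q3, q3 - q1

# Made-up sample with one unusually large value (30).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
q1, q3, spread = iqr(data)
print(q1, q3, spread)
```

The large value 30 barely affects the result, which is why the IQR is often preferred to the range for skewed data.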

The _____ is the most common value in a data set.

mode

Which of the following describes a positive skew in a distribution?

Rises early and tails off to the right (A)

When data is normally distributed, the mean and median will be different.

False (B)

What are the two measures commonly used to describe central tendency?

Mean and median.

In a graph representing normal distribution, approximately _____% of observations lie within +/- 2 standard deviations of the mean.

95

Which of the following best describes a bimodal distribution?

Two peaks (C)

What is a point estimate?

A best guess of the population parameter derived from a sample (D)

Sampling distribution captures the fixed nature of population parameters.

False (B)

What does standard error indicate?

The typical deviation of a sample statistic from the actual population parameter.

A confidence interval provides a range of values which we are ____% confident contains the true population value.

95

Match the terms with their definitions:

Bias = Systematic difference from the true population value
Random error = Variability due to random sampling
Standard error = Measure of accuracy of a point estimate
Confidence interval = Range of values estimating population parameter

What does a p-value represent in hypothesis testing?

The probability of observing extreme data if the null hypothesis is true (A)
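
One way to make this concrete is to turn a z statistic into a two-sided p-value. The sketch below assumes the test statistic is standard normal under the null hypothesis; the function name `two_sided_p_value` is illustrative, not from the lesson.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. computed assuming the null is true."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(round(two_sided_p_value(1.96), 3))  # close to 0.05
print(round(two_sided_p_value(3.0), 4))   # smaller: stronger evidence against the null
```

A z statistic further from zero gives a smaller p-value, which matches the earlier answer that a small p-value is evidence against the null hypothesis.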

A wider confidence interval indicates higher precision in estimating the population parameter.

False (B)

What is the primary cause of bias in estimates?

Systematic components such as selection biases.

Which of the following statements regarding the Central Limit Theorem (CLT) is true?

The distribution of sample means will be approximately normal if the sample size is large enough. (B)

A sufficiently large sample size can result in an approximately normal sampling distribution of the mean, even from a skewed parent population.

True (A)
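
A small simulation can illustrate this. The sketch below assumes a right-skewed Exponential(1) parent population (population mean 1, standard deviation 1) and draws repeated samples of size 50; the seed, sample size, and number of repeats are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, repeats):
    """Means of `repeats` samples of size n from a right-skewed Exponential(1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

means = sample_means(n=50, repeats=2000)
center = statistics.fmean(means)  # expected to sit near the population mean, 1.0
spread = statistics.stdev(means)  # expected to sit near 1 / sqrt(50), about 0.14
print(round(center, 2), round(spread, 2))
```

Although individual observations are strongly skewed, the sample means cluster symmetrically around 1 with a spread close to the standard error.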

What formula is used to calculate the standard error (SE)?

SE = s / √n
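
The formula above can be sketched in a few lines, taking s as the sample standard deviation (n - 1 denominator) and n as the sample size. The data values are made up.

```python
import math
import statistics

def standard_error(sample):
    """SE = s / sqrt(n), with s the sample standard deviation."""
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
se = standard_error(sample)
print(round(se, 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.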

The critical z-value used for a two-sided hypothesis test at a 95% confidence level (5% significance level) is ____.

1.96
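
As a sketch of where 1.96 comes from, it is the standard normal quantile that leaves 2.5% in each tail. The example also builds an interval of the form estimate ± 1.96 × SE; the estimate and standard error are made-up numbers, and the helper name is illustrative.

```python
from statistics import NormalDist

# Quantile that leaves 2.5% in the upper tail (and 2.5% in the lower tail by symmetry).
z_crit = NormalDist().inv_cdf(0.975)

def confidence_interval_95(estimate, standard_error):
    """Normal-approximation 95% interval: estimate +/- z_crit * SE."""
    margin = z_crit * standard_error
    return estimate - margin, estimate + margin

low, high = confidence_interval_95(estimate=10.0, standard_error=0.5)  # made-up inputs
print(round(z_crit, 2), round(low, 2), round(high, 2))
```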

What does a p-value inform us about in hypothesis testing?

The strength of evidence against the null hypothesis (C)

If the 95% confidence interval does not contain the null value, the p-value is greater than 0.05.

False (B)

What is the formula for the standard deviation when estimating a proportion?

√(π(1-π)/n)
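
A minimal sketch of this formula, where π is the proportion and n the sample size. This quantity is often called the standard error of a sample proportion; the input values are made up.

```python
import math

def proportion_se(pi, n):
    """sqrt(pi * (1 - pi) / n) for a proportion pi and sample size n."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be between 0 and 1")
    return math.sqrt(pi * (1.0 - pi) / n)

print(round(proportion_se(0.5, 100), 3))  # widest case: pi = 0.5
print(round(proportion_se(0.1, 100), 3))
```

For a fixed n the value is largest at π = 0.5 and shrinks as π moves toward 0 or 1.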

The t-distribution is useful when the __________ standard deviation is unknown.

population

Match the following statistical terms with their descriptions:

p-value = Strength of evidence against the null hypothesis
Confidence Interval = Range of plausible values for a parameter
Standard Error = Estimate of variability in sample means
t-distribution = Distribution used when population standard deviation is unknown

What does a correlation coefficient (r) of 0 indicate?

No linear relationship (A)

Z-scores measure the distance of each observation from the median in units of standard deviation.

False (B)

What is the primary purpose of a scatter plot?

To see how two variables covary.

The prevalence of a disease is defined as the proportion of people in a population that has the disease at a particular point in time, calculated as number of people with the disease divided by _____.

total number at risk in the population

Which of the following is true about z-scores?

Z-scores can compare scores from normal distributions with different units. (D)
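
To illustrate, a z-score is the distance of a value from the mean in units of standard deviation, which removes the original units. The heights and weights below are hypothetical numbers chosen for the example.

```python
def z_score(value, mean, sd):
    """Distance of `value` from `mean`, in units of standard deviation."""
    return (value - mean) / sd

# Hypothetical measurements in different units (cm and kg).
height_z = z_score(value=185.0, mean=175.0, sd=10.0)
weight_z = z_score(value=82.0, mean=70.0, sd=8.0)
print(height_z, weight_z)
```

Here the weight is further above its mean (1.5 SD) than the height is above its mean (1.0 SD), a comparison that the raw centimetres and kilograms do not allow directly.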

A positive skew indicates that the mean is less than the median.

False (B)

What does the area under the curve in a standard normal distribution represent?

Probability of observing z-scores within a particular range of values.

The ____ rule states that about 95% of observations fall within two standard deviations of the mean.

68-95-99.7
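
The three percentages in the rule can be checked numerically. This sketch uses the identity P(|Z| < k) = erf(k / √2) for a standard normal Z; the function name is illustrative.

```python
import math

def prob_within(k):
    """Probability that a normal observation lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 1))
```

The exact values are slightly different from the rounded rule: two standard deviations cover about 95.4%, while exactly 95% corresponds to 1.96 standard deviations.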

Which statement about log transformation is correct?

It reduces positive skew and simplifies analysis. (D)

Conditional distribution can be represented as either row or column percentages in a contingency table.

True (A)

What is the geometric mean used for?

To measure central tendency for positively skewed data.
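
One common way to compute the geometric mean, which also ties it to the log transformation mentioned earlier, is to average the logs and then exponentiate. This is a minimal sketch for strictly positive values; the data is made up.

```python
import math
import statistics

def geometric_mean(values):
    """exp of the mean of log(values); requires every value to be positive."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

data = [1, 10, 100, 1000]  # made-up positively skewed values
gm = geometric_mean(data)
print(round(gm, 2), statistics.fmean(data))
```

The arithmetic mean is pulled toward the largest value, while the geometric mean stays closer to the bulk of the data.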

The cumulative incidence is calculated as the number of new cases of a disease divided by the number of people initially _____.

disease-free
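
The definition above translates directly into a ratio. The sketch below uses hypothetical counts, and the function and parameter names are illustrative.

```python
def cumulative_incidence(new_cases, initially_disease_free):
    """New cases during follow-up divided by people who were disease-free at the start."""
    if initially_disease_free <= 0:
        raise ValueError("initially_disease_free must be positive")
    return new_cases / initially_disease_free

# Hypothetical cohort: 1000 disease-free people, 30 new cases during follow-up.
risk = cumulative_incidence(new_cases=30, initially_disease_free=1000)
print(risk)
```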

What is indicated by an r value of -1?

Perfect negative linear relationship (D)
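
The two extreme cases from this quiz (r = -1 and r = 0) can be reproduced with a hand-written Pearson correlation. This is a sketch with made-up data; `pearson_r` is an illustrative helper rather than a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-2, -1, 0, 1, 2]
r_line = pearson_r(xs, [10 - 2 * x for x in xs])  # points on a decreasing straight line
r_curve = pearson_r(xs, [x ** 2 for x in xs])     # exact relationship, but not linear
print(r_line, r_curve)
```

The second case shows why r = 0 means "no linear relationship" rather than "no relationship": y is fully determined by x, yet r is zero.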

Case-control studies select groups based on whether they have the outcome of interest.

True (A)

Flashcards

Data Variable Types

Data variables can be numerical (discrete or continuous) or categorical (ordinal or nominal). Binary/dichotomous variables have two categories.

Derived Variables

Calculated from other variables using thresholds or cutoffs.