Questions and Answers
What conclusion can be drawn if the p-value is greater than 0.05 in a correlation test?
- The correlation is significant, which means that there is a strong relationship between the two variables.
- The correlation indicates causation, where one variable directly influences the other.
- The correlation is insignificant, which means that there is no statistically significant relationship between the two variables. (correct)
- The correlation suggests an inverse relationship, where one variable decreases as the other increases.
Establishing a correlation between two variables automatically implies that one variable causes the other.
False (B)
If variables A and B are significantly correlated, describe two possible relationships between them, besides A causing B.
B causes A, or a third variable C influences both A and B.
The functions cor() and cor.test() are used to calculate the r coefficient and the ______ from it.
Match each scenario with the most appropriate conclusion regarding correlation and causation:
In a study examining the effect of a new drug on blood pressure, which variable would be considered the outcome variable?
Statistical analysis can definitively prove that the predictor variable causes the outcome variable.
Explain the primary difference between a continuous variable and a categorical variable, providing an example of each.
A study examining the relationship between smoking status and lung cancer would classify 'smoking status' as the ______ variable.
Match the following research scenarios with the appropriate variable types for the predictor and outcome variables:
Which of the following scenarios involves a categorical predictor variable and a continuous outcome variable?
In a study examining the effects of various fertilizer types on plant height, which variable would be plotted on the Y axis?
Define what a hypothesis is in the context of inferential statistics.
Why is it often impractical to poll an entire population to measure a parameter in statistics?
In statistics, a 'parameter' refers to a property of a sample, while a 'statistic' refers to a property of a population.
What are the two measures of dispersion discussed for non-normal distributions?
The measure of centrality for a normal distribution is the _______.
Match the term with its description:
Which of the following is a measure of dispersion used for normal distributions?
Which is a reason why descriptive statistics are important?
What measure of distribution do you use when you have a normal distribution?
Why is it often impractical to measure a parameter across an entire population?
A statistic is a parameter measured within a sample, used to estimate the same parameter for the entire population.
What characteristic must a sample possess to accurately represent the population?
In sampling, the choice of sample must have nothing to do with what you want to ______.
A researcher surveys people about phone ownership by asking individuals as they leave an electronics store. Why might this approach result in bad sampling?
Under what conditions is the Standard Error of the Mean (SEM) appropriately used?
Inferential statistics is used to determine causation between datasets.
Match the term with the correct definition:
What is the primary reason for accepting a causation relationship between two variables?
If variable A correlates with variable B, it automatically implies that variable A causes variable B.
In the example provided, what two variables are shown to have a high correlation?
The document emphasizes that a notable error in drawing conclusions is assuming causation based solely on _________.
What pitfall is highlighted when the axes of the blood pressure and age graph are switched?
A p-value of 0.00003 always proves a causal relationship between two variables.
Provide an alternate explanation for the correlation between age and blood pressure besides age directly causing blood pressure to rise.
Which of the following statements best describes the relationship between correlation and causation?
What does a Pearson's r value of 0.17, with a p-value of 0.11, indicate about the correlation between time and world mean temperatures from 1850 to 1940?
A Spearman Rank Correlation is only suitable for linear correlations.
What conclusion was drawn regarding world temperatures between 1940 and the present, based on the provided data?
The correlation coefficient, denoted as r, ranges between -1 and ______.
Match the correlation coefficient (r) values with their corresponding interpretations:
When is it more appropriate to use the Spearman Rank Correlation coefficient instead of Pearson's r?
If the p-value is less than 0.00001 what can be inferred?
What does 'H1 rejected' mean?
Flashcards
Population in statistics
The entire group of people or objects of interest in a study.
Parameter
A characteristic or property of a population, such as mean income or proportion.
Sampling
The process of selecting a subset of individuals from a population to estimate characteristics of the whole.
Descriptive statistics
Inferential statistics
Central tendency
Standard deviation (SD)
Interquartile range (IQR)
Population
Statistic
Representative Sample
Standard Error of the Mean (SEM)
Standard Deviation (STDEV)
Bias in Sampling
Positive Correlation
Null Hypothesis (H0)
Correlation
R Value
Causation
p-value
Delta T
Blood Pressure and Age Relationship
Logical Explanation
Spearman Rank Correlation Coefficient
Pearson’s r
H1 Hypothesis
Significant Warming
Common Misconception
Predictor Variable
Outcome Variable
Continuous Variable
Categorical Variable
Axes in Graphs
Hypotheses in Statistics
Dependent Variable
Independent Variable
Correlation Coefficient
Correlation vs. Causation
Statistical Hypothesis
Study Notes
Obesity Levels in England
- Obesity levels have risen among Year 6 children in England.
- Data shows a gradual upward trend in obesity rates among Year 6 children in England between 2006/07 and 2021/22.
Descriptive Statistics
- In the previous lecture, the foundations of descriptive statistics were covered.
Statistical Noise
- Real-world measurements are affected by noise or error.
- A graph of mean UK temperature from Jan 1 to Dec 31, 2023 displays daily fluctuations.
Drug Efficacy
- A bar graph displays the effectiveness of a treatment.
- The untreated group has a recovery rate of about 30%.
- The treated group has a recovery rate of about 70%.
Normal Distribution
- Statistical noise often follows a normal distribution.
- A normal distribution graph shows a bell curve shape.
- Data points are clustered around the mean, with fewer data points the further from the mean value.
Measures of Centrality & Dispersion
- A normal distribution is described by a measure of centrality (the mean) and a measure of dispersion around it (the standard deviation).
- The mean (average), the median (middle value), and the standard deviation (how widely the data are spread) were discussed.
Z parameter
- The Z parameter (the number of standard deviations a value lies from the mean) is used to determine probabilities in a normal distribution.
- The Empirical Rule in a normal distribution was given in a chart showing percentages of data within +/- 1, 2, and 3 standard deviations.
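The percentages in the Empirical Rule can be reproduced from the normal distribution itself. The sketch below is illustrative only: it uses Python's standard `statistics.NormalDist`, and the helper name `fraction_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(z: float) -> float:
    """Fraction of a normal distribution within +/- z standard deviations of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


for z in (1, 2, 3):
    print(f"within +/-{z} SD: {fraction_within(z):.1%}")
```

This prints roughly 68%, 95%, and 99.7%, the figures usually quoted for the Empirical Rule.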
Non-normal Distributions
- Non-normal distributions exist in real-world scenarios.
- A histogram displays the distribution of equivalised household disposable income.
Median and Interquartile Range
- For non-normal data, the median is the measure of centrality.
- The interquartile range is the measure of dispersion for non-normal data.
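A small sketch of why the median and interquartile range suit skewed data. The numbers are invented for illustration (one extreme value stands in for a very high income), and the quartiles come from Python's `statistics.quantiles` with its default method; other tools may compute quartiles slightly differently.

```python
from statistics import mean, median, quantiles

# Made-up values with one extreme outlier, standing in for skewed data.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

# Quartile cut points: Q1, Q2 (the median), Q3.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print("mean:  ", round(mean(values), 2))  # pulled upward by the outlier
print("median:", median(values))          # barely affected by the outlier
print("IQR:   ", iqr)
```

The single outlier drags the mean well above most of the data, while the median and IQR still describe the bulk of the values.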
Population vs Sample
- Descriptive statistics include definitions of population and sample.
- A population is all the people (or objects) you are interested in.
- A sample is a subset of the population, measured in order to estimate the population's properties.
Sampling
- Populations are often too big or complex to measure.
- To avoid measuring the whole population, sampling is used.
- Samples are chosen to represent a whole population.
Standard Error of the Mean
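- The standard error of the mean (SEM) is the standard deviation divided by the square root of the sample size, so it is always smaller than the standard deviation.
- The SEM describes how much the means of repeated samples from the same population would vary, so it is used to express the uncertainty of an estimated mean.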
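As a minimal sketch, the SEM can be computed from a single sample as the sample standard deviation divided by the square root of the sample size. The function name and the measurements below are invented for illustration.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """SEM = sample standard deviation / sqrt(sample size)."""
    if len(sample) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    return stdev(sample) / sqrt(len(sample))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements
print("SD: ", round(stdev(sample), 3))
print("SEM:", round(standard_error_of_mean(sample), 3))
```

The SD describes the spread of individual measurements. The SEM shrinks as the sample grows, reflecting a more precise estimate of the mean.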
Inferential Statistics
- Inferential statistics uses data from samples to make conclusions about larger populations.
- Inferential statistics decides whether an observed effect is a real signal or just noise.
Variables
- All statistical analyses involve predictor and outcome variables.
- A predictor variable is believed to cause an effect.
- An outcome variable is believed to show an effect.
Variables: Categorical & Continuous
- Variables can be categorical or continuous.
- Categorical variables use labels without order (e.g., sex, country).
- Continuous variables use ordered numbers (e.g., blood pressure, age).
Hypotheses
- Hypotheses are important aspects of all inferential statistics.
- The null hypothesis (H₀) posits no significant effect (just noise).
- The alternative hypothesis (H₁) indicates a significant effect (i.e., a signal).
P-value
- The p-value is a probability used in statistical tests.
- It is the probability of seeing an effect at least as large as the one observed if only random chance were at work (that is, if the null hypothesis were true).
- A low p-value suggests a statistically significant effect.
P-value Significance
- A p-value of less than 0.05 (5%) is conventionally treated as a statistically significant result.
Inferential Statistics: Cases
- Statistical tests are categorized into cases based on variables.
Correlation
- Correlation is the case in which the predictor and outcome variables are both continuous.
- Correlation shows a relationship between variables.
- Correlation can be positive, negative, or nonexistent.
Pearson's Correlation Coefficient
- The Pearson's correlation coefficient (r) quantifies the strength and direction of a linear correlation.
- The value of r is from -1 to +1.
- Values close to -1 or +1 indicate strong correlations; values near 0 indicate a weak or absent linear correlation.
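To make the coefficient concrete, here is a minimal from-scratch sketch of Pearson's r in Python. It is not the implementation behind R's cor(); the function name and the data are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # y rises in a straight line with x
print(pearson_r(x, [10, 8, 6, 4, 2]))  # y falls in a straight line as x rises
print(pearson_r(x, [3, 1, 4, 1, 5]))   # scattered values, weak positive link
```

The first two calls give the extremes of the range (+1 and -1), and the third gives a value much closer to 0.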
Correlation is not Causation
- Correlation shows whether variables are linked, but not whether one causes the other.
- A third variable may affect both variables.
Spearman Rank Correlation Coefficient
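- The Spearman Rank Correlation Coefficient is used when the relationship is monotonic but not linear, because it is calculated from the ranks of the values rather than from the values themselves.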
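The rank-based idea can be sketched as follows. This example assumes there are no tied values, which allows the textbook shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of each pair. Statistical software handles ties differently, and the function names here are made up.

```python
def ranks(values):
    """Rank 1 for the smallest value, up to n for the largest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho via the no-ties shortcut formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("this sketch does not handle tied values")
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


x = [1, 2, 3, 4, 5]
y_curved = [1, 8, 27, 64, 125]  # y = x cubed: always rising, but not a straight line
print(spearman_rho(x, y_curved))
```

Because y always increases with x, the ranks agree perfectly and rho is 1, even though the relationship is curved.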
Correlation Testing
- This session's activities include using example data to test whether there was a warming trend (i.e., a correlation between time and temperature).
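One way to see what a correlation test does is a permutation test: shuffle one variable many times and count how often chance alone produces a correlation at least as strong as the observed one. The sketch below is a stand-in under stated assumptions. It is not the method R's cor.test() uses, the temperature values are invented and are not real climate records, and all function names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of shuffles whose |r| is at least the observed |r| (two-sided)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented illustrative data: NOT real temperature measurements.
years = list(range(2000, 2010))
temperature = [0.10, 0.14, 0.19, 0.21, 0.28, 0.30, 0.37, 0.41, 0.44, 0.52]

r = pearson_r(years, temperature)
p = permutation_p_value(years, temperature)
print(f"r = {r:.3f}, p = {p:.4f}")
print("reject H0" if p < 0.05 else "cannot reject H0")
```

With a steadily rising series like this invented one, almost no shuffle matches the observed correlation, so the p-value falls below 0.05 and the null hypothesis of no trend is rejected.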