Statistics 2

Questions and Answers

What conclusion can be drawn if the p-value is greater than 0.05 in a correlation test?

  • The correlation is significant, which means that there is a strong relationship between the two variables.
  • The correlation indicates causation, where one variable directly influences the other.
  • The correlation is insignificant, which means that there is no statistically significant relationship between the two variables. (correct)
  • The correlation suggests an inverse relationship, where one variable decreases as the other increases.

Establishing a correlation between two variables automatically implies that one variable causes the other.

False (B)

If variables A and B are significantly correlated, describe two possible relationships between them, besides A causing B.

B causes A, or a third variable C influences both A and B.

The functions cor() and cor.test() are used to calculate the r coefficient and the ______ from it.

p-value

Match each scenario with the most appropriate conclusion regarding correlation and causation:

  • Height correlates with weight (r=0.85, p-value=0.00001) = Height relates to weight.
  • Ice cream sales correlate with crime rates = A third variable, like hot weather, may influence both.
  • Shoe size correlates with reading ability in children = Age is likely a confounding variable.

In a study examining the effect of a new drug on blood pressure, which variable would be considered the outcome variable?

The blood pressure readings of the participants. (D)

Statistical analysis can definitively prove that the predictor variable causes the outcome variable.

False (B)

Explain the primary difference between a continuous variable and a categorical variable, providing an example of each.

A continuous variable can take on many ordered values (e.g., temperature), while a categorical variable takes on a limited number of unordered values (e.g., eye color).

A study examining the relationship between smoking status and lung cancer would classify 'smoking status' as the ______ variable.

predictor

Match the following research scenarios with the appropriate variable types for the predictor and outcome variables:

  • Effect of exercise frequency on weight loss. = Predictor: Categorical (exercise days per week), Outcome: Continuous (weight in kilograms)
  • Relationship between education level and income bracket. = Predictor: Categorical (highest degree obtained), Outcome: Categorical (income bracket)
  • Impact of study time on exam scores. = Predictor: Continuous (hours spent studying), Outcome: Continuous (exam score)
  • Influence of treatment type on disease remission. = Predictor: Categorical (type of treatment), Outcome: Categorical (disease remission status)

Which of the following scenarios involves a categorical predictor variable and a continuous outcome variable?

Comparing the average test scores of students who attended different tutoring programs. (C)

In a study examining the effects of various fertilizer types on plant height, which variable would be plotted on the Y axis?

Plant Height (C)

Define what a hypothesis is in the context of inferential statistics.

In inferential statistics, a hypothesis is a proposed explanation for a phenomenon, serving as a starting point for further investigation.

Why is it often impractical to poll an entire population to measure a parameter in statistics?

Because it's impractical, too expensive, or destructive. (A)

In statistics, a 'parameter' refers to a property of a sample, while a 'statistic' refers to a property of a population.

False (B)

What measure of dispersion is discussed for non-normal distributions?

interquartile range (IQR)

The measure of centrality for a normal distribution is the _______.

mean

Match the term with its description:

  • Population = A group of people or objects whose properties are of interest
  • Parameter = A property of a population
  • Normal Distribution = The distribution that noise or error often follows
  • Interquartile Range (IQR) = Measure of dispersion for non-normal distributions

Which of the following is a measure of dispersion used for normal distributions?

Standard Deviation (D)

Which is a reason why descriptive statistics are important?

They provide the basis for inferential statistics (D)

What measure of dispersion do you use when you have a normal distribution?

standard deviation

Why is it often impractical to measure a parameter across an entire population?

Measuring the parameter might destroy the elements of the population. (A)

A statistic is a property measured within a sample, used to estimate the corresponding parameter of the entire population.

True (A)

What characteristic must a sample possess to accurately represent the population?

unbiased

In sampling, the choice of sample must have nothing to do with what you want to ______.

measure

A researcher surveys people about phone ownership by asking individuals as they leave an electronics store. Why might this approach result in bad sampling?

People exiting electronics stores are more likely to own phones. (C)

Under what conditions is the Standard Error of the Mean (SEM) appropriately used?

When describing how the means of multiple samples from the same population vary, rather than the spread of individual values. (D)

Inferential statistics is used to determine causation between datasets.

False (B)

Match the term with the correct definition:

  • Population = Entire group being studied
  • Sample = Subset of the population
  • Parameter = Characteristic of the population
  • Statistic = Characteristic of the sample

What is the primary reason for accepting a causal relationship between two variables?

The presence of a logical explanation for the relationship. (D)

If variable A correlates with variable B, it automatically implies that variable A causes variable B.

False (B)

In the example provided, what two variables are shown to have a high correlation?

blood pressure and age

The document emphasizes that a notable error in drawing conclusions is assuming causation based solely on _________.

correlation

What pitfall is highlighted when the axes of the blood pressure and age graph are switched?

It demonstrates that simply placing a parameter on the X-axis does not make it the cause of the Y-axis variable. (D)

A p-value of 0.00003 always proves a causal relationship between two variables.

False (B)

Provide an alternate explanation for the correlation between age and blood pressure besides age directly causing blood pressure to rise.

Other factors related to aging, such as lifestyle changes or decreased organ function, could contribute to increased blood pressure.

Which of the following statements best describes the relationship between correlation and causation?

Causation always implies correlation. (B)

What does a Pearson's r value of 0.17, with a p-value of 0.11, indicate about the correlation between time and world mean temperatures from 1850 to 1940?

No significant correlation, insufficient evidence to support warming or cooling. (B)

A Spearman Rank Correlation is only suitable for linear correlations.

False (B)

What conclusion was drawn regarding world temperatures between 1940 and the present, based on the provided data?

very significant warming

The correlation coefficient, denoted as r, ranges between -1 and ______.

1

Match the correlation coefficient (r) values with their corresponding interpretations:

  • r = 0.9 = Strong positive correlation
  • r = -0.8 = Strong negative correlation
  • r = 0.1 = Weak positive correlation
  • r = -0.2 = Weak negative correlation

When is it more appropriate to use the Spearman Rank Correlation coefficient instead of Pearson's r?

When you suspect there might be a non-linear relationship between the variables. (C)

If the p-value is less than 0.00001 what can be inferred?

Data this extreme would be very unlikely if the null hypothesis were true, so there is a statistically significant correlation. (D)

What does 'H1 rejected' mean?

The alternative hypothesis is rejected: no significant effect was observed. (D)

Flashcards

Population in statistics

The entire group of people or objects of interest in a study.

Parameter

A characteristic or property of a population, such as mean income or proportion.

Sampling

The process of selecting a subset of individuals from a population to estimate characteristics of the whole.

Descriptive statistics

Methods for summarizing and organizing data, including measures of centrality and dispersion.

Inferential statistics

Techniques that allow us to infer or generalize properties of a population based on a sample.

Central tendency

A measure that represents the center or typical value of a dataset (mean, median).

Standard deviation (SD)

A measure of the amount of variation or dispersion in a set of values.

Interquartile range (IQR)

The range between the first (Q1) and third quartiles (Q3) that represents the middle 50% of the data.

Population

The entire group you want to study.

Statistic

A measurement derived from a sample of the population.

Representative Sample

A sample that accurately reflects the population's characteristics.

Standard Error of the Mean (SEM)

A measure of how much sample means vary from the population mean, calculated by σ/√n.

Standard Deviation (STDEV)

A measure of the amount of variation in a set of values.

Bias in Sampling

When the sample does not accurately represent the population due to poor selection methods.

Positive Correlation

A relationship where as one variable increases, the other also increases.

Null Hypothesis (H0)

A hypothesis stating there is no relationship between variables.

Correlation

A relationship between two variables where they tend to change together.

R Value

A statistical measure that indicates the strength of a correlation.

Causation

When one variable directly affects another, causing a change.

p-value

The probability of obtaining results at least as extreme as those observed if the null hypothesis (no effect, only chance) were true.

Delta T

The change in temperature compared to a base year or average.

Blood Pressure and Age Relationship

Blood pressure increases with age, indicating a strong correlation.

Logical Explanation

A sound reasoning that connects a cause to its effect.

Spearman Rank Correlation Coefficient

A non-parametric measure of correlation that assesses how well the relationship between two variables can be described by a monotonic function.

Pearson’s r

A coefficient that measures linear correlation between two continuous variables.

H1 Hypothesis

A statement that suggests a possible relationship between two variables.

Significant Warming

A statistically significant increase in average temperatures over a designated time period.

Common Misconception

Assuming that correlation implies causation, which is incorrect.

Predictor Variable

The independent variable that may cause an effect in a study.

Outcome Variable

The dependent variable that shows the effect in a study.

Continuous Variable

A variable that can take many values and can be ordered, like age or temperature.

Categorical Variable

A variable that represents categories with limited values and no specific order, like gender or country.

Axes in Graphs

Predictor variables are typically plotted on the X-axis, and outcome variables on the Y-axis.

Hypotheses in Statistics

The candidate answers to the main question of inferential statistics: is there a signal (H1) or only noise (H0)?

Dependent Variable

Another name for the outcome variable, which measures the effect of the predictor.

Independent Variable

Another name for the predictor variable, which is manipulated to observe its effect.

Correlation Coefficient

A number (r) that indicates the strength and direction of a relationship between two variables.

Correlation vs. Causation

Correlation does not imply that one variable causes changes in another; other factors may be involved.

Statistical Hypothesis

A statement or assumption about a population parameter that can be tested with data.

Study Notes

Obesity Levels in England

  • Obesity levels have risen among Year 6 children in England.
  • Data shows a gradual upward trend in obesity rates among Year 6 children in England between 2006/07 and 2021/22.

Descriptive Statistics

  • In the previous lecture, the foundations of descriptive statistics were covered.

Statistical Noise

  • Real-world measurements are affected by noise or error.
  • A graph of mean UK temperature from Jan 1 to Dec 31, 2023 displays daily fluctuations.

Drug Efficacy

  • A bar graph displays the effectiveness of a treatment.
  • The untreated group has a recovery rate of about 30%.
  • The treated group has an efficacy rate around 70%.

Normal Distribution

  • Statistical noise often follows a normal distribution.
  • A normal distribution graph shows a bell curve shape.
  • Data points are clustered around the mean, with fewer data points the further from the mean value.

Measures of Centrality & Dispersion

  • A normal distribution is summarized by a measure of centrality (the mean) and a measure of dispersion around it (the standard deviation).
  • The mean (average), the median (middle value), and the standard deviation (how widely the data are spread) were discussed.

Z parameter

  • The Z parameter (z-score) expresses how many standard deviations a value lies from the mean; it is used to determine probabilities in a normal distribution.
  • The Empirical Rule was shown as a chart of the percentage of data within ±1, ±2, and ±3 standard deviations of the mean (about 68%, 95%, and 99.7%).
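
The Empirical Rule can be checked numerically. The following is a minimal sketch using Python's standard library; the helper names `z_score` and `fraction_within` are illustrative assumptions, not part of the lesson.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Print the share of data within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within +/-{k} SD: {fraction_within(k):.1%}")
```

The printed shares should be close to the 68%, 95%, and 99.7% figures of the Empirical Rule.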

Non-normal Distributions

  • Non-normal distributions exist in real-world scenarios.
  • A histogram displays the distribution of equivalised household disposable income.

Median and Interquartile Range

  • For non-normal data, the median is the measure of centrality.
  • The interquartile range (IQR) is the measure of dispersion for non-normal data.
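
As a rough illustration of why the median and IQR suit skewed data, the sketch below uses a small made-up list of incomes (hypothetical values, not data from the lesson) and Python's `statistics` module.

```python
from statistics import mean, median, quantiles

# Hypothetical, right-skewed incomes (in thousands); illustrative values only.
incomes = [12, 15, 18, 20, 22, 25, 30, 45, 120]

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(incomes, n=4)
iqr = q3 - q1

print("mean:  ", mean(incomes))    # pulled upward by the single large value
print("median:", median(incomes))  # stays near the bulk of the data
print("IQR:   ", iqr)              # spread of the middle 50% of the data
```

Note that different software uses slightly different quartile conventions, so the exact Q1 and Q3 values may differ between tools.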

Population vs Sample

  • Descriptive statistics include definitions of population and sample.
  • A population is all the people (or objects) you are interested in.
  • A sample is a smaller subset of a population, used to estimate the population's properties.

Sampling

  • Populations are often too big or complex to measure.
  • To avoid measuring the whole population, sampling is used.
  • Samples are chosen to represent a whole population.

Standard Error of the Mean

  • The standard error of the mean (SEM) is the standard deviation divided by the square root of the sample size, so it is smaller than the standard deviation.
  • SEM describes how much the means of multiple samples from the same population are expected to vary.
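
A minimal sketch of the SEM calculation follows. It assumes the sample standard deviation is used as an estimate of the population σ, and the measurements are invented for illustration.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """SEM = s / sqrt(n), with the sample standard deviation s estimating sigma."""
    return stdev(sample) / sqrt(len(sample))


# Hypothetical repeated measurements; illustrative values only.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]

sd = stdev(sample)
sem = standard_error_of_mean(sample)
print(f"SD  = {sd:.3f}")
print(f"SEM = {sem:.3f}")
```

Because the SEM divides by √n, it shrinks as the sample grows, while the standard deviation does not.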

Inferential Statistics

  • Inferential statistics uses data from samples to make conclusions about larger populations.
  • Inferential statistics decides whether an observed effect is a signal or just noise.

Variables

  • All statistical analyses involve predictor and outcome variables.
  • A predictor variable is believed to cause an effect.
  • An outcome variable is believed to show an effect.

Variables: Categorical & Continuous

  • Variables can be categorical or continuous.
  • Categorical variables use labels without order (e.g., sex, country).
  • Continuous variables use ordered numbers (e.g., blood pressure, age).

Hypotheses

  • Hypotheses are important aspects of all inferential statistics.
  • The null hypothesis (H₀) posits no significant effect (just noise).
  • The alternative hypothesis (H₁) indicates a significant effect (i.e., a signal).

P-value

  • The p-value is a probability used in statistical tests.
  • It is the probability of seeing an effect at least as large as the one observed if only random chance (the null hypothesis) were at work.
  • A low p-value suggests a statistically significant effect.

P-value Significance

  • A p-value of less than 0.05 (5%) usually means a significant result.
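
To make the idea of a p-value concrete, here is a small sketch based on a coin-flip experiment, which is an assumption chosen for simplicity and not an example from the lesson. It computes a one-sided p-value: the probability of a result at least as extreme as the one observed if the null hypothesis (a fair coin) were true.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical experiment: 9 heads in 10 flips of a coin assumed fair under H0.
p = binomial_p_value(successes=9, trials=10)
print(f"p-value = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

The 0.05 threshold is a convention; the same p-value could be judged against a stricter cut-off if the context demands it.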

Inferential Statistics: Cases

  • Statistical tests are categorized into cases based on variables.

Correlation

  • Correlation is the case that applies when the predictor and outcome variables are both continuous.
  • Correlation shows a relationship between variables.
  • Correlation can be positive, negative, or nonexistent.

Pearson's Correlation Coefficient

  • The Pearson's correlation coefficient (r) quantifies the strength and direction of a linear correlation.
  • The value of r is from -1 to +1.
  • Values closer to -1 or +1 indicate stronger correlations; values near 0 indicate a weak or absent linear correlation.
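
The coefficient can be computed directly from its definition. The sketch below is a plain-Python illustration with made-up numbers; the function name `pearson_r` is an assumption for this example, and in R the lesson's `cor()` function serves the same purpose.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y scaled by the spread of each."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
r_increasing = pearson_r(x, [2, 4, 6, 8, 10])  # perfectly linear, increasing
r_decreasing = pearson_r(x, [10, 8, 6, 4, 2])  # perfectly linear, decreasing
print(r_increasing, r_decreasing)
```

The two results sit at the extremes of the range: the increasing line gives the maximum value and the decreasing line the minimum.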

Correlation is not Causation

  • Correlation tells if variables are linked but not if one causes the other.
  • A third variable may affect both variables.
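
The third-variable effect can be simulated. In the sketch below, all data are synthetic and the variable names are hypothetical: `c` plays the role of a confounder (such as hot weather), and `a` and `b` each depend on `c` but never on each other.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic confounder C, e.g. a daily temperature between 0 and 30.
c = [rng.uniform(0, 30) for _ in range(200)]

# A and B each depend on C plus independent noise; neither depends on the other.
a = [2.0 * ci + rng.gauss(0, 5) for ci in c]
b = [0.5 * ci + rng.gauss(0, 2) for ci in c]

r_ab = pearson_r(a, b)
print(f"r between A and B: {r_ab:.2f}")
```

A and B come out strongly correlated even though the code contains no causal link between them, which is the situation the ice cream and crime example describes.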

Spearman Rank Correlation Coefficient

  • The Spearman Rank Correlation Coefficient is used when the relationship is monotonic but not necessarily linear.
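
One way to see the difference from Pearson's r is to compute Spearman's coefficient as the Pearson correlation of the ranks. The sketch below does this in plain Python on a made-up cubic relationship; the helper names are assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [xi**3 for xi in x]  # monotonic but clearly non-linear

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient reaches its maximum, while Pearson's r falls short of it because the points do not lie on a straight line.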

Correlation Testing

  • This session's activities include testing real data for a warming trend, that is, a significant correlation between time and temperature.
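
A correlation test of this kind can be sketched with a permutation test. This is a swapped-in technique chosen because it needs only the standard library; it is not the method used by R's `cor.test()`. The years and temperature anomalies below are invented for illustration and are not the lesson's data.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: how often shuffled data match or beat the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical yearly temperature anomalies; illustrative values only.
years = list(range(2000, 2012))
anomaly = [0.10, 0.14, 0.11, 0.18, 0.22, 0.19, 0.25, 0.30, 0.27, 0.33, 0.36, 0.40]

r = pearson_r(years, anomaly)
p = permutation_p_value(years, anomaly)
print(f"r = {r:.2f}, p = {p:.4f}")
print("H0 rejected: significant trend" if p < 0.05 else "H0 not rejected")
```

Shuffling the temperatures simulates the null hypothesis of no relationship with time, so the p-value is the share of shuffles that produce a correlation at least as strong as the observed one.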
