Questions and Answers
Which correlation method is appropriate when the assumptions of Pearson’s Correlation are not satisfied?
- Simple linear regression
- Spearman's rho (correct)
- Multiple correlation
- Polynomial regression
What is the minimum recommended sample size for conducting Pearson’s correlation effectively?
- 100
- 30 (correct)
- 15
- 50
Which non-parametric correlation method is suggested for small sample sizes?
- Pearson's correlation
- Spearman's rho
- Kendall's tau (correct)
- Point-biserial correlation
Which of the following is NOT an assumption of Pearson’s Correlation?
What technique can be applied to convert non-normally distributed data for Pearson’s correlation?
What does the Bayesian view define probability as?
Which of the following is NOT a requirement for Bayesian analysis?
Which statement best captures a disadvantage of the Bayesian view?
How can subjective probability be operationalized according to the Bayesian view?
In the example provided regarding rain probability, what would indicate a favorable bet?
What is a characteristic of elementary events in probability?
Which option best describes a common misconception about the Bayesian view of probability?
What is a potential advantage of the Bayesian approach to probability?
What defines an elementary event when throwing a die?
Which event is classified as non-elementary when throwing a die?
In the context of binomial distribution, what does 'N' represent?
In a binomial distribution setup, what does the variable 'X' represent?
Given θ = 0.167 and N = 20, what is the type of probability being calculated?
In the equation 'Data = Model + Error', what does the term 'Model' represent?
What key difference exists between comparison and prediction in data analysis?
What does the term 'random variable' signify in the context of a binomial experiment?
What is the primary focus of frequentists in probability?
Which of the following is NOT a requirement for frequentist methods?
What disadvantage is associated with the frequentist view of probability?
How do frequentists and Bayesians primarily differ in their approach to probability?
What could be deemed an advantage of the frequentist approach to probability?
Which of the following statements describes a major limitation of frequentist probability?
Which analogy illustrates the difference between probability and statistics?
What does the frequentist perspective on probability NOT account for?
What does the 'd' form in probability distributions signify?
In the context of the binomial distribution, what does 'size' represent?
Which statement is true regarding the characteristics of the normal distribution?
What does the standard deviation control in a normal distribution?
How is the cumulative probability calculated in probability distributions?
In the notation for a normally distributed variable, which symbol represents the mean?
What is indicated by the cumulative probability being equal to 0.5 in a normal distribution?
What is the purpose of the 'r' form in probability distributions?
What does a smaller standard deviation indicate about a data set?
How is a binomial distribution characterized?
What does the output of the cor.test() function provide?
What is indicated by a p-value greater than 0.05 in a correlation test?
What happens to the normal distribution when the standard deviation increases?
What is implied when the confidence interval for a correlation coefficient includes zero?
If the t-statistic is far from the mean, what does that suggest?
When analyzing a dataset with a normal distribution, what effect does an increase in standard deviation have on data interpretation?
Flashcards
Frequentist definition of probability
Probability is defined as the long-run frequency of an event. For example, if we have a fair coin with 50% chance of landing on heads, we would expect half of the coin flips to land on heads in the long run.
Inferential Statistics
A branch of statistics concerned with making inferences about a population based on a sample of data. It uses probabilities to quantify the uncertainty of these inferences.
Statistical inference
Statistical inference aims to understand how representative our data is of the population it came from. It uses probabilities to quantify the uncertainty in our conclusions.
Bayesian statistics
Statistics
Probability
Frequentist view of probability
Bayesian view of probability
Prior Information
Data
Model(s)
Design
Elementary Events
Flexibility of Bayesian Probability
Subjectivity in Bayesian Probability
Non-Elementary Event
Sample Space
Binomial Distribution
θ (Theta)
N
X (Random Variable)
Data = Model + Error
dbinom() function
pbinom() function
qbinom() function
rbinom() function
dnorm() function
pnorm() function
qnorm() function
rnorm() function
Normal Distribution
Pearson's Correlation
Standard Deviation
Spearman's Rho
Normal Distribution with Small Standard Deviation
Kendall's Tau
Normal Distribution with Large Standard Deviation
Sample Statistics
Population Parameters
cor.test()
Correlation Coefficient
Correlation Test
Study Notes
Statistics II - Exam Study Guide
- Basics of Probability: Probabilities form the basis for statistical inference. Inferential statistics are used to determine how representative data are of a population. Probability involves predicting outcomes, while statistics involves interpreting data to make inferences about the population.
Probability & Statistics
- Frequentists vs Bayesians: Frequentists define probability as long-run frequency. For example, a fair coin (50% probability of heads) is expected to land heads in half of the trials. Frequentists require data, models, and design for their analysis. Bayesians, on the other hand, view probabilities as degrees of belief held by a rational agent.
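The long-run frequency idea can be shown with a minimal Python sketch. It assumes a simulated fair coin, and the function name `proportion_heads` is illustrative rather than taken from the course material. As the number of flips grows, the observed proportion of heads tends to settle near 0.5.

```python
import random

def proportion_heads(n_flips, seed=1):
    """Simulate n_flips tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Short runs can wander away from 0.5; long runs settle close to it
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))
```

The frequentist definition refers only to the limit, which no finite run actually reaches.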
Advantages & Disadvantages
- Frequentist view: Objective, unambiguous, and grounded in the physical world. However, infinite sequences of trials don't exist in practice. Its scope is also limited, because it cannot assign a probability to a one-off event that cannot be repeated.
- Bayesian view: Assigns probabilities to events based on the beliefs and assumptions of an intelligent (rational) agent, so it can handle events that aren't easily quantified in the physical world. However, it is subjective and requires careful specification of prior beliefs. The Bayesian view is also often criticized as too broad.
Probability Distributions
- Elementary events: For a given observation, the outcome will be one and only one of these events.
- Example: In tossing a coin, "heads" and "tails" are elementary events.
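The same distinction applies to one throw of a fair six-sided die, shown in the small Python sketch below (the fair die is an assumption for illustration). Each face is an elementary event. An event such as "roll an even number" is non-elementary because it groups several elementary events together. When all faces are equally likely, the probability of an event is the number of elementary events it contains divided by six.

```python
from fractions import Fraction

# Sample space for one throw of a fair die: the elementary events
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """P(event) when every elementary event in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

roll_six = {6}                    # elementary: exactly one outcome
roll_even = {2, 4, 6}             # non-elementary: three outcomes
print(probability(roll_six))      # 1/6
print(probability(roll_even))     # 1/2
```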
Statistics
- Binomial Distribution: This distribution describes the number of successes in a fixed number of independent trials. Each trial has only two outcomes (the event happens or doesn't, coded 1 or 0) and the same probability of success. It is defined by two parameters: the success probability θ (e.g., the probability of 'heads' in a coin toss) and the number of trials N.
- Example: Calculating the probability of a specific number of successes, such as getting heads 4 times in 20 coin tosses.
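The binomial probability of exactly x successes in N independent trials, each with success probability θ, is P(X = x) = C(N, x) · θ^x · (1 - θ)^(N - x). The minimal Python sketch below implements that formula. It serves the same purpose as R's `dbinom(x, size, prob)`, and the function name is only illustrative.

```python
from math import comb

def binom_pmf(x, n, theta):
    """Probability of exactly x successes in n independent trials with success probability theta."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

# Heads 4 times in 20 tosses of a fair coin
print(binom_pmf(4, 20, 0.5))    # about 0.0046

# Exactly four sixes in 20 throws of a fair die (theta = 1/6, about 0.167)
print(binom_pmf(4, 20, 1 / 6))
```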
Relationship Between Models & Data
- Data = Model + Error
- Statistical Inference compares models to data
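Here is a minimal sketch of the decomposition. It assumes the simplest possible model, which predicts every observation with the sample mean, and the scores are made-up illustration values. Each error (residual) is what the model fails to capture, and data = model + error holds exactly for every observation.

```python
from statistics import mean

scores = [4, 7, 6, 9, 5, 8]          # hypothetical observed data
model = mean(scores)                  # the model's prediction for every case
errors = [y - model for y in scores]  # residuals: data minus model

for y, e in zip(scores, errors):
    print(f"{y} = {model} + ({e})")

# With the mean as the model, the residuals always sum to zero
print(sum(errors))
```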
Using Different Distributions in R
- Binomial Distribution (dbinom, pbinom, qbinom, rbinom): Used to calculate probabilities and simulate outcomes for a fixed, finite number of trials (e.g., the number of heads in a series of coin flips).
- Example: calculating the probability of getting 4 heads in 10 coin flips.
- Normal Distribution (dnorm, pnorm, qnorm, rnorm): Used for continuous data, or for data whose distribution is approximately normal.
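R's prefixes follow one pattern:

- d gives the probability (or density) of a value.
- p gives the cumulative probability up to that value.
- q is the inverse of p (a quantile).
- r produces random draws.

The Python sketch below mirrors that pattern for the binomial using only the standard library. The function names copy R's for readability, but they are hand-written stand-ins, not an existing Python API.

```python
import random
from math import comb

def dbinom(x, size, prob):
    """P(X = x): probability of exactly x successes."""
    return comb(size, x) * prob**x * (1 - prob) ** (size - x)

def pbinom(q, size, prob):
    """P(X <= q): cumulative probability, summing dbinom from 0 to q."""
    return sum(dbinom(k, size, prob) for k in range(q + 1))

def qbinom(p, size, prob):
    """Smallest x with P(X <= x) >= p (the quantile function)."""
    for x in range(size + 1):
        if pbinom(x, size, prob) >= p:
            return x
    return size

def rbinom(n, size, prob, seed=None):
    """n random draws, each the number of successes in `size` trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < prob for _ in range(size)) for _ in range(n)]

print(dbinom(4, 10, 0.5))    # 0.205078125  (exactly 4 heads in 10 flips)
print(pbinom(4, 10, 0.5))    # 0.376953125  (4 or fewer heads)
print(qbinom(0.5, 10, 0.5))  # 5
print(rbinom(5, 10, 0.5, seed=42))
```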
Characteristics of Normal Distribution
- The area under the normal distribution curve is equal to 1.
- The mean, mode, and median are all equal in a normal distribution.
- The curve is symmetric around the mean (μ).
- Standard deviation (σ) controls the spread, which tells us if the data is closely clustered around the mean, or spread out.
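Python's standard library includes `statistics.NormalDist` (Python 3.8+). Its `pdf`, `cdf`, `inv_cdf`, and `samples` methods play roughly the roles of R's `dnorm`, `pnorm`, `qnorm`, and `rnorm`. The short sketch below checks the properties listed above; the mean of 100 and SD of 15 are illustrative choices.

```python
from math import isclose
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Symmetric around the mean: half the area lies below mu
print(iq.cdf(100))                        # 0.5
print(iq.inv_cdf(0.5))                    # the mean, 100 (median equals the mean)
print(isclose(iq.pdf(85), iq.pdf(115)))   # True: equal heights one SD either side

# Standard deviation controls spread: a smaller sigma gives a taller, narrower peak
narrow = NormalDist(mu=0, sigma=0.5)
wide = NormalDist(mu=0, sigma=2)
print(narrow.pdf(0) > wide.pdf(0))        # True

# Roughly 68% of the area lies within one SD of the mean
print(iq.cdf(115) - iq.cdf(85))           # about 0.683
```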
Binomial vs Normal
- Binomial: Discrete (countable) outcomes, so its plot is a set of separate bars.
- Normal: Continuous (uncountable) outcomes, so its plot is a smooth curve.
Functions in R for Correlation
- cor(), cor.test(), and rcorr() (from the Hmisc package) calculate Pearson and Spearman correlations.
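The two coefficients can be compared with a from-scratch Python sketch. The helper names are hypothetical; in R, the corresponding calls would be `cor(x, y, method = "pearson")` and `cor(x, y, method = "spearman")`. Pearson's r measures linear association. Spearman's rho is Pearson's r computed on the ranks, so it only asks whether the relationship is monotonic.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]        # monotonic but curved
print(pearson_r(x, y))         # high, but below 1
print(spearman_rho(x, y))      # 1.0: the ranks agree perfectly
```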
Sample Statistics and Population Parameters
- Population parameter: Describes the characteristic of the whole population.
- Sample statistic: Describes the characteristic of a smaller group (subset) taken from the population.
Running & Interpreting R Output
- Linear Regression: A method to find the relationship between variables. Results (output) often include estimates, standard errors, p-values, and R-squared values.
Hypotheses & Research Questions
- Hypothesis: A statement about the relationship between variables. Example: There is a relationship between the amount of exercise and overall health. Hypothesis testing attempts to rule out chance as a plausible explanation for the results.
Effect Sizes
- Cohen's d: An effect size that expresses the difference between two means in standard deviation units. It is typically reported alongside Student's t-tests or z-tests.
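Cohen's d divides the difference between two group means by a standard deviation, commonly the pooled SD of the two groups. Below is a minimal sketch of that pooled-SD version. The group scores are invented for illustration, and other variants of d use a different denominator.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

treatment = [23, 25, 28, 30, 24, 27]   # hypothetical scores
control = [20, 22, 21, 25, 19, 23]
print(round(cohens_d(treatment, control), 2))
```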
Sampling Theory
- Population: A comprehensive set of units to which findings are generalized.
- Sample: A subset of the population from which inferences about the population are drawn.
- Sampling distribution: The probability distribution of a statistic (e.g., the sample mean) across all possible samples of a given size drawn from the population. It shows which values of the statistic are likely to occur by chance.
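A sampling distribution can be approximated by simulation. Draw many samples of the same size from one population, compute the statistic for each, and look at how those values are spread. The sketch below assumes a normal population with mean 100 and SD 15 (illustrative values). The SD of the simulated sample means, known as the standard error, should come out close to σ/√n.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(2024)          # seeded for reproducibility
MU, SIGMA, N, REPS = 100, 15, 25, 5000

# Each entry is the mean of one random sample of size N
sample_means = [fmean([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(REPS)]

print(fmean(sample_means))         # close to the population mean, 100
print(stdev(sample_means))         # close to the standard error below
print(SIGMA / sqrt(N))             # theoretical standard error = 3.0
```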
Correlation & Covariance
- Correlation: Measures the extent to which two variables are related.
- Covariance: Indicates how much two variables change together. Positive covariance signifies that they generally change in the same direction, while negative covariance means they change in opposite directions.
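The sample covariance sums the products of paired deviations from the two means, then divides by n - 1. Dividing the covariance by the product of the two standard deviations standardizes it into Pearson's r, which always lies between -1 and 1. The sketch below uses made-up hours-studied data.

```python
from statistics import fmean, stdev

def covariance(x, y):
    """Sample covariance: average cross-product of deviations (n - 1 denominator)."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

hours = [1, 2, 3, 4, 5]
exam = [52, 55, 61, 66, 70]          # rises with hours: positive covariance
mistakes = [9, 8, 6, 5, 2]           # falls as hours rise: negative covariance

print(covariance(hours, exam))       # positive
print(covariance(hours, mistakes))   # negative

# Standardizing the covariance gives Pearson's r
r = covariance(hours, exam) / (stdev(hours) * stdev(exam))
print(r)
```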
Partial and Semi-Partial Correlation
- Partial correlation: Measures the relationship between two variables while controlling for the effect of one or more other variables on both of them.
- Semi-partial (part) correlation: Controls for a third variable's effect on only one of the two variables (either X or Y).
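Both coefficients can be computed from the three pairwise Pearson correlations. The sketch uses the standard first-order formulas for controlling one variable z. The correlation values plugged in are invented for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing z from BOTH variables."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of x with y after removing z from y ONLY."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_yz**2)

# Hypothetical pairwise correlations among x, y, and a control variable z
r_xy, r_xz, r_yz = 0.50, 0.30, 0.60
print(round(partial_r(r_xy, r_xz, r_yz), 3))
print(round(semipartial_r(r_xy, r_xz, r_yz), 3))
```

Because its denominator is never smaller, the semi-partial correlation is never larger in magnitude than the corresponding partial correlation.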
Regression/Test Statistics
- Regression line: Straight line depicting the mathematical relationship between variables.
Ordinary Least Squares
- Method of calculating the regression line that minimizes the sum of squared differences (residuals) between observed and predicted values.
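For a single predictor, the least-squares line has a closed form. The slope is b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and the intercept is b0 = ȳ - b1·x̄. Below is a minimal sketch with made-up data; in R, the same fit would come from `lm(y ~ x)`.

```python
from statistics import fmean

def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)
predicted = [b0 + b1 * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]
print(b0, b1)
print(sum(r ** 2 for r in residuals))   # the quantity OLS makes as small as possible
```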
Testing the Model (ANOVA)
- ANOVA (Analysis of variance): Tests whether the model explains significantly more of the variability in the data than the simplest model, the mean.
Mean Squared Error (MSE)
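- Sums of squares: The total variability in the outcome (total sum of squares, SST) can be divided into the sum of squares for the model (SSM) and the sum of squares of the residuals (SSR): SST = SSM + SSR.
- Mean squares: A sum of squares divided by its degrees of freedom. The model mean square (MSM) divided by the residual mean square (MSR, the mean squared error) gives the F-ratio used in the ANOVA test.
- The proportion of variance explained by a model is R² = SSM / SST.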
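The sketch below shows the partition for a simple regression, using the closed-form least-squares fit on made-up data. It computes SST = SSM + SSR, R² = SSM / SST, and F = MSM / MSR. Each mean square is its sum of squares divided by its degrees of freedom: k for the model and n - k - 1 for the residuals, where k is the number of predictors.

```python
from statistics import fmean

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.0, 4.5, 4.0, 6.5, 7.0, 6.0, 8.5, 9.0]   # hypothetical outcome scores
n, k = len(x), 1                                 # k = number of predictors

# Least-squares fit of y on x
mx, my = fmean(x), fmean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
fitted = [intercept + slope * a for a in x]

sst = sum((b - my) ** 2 for b in y)                  # total: data vs the mean
ssm = sum((f - my) ** 2 for f in fitted)             # model: fitted values vs the mean
ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))   # residual: data vs fitted values

r_squared = ssm / sst
msm, msr = ssm / k, ssr / (n - k - 1)                # mean squares
f_ratio = msm / msr

print(round(sst, 4), round(ssm + ssr, 4))   # the two totals agree
print(round(r_squared, 3), round(f_ratio, 2))
```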
Standard Error
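- Standard error: The standard deviation of a statistic's sampling distribution, i.e., a measure of how much the statistic varies from sample to sample. For the mean, it is estimated as s/√n.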
Null and Alternative hypotheses
- Null hypothesis (H₀): A claim of "no difference" in a population.
- Alternative hypothesis (Ha): Contends that the null hypothesis is false.
Types of Sampling Methods
- Random Sampling: Every member has an equal chance of being selected.
- Stratified Sampling: The population is divided into subgroups (strata), and random samples are taken from each subgroup.
- Volunteer Sampling: Participants self-select to participate in a study.
- Opportunity Sampling: Choosing participants who are readily available.
- Convenience Sampling: Choosing participants that are convenient for the researcher.
- Snowball Sampling: Participants recruit other potential participants (useful for hard-to-reach populations).
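Random and stratified sampling are easy to contrast in code. The sketch assumes a hypothetical population of 1,000 students split 70/30 across two year groups. `random.sample` draws without replacement. The stratified version samples each stratum in proportion to its size and rounds each share, so awkward splits could leave the total off by one.

```python
import random

rng = random.Random(7)
# Hypothetical population: (student_id, year_group)
population = [(i, "first" if i < 700 else "second") for i in range(1000)]

# Simple random sampling: every student is equally likely to be chosen
simple = rng.sample(population, 100)

def stratified_sample(units, key, total, rng):
    """Split units into strata by key, then sample each stratum in proportion to its size."""
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(total * len(members) / len(units))
        sample.extend(rng.sample(members, share))
    return sample

def count_year(sample, year):
    """Number of sampled students in a given year group."""
    return sum(1 for _, y in sample if y == year)

stratified = stratified_sample(population, key=lambda s: s[1], total=100, rng=rng)

print(count_year(simple, "first"), count_year(simple, "second"))          # varies around 70/30
print(count_year(stratified, "first"), count_year(stratified, "second"))  # exactly 70/30
```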
Confidence Intervals
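- Confidence intervals: Provide a range of plausible values for a population parameter (e.g., mean, proportion), based on a random sample. The confidence level describes the procedure: if sampling were repeated many times, 95% of the 95% intervals constructed this way would contain the true parameter. Example: a 95% confidence interval for the average IQ of 89 to 111.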
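The "95%" describes the procedure rather than any single interval, and a simulation makes this visible. The sketch assumes a normal population with a known SD, so a z-based interval (mean ± 1.96·σ/√n) is appropriate, and it uses illustrative values. It builds many intervals and counts how often they capture the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(11)
MU, SIGMA, N, REPS = 100, 15, 30, 2000
z = NormalDist().inv_cdf(0.975)          # about 1.96

hits = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    m = fmean(sample)
    half_width = z * SIGMA / sqrt(N)
    if m - half_width <= MU <= m + half_width:
        hits += 1                        # this interval captured the true mean

print(hits / REPS)   # close to 0.95
```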
Central Limit Theorem
- Central Limit Theorem: As sample size grows, the sampling distribution of the sample mean gets closer and closer to a normal distribution, even when the population itself is not normally distributed.
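A quick simulation illustrates the theorem. It assumes an exponential population, which is strongly right-skewed (skewness 2). As n grows, the skewness of the simulated sample means shrinks toward 0, the value for a normal distribution. The `skewness` helper is written out here rather than taken from a library.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution such as the normal."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(3)
REPS = 20000
results = {}

for n in (1, 5, 30):
    means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(REPS)]
    results[n] = skewness(means)
    print(n, round(results[n], 2))   # theory: about 2 / sqrt(n)
```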
Type I and Type II Errors
- Type I error: Rejecting a true null hypothesis (false positive)
- Type II error: Failing to reject a false null hypothesis (false negative)
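When the null hypothesis is true, a test at α = .05 should commit a Type I error about 5% of the time. The sketch below checks this by simulation. It runs a two-sided z-test on samples drawn from a population in which H₀ (μ = 0) really holds; using a z-test with known σ is a simplifying assumption.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(5)
ALPHA, N, REPS = 0.05, 20, 4000
std_normal = NormalDist()

false_positives = 0
for _ in range(REPS):
    sample = [rng.gauss(0, 1) for _ in range(N)]   # H0 is true: the mean really is 0
    z = fmean(sample) / (1 / sqrt(N))              # z statistic with known sigma = 1
    p = 2 * (1 - std_normal.cdf(abs(z)))           # two-sided p-value
    if p < ALPHA:
        false_positives += 1                       # rejecting a true H0

print(false_positives / REPS)   # close to 0.05
```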
Interaction Terms in Multiple Regression
- Interaction: When the effect of one predictor on the outcome differs depending on the level of another predictor. Example: The effect of one ingredient on a cake differs depending on the amount of another ingredient added to it.
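Consider a regression with an interaction term, ŷ = b0 + b1·x + b2·z + b3·x·z. The slope of x is b1 + b3·z, so it changes with z. The sketch below follows the cake example with hypothetical coefficients (x = sugar, z = butter; all numbers are invented).

```python
# Hypothetical fitted coefficients for: taste = b0 + b1*sugar + b2*butter + b3*sugar*butter
B0, B1, B2, B3 = 2.0, 0.5, 0.3, 0.2

def predicted_taste(sugar, butter):
    """Predicted outcome from the hypothetical interaction model."""
    return B0 + B1 * sugar + B2 * butter + B3 * sugar * butter

def sugar_slope(butter):
    """Change in predicted taste per extra unit of sugar, at a given butter level."""
    return B1 + B3 * butter

for butter in (0, 1, 3):
    print(butter, sugar_slope(butter))
```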
Categorical Variable Coding
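- Dummy coding: Represents a categorical variable with k groups as k - 1 variables coded 0/1. Each variable compares one group with a reference group, which is coded 0 on all of them.
- Unweighted effect coding: Like dummy coding, but the reference group is coded -1. Each coefficient then compares a group's mean with the unweighted grand mean (the mean of the group means).
- Weighted effect coding: The codes are weighted by group sizes, so each coefficient compares a group's mean with the weighted grand mean (the overall sample mean).
- Contrast coding: Useful when the researcher has planned hypotheses about specific comparisons between groups.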
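The difference between dummy coding and unweighted effect coding is easiest to see in the code matrices. The sketch builds both for a hypothetical three-group variable, with "control" as the reference group. Dummy coding gives the reference group 0 on every code, while effect coding gives it -1. As a result, the effect-coded coefficients compare groups with the mean of the group means rather than with the reference group.

```python
GROUPS = ["control", "drug_a", "drug_b"]   # hypothetical groups; the first is the reference

def dummy_codes(group, groups=GROUPS):
    """One 0/1 column per non-reference group; the reference group is all zeros."""
    return [1 if group == g else 0 for g in groups[1:]]

def effect_codes(group, groups=GROUPS):
    """Like dummy codes, but the reference group is coded -1 on every column."""
    if group == groups[0]:
        return [-1] * (len(groups) - 1)
    return dummy_codes(group, groups)

for g in GROUPS:
    print(f"{g:8} dummy={dummy_codes(g)} effect={effect_codes(g)}")
```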
Growth Models with Polynomials
- Example: Determining the functional relationship between weight and time by including powers of time (time, time², ...) as predictors. When more than one predictor is examined, interaction terms are also used.
Running & Interpreting R output
- Polynomial Regression: A technique for modelling curvilinear relationships.
Multiple Linear Regression
- Method to model the relationship between a dependent variable and two or more independent variables.
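Multiple regression estimates the coefficients b by solving the normal equations (XᵀX)b = Xᵀy, where X has a column of 1s plus one column per predictor. A polynomial growth model uses the same computation, with time and time² as the predictors. The sketch below is a plain-Python solver for small problems and assumes well-conditioned data. In practice, R's `lm()` does this work (via a QR decomposition), for example with `lm(weight ~ time + I(time^2))`.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear_model(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y on the predictor columns in rows."""
    X = [[1.0] + list(r) for r in rows]               # add the intercept column
    p = len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(len(X))) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p)]
    return solve(xtx, xty)

# Polynomial growth model: weight = b0 + b1*time + b2*time^2 (hypothetical data)
time = [0, 1, 2, 3, 4, 5]
weight = [3.0 + 2.0 * t - 0.25 * t**2 for t in time]   # exact quadratic, no noise
coefs = fit_linear_model([(t, t**2) for t in time], weight)
print([round(c, 6) for c in coefs])                     # recovers 3.0, 2.0, -0.25
```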