Statistical Inference and Probability Distributions

Questions and Answers

Statistical inference is the process of using sample data to make conclusions about the population.

True

A probability distribution describes the likelihood of different values of a random variable.

True

The standard normal distribution has a mean of 0 and a standard deviation of 1.

True

What does the 68-95-99.7 empirical rule state?

The 68-95-99.7 empirical rule states that, in a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.

A sample statistic is always unknown, while a population parameter is always known.

False

What is sampling error?

Sampling error is the difference between a sample statistic and the true population parameter. It arises due to the fact that we are only observing a portion of the population.

A confidence interval provides a range of values within which we are confident that the true population parameter lies.

True

The standard error measures the variability of observations from the sample mean.

False

The confidence level represents the probability of rejecting the null hypothesis when it is actually true.

False

The risk level (alpha) is the opposite of the confidence level.

True

What are the typical risk levels in statistical inference?

5%, 1%, 0.1%

A hypothesis is a testable prediction about the state of the world.

True

Which of the following statements represents a hypothesis?

Dreaming duration for males is longer than that for females.

A directional hypothesis specifies the direction of the expected relationship between variables.

True

A non-directional hypothesis, also known as an exploratory hypothesis, specifies the direction of the expected relationship between variables.

False

The null hypothesis (H0) typically states that there is no effect or difference between the variables.

True

What does it mean to “reject the null hypothesis” in statistical testing?

It means the sample data provide sufficient evidence against the null hypothesis.

A p-value represents the probability of obtaining the observed results or more extreme results, assuming the null hypothesis is true.

True

If the p-value is less than the significance level, we reject the null hypothesis.

True

What are the typical significance levels used in statistical testing?

0.05, 0.01, 0.001

What is meant by statistical inference?

Statistical inference is the process of using sample data to make conclusions about a larger population.

What are the two key steps in statistical inference?

Derive estimates and test hypotheses

A probability distribution describes how likely different values of a random variable are.

True

What is a test statistic?

A test statistic is a summary of sample data used to test a hypothesis about a population parameter.

What is the difference between a sample statistic and a population parameter?

A sample statistic summarizes data from a sample, while a population parameter describes a characteristic of the entire population.

How can we minimize sampling error?

Sampling error can be minimized by increasing the sample size or by using more sophisticated sampling methods.

What is a confidence level?

A confidence level represents the probability that a confidence interval will capture the true population parameter.

What is a hypothesis?

A hypothesis is a testable prediction about the relationship between variables.

What are the two types of hypotheses?

Null and alternative

What does NHST stand for?

NHST stands for Null Hypothesis Significance Testing.

What is a p-value?

A p-value is the probability of observing the data or more extreme data if the null hypothesis is true.

What is the relationship between the p-value and the significance level?

The significance level is the threshold for rejecting the null hypothesis. If the p-value is less than the significance level, the null hypothesis is rejected.

What is statistical power?

Statistical power is the probability of correctly rejecting the null hypothesis when it is false.

What are the four main factors that affect statistical power?

Sample size, effect size, significance level, and methods

What is the difference between statistical significance and substantive significance?

Statistical significance indicates whether a result is unlikely to occur by chance. Substantive significance refers to the practical importance or meaningfulness of a result.

What is effect size?

Effect size is a standardized measure of the magnitude of an effect. It can be used to compare the strength of an effect across different studies.

What is a critical value?

A critical value is a threshold value used in hypothesis testing to determine whether the null hypothesis should be rejected.

What is a Type I error?

A Type I error occurs when we reject the null hypothesis when it is actually true.

What are the units of analysis in a dataset?

Units of analysis represent the individuals or objects on which data is collected.

What are variables in a dataset?

Variables are characteristics or attributes that are measured or observed in a dataset.

What are values in a dataset?

Values are the specific data points or observations that represent the different levels of a variable.

Explain the difference between variables and constructs.

Variables are directly measurable characteristics, while constructs are complex theoretical concepts that are not directly observable but can be measured indirectly through multiple indicators.

What is measurement error?

Measurement error is the difference between an observed score and the true score on a given variable.

What is meant by validity in measurement?

Validity refers to the extent to which a measurement tool actually measures what it is intended to measure.

What is meant by reliability in measurement?

Reliability refers to the consistency or stability of a measurement over time or across different administrations.

What are the four main levels of measurement?

The four main levels of measurement are nominal, ordinal, interval, and ratio.

What is a nominal scale?

A nominal scale is the weakest level of measurement. It uses numbers as labels to categorize data, but these numbers do not have any inherent order or mathematical value.

What is an interval scale?

An interval scale provides equal intervals between its measures, but it doesn't have a true zero point.

What is a ratio scale?

A ratio scale is the strongest level of measurement. It has equal intervals between its measures and a true zero point.

Explain the difference between a bar chart and a histogram.

Both bar charts and histograms display frequencies. A bar chart uses bars to represent the frequency of each discrete category, while a histogram uses bars to represent the frequency of values falling within each interval (bin) of a continuous variable.

What is a scatterplot?

A scatterplot is a graph that displays the relationship between two continuous variables as points on the graph.

What is a contingency table?

A contingency table is a table that displays the frequency distribution of two or more categorical variables simultaneously.

What is the purpose of assumptions in statistical analysis?

Assumptions are a set of conditions that must be met for the chosen analytical method to produce valid results.

What are the main assumptions of parametric statistical tests?

Parametric tests have stringent assumptions. The most common ones include normality of data, homogeneity of variance, independence of observations, and linearity.

What are non-parametric tests?

Non-parametric tests don't make stringent assumptions about data distributions and are more flexible. They're typically used when data violate assumptions required for parametric tests.

Explain the difference between ANOVA and ANCOVA.

Both ANOVA and ANCOVA involve comparing means across groups. ANOVA focuses on the effect of a single categorical variable, while ANCOVA examines the effects of both a categorical and a continuous variable on an outcome.

What is a factorial ANOVA?

A factorial ANOVA examines the effects of two or more categorical variables (factors) simultaneously.

What is regression analysis?

Regression analysis is a statistical technique used to predict the value of a dependent variable based on one or more predictor variables.

Explain the difference between simple linear regression and multiple regression.

Simple linear regression uses only one predictor variable to explain the dependent variable, while multiple regression uses two or more predictor variables.

What is meant by multicollinearity in regression analysis?

Multicollinearity arises when predictor variables in a regression model are highly correlated with each other.

What is the purpose of a Durbin-Watson test in regression analysis?

The Durbin-Watson test evaluates whether the residuals of a regression model exhibit autocorrelation, which is a violation of the assumption that errors are independent.

What is the purpose of a scatterplot of standardized residuals against standardized predicted values?

Scatterplots of residuals help us assess the assumptions of homoscedasticity and linearity.

What is meant by heteroscedasticity in regression analysis?

Heteroscedasticity occurs when the variance of the residuals in a regression model is not constant across the range of predicted values.

What is meant by normality of residuals?

Normality of residuals refers to the assumption that the errors (residuals) in a regression model are normally distributed.

Explain the concept of influential cases in regression analysis.

Influential cases are data points that have a significant influence on the regression coefficients and the overall model fit.

Explain the difference between spotlight analysis and floodlight analysis in regression.

Spotlight analysis examines the effect of one predictor at a specific value of another predictor, while floodlight analysis considers the effect of one predictor across the entire range of values of another predictor.

What is logistic regression?

Logistic regression is a statistical technique used to predict the probability of a categorical outcome variable based on one or more predictor variables.

Explain the concept of a logit in logistic regression.

The logit is the natural logarithm of the odds. It transforms the nonlinear relationship between a predictor and the probability of a categorical outcome into a linear relationship for analysis.

How can we assess the goodness of fit of a logistic regression model?

The goodness of fit of a logistic regression model can be assessed using various statistical measures, including the likelihood ratio test, the Hosmer-Lemeshow test, and classification accuracy measures like the hit ratio.

In what situations might logistic regression be used?

Logistic regression is used in various situations, such as predicting whether a customer will purchase a product, determining the likelihood of a person having a disease, or identifying variables that influence the probability of a certain outcome.

Study Notes

Statistical Inference

  • Statistical inference uses sample data to draw conclusions about a population.
  • Specific population characteristics (parameters) are determined through contrasts, comparisons and associations.
  • A statistical model is created to accommodate uncertainty in the hypothesis being tested.
  • A random/representative sample is obtained to model the hypothesis.
  • Sample data is summarized using a test statistic.
  • Test statistics' probability distributions help in drawing inferences about the population.

Probability (Frequency) Distribution

  • A probability distribution is a function that defines the likelihood of different values of a random variable.
  • It is based on the underlying probability distribution of the variable.
  • Discrete and continuous probability distributions exist, each following similar reasoning.
  • Probability distributions can be visualized by plotting the data in a graph.

Normal and Standard Normal Distribution

  • Normal distributions are symmetric and bell-shaped.
  • The 68-95-99.7 rule for standard deviations within a normal distribution is a key property (see the numerical check below).
  • Normal distributions are standardized for simplicity (a mean of 0 and a standard deviation of 1).
  • The equation of the normal distribution reflects its characteristic shape and other statistical properties.
  • Because the distribution is symmetric, the mean, median, and mode all coincide at the most frequent value.
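
A quick numerical check of the 68-95-99.7 rule, as a minimal sketch using scipy (the values printed are properties of the standard normal distribution, not taken from the source):

```python
from scipy import stats

# Probability mass within k standard deviations of the mean
# for a standard normal distribution (mean 0, sd 1)
for k in (1, 2, 3):
    p = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} sd: {p:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```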

Sampling Error (Margin of Error)

  • Sampling error exists because the analysis is conducted on a sample and not the entire population.
  • The population parameter is fixed; however, the value of the sample statistic varies.
  • Sampling error can over- or underestimate the true population parameter.
  • The variability in the sample statistic can be estimated using the Standard Error (SE).
  • The critical value from the probability distribution can be used to estimate the error range due to variation.

Parameter Estimation - Sampling Distribution & Standard Error

  • The standard deviation (s) reflects the variability of observations around the sample mean.
  • The standard error (SE) represents the variability of means across samples drawn from the same population.
  • The distribution of sample means tends to be normal when dealing with multiple samples from the same population.
  • For larger sample sizes this holds regardless of the population's shape, based on the central limit theorem.
  • The standard error can be approximated from the sample standard deviation (s) and the sample size (n): SE = s / √n (see the sketch below).
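
A minimal sketch of the SE = s/√n approximation, checked against a simulated sampling distribution (the population parameters and sample size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=100, scale=15, size=100_000)  # hypothetical population

n = 50
sample = rng.choice(population, size=n)
se_formula = sample.std(ddof=1) / np.sqrt(n)  # SE approximated as s / sqrt(n)

# Empirical check: the SD of many sample means is the standard error
means = [rng.choice(population, size=n).mean() for _ in range(2_000)]
print(se_formula, np.std(means))  # both close to 15 / sqrt(50), about 2.12
```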

Parameter Estimation - Confidence Level

  • The likelihood that an estimate captures the true parameter is termed the confidence level.
  • The risk of making an incorrect estimation is called the significance level.
  • Typical significance levels are 1%, 5%, and 10%.
  • The corresponding confidence levels are 99%, 95%, and 90%.
  • The significance level sets the threshold for making a Type I error (incorrectly rejecting a true null hypothesis).

Critical values and Conf./Sig. level

  • Confidence and Significance level values reflect the acceptable level of error.
  • High confidence levels correlate with a low chance of incorrect estimation.
  • Values are set for 2-tailed and 1-tailed tests with a focus on the level of risk (α, alpha).

Parameter Estimation - Confidence Interval

  • Confidence intervals are ranges containing the likely value of the population parameter.
  • The variability of the sample statistic (e.g., the mean) is quantified by its standard error.
  • The critical value from the probability distribution, combined with the standard error, is used to calculate the confidence interval (see the sketch below).
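
A sketch of a 95% confidence interval for a mean, using the critical value from the t-distribution (the scores are hypothetical):

```python
import numpy as np
from scipy import stats

scores = np.array([4.1, 5.3, 4.8, 5.9, 5.1, 4.4, 5.6, 4.9, 5.2, 4.7])
mean = scores.mean()
se = stats.sem(scores)  # standard error: s / sqrt(n)

# CI = mean +/- critical value * SE
lo, hi = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=se)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```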

Hypothesis

  • A hypothesis is a testable prediction about the state of the world.
  • Hypotheses are measured empirically in a valid and reliable manner.
  • They can identify a relationship between different parts of the experiment.
  • Independent variables are the predictors of the experiment; dependent variables are the outcomes.

Types of Hypotheses

  • Directional hypotheses predict the relationship's direction. Example (positive or negative).
  • Non-directional hypotheses do not have specific expectations regarding the effect's direction. Example (different).

Types of Hypotheses (Pairs)

  • Each alternative hypothesis has a corresponding null hypothesis. It negates or counters the effect of the alternative hypothesis.
  • The null hypothesis assumes no effect exists in the relationship.
  • Together, the null hypothesis (H0) and the alternative hypothesis (H₁) capture all possible outcomes regarding a relationship.

Test statistic (p-value) vs. critical value (α-level)

  • The model's fit to the data is determined by the test statistic(s).
  • A p-value shows the probability of getting the test statistic if the null hypothesis is true.
  • Compare the p-value to the significance level (α, alpha) to determine whether the effect is statistically significant.
  • Critical value from the probability distribution is compared to the test statistic.

Regions of rejection

  • Critical regions are areas where if the test statistic falls; the null hypothesis is rejected.
  • The region of rejection is specific to whether it is a one or two-tailed test.
  • The null hypothesis is rejected when the test statistic falls within the rejection region; otherwise, we fail to reject it.

Statistical Power

  • Power is the test's ability to correctly identify an effect if it exists in the population.
  • The statistical power of a test is 1 − β (beta).
  • Beta is the probability that the study fails to detect an effect when one exists.
  • Increasing the sample size, the effect size, or the significance level increases the test's power (see the sketch below).
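
A sketch of a power calculation with statsmodels, assuming a two-sided independent-samples t-test and a hypothetical medium effect size (d = 0.5):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Sample size per group needed to detect d = 0.5
# with alpha = .05 and power = .80
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # roughly 64 per group
```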

Sample Size and Statistical Significance

  • The sample size affects the precision of estimates and the power of the test.
  • Larger samples yield more precise estimates (e.g., narrower confidence intervals) than smaller samples.
  • Varying sample sizes even with similar means and standard deviations across samples can give different confidence intervals due to sampling error.

Type I and II Error

  • Type I error occurs when the null hypothesis is rejected when it is actually true, i.e. false positive.
  • Type II error occurs when the null hypothesis is not rejected when it is actually false, i.e. false negative.
  • Significance level ( α ) controls the risk of making a Type I error.
  • Power (1 − β) is the complement of β, the risk of making a Type II error.

Significance Level (α, alpha)

  • The maximum risk taken to reject a true null hypothesis is called a Significance level.
  • It is the threshold for how unlikely a test statistic (or a more extreme one) must be under the null hypothesis before the null is rejected.
  • To determine whether a result is statistically significant, compare the result's p-value to the significance level, typically 0.05 or 0.01.

Test Statistic, Critical Value, and P-value

  • Test statistics (e.g., t-value, z-value) summarize the data and reflect how far the observed outcome departs from what the null hypothesis expects.
  • The p-value gives the likelihood of observing a test statistic at least as extreme as the observed statistic, if the null hypothesis were true.
  • Compare the test statistic to the critical value from the probability distribution to determine statistical significance.

Hypothesis Testing

  • It requires both null and alternative hypotheses.
  • The null hypothesis presumes no effect exists.
  • The probability (p-value) of observing the data, under the null hypothesis, is calculated.
  • If the p-value is less than the significance level, the null hypothesis is rejected in favour of the alternative hypothesis (see the worked example below).
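
A worked example of this procedure as a minimal sketch, using a one-sample t-test on simulated data (the hypothesized mean of 5.0 and the data are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.4, scale=1.2, size=40)  # hypothetical observations

# H0: population mean = 5.0; H1: population mean != 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```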

Levels of Measurement

  • Nominal variables represent categories (e.g., gender, religion).
  • Ordinal variables have a natural order (e.g., education level, customer satisfaction).
  • Interval variables have equal intervals but no absolute zero point (e.g., temperature in Celsius).
  • Ratio variables have equal intervals and an absolute zero point (e.g., weight, height).

Variables and constructs

  • Variables measure attributes of the real world.
  • Constructs combine multiple items to capture complex phenomena (e.g., consumer innovativeness).
  • Operationalization translates a construct into measurable items.

Form new variables - constructs

  • Create composite variables to join multiple indicators of a construct into a single measure (e.g., perceived brand localness).
  • Use consistent scales when measuring factors to build composite variables.
  • Statistically assess the reliability of the new composite variable.

Measurement error

  • Observed score = true score + error
  • Error can be systematic or random.
  • Validity assesses capturing what is meant to be captured.
  • Reliability assesses the absence of random error.

The dartboard analogy

  • Reliable measures hit the same spot on the target consistently (a tight cluster).
  • Valid measures hit the intended area of the target (the bullseye).
  • Ideally, measures should be both reliable and valid.

The Mode

  • The mode is the most frequently occurring value in a dataset.
  • It's useful for nominal and ordinal variables.
  • The mode is not sensitive to extreme values (outliers).
  • It does not provide information about variability (dispersion)

The Median

  • The median is the middle score in an ordered dataset, splitting it into two equal halves.
  • It is useful for both ordinal and scale variables.
  • It's resistant to extreme values (outliers).

The Mean

  • The mean is the average value in a dataset, determined by the sum of scores divided by the number of scores.
  • It's suitable for continuous (numerical) variables.
  • Sensitive to extreme values (outliers).

Range

  • The range is the difference between the maximum and minimum values in a dataset.
  • Sensitive to extreme values.

Deviance

  • Deviance is the distance of a single score from the mean.
  • Total deviance aggregates these distances across all data points (and can be expressed as an average).

Variance

  • Variance measures the spread of values around the mean.
  • It's calculated by squaring the difference of each value from the mean and summing those squared deviations.
  • Dividing the sum of squared errors by the degrees of freedom (n − 1) yields the variance and makes samples of different sizes comparable.

Standard Deviation

  • The standard deviation is the square root of the variance.
  • It represents the typical (average) distance of observations from the mean.
  • Variability, error, and homogeneity/heterogeneity in ratings are reflected by the variance and standard deviation (see the sketch below).
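
A minimal sketch tying deviance, variance, and standard deviation together (the ratings are invented; ddof=1 applies the degrees-of-freedom correction):

```python
import numpy as np

ratings = np.array([3, 5, 4, 2, 5, 4, 3, 4])
deviations = ratings - ratings.mean()      # deviance of each score
ss = (deviations ** 2).sum()               # sum of squared deviations
variance = ss / (len(ratings) - 1)         # divide by degrees of freedom
sd = np.sqrt(variance)                     # standard deviation

assert np.isclose(variance, ratings.var(ddof=1))
assert np.isclose(sd, ratings.std(ddof=1))
print(variance, sd)
```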

Data frequencies

  • Frequency tables present the frequency of each value or category in a dataset.
  • Useful for summarizing variables' distributions for nominal, ordinal or metric data.
  • Frequency distributions are often displayed for visualization.

Frequency Tables

  • Frequency tables summarize how frequently each value or category appears in the data.
  • Used to show distributions for nominal, ordinal, and continuous data.
  • Useful in showing associations between variables.

Bar Charts (Categorical Data) and Histograms

  • Used to represent frequency distributions graphically.
  • Bar charts show the frequencies of discrete groups or categories.
  • Histograms visually display the frequency distribution of continuous data.

Grouped Frequency Distribution

  • The grouped frequency distribution provides a summary of values of a particular variable categorized by their frequency.
  • Visualizations commonly show this frequency in bar charts or pie charts.

Histograms

  • Histograms visually present the frequency distribution of a continuous variable.
  • They show the number of observations within specified intervals.

Scatterplots

  • Scatterplots display bivariate relationships graphically.
  • Each point represents pairs of values for two variables.
  • Direction & dispersion of the variables (correlation).

Assumptions

  • Parametric tests assume certain characteristics about the population of the relationship being analyzed, including data distribution, independence and equality of variance.
  • Non-parametric tests make far fewer assumptions about the data.
  • Data transformations or tests of normality can be used if needed.

Linearity and Additivity

  • The relationship between the predictors and the outcome should be linear (a linear function).
  • The aggregate effect of several predictors can be considered as the sum of their individual effects.

Independence

  • Observations/data must be independent and come from the same population.
  • Residual errors (differences between the predicted and observed values) must be independent of each other.
  • Correlation between residuals should be minimal.

Normality

  • In certain tests, observations are assumed to be normally distributed in the population of data.
  • The degree to which the data deviate from a normal distribution matters for the validity of such tests.
  • Measures such as skewness and kurtosis can show how data deviates from normality.

Skewness

  • Skewness describes the asymmetry of a distribution.
  • Positive skewness indicates a longer tail on the right side.
  • Negative skewness indicates a longer tail on the left side.

Kurtosis

  • Kurtosis measures the "peakedness."
  • Leptokurtic distributions (higher than typical) are peaked and have heavy tails.
  • Platykurtic distributions (lower than typical) are flat and have light tails.

P-P Plot/Q-Q Plot

  • Used to assess if data is normally distributed.
  • Points should fall along the diagonal line for a normal distribution.

Different Q-Q Plots

  • Plots showing data falling away from a normal diagonal line show deviation from a normal distribution.

Skewness & Kurtosis

  • The skewness and kurtosis measures can be used to quantitatively describe the distribution deviation from a normal distribution.
  • These values are often displayed relative to their standard error to see if there is significant deviation from a normal distribution.

Test for normality

  • Kolmogorov-Smirnov and Shapiro-Wilk tests determine if a sample came from a normally distributed population.
  • A non-significant result suggests a normal distribution.
  • Significant results indicate deviations from normality (see the sketch below).
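
A sketch computing skewness, excess kurtosis, and a Shapiro-Wilk test on deliberately right-skewed simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)  # right-skewed by construction

print("skewness:", stats.skew(x))          # > 0: longer right tail
print("excess kurtosis:", stats.kurtosis(x))

w, p = stats.shapiro(x)                    # Shapiro-Wilk normality test
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.4f}")
# A significant p (< .05) indicates deviation from normality
```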

Central Limit Theorem

  • When the sample size (n) is large (greater than 30), the distribution of sample means tends towards a normal distribution, whatever the shape of the population distribution itself (see the simulation below).
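
A quick simulation of the theorem, drawing sample means from a clearly non-normal (exponential) population:

```python
import numpy as np

rng = np.random.default_rng(2)
skewed_pop = rng.exponential(scale=1.0, size=100_000)  # non-normal population

# Means of 5,000 samples of size n = 50
sample_means = np.array([rng.choice(skewed_pop, 50).mean() for _ in range(5_000)])

# The means cluster symmetrically around the population mean (about 1.0),
# approximating a normal distribution despite the skewed population
print(sample_means.mean(), sample_means.std())
```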

Homogeneity of Variance or Homoscedasticity

  • Levene's test is used to assess if the variance of data between groups is similar.
  • A non-significant result indicates that variances for the different groups are similar.
  • Plotting residuals against predicted values helps to visualize homogeneity (see the sketch below).
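
A minimal sketch of Levene's test for two hypothetical groups:

```python
from scipy import stats

group_a = [23, 25, 21, 30, 27, 24, 26]  # hypothetical scores
group_b = [31, 29, 35, 28, 33, 30, 32]

stat, p = stats.levene(group_a, group_b)
print(f"Levene: W = {stat:.3f}, p = {p:.3f}")
# p > .05: no evidence that the group variances differ
```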

Comparing independent samples/groups

  • Independent t-test compares means from two sample groups, assessing if their parameters are significantly different.
  • One-way ANOVA compares means across three or more sample groups assessing if their parameters are significantly different.

Comparing independent samples/groups (Chi-Square test)

  • A chi-square test evaluates if a significant association exists between two categorical or nominal variables.
  • Frequencies/counts for each variable/group combination are scrutinized regarding the hypothesis of no association (no effect).
  • Cramer's V calculates the strength of the relationship for larger tables.

Contingency Table (Observed - Expected)

  • A contingency table displays observed and expected frequencies for relationships between categorical variables.
  • Chi-square tests use these differences to gauge a significant association between variables.

Strength of Association & Odds

  • Phi measures the strength of an association between two binary variables.
  • Cramer's V assesses the strength of association between nominal variables (larger than 2 x 2 tables).
  • Odds ratios quantify how much more likely an outcome is in one group than in another (typically computed from a 2 × 2 table; see the sketch below).
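
A sketch of a chi-square test of association with Cramer's V, on a hypothetical 2 × 3 contingency table:

```python
import numpy as np
from scipy import stats

# Rows = gender, columns = brand choice (invented counts)
observed = np.array([[30, 20, 10],
                     [15, 25, 20]])

chi2, p, dof, expected = stats.chi2_contingency(observed)

n = observed.sum()
k = min(observed.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))  # strength of association
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, Cramer's V = {cramers_v:.2f}")
```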

Output of Chi-square Test

  • The chi-square output displays a significant result if there is evidence of an association between the variables.
  • Results typically include the chi-square value, degrees of freedom, and p-value.

Incomplete Gamma Function

  • The incomplete gamma function describes probabilities with different degrees of freedom.
  • Usually displayed for visualizing the probability of values in specific areas of the probability distribution.

Comparing Independent Samples, Independent-Sample t-Test

  • Independent t-tests determine whether two different/independent groups/samples differ in their means for specific variable (interval/ratio scale).
  • The test compares different/independent samples on the same outcome variable.

Comparing Independent Samples, Independent-Sample t-Tests (Example Research Questions)

  • Research questions regarding significant differences in group means (different/independent groups) for variables including (but not limited to) spending, attitudes, and preferences.

Independent Sample t-Test : Assumptions

  • Independent observations: the two groups contain different participants, and observations do not influence one another.
  • Normal distribution: the scores in each group (or the sampling distribution of the difference in means) should be approximately normally distributed.
  • Homogeneity of variances: the variance must be similar between groups/samples before proceeding with the analysis.

Independent Sample t-Test : Rationale

  • The null hypothesis suggests that population means for both samples do not differ.
  • The alternative hypothesis states that population means differ between samples.

Independent Sample t-Test : Formula

  • The t-test formula calculates the difference between sample means divided by the standard error of the difference.
  • The pooled variance estimate weights each group's variance by its sample size and assumes the population variances are equal (see the sketch below).
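
A minimal sketch of an independent-samples t-test on hypothetical spending data (equal_var=True requests the pooled-variance version):

```python
from scipy import stats

spend_a = [42, 55, 48, 60, 51, 47, 53]  # hypothetical spending, group A
spend_b = [38, 41, 45, 36, 44, 40, 39]  # hypothetical spending, group B

t, p = stats.ttest_ind(spend_a, spend_b, equal_var=True)
print(f"t = {t:.2f}, p = {p:.4f}")
# p <= .05 would indicate a significant difference in group means
```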

Output of an Independent Sample t-Test

  • The output for independent t-tests provides the characteristics of the tests, p-values, mean differences and other related data.
  • A p-value less than or equal to .05 means that there is a significant difference between the two groups/samples on the variable being assessed.

How to calculate the effect size

  • Effect sizes provide a measure of difference between groups.
  • Effect sizes, like r for correlations or d for t-tests, reflect the magnitude of an effect independently of its statistical significance (see the sketch below).
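
A sketch converting a t-statistic to the effect size r via r = √(t² / (t² + df)); the t and df values here are illustrative:

```python
import numpy as np

def r_from_t(t: float, df: int) -> float:
    """Convert a t-statistic and its degrees of freedom to the effect size r."""
    return np.sqrt(t**2 / (t**2 + df))

print(r_from_t(t=3.2, df=28))  # about 0.52, a large effect by common benchmarks
```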

One-Sample t-Test

  • One-sample t-tests compare a sample mean against a hypothesized population mean.
  • Assesses if a sample mean is significantly different from an expected value.

Example research questions for one-sample t-tests

  • Research questions regarding how a specific group/sample compares (significantly) to an expected data value.

Comparing the Same Sample

  • Paired samples t-tests evaluate if the same sample differs significantly across various measures/variables. Assesses if an effect exists.

Paired-Samples t-test : Assumptions

  • Related observations: The data are from the same/related individuals/subjects
  • Normal distribution of the difference: The distribution of differences between observations should approximate a normal distribution.

Paired-Samples t-test : Rationale

  • Null hypothesis: the population means for the paired/related observations do not differ.
  • Alternative hypothesis: the population means for the paired/related observations differ.

Paired-Samples t-test : Formula

  • The paired-samples t-test divides the mean of the pairwise differences by the standard error of those differences to gauge whether the effect is significant (see the sketch below).
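
A minimal sketch of a paired-samples t-test on hypothetical pre/post scores; it is equivalent to a one-sample t-test on the pairwise differences:

```python
from scipy import stats

before = [6.1, 5.8, 7.0, 6.4, 5.9, 6.6]  # hypothetical pre-test scores
after = [6.8, 6.2, 7.4, 6.9, 6.1, 7.0]   # same subjects, post-test

t, p = stats.ttest_rel(before, after)
print(f"t = {t:.2f}, p = {p:.4f}")
```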

Output of a Paired-Samples t-Test

  • The output for a paired samples t-test provides the characteristics of the test, relevant p-values, mean difference and other related data.
  • A p-value less than or equal to .05 shows a statistically significant difference within the pairs when comparing the two paired variables.

How to calculate the effect size

  • Effect sizes quantify the impact of one or more predictors on the outcome.

Example research questions for one-sample t-tests

  • Questions that involve a specific value against one group/sample are suitable for one-sample t tests.

One-Sample t-test

  • A one-sample t test assesses whether a sample mean differs significantly from an established/hypothesized value.

What is compared?

  • Understanding whether researchers compare different (multiple) samples or the same sample across categories/variables is crucial in identifying the statistical test.

Comparing Multiple Independent Groups

  • Multiple independent groups provide different groups for analysis across diverse variables or categories.
  • A one-way ANOVA analyzes if means differ significantly among these multiple independent groups for one particular variable.

Example research questions for one-way ANOVA

  • Appropriate for evaluating potential differences between three or more groups with a particular characteristic.

One-way ANOVA (Analysis of Variance) - Assumptions

  • Independent measures (groups): Scores for each participant/group do not affect/influence each other.
  • Independent and normally distributed errors (residuals): the deviations/errors for the outcome are independent of each other and normally distributed (bell-shaped curve).
  • Homogeneity of variances (across groups/samples): The variance across groups/samples should be similar/equal.

One-way ANOVA (Analysis of Variance) - Process

  • Step 1: To find overall differences between groups.
  • Step 2: To find specific pair(s) that show differences between groups.
  • F-test is used to find the overall difference among group means.
  • Post-hoc tests determine which particular group means differ significantly from each other.

Theory of ANOVA

  • Decomposes overall variability in data into different portions, attributed to identified variables.
  • ANOVA is based on a partitioning of total data variability among the different parts of the model/analysis.

Theory of ANOVA (example output)

  • The output shows results from a one way ANOVA.
  • The output includes an ANOVA table with F statistics and p values, including information on various sources of variability.

One-way ANOVA applied

  • Example: a promotional strategy study with three groups: control, 1+1, and 50% off.

Distribution of data

  • A visual representation of the distribution of data for a particular variable or groups.
  • Mean scores for various groups and for the whole sample are shown graphically.

Total Sum of Squares (SST)

  • The total variability across all data points.

Model Sum of Squares (SSM)

  • Variability attributed/explained by the predictor (factor) in the data.

Residual Sum of Squares (SSR)

  • Variability that the predictor (factor) was unable to explain in the data.

F-Ratio

  • The F-ratio divides the explained variance (the model mean square, SSM/dfM) by the unexplained variance (the residual mean square, SSR/dfR). A high ratio is evidence that the tested variable has an influence/effect (see the worked example below).
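
A worked sketch of the variance decomposition behind one-way ANOVA, checked against scipy's F-test (the three groups echo the promotion example, but the numbers are invented):

```python
import numpy as np
from scipy import stats

control = np.array([10, 12, 11, 13, 9])
one_plus_one = np.array([15, 17, 16, 14, 18])
half_off = np.array([13, 15, 14, 16, 12])

groups = [control, one_plus_one, half_off]
all_data = np.concatenate(groups)
grand_mean = all_data.mean()

sst = ((all_data - grand_mean) ** 2).sum()                        # total
ssm = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # model
ssr = sum(((g - g.mean()) ** 2).sum() for g in groups)            # residual
assert np.isclose(sst, ssm + ssr)

df_m, df_r = len(groups) - 1, len(all_data) - len(groups)
f_manual = (ssm / df_m) / (ssr / df_r)  # F = MSM / MSR
f_scipy, p = stats.f_oneway(*groups)
assert np.isclose(f_manual, f_scipy)
print(f"F = {f_manual:.2f}, p = {p:.4f}")
```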

Post-hoc comparisons & planned contrasts

  • Pairwise comparisons between all group means to determine which groups differ significantly from each other.

Example output of one-way ANOVA

  • This includes results from a one-way ANOVA.
  • The output shows F-statistics, P-values, adjusted estimations for groups, etc.

ANCOVA: Covariates in the ANOVA model

  • ANCOVA adds covariates to the ANOVA model to control/account for confounding variables.
  • It adjusts or accounts for pre-existing differences between groups by including identified pre-existing differences as covariates.
  • It removes some of the unexplained variance of the data, thus, contributing to a better regression model.

ANCOVA: adjusted and unadjusted means

  • Unadjusted means reflect different groups with potentially different values.
  • Adjusted means reflect the average value for a particular variable for each group if all covariates values were equal across groups.
  • Adjusted means provide a more accurate/representative picture in the presence of confounding variables.

Factorial ANOVA

  • A test that considers multiple predictors/factors (independent/categorical variables) and their interactions (combined effect) in relation to an outcome variable.
  • Assesses each factor's independent influence on the outcome as well as their combined (interaction) effects.

Two-Way ANOVA: Main & Interaction Effects

  • Assess effects of two predictors (or factors) in relation to the outcome variable, and whether these factors have an effect on each other.

Two-Way ANOVA Applied

  • Example: participants exposed to three music conditions (classical, pop, and rock) crossed with the two levels of participant gender (male and female).

Regression Analysis

  • Regression analysis helps in predicting an outcome variable from one or more predictor/independent variables.
  • Linear equation modeling is a key element of regression methods.
  • Various regression methods (OLS, etc.) can be used.

Simple Linear Regression : Applied Example

  • An example in which album sales are predicted from advertisement expenditure.

Example Output of Simple Linear Regression

  • A description of an output for a simple linear regression model.

Assessing Predictors

  • Coefficients representing the strength and direction of the predictor/independent variable influence on the output/dependent variable.
  • Provides a statistical measure of the influence of the predictor on the outcome variable.

A Cautionary Note: Outliers, Residuals, and Influential Cases

  • Outliers are observations that deviate substantially from the general trend.
  • Residuals denote the differences between observed and predicted values.
  • Influential cases are extreme observations that unduly impact model parameters.

A Cautionary Note: Validity and Generalizability

  • Validity refers to the extent to which a model accurately captures the relationship in the observed data.
  • Generalizability assesses the extent to which a model applies beyond the specific sample data.
  • Assumptions regarding data and the regression model are important to establish validity and generalizability.

Multiple Regression : Assumptions

  • Regression Assumptions (PoC): Linearity, continuous outcome, non-zero predictors, independent observations.
  • Regression Assumptions (PiA): Absence of multicollinearity; independent/errors, homogeneity of variances, normality.

Multicollinearity: Collinearity Diagnostics

  • Multicollinearity happens when two or more predictors are highly correlated.
  • The variance inflation factor (VIF) quantifies how much the variance of an estimated coefficient is inflated by multicollinearity; tolerance (its reciprocal) measures the proportion of a predictor's variance not explained by the other predictors.
  • As a quick check, inspect pairwise correlation coefficients among predictors (r > .70 is a warning sign).

Independent Errors (Autocorrelation)

  • Autocorrelation occurs when the residuals are correlated with each other.
  • The Durbin-Watson test checks for serial correlation between errors; values near 2 indicate no autocorrelation (see the sketch below).
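
A sketch that fits an OLS model on simulated data, then checks multicollinearity (VIF) and autocorrelation (Durbin-Watson) with statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)  # deliberately correlated with x1
y = 2 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

# VIF per predictor (skipping the constant); values near or above 10 are a concern
for i in (1, 2):
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.2f}")

# Durbin-Watson near 2 indicates no autocorrelation in the residuals
print(f"Durbin-Watson: {durbin_watson(model.resid):.2f}")
```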

Homoscedasticity and Normality of Residuals (Errors)

  • The variance of errors/residuals should be similar across levels of predictors or there should be no discernible patterns across plotted residuals.
  • Graphs (scatterplots, etc.) show whether the variances are similar across predictors and whether errors are normally distributed.

Predicted (Fitted) Values, Observed Values, and Residuals

  • Plots can reveal deviations from assumptions like homoscedasticity and normality.
  • Predicted values can be compared to observed values to see how well the regression model fits the data.

Serial Correlation (Autocorrelation)

  • Serial correlation means the errors or residuals (differences between predicted and actual values) are correlated over time.
  • Patterns are visible in plots of residuals (as opposed to a random scatter around zero) to determine if errors are random (uncorrelated).

Homoscedasticity: ZRESID Against ZPRED

  • Scatterplots of standardized residuals versus standardized predicted values (ZRESID vs. ZPRED): similar dispersion.
  • Homoscedasticity exists when the scatterplot shows similar dispersion across the range of predicted values; a funnel shape or other systematic pattern suggests the assumption is violated.

Normality of Residuals: Histogram

  • Histograms visually inspect the data distribution.
  • Data should conform to a bell-shaped curve for normal data.

Normality of Residuals: P-P Plot

  • P-P plots compare the cumulative proportions of the data to the cumulative proportions of a normal distribution.
  • The closer the points are to a straight line, the more the data approximates a normal distribution.

Assessing the Model: Fit & Bias diagnostics

  • Standardized residuals should, ideally, fall within ±2 (or ±2.5) standard deviations.
  • Outliers with absolute standardized residuals greater than about 2.5 or 3 are potential issues.
  • Cases' influence should be evaluated (e.g., Cook's distance, Mahalanobis distance) to check whether individual cases destabilize the model relative to the full sample.

Multiple Regression in SPSS: Model Interpretation

  • The output provides R, R², and adjusted R², along with an ANOVA table, to help interpret the model.
  • Assess the significance of predictors in the model via the relevant p-values.
  • Betas can be used to interpret each predictor's impact on the outcome, adjusting for the other predictors' effects.

Multiple Regression: A Roadmap

  • A systematic/generalized guide for multiple regression analysis.

Multiple Regression: Applied Example

  • An example showing how a model can predict employees' performance from their characteristics.

Interactions Between Predictors in Regression

  • Interaction effects are analyzed by including product terms of the predictors in the regression model.
  • When an interaction is present, one predictor's effect on the outcome depends on the level of another predictor.
  • The model (e.g., its regression coefficients) should assess the combined effect of the interacting variables.

Interactions in Regression: A note...

  • Mean-centering is often used in regressions with interaction effects to reduce multicollinearity issues, adjusting for the effects of different predictors/factors/covariates.
  • It aids in understanding how the predictor's effects differ based on levels of other variables.
  • Mean centering does so by subtracting the variable's mean from each individual observation (see the sketch below).
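
A minimal sketch of mean-centering predictors before forming an interaction term (all data simulated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.normal(5, 2, n)  # predictor
z = rng.normal(3, 1, n)  # moderator
y = 1 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(size=n)

xc, zc = x - x.mean(), z - z.mean()  # mean-centre both predictors
X = sm.add_constant(np.column_stack([xc, zc, xc * zc]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # main effects are now interpretable at average x and z
```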

Interactions in Regression : Output

  • This provides the statistical results from a regression model with interaction effects. Includes factors (variables) and their respective p-values and effect sizes.

Regression Interpretation: One Step Further...

  • Methods to explore interaction effects and model improvement.

Spotlight Analysis

  • Analyses interactions by exploring the effect of one predictor at various levels of another.
  • Helps in understanding how the effect changes as another variable changes.

Floodlight Analysis

  • Method evaluating the effect of one predictor across the full range of another variable.

Predictors in interaction analysis with regression

  • Regression analysis allows examining the effect of multiple predictors and their interactions on the outcome.
  • Different types of variables (continuous/discrete) with their corresponding levels contribute to the interaction between variables.

Covariates: Controlling for "Other" Variables

  • Covariates are typically used to control for other factors that could affect the relationship between the primary variables of interest.

Why is regression so cool?!

  • Researchers can use regression to explore multiple variables and their interactions.

Logistic Regression: Objective

  • Predict the probability of categorical/binary outcome variables (e.g., yes/no).

Logistic Regression: Example Research Questions

  • Appropriate questions that deal with predicting different categorical outcomes such as customer churn, decision or similar outcomes.

Binary Logistic Regression: Functional form

  • The functional form of binary logistic regression models non-linear relationships between predictors and the outcome variable using the logit function (see the sketch below).
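
A sketch of a binary logistic regression with statsmodels on simulated brand-switching data; the logistic (inverse logit) function maps the linear predictor to a probability:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
satisfaction = rng.normal(4, 1, n)       # hypothetical predictor
logit = 3 - 0.8 * satisfaction           # linear on the log-odds scale
p_switch = 1 / (1 + np.exp(-logit))      # logistic function
switched = rng.binomial(1, p_switch)     # binary outcome

X = sm.add_constant(satisfaction)
model = sm.Logit(switched, X).fit(disp=False)
print(model.params)          # coefficients on the log-odds scale
print(np.exp(model.params))  # odds ratios
```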

Binary Logistic Regression: Example Output

  • Example of a logistic regression output providing relevant model summaries and related statistics.

Binary Logistic Regression: Assumptions and “Quality Standards”

  • Logistic regression assumes a binary outcome, mutually exclusive (disjoint) outcome categories, and a large number of observations per predictor.

Logistic Regression: An Example

  • An example uses logistic regression to check for predictors of switching to a competing brand/product involving brand loyalty, consumer satisfaction etc.

Evaluating Logistic Regression Models

  • Model evaluation draws on several statistics, including (but not limited to) Cox & Snell R², Nagelkerke R², and the Hosmer-Lemeshow test.


Related Documents

Statistical Inference PDF

Description

This quiz covers the fundamentals of statistical inference and probability distributions. You'll explore how sample data helps infer population characteristics and understand different types of probability distributions, including normal and standard normal distribution. Test your knowledge on how these concepts are applied in statistics.
