Performing T-tests: Hypothesis Testing in R - DataCamp PDF

Performing t-tests
HYPOTHESIS TESTING IN R
Richie Cotton, Data Evangelist at DataCamp

Two-sample problems

Another problem is to compare sample statistics across groups of a variable.
converted_comp is a numerical variable.
age_first_code_cut is a categorical variable with levels ("child" and "adult").
Do users who first programmed as a child tend to be compensated higher than those that started as adults?

Hypotheses

H0: The mean compensation (in USD) is the same for those that coded first as a child and those that coded first as an adult.

$$H_0: \mu_{\text{child}} = \mu_{\text{adult}}$$
$$H_0: \mu_{\text{child}} - \mu_{\text{adult}} = 0$$

HA: The mean compensation (in USD) is greater for those that coded first as a child compared to those that coded first as an adult.

$$H_A: \mu_{\text{child}} > \mu_{\text{adult}}$$
$$H_A: \mu_{\text{child}} - \mu_{\text{adult}} > 0$$

Calculating groupwise summary statistics

    stack_overflow %>%
      group_by(age_first_code_cut) %>%
      summarize(mean_compensation = mean(converted_comp))

    # A tibble: 2 x 2
      age_first_code_cut mean_compensation
    1 adult                        111544.
    2 child                        138275.

Test statistics

Sample mean estimates the population mean.
$\bar{x}$ denotes a sample mean.
$\bar{x}_{\text{child}}$ is the original sample mean compensation for coding first as a child.
$\bar{x}_{\text{adult}}$ is the original sample mean compensation for coding first as an adult.
$\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}$ is a test statistic.
z-scores are one type of (standardized) test statistic.

Standardizing the test statistic

$$z = \frac{\text{sample stat} - \text{population parameter}}{\text{standard error}}$$

$$t = \frac{\text{difference in sample stats} - \text{difference in population parameters}}{\text{standard error}}$$

$$t = \frac{(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}) - (\mu_{\text{child}} - \mu_{\text{adult}})}{SE(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}})}$$

Standard error

$$SE(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}) \approx \sqrt{\frac{s_{\text{child}}^2}{n_{\text{child}}} + \frac{s_{\text{adult}}^2}{n_{\text{adult}}}}$$

s is the standard deviation of the variable.
n is the sample size (number of observations/rows in sample).
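As a minimal sketch of the standard error approximation above, the base R snippet below computes it for two numeric vectors. The helper name se_diff_means and the toy vectors are assumptions made for illustration; they are not from the slides and are not the Stack Overflow data.

```r
# Hypothetical helper (not from the slides): approximate standard error of
# the difference between two sample means, sqrt(s_x^2 / n_x + s_y^2 / n_y).
se_diff_means <- function(x, y) {
  # var() returns the sample variance, i.e. the squared standard deviation
  sqrt(var(x) / length(x) + var(y) / length(y))
}

# Made-up toy samples, for illustration only
toy_child <- c(1, 2, 3, 4, 5)
toy_adult <- c(2, 4, 6, 8)

se <- se_diff_means(toy_child, toy_adult)
print(se)
```

Because var(x) equals sd(x)^2, this matches the formula's use of the squared standard deviation s divided by the sample size n for each group.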
Assuming the null hypothesis is true

$$t = \frac{(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}) - (\mu_{\text{child}} - \mu_{\text{adult}})}{SE(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}})}$$

$$H_0: \mu_{\text{child}} - \mu_{\text{adult}} = 0$$

$$t = \frac{\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}}{SE(\bar{x}_{\text{child}} - \bar{x}_{\text{adult}})}$$

$$t = \frac{\bar{x}_{\text{child}} - \bar{x}_{\text{adult}}}{\sqrt{\frac{s_{\text{child}}^2}{n_{\text{child}}} + \frac{s_{\text{adult}}^2}{n_{\text{adult}}}}}$$

    stack_overflow %>%
      group_by(age_first_code_cut) %>%
      summarize(
        xbar = mean(converted_comp),
        s = sd(converted_comp),
        n = n()
      )

    # A tibble: 2 x 4
      age_first_code_cut    xbar       s     n
    1 adult              111544. 270381.  1579
    2 child              138275. 278130.  1001

Calculating the test statistic

    # A tibble: 2 x 4

numerator
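A minimal sketch of the test statistic under H0, plugging the rounded groupwise values printed in the summary table above into the formula. The variable names are illustrative assumptions, not taken from the slides, and because the inputs are rounded the result is only approximate.

```r
# Rounded summary values as printed in the groupwise table above
xbar_child <- 138275
xbar_adult <- 111544
s_child <- 278130
s_adult <- 270381
n_child <- 1001
n_adult <- 1579

# Under H0 the hypothesized difference in population means is 0,
# so the numerator is just the difference in sample means.
numerator <- xbar_child - xbar_adult

# Approximate standard error of the difference in sample means
denominator <- sqrt(s_child^2 / n_child + s_adult^2 / n_adult)

t_stat <- numerator / denominator
print(t_stat)
```

With these rounded inputs, t_stat comes out at roughly 2.4.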
