Hypothesis Testing Notes PDF
Document Details
Tags
Summary
These notes cover hypothesis testing concepts. They detail null and alternative hypotheses, Type I and Type II errors, and discuss sampling errors. The notes also touch on significance levels for hypothesis tests.
Full Transcript
# Hypothesis Testing

· H0: null hypothesis
  ↳ status quo will hold
  ↳ typically equates the quantity of interest to 0
· H1: alternative hypothesis
  ↳ status quo would not hold
  ↳ covers outcomes outside of H0
· sampling errors
  ↳ Type I error: reject H0 when H0 is true
  ↳ Type II error: accept H0 when H0 is false
  ↳ a decrease in the significance level = a stricter test
· rejection regions
  ↳ one-tailed, lower tail: H0: parameter ≥ constant; H1: parameter < constant; rejection region below the critical value
  ↳ one-tailed, upper tail: H0: parameter ≤ constant; H1: parameter > constant; rejection region above the critical value
  ↳ two-tailed: H0: parameter = constant; H1: parameter ≠ constant; critical values at α/2 in each tail
· reject H0
  ↳ if the p-value is < 0.05 at the 5% level of significance, there is sufficient evidence to reject the null hypothesis H0
· fail to reject H0
  ↳ if the p-value is > 0.05 at the 5% level of significance, there is insufficient evidence to reject H0

# Linear Models

· linear model: Y = β0 + β1X + ε
  ↳ Y: dependent variable
  ↳ X: covariate
  ↳ β0: intercept
  ↳ β1: slope
  ↳ ε: error term (anything that can affect Y other than X)
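As a concrete illustration of the rejection rule above, here is a minimal two-tailed z-test sketch (hypothetical numbers; assumes the population standard deviation is known, so a z-test rather than a t-test applies):

```python
from statistics import NormalDist

# Hypothetical example: test H0: mu = mu0 against H1: mu != mu0 (two-tailed)
# using a z-test, assuming the population standard deviation sigma is known.
def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject) for H0: mu = mu0."""
    se = sigma / n ** 0.5                   # standard error of the sample mean
    z = (sample_mean - mu0) / se            # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return z, p, p < alpha                  # reject H0 iff p < alpha

z, p, reject = two_tailed_z_test(sample_mean=0.8, mu0=0.0, sigma=2.0, n=100)
print(round(z, 2), round(p, 4), reject)
```

With these made-up inputs, z = 0.8 / (2/√100) = 4.0, so the p-value is far below 0.05 and H0 is rejected.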
· if ε ~ N(0, σ²), then E(ε) = 0
  ↳ if the error is normally distributed, its expected value is 0
· Y = β0 + β1X + ε with E(ε) = 0 gives E[Y|X = x] = β0 + β1x
  ↳ expected value of Y for a given X
  ↳ β0: mean value of Y when X = 0
· statistical inference
  ↳ theory, methods and practice of forming judgements about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling
· estimator
  ↳ a rule, method or criterion for arriving at an estimate of the value of a parameter
  ↳ estimators are functions of randomly sampled data, so they are also random variables and have a probability distribution
· central limit theorem (CLT)
  ↳ the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population distribution
  ↳ the average and standard deviation of the sample means will approximate the population mean and standard deviation
  ↳ as sample size ↑, the distribution of sample means becomes tighter
  ↳ a sample size ≥ 30 is usually sufficient for the CLT to hold
· population parameter
  ↳ often unknown
  ↳ sampling can give a rough idea: a confidence interval estimates a range of plausible values
  ↳ each random sample has its own confidence interval, so the interval is also a random variable
· confidence interval
  ↳ an x% confidence interval for the population mean is a range of values that contains the population mean in x% of repeated samples
  ↳ interval = sample statistic ± (z-score × st. deviation of the sample statistic)
  ↳ population st. deviation known: z-score (z-test)
  ↳ population st. deviation unknown: t(n−1) score (t-test)
· using the CLT, for a given random sample of size n
  ↳ mean of the sample means ≈ population mean
  ↳ st. deviation of the sample means: σ/√n
  ↳ the z-score increases with the desired confidence level
  ↳ as sample size ↑, the st. deviation of the sample mean ↓
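The confidence-interval recipe above (sample statistic ± z × standard error) can be sketched as follows; the sample data are made up, and the z-based interval assumes the sample is large enough for the CLT to apply (otherwise the t(n−1) critical value should be used):

```python
from statistics import NormalDist, mean, stdev

# Sketch: z-based confidence interval for a population mean,
# assuming the CLT applies (large-sample approximation).
def confidence_interval(sample, confidence=0.95):
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / n ** 0.5                    # estimated SE of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # critical z-score (1.96 for 95%)
    return xbar - z * se, xbar + z * se

sample = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]
lo, hi = confidence_interval(sample)
print(round(lo, 3), round(hi, 3))
```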
  ↳ tighter distribution of sample means → tighter confidence interval windows → more accurate estimates
· minimising squared error
  ↳ sum of squared errors: Σx∈X (M − x)²
  ↳ the mean minimises the mean squared error of a 1-d sample
  ↳ population mean: μ; sample mean: M
  ↳ the sample mean M is an estimator of μ
· finding the standard error of the sample mean
  ① collect many samples with n observations each
  ② compute M for all samples
  ③ SE of mean = σ/√n if the population st. deviation σ is known
  ④ SE of mean = s/√n if the population st. deviation is unknown (s = sample standard deviation)
· ordinary least squares (OLS)
  ↳ for a 2-d sample, find the regression line that minimises the squared error
  ↳ population intercept and slope: β0 and β1
  ↳ OLS intercept and slope estimated using the sample: β̂0 and β̂1
  ↳ β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²
  ↳ β̂0 = Ȳ − β̂1 X̄
  ↳ by the CLT, β̂0 and β̂1 are normally distributed around β0 and β1
  ↳ var(β̂1) = (1/n) · var[(Xi − μX) ui] / [var(Xi)]²
  ↳ as sample size ↑, the variance ↓
· regression output in R
  ↳ OLS results are not reliable if extrapolated outside the bounds of the input data
  ↳ reports standard errors of β̂0 and β̂1
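The OLS formulas above can be sketched from scratch on made-up data (the function name and data are hypothetical, chosen for illustration):

```python
# Minimal OLS sketch for the simple regression Y = b0 + b1*X, using the
# textbook formulas: b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2).
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # slope estimate
    b0 = ybar - b1 * xbar   # intercept estimate
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical data, roughly y = 2x
b0, b1 = ols(x, y)
print(round(b0, 3), round(b1, 3))
```

For this data the slope works out to 1.99 and the intercept to 0.05, close to the generating relation.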
· standard errors can be used for hypothesis testing
  ↳ H0 typically sets the population parameter = 0
  ↳ the population standard deviation is unknown, so do a t-test and compute the p-value
· expectation of ε
  ↳ Y = β0 + β1X + ε; if the sum of squared errors is minimised, E(ε) = 0

# Internal Validity

· internal validity
  ↳ a statistical analysis has internal validity if statistical inferences about causal effects are valid for the population being studied
  ↳ TL;DR: the extent to which observed results = truth in the population being studied
  ↳ if NOT internally valid, cannot use results to make causal claims
  ↳ must be considered during model specification
· threats to internal validity
  ① omitted confounding variables
  ② simultaneous causality (X causes Y and Y causes X)
  ③ errors in variables (X measured with an error)
  ④ problems with sample selection (observations not selected through random sampling)
  ⑤ functional form misspecification (nonlinear terms of the primary independent variable omitted)
  ↳ all 5 threats are mathematically equivalent: all lead to E(ε|X = x) ≠ 0
  ↳ pick one and have a valid argument
① confounding variables
  ↳ in Y = β0 + β1X + ε, another variable Z is confounding if
    · it has an impact on Y
    · it is correlated with X
  ↳ both conditions must be satisfied; to argue X → Y is not causal, show both hold for some Z
  ↳ omitting Z from the regression model threatens internal validity
  ↳ must be very sure that both conditions are satisfied; otherwise it depends
② simultaneous causality
  ↳ occurs when X causes Y and Y causes X
  ↳ e.g. size of police force and crime rate
  ↳ when Y is regressed on X, the estimated β̂1 will be biased: cannot make causal claims
  ↳ also known as "reverse causality"
③ errors in variables
  ↳ occurs when X is measured with error
  ↳ true value: X; observed value: X′, with X′ = X + u (e.g. u ~ N(0, 1))
  ↳ substituting into Y = β0 + β1X + ε gives Y = β0 + β1X′ + (ε − β1u), so the new error term is correlated with X′
  ↳ the estimated β̂1 is biased toward 0 (attenuation)
④ problems in sample selection
  ↳ observations for analysis not selected through random sampling → biased coefficient estimates (selection bias)
  ↳ if data is missing at random: no selection bias
  ↳ if data is missing systematically due to the sampling procedure: selection bias
· multivariate regression
  ↳ to fix omitted variable bias, include the variable: Y = β0 + β1X1 + β2X2 + ... + ε
· multiple R² vs adjusted R²
  ↳ multiple R² is a measure of how much variance in Y is explained by the model
  ↳ adjusted R² is a measure of how much variance in Y is explained by the model after controlling for the number of regressors
  ↳ more regressors → higher multiple R², but not necessarily higher adjusted R²
  ↳ a multiple R² much larger than the adjusted R² indicates the model may be overfitted
  ↳ neither R² has an impact on causal inference; R² only tells the quality of model fit
  ↳ models with very low R² can still have statistically significant covariates: regressors associated with only small changes in Y, but with high confidence that the relationship is non-zero
· control variables
  ↳ variables included in a regression to achieve internal validity
  ↳ not the focus of the hypothesis-testing exercise
  ↳ help improve confidence in claims of causality
· least squares assumptions for causal inference
  ① E(ε|X = x) = 0 (no threats to internal validity)
  ② (Xi, Yi), i = 1 ... n are i.i.d. (observations are identically and independently distributed)
· logarithmic approximation
  ↳ ln(X + ΔX) − ln(X) = ln(1 + ΔX/X) ≈ ΔX/X
  ↳ applies when ΔX/X is small
· logarithmic function cases
· linear-log
  ↳ Y = β0 + β1 ln(X) + ε
  ↳ Y + ΔY = β0 + β1 ln(X + ΔX) + ε ⇒ ΔY ≈ β1 (ΔX/X)
  ↳ a 1% increase in X is associated with a 0.01 β1 change in Y
· log-linear
  ↳ ln(Y) = β0 + β1X + ε
  ↳ ln(Y + ΔY) = β0 + β1(X + ΔX) + ε ⇒ ΔY/Y ≈ β1 ΔX, i.e. 100 β1 × ΔX percent
  ↳ a 1-unit change in X is associated with a 100 β1 % change in Y
· log-log
  ↳ ln(Y) = β0 + β1 ln(X) + ε
  ↳ ln(Y + ΔY) = β0 + β1 ln(X + ΔX) + ε ⇒ ΔY/Y ≈ β1 (ΔX/X)
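The attenuation result for errors in variables (threat ③) can be demonstrated with a small simulation; the data-generating process here is entirely made up (true model Y = 1 + 2X + ε, measurement error u ~ N(0, 1)), and with equal variances for X and u the slope should shrink by roughly the factor σ²X/(σ²X + σ²u) = 0.5:

```python
import random

# Sketch of errors in variables: when X is observed with noise,
# the OLS slope estimate is biased toward zero (attenuation bias).
random.seed(0)  # fixed seed so the simulation is reproducible

def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]     # true slope is 2
x_noisy = [xi + random.gauss(0, 1) for xi in x]       # observed X' = X + u

print(round(ols_slope(x, y), 2))        # close to the true slope 2
print(round(ols_slope(x_noisy, y), 2))  # attenuated, close to 2 * 0.5 = 1
```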
  ↳ a 1% change in X is associated with a β1 % change in Y
· if the relation between Y and X is nonlinear
  ↳ the effect on Y of a change in X depends on the value of X
  ↳ the marginal effect of X is not constant in X
  ↳ solution: estimate a regression function that is nonlinear in X
② (Xi, Yi), i = 1 ... n are i.i.d.
  ↳ observations are identically and independently distributed
  ↳ identical: all observations drawn from the same underlying distribution
  ↳ independent: the value of any observation doesn't depend on other observations
· a quasi-experiment is internally valid when the variable (Z) that influences treatment (X) is "randomly assigned"
· checking internal validity:
  ↳ is there any variable that is correlated with the independent variable (X) and is also a determinant of the dependent variable (Y)?
  ↳ can Y affect X?
  ↳ is there an error in measuring X?
  ↳ is there any systematic difference between treatment and control?

# Statistical significance of coefficient estimates
  ↳ estimates computed using the standard error of the coefficient
  ↳ for OLS to be internally valid, need assumptions about the variance of residuals
· homoscedastic errors
  ↳ var(ε|X = x) = constant
· heteroscedastic errors
  ↳ no constraints on var(ε|X = x)
  ↳ no effect on coefficient estimates
  ↳ affect standard errors of estimates
  ↳ if errors are heteroscedastic, cannot use the "homoscedastic" formula; use robust standard errors (apply a correction)
  ↳ if errors are homoscedastic, can still use the "heteroscedastic" formula
  ↳ always use robust standard errors
  ↳ most programs use the "homoscedastic" formula by default; need to override

# Categorical variables
  ↳ regression with a binary variable is equivalent to a difference-of-means analysis
  ↳ hypothesis: is there a statistically significant difference between the means of the two populations?
  ↳ for each group, compute the sample mean (and standard error) of the dependent variable
  ↳ test whether the difference is statistically significant
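The equivalence between a binary-regressor regression and a difference-of-means analysis can be verified directly; the group indicator and outcomes below are made up:

```python
# Sketch: regressing Y on a binary X reproduces the difference of means.
# b1 equals mean(Y | X=1) - mean(Y | X=0); b0 equals mean(Y | X=0).
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

x = [0, 0, 0, 1, 1, 1]                      # hypothetical group indicator
y = [10.0, 12.0, 11.0, 15.0, 16.0, 17.0]    # hypothetical outcomes
b0, b1 = ols(x, y)
print(b0, b1)   # b0 = 11.0 (control-group mean), b1 = 5.0 (difference of means)
```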
· wage = β0 + β1(male) + ε
  ↳ if β1 is statistically significant: there is a statistically significant difference between male and female wages
  ↳ if β1 is not statistically significant: no statistically significant difference
  ↳ if β1 is statistically significant and negative: being male is associated with lower wages compared to females
· using a binary variable
  ↳ experiment effect = outcome[treatment] − outcome[control]
  ↳ H0: μexperiment = 0; Ha: μexperiment ≠ 0
  ↳ experiment effect = E[Y|treatment = 1] − E[Y|treatment = 0]
  ↳ the differences estimator estimates the average treatment effect
  ↳ difference in sample averages for the treatment and control groups

Regression of Y on X and X on Y

You have a dataset with variables X and Y. When you perform a regression of Y on X, you obtain coefficient estimates β0 and β1. Conversely, when you reverse the roles and perform a regression of X on Y, you obtain coefficient estimates γ0 and γ1. What is the connection between β1 and γ1 in these two regression models? Do both models have the same R²? If not, why do they differ?

Y = β0 + β1 X + ε
X = γ0 + γ1 Y + ν

Since β1 = Cov(X, Y)/Var(X) and γ1 = Cov(X, Y)/Var(Y), their product is β1 γ1 = Corr(X, Y)² = r². Both models do have the same R²: in a simple two-variable regression, R² equals the squared correlation r², which is symmetric in X and Y, so reversing the roles of the dependent and independent variable leaves R² unchanged.

To explain sampling: Sampling always happens with constraints. The constraints define the "population" from which sampling happens. "Population" refers to the set of all people that satisfy the criteria you are interested in. The researcher argues that these two samples are "identical" to each other on aspects such as education, age, ethnicity, gender, tenure (time in current job), occupation, and union status, and only differ w.r.t. wages and imprisonment. They are also "independent" because both samples were obtained through random sampling from their respective "sub-populations". If you wish to argue that these two samples are not comparable, then you will have to find systematic differences between these two samples.
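A quick numeric check of the β1·γ1 = r² relationship on made-up data:

```python
# Numeric check: for simple regression, slope(Y on X) * slope(X on Y) = r^2,
# and both regressions share the same R^2 = r^2.
def slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [1.2, 1.9, 3.2, 3.8, 5.4]
beta1 = slope(x, y)     # regression of Y on X
gamma1 = slope(y, x)    # regression of X on Y
r2 = beta1 * gamma1     # equals the shared R^2 of both regressions
print(round(beta1, 3), round(gamma1, 3), round(r2, 3))
```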
Such as family background, propensity for alcohol/drug abuse, or participation in gang activity (the distributions of all these variables differ across the two samples). This is the way to show that these samples are not "identical". As a solution, the research design should "control for" these differences to make proper estimates.

# Guidelines for a good experiment
· random allocation into treatment and control groups
· attrition
  ↳ should not change the composition of the treatment and control groups
· no spillover
  ↳ the control and treatment groups should not interact
· sample size
  ↳ adequate sample size

# Quasi-experiments
· regression discontinuity
  ↳ a natural experiment setup where treatment/control assignment depends on a certain variable crossing a threshold value
  ↳ e.g. the government offers financial aid if income falls below a threshold
· bandwidth: window of inclusion around the cutoff
  ↳ small bandwidth: assignment closer to random, but the sample size is typically small and results are harder to extrapolate (external validity concerns)
  ↳ large bandwidth: assignment less random
  ↳ there should be no systematic differences, except for treatment/control status, between individuals immediately around the cutoff
  ↳ make sure other covariates are not statistically different between the 2 groups
  ↳ the cutoff must not be chosen to allow certain individuals into treatment
  ↳ if individuals can manipulate the value of the "assignment variable" to gain treatment, this leads to heaping around the cutoff
· differences in differences (DID)
  ↳ a natural experiment setup in which treatment/control assignment is not fully random
  ↳ treatment and control groups are still largely comparable
  ↳ assumptions are less strict
  ↳ key assumption: in the absence of treatment, unobserved differences between treatment and control groups are the same over time
  ↳ requires data for both treatment and control, pre-intervention and post-intervention
· parallel trends
  ↳ most critical assumption, hardest to fulfil
  ↳ in the absence of treatment, the difference between the treatment and control groups is constant over time
  ↳ no statistical test; judge by inspection
· DID strengths
  ↳ intuitive interpretation
  ↳ obtains a causal effect using observational data if the assumptions are met
  ↳ can use either individual- or group-level data
  ↳ comparison groups can start at different levels of the outcome
  ↳ accounts for change/no change due to factors other than the intervention
· DID limitations
  ↳ requires baseline data and a non-intervention group
  ↳ cannot use if intervention allocation is determined by the baseline outcome
  ↳ cannot use if the comparison groups have different outcome trends
  ↳ cannot use if the composition of the groups pre/post is not stable
  ↳ unobservable confounding variables: measuring certain variables can be unethical/infeasible/expensive, and potential confounding variables may be unmeasurable
  ↳ can use panel data instead
· panel dataset
  ↳ contains observations on multiple entities, where each entity is observed at 2 or more points in time
  ↳ {Yit, Xit}, i = 1, ..., n entities; t = 1, ..., T time periods
  ↳ with k covariates: {X1it, X2it, ..., Xkit, Yit}
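The differences-in-differences estimator described above, (treatment change) minus (control change), can be sketched on hypothetical group data:

```python
from statistics import mean

# Sketch of the differences-in-differences estimator:
# (treat_post - treat_pre) - (ctrl_post - ctrl_pre), on hypothetical data.
def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

treat_pre  = [10.0, 11.0, 12.0]   # treatment group, before intervention
treat_post = [15.0, 16.0, 17.0]   # treatment group, after
ctrl_pre   = [ 9.0, 10.0, 11.0]   # control group, before
ctrl_post  = [11.0, 12.0, 13.0]   # control group, after
print(did(treat_pre, treat_post, ctrl_pre, ctrl_post))  # (16-11) - (12-10) = 3.0
```

The control group's change (+2) stands in for what would have happened to the treatment group without the intervention (the parallel-trends assumption), leaving an estimated treatment effect of 3.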
· why panel data is useful
  ↳ controls for certain unobservable factors
  ↳ unobservables could cause omitted variable bias if omitted
· what kind of unobservables panel data can help with
  ↳ those that vary across entities but do not vary across time (time invariant)
    ↳ e.g. culture, place of early upbringing
  ↳ those that vary across time but not across entities (entity invariant)
    ↳ e.g. a sales tax that applies to all purchases in a given period
  ↳ specific examples
    ↳ firm invariant: firm industry, location, ownership
    ↳ state invariant: historical factors, geographical size
  ↳ TL;DR: if an omitted variable does not change over time, any changes in Y over time cannot be caused by the omitted variable
  ↳ we can estimate β1 even if Zi is unavailable
· binary outcome variable Y
  ↳ E[Y] = Σy y × Pr[Y = y] = 0 × Pr[Y = 0] + 1 × Pr[Y = 1] = Pr[Y = 1]
· linear probability model
  ↳ β1 = (Pr[Y = 1|X = x + ΔX] − Pr[Y = 1|X = x]) / ΔX
  ↳ a 1-unit increase in X is associated with a 100 β1 percentage-point change in Pr(Y = 1)
  ↳ models the probability as a linear function of X
  ↳ advantages: simple to estimate and interpret; inference is the same as for multiple regression
  ↳ disadvantages: the probability may not make sense as linear in X; the predicted probability can be < 0 or > 1
    ↳ can be solved using a non-linear model

# Probit Model
· Pr(Y = 1|X) = Φ(β0 + β1X)
  ↳ Φ: cumulative standard normal distribution function
  ↳ the "S-shape" gives: 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
  ↳ when β1 > 0, Pr(Y = 1|X) is increasing in X
  ↳ β1 > 0: probability ↑ with X; β1 < 0: probability ↓ with X
  ↳ hypothesis testing is similar to linear regression: if the p-value is below the significance threshold, accept the alternative hypothesis
  ↳ the coefficient value is hard to interpret
  ↳ to find the expected change in Y if X changes by 1 unit, compute the change manually for a given value of X

# Logit Model
· Pr(Y = 1|X) = F(β0 + β1X)
  ↳ F is the cumulative standard logistic distribution function
  ↳ F(β0 + β1X) = 1 / (1 + e^−(β0 + β1X))
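The probit and logit link functions above can be sketched side by side; the coefficients here are hypothetical, chosen only to show the bounded S-shaped probabilities:

```python
import math
from statistics import NormalDist

# Sketch of the logit model, Pr(Y=1|X) = F(b0 + b1*X) with the logistic CDF,
# and the probit model, which replaces F with the standard normal CDF.
def logit_prob(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def probit_prob(b0, b1, x):
    return NormalDist().cdf(b0 + b1 * x)

# Hypothetical coefficients: with b1 > 0 the probability increases in X
# and stays bounded in (0, 1), unlike the linear probability model.
b0, b1 = -2.0, 0.5
for x in [0, 4, 8]:
    print(x, round(logit_prob(b0, b1, x), 3), round(probit_prob(b0, b1, x), 3))
```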
  ↳ TL;DR: very similar to probit, just easier to calculate

· assumptions for causal inference with a binary outcome variable
  ① no threats to internal validity
  ② observations are i.i.d.
  ③ large outliers unlikely
  ④ no perfect multicollinearity

# Entity Fixed Effects regression
· 3 estimation methods:
  ① "n−1 binary regressors" OLS regression
  ② "entity demeaned" OLS regression
  ③ "changes" specification (only works when T = 2)
  ↳ all methods give identical estimates of the regression coefficients and identical standard errors
  ↳ ① and ② work for general T; ① is only practical when n is not too big
· "entity demeaned"
  ↳ Ỹit is the difference between Yit and the average of Y over the time dimension for all observations of entity i (= entity demeaning)
  ↳ entity fixed effects regression controls for ALL variables that are constant over time for an entity, regardless of whether they are confounding or not

# Time Fixed Effects
  ↳ an omitted variable might vary over time but not across states
  ① "T−1 binary regressors"
  ② "time demeaned": Ỹit is the difference between Yit and the average of Y over the entity dimension at time t (= time demeaning)
  ↳ time fixed effects regression controls for ALL variables that are constant over all entities for each time period, regardless of whether they are confounding or not

# Multiple Fixed Effects
  ↳ multiple time-invariant variables: use entity demeaning to eliminate all of them
  ↳ within each entity, all time-invariant variables are perfectly correlated
  ↳ αi captures the effect of all time-invariant variables (entity fixed effect), even if not confounding
  ↳ λt captures the effect of all entity-invariant variables (time fixed effect)

# Entity and Time Fixed Effects regression
· Yit = β0 + β1Xit + β2Zi + β3St + uit
    = β0 + β1Xit + γ2D2i + ... + γnDni + δ2B2t + ... + δTBTt + uit
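The entity-demeaning transformation above can be sketched directly; the panel below is hypothetical, built so that entity B sits a constant 10 units above entity A (a time-invariant difference that demeaning removes entirely):

```python
from statistics import mean

# Sketch of entity demeaning: subtract each entity's time average from its
# own observations, removing anything constant within an entity over time.
def demean(panel):
    """panel: dict mapping entity -> list of observations over time."""
    return {e: [v - mean(vals) for v in vals] for e, vals in panel.items()}

# Hypothetical panel: B = A + 10 at every t (a time-invariant level difference).
y = {"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]}
print(demean(y))   # both entities collapse to the same demeaned series
```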
  ↳ written compactly: Yit = β1Xit + αi + λt + uit
  ↳ αi: entity fixed effects (constant for an entity over time; absorb time-invariant variables)
  ↳ λt: time fixed effects (constant for all entities within a given time period; absorb entity-invariant variables)
  ↳ estimation: "n−1 and T−1 binary regressors" OLS regression
  ↳ estimation: "entity- and time-demeaned" OLS regression

# Entity Fixed Effects Regression: assumptions for causal inference
· Yit = β1Xit + αi + uit
  ① uit has a conditional mean of 0: E[uit|Xi1, Xi2, ..., XiT, αi] = 0
  ② (Xi1, Xi2, ..., XiT, ui1, ui2, ..., uiT), i = 1, ..., n are i.i.d. draws from their joint distribution
  ③ large outliers unlikely: (Xit, uit) have non-zero finite fourth moments
  ④ no perfect multicollinearity
· assumption ① in detail
  ↳ E[uit|Xi1, Xi2, ..., XiT, αi] = 0: the error term has conditional mean 0 given all T values of X for that entity
  ↳ similar to the OLS assumption for cross-sectional data
  ↳ implies no threat to internal validity
  ↳ the conditional mean of uit should not depend on ANY values of X for that entity, past or future

# Instrumental Variables
· idea: break X into 2 parts
  ↳ one part that might be correlated with an omitted variable, and one part that is not
  ↳ isolate the part that is not correlated with the omitted variable, hence estimate β1 without bias
  ↳ use an instrumental variable Zi that is correlated with Xi but with no other determinant of Yi
  ↳ Zi detects movements in Xi that are uncorrelated with omitted variables, and uses these to estimate β1 without bias
· valid instrument
  ↳ instrument relevance: corr(Zi, Xi) ≠ 0
    ↳ the instrument must be correlated with the primary independent variable
    ↳ can be established by performing a correlation test
  ↳ instrument exogeneity
    ↳ the instrument must not be correlated with any other determinant of Yi
    ↳ Zi has an impact on Yi only through Xi
    ↳ to argue against exogeneity, state the underlying assumptions and bring up channels through which Zi could contain confounders

# 2SLS (2-stage least squares) regression
· estimation using an instrument
  ① first, isolate the part of X that is uncorrelated with u: regress X on Z using OLS
    ↳ Xi = π0 + π1Zi + vi
    ↳ Zi is uncorrelated with all other determinants of Yi, hence π̂0 + π̂1Zi is uncorrelated with all other determinants of Yi
    ↳ predicted values of Xi: X̂i = π̂0 + π̂1Zi
  ② replace Xi with X̂i in the original regression Yi = β0 + β1Xi + ui
    ↳ regress Yi = β0 + β1X̂i + ui
· why IV regression works
  ↳ it only retains changes in X that are unaffected by omitted variable bias, allowing causal inference
  ↳ the instrumental variable is a "process" that randomly allocates a part of X
· 2SLS regression assumptions
  1. {Yi, Xi, Zi} are i.i.d.
  2. all variables have finite, nonzero 4th moments
  3. no perfect multicollinearity
  4. Z is a valid instrument
    ↳ instrument relevance
    ↳ instrument exogeneity
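The two stages above can be sketched as follows on hypothetical data; with a single instrument, the resulting 2SLS slope should equal the ratio cov(Z, Y)/cov(Z, X), which the test below checks:

```python
# Sketch of two-stage least squares with one instrument Z (hypothetical data):
# stage 1 regresses X on Z; stage 2 regresses Y on the fitted values X_hat.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

def two_sls(z, x, y):
    pi0, pi1 = ols(z, x)                  # stage 1: X on Z
    x_hat = [pi0 + pi1 * zi for zi in z]  # keep only the part of X driven by Z
    return ols(x_hat, y)                  # stage 2: Y on X_hat

z = [0.0, 1.0, 2.0, 3.0, 4.0]   # instrument
x = [1.0, 2.2, 2.9, 4.1, 5.0]   # endogenous regressor
y = [2.5, 4.4, 6.1, 8.2, 9.9]   # outcome
b0, b1 = two_sls(z, x, y)
print(round(b0, 3), round(b1, 3))
```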