Artificial Intelligence Mathematics Statistics 20240926 OCR PDF
Document Details
Uploaded by MesmerizingGyrolite5380
Ajou University
Summary
This document discusses topics in artificial intelligence, mathematics, and statistics, focusing on concepts like bias/variance tradeoff, linear regression, and assessing the accuracy of coefficients.
Full Transcript
[Figure: Left panel shows the true function (black), a linear estimate (orange), a smoothing spline (blue), and a more flexible smoothing spline (green). Right panel plots test MSE (red) and training MSE (grey) against flexibility; the dashed line marks the minimum possible test MSE (the irreducible error).]

Examples with Different Levels of Flexibility: Example 3

[Figure: Same layout for a third example. Left: truth (black), linear estimate (orange), smoothing spline (blue), more flexible smoothing spline (green). Right: test MSE (red), training MSE (grey), dashed line at the minimum possible test MSE (irreducible error).]

Bias/Variance Tradeoff

The previous graphs of test versus training MSE illustrate a very important tradeoff that governs the choice of statistical learning method. Two competing forces always govern that choice: bias and variance.

Bias of Learning Methods

Bias refers to the error introduced by modeling a real-life problem (which is usually extremely complicated) with a much simpler model. For example, linear regression assumes that there is a linear relationship between Y and X. In real life the relationship is rarely exactly linear, so some bias will be present. The more flexible/complex a method is, the less bias it will generally have.

Variance of Learning Methods

Variance refers to how much the estimate of f would change if we had a different training data set. Generally, the more flexible a method is, the more variance it has.
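The tradeoff can be seen in a small simulation. This is a minimal sketch, not part of the original slides: the true function, noise level, and the use of polynomial degree as a stand-in for flexibility are all made up for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def truth(x):
    # The unknown "true" regression function (invented for this sketch)
    return np.sin(1.5 * x)

sigma = 0.3  # irreducible error: sd of the noise around the truth

# One training set and a large test set drawn from the same population
x_train = rng.uniform(0.0, 5.0, 30)
y_train = truth(x_train) + rng.normal(0.0, sigma, 30)
x_test = rng.uniform(0.0, 5.0, 2000)
y_test = truth(x_test) + rng.normal(0.0, sigma, 2000)

train_mse, test_mse = [], []
for degree in (1, 4, 12):  # increasing flexibility
    fit = Polynomial.fit(x_train, y_train, degree)  # least squares fit
    train_mse.append(float(np.mean((y_train - fit(x_train)) ** 2)))
    test_mse.append(float(np.mean((y_test - fit(x_test)) ** 2)))

# Training MSE can only decline as flexibility grows (the model classes
# are nested), while test MSE is bounded below by sigma**2 and typically
# turns back up once variance dominates.
print("train:", train_mse)
print("test: ", test_mse)
```

Rerunning with a different seed changes the fitted curves noticeably at degree 12 but barely at degree 1; that instability across training sets is exactly the variance being traded against bias.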
The Trade-off

It can be shown that, for any given X = x0, the expected test MSE for a new Y at x0 equals

    Expected Test MSE = E[(Y - f̂(x0))²] = Bias(f̂(x0))² + Var(f̂(x0)) + σ²

where σ² is the irreducible error. This means that as a method gets more complex, the bias will decrease and the variance will increase, but the expected test MSE may go up or down!

Test MSE, Bias and Variance

[Figure: Three panels plotting test MSE (red), squared bias, and variance against flexibility for the three examples above; in each panel the test MSE curve is U-shaped.]

The Classification Setting

For a regression problem, we used the MSE to assess the accuracy of the statistical learning method. For a classification problem we can use the error rate:

    Error Rate = (1/n) Σᵢ₌₁ⁿ I(yᵢ ≠ ŷᵢ)

I(yᵢ ≠ ŷᵢ) is an indicator function, which gives 1 if the condition yᵢ ≠ ŷᵢ holds and 0 otherwise. Thus the error rate represents the fraction of incorrect classifications, or misclassifications.

A Fundamental Picture

[Figure: Training-sample and test-sample error versus model complexity; high bias / low variance on the left, low bias / high variance on the right.]

In general, training errors will always decline as complexity increases. However, test errors will decline at first (as reductions in bias dominate) but will then start to increase again (as increases in variance dominate). We must always keep this picture in mind when choosing a learning method: more flexible/complicated is not always better!

The Linear Regression Model

    Yᵢ = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + ε

The parameters in the linear regression model are very easy to interpret. β₀ is the intercept (i.e. the average value of Y when all the X's are zero), and βⱼ is the slope for the jth variable Xⱼ: the average increase in Y when Xⱼ is increased by one and all other X's are held constant.

Least Squares Fit

We estimate the parameters using least squares, i.e.
    MSE = (1/n) Σᵢ₌₁ⁿ (Yᵢ - Ŷᵢ)² = (1/n) Σᵢ₌₁ⁿ (Yᵢ - β̂₀ - β̂₁X₁ - ... - β̂ₚXₚ)²

Relationship between population and least squares lines

    Population line:     Yᵢ = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + ε
    Least squares line:  Ŷᵢ = β̂₀ + β̂₁X₁ + β̂₂X₂ + ... + β̂ₚXₚ

We would like to know β₀ through βₚ, i.e. the population line. Instead we know β̂₀ through β̂ₚ, i.e. the least squares line. Hence we use β̂₀ through β̂ₚ as guesses for β₀ through βₚ, and Ŷᵢ as a guess for Yᵢ. The guesses will not be perfect, just as X̄ is not a perfect guess for μ.

Assessing the accuracy of the coefficient estimates

[Figure 2: true vs estimated lines. Left: one simulated data set with the true line and the fitted least squares line (bias). Right: the true line with least squares lines fitted to several different simulated data sets (standard error).]

95% confidence interval for β₁:

    β̂₁ ± 2 · SE(β̂₁)

Test statistic:

    t = β̂₁ / SE(β̂₁)

Residual standard error:

    RSE = sqrt(RSS / (n - 2))

Measures of Fit: R²

Some of the variation in Y can be explained by variation in the X's and some cannot. R² tells you the fraction of variance that can be explained by X:

    R² = (TSS - RSS) / TSS

R² is always between 0 and 1. Zero means no variance has been explained; one means it has all been explained (a perfect fit to the data).

Inference in Regression

[Figure: A scatter plot showing the estimated (least squares) line from the sample and the true (population) line, which is unobserved.]

The regression line from the sample is not the regression line from the population. What we want to do:
- Assess how well the line describes the plot.
- Guess the slope of the population line.
- Guess what value Y would take for a given X value.

Some Relevant Questions

Is βⱼ = 0 or not? We can use a hypothesis test to answer this question. If we can't be sure that βⱼ ≠ 0, then there is no point in using Xⱼ as one of our predictors.

Can we be sure that at least one of our X variables is a useful predictor, i.e. is it the case that β₁ = β₂ = ... = βₚ = 0? Is the whole regression explaining anything at all?
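The quantities above (RSS, TSS, R², SE(β̂₁), the t statistic, and the overall F statistic) can all be computed by hand for simple regression. This is a sketch on simulated data; the population line Y = 2 + 3X and the noise level are invented so that we know the truth the estimates are chasing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data whose population line is known: Y = 2 + 3X + eps
n = 100
x = rng.uniform(-2.0, 2.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)

# Least squares estimates for simple regression
x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx  # slope estimate
b0 = y_bar - b1 * x_bar                       # intercept estimate

y_hat = b0 + b1 * x
rss = np.sum((y - y_hat) ** 2)  # residual (unexplained) sum of squares
tss = np.sum((y - y_bar) ** 2)  # total sum of squares

r2 = (tss - rss) / tss                 # R^2: fraction of variance explained
rse = np.sqrt(rss / (n - 2))           # residual standard error
se_b1 = rse / np.sqrt(sxx)             # standard error of the slope
ci = (b1 - 2 * se_b1, b1 + 2 * se_b1)  # approximate 95% CI for beta_1
t_stat = b1 / se_b1                    # t statistic for H0: beta_1 = 0

# Overall F statistic for "any slope nonzero"; with a single predictor
# it equals t_stat**2, so the two tests agree.
f_stat = ((tss - rss) / 1) / (rss / (n - 2))
```

Because the true slope is 3, the estimate lands close to 3 and the t statistic is huge, so H0: β₁ = 0 is rejected decisively.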
Test for H0: all slopes are 0 (β₁ = β₂ = ... = βₚ = 0) vs Ha: at least one slope ≠ 0.

The answer comes from the F test in the ANOVA (ANalysis Of VAriance) table. The ANOVA table has many pieces of information; what we care about are the F ratio and the corresponding p-value.

ANOVA Table

    Source        df    SS          MS          F          p-value
    Explained     2     4860.2347   2430.1174   859.6177   0.0000
    Unexplained   197   556.9140    2.8270

Is βⱼ = 0, i.e. is Xⱼ an important variable?

We use a hypothesis test to answer this question: H0: βⱼ = 0 vs Ha: βⱼ ≠ 0. Calculate

    t = β̂ⱼ / SE(β̂ⱼ)

i.e. the number of standard deviations β̂ⱼ is away from zero. If t is large (equivalently, the p-value is small) we can be sure that βⱼ ≠ 0 and that there is a relationship.

Regression coefficients

    Term       Coefficient   Std Err   t-value   p-value
    Constant   7.0326        0.4578    15.3603   0.0000
    TV         0.0475        0.0027    17.6676   0.0000

β̂₁ is 17.67 SEs from 0, so its p-value is essentially zero.

Testing Individual Variables

Is there a (statistically detectable) linear relationship between Newspaper and Sales after all the other variables have been accounted for?

Regression coefficients

    Term        Coefficient   Std Err   t-value   p-value
    Constant    2.9389        0.3119    9.4223    0.0000
    TV          0.0458        0.0014    32.8086   0.0000
    Radio       0.1885        0.0086    21.8935   0.0000
    Newspaper   -0.0010       0.0059    -0.1767   0.8599

No: the Newspaper p-value is big. Almost all the explaining that Newspaper could do in simple regression has already been done by TV and Radio in the multiple regression!

Regression coefficients (simple regression of Sales on Newspaper)

    Term        Coefficient   Std Err   t-value   p-value
    Constant    12.3514       0.6214    19.8761   0.0000
    Newspaper   0.0547        0.0166    3.2996    0.0011

Newspaper has a small p-value in the simple regression.

Qualitative Predictors

How do you put "men" and "women" (category listings) into a regression equation? Code them as indicator variables (dummy variables). For example, we can code Males = 0 and Females = 1.

Interpretation

Suppose we want to include income and gender. Two genders (male and female).
Let

    Genderᵢ = 0 if male, 1 if female

Then the regression equation is

    Yᵢ ≈ β₀ + β₁Incomeᵢ + β₂Genderᵢ = β₀ + β₁Incomeᵢ        if male
                                    = β₀ + β₁Incomeᵢ + β₂   if female

β₂ is the average extra balance each month that females have for a given income level. Males are the "baseline".

Regression coefficients

    Term            Coefficient   Std Err   t-value   p-value
    Constant        233.7663      39.5322   5.9133    0.0000
    Income          0.0061        0.0006    10.4372   0.0000
    Gender_Female   24.3108       40.8470   0.5952    0.5521

Other Coding Schemes

There are different ways to code categorical variables. With two genders (male and female), let

    Genderᵢ = -1 if male, 1 if female

Then the regression equation is

    Yᵢ ≈ β₀ + β₁Incomeᵢ + β₂Genderᵢ = β₀ + β₁Incomeᵢ - β₂   if male
                                    = β₀ + β₁Incomeᵢ + β₂   if female

β₂ is the average amount that females are above the overall average, and also the average amount that males are below it, for any given income level.

Interaction

Interaction arises when the effect on Y of increasing X₁ depends on another variable X₂.

Example:
- Maybe the effect on Salary (Y) of increasing Position (X₁) depends on gender (X₂)?
- For example, maybe male salaries go up faster (or slower) than female salaries as they get promoted.

Advertising example:
- TV and radio advertising both increase sales.
- Perhaps spending money on both of them increases sales more than spending the same amount on one alone?

Interaction in advertising

    Sales = β₀ + β₁×TV + β₂×Radio + β₃×TV×Radio

Rewriting the model with the interaction term grouped as

    Sales = β₀ + (β₁ + β₃×Radio)×TV + β₂×Radio

shows that spending $1 extra on TV increases average sales by 0.0191 + 0.0011×Radio. Likewise,

    Sales = β₀ + (β₂ + β₃×TV)×Radio + β₁×TV

shows that spending $1 extra on Radio increases average sales by 0.0289 + 0.0011×TV.

Parameter Estimates

    Term        Estimate    Std Error   t Ratio   Prob>|t|
    Intercept   6.7502202   0.247871    27.23
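The "effect of one variable depends on the other" reading of the interaction model can be made concrete with the coefficient estimates quoted above. This is a small sketch (the helper function names are invented); it just evaluates the partial effects β₁ + β₃×Radio and β₂ + β₃×TV.

```python
# Interaction model from the slides: Sales = b0 + b1*TV + b2*Radio + b3*TV*Radio
# Coefficient estimates as quoted in the text (Advertising data)
b1 = 0.0191   # TV
b2 = 0.0289   # Radio
b3 = 0.0011   # TV x Radio interaction

def extra_sales_per_tv_dollar(radio):
    # dSales/dTV = b1 + b3*Radio: the TV effect grows with radio spend
    return b1 + b3 * radio

def extra_sales_per_radio_dollar(tv):
    # dSales/dRadio = b2 + b3*TV: the radio effect grows with TV spend
    return b2 + b3 * tv

print(extra_sales_per_tv_dollar(0))      # with no radio spend: just b1 = 0.0191
print(extra_sales_per_tv_dollar(50))     # with $50 on radio: 0.0191 + 0.055
print(extra_sales_per_radio_dollar(100)) # with $100 on TV: 0.0289 + 0.11
```

With β₃ > 0, each extra advertising dollar is worth more when the other channel's budget is larger, which is exactly the synergy the slides describe.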