**Review Problems for the BMGT430 final exam**

**True/False**

a. When we compare the two models $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ and $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, we should use an F-test, not a t-test.

b. When we compare the two models $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ and $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, we should use an F-test, not a t-test.

c. The R-square for $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ should always be smaller than the R-square for $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$ because it has fewer predictors.

d. The R-square for $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ should always be smaller than the R-square for $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ because it has fewer predictors.

e. For selecting a subset of predictor variables, "all possible regressions" is the best method because it is computationally efficient and considers all possible subset models.

**Blood Pressure -** Researchers wanted to determine the relationship between systolic blood pressure and three potential predictor variables: age, body mass index (BMI), and gender. They conducted a pilot study with 27 individuals randomly selected from a population with ages ranging from 20 to 70 years. Below is the list of variables that were considered for this analysis.

- **SBP**: systolic blood pressure (in mmHg)
- **Age**: age (in years)
- **BMI**: body mass index
- **Gender**: 0 if male; 1 if female

**Model #1:**

                 Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)    69.452        4.824     14.40      0.000
    Age            0.8936        0.094      9.51      0.000
    BMI            0.9767        0.190      5.14      0.000
    Gender        -4.0205        1.657      ????      ?????

    Residual standard error: 3.775
    Multiple R-Squared = 95.88%

a. Write out the fitted regression equation for predicting systolic blood pressure based on Age, BMI, and Gender.

b. Write out the population-level regression equation that is assumed by the output above.

c. Test whether Gender is an important variable for predicting systolic blood pressure in the presence of the other predictors. Use α = 0.05. [4 points]
   - Hypotheses:
   - Test statistic:
   - Assuming the p-value is < 0.05, provide a conclusion.

d. Give an interpretation of the estimated coefficient for Gender.

e. The following is the ANOVA table for testing the overall regression relationship. Fill in the blanks based on the summary output for Model #1 above. Show your work in the space below the table.

   | Source     | DF | SS | MS | F     |
   |------------|----|----|----|-------|
   | Regression |    |    |    | 178.2 |
   | Error      |    |    |    |       |
   | Total      |    |    |    |       |

f. Consider the model below with an interaction term between BMI and Gender.

   **Model #2:** $\text{SBP} = \beta_0 + \beta_1(\text{Age}) + \beta_2(\text{BMI}) + \beta_3(\text{Gender}) + \beta_4(\text{BMI} \times \text{Gender}) + \varepsilon$

   Explain how the interaction term changes the regression model.
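For parts (c), (e), and (f), here is a minimal R sketch of how the relevant quantities could be reproduced. It assumes the pilot data sit in a data frame named `bp` with columns `SBP`, `Age`, `BMI`, and `Gender`; that data frame and its name are illustrative assumptions, not part of the original problem.

    # Model #1: additive model matching the summary output above
    m1 <- lm(SBP ~ Age + BMI + Gender, data = bp)
    summary(m1)   # the Gender row supplies the t value and p-value for part (c)
    anova(m1)     # sequential SS; regression SS = total SS - residual SS for part (e)

    # Hand check of the Gender t statistic from the printed estimates:
    # t = estimate / standard error = -4.0205 / 1.657 (about -2.43)

    # Model #2: adds the BMI x Gender interaction from part (f)
    m2 <- lm(SBP ~ Age + BMI + Gender + BMI:Gender, data = bp)

    # Partial F-test comparing the nested models (with vs. without the interaction)
    anova(m1, m2)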
**Reviewing Correlation**

a. Based on the scatter plot of Salary against Homerun at the right, what is the direction of the correlation between Salary and Homerun? Circle one: **Positive** / **Negative**

Circle the better statistic for identifying outliers:

residuals        standardized residuals

In simple linear regression, which of the following could be used to test H~0~: β~1~ = 0 vs H~a~: β~1~ ≠ 0? Select all that apply.

F-test        t-test        Confidence interval

In simple linear regression, which of the following could be used to test H~0~: β~1~ = 2 vs H~a~: β~1~ ≠ 2? Select all that apply.

F-test        t-test        Confidence interval

In simple linear regression, which of the following could be used to test H~0~: β~1~ = 0 vs H~a~: β~1~ > 0? Select all that apply.

F-test        t-test        Confidence interval

In multiple linear regression, which of the following should be used to test H~0~: β~1~ = β~2~ = 0 vs H~a~: β~1~ ≠ 0 or β~2~ ≠ 0? Select all that apply.

F-test        t-test        Confidence interval
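A quick reference for the relationships these items rely on (a general summary added here, not part of the original problem set): for a single slope $\beta_1$ with estimate $b_1$,

$$t = \frac{b_1 - \beta_{1,0}}{SE(b_1)}, \qquad F = t^2 \ \text{for the two-sided test of a single coefficient},$$

and a two-sided $100(1-\alpha)\%$ confidence interval $b_1 \pm t_{\alpha/2}\,SE(b_1)$ rejects $H_0\!:\beta_1 = \beta_{1,0}$ at level $\alpha$ exactly when $\beta_{1,0}$ falls outside the interval. A joint hypothesis such as $H_0\!:\beta_1 = \beta_2 = 0$ calls for a (partial) F-test rather than individual t-tests.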
The following output was obtained with backward elimination in R.

    ## Backward Elimination Method
    ## ---------------------------
    ## Candidate Terms:
    ## 1. ADV
    ## 2. BONUS
    ## 3. MKTSHR
    ## 4. COMPET
    ##
    ## We are eliminating variables based on p value...
    ##
    ## - COMPET
    ##
    ## Backward Elimination: Step 1
    ##
    ## Variable COMPET Removed
    ##
    ## Model Summary
    ## ---------------------------------------------------------------
    ## R                  0.927       RMSE          91.751
    ## R-Squared          0.858       Coef. Var      7.230
    ## Adj. R-Squared     0.838       MSE         8418.203
    ## Pred R-Squared     0.814       MAE           70.921
    ## ---------------------------------------------------------------
    ## RMSE: Root Mean Square Error
    ## MSE: Mean Square Error
    ## MAE: Mean Absolute Error
    ##
    ## ANOVA
    ## --------------------------------------------------------------------
    ##                    Sum of
    ##                   Squares    DF    Mean Square       F       Sig.
    ## --------------------------------------------------------------------
    ## Regression    1072191.474     3     357397.158    42.455    0.0000
    ## Residual       176782.266    21       8418.203
    ## Total         1248973.740    24
    ## --------------------------------------------------------------------
    ##
    ## Parameter Estimates
    ## -------------------------------------------------------------------------------------------
    ##       model        Beta    Std. Error    Std. Beta        t      Sig        lower       upper
    ## -------------------------------------------------------------------------------------------
    ## (Intercept)    -620.638       240.108                -2.585    0.017    -1119.969    -121.306
    ##         ADV       2.470         0.278        0.802    8.872    0.000        1.891       3.049
    ##       BONUS       1.900         0.726        0.237    2.617    0.016        0.390       3.411
    ##      MKTSHR       3.116         4.314        0.060    0.722    0.478       -5.854      12.087
    ## -------------------------------------------------------------------------------------------
    ##
    ## - MKTSHR
    ##
    ## Backward Elimination: Step 2
    ##
    ## Variable MKTSHR Removed
    ##
    ## Model Summary
    ## ---------------------------------------------------------------
    ## R                  0.925       RMSE          90.749
    ## R-Squared          0.855       Coef. Var      7.151
    ## Adj. R-Squared     0.842       MSE         8235.292
    ## Pred R-Squared     0.820       MAE           71.916
    ## ---------------------------------------------------------------
    ## RMSE: Root Mean Square Error
    ## MSE: Mean Square Error
    ## MAE: Mean Absolute Error
    ##
    ## ANOVA
    ## --------------------------------------------------------------------
    ##                    Sum of
    ##                   Squares    DF    Mean Square       F       Sig.
    ## --------------------------------------------------------------------
    ## Regression    1067797.321     2     533898.660    64.831    0.0000
    ## Residual       181176.419    22       8235.292
    ## Total         1248973.740    24
    ## --------------------------------------------------------------------
    ##
    ## Parameter Estimates
    ## -----------------------------------------------------------------------------------------
    ##       model        Beta    Std. Error    Std. Beta        t      Sig       lower       upper
    ## -----------------------------------------------------------------------------------------
    ## (Intercept)    -516.444       189.876                -2.720    0.013    -910.222    -122.666
    ##         ADV       2.473         0.275        0.803    8.983    0.000       1.902       3.044
    ##       BONUS       1.856         0.716        0.232    2.593    0.017       0.372       3.341
    ## -----------------------------------------------------------------------------------------
    ##
    ## No more variables satisfy the condition of p value = 0.3
    ##
    ## Variables Removed:
    ##
    ## - COMPET
    ## - MKTSHR
    ##
    ## Final Model Output
    ## ------------------
    ##
    ## Model Summary
    ## ---------------------------------------------------------------
    ## R                  0.925       RMSE          90.749
    ## R-Squared          0.855       Coef. Var      7.151
    ## Adj. R-Squared     0.842       MSE         8235.292
    ## Pred R-Squared     0.820       MAE           71.916
    ## ---------------------------------------------------------------
    ## RMSE: Root Mean Square Error
    ## MSE: Mean Square Error
    ## MAE: Mean Absolute Error
    ##
    ## ANOVA
    ## --------------------------------------------------------------------
    ##                    Sum of
    ##                   Squares    DF    Mean Square       F       Sig.
    ## --------------------------------------------------------------------
    ## Regression    1067797.321     2     533898.660    64.831    0.0000
    ## Residual       181176.419    22       8235.292
    ## Total         1248973.740    24
    ## --------------------------------------------------------------------
    ##
    ## Parameter Estimates
    ## -----------------------------------------------------------------------------------------
    ##       model        Beta    Std. Error    Std. Beta        t      Sig       lower       upper
    ## -----------------------------------------------------------------------------------------
    ## (Intercept)    -516.444       189.876                -2.720    0.013    -910.222    -122.666
    ##         ADV       2.473         0.275        0.803    8.983    0.000       1.902       3.044
    ##       BONUS       1.856         0.716        0.232    2.593    0.017       0.372       3.341
    ## -----------------------------------------------------------------------------------------
    ##
    ## Elimination Summary
    ## --------------------------------------------------------------------------
    ##                 Variable                   Adj.
    ##         Step    Removed     R-Square    R-Square      C(p)        AIC        RMSE
    ## --------------------------------------------------------------------------
    ##            1    COMPET        0.8585      0.8382    3.1054    302.5419    91.7508
    ##            2    MKTSHR        0.8549      0.8418    1.6052    301.1557    90.7485
    ## --------------------------------------------------------------------------

a. Write the final fitted model selected by the algorithm.

b. Is there evidence of omitted variable bias in the final model? Why or why not?

c. Explain why MKTSHR was dropped when moving from the 2^nd^ step to the 3^rd^ step of the algorithm.

d. Explain why the algorithm stopped at the 3^rd^ step, i.e., why are ADV and BONUS in the final model?
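The printout above matches the style of backward elimination by p-value from the olsrr package in R. A minimal sketch of how such output could be generated; the data frame name `sales_df` and the response name `SALES` are assumptions for illustration (only the predictor names ADV, BONUS, MKTSHR, and COMPET appear in the output itself):

    # Illustrative only; requires the original data set.
    library(olsrr)

    full_model <- lm(SALES ~ ADV + BONUS + MKTSHR + COMPET, data = sales_df)

    # Backward elimination by p-value; the printout refers to a removal threshold of 0.3.
    # (The name of the threshold argument differs across olsrr versions.)
    ols_step_backward_p(full_model)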
**Question using the Partial F-test**

Use the regression output below to test, using an F-statistic, whether any of the seasonal dummy variables belong in the model.

    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) 214.59413    2.95039  72.734  < 2e-16 ***
    ## trend         2.56610    0.09895  25.932  < 2e-16 ***
    ## quarter_2   -29.86610    3.21712  -9.284 5.72e-11 ***
    ## quarter_3   -29.53220    3.22168  -9.167 7.85e-11 ***
    ## quarter_4    -3.74830    3.22927  -1.161    0.254
    ##
    ## Analysis of Variance Table
    ##
    ## Response: SALES
    ##           Df Sum Sq Mean Sq  F value    Pr(>F)    
    ## trend      1  34818   34818 673.4576 < 2.2e-16 ***
    ## quarter_2  1   2647    2647  51.1924 2.414e-08 ***
    ## quarter_3  1   5096    5096  98.5730 1.023e-11 ***
    ## quarter_4  1     70      70   1.3473    0.2536
    ## Residuals 35   1810      52

A. Write the null and alternative hypotheses.

B. Calculate the F-statistic.

C. Use the F-statistic to make a conclusion (or at least write the decision criteria that you would use).

D. Combine the output below with the output from the full model above to calculate the F-statistic in a different way.

    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)  199.017      5.128   38.81  < 2e-16 ***
    ## trend          2.556      0.218   11.73 3.42e-14 ***
    ##
    ## Analysis of Variance Table
    ##
    ## Response: SALES
    ##           Df Sum Sq Mean Sq F value    Pr(>F)    
    ## trend      1  34818   34818   137.5 3.417e-14 ***
    ## Residuals 38   9622     253
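For parts B and D, a worked setup of the partial F-statistic using the sums of squares shown above (the general formula, with the numbers read directly from the two ANOVA tables):

$$F = \frac{(SSE_{reduced} - SSE_{full}) / (df_{reduced} - df_{full})}{MSE_{full}} = \frac{(9622 - 1810)/(38 - 35)}{52} \approx 50.1,$$

which is compared with an $F_{3,\,35}$ critical value. The same numerator can also be built from the full model's sequential sums of squares for the three quarter dummies, $(2647 + 5096 + 70)/3$.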
**Questions about logistic regression**

1. Suppose the response variable of interest is the sales volume for a restaurant. The explanatory variables are location demographics, marketing expenses, etc. What is the appropriate technique to understand the relationship between the X's and Y? Select all that apply.

   Multiple Linear Regression        Logistic Regression

2. Suppose the response variable of interest is whether a restaurant will go bankrupt. The explanatory variables are location demographics, marketing expenses, etc. What is the appropriate technique to understand the relationship between the X's and Y? Select all that apply.

   Multiple Linear Regression        Logistic Regression

3. If the probability of an event occurring is 0.8, calculate the odds of that event occurring.

4. After running a logistic regression model, you find for a new observation that the estimated odds that Y = 1 are 1.765. This means that you should predict that Y = 0 for that observation.

The output below shows the estimates for a logistic regression model, where the response (GROUP) is 1 if an employee has a satisfactory performance and 0 otherwise.

*(Logistic regression output not reproduced in this transcript.)*

5. What is the logistic regression model?

6. As the score on TEST1 increases, is it more or less likely that an employee has a satisfactory performance? Why?
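For questions 3 through 6, the probability-odds conversion and the generic form of a logistic regression model (stated in general terms, since the specific coefficient output is not reproduced above; TEST1 is the only predictor named in the questions):

$$\text{odds} = \frac{p}{1-p}, \qquad p = \frac{\text{odds}}{1+\text{odds}}, \qquad \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{TEST1}) + \cdots$$

A probability of 0.8 therefore corresponds to odds of 0.8/0.2 = 4, odds greater than 1 correspond to $p > 0.5$, and the sign of the estimated coefficient on TEST1 determines whether higher scores raise or lower the estimated probability that GROUP = 1.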