Summary

This document contains a practice exam with questions on topics in regression analysis. The questions cover various aspects of regression analysis, including outlier detection, sample size, interpretation of intercepts, and regression model validity. The practice exam is suitable for undergraduate-level students in disciplines including economics and business.

Full Transcript


## Question 1

An outlier

- is defined as an observation considerably larger or smaller than any other in the sample

## Question 2

Which of the following best describes a sufficiently large sample for testing regression coefficients using the z-value, which is equal to two for a 95% confidence level? (Hint: the z-value is the critical value for a normal distribution.)

- you need a sample of at least 1500 to perform testing

## Question 3

The intercept in a simple regression equation may always be interpreted as

- the value of the dependent variable when the explanatory variable is zero

## Question 4

In simple regression models, the F-test and the t-test

- yield the identical p-values

## Question 5

A simple random sample

- will have the same chance of occurring as any other sample of the same size.

## Question 6

Prediction intervals in regression output are valid for forecasting purposes only if, for the period being forecast,

- independent variables are not outside their range during the estimation period

## Question 7

In many regression situations, according to the central limit theorem,

- all of these
- the assumption of normally distributed ε is approximately true for large samples
- least-squares estimators are efficient under the assumptions of the classical model
- the least-squares equation minimizes the sum of squared errors

## Question 8

A lumber mill forecasts costs for 3 alternative market scenarios: high, low, and unchanged prices. These are

- all of these
- contingency forecasts
- conditional forecasts
- ex ante forecasts
- ex post forecasts

## Question 9

If all regression assumptions are valid except there is nonconstant σ, estimators are

- unbiased but not efficient

## Question 10

Any of the following may result in specification errors in a model except:

- using a linear instead of a correct quadratic form

## Question 11

An insurance company gradually reduced its field offices from 51 to 38, to 23, to 15, to 11, and finally to 7. A model is estimated relating total operating cost, OpCost, to number of offices, Office, and policies written, Policy:

OpCost = β₀ + β₁ Office + β₂ Policy + ε

A 2010 forecast of operating costs is subject to extrapolation error if

- any of these
- the number of policies for 2010 not known with certainty
- Policy variable may not belong in 2010 model
- the company consolidates to 3 offices
- the company splits up into 45 offices

## Question 12

According to the Gauss-Markov theorem

- both b and c
- the least-squares estimators are efficient
- the least-squares estimators are unbiased

## Question 13

Clients at a large brokerage company are primarily concerned about selling their stocks just before a major downward correction occurs in the market. The brokerage company should therefore choose a model that

- forecasts turning points best

## Question 14

The advantage of using a lagged independent variable is

- a conditional forecast

## Question 15

If you have data for the entire population, which of the following will no longer be a factor?

- all of the above
- sampling error
- measurement error
- modeling error
- errors in judgment

## Question 1

Variables: the statistical model will explain and predict Gross, the domestic gross box office revenue of a movie (in millions of $) by four variables identified in the film industry as potentially affecting Gross: (a) ProdBudg, movie's production budget (in millions of $); (b) Directed, experience of the director (# of movies previously directed); (c) Runtime, movie length (in minutes); and (d) Screens, theatrical distribution in general release (number of screens).

Data: A random sample of 29 major studio movies released last year (i.e., in 2007) containing values of the 5 variables for each movie.

This model uses a sample of **cross-sectional** data (type of data: 2 words), is **multiple** regression (type of regression), and the dependent variable is **gross** (variable name).
### Expected Effects

Many people won't go out to a movie unless it has fabulous special effects or big-name box-office stars, both of which require a studio to spend a lot on the production budget. Thus, you should expect ProdBudg to have a/an **direct** effect on Gross and thus a **one**-tail t-test will be run.

A director who has made many movies has gained experience in her craft, and directing a lot of movies usually indicates that her movies have been successful. Thus, you should expect that Directed will have a/an **direct** effect on Gross and thus a **one**-tail t-test will be run.

Longer movies give moviegoers more entertainment value for their money and have time to deal with complex, Oscar-worthy subject matter, thus suggesting that Runtime has a/an **direct** effect on Gross. However, longer movies can drag on due to failure to edit out lengthy, unneeded scenes, which indicates a/an **inverse** effect of Runtime on Gross. Thus a **two**-tail t-test will be run.

A successful movie needs to be shown in thousands of multiplexes and small towns or else large segments of the population won't ever get a chance to see it. Thus, Screens will have a/an **direct** effect on Gross and a **one**-tail t-test will be run.
## Question 3

| | ProdBudg | Directed | Runtime | Screens |
|--------------|----------|----------|---------|---------|
| ProdBudg | 1 | 0.46 | 0.44 | 0.75 |
| Directed | | 1 | 0.26 | 0.24 |
| Runtime | | | 1 | 0.17 |
| Screens | | | | 1 |

Based on the above correlations among the independent variables in the model, multicollinearity damage could only occur for two of these variables: **ProdBudg** and **Screens** (type their variable names)

## Question 4

| | Gross | ProdBudg | Directed | Runtime | Screens |
|----------------------|------|----------|----------|---------|---------|
| Mean | 94 | 77 | 4.6 | 112 | 2932 |
| Standard Error | 17 | 14 | 0.8 | 5 | 150 |
| Median | 70.108 | 61 | 3 | 104 | 2848 |
| Mode | #N/A | 110 | 3 | 90 | #N/A |

The descriptive statistics point out several important aspects of movie data. First, notice how the presence of a few blockbuster hits drags mean Gross well above the median for this **skewed-to-the-right** variable. A 95% confidence interval for the population mean of Gross is from a low of $**60** million to a high of $**128** million.

## Question 1

| Regression Statistics | |
|-------------------------|-------------------------|
| Multiple R | 0.849 |
| R Square | 0.721 |
| Adjusted R Square | 0.674 |
| Standard Error | 51.3 |
| Observations | 29 |

| ANOVA | df | SS | MS | F | Significance F |
|------------|----|--------|-------|------|---------|
| Regression | 4 | 163027 | 40757 | 15.5 | 2.2E-06 |
| Residual | 24 | 63206 | 2634 | | |
| Total | 28 | 226234 | | | |

Our first task was to test our model because if it's worthless, we'd have to go back and design a new model. Recall how we told you there's always a chance that a model provides no useful information for predicting movie gross any better than simply guessing average gross of all movies. You alerted us that you weren't willing to adopt our findings if there were as much as a 1% risk that our results are useless. Well, great news! The second table above confirms that the model easily meets your rigorous .01 standards. In fact, the p-value is so tiny that there is almost no chance that our model is useless for prediction: p equals 0.0000022.

Our model accounted for 72.1% of the total variation in movie grosses. You mentioned a study you heard about that reported a fit of 57.3% after adjusting for sample size and number of variables. Well, our model found an even better adjusted fit of 67.4%.

## Question 2

| | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|-------------------------|----------------|--------|---------|-----------|-----------|
| Intercept | 71 | -1.84 | 0.08 | -279 | 16 |
| ProdBudg | 0.26 | 1.24 | 0.23 | -0.2 | 0.9 |
| Directed | 4.3 | 1.40 | 0.18 | -3 | 15 |
| Runtime | 0.45 | 0.61 | 0.55 | -0.6 | 1.2 |
| Screens | 0.018 | 2.84 | 0.009 | 0.014 | 0.088 |

Now that we've demonstrated the significance of the model and overall relative fit measures, this report will test and estimate the impact of each of the model's four explanatory variables on a movie's gross. Recall that in our first report, we used the crucial movie industry insights you provided us to anticipate the only logical direction of possible effects from production budget, experience of the director, and number of screens. Only the runtime of a movie might logically affect movie gross in either a direct or inverse way. We took advantage of this advance knowledge to improve our chances of finding significant effects for the other three variables.

Most of you executives at Colossal Studios told us to check for any variables likely to have an effect on movie gross as long as there was a less than 10% chance of being mistaken. On that basis for the test, here are our conclusions about which variables do or do not affect movie gross: production budget **does not** have a significant effect, the number of films directed **does** have a significant effect, runtime **does not** have a significant effect, and number of screens **does** have a significant effect. If that is your standard, then the variable Directed **no longer** tests significant.

Next, we present our findings for the effect on movie gross from a change in the number of screens that a film opens at in general release. You've told us that up to half a film's total domestic gross occurs during opening week and drops off steeply the following weeks. We therefore calculated the addition to a movie's gross revenue when the studio is able to convince theatre owners to show the movie on more screens. Based on the output above, we can say with 95% confidence that adding 1000 screens opening week will increase a movie's gross by an average of $**51** million with a margin of error of $**36** million.

## Question 3

| | ProdBudg | Directed | Runtime | Screens |
|--------------|----------|----------|---------|---------|
| ProdBudg | 1 | 0.29 | 0.44 | 0.77 |
| Directed | | 1 | 0.19 | 0.26 |
| Runtime | | | 1 | 0.17 |
| Screens | | | | 1 |

Finally, we also checked whether one or more variables may actually affect movie gross but were prevented from testing significant due to any high correlations we noticed in our earlier report. After checking correlations in the above table and significance tests just conducted, it is clear that tests for the variable Screens **could not** have been damaged and Runtime **could not** have been damaged by multicollinearity (type in either could or could not for each).
## Question 1

| Regression Statistics | |
|-------------------------|-------------------------|
| Multiple R | 0.836 |
| R Square | 0.699 |
| Adjusted R Square | 0.649 |
| Standard Error | 53.3 |
| Observations | 29 |

| ANOVA | df | SS | MS | F | Significance F |
|------------|----|--------|-------|------|---------|
| Regression | 4 | 158350 | 39587 | 13.9 | 5.2E-06 |
| Residual | 24 | 68224 | 2843 | | |
| Total | 28 | 226573 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|-----------|--------------|----------------|--------|---------|-----------|-----------|
| Intercept | | 68 | -1.37 | 0.18 | -234 | 47 |
| ProdBudg | | 0.25 | 1.70 | 0.102 | -0.1 | 0.9 |
| Directed | | 3.4 | 1.22 | 0.23 | -2.8 | 11.0 |
| Runtime | | 0.44 | 0.20 | 0.84 | -0.8 | 1.0 |
| Screens | | 0.018 | 2.52 | 0.019 | 0.008 | 0.081 |

The above output is the result of fitting our model to the random sample of movie data collected for this study. Colossal Studios may be assured that the formula we report here will generate unbiased predictions that are the closest possible to the actual grosses of last year's movies, under a standard set of model assumptions. The formula you should use to predict movie gross (in millions of dollars) is:

Gross = -93 + 0.42 ProdBudg + 4.1 Directed + 0.09 Runtime + 0.045 Screens

For example, the movie Blades of Glory had values for the explanatory variables of ProdBudg = 61, Directed = 2, Runtime = 93, and Screens = 4050. Thus, our formula would predict this film to gross about $**130** million (round to the nearest ten).

Of course, no movie gross can be guessed precisely in advance due to the countless other factors that may attract or turn off moviegoers. However, at least we can quantify that margin of error for any predicted gross calculated by this formula. Specifically, you'll be nearly sure of being right (i.e., 19 out of 20 times, on average) if you accompany your movie gross predictions with a "plus or minus" of $**100** million (round to nearest ten).
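As a rough check on Question 1 above, the following sketch plugs the Blades of Glory values into the reported formula and recomputes the summary statistics from the ANOVA sums of squares. It is a minimal illustration, not the original analysis: the dictionary and function names are hypothetical, and the inputs are the rounded figures printed in the output, so small rounding differences are to be expected.

```python
import math

# Coefficients from the reported prediction formula (millions of dollars).
coef = {"Intercept": -93, "ProdBudg": 0.42, "Directed": 4.1,
        "Runtime": 0.09, "Screens": 0.045}


def predict_gross(prod_budg, directed, runtime, screens):
    """Point prediction of Gross from the four explanatory variables."""
    return (coef["Intercept"]
            + coef["ProdBudg"] * prod_budg
            + coef["Directed"] * directed
            + coef["Runtime"] * runtime
            + coef["Screens"] * screens)


blades = predict_gross(prod_budg=61, directed=2, runtime=93, screens=4050)
blades_rounded = round(blades, -1)  # round to the nearest ten

# Recompute the fit statistics from the ANOVA table.
ss_reg, ss_res = 158350, 68224
df_reg, df_res = 4, 24
n = 29

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res                       # overall F statistic
r_square = ss_reg / (ss_reg + ss_res)          # share of variation explained
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_res
std_error = math.sqrt(ms_res)                  # standard error of the regression

print(blades, blades_rounded)
print(round(f_stat, 1), round(r_square, 3), round(adj_r_square, 3), round(std_error, 1))
```

The recomputed F, R Square, Adjusted R Square, and Standard Error agree with the printed output to the displayed precision, which is a useful sanity check that the tables were transcribed consistently.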
## Question 2

| Movie | Gross | ProdBudg | Directed | Runtime | Screens |
|-----------------------|------|----------|----------|---------|---------|
| Bridge to Terabithia | 82.3 | 60 | 1 | 95 | 3210 |
| Norbit | 95.7 | 60 | 7 | 110 | 3145 |
| The Bucket List | 93.5 | 45 | 13 | 97 | 2915 |
| 3:10 to Yuma | 53.6 | 55 | 6 | 117 | 3006 |

**Generates Model Predictive Performance Measures**

| Movie | Forecast | Actual Value | Squared Error | Absolute Error | Abs% Error |
|-----------------------|------|------|------|----|-----|
| Bridge to Terabithia | 90.1 | 82.3 | 62 | 8 | 10% |
| Norbit | 99.5 | 95.7 | 15 | 4 | 4% |
| The Bucket List | 76.4 | 93.5 | 293 | 17 | 18% |
| 3:10 to Yuma | 95.0 | 53.6 | 1716 | 41 | 77% |

| Bias | RMSE | MAE | MAPE |
|------|------|-----|------|
| 9 | 23 | 18 | 27% |

To check how well our formula performs, we chose four of last year's movies not used in our sample data. In the Excel output chart above, the predicted gross is calculated from our formula by plugging in the values of each explanatory variable for each movie listed in the top table. A comparison of the actual and forecast values makes it clear that the **third** of the four movies outperformed by the largest amount its forecast value, and the **first** of the four movies substantially underperformed its forecasted gross. On average, we found our formula predictions for these four movies exhibited a small **upward** bias. The average absolute error of the forecasts was approximately $**20** million, and the error as a percent of gross averaged about **30** percent.

## Question 19

An insurance company forecasts earnings from policies:

Earnings = β₀ + β₁ Policy + ε

Which will **not** cause regression parameter β₁ to change?

- Policy sales increase.
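The predictive performance measures reported for the four holdout movies in Question 2 above (Bias, RMSE, MAE, MAPE) can be recomputed from the forecast and actual columns. This is a minimal sketch under the assumption that error is defined as forecast minus actual, so a positive bias means the forecasts run high; the variable names are hypothetical, and the inputs are the rounded forecasts shown in the table, so the squared-error column may differ slightly from the printed values.

```python
import math

# (movie, forecast, actual) in millions of dollars, from the performance table.
holdout = [
    ("Bridge to Terabithia", 90.1, 82.3),
    ("Norbit", 99.5, 95.7),
    ("The Bucket List", 76.4, 93.5),
    ("3:10 to Yuma", 95.0, 53.6),
]

# Assumed sign convention: error = forecast - actual.
errors = [forecast - actual for _, forecast, actual in holdout]
actuals = [actual for _, _, actual in holdout]
n = len(holdout)

bias = sum(errors) / n                                   # mean error
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)        # root mean squared error
mae = sum(abs(e) for e in errors) / n                    # mean absolute error
mape = sum(abs(e) / a for e, a in zip(errors, actuals)) / n  # mean absolute % error

print(round(bias), round(rmse), round(mae), f"{mape:.0%}")
```

A positive bias under this convention corresponds to the "upward" bias described in the report: on average the formula forecast more than these four movies actually grossed.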
## Question 20

A model is more likely to test significant if

- all of these
- sample size n is large
- many independent variables
- α used for the test is small

## Question 16

In simple regression models, the F-test and the t-test

- all of these
- test statistically equivalent hypotheses
- yield the identical p-values for each test
- result in an F-ratio equal to the t-ratio

## Question 17

If all classical regression model assumptions are valid, least-squares estimators

- all of these
- are unbiased
- have minimum standard deviation among all unbiased estimators
- are efficient

## Question 18

If all regression assumptions are valid, least-squares estimators

- all of the above
- are unbiased
- have minimum standard deviation among all unbiased estimators
- are efficient
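The link between the F-test and the t-test in simple regression (Questions 4 and 16 above) can be seen numerically: with a single explanatory variable, the F-ratio equals the square of the slope's t-ratio, which is why the two tests give identical p-values. The sketch below fits a simple regression by hand on a small made-up data set; the data values and variable names are purely hypothetical and are not taken from the exam.

```python
import math

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates for y = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Sums of squares: residual (SSE) and regression (SSR).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ssr = slope ** 2 * sxx

mse = sse / (n - 2)                      # residual mean square, n - 2 df
t_stat = slope / math.sqrt(mse / sxx)    # t-ratio for the slope
f_stat = ssr / mse                       # F-ratio with 1 and n - 2 df

print(t_stat ** 2, f_stat)  # the two values agree up to floating-point error
```

Because F with 1 and n - 2 degrees of freedom is the distribution of a squared t with n - 2 degrees of freedom, the two-tail t-test on the slope and the overall F-test reach the same p-value in this setting.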
