Lecture 7: Sales Revenue Analysis PDF
Document Details
Uploaded by ReasonableDerivative
Southampton
2024
Nicolas Apfel
Summary
This document presents a lecture on multiple linear regression, specifically focusing on sales revenue analysis. It details various tests, including those for individual coefficients and overall model significance. The analyses involve assessing the impact of price and advertising on sales.
Full Transcript
# Lecture 7

## Monday, 18 November 2024 1:08 PM

- Recall the example on the sales revenues of a shop:

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + e_i$$

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 118.9136 | 6.3516 | 18.7217 | 0.0000 |
| PRICE | -7.9079 | 1.0960 | -7.2152 | 0.0000 |
| ADVERT | 1.8626 | 0.6832 | 2.7263 | 0.0080 |

$R^2 = 0.4483$, $SSE = 1718.943$, $\hat{\sigma} = 4.8861$, $s_y = 6.48854$ (sample standard deviation of sales)

## Testing a single coefficient in MLR model

- We want to test whether price matters for sales:

$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$

- Test statistic under $H_0$:

$$t = \frac{b_2}{se(b_2)} \sim t_{N-K}$$

- Logic: we check whether $b_2$ is larger in absolute value than what could be obtained simply by chance.
- Distinguish statistical significance from numerical (economic) magnitude.
- With $\alpha = 0.05$, the two critical values that isolate a probability of 0.025 in each tail of the distribution are $t(0.975,72) = 1.993$ and $t(0.025,72) = -1.993$.
- The sample value of the t statistic is $t = \frac{-7.908}{1.096} = -7.215$.
- The associated two-sided p-value is $P(t_{72} > 7.215 \mid H_0) + P(t_{72} < -7.215 \mid H_0) = 2 \cdot (2.2 \times 10^{-10}) \approx 0.000$.
- We reject $H_0$ since $-7.215 < -1.993$, or since $pv(t) = 0.000 < 0.05 = \alpha$.
- Hence, the data suggest that sales depend on price: the hypothesis that price does not matter is rejected at the 5% significance level.
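The same decision rule can be applied to every row of the regression table. The sketch below is plain Python with no external libraries: it recomputes each t statistic as the coefficient divided by its standard error and compares $|t|$ with the critical value 1.993 quoted above. It does not compute p-values, since the standard library has no t distribution. The names `t_stat` and `T_CRIT` are illustrative.

```python
# Minimal sketch: two-sided 5% t tests for the shop regression (N = 75, K = 3).
# Coefficients and standard errors are copied from the regression table.
coefficients = {
    "C": (118.9136, 6.3516),
    "PRICE": (-7.9079, 1.0960),
    "ADVERT": (1.8626, 0.6832),
}
T_CRIT = 1.993  # t(0.975, 72), the two-sided 5% critical value quoted in the notes


def t_stat(b: float, se: float, beta0: float = 0.0) -> float:
    """t statistic for H0: beta = beta0."""
    return (b - beta0) / se


for name, (b, se) in coefficients.items():
    t = t_stat(b, se)
    decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name:<7} t = {t:8.3f} -> {decision}")
```

All three statistics exceed 1.993 in absolute value, so each coefficient is significant at the 5% level. The `beta0` argument allows testing values other than zero.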
## Testing a single coefficient in MLR model

- To test whether sales revenues are related to advertising expenditure:

$$H_0: \beta_3 = 0 \qquad H_1: \beta_3 \neq 0$$

- Test statistic under $H_0$:

$$t = \frac{b_3}{se(b_3)} \sim t_{N-K}$$

- At the 5% significance level we reject $H_0$ if $|t| > 1.993$, or equivalently if $pv(t) < 0.05$.
- The sample value of the t statistic is $t = \frac{1.8626}{0.6832} = 2.726$.
- The associated two-sided p-value is $P(t_{72} > 2.726 \mid H_0) + P(t_{72} < -2.726 \mid H_0) = P(|t_{72}| > 2.726 \mid H_0) = 2 \cdot 0.004 = 0.008$.
- We reject $H_0$ since $2.726 > 1.993$, or since $pv(t) = 0.008 < 0.05$.
- Hence, the data support the conjecture that sales are affected by advertising expenditure.

## Testing a single coefficient in MLR model

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + e_i$$

- We want to test whether demand is price-elastic, knowing that revenues are defined as $S = Q \cdot P$ and that the price elasticity of demand is $\varepsilon = \frac{\Delta Q \cdot P}{\Delta P \cdot Q}$ (negative for a downward-sloping demand curve).
- Since $\frac{\Delta S}{\Delta P} \approx Q + P\frac{\Delta Q}{\Delta P} = Q(1 + \varepsilon)$, revenues fall when the price rises exactly when demand is elastic, $|\varepsilon| > 1$, which corresponds to $\beta_2 < 0$:

$$H_0: \beta_2 \geq 0 \;\; (|\varepsilon| \leq 1) \qquad H_1: \beta_2 < 0 \;\; (|\varepsilon| > 1)$$

- Test statistic under $H_0$ (evaluated at the boundary $\beta_2 = 0$):

$$t = \frac{b_2}{se(b_2)} \sim t_{N-K}$$

- At the 5% significance level we reject $H_0$ if $t < -1.666$, or equivalently if $pv(t) < 0.05$.
- The sample value of the t statistic is $t = \frac{-7.908}{1.096} = -7.215$.
- The associated one-sided p-value is $P(t_{72} < -7.215 \mid H_0) = 0.000$.
- We reject $H_0$ since $-7.215 < -1.666$, or since $pv(t) = 0.000 < 0.05$: the data indicate that demand is price-elastic.

## Goodness-of-fit

- Coefficient of determination in the MLR model:

$$R^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{N}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{N}\hat{e}_i^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

- Fitted value:

$$\hat{y}_i = b_1 + b_2 x_{i2} + b_3 x_{i3} + \dots + b_K x_{iK}$$

- Total standard deviation of the dependent variable:

$$\hat{\sigma}_y = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y})^2} = \sqrt{\frac{SST}{N-1}}$$

- So $SST = (N-1)\hat{\sigma}_y^2$.
- In the shop example:

$$SST = 74 \cdot 6.4885^2 = 3115.485 \qquad SSE = 1718.943$$

- So we obtain:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{1718.943}{3115.485} = 0.448$$

- Advantages of $R^2$:
  * Unit-free
  * Bounded measure (between 0 and 1 when the model includes an intercept)
  * Concise
- Problems of $R^2$:
  * Adding a regressor never reduces $R^2$, so there is a tendency to over-fit.
  * It does not say whether appropriate regressors have been chosen.
  * It is not suitable for comparing models with different dependent variables.
  * The model must include an intercept, otherwise $SST \neq SSR + SSE$.
  * What counts as a satisfactory $R^2$ depends on the field.
- The adjusted $R^2$ is fairer since it penalizes over-parameterized models:

$$\bar{R}^2 = 1 - \frac{SSE/(N-K)}{SST/(N-1)}$$

- Adding a regressor increases $\bar{R}^2$ if the associated $|t| > 1$.
- $\bar{R}^2$ no longer has the interpretation of a proportion of explained variation.
- One or more joint hypotheses can be tested with an approach based on the loss of fit, usually known as the F test.
- We first consider the F test for a single coefficient and then extend it to a set of joint hypotheses.
- Recall our MLR model on sales:

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + e_i$$

- Assume we want to test the following hypothesis:

$$H_0: \beta_3 = 0 \qquad H_1: \beta_3 \neq 0$$

- We know that adding a regressor never worsens the fit ($SSE$ cannot increase), while removing a regressor never improves it ($SSE$ cannot decrease).
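The goodness-of-fit numbers of the shop example can be reproduced from the reported output alone ($N = 75$, $K = 3$, $s_y = 6.48854$, $SSE = 1718.943$). The sketch below also evaluates the adjusted $R^2$, which the notes define but do not compute for this example; that value is simply arithmetic on the same inputs, not a reported figure.

```python
# Minimal sketch: SST, R^2 and adjusted R^2 for the shop regression
# from the summary numbers reported with the regression table.
N, K = 75, 3        # observations; coefficients including the intercept
s_y = 6.48854       # sample standard deviation of sales revenue S
SSE = 1718.943      # sum of squared residuals of the fitted model

SST = (N - 1) * s_y ** 2                          # total sum of squares, SST = (N - 1) s_y^2
r2 = 1 - SSE / SST                                # share of variation explained
r2_adj = 1 - (SSE / (N - K)) / (SST / (N - 1))    # degrees-of-freedom penalty for extra regressors
sigma_hat = (SSE / (N - K)) ** 0.5                # standard error of the regression

print(f"SST = {SST:.3f}")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
print(f"sigma_hat = {sigma_hat:.4f}")
```

With these inputs $SST \approx 3115.485$, $R^2 \approx 0.4483$ and $\hat{\sigma} \approx 4.8861$, matching the output; the adjusted $R^2$ comes out near 0.433, below $R^2$ as it must be whenever $K > 1$.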
- The idea is to compare the unrestricted model

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + e_i$$

with the restricted model

$$S_i = \beta_1 + \beta_2 P_i + e_i$$

and assess how large the loss of fit (the increase in $SSE$) is when we impose the exclusion restriction on $A_i$.
- We estimate both models (unrestricted and restricted), compute the sum of squared residuals ($SSE$) of each, and compare them.
- We build the F test statistic, where $J$ is the number of restrictions:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(SSE_R - SSE_U)/J}{\hat{\sigma}_U^2}$$

- Under $H_0$:

$$F \sim F_{(J,\,N-K)}$$

- If $H_0$ is not true, $SSE_R - SSE_U$ becomes large, implying that the constraints placed on the model by $H_0$ have a large effect on its ability to fit the data.
- In our sales example we have

$$F = \frac{(SSE_R - SSE_U)/1}{SSE_U/(75-3)} \sim F_{(1,72)}$$

- Using $\alpha = 0.05$, the critical value is $F_{(0.95,1,72)} = 3.97$.
- For a single restriction the F statistic equals the square of the t statistic on $b_3$ computed earlier, so the sample value is

$$F = t^2 = 2.726^2 = 7.43 \qquad pv(F) = P(F_{(1,72)} \geq 7.43 \mid H_0) = 0.008$$

which corresponds to a restricted sum of squared residuals $SSE_R = SSE_U + F\,\hat{\sigma}_U^2 \approx 1718.943 + 7.433 \times 23.874 \approx 1896.4$.
- We reject the restriction in $H_0$, since $7.43 > 3.97$.
- The F distribution is defined over values from 0 to infinity and large values are evidence against $H_0$, so the F test is one-tailed (right tail).
- When testing a single "equality" null hypothesis (a single restriction) against a "not equal to" alternative, the t test and the F test are equivalent: $F = t^2$ and $F_{(0.95,1,N-K)} = t(0.975, N-K)^2$, so both give the same p-value and the same decision.

## Testing the significance of the model

- Consider the general MLR model:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i$$

- We can test the overall significance of the model:

$$H_0: \beta_2 = \beta_3 = \dots = \beta_K = 0 \qquad H_1: \text{at least one } \beta_k \neq 0 \text{ for } k = 2, 3, \dots, K$$

- This is clearly a joint hypothesis, so we can use the F test.
- The unrestricted model is the original model above.
- The restricted model under $H_0$ is $y_i = \beta_1 + e_i$.
- The OLS estimate of $\beta_1$ in the restricted model is $b_1^* = \bar{y}$.
- So we have:

$$SSE_R = \sum_{i=1}^{N}(y_i - b_1^*)^2 = \sum_{i=1}^{N}(y_i - \bar{y})^2 = SST$$

- Hence the F test statistic becomes

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(SST - SSE)/(K-1)}{SSE/(N-K)}$$

which under $H_0$ is distributed as $F_{(K-1,\,N-K)}$.
- In our sales example, $S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + e_i$, the test of overall significance is

$$H_0: \beta_2 = \beta_3 = 0 \qquad H_1: \beta_2 \neq 0 \text{ or } \beta_3 \neq 0 \text{ or both nonzero}$$

$$F = \frac{(SST - SSE)/(3-1)}{SSE/(75-3)} \sim F_{(2,72)}$$

- The sample value of the test statistic is

$$F = \frac{(3115.485 - 1718.943)/2}{1718.943/(75-3)} = 29.25$$

- The critical value at 5% is $F_{(0.95,2,72)} = 3.12$.
- Hence we reject $H_0$ and conclude that price and advertising, taken together, have a statistically significant effect on sales.

## Testing an extended model

- The theoretical assumption that the marginal effect of advertising varies with its level (for example, diminishing returns) can be accommodated by a quadratic specification:

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + \beta_4 A_i^2 + e_i$$

- Here the marginal effect of advertising depends on the level of advertising:
$$\frac{\Delta E[S \mid P, A]}{\Delta A} = \beta_3 + 2\beta_4 A \quad (P \text{ held constant})$$

- We can test the importance of advertising as a joint hypothesis:

$$H_0: \beta_3 = \beta_4 = 0 \qquad H_1: \beta_3 \neq 0 \text{ or } \beta_4 \neq 0 \text{ or both nonzero}$$

- The two models are

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + \beta_4 A_i^2 + e_i \quad \text{(unrestricted model)}$$

$$S_i = \beta_1 + \beta_2 P_i + e_i \quad \text{(restricted model)}$$

- The F test statistic is

$$F = \frac{(SSE_R - SSE_U)/2}{SSE_U/(75-4)} = 8.44$$

- The critical value at 5% is $F_{(0.95,2,71)} = 3.126$.
- The p-value is $P(F_{(2,71)} > 8.44 \mid H_0) = 0.0005$.
- We conclude that advertising has a statistically significant effect on sales revenues.

## Testing hypotheses from economic theory

- Economic theory tells us that profit-maximizing firms act so that marginal benefits equal marginal costs.
- Marginal cost of advertising = \$1 (excluding the additional cost of producing the extra quantity demanded).
- Marginal benefit = $\partial E[S]/\partial A = \beta_3 + 2\beta_4 A$.
- The firm's equilibrium requires

$$\beta_3 + 2\beta_4 A = 1$$

- This is a linear restriction on the coefficients, so it is testable with either the t test or the F test.
- One specific shop has spent \$1,900 monthly on advertising; since advertising is measured in thousands of dollars, $A_0 = 1.9$. Is this amount consistent with the optimal choice?
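As an informal first look at this question (not a test), the estimated marginal benefit at $A_0 = 1.9$ can be compared with the marginal cost of 1, and the marginal condition $b_3 + 2b_4A = 1$ can be solved for a point estimate of the optimal level. The sketch uses the rounded quadratic-model estimates reported with the one-tail test in these notes ($b_3 = 12.15$, $b_4 = -2.77$) and assumes, as the 1.9 in the hypotheses implies, that advertising is measured in thousands of dollars. With rounded coefficients the marginal benefit comes out near 1.62; the 1.633 used in the notes presumably reflects unrounded estimates.

```python
# Minimal sketch: estimated marginal benefit of advertising and the implied
# optimal level, using rounded estimates (advertising in thousands of dollars).
b3, b4 = 12.15, -2.77
MARGINAL_COST = 1.0  # one extra dollar of advertising costs one dollar


def marginal_benefit(a: float) -> float:
    """Estimated dE[S]/dA = b3 + 2*b4*A at advertising level a (thousands of dollars)."""
    return b3 + 2 * b4 * a


a0 = 1.9                                   # $1,900 per month
a_opt = (MARGINAL_COST - b3) / (2 * b4)    # solves b3 + 2*b4*A = MARGINAL_COST

print(f"marginal benefit at A0 = {a0}: {marginal_benefit(a0):.3f}")
print(f"point estimate of the optimal level: {a_opt:.3f} (thousand dollars)")
```

The point estimate of the optimal level (about 2.01, roughly \$2,010) is close to \$1,900; whether the gap is statistically meaningful depends on the sampling uncertainty in $b_3 + 3.8b_4$, which is what the formal test addresses.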
- The set of hypotheses is

$$H_0: \beta_3 + 2 \cdot \beta_4 \cdot 1.9 = 1 \qquad H_1: \beta_3 + 2 \cdot \beta_4 \cdot 1.9 \neq 1$$

that is,

$$H_0: \beta_3 + 3.8\beta_4 = 1 \qquad H_1: \beta_3 + 3.8\beta_4 \neq 1$$

- To test a linear restriction on the coefficients we can use the t test:

$$t = \frac{b_3 + 3.8b_4 - 1}{se(b_3 + 3.8b_4)}$$

- In the denominator we have to calculate the variance of a linear combination:

$$\begin{aligned} var(b_3 + 3.8b_4) &= var(b_3) + 3.8^2 \, var(b_4) + 2 \cdot 3.8 \cdot cov(b_3, b_4) \\ &= 12.646 + 3.8^2 \cdot 0.885 + 2 \cdot 3.8 \cdot (-3.289) \\ &\approx 0.428 \end{aligned}$$

so that $se(b_3 + 3.8b_4) = \sqrt{0.428} = 0.654$.
- The sample value of the test statistic is

$$t = \frac{1.633 - 1}{0.654} = 0.968$$

- The 5% critical value is $t(0.975,71) = 1.994$.
- Since $|t| < 1.994$, we cannot reject the $H_0$ that \$1,900 is the optimal level of advertising.
- There is no evidence suggesting that the shop should change its advertising strategy.
- Let's test the restriction $\beta_3 + 2\beta_4 A_0 = 1$ using the F test.
- The unrestricted model is

$$S_i = \beta_1 + \beta_2 P_i + \beta_3 A_i + \beta_4 A_i^2 + e_i$$

- The restricted model, obtained by substituting $\beta_3 = 1 - 3.8\beta_4$, is

$$S_i = \beta_1 + \beta_2 P_i + (1 - 3.8\beta_4) A_i + \beta_4 A_i^2 + e_i$$

which can be estimated by OLS after rearranging it as

$$S_i - A_i = \beta_1 + \beta_2 P_i + \beta_4 (A_i^2 - 3.8A_i) + e_i$$

- After running the two estimations we get

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} = \frac{(1552.286 - 1532.084)/1}{1532.084/71} = 0.936$$

- The critical value at 5% is $F_{(0.95,1,71)} = 3.976 = t(0.975,71)^2 = 1.994^2$, and correspondingly $t = \sqrt{0.936} = 0.967$.
- The two p-values coincide:

$$pv(F) = P(F_{(1,71)} > 0.936 \mid H_0) = 0.336 \qquad pv(t) = P(|t_{71}| > 0.967 \mid H_0) = 0.336$$

- Let's now see the case of a one-tail test (inequality in $H_1$).
- Assume we want to test whether the optimal level of advertising is greater than \$1,900.
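The two-sided test of the optimality condition can be reproduced from the reported variances, covariance and sums of squared residuals. The sketch below computes the variance of $b_3 + 3.8b_4$, the t statistic, and the F statistic for the same restriction, then applies the two-sided (1.994) and right-tail (1.667) 5% critical values quoted in the notes. With the three-decimal inputs the variance comes out at 0.429 (so $t \approx 0.966$) instead of 0.428 and 0.968, a rounding difference only. The helper name `f_stat` is illustrative; it implements the loss-of-fit formula $F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)}$ and, with $SSE_R = SST$, also reproduces the overall-significance statistic of the linear model.

```python
import math


def f_stat(sse_r: float, sse_u: float, J: int, N: int, K: int) -> float:
    """Loss-of-fit F statistic for J linear restrictions."""
    return ((sse_r - sse_u) / J) / (sse_u / (N - K))


# Reported inputs for the quadratic model (N = 75, K = 4)
var_b3, var_b4, cov_b3_b4 = 12.646, 0.885, -3.289
lin_comb = 1.633   # b3 + 3.8*b4 as reported in the notes
c = 2 * 1.9        # weight on b4 in the restriction b3 + 2*b4*A0 = 1

# var(b3 + c*b4) = var(b3) + c^2 var(b4) + 2c cov(b3, b4)
var_lc = var_b3 + c ** 2 * var_b4 + 2 * c * cov_b3_b4
t = (lin_comb - 1) / math.sqrt(var_lc)

# Same single restriction tested via restricted vs unrestricted SSE
F = f_stat(1552.286, 1532.084, J=1, N=75, K=4)

two_sided = "reject H0" if abs(t) > 1.994 else "do not reject H0"   # t(0.975, 71)
right_tail = "reject H0" if t > 1.667 else "do not reject H0"       # t(0.95, 71)

# The same helper gives the overall-significance F of the linear model (SSE_R = SST)
F_overall = f_stat(3115.485, 1718.943, J=2, N=75, K=3)

print(f"var = {var_lc:.3f}, t = {t:.3f}, F = {F:.3f}, sqrt(F) = {math.sqrt(F):.3f}")
print(f"two-sided: {two_sided}; right-tail: {right_tail}")
print(f"overall F (linear model) = {F_overall:.2f}")
```

Both the two-sided and the right-tail versions fail to reject, and $\sqrt{F}$ agrees with $|t|$ up to rounding, as the single-restriction equivalence requires.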
- The optimal level $A_{opt}$ satisfies $\beta_3 + 2\beta_4 A_{opt} = 1$.
- The estimates are the following:

$$\hat{S}_i = 109.72 - 7.64P_i + 12.15A_i - 2.77A_i^2$$

with standard errors (6.80), (1.05), (3.56) and (0.94) respectively.
- The set of hypotheses is

$$H_0: A_{opt} \leq 1.9 \qquad H_1: A_{opt} > 1.9$$

- Since $A_{opt} = \frac{1 - \beta_3}{2\beta_4}$ and $\beta_4 < 0$, multiplying through by $2\beta_4$ reverses the inequality, so the hypotheses become

$$H_0: \beta_3 + (2 \cdot 1.9)\beta_4 \leq 1 \qquad H_1: \beta_3 + (2 \cdot 1.9)\beta_4 > 1$$

which we can rewrite as

$$H_0: \beta_3 + 3.8\beta_4 - 1 \leq 0 \qquad H_1: \beta_3 + 3.8\beta_4 - 1 > 0$$

- The F test cannot handle a one-sided alternative, so we use the t test calculated earlier:

$$t = \frac{b_3 + 3.8b_4 - 1}{se(b_3 + 3.8b_4)} = \frac{1.633 - 1}{0.654} = 0.968$$

- The 5% critical value of the right-tail test is $t(0.95,71) = 1.667$.
- Since $0.968 < 1.667$, we fail to reject $H_0$.
- Conclusion: there is not sufficient evidence that the optimal advertising level is above \$1,900.

## Using non-sample information

- Whenever we have non-sample information we are confident about, we should use it, because it improves the precision of the estimates.
- Non-sample information enters the model in the form of restrictions on the model's parameters.
- Consider, for instance, the following model from economic theory, where the demand for beer depends on its price, the price of liquor, the price of the remaining goods, and income:
$$\ln(Q) = \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I) + e$$

- If we believe that economic agents are rational and do not suffer from money illusion, the same proportional change $\lambda$ in all prices and income should leave demand unaltered:

$$\begin{aligned} \ln(Q) &= \beta_1 + \beta_2 \ln(\lambda PB) + \beta_3 \ln(\lambda PL) + \beta_4 \ln(\lambda PR) + \beta_5 \ln(\lambda I) + e \\ &= \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I) + (\beta_2 + \beta_3 + \beta_4 + \beta_5)\ln(\lambda) + e \end{aligned}$$

- The "no money illusion" assumption translates into the restriction

$$\beta_2 + \beta_3 + \beta_4 + \beta_5 = 0$$

- We can impose the restriction by solving for one of the parameters in terms of the others,

$$\beta_4 = -\beta_2 - \beta_3 - \beta_5$$

and substituting it into the model:

$$\begin{aligned} \ln(Q) &= \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + (-\beta_2 - \beta_3 - \beta_5)\ln(PR) + \beta_5 \ln(I) + e \\ &= \beta_1 + \beta_2[\ln(PB) - \ln(PR)] + \beta_3[\ln(PL) - \ln(PR)] + \beta_5[\ln(I) - \ln(PR)] + e \\ &= \beta_1 + \beta_2 \ln\left(\frac{PB}{PR}\right) + \beta_3 \ln\left(\frac{PL}{PR}\right) + \beta_5 \ln\left(\frac{I}{PR}\right) + e \end{aligned}$$

- Running OLS on this restricted model gives the Restricted Least Squares (RLS) estimators; the estimate of $\beta_4$ then follows from the restriction.
- Economic theory (in the form of parameter restrictions and model specification) is an important ingredient in empirical research.
- The restricted LS estimator is biased unless the restriction we impose is exactly true.
- Imposing restrictions on the parameters reduces the variance of the estimators, since it reduces the estimation variability caused by random sampling.
  * This is the typical trade-off between variance and bias.
  * Recall that restrictions can be tested.
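The substitution above can be checked numerically. The sketch below uses hypothetical coefficient values and hypothetical prices and income (none of these numbers come from the notes). It verifies that, once $\beta_4 = -\beta_2 - \beta_3 - \beta_5$, the original and the restricted forms give the same $\ln(Q)$, and that scaling all prices and income by a common $\lambda$ leaves the restricted regressors, and hence predicted demand, unchanged. Estimating the restricted model itself would amount to running OLS of $\ln(Q)$ on the three transformed regressors returned by the illustrative helper `restricted_regressors`.

```python
import math

# Hypothetical coefficients (illustration only, not estimates from the notes)
b1, b2, b3, b5 = 2.0, -1.3, 0.2, 0.9
b4 = -b2 - b3 - b5   # imposed by the no-money-illusion restriction b2 + b3 + b4 + b5 = 0


def log_q_unrestricted(pb: float, pl: float, pr: float, income: float) -> float:
    """ln(Q) from the original specification, with b4 taken from the restriction."""
    return (b1 + b2 * math.log(pb) + b3 * math.log(pl)
            + b4 * math.log(pr) + b5 * math.log(income))


def restricted_regressors(pb: float, pl: float, pr: float, income: float) -> tuple:
    """Regressors of the restricted model: ln(PB/PR), ln(PL/PR), ln(I/PR)."""
    return math.log(pb / pr), math.log(pl / pr), math.log(income / pr)


def log_q_restricted(pb: float, pl: float, pr: float, income: float) -> float:
    """ln(Q) from the restricted specification (relative prices and real income)."""
    x2, x3, x5 = restricted_regressors(pb, pl, pr, income)
    return b1 + b2 * x2 + b3 * x3 + b5 * x5


# Hypothetical (PB, PL, PR, I), and a common 25% increase in all of them
prices = (3.0, 10.0, 1.5, 50.0)
lam = 1.25
scaled = tuple(lam * v for v in prices)

print(log_q_unrestricted(*prices), log_q_restricted(*prices))   # same value
print(log_q_restricted(*prices), log_q_restricted(*scaled))     # unchanged by lambda
```

Choosing $PR$ as the deflator mirrors the substitution for $\beta_4$ above; solving the restriction for a different coefficient would deflate by the corresponding variable instead.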