Practice Final Fall Quarter 2024 PDF
University of Chicago
2024
Summary
This is a practice final exam for Fall Quarter 2024 at the University of Chicago. It covers statistical concepts, hypothesis testing, manipulation of the OLS estimator, and conditional probability. The exam is open-note, has a 2-hour time limit, and is out of 100 points.
Full Transcript
Practice Final
Fall Quarter 2024
December 5, 2024

Name:_____________________________________________________

Instructions: You will have the full class time of 2 hours for this exam. Answer each question as thoroughly as you can. I encourage you to spend 5 minutes reading through the exam before starting to answer questions. You do not need to answer the questions in order. This is an open-note exam. If you have any questions about the wording of a question, please feel free to ask. The exam is out of 100 points.

Academic Integrity and Disclosure Agreement

By continuing this exam, I agree to abide by the University of Chicago's academic honesty and student conduct policies. All work in this exam is mine and completed using only the allowed materials. I understand that plagiarism and academic misconduct will be reported to the university and punished to the severest extent. I also agree not to discuss or make public the contents of this exam, in part or in full, to other students or other media before Dec 10th at 12 pm CDT.

1. (20 Pts) Concepts

(a) (10 pts) Suppose you and your coworker are working on a statistical model together. In particular, you are looking at the health outcomes of individuals who vape. You compare the mean health outcomes of individuals who vape and individuals who don't vape. However, upon review of the t-statistic, you find that the difference between the two groups is not statistically significant, in the sense that you cannot reject the null hypothesis that it is equal to zero. Your friend states that this implies that vaping is not dangerous. Is your friend correct? Explain.

(b) (5 pts) Suppose you wish to estimate the relationship between Age and Wages. Your friend points out that this relationship is not linear but instead quadratic.
That is, the population regression function is actually written as

$wage_i = \beta_0 + \beta_1 Age_i^2 + \varepsilon_i$

Does this pose a problem in terms of estimating the coefficient $\beta_1$ via Ordinary Least Squares? Why or why not? Explain.

(c) (10 pts) Suppose we have a population regression function given by

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Suppose you estimate the above via Ordinary Least Squares. What is the difference between the residuals and the error term?

2. Hypothesis Testing

Kexin is a volcanologist studying the behavior of two volcanoes in the Pacific Rim; we will name them Volcano A and Volcano B for the purposes of this question. She is particularly concerned that Volcano B is showing signs of becoming more active. As part of the monitoring process, she takes measurements of sulfur dioxide ($SO_2$) emitted from fumaroles and has those daily readings averaged over the past 6 months (differences in the number of observations have to do with the ability to access equipment or potential breakdown of equipment over those six months). The sulfur dioxide levels are measured in parts per million. In addition, she is unsure of the exact distribution of the underlying population of daily sulfur dioxide readings, though she is confident there is some true mean $\mu$ and some true variance $\sigma^2$ for the underlying population of each volcano's sulfur dioxide emissions. Her results are presented in the table below.

                      Volcano A    Volcano B
$\overline{SO_2}$     716          854
$s^2$                 38671        443448
$n$                   154          54

(a) (10 pts) To do any analysis, Kexin first needs to know how to work with the distribution of the sample mean. What will be the distribution of the sample mean for Volcano A? Be specific about how you arrive at the chosen distribution and the parameters of that distribution (i.e., if it is a t distribution, what are its degrees of freedom; if it is a normal, what are the mean and variance? You can use theoretical population values to answer this question).

(b) (6 pts) She would first like to know the precision of her estimates of the sulfur dioxide emissions. Construct a 95 percent confidence interval about the mean of Volcano B.

(c) (12 pts) She believes the old baseline of Volcano A's sulfur dioxide levels from past papers was around 684. Conduct a hypothesis test to see whether her current estimate is consistent with a population mean equal to the old value of 684. What will be your null and alternative hypotheses? What is your test statistic and its value, and finally, will you reject or fail to reject the null hypothesis at the 10 percent significance level?

(d) (12 pts) Neither Volcano A nor Volcano B has erupted in over 50 years; however, an early sign of a potential eruption event is a change in the gas composition. In particular, more active volcanoes emit more sulfur dioxide. She now wishes to test whether Volcano B is emitting more sulfur dioxide than Volcano A over the past 6 months. Set up a rejectable null and alternative hypothesis. Determine your test statistic. Do you reject or fail to reject the null at the 10 percent significance level?

(e) (10 pts) What are the p-values of your test statistics in part (c) and part (d)? (If I'm asking you this on the quiz, this should serve as a hint about which distribution you should be using for (a), (c) and (d), since I must assume you don't have a calculator that can evaluate CDFs.) Conceptually, what do these values represent?

3. (15 pts) Manipulating the OLS estimator

Suppose we have the following linear model:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Our estimation of the above model is therefore given by:

$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$

However, due to some discrepancies in the data collection process, we instead observe:

$Y_i^{New} = 5 Y_i$
$X_i^{New} = X_i + 7$

Running our regression now, we estimate the following model:

$Y_i^{New} = \tilde{\beta}_0 + \tilde{\beta}_1 X_i^{New} + \tilde{\varepsilon}_i^{New}$

(a) (5 pts) How does the OLS estimator for $\tilde{\beta}_1$ compare to our estimate $\hat{\beta}_1$ from the original model? Show.

(b) (5 pts) How does the OLS estimator for $\tilde{\beta}_0$ compare to our estimate $\hat{\beta}_0$ from the original model? Show.

4. Blood Doping

The international Olympic organizing committee has a vested interest in keeping the perception of the Olympics as a fair playing field. As such, they are debating between two new testing procedures to detect blood doping. Procedure 1 has a false positive probability of 0.1 and a false negative probability of 0.05 percent. Procedure 2 has a lower false positive probability of 0.05 but a higher false negative probability of 0.15. Rather than take a stand on either method, the committee has each athlete give two blood samples, and each sample is tested under a different procedure. This allows the two tests to be conditionally independent of each other; in other words, the probability of a positive or negative result on test 1 is independent of the probability of a positive or negative result on test 2 once you condition on the underlying condition of doping or not doping. The committee believes that about 10 percent of athletes dope. The committee typically receives the results of the first procedure back first. They run the first procedure for a Russian curler and receive a positive result.

(a) What is the probability of having a positive result under the first procedure?

(b) What is the probability the Russian curler was doping given he received a positive result on the first procedure?

(c) Since the curler tested positive under the first procedure, what is his probability of testing positive under the second procedure, before seeing the result? (Note: You are trying to find the conditional probability of a positive on the 2nd procedure given the positive result on the 1st procedure.)

(d) The second procedure also returns a positive result. What is the probability that the curler was doping?
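
For checking work on Question 4, the sequential Bayes updates can be sketched in a few lines of Python. This is an illustrative sketch and not an official solution. The helper name `posterior` and the constants are hypothetical. In particular, the sketch assumes the "0.05 percent" false negative rate of Procedure 1 is meant as a probability of 0.05; if it is literally 0.05 percent, set `FN_1 = 0.0005` instead. Because the tests are conditionally independent given doping status, the posterior after the first positive can serve as the prior for the second test.

```python
# Assumed inputs, read off the problem statement.
PRIOR = 0.10               # committee's belief: P(doping)
FP_1, FN_1 = 0.10, 0.05    # Procedure 1 (ASSUMPTION: "0.05 percent" read as probability 0.05)
FP_2, FN_2 = 0.05, 0.15    # Procedure 2


def posterior(prior, false_pos, false_neg):
    """Return (P(positive), P(doping | positive)) for one test via Bayes' rule."""
    true_pos = 1.0 - false_neg
    # Law of total probability over doping / not doping.
    p_pos = prior * true_pos + (1.0 - prior) * false_pos
    return p_pos, prior * true_pos / p_pos


# Parts (a) and (b): first procedure, starting from the committee's prior.
p_pos1, post1 = posterior(PRIOR, FP_1, FN_1)

# Parts (c) and (d): conditional independence lets the posterior from the
# first test act as the prior for the second test.
p_pos2_given_pos1, post2 = posterior(post1, FP_2, FN_2)

print(f"(a) P(+ on 1)            = {p_pos1:.4f}")
print(f"(b) P(doping | +1)       = {post1:.4f}")
print(f"(c) P(+ on 2 | + on 1)   = {p_pos2_given_pos1:.4f}")
print(f"(d) P(doping | +1, +2)   = {post2:.4f}")
```

Under these assumptions the four values come out to roughly 0.185, 0.514, 0.461, and 0.947. The result for part (d) can be cross-checked by applying Bayes' rule once to the joint event of two positives.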