Code for Regression Analysis Across Seasons

Document Details

Uploaded by TruthfulRealism2101

Princess Nourah Bint Abdulrahman University

Tags

regression analysis python programming statistical modeling data science

Summary

This code snippet outlines a Python approach for executing a regression analysis across different seasons of data. It demonstrates functions for annual regression calculations, creating lists to store results, and associating these results with corresponding seasons. The code appears to be part of a tutorial or a programming project.

Full Transcript

In this video, we'll be writing code that will allow us to run the regression that we developed in the previous video for several seasons of data at one time. There are four main steps we'll be taking in this process. First, we'll write a general function for the regression. Second, we'll create a list where the regression results can be stored. Third, we'll run the regressions and store them. Fourth, we'll associate each regression in the list with its season. We can go ahead to our code and start to go through these steps.

The first thing we're going to do is write a function to run the Moneyball regression annually for free agents only. This function is going to look pretty similar to the regression that we ran in the previous video. First, we start out by defining the function: def is the keyword that starts a function. We're calling our function MBExpandFA, and our input variable is going to be Season. Whichever season we'd like to run our regression for, we input into our function here. Next, we're going to subset our data. We create MB_Seas as we did in the last video, and we subset on our master data. The conditions we subset on are SalYear equal to the Season that we input into our function, and free agency equal to 1.

Then the third line of code here is global lm. You can see in the next line that we have lm, which is equal to the ols model that we'll be running here. The global statement allows this lm variable to be accessed outside of the function, which we will use in a little bit. Then finally we have a return statement, so return and then a semicolon, which ends the function.
That's the general function that we'll be using to run our regressions for every season in our data. Let's go ahead and run that function.

Now the next step: we are going to create a list to store our regression results. There are two components of the list that we need to take into account, and we initialize them here at the top. The first one is the index, which we've initialized to 0; the index represents the elements of the list that we are going to create. The second line of code, lm_Results equals 0 in brackets, is the actual list that we are initializing. So right now we've created the variable index, initialized to zero, and a list with one element equal to 0.

The next chunk of code is a loop. What we're going to do here is populate our list with values ranging from 1 to 22. You can see for index in range(1, 23): for every value of the index in that range, it appends the current value of the index onto the list lm_Results. Then the loop moves the index up by one and repeats the process until the index reaches 23, not including 23. That's how the range statement works in Python.

Let's talk through an example. We start out with our index at zero, and then when we get to for index in range(1, 23), the index starts at one. In the next line, it appends the index, with a value of one, onto our existing list, which is lm_Results holding a single 0, so after we append that value the list holds a zero and a one. That process continues until our index hits 23, but 23 is not included in the list. What we should get after we run this is a list of values in lm_Results from 0 to 22. Let's go ahead and run that, and that's exactly what we get, as you see here: a list of values from 0 to 22.
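The list setup described above can be sketched as follows. The variable names follow the transcript; the printed checks are added for illustration.

```python
index = 0          # starting index
lm_Results = [0]   # list with a single placeholder element

# range(1, 23) yields 1, 2, ..., 22; the stop value 23 is excluded.
for index in range(1, 23):
    lm_Results.append(index)

print(lm_Results)       # placeholders 0 through 22
print(len(lm_Results))  # 23 slots
```

Since the values are only placeholders to be overwritten later, `list(range(23))` would build the same list in one line.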
Now the next step of the code: we're going to populate this list with our regression results for every season in our timeframe. To start out, we initialize a couple of variables again. The first variable is Season, which is set equal to 1994, and the second is i, which is initialized to zero; i is analogous to the index variable that we used in the previous section of code. Then we use another loop. While Season is less than or equal to 2015, we run our function MBExpandFA for whatever the Season is in our loop, and then with lm_Results[i] = lm we store that regression within the list we created in the previous step. Remember, this lm variable is what we fit in our function; because of the global statement, we are now able to access that lm variable, which is our ols regression, in this loop. Then once again we iterate: we increase i by one and Season by one until the 2015 season has been run.

Again, let's talk through an example for this step. Season equals 1994 and i equals 0, and the condition is while Season is less than or equal to 2015. Since 1994 is less than or equal to 2015, we run the function for 1994, as that's what Season is currently equal to. That regression is going to be equal to what we ran in the previous video. Then we store the results of that regression in element zero, since i equals 0; that is, in the first element of the lm_Results list up here. From there we just keep repeating that process: i increases and Season increases until all of the seasons in our timeframe are populated in the list lm_Results. Let's go ahead and run that.

The last step is to give names to each of the regressions that we're running, and the names are just going to be the seasons that they represent.
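The population loop from this step can be sketched as below. To keep the example self-contained, MBExpandFA here is a hypothetical stand-in that stores a descriptive string in lm instead of fitting a model; the loop itself follows the transcript.

```python
def MBExpandFA(Season):
    # Hypothetical stand-in: the real function fits the regression for Season.
    global lm
    lm = f"regression for {Season}"
    return

lm_Results = list(range(23))  # placeholder list from the previous step

Season = 1994
i = 0
while Season <= 2015:
    MBExpandFA(Season)   # sets the global lm for this season
    lm_Results[i] = lm   # overwrite placeholder i with the result
    i += 1
    Season += 1

print(lm_Results[0])    # regression for 1994
print(lm_Results[21])   # regression for 2015
print(lm_Results[22])   # 22
```

The span 1994 to 2015 covers 22 seasons, so the loop fills elements 0 through 21, and the final element of the 23-slot list keeps its placeholder value of 22.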
We follow a similar process now to what we've been doing. We initialize Season to be 1994, and then we create the list lm_Season with 1994 as its first element; that's going to be a string, because it's just going to be the column name or heading. Then we have another loop: for Season in range(1995, 2016), we append the string of the Season variable onto our lm_Season list, then Season increases by one, and that process continues until Season reaches 2016, not including 2016. When we run this, we get a list of seasons from 1994 to 2015. We can go ahead and run that portion.

Now we have a list of our ols regressions for every season from 1994 to 2015, and we have our headers for each regression from 1994 to 2015. The next step is going to be accessing the results, dividing those results up into different eras, and continuing on with our Moneyball story to see how it holds over this expanded timeframe.
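A short sketch of the season labels described above. The first part follows the transcript; the pairing with `zip` at the end is an assumption about how the labels might later be attached to the results, and the `lm_Results` strings are hypothetical stand-ins for fitted models.

```python
Season = 1994
lm_Season = [str(Season)]  # first label, stored as a string

# range(1995, 2016) yields 1995 through 2015; 2016 is excluded.
for Season in range(1995, 2016):
    lm_Season.append(str(Season))

print(lm_Season[0], lm_Season[-1], len(lm_Season))

# Assumption: one possible way to associate each label with its result.
lm_Results = [f"model {year}" for year in range(1994, 2016)]  # stand-ins
by_season = dict(zip(lm_Season, lm_Results))
print(by_season["2002"])
```

A dictionary keyed by season string is only one option; the two parallel lists from the video carry the same information by position.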
