2018 Run Expectancy Calculation PDF
Document Details
Uploaded by TruthfulRealism2101
Princess Nourah Bint Abdulrahman University
2018
Summary
This document describes how run expectancy in baseball is calculated using data from 2018. It computes the average number of runs scored from a given event to the end of the inning for each starting state (runners on base and number of outs). The start and end run expectancies are then combined with the runs scored on each event to give the runs value of that event. This is likely used for statistical analysis of batting events and of player and team performance.
Full Transcript
Now we've set up the data, we're ready to actually calculate run expectancy based on our data frame from 2018. What we're going to do is calculate the mean number of runs scored from any given event to the conclusion of the inning, for all events in our data set. Our next line of code is a group by, which groups by starting state and takes runs future. That's a variable in the data which tells us how many runs were scored from the beginning of the event to the end of the inning. We calculate the mean of that in order to calculate run expectancy. We're going to create a list of run expectancies, one for each possible starting state.
If we run the code here, you can see we have the run expectancy matrix, but in the long form that we discussed right at the beginning. As you can see there, for example, in the very first row, if there are no runners on base (first base, second base and third base all empty) and no outs, the expected number of runs in that state is 0.49, or about half a run. That says that, on average over the season, whenever the game was in this state, about half a run was scored by the end of the inning, roughly the equivalent of a run being scored every other time you were in this state.
You can see, for example, in the next row, the same situation in terms of base states, no runners on base, but with one out, and you see a big drop in the run expectancy. With one out, the run expectancy falls to 0.26. Then in the third state, no one on base and two outs, it's even worse. The run expectancy is less than one tenth of a run. In that situation, runs were very unlikely to be scored.
You can see then these different run expectancies based on the number of runners on base and the number of outs. Indeed, you can see one of the states that we were looking at earlier on. Remember, with case B, we started off with a runner on third and one out. We saw that was the starting state. You can see here, in the row listed as number four, that's the state with a runner on third and one out.
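The group-by step described above can be sketched as follows. This is a minimal illustration assuming pandas, not the lecture's actual code: the column names `Start_State`, `RUNS_FUTURE` and `Start_RE`, the state encoding (three digits for first, second and third base, then the number of outs) and the toy rows are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the 2018 event data. Column names and the state
# encoding ("001 1" = runner on third, one out) are assumptions.
events = pd.DataFrame({
    "Start_State": ["000 0", "000 0", "000 1", "000 1", "001 1", "001 1"],
    "RUNS_FUTURE": [0, 1, 0, 1, 1, 2],  # runs from this event to end of inning
})

# Run expectancy in long form: mean of RUNS_FUTURE for each starting state.
start_re = (
    events.groupby("Start_State")["RUNS_FUTURE"]
    .mean()
    .reset_index()
    .rename(columns={"RUNS_FUTURE": "Start_RE"})
)

print(start_re)
```

With the full season of events, the same call would give one row per starting state that occurs in the data, which is the 24-row long-form table the transcript describes.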
With a runner on third and one out, you can see the run expectancy is just over one run. When you got to that state, on average teams scored a little more than one run. We have a list, then, of all of the run expectancies for each of the states in our data. That's a very nice start and gives us some intuition about how runs are scored in relation to the state of the game.
What we do next is merge this back into our original data set, so that provides us with an additional column of data. We have the run expectancy at the beginning of each actual event in the 2018 season. If we run that code now and look at the last column, we can see we've got the run expectancy for the state in which each event started. These are based on the start states and give us the starting run expectancy.
Remember, we talked at the beginning about the ultimate objective being to create something called runs value. The runs value is the number of runs scored on an event plus the change in run expectancy, that is, the run expectancy at the end of the event minus the run expectancy at the beginning of the event. We've now done half the job, as it were, because we've got the run expectancy at the beginning of the event, but now what we want to do is add in the run expectancy at the end of each of these events. We can then calculate runs value.
In order to do that, we have to add an extra state, a state which cannot exist at the beginning of an event but can exist at the end of an event. That is for there to be three outs. If there are three outs, the inning is over. That couldn't be the state at the beginning of an event, because there is no event after three outs in an inning. However, it is possible at the end of an event for there to be three outs. Of course, the run expectancy when you have three outs is zero because the inning is over: by definition, you cannot score a run in that inning once it is finished.
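The merge of the start run expectancies back into the event data, described above, might look like the sketch below. It assumes pandas; the column names and the three toy events are hypothetical, and only the values 0.49 and 0.87 are taken from the transcript.

```python
import pandas as pd

# Hypothetical event rows; in the lecture this is the full 2018 data set.
events = pd.DataFrame({
    "Start_State": ["000 0", "000 0", "100 0"],
    "RUNS_SCORED": [1, 0, 0],
})

# Long-form run expectancy table (values quoted in the transcript).
start_re = pd.DataFrame({
    "Start_State": ["000 0", "100 0"],
    "Start_RE": [0.49, 0.87],
})

# A left merge keeps every event and attaches the run expectancy
# of the state in which that event started.
events = events.merge(start_re, on="Start_State", how="left")

print(events)
```

A left merge is used here so that no event is dropped; every start state appears in the run expectancy table because that table was built from the same events.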
What we're going to do is add in an extra set of states, one for each of the possible base states, which says what the run expectancy is with three outs, so that we can include that in our end run expectancy variable. We create that here. This is just a list of each of the possible base states with three outs. We then create the end run expectancy, which is going to be the combination of the start run expectancy and these eight base states at the end where there are three outs.
Now notice that it doesn't matter whether we call it start or end in terms of the run expectancy values that we measure here: the start run expectancy matrix is the same as the end run expectancy matrix apart from the addition of these three-out states. However, when we merge this back into our main data set, we're going to match the end run expectancy with each end state, so that we have both the run expectancy at the start and the run expectancy at the end of each event.
First let's create the end run expectancy matrix. We just rename start state to end state and then add in the extra states where there are three outs. You can see here that the first 24 rows of the end run expectancy are the same as they are in the start run expectancy, and then we've got these eight zeros at the end for the base states with three outs.
Now we merge that back in, using end state because these are end state run expectancies. When we merge that in, we can see the following. For each event we have a start run expectancy and an end run expectancy. For example, if you look at the first row, the start run expectancy and the end run expectancy were the same, because there were no outs and no one on base at the beginning and no outs and no one on base at the end of the event.
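Building the end run expectancy table and merging it on end state, as described above, can be sketched like this. It is an illustration assuming pandas; the column names, the state encoding (base digits, a space, then outs) and the toy rows are assumptions, while 0.49, 0.26 and 0.87 are the values mentioned in the transcript.

```python
import itertools
import pandas as pd

# A few rows of a start run expectancy table (the real one has 24 rows).
start_re = pd.DataFrame({
    "Start_State": ["000 0", "000 1", "100 0"],
    "Start_RE": [0.49, 0.26, 0.87],
})

# Eight base states (each base empty or occupied) combined with three outs.
# The inning is over, so the run expectancy of each is zero.
base_states = ["".join(bits) for bits in itertools.product("01", repeat=3)]
three_outs = pd.DataFrame({
    "End_State": [f"{b} 3" for b in base_states],
    "End_RE": [0.0] * len(base_states),
})

# Same values as the start table, relabelled, plus the three-out states.
end_re = pd.concat(
    [
        start_re.rename(columns={"Start_State": "End_State", "Start_RE": "End_RE"}),
        three_outs,
    ],
    ignore_index=True,
)

# Hypothetical events, matched on the state in which each one ended.
events = pd.DataFrame({"End_State": ["000 0", "100 0", "000 3"]})
events = events.merge(end_re, on="End_State", how="left")

print(events)
```

With the full 24-row start table, `end_re` would have 32 rows: the 24 original states followed by the eight three-out states.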
In the second row, there are no outs and no one on base at the beginning, and then there's a runner on first and no outs in the end state. You can see the start run expectancy was 0.49, but at the end, because you've got a runner on base and no outs, the run expectancy has gone up and is now 0.87.
You can see here the way in which run expectancy changes with each event, depending on what happens in that event. Run expectancy can go up if you gain bases, and it can go down if a player is out. But also note that run expectancy is going to go down when runners already on base complete their run. Again, if you think back to our original example of case B, there was a runner on third. That state, with a runner on third, has a high run expectancy. When that runner gets to home base, you no longer have a runner on third, and so in that sense your run expectancy has fallen, although of course, offsetting that, you scored a run, so that's a valuable contribution.
That's the next thing that we do, then: let's calculate the runs value of each event. We said at the beginning we're going to define runs value as runs scored on the play plus the run expectancy at the end of the event, minus the run expectancy at the start of the event, and that's what this formula does. If we run that now, we can see that we have a runs value for each event for the entire 2018 season.
So, for example, remember that in our first case we said there was no one on base and no outs at the start, and no one on base and no outs at the end. You can tell what that event had to have been: it was a home run. The runs value for that event was just equal to one, because there is no change in the base states and no change in the number of outs, so you just get the one run that was scored.
In the second case, where the batter got to first base with no outs, we saw that the run expectancy rose from 0.49 to 0.87, and you can see here that the runs value of this event was 0.38, which was the increase in run expectancy. There were no actual runs scored, but we have an increase in run expectancy due to the batter managing to get a hit and make it to first base.
You can see now we have a very powerful tool for measuring the contribution of batting events based specifically on the context of the game, that is, the base states and the number of outs. Having created that, we're going to go on next to talk about ways in which we might use that information.
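The runs value calculation walked through above can be sketched as follows, using the two events from the transcript (the home run and the batter reaching first). This assumes pandas, and the column names `RUNS_SCORED`, `Start_RE`, `End_RE` and `Run_Value` are hypothetical labels for the quantities the transcript describes.

```python
import pandas as pd

# The two example events from the transcript.
events = pd.DataFrame({
    "Event": ["home run, bases empty", "batter reaches first"],
    "RUNS_SCORED": [1, 0],
    "Start_RE": [0.49, 0.49],
    "End_RE": [0.49, 0.87],
})

# Runs value = runs scored on the play + end run expectancy - start run expectancy.
events["Run_Value"] = (
    events["RUNS_SCORED"] + events["End_RE"] - events["Start_RE"]
).round(2)

print(events[["Event", "Run_Value"]])
```

The home run leaves the state unchanged, so its runs value is just the one run scored, while reaching first scores nothing but raises run expectancy by about 0.38.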