Premier League Game Result Forecasts PDF
Document Details
Uploaded by TruthfulRealism2101
Princess Nourah Bint Abdulrahman University
Summary
This document details a two-stage process for forecasting game results in the English Premier League. The first stage involves building a model using historical data, while the second stage uses this model to predict future game outcomes. The document also discusses the correlation between player salaries and team performance.
Full Transcript
Welcome. This week, we're going to generate forecasts for game results in a sports league. The particular league we're going to look at is the English Premier League. We've looked at English Premier League data before, and we're going to use some of that data to generate our predictions. In doing so, we are going to follow a two-stage process. The first stage will be to generate a model of past results, essentially fitting our data using a model, and the second stage will then be to use that model, based on information we know in advance of a game, in order to predict the results of particular matches. Then, when we've generated these forecasts, we're going to compare our results to the betting odds in order to see how closely we matched those predictions. As a final exercise, we're going to use our forecasts to generate a prediction for the league table as a whole, for the Premier League, just to bring it all together.
We want to forecast games that are going to be played at some point in the future. What we need for that is some data which is going to tell us something about the chances for the teams at that future point in time. Now, in the first week of this course, we actually looked at a very good indicator of team performance, and that was player salaries. We looked at data across a number of different leagues, and one of those was the English Premier League. We showed that there's a very high correlation between player salaries and team performance. That's not just accidental; there's a reason for it, and that reason is rooted in the market economics of soccer. There are many teams, there are many players, and the productivity of players is easily observed. You can see how good players are and you can rank players fairly reliably, maybe not exactly, but you know pretty much when one player is better than another player.
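The salary and performance correlation described above can be checked with a few lines of pandas. This is only a minimal sketch: the club names, the `wage_bill` and `points` columns, and all the figures are made up for illustration and are not the course data.

```python
import pandas as pd

# Hypothetical figures for illustration only, not real club data.
clubs = pd.DataFrame({
    "club": ["Club A", "Club B", "Club C", "Club D", "Club E"],
    "wage_bill": [200.0, 150.0, 90.0, 60.0, 40.0],  # season wage bill, arbitrary units
    "points": [85, 78, 55, 50, 35],                 # league points in the same season
})

# Pearson correlation between what a club pays and how it performs.
wage_points_corr = clubs["wage_bill"].corr(clubs["points"])
print(round(wage_points_corr, 3))
```

A value close to 1 would indicate that higher-paying clubs tend to finish with more points, which is the pattern the earlier week of the course reported.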
Because of that, the teams that hire the players also know how good the players are, and they offer them wages that reflect their level of ability, so better players command higher salaries. Indeed, that's what an agent is for: an agent is there to make sure a player gets the highest salary he can, and he bases that on the player's ability. It's not that players who are paid more try harder; all the players are trying hard in professional soccer, as they are in most professional sports. The reason why better players are paid more is the operation of the market, but that then also translates into the performance of the team. Teams which pay higher salaries on average perform better.
If we want to forecast the future performance of teams, we might think that player salaries would be a very good basis for doing that, but now we hit a problem. The data on player salaries that we have actually comes from the financial statements of the Premier League teams. These financial statements are published every year on a website called Companies House in the UK, and you can go to that website and download the financial statements for yourself if you're interested, and that will give you the data. The problem is that the data that is published relates to the past. A financial statement reflects what happened in the previous year, and indeed, there's often a lag in the publication of the statements, so in fact even the most up-to-date statements often reflect things that happened maybe 18 months ago. Well, that's not going to be of much use if you want to forecast what's going to happen now.
We have an alternative source, and that alternative source is a website called Transfermarkt. Transfermarkt started in Germany, and it is a site that publishes valuations of pretty much every professional soccer player in the world. It's interesting to know how they do this: they do it by a method of crowdsourcing.
They basically get fans to argue amongst themselves about how much a player is worth, the fans reach a consensus, and then they publish that valuation on their site. You can see here in the markup some discussion of exactly how those valuations are generated. We can use Transfermarkt valuations as our basis for the value of the teams, and therefore to predict how well a team is going to perform, rather than having to rely on the salary data, which is inevitably out of date. Now, it would be a leap of faith if we just accepted that the Transfermarkt data is a reliable indicator, and we shouldn't be in the business of making leaps of faith; here we want to have factual, solid backing for our results. What we're going to do in the rest of this session is actually test how reliable the Transfermarkt data is, and see whether we would be confident enough to use it in predicting the performance of teams in the future.
We load up the data now for the EPL wage data and for the TM values. One thing we can look at: there are 20 teams in the Premier League in any one season, but because of the promotion and relegation system, there are actually more than 20 clubs in each data frame. If we look at, for example, just the unique values, what you see here is a list of the names of all the clubs that are in our wage data file across the seasons, and if we do the same thing for the TM data, you can see a list of the names of all the clubs appearing in the top division. You can see there are actually slightly more teams in the wage data file, and that's because it covers a longer period of time, so there are more teams promoted and relegated in that period. Again, we could just look at the dimensions of the data by doing a `.describe()`, and you can see here we've got the values of the season.
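The inspection steps just described, listing the unique club names and calling `.describe()`, might look like the following. It is a sketch only: the two small data frames stand in for the EPL wage file and the TM value file, and the column names `season`, `club`, `wages` and `tm_value` are assumptions, as are all the numbers.

```python
import pandas as pd

# Hypothetical stand-ins for the wage and TM data frames; all values are made up.
wages = pd.DataFrame({
    "season": [2017, 2018, 2018, 2019, 2019],
    "club": ["Club A", "Club A", "Club B", "Club A", "Club C"],
    "wages": [220.0, 240.0, 70.0, 230.0, 50.0],
})
tm = pd.DataFrame({
    "season": [2018, 2018, 2019, 2019],
    "club": ["Club A", "Club B", "Club A", "Club C"],
    "tm_value": [600.0, 150.0, 620.0, 110.0],
})

# Unique club names across all seasons in each file.
wage_clubs = sorted(wages["club"].unique())
tm_clubs = sorted(tm["club"].unique())
print(wage_clubs)
print(tm_clubs)

# Summary statistics (count, mean, min, max, ...) for the numeric columns.
summary = wages.describe()
print(summary)
```

In the real files, the wage data covers more seasons than the TM data, so its list of club names would be the longer of the two.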
We refer to each season by the year in which it ends. Because a season starts in one calendar year and ends in another, we always refer to it by the calendar year in which it ends. You can see here we've got 440 club observations with 447 separate clubs in the data, and you can see we've got the wage data there in the last column. Then here for the TM data, you can see we have 200 observations in our dataset, and you can see here the TM values, which are available for all the clubs.
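The season-labelling convention and the observation counts above can be illustrated with a small sketch. The helper `season_end_year`, the `"2017-18"` label format, and the three-club league are all hypothetical and chosen only to keep the example short.

```python
import pandas as pd

def season_end_year(label: str) -> int:
    """Map a label such as '2018-19' to the calendar year in which the season ends."""
    return int(label[:4]) + 1

# Hypothetical two-season league of three clubs, with one club swapped between seasons.
seasons = pd.DataFrame({
    "season_label": ["2017-18"] * 3 + ["2018-19"] * 3,
    "club": ["Club A", "Club B", "Club C", "Club A", "Club B", "Club D"],
})
seasons["season"] = seasons["season_label"].map(season_end_year)

n_observations = len(seasons)          # one row per club per season
n_clubs = seasons["club"].nunique()    # distinct clubs across all seasons
clubs_per_season = seasons.groupby("season")["club"].nunique()

print(n_observations, n_clubs)
print(clubs_per_season)
```

Because clubs move in and out of the division, the number of distinct clubs exceeds the number playing in any single season, while the number of observations is the number of seasons times the clubs per season.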