Summary

This document discusses ordinal logistic regression, a statistical method used for predicting an ordered outcome variable. It covers the core concepts, theoretical properties, and how to implement the model in Python using hockey data. The document also includes details on interpreting the results and transforming the logits back into probabilities for better interpretability.

Full Transcript

Hi, in this section, we'll be talking about ordinal logistic regression, which is also called ordinal regression. As the name of the model implies, the dependent variable now becomes a categorical ordinal variable. It sounds complex, but the model is basically a logical extension of logistic regression, so the core idea remains the same. The only difference is the nature of the dependent variable: now the categorical dependent variable has an order.

Then what is an ordered outcome? An ordinal variable is a type of qualitative variable in which there is an inherent hierarchy between the categories, but the intervals between the categories are not equally spaced. For instance, a student's letter grade can be considered an ordinal variable in the sense that we know A is higher than B, and so on. However, we are not able to tell how much better the letter grade of A is than the letter grade of B. Another example would be consumer ratings of a service provider's quality, scaled from poor to great. Surely, we can say that a rating of great is better than good or poor, but it is a rather qualitative, and often subjective, measure of service quality. Then think about some examples of ordered outcomes in sport. What about the game result? There are three potential outcomes of a match, win, draw, or lose, and we can treat this as an ordinal variable. In this regard, we are trying to predict such outcomes by using the ordinal regression model.

Okay, now I'm going to talk about the major theoretical properties underlying the ordinal regression model. As always, the focus of the lecture is to understand the conceptual insights without getting into too much detail in math or statistics. However, it is also inevitable for us to scratch some functional forms, as they really do provide some insight into how the model operates in an empirical setting. So here are the major features of ordinal regression: the dependent variable has three or more ordered outcomes, and the probabilities of each outcome change as the level of the independent variable changes. So the idea behind ordinal regression is that as the level of the independent variable increases, it results in different probabilities of each outcome. In sports, there is a qualitative shift in the outcome of the game as a function of, say, the Pythagorean winning percentage of the team. So with ordinal regression, we will fit both a regression coefficient and a set of thresholds to classify the outcome.

Let's take a look at the functional form of the ordinal regression. Here we see the logit function again as the form of the dependent variable, but the ordinal regression also provides thresholds between the outcomes. As it turns out, the goal of ordinal regression is to obtain the thresholds separating each outcome of the dependent variable and the coefficient of the given independent variable. Transforming the resulting logit value gives us the cumulative probability of the given category. Visually, the area under the curve divided by the thresholds corresponds to the probabilities of each outcome, which add up to 100% in total.

We'll go through the Python code in the Jupyter notebook later, but let's focus on the results here. So in this practice, we will use the hockey data that we have used to fit the logistic regression.
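Before looking at the results, here is a minimal sketch of how a model like this might be fit in Python with statsmodels. The file name `hockey.csv` and the column names `outcome` and `pyth_pct` are illustrative placeholders, not the actual names used in the lecture notebook.

```python
# Minimal sketch: ordinal (proportional-odds) logistic regression in statsmodels.
# "hockey.csv", "outcome", and "pyth_pct" are hypothetical names for illustration.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("hockey.csv")  # hypothetical path to the hockey data

# The dependent variable must be an ordered categorical: lose < draw < win
df["outcome"] = pd.Categorical(df["outcome"],
                               categories=["lose", "draw", "win"],
                               ordered=True)

# One independent variable: Pythagorean winning percentage
model = OrderedModel(df["outcome"], df[["pyth_pct"]], distr="logit")
res = model.fit(method="bfgs")

print(res.summary())  # slope for pyth_pct plus the two threshold parameters
```

Calling `res.predict()` on new values of the independent variable would then return the fitted probabilities of the three outcome categories directly.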
Furthermore, the model is specified in the same manner; I mean, we're going to use the Pythagorean winning percentage as the independent variable. And now the outcome variable has three levels: lose, draw, and win. So as a result of running the ordinal regression, we obtain three major parameters: the regression coefficient for the given independent variable, which is the Pythagorean winning percentage, then the threshold between lose and draw, and another threshold between draw and win. As you recall, a draw is possible in the game of hockey, but we dropped all the draws from the original data set for demonstration purposes. In this example, however, we will not drop the draws, so the dependent variable now has three levels: win, draw, and lose.

Okay, now let's talk about how to interpret the model. Basically, the functional form of the dependent variable is the logit function, but here we have two equations drawn from the two thresholds to classify three ordered outcomes. So we can obtain the linear predictions from the model, and then we can transform the odds back to probabilities for the sake of interpretability. For instance, say we have a team with a Pythagorean winning percentage of 0.2; then we can simply plug the given value of 0.2 into the equations to get the linear predictions, and transform the logits back to probabilities to better interpret the results. One thing you have to remember is that the resulting values give you cumulative probabilities, meaning that the probability of a draw can be obtained by subtracting 0.829 from 0.838, so the fitted probability of a draw is 0.009. In a similar manner, the chance of winning can be obtained by subtracting the cumulative probability of a draw or a loss, 0.838, from 1, which is 100%. So the chance of my team winning is roughly 16% in this regard. As a result, the fitted probabilities of each outcome are roughly 0.829 for a loss, 0.009 for a draw, and 0.162 for a win.
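The small sketch below just re-traces that arithmetic in Python, starting from the cumulative probabilities quoted above for a team with a Pythagorean winning percentage of 0.2; the underlying thresholds and coefficient are not reproduced here.

```python
# Converting cumulative probabilities into per-category probabilities,
# using the values quoted in the transcript for pyth_pct = 0.2.
p_cum_lose = 0.829   # P(outcome <= lose), i.e., probability of losing
p_cum_draw = 0.838   # P(outcome <= draw), i.e., probability of losing or drawing

p_lose = p_cum_lose               # 0.829
p_draw = p_cum_draw - p_cum_lose  # 0.838 - 0.829 = 0.009
p_win = 1.0 - p_cum_draw          # 1 - 0.838 = 0.162

print(p_lose, p_draw, p_win)      # the three probabilities sum to 1
```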
