The Issue
It became evident after Week 12 of the 2016 NCAAF regular season that I needed to make an improvement to my ratings: I needed to account for the disparity in talent between the divisions in college football (Div I-A vs. Div I-AA vs. Div II, etc.). This improvement was nothing I hadn't considered before; rather, I just hadn't worked out how to implement it in a completely mathematical and objective way. After Week 12, however, it was very clear I needed to figure out an implementation, because of the issue its absence was creating.
During Week 12, North Carolina beat The Citadel, a previously undefeated Div I-AA team. After the victory, UNC jumped 9.29 points in the ratings. To put that into perspective, The Ohio State University gained 6.73 points after defeating #4 Michigan. (Note: you will not find that 6.73 value on this blog because it was computed with the algorithm prior to the implementation of this improvement. The current algorithm gives OSU a 7.46 increase for the victory.) Now, as I explain in detail in the explanation of my rating system, factors other than a team's specific outcome contribute to its week-to-week rating change (such as its rating going into the game, its opponent's change in rating, etc.). However, to suggest that UNC's victory over Div I-AA Citadel (a game UNC was heavily favored to win) was approximately equal to OSU's victory over Michigan indicates that something needed to be addressed.
Why the issue arose
At the beginning of the season, I start all teams with the same rating variables (mu and sigma). I introduce zero bias. However, this means that Div I-A teams start with the same values as Div I-AA, Div II, and Div III teams. Teams do play inter-division games, and these can help the division averages begin to shift as you would expect: Div I-A highest and Div III lowest. However, there are so few inter-division games that the averages differ very little. The fact is, the vast majority of games are played within the divisions. This leads to distributions of teams for each division spread around a nearly identical average (see Figure 1, below).
Figure 1: Distributions of team ratings using the previous algorithm
In Figure 1, the cartoon (not real data, just an example) shows the distribution of teams within Div I-A (red) and Div I-AA (blue) late in the season. The horizontal axis represents rating and the vertical axis represents the number of teams that have a particular rating. Typically, a normal distribution would have a few teams that are very highly rated (to the right of this graph) and a few teams with very low ratings (to the left of this graph), while most of the teams fall somewhere in between. Because the vast majority of games are played within each division, there is little opportunity to separate these curves from one another. In other words, the teams within each division fall in a normal distribution around the same average, as shown above. What this means is that beating the best team in Div I-AA is treated as equivalent to beating the best team in Div I-A, because the ratings of the best team in each division will be nearly the same. This is obviously not realistic. However, it is a problem the previous version of my algorithm encountered because I did not pre-bias my ratings.
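To make that setup concrete, here is a minimal sketch of the zero-bias initialization. The class structure, the starting constants, and the division labels are illustrative assumptions, not the actual code behind these ratings:

```python
# A minimal sketch of the zero-bias initialization described above.
# The starting constants are assumed (TrueSkill-style defaults); the
# post does not state the actual values used.

INITIAL_MU = 25.0     # assumed common starting rating for every team
INITIAL_SIGMA = 8.33  # assumed common starting uncertainty for every team

class Team:
    def __init__(self, name, division):
        self.name = name
        self.division = division   # e.g. "I-A", "I-AA", "II", "III"
        self.mu = INITIAL_MU       # identical for all teams: no pre-season bias
        self.sigma = INITIAL_SIGMA

teams = [Team("North Carolina", "I-A"), Team("The Citadel", "I-AA")]
# Because intra-division games dominate the schedule, the division
# averages barely separate from these identical starting values,
# which is exactly the overlap cartooned in Figure 1.
```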
Why wasn't this improvement implemented sooner?
As I mentioned, this improvement had been on my mind; however, the issue it was causing wasn't obvious to me, for a couple of reasons. First, when I post my ratings, I separate the divisions. Second, it wasn't until this year that I added a new column ("Last Result") that makes it very easy to see whom a team played without having to check their schedule on ESPN or dig into the variables of my code. This change in the output of my ratings made the issue obvious, prompting me to come up with a way to implement the improvement.
The Solution
My overriding rule for implementing this improvement was to ABSOLUTELY NOT introduce bias into the ratings. I did NOT want to just say "well, Div I-AA teams should be scaled to 80% of Div I-A," and so on. I would have no justification for that factor, and I could no longer say that my ratings have zero bias.
Therefore, in addition to each team's individual rating, I now rate each division overall as well (henceforth referred to as the "current" method). The overall division ratings are calculated from the inter-divisional match-ups. They are then used to scale the average team rating for each division. Every team in a division is scaled by the same numerical factor (both mu and sigma are scaled) so that the relative ratings between teams within that division are not compromised. This is extremely important: far more games are played within a division, so I did not want to alter the ratings within each division in any way, or I would be undoing what my rating system is intended to do in the first place. The current method sits within my iterative process, and is thus a part of the calculation that gets converged.
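A rough sketch of what that scaling step might look like, building on the Team sketch above. The post doesn't specify how the overall division ratings are computed from the inter-division results or whether the factor is multiplicative, so both are assumptions here; `division_ratings` stands in for that precomputed calculation:

```python
# A hedged sketch of the division-scaling step. Factors are assumed to be
# multiplicative and normalized so that Div I-A is left unscaled; the
# actual factor computation in the rating system is not given in the post.

def division_scale_factors(division_ratings):
    """One factor per division, relative to the Div I-A overall rating."""
    reference = division_ratings["I-A"]
    return {div: rating / reference for div, rating in division_ratings.items()}

def apply_division_scaling(teams, division_ratings):
    """Scale every team in a division by the same factor (mu and sigma),
    leaving the relative ratings within each division untouched."""
    factors = division_scale_factors(division_ratings)
    for team in teams:
        f = factors[team.division]
        team.mu *= f
        team.sigma *= f

# Inside the iterative process, this step would be re-applied each pass,
# so the factors converge along with the individual team ratings.
```

Because every team in a division is multiplied by the same factor, the ordering within the division is preserved; only the divisions shift relative to one another.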
Using this method yields a non-biased, objective way to distinguish the divisions' average team ratings, as shown below in Figure 2. Once again, in the cartoon, the red and blue lines represent the Div I-A and Div I-AA distributions, respectively. Note, however, that the highest-rated team in Div I-AA is no longer nearly equivalent to the highest-rated team in Div I-A. In fact, a decent number of teams in Div I-A are rated higher than any team in Div I-AA. Remember, this is just an illustration; where the best team in Div I-AA actually falls relative to the Div I-A teams is a function of the outcomes of the inter-division games.
Figure 2: Distributions of team ratings using the current algorithm
The current method uses nothing but the results of games played to determine the factors, which will fluctuate from week to week as more inter-division games are played and as the teams involved in those games experience rating changes. In addition, the factors will vary from season to season. For example, perhaps in 2017 Div I-AA is closer in talent to Div I-A than it was in 2016. My rating system will reflect that, and the averages of the two divisions will accordingly be closer.