Welcome to the first entry in a new feature, Stats Corner. We've been devoid of statistics-driven content, but that is about to change. Look for a weekly post focused on a different area of stats. Some may be general, like this one. Others, especially in-season, may focus on current trends or teams.
Humans are obsessed with finding the best. Whether it is the best president, the best restaurant, or the best musician, there are lists and arguments for all of them. Sports is no different; in fact, sports may be the pinnacle: LeBron vs. Michael, Brady vs. Montana, the 1939 Yankees vs. the 1906 Cubs. The NCAA has used several methods to determine the best team: voting (think BYU, 1984 national champions), computers (hello, BCS), a committee (the current Playoff), and tournaments. Most recently, the NCAA moved from the RPI to the NET to rank basketball teams ahead of the NCAA tournament in March.
So the question is, how does this system work? The short answer is "we don't know." Dan Gavitt, NCAA senior vice president for basketball, said no decision has been made on whether to reveal how NET works to the public. He did say that "formulas are archaic like the RPI was" and that artificial intelligence algorithms are not easy for people to understand. But because we like to challenge the powers that be, and we don't like being told to just trust the people who set up a system that keeps Boise State, Fresno, and UCF out of the College Football Playoff, let's do exactly the opposite of what we were told.
NET is an acronym for NCAA Evaluation Tool, and it is honestly one of the best acronyms ever. More importantly, it is also a predictive-learning model. Predictive learning is a method of machine "learning": the machine attempts to build a model by simulating outcomes across many different situations, comparing expected results with actual results and adjusting as needed. Think of it as a small child trying to solve a problem: they try something, it doesn't work, so they adjust based on what they just observed and experienced. NET is not learning in the human sense of the word. The purpose is to use the known effects of actions to build planning operators, or, in non-geek terms, to discover which models produce the most realistic results by comparing them to actual results. It is not artificial intelligence, and the robot overlords are not coming (yet).
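To make that "compare expected with actual and adjust" loop a little more concrete, here is a toy sketch in Python. To be clear, the NCAA has not published NET's actual math; the team names, scores, home-court bonus, and adjustment rate below are all invented for illustration only.

```python
# A toy predict-compare-adjust loop, purely for illustration.
# The NCAA has NOT published how NET actually works; the fake results,
# home-court bonus, and learning rate below are made-up stand-ins.

games = [
    # (home_team, away_team, home_points, away_points) -- fake data
    ("Aggies", "Wolf Pack", 78, 72),
    ("Wolf Pack", "Broncos", 85, 70),
    ("Broncos", "Aggies", 64, 71),
]

ratings = {team: 0.0 for game in games for team in game[:2]}
HOME_EDGE = 3.0       # assumed home-court advantage, in points
LEARNING_RATE = 0.05  # how far to nudge ratings after each miss

for _ in range(100):                      # keep re-running the schedule
    for home, away, h_pts, a_pts in games:
        predicted = ratings[home] - ratings[away] + HOME_EDGE
        actual = h_pts - a_pts
        error = actual - predicted        # expected vs. real result
        ratings[home] += LEARNING_RATE * error   # adjust...
        ratings[away] -= LEARNING_RATE * error   # ...and rebalance

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

After enough passes, the ratings settle where the predicted margins stop missing the actual margins by much, which is the same "learn by checking your guesses" idea described above.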
The NCAA was gracious enough to release five items that influence the model. The most influential is the Team Value Index. In short, it is about results: did you win or lose, where did you play (home or away), and who did you play, since strength of schedule does play a role.

Next is team efficiency, or points scored and allowed per possession, on both offense and defense. A run-and-shoot team (no defense) may pile up points on offense but still post a lower points-per-possession figure than a slow, grind-it-out team. The major significance of this item is the predictive element: the network predicts scores and efficiency for teams, not players, compares the expected results to the real ones, and then makes adjustments.
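Since points per possession is just arithmetic, a tiny sketch shows why raw scoring can mislead. The numbers are made up, and possession counts are treated as given; the NCAA has not said how NET estimates them.

```python
# Offensive and defensive efficiency as points per possession.
# Possession counts here are assumed to be known; in practice they are
# usually estimated from box-score stats, and NET's exact method is unknown.

def efficiency(points_scored, points_allowed, possessions):
    """Return (offensive, defensive, net) points per possession."""
    off = points_scored / possessions
    def_ = points_allowed / possessions
    return off, def_, off - def_

# A run-and-shoot game: lots of points, but also lots of possessions.
print(efficiency(95, 90, 85))   # ~1.12 scored, ~1.06 allowed per possession

# A slow, grind-it-out game: fewer points on fewer possessions.
print(efficiency(62, 55, 55))   # ~1.13 scored, 1.00 allowed per possession
```

The slow team scores 33 fewer points but is actually a touch more efficient per trip, which is exactly the distinction this item is meant to capture.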
Winning percentage, obviously important, will be considered "not-insignificant," whatever that means. But only wins against Division I schools count; playing Division II cupcakes late in the season, SEC-football style, will not help a NET score.

Adjusted winning percentage is road versus home. Interestingly, the opponent does not factor into this item; a road win is simply worth more than a home win, regardless of who you are playing. A victory counts +1.4 on the road, +1 on a neutral court, and +0.6 at home. A loss counts -0.6 on the road, -1 on a neutral court, and -1.4 at home. So if you are going to play a cupcake, make sure it is a D1 team and do it on the road, because it counts more.

Scoring margin: remember when BCS models included "style points" and a team would run up the score to look better? Well, it is back, to a point, actually to 10 points. Win big and it helps, but all double-digit wins are treated the same because the margin is capped at 10. This is also a big piece of the predictive model. Gavitt relayed, "Ten was the number that was the most optimal to getting the most level of accuracy without going so far that we started to influence the behavior of coaches." In other words, the NCAA doesn't like it when teams pour it on just to improve their chances of getting into the NCAA tournament. Does Utah State football consistently putting up 50 points in the first half, pulling their quarterback for the second half, and still not being ranked in the top 25 sound familiar?

Overtime games get special treatment: the winning team is automatically assigned a +1 margin and the losing team a -1, no matter what the final score is. The idea is that the game was tied at the end of regulation, so a team that loses in overtime is not penalized as if it were blown out.
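The per-game numbers above are simple enough to write down. The sketch below only reproduces the figures the NCAA has described (location-weighted win/loss values, the 10-point cap, the overtime rule); how NET actually combines them into a ranking is not public.

```python
# Game-level values as described above: road/neutral/home win-loss weights,
# the scoring margin capped at 10, and overtime games forced to +/-1.
# How NET weights and combines these numbers is not public.

WIN_VALUE = {"road": 1.4, "neutral": 1.0, "home": 0.6}
LOSS_VALUE = {"road": -0.6, "neutral": -1.0, "home": -1.4}

def adjusted_win_value(won, venue):
    """Location-weighted win/loss credit; the opponent does not matter."""
    return WIN_VALUE[venue] if won else LOSS_VALUE[venue]

def capped_margin(points_for, points_against, overtime=False):
    """Scoring margin, capped at +/-10, forced to +/-1 for overtime games."""
    margin = points_for - points_against
    if overtime:
        return 1 if margin > 0 else -1
    return max(-10, min(10, margin))

print(adjusted_win_value(won=True, venue="road"))    # 1.4
print(adjusted_win_value(won=False, venue="home"))   # -1.4
print(capped_margin(88, 60))                         # 28-point win capped at 10
print(capped_margin(80, 77, overtime=True))          # overtime win counts as 1
```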
Will it help select and seed the best teams for the tournament? Who knows, but probably not. The football committee only has to select 4 teams, and there are always disagreements; choosing 68 schools, many of which are not in the top 68 because they receive automatic bids for winning conference tournaments, is not going to please everyone. But how did the first rankings turn out? "Interesting" would be an understatement: Ohio State was No. 1 and Kentucky was 61st, Nevada was 13th, USU 37th, Fresno State 64th, and San Diego State 96th. The issue was that the human AP poll had Kentucky 10th and Ohio State 16th. Remember, this is a predictive-learning tool. The more games are played and the more data it is given, the more "accurate" the model can become as it learns which results are more realistic. With more data, the most recent model had Virginia, Gonzaga, Michigan, and Duke in the top 4. Ohio State was 26th, with Nevada 9th, Utah State 16th, and San Diego State 45th. Is this model accurate? Will the Mountain West have 3 teams in the tournament? Will Nevada be a 3 seed and Utah State a 5? Or will the NCAA committee ignore the data from their predictive-learning model and choose the big boy schools?