The Probability Models

All the models except LRMCD assume a normally distributed "point spread" random variable for each game. So, the models require a mean and variance estimate for the point spread of each possible match-up. The models use one of the available rating systems and/or the betting lines to estimate the mean point spread.
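Because every model except LRMCD treats the point spread as normal, a win probability is the normal CDF evaluated at mean/sd. A minimal sketch, assuming the spread is the expected margin of team 1 over team 2 (positive favors team 1); `win_probability` is a hypothetical helper name:

```python
from math import erf, sqrt


def win_probability(mean_spread: float, sd: float) -> float:
    """P(team 1 wins) = P(spread > 0) when spread ~ Normal(mean_spread, sd**2)."""
    # Standard normal CDF at mean_spread / sd, expressed with the error function.
    return 0.5 * (1.0 + erf(mean_spread / (sd * sqrt(2.0))))


print(win_probability(0.0, 11.0))            # 0.5: an even matchup
print(round(win_probability(5.0, 11.0), 3))  # a 5-point favorite with SD 11
```

The same helper serves any model that supplies a mean and a standard deviation; only the source of the mean (and, for the Poisson-based models, the variance) changes.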

The "KenPom" model is based on win probabilities calculated using KenPom Adjusted Efficiency Margin (AdjEM) and Adjusted Tempo (AdjT). The mean point spread used is 1.1 * (AdjEM1 - AdjEM2) * (AdjT1 + AdjT2) / 200, and the standard deviation used is 11. The 1.1 factor is the blowout inflation factor from Breiter and Carlin [1]; it adjusts for an apparent bias in these spreads relative to the betting market ("Vegas") spreads.
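The KenPom formula can be written out directly. A sketch using hypothetical ratings (not real teams); the function names are illustrative only:

```python
from math import erf, sqrt

BLOWOUT_FACTOR = 1.1  # blowout inflation factor from Breiter and Carlin [1]
SD = 11.0             # point-spread standard deviation used by the KenPom model


def kenpom_spread(adj_em1, adj_t1, adj_em2, adj_t2):
    """Mean point spread of team 1 over team 2."""
    # AdjEM is a margin per 100 possessions; (AdjT1 + AdjT2) / 200 is the
    # average tempo divided by 100, which rescales that margin to one game.
    return BLOWOUT_FACTOR * (adj_em1 - adj_em2) * (adj_t1 + adj_t2) / 200.0


def kenpom_win_probability(adj_em1, adj_t1, adj_em2, adj_t2):
    mean = kenpom_spread(adj_em1, adj_t1, adj_em2, adj_t2)
    return 0.5 * (1.0 + erf(mean / (SD * sqrt(2.0))))


# Hypothetical ratings: team 1 AdjEM 25, AdjT 68; team 2 AdjEM 15, AdjT 72.
print(round(kenpom_spread(25.0, 68.0, 15.0, 72.0), 2))  # 7.7
print(round(kenpom_win_probability(25.0, 68.0, 15.0, 72.0), 3))
```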

The "KenPomV" model is the KenPom model except that the first round spreads are replaced with the Vegas betting market spreads.
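The substitution only changes where the mean spread comes from. A hedged sketch: the function name is hypothetical, the sketch assumes the standard deviation is left unchanged, and it assumes lines have been converted to team 1's expected margin (sportsbooks usually quote the favorite as negative, e.g. -6.5). The same logic describes MooreV.

```python
from typing import Optional


def v_variant_spread(round_number: int, rating_spread: float,
                     vegas_spread: Optional[float] = None) -> float:
    """Mean spread for a "V" variant (KenPomV, MooreV).

    Round 1 uses the Vegas line; later rounds keep the rating-based spread.
    Both values are expected margins of team 1 over team 2 (assumed sign
    convention: a 6.5-point favorite has vegas_spread = +6.5).
    """
    if round_number == 1:
        if vegas_spread is None:
            raise ValueError("a first-round game needs a Vegas spread")
        return vegas_spread
    return rating_spread


print(v_variant_spread(1, rating_spread=7.7, vegas_spread=6.5))  # 6.5
print(v_variant_spread(2, rating_spread=7.7))                   # 7.7
```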

The "Moore" model is based on win probabilities calculated using Sonny Moore's Ratings. The rating spread is used as the mean point spread, and the standard deviation used is 11.

The "MooreV" model is the Moore model except that the first round spreads are replaced with the Vegas betting market spreads.

The "LRMCD" model is based on win probabilities derived via Equation 8 of Kvam and Sokol [3].

The "LRMCP" model is based on point spreads derived via Equation 6 of Kvam and Sokol [3], with a standard deviation of 10.9. It has been revised to use Bayesian LRMC Ratings, and its multiplicative factor differs from the value in the original paper: the factor is chosen to maximize likelihood.
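Choosing a multiplicative factor by maximum likelihood can be sketched as follows, under stated assumptions: the mean spread is k times a per-game rating difference `diff` (a hypothetical stand-in for whatever Equation 6 of [3] produces), outcomes are scored with the normal model at SD 10.9, and k is found by golden-section search, which works because the normal (probit) log-likelihood is concave in k. The games are made-up data.

```python
from math import erf, log, sqrt

SD = 10.9  # LRMCP standard deviation from the text


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def log_likelihood(k, games):
    """Sum of log P(observed outcome) when the mean spread is k * diff."""
    total = 0.0
    for diff, team1_won in games:
        p = norm_cdf(k * diff / SD)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += log(p if team1_won else 1.0 - p)
    return total


def fit_factor(games, lo=0.0, hi=100.0, iters=200):
    """Golden-section search for the k that maximizes the log-likelihood."""
    g = (sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if log_likelihood(c, games) > log_likelihood(d, games):
            b, d = d, c          # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2.0


# Hypothetical (rating difference, did team 1 win?) pairs.
GAMES = [(0.5, True), (1.0, True), (-0.3, False),
         (0.2, False), (-0.8, True), (1.5, True)]
print(round(fit_factor(GAMES), 3))
```

The data must include some upsets; if every favorite wins, the likelihood keeps rising with k and the search simply returns the upper bound.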

The "Original" model is based on Breiter and Carlin [1]. It assumes a fixed variance derived from historical data. It uses the betting line spreads for the first round and the Sagarin Predictor ratings for later rounds. Breiter and Carlin included a blowout inflation factor of 1.05 for the Sagarin ratings, but the Sagarin Predictor ratings don't need this factor.

The "Sagarin", "Massey", and "Vegas" models are based on Kaplan and Garstka [2]; these assume a matchup-specific variance based on independent Poisson distributions. The "Vegas" model uses "ratings" derived solely from the first round game betting lines (spreads and totals) as described in [2]. The "Sagarin" model uses the Sagarin Predictor ratings, and the "Massey" model uses the Massey ratings.
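The Poisson assumption is what makes the variance matchup-specific. A minimal sketch of the idea, assuming the spread is team 1's expected margin and the total (over/under) is the expected combined score; it omits the further steps of [2], such as deriving team ratings from these rates, and the function names are hypothetical.

```python
from math import erf, sqrt


def vegas_mean_sd(spread, total):
    """Mean and SD of team 1's margin from a point spread and an over/under."""
    mu1 = (total + spread) / 2.0  # expected points for team 1
    mu2 = (total - spread) / 2.0  # expected points for team 2
    # For independent Poisson scores the margin has mean mu1 - mu2 and
    # variance mu1 + mu2, so the variance equals the total.
    return mu1 - mu2, sqrt(mu1 + mu2)


def vegas_win_probability(spread, total):
    mean, sd = vegas_mean_sd(spread, total)
    # Normal approximation to the difference of two Poisson variables.
    return 0.5 * (1.0 + erf(mean / (sd * sqrt(2.0))))


# Hypothetical line: team 1 favored by 6.5 with an over/under of 140.
mean, sd = vegas_mean_sd(6.5, 140.0)
print(mean, round(sd, 2))  # 6.5 11.83
```

Under this sketch a higher-scoring matchup gets a wider spread distribution, which is the sense in which the variance depends on the matchup.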

The "LRMCLog" model is based on the same equation as LRMCP except that the log of the pi values is used. It has its own multiplicative factor, optimized in the same manner for the logs.
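To see what taking logs changes, here is a sketch assuming, purely for illustration, that the equation reduces to a factor times the difference of the two teams' pi values; the exact form of Equation 6 in [3] is not reproduced, and the factors `k` and `k_log` are hypothetical placeholders for the fitted values.

```python
from math import log


def lrmcp_spread(pi1, pi2, k):
    """Illustrative LRMCP-style spread: factor times the difference of pi values."""
    return k * (pi1 - pi2)


def lrmclog_spread(pi1, pi2, k_log):
    """Illustrative LRMCLog-style spread: same form on log(pi); needs pi > 0."""
    return k_log * (log(pi1) - log(pi2))


# Doubling both pi values doubles the linear spread but leaves the log spread
# unchanged, because log(pi1) - log(pi2) depends only on the ratio pi1 / pi2.
print(lrmcp_spread(0.02, 0.01, 500.0), lrmcp_spread(0.04, 0.02, 500.0))
print(lrmclog_spread(0.02, 0.01, 5.0), lrmclog_spread(0.04, 0.02, 5.0))
```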

The "Combo" model uses a linear combination of the probabilities from the KenPomV and LRMCLog models.
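The coefficients of the combination are not given. A sketch assuming a convex combination (non-negative weights summing to one), which keeps the result a valid probability; the default weight is a hypothetical placeholder.

```python
def combo_probability(p_kenpomv, p_lrmclog, weight=0.5):
    """Weighted average of two models' win probabilities for the same team.

    `weight` is hypothetical. With weights in [0, 1] that sum to one, the
    result stays in [0, 1] and the two teams' probabilities still sum to one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_kenpomv + (1.0 - weight) * p_lrmclog


print(round(combo_probability(0.70, 0.60), 3))  # 0.65
```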

The "Futures" model uses the betting market futures for the winner probabilities of the fourth (regional) round and sixth (championship) round, available from www.tradesports.com, as well as the same betting line data used by the Vegas model. The Futures model is similar to the Vegas model, but the ratings used from the second to fourth rounds are fitted to yield the fourth round futures, and the ratings used for the fifth and sixth rounds are fitted to yield the sixth round futures. The Futures model was developed by me, Tom Adams.
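The key step is choosing ratings so that the model reproduces a market futures probability. A toy sketch of that fitting step, with simplifications that are assumptions: a hypothetical 4-team bracket instead of a region, one team's rating adjusted while the others are held fixed, and a fixed spread SD of 11 instead of the Vegas model's Poisson-based variance. Because a team's chance of winning the bracket increases with its own rating, bisection finds the matching rating.

```python
from math import erf, sqrt

SD = 11.0  # simplification: fixed SD rather than a matchup-specific variance


def p_beats(r1, r2):
    """Normal-model probability that a team rated r1 beats a team rated r2."""
    return 0.5 * (1.0 + erf((r1 - r2) / (SD * sqrt(2.0))))


def bracket_win_probability(ratings, team):
    """Chance that `team` wins a 4-team bracket: 0 v 1 and 2 v 3, then a final."""
    semi_opponent = team ^ 1               # pairs (0, 1) and (2, 3)
    x, y = (2, 3) if team < 2 else (0, 1)  # the other semifinal
    reach_final = p_beats(ratings[team], ratings[semi_opponent])
    p_x = p_beats(ratings[x], ratings[y])
    win_final = (p_x * p_beats(ratings[team], ratings[x])
                 + (1.0 - p_x) * p_beats(ratings[team], ratings[y]))
    return reach_final * win_final


def fit_rating(ratings, team, target, lo=-100.0, hi=100.0, iters=100):
    """Bisection on one team's rating so its bracket-win probability equals `target`."""
    fitted = list(ratings)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        fitted[team] = mid
        if bracket_win_probability(fitted, team) < target:
            lo = mid
        else:
            hi = mid
    fitted[team] = (lo + hi) / 2.0
    return fitted


# Hypothetical ratings and a hypothetical 40% futures price for team 0.
ratings = [10.0, 2.0, 5.0, 0.0]
fitted = fit_rating(ratings, team=0, target=0.40)
print(round(fitted[0], 2), round(bracket_win_probability(fitted, 0), 4))
```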

I would guess that the LRMCLog, Sagarin, Original, and Futures models are comparable in performance. But that is a default assumption based on almost no data in the case of the Futures model, since it was first available in 2006. The LRMCD and LRMCP models are borderline underperformers according to the Log Likelihood Test. Kvam and Sokol [3] have shown evidence that the Vegas model (which they call "KG") is an underperformer, and the Vegas model also underperforms according to the Log Likelihood Test.
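The Log Likelihood Test is not defined here. As an assumption, the sketch below shows only the score such a comparison rests on: the sum of the log-probabilities each model assigned to the outcomes that actually happened (closer to zero is better). The probabilities and results are made up.

```python
from math import log


def outcome_log_likelihood(probs, team1_won):
    """Sum of log P(observed result) over a list of games."""
    if len(probs) != len(team1_won):
        raise ValueError("need one probability per game")
    total = 0.0
    for p, won in zip(probs, team1_won):
        total += log(p if won else 1.0 - p)
    return total


results = [True, True, False]  # hypothetical: did team 1 win each game?
model_a = [0.8, 0.6, 0.3]      # hypothetical P(team 1 wins) from one model
model_b = [0.6, 0.5, 0.5]      # hypothetical P(team 1 wins) from another
print(outcome_log_likelihood(model_a, results) > outcome_log_likelihood(model_b, results))  # True
```

A model that is confidently wrong is penalized heavily, since log(p) falls without bound as p approaches zero.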

The Futures and Vegas models are based completely on betting market data, whereas the other models rely, at least in part, on historical data from games prior to the tournament. This might be an advantage for the Futures and Vegas models if injuries, illness, or suspensions cause the historical data to be biased relative to a team's current status. In egregious cases where a star player on a top team is knocked out just before the tournament, I will probably make ad hoc adjustments to the historical-data-based ratings.


References:

[1] "How to Play Office Pools If You Must" by David Breiter and Bradley Carlin (Chance Vol. 10, No. 1, 1997, pp. 5-11)

[2] "March Madness and the Office Pool" by Edward H. Kaplan and Stanley J. Garstka (Management Science Vol. 47, No. 3, March 2001, pp. 369-382)

[3] "Logistic Regression/Markov Chain Model for NCAA Basketball" by Paul Kvam and Joel S. Sokol (Naval Research Logistics Vol. 53, No. 8, December 2006, pp. 788-803)