The Probability Models

All the models except LRMCD assume a normally distributed "point spread" random variable for each game. So, the models require a mean and variance estimate for the point spread of each possible match-up. The models use one of the available rating systems and/or the betting lines to estimate the mean point spread.

The "LRMCP" model is based on point spreads derived via Equation 6 of Kvam and Sokol [3] and using a standard deviation of 11.
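As a rough illustration of how a normally distributed point spread turns into a win probability, the sketch below computes the chance that the spread is positive. The function name and the example spread of 5.5 points are hypothetical; only the standard deviation of 11 comes from the text above.

```python
from math import erf, sqrt

def win_probability(mean_spread, sd=11.0):
    """P(point spread > 0) when the spread is normal with the given mean and sd."""
    # Standard normal CDF evaluated at mean_spread / sd, written with erf.
    return 0.5 * (1.0 + erf(mean_spread / (sd * sqrt(2.0))))

# Hypothetical example: a team favored by 5.5 points.
print(round(win_probability(5.5), 3))
```

An even match-up (mean spread of 0) gives a probability of 0.5, and swapping the sign of the spread gives the opponent's probability.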

The "Original" model is based on Breiter and Carlin [1]. It assumes a fixed variance derived from historical data. It uses the betting line spreads for the first round. It uses the Sagarin Predictor ratings (with a 1.05 correction factor applied) for later rounds.

The "Sagarin", "Massey", and "Vegas" models are based on Kaplan and Garstka [2]; these treat the two teams' scores as independent Poisson random variables, which gives a matchup-specific variance for the point spread. The "Vegas" model uses "ratings" derived solely from the first round game betting lines (spreads and totals) as described in [2]. The "Sagarin" model uses the Sagarin Predictor ratings and the "Massey" model uses the Massey ratings.
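One way to see where a matchup-specific variance comes from: if the two teams' scores are independent Poisson variables, the variance of their difference is the sum of the two expected scores, that is, the expected total. The sketch below is my reading of that idea, not necessarily the exact formula in [2]. It assumes the expected scores are recovered from a spread and a total, and it uses a normal approximation for the difference. The function name and the example line are hypothetical.

```python
from math import erf, sqrt

def poisson_win_probability(spread, total):
    """Approximate P(team A outscores team B) under independent Poisson scores.

    spread: expected margin for A (A's mean score minus B's mean score).
    total:  expected combined score of both teams.
    """
    mean_a = (total + spread) / 2.0
    mean_b = (total - spread) / 2.0
    # Variance of a difference of independent Poissons is the sum of the means.
    variance = mean_a + mean_b
    z = (mean_a - mean_b) / sqrt(variance)
    # Normal approximation to the distribution of the score difference.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical line: A favored by 6 with a total of 144 (standard deviation 12).
print(round(poisson_win_probability(6.0, 144.0), 3))
```

Under this sketch a higher-scoring match-up has a larger spread variance, so the same point spread implies a win probability closer to 0.5.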

The "Futures" model uses the betting market futures for the winner probabilities for the fourth (regional) round and sixth (championship) round available from www.tradesports.com, as well as the same betting line data used by the Vegas model. The Futures model is similar to the Vegas model, but the ratings used from the second to fourth round are fitted to yield the fourth round futures, and the ratings used for the fifth and sixth rounds are fitted to yield the sixth round futures. The Futures model was developed by me, Tom Adams.
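Fitting ratings to futures needs a way to go from ratings to the probability that each team wins a given round. The sketch below shows only that forward calculation for a small bracket; the fitting procedure itself is not described here and is not shown. For simplicity it assumes a normal spread with a fixed standard deviation of 11 rather than the Poisson-based variance, and the function names and sample ratings are hypothetical.

```python
from math import erf, sqrt

def win_probability(rating_a, rating_b, sd=11.0):
    """P(A beats B) for a normal spread with mean rating_a - rating_b."""
    return 0.5 * (1.0 + erf((rating_a - rating_b) / (sd * sqrt(2.0))))

def advancement_probabilities(ratings, sd=11.0):
    """Per round, the probability that each team wins that round.

    ratings are listed in bracket order; the length must be a power of two.
    """
    n = len(ratings)
    reach = [1.0] * n  # probability of each team playing in the current round
    rounds = []
    block = 1  # number of teams that could fill each slot in this round
    while block < n:
        nxt = [0.0] * n
        for i in range(n):
            # Possible opponents sit in the neighboring block of the bracket.
            start = ((i // block) ^ 1) * block
            for j in range(start, start + block):
                p = win_probability(ratings[i], ratings[j], sd)
                nxt[i] += reach[i] * reach[j] * p
        reach = nxt
        rounds.append(reach)
        block *= 2
    return rounds

# Hypothetical four-team bracket: (team 0 vs 1) and (team 2 vs 3), then a final.
rounds = advancement_probabilities([10.0, 0.0, 3.0, -2.0])
print([round(p, 3) for p in rounds[-1]])
```

A fit would adjust the ratings until the probabilities for the target round match the market futures for that round.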

I would guess that the LRMCD, LRMCP, Sagarin, Original, and Futures models are comparable in performance. But in the case of the Futures model that is a default assumption based on almost no data, since the model was first available in 2006. Kvam and Sokol [3] have shown evidence that the Vegas model (which they call "KG") is an underperformer.

The Futures and Vegas models are based completely on betting market data, whereas the other models rely, at least in part, on historical data from games prior to the tournament. This might be an advantage for the Futures and Vegas models if injuries, illness, or suspensions cause the historical data to be biased relative to a team's current status. In egregious cases where a star player on a top team is knocked out just before the tournament, I will probably make ad hoc adjustments to the historical-data-based ratings.

Go here to compare the probabilities generated by the various models.

References:

[1] "How to Play Office Pools If You Must" by David Breiter and Bradley Carlin (Chance Vol. 10, No 1, 1997, pp. 5-11)

[2] "March Madness and the Office Pool" by Edward H. Kaplan and Stanley J. Garstka (Management Science Vol. 47, No 3, March 2001, pp. 369-382)

[3] "Logistic Regression/Markov Chain Model for NCAA Basketball" by Paul Kvam and Joel S. Sokol (Naval Research Logistics Vol. 53, No 8, December 2006, pp. 788-803)