Improved Approximations in MML

Edmund Lam

Supervised by Dr David L. Dowe


Minimum Message Length (MML) is an invariant statistical inference principle which takes an information-theoretic Bayesian approach. MML provides a framework for inferring a model from a set of data. It has been developed as an approximation to Strict MML (SMML), since SMML is generally intractable. The approximations inherent in the Wallace-Freeman (1987) MML estimate are listed and analysed, and improvements are suggested. Three MML approximation techniques (MMLD, asymmetric coding region, fourth-order extension) are introduced, and two (MMLD, fourth-order extension) are implemented to obtain experimental results. Using the Kullback-Leibler and Root-Mean-Square distances, the new techniques, when applied to the binomial distribution, are compared with some earlier MML estimates, including Wallace-Boulton (1968), Wallace-Freeman (1987) and Farr-Wallace (private communication). The results show that MMLD and the fourth-order extension perform better than the Wallace-Freeman (1987) MML estimate under either objective function.
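
As a rough illustration of the two objective functions named in the abstract, the following Python sketch computes the expected Kullback-Leibler distance and a root-mean-square error for point estimates of a binomial parameter. It is not the thesis's code. The function names, the choice to average over all possible success counts, the RMS definition, and the two example estimators are assumptions made for this sketch. In particular, (s + 0.5)/(n + 1) is used only as the form usually quoted for the Wallace-Freeman estimate under a uniform prior.

```python
import math


def kl_bernoulli(p, q):
    """KL distance in nats from Bernoulli(p) to Bernoulli(q), taking 0*log(0) = 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            if b == 0.0:
                return math.inf  # q gives zero probability to a possible outcome
            total += a * math.log(a / b)
    return total


def kl_binomial(n, p, q):
    """KL distance from Binomial(n, p) to Binomial(n, q): n times the per-trial value."""
    return n * kl_bernoulli(p, q)


def expected_distances(n, p, estimator):
    """Average per-trial KL distance and RMS error of estimator(s, n).

    The average is taken over s ~ Binomial(n, p), with 0 < p < 1 the true
    parameter. This averaging scheme is an assumption of the sketch.
    """
    kl = 0.0
    sq = 0.0
    for s in range(n + 1):
        weight = math.comb(n, s) * p**s * (1.0 - p) ** (n - s)
        q = estimator(s, n)
        kl += weight * kl_bernoulli(p, q)
        sq += weight * (p - q) ** 2
    return kl, math.sqrt(sq)


def half_count_estimate(s, n):
    """Assumed form of the Wallace-Freeman binomial estimate under a uniform prior."""
    return (s + 0.5) / (n + 1.0)


def laplace_estimate(s, n):
    """Posterior mean under a uniform prior, included only for comparison."""
    return (s + 1.0) / (n + 2.0)


if __name__ == "__main__":
    for name, est in (("half-count", half_count_estimate),
                      ("laplace", laplace_estimate)):
        kl, rms = expected_distances(20, 0.3, est)
        print(f"{name}: expected KL = {kl:.5f}, RMS = {rms:.5f}")
```

Both estimators stay strictly inside (0, 1), so the KL distance is finite for every success count. The maximum likelihood estimate s/n would give an infinite expected KL distance because it returns 0 or 1 at the extremes.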


Thesis available in gzipped postscript format, Acrobat PDF format and HTML format.
Only the postscript version faithfully reproduces the thesis, although the PDF version does a pretty neat job too.

A downloadable copy of the source code (from the appendices), along with extra scripts and files not found in the thesis, is available. You may view it as individual files or as a tarred-gzipped file. The source code is available under the GNU GPL.

Update - an errata dated Feb/2001 has been released. It corrects typos as well as some serious mistakes. The original thesis above remains unchanged and uncorrected. This first errata is available in both LaTeX format and gzipped postscript format. It was partially sponsored by the Monash Summer Vacation Studentship.