Improved Approximations in MML
Edmund Lam
Supervised by Dr David L. Dowe
Abstract:
Minimum Message Length (MML) is an invariant statistical inference principle
which takes an information-theoretic Bayesian approach. MML provides a
framework for inferring a model from a set of data. It has been developed
as an approximation to Strict MML (SMML), since SMML is generally intractable.
The approximations inherent in the Wallace-Freeman (1987) MML estimate
are listed and analysed, and
improvements are suggested. Three MML approximation techniques (MMLD, asymmetric
coding region, fourth-order extension) are introduced, and two (MMLD, fourth-order
extension) are implemented to
extract experimental results. Using the Kullback-Leibler and Root-Mean-Square
distances, the new techniques, when applied to the binomial distribution,
are compared with some earlier MML estimates
including Wallace-Boulton (1968), Wallace-Freeman (1987) and Farr-Wallace
(private communications). The results show that MMLD and the fourth-order
extension perform better, under either objective function,
than the Wallace-Freeman (1987) MML estimate.
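As a rough illustration of the kind of comparison described in the abstract, the sketch below scores a binomial estimator by its expected Kullback-Leibler and Root-Mean-Square distances from a known true parameter. It is not taken from the thesis source code. It assumes a uniform prior, under which the Wallace-Freeman (1987) estimate for s successes in n trials is (s + 1/2)/(n + 1); the function names, the per-trial form of the Kullback-Leibler distance and the averaging over all possible success counts are illustrative choices, and the thesis may define its objective functions differently.

```python
import math


def wf_estimate(s, n):
    """Wallace-Freeman (1987) binomial estimate, assuming a uniform prior."""
    return (s + 0.5) / (n + 1.0)


def kl_bernoulli(p, q):
    """Per-trial Kullback-Leibler distance (in nats) from Bernoulli(p) to Bernoulli(q)."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            if b == 0.0:
                return math.inf  # estimate gives zero probability to a possible outcome
            total += a * math.log(a / b)
    return total


def expected_distances(estimator, p_true, n):
    """Expected KL and RMS distance of an estimator, averaged over all success counts.

    Hypothetical helper: p_true is assumed to lie strictly between 0 and 1.
    """
    exp_kl = 0.0
    mean_sq = 0.0
    for s in range(n + 1):
        # Probability of observing s successes under the true parameter.
        weight = math.comb(n, s) * p_true**s * (1.0 - p_true) ** (n - s)
        estimate = estimator(s, n)
        exp_kl += weight * kl_bernoulli(p_true, estimate)
        mean_sq += weight * (estimate - p_true) ** 2
    return exp_kl, math.sqrt(mean_sq)


if __name__ == "__main__":
    kl, rms = expected_distances(wf_estimate, p_true=0.3, n=20)
    print(f"expected KL = {kl:.5f} nats, RMS = {rms:.5f}")
```

Any other estimator with the same (s, n) signature can be passed to expected_distances for a side-by-side comparison under the same assumptions.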
Thesis:
The thesis is available in
gzipped postscript format,
Acrobat PDF format and
HTML format.
Only the postscript version faithfully reproduces the thesis, although the
PDF version does a pretty neat job too.
A downloadable copy of the source code (from the appendices), along with
extra scripts and files not found in the thesis, is available. You may view it
as individual files or as a
tarred-gzipped file. The source code is available
under the GNU GPL.
Update: an errata, dated Feb/2001, has been released. It corrects typos as well as
some serious mistakes. The original thesis above remains unchanged and uncorrected.
This first errata is available in both LaTeX format
and gzipped postscript format. This errata has been
partially sponsored by the Monash Summer Vacation Studentship.