This thesis introduces improvements in Minimum Message Length (MML) inference techniques. MML provides an objective measure to measure the ``goodness'' of a theory. This measure encompasses both the accuracy and the complexity of the theory. Specifically, MML avoids the problem of over-fitting by ensuring that an overly complex theory is rejected in preference to a simpler theory when there is only a limited amount of data available to justify the theory. This means that in the presence of an infinite amount of training data, MML inference degenerates back to the case of traditional inference, whereby the most accurate model is seen as the best. For a fixed number of parameters, this is the same as Maximum Likelihood theory.
MML inference theory is a statistical inference theory which extends the work of Solomonoff [#!Solomonoff:1964!#], Chaitin [#!Chaitin:1966!#] and Kolmogorov [#!Kolmogorov:1965!#]. However, rather than using Universal Turing Machines (UTM) as its model of computation, it has its basis in Bayesian statistics [#!Dowe.Farr:1997!#] and information theory [#!Shannon:1948!#,#!Shannon.Weaver:1959!#]. As such, using Turing Machines is merely a special case of the MML approach, when the prior knowledge - as required in any Bayesian approach - is the properties of the UTM used [#!Wallace.Dowe:1999a!#]. However, the general MML principle goes back to Wallace and Boulton [#!Wallace.Boulton:1968!#]. Similar ideas have been subsequently redeveloped by Rissanen [#!Rissanen:1978!#] as well as by Maciejowski [#!Maciejowski:1978!#,#!Maciejowski:1979!#,#!Maciejowski:1980!#].
MML encodes the data into a two part message. The first part contains the inferred hypothesis from the data. The second part of the message contains the data, encoded using the hypothesis. The hypothesis in the first part does not have to be the ``one true'' model. As long as it produces an overall message with the shortest encoding, it is considered a good model.
The MML approximation technique, has been developed as an approximation to the more rigorous form Strict MML (SMML) and is required for many practical problems [#!Wallace.Dowe:1993!#,#!Wallace.Dowe:2000!#]. SMML being mathematically elegant, but computationally intractable for all but the simplest of problems [#!Farr.Wallace:1997!#]. This has limited the widespread usage of SMML. The term MML is used to label both the general inference principle (i.e. minimal encoding of a two-part message) and the practical approximation technique to SMML.
This thesis centres upon the approximation technique by Wallace and Freeman [#!Wallace.Freeman:1987!#]. The derivation is broken down - step-by-step - and the assumptions inherent within the technique highlighted. More accurate MML estimates are derived by varying the assumptions. The performance of the new estimates are quantified and compared using two objective metrics. These estimators will be compared with Wallace and Freeman [#!Wallace.Freeman:1987!#], Wallace and Boulton [#!Wallace.Boulton:1968!#] and a variation of Wallace and Freeman called Farr-Wallace [#!Farr:1999!#]. Wallace and Boulton's estimate is actually the Minimum Expected Kullback-Leibler Distance (MEKLD) [#!Dowe.Baxter.Oliver.Wallace:1998!#] and does not minimise the message length. Nonetheless, it can be used as a good MML estimate, just as a MML estimate can be used (not perfectly) as a good MEKLD estimator. The new approximation techniques introduced include the fourth-order extension [#!Dowe:private!#] to the Wallace and Freeman's estimate [#!Wallace.Freeman:1987!#] and the new approximation technique MMLD [#!Dowe:private!#]. These techniques will be applied to the binomial distribution and experimental results presented. Appendix A contains some information about these techniques applied to the Poisson distribution and Appendix B with the Gaussian distribution.