HYPOTHESIS TESTING

Minimum Message Length

Minimum Message Length (MML) is a statistical inference technique grounded in information theory, originating in work by C. Wallace and D. Boulton [1]. A set of data is encoded as a binary string in two parts: first a string stating the hypothesis, then a string stating the data encoded under that hypothesis:

[Message for Hypothesis] + [Message for Data given Hypothesis]

The principle states that the best model for the data is the one which yields the shortest such two-part string. This provides a natural resolution to the problem of comparing models of differing order, in that a more complex model will only be considered worthwhile if the increased cost of stating the extra parameters is more than offset by a saving in stating the data.
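The trade-off described above can be sketched for coin tosses. This is an illustrative toy, not the coding scheme of [1]: the function names, the one-bit cost for naming the model, and the uniform 16-point grid used to state the parameter are all assumptions made for the example.

```python
import math


def fixed_model_bits(n_trials):
    """Two-part length in bits for the zero-parameter hypothesis p = 0.5."""
    model_bits = 1.0             # assumed: one bit names which model is in use
    data_bits = float(n_trials)  # each toss costs -log2(0.5) = 1 bit
    return model_bits + data_bits


def one_parameter_bits(n_successes, n_trials, grid_size=16):
    """Two-part length in bits when p is stated as one of grid_size values."""
    model_bits = 1.0                   # assumed: one bit names the model
    param_bits = math.log2(grid_size)  # cost of stating which grid value
    best_data_bits = math.inf
    for i in range(grid_size):
        p = (i + 0.5) / grid_size      # grid midpoints, so p is never 0 or 1
        data_bits = -(n_successes * math.log2(p)
                      + (n_trials - n_successes) * math.log2(1.0 - p))
        best_data_bits = min(best_data_bits, data_bits)
    return model_bits + param_bits + best_data_bits


def preferred_model(n_successes, n_trials):
    """Return the name of the hypothesis giving the shorter total message."""
    fixed = fixed_model_bits(n_trials)
    fitted = one_parameter_bits(n_successes, n_trials)
    return "fixed" if fixed <= fitted else "one-parameter"


if __name__ == "__main__":
    for heads in (10, 18):
        print(heads, "heads in 20 tosses:", preferred_model(heads, 20))
```

With 10 heads in 20 tosses the fixed model gives the shorter message, because the extra parameter costs more bits than it saves. With 18 heads in 20 the saving in stating the data outweighs the cost of the parameter, so the one-parameter model is preferred.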

An important result is an approximation to this two-part message length, derived by C. Wallace and P. Freeman [2] using Taylor series expansions. The Wallace-Freeman approximation is:

    MessLen(θ, x) ≈ -log h(θ) - log f(x|θ) + ½ log F(θ) + K

where h(θ) is the prior density on the parameters, f(x|θ) is the likelihood function, F(θ) is the Fisher information (the determinant of the Fisher information matrix when there are several parameters) and K is a constant that depends on the number of parameters.
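As a rough numerical illustration, the sketch below evaluates the approximation in its commonly quoted form, message length = -log h(θ) - log f(x|θ) + ½ log F(θ) + K, for a single Bernoulli parameter p. Several choices here are assumptions made for the example rather than taken from this page: a uniform prior h(p) = 1, the one-parameter constant K = ½(1 + log(1/12)), natural logarithms (so lengths are in nats), and the function names themselves.

```python
import math

# Assumed one-parameter constant: K = (1/2) * (1 + log(kappa_1)), kappa_1 = 1/12.
K_ONE_PARAM = 0.5 * (1.0 + math.log(1.0 / 12.0))


def wf_message_length(p, n_successes, n_trials):
    """Approximate message length in nats for a Bernoulli sequence."""
    neg_log_prior = 0.0  # assumed uniform prior h(p) = 1 on (0, 1)
    neg_log_likelihood = -(n_successes * math.log(p)
                           + (n_trials - n_successes) * math.log(1.0 - p))
    fisher = n_trials / (p * (1.0 - p))  # Fisher information for n Bernoulli trials
    return neg_log_prior + neg_log_likelihood + 0.5 * math.log(fisher) + K_ONE_PARAM


def minimise_on_grid(n_successes, n_trials, steps=10000):
    """Return the grid value of p in (0, 1) with the shortest message length."""
    candidates = (i / steps for i in range(1, steps))
    return min(candidates, key=lambda p: wf_message_length(p, n_successes, n_trials))


if __name__ == "__main__":
    k, n = 7, 20
    p_hat = minimise_on_grid(k, n)
    print("minimising p:", p_hat)
    print("message length (nats):", wf_message_length(p_hat, k, n))
```

Under these assumptions the minimising value lies close to (k + 0.5)/(n + 1) rather than the maximum likelihood value k/n, because the ½ log F(θ) term penalises estimates near 0 and 1, where the parameter would have to be stated more precisely.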

To see the results of the application of the MML principle to various situations, click on a link below. For the derivation of these criteria, the reader is directed to the thesis available in the download section.

  • Z-Test for the mean with a known standard deviation.
  • χ²-Test for the standard deviation.
  • T-Test for the mean with an unknown standard deviation.
  • F-Test for two standard deviations.


[1] Wallace C. and Boulton D. (1968). An information measure for classification. Computer Journal 11: 185-194.
[2] Wallace C. and Freeman P. (1987). Estimation and inference by compact coding. J. Roy. Stat. Soc. B 49: 240-265.