Minimum Message Length, MML 
Inductive Inference and Machine Learning by Minimum Message Length (MML) encoding.
Introduction
For a hypothesis H and data D we have, from Bayes: P(H&D) = P(H).P(D|H) = P(D).P(H|D)
From Shannon's Mathematical Theory of Communication (1949) we know that in an optimal code, the message length of an event E, MsgLen(E), where E has probability P(E), is given by MsgLen(E) = -log_2(P(E)). Applying this to the identity above:

MsgLen(H&D) = MsgLen(H) + MsgLen(D|H) = MsgLen(D) + MsgLen(H|D)

Now in inductive inference one often wants the hypothesis H with the largest posterior probability, i.e. the smallest MsgLen(H|D). MsgLen(H) can usually be estimated well, for some reasonable prior on hypotheses, and MsgLen(D|H) can also usually be calculated. Unfortunately it is often impractical to estimate P(D), which is a pity because it would yield P(H|D). However, for two rival hypotheses H and H', MsgLen(D) cancels in the difference:

MsgLen(H|D) - MsgLen(H'|D) = MsgLen(H) + MsgLen(D|H) - MsgLen(H') - MsgLen(D|H')

Consider
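The cancellation of MsgLen(D) can be seen numerically. The sketch below is only an illustration: the coin-flip data, the two candidate biases, and the equal priors are all assumptions of the example, not taken from the text. It compares two hypotheses by MsgLen(H) + MsgLen(D|H), which is enough to rank them without ever computing MsgLen(D):

```python
import math

def msg_len(p: float) -> float:
    """Optimal code length in bits for an event of probability p (Shannon)."""
    return -math.log2(p)

# Assumed example data: 100 coin flips, 60 heads.
# Rival hypotheses: H says P(head) = 0.5, H' says P(head) = 0.6.
heads, tails = 60, 40
prior_H = prior_H2 = 0.5  # assumed equal priors, 1 bit each

def neg_log_lik(p_head: float) -> float:
    # MsgLen(D|H) for a sequence of independent Bernoulli trials
    return heads * msg_len(p_head) + tails * msg_len(1 - p_head)

len_H = msg_len(prior_H) + neg_log_lik(0.5)    # MsgLen(H)  + MsgLen(D|H)
len_H2 = msg_len(prior_H2) + neg_log_lik(0.6)  # MsgLen(H') + MsgLen(D|H')

# The difference equals MsgLen(H|D) - MsgLen(H'|D): about 2.9 bits favouring H'.
print(len_H - len_H2)
```

With equal priors the comparison reduces to the likelihoods alone; an unequal prior would shift the balance by exactly the difference in MsgLen(H).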
a transmitter T and a receiver R
connected by one of Shannon's communication channels.
T must transmit some data D to R.
T and R may have previously agreed on a code book for hypotheses,
using common knowledge and prior expectations.
If T can find a good hypothesis, H, (theory, structure, pattern, ...) to fit
the data then she may be able to transmit the data economically.
An explanation is a two-part message: the first part states the hypothesis H, and the second part states the data D encoded on the assumption that H is true. The message paradigm keeps us "honest": any information that is not common knowledge must be included in the message for it to be decipherable by the receiver; there can be no hidden parameters.

In general, a more complex hypothesis fits the data better than a simpler one. MML encoding therefore gives a trade-off between hypothesis complexity, MsgLen(H), and goodness of fit to the data, MsgLen(D|H). The MML principle is one way to justify and realise Occam's razor.

Continuous Real-Valued Parameters

When a model has one or more continuous, real-valued parameters, they must be stated to an "appropriate" level of precision. A parameter must be stated in the explanation, and only a finite number of bits can be used for the purpose, as part of MsgLen(H). The stated value will often be close to the maximum-likelihood value, which minimises MsgLen(D|H). If the negative log-likelihood, MsgLen(D|H), varies rapidly for small changes in the parameter, the parameter should be stated to high precision; if it varies only slowly with changes in the parameter, the parameter should be stated to low precision.

The simplest case is the multi-state or multinomial distribution, where the data are a sequence of independent values from such a distribution. The hypothesis, H, is an estimate of the probabilities of the various states (e.g. the bias of a coin or a die). The estimate must be stated to an "appropriate" precision, i.e. in an appropriate number of bits.

Applications and Related Areas
Related Researchers

Dr. Lloyd Allison, Dr. Trevor Dix, Dr. David Dowe, Dr. Kevin Korb, the late Prof. Chris Wallace.

