Next: Symmetric coding region Up: Derivation of the second-order Previous: Derivation of the second-order

   
Fundamental MML approximation

The technique used by Wallace and Freeman [Wallace and Freeman, 1987] is to define a coding region $R=[\theta-a(\theta), \theta+b(\theta)]$ about each parameter value $\theta$, so that the continuous model space is quantised into a discrete set of regions. Indeed, the fundamental problem in MML inference theory is how to divide the model space optimally into discrete coding regions, and it is here that MML makes its first approximation to SMML. For continuous distributions, MML's optimal spacing of coding regions is a function of the model $\theta$, whereas SMML partitions the model space according to the data $x$ [Dowe, private communication].

Because similar models are grouped into a single coding region, the length of the first part of the message is not determined by the prior at the model $\theta$ alone. Instead, it is determined by the total prior probability of all alternative models within the coding region. The message length of the model (now measured in nits) is therefore


 \begin{displaymath}
-\log_e \int_ {\theta - a(\theta)}^{\theta+b(\theta)} h(\theta^\prime)d\theta^\prime
\end{displaymath} (3)

where $R=[\theta-a(\theta), \theta+b(\theta)]$ is the coding region and $h(\theta^\prime)$ is the prior density over the parameter.
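As a minimal numerical sketch of Equation (3), the Python below integrates a prior density over the coding region with Simpson's rule and takes the negative natural logarithm. The names `simpson` and `first_part_length`, and the example priors, are illustrative assumptions rather than part of the original derivation. With a uniform prior on $(0, 1)$ the prior mass of $R$ is simply its width $a+b$, so the length reduces to $-\log_e(a+b)$.

```python
import math
from typing import Callable


def simpson(g: Callable[[float], float], lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [lo, hi]."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def first_part_length(prior: Callable[[float], float], theta: float, a: float, b: float) -> float:
    """Equation (3): -ln of the prior mass of R = [theta - a, theta + b], in nits."""
    mass = simpson(prior, theta - a, theta + b)
    return -math.log(mass)


# Hypothetical uniform prior on (0, 1): the mass of R is just its width a + b.
print(first_part_length(lambda t: 1.0, 0.3, 0.05, 0.05))  # approx 2.302585, i.e. -ln(0.1)
```

For a non-uniform prior such as the assumed density $h(\theta^\prime) = 2\theta^\prime$, the same call returns $-\log_e$ of the probability mass that density assigns to $R$.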

Optimally encoding the data given the model then takes $-\log_e f(x\vert\theta^\prime)$ nits, where $\theta^\prime$ is the model stated in the first part of the message. However, $\theta^\prime$ can be any value within the coding region $R=[\theta-a(\theta), \theta+b(\theta)]$. Therefore, the second part of the message is taken to be the expected value of $-\log_e f(x\vert\theta^\prime)$, averaged uniformly over the coding region:


 \begin{displaymath}
E_{\theta^\prime}[-\log_e f(x\vert\theta^\prime)] = \frac{1}{a(\theta)+b(\theta)} \int_{\theta - a(\theta)}^{\theta+b(\theta)} -\log_e f(x\vert\theta^\prime)\, d\theta^\prime
\end{displaymath} (4)
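A matching sketch of Equation (4), again in Python with hypothetical names: the negative log-likelihood is integrated over $R$ and divided by the region's width $a(\theta)+b(\theta)$. The data model (one particular sequence of 20 Bernoulli trials containing 6 successes) is an assumed example, not taken from the source.

```python
import math
from typing import Callable


def simpson(g: Callable[[float], float], lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [lo, hi]."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def expected_data_length(neg_log_lik: Callable[[float], float], theta: float, a: float, b: float) -> float:
    """Equation (4): uniform average of -ln f(x|theta') over R = [theta - a, theta + b]."""
    return simpson(neg_log_lik, theta - a, theta + b) / (a + b)


# Hypothetical data: a sequence of n = 20 Bernoulli trials with k = 6 successes.
n, k = 20, 6


def nll(t: float) -> float:
    """-ln f(x|t) for the assumed Bernoulli sequence, in nits."""
    return -(k * math.log(t) + (n - k) * math.log(1 - t))


at_theta = nll(0.3)
averaged = expected_data_length(nll, 0.3, 0.1, 0.1)
print(at_theta, averaged)  # the average exceeds the value at theta = 0.3
```

Because this negative log-likelihood is convex with its minimum at $\theta = 0.3$, the average over a region centred there exceeds the value at $\theta$ itself; the excess is the cost of stating the model only to within $R$.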

(Note that nits (natural bits) are used here as the unit of information purely for mathematical convenience. A length in nits can always be converted to bits by multiplying it by the constant $\log_2 e \approx 1.443$.)

The total message length is therefore the sum of the first-part length (3) and the expected second-part length (4):


 \begin{displaymath}
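\textrm{MesgLen~} = -\log_e \int_{\theta - a(\theta)}^{\theta+b(\theta)} h(\theta^\prime)\, d\theta^\prime + \frac{1}{a(\theta)+b(\theta)} \int_{\theta - a(\theta)}^{\theta+b(\theta)} -\log_e f(x\vert\theta^\prime)\, d\theta^\prime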
\end{displaymath} (5)
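Putting the two parts together, the sketch below evaluates Equation (5) for a symmetric region of width $w$ about $\theta = 0.3$ (so $a = b = w/2$), using the same hypothetical Bernoulli data and a uniform prior on $(0, 1)$, and converts each result to bits with $\log_2 e$. The function names, data and widths are assumptions for illustration only.

```python
import math
from typing import Callable


def simpson(g: Callable[[float], float], lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [lo, hi]."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def message_length(prior: Callable[[float], float], neg_log_lik: Callable[[float], float],
                   theta: float, a: float, b: float) -> float:
    """Equation (5) in nits: -ln(prior mass of R) + uniform average of -ln f over R."""
    lo, hi = theta - a, theta + b
    first = -math.log(simpson(prior, lo, hi))
    second = simpson(neg_log_lik, lo, hi) / (a + b)
    return first + second


NITS_TO_BITS = math.log2(math.e)  # about 1.4427 bits per nit

# Hypothetical example: 20 Bernoulli trials with 6 successes, uniform prior on (0, 1).
n, k = 20, 6


def prior(t: float) -> float:
    return 1.0


def nll(t: float) -> float:
    return -(k * math.log(t) + (n - k) * math.log(1 - t))


for w in (0.05, 0.2, 0.35, 0.5):  # symmetric region of width w about theta = 0.3
    nits = message_length(prior, nll, 0.3, w / 2, w / 2)
    print(f"w = {w:.2f}: {nits:.3f} nits = {nits * NITS_TO_BITS:.3f} bits")
```

A narrow region makes the first part long, while a wide region lengthens the second part because $\theta^\prime$ strays further from the best-fitting value. For this assumed example, $w = 0.35$ gives the shortest total of the four widths tried, which illustrates why the choice of $a(\theta)$ and $b(\theta)$ matters.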

This equation is the basis of the MML technique. However, Equation (5) is not invariant under reparameterisation of $\theta$ [Dowe, private communication]. An invariant version [Dowe, private communication] is used for the MMLD approximation technique (section [*]).
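The lack of invariance can be checked numerically. The sketch below (hypothetical names and data, as in the earlier examples) evaluates both parts of Equation (5) for the same coding region expressed in $\theta$ and in an assumed reparameterisation $\phi = \theta^2$, with the prior density transformed by the Jacobian $\vert d\theta/d\phi\vert = 1/(2\sqrt{\phi})$.

```python
import math
from typing import Callable, Tuple


def simpson(g: Callable[[float], float], lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g over [lo, hi]."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def two_part_lengths(prior: Callable[[float], float], neg_log_lik: Callable[[float], float],
                     lo: float, hi: float) -> Tuple[float, float]:
    """Return the (first, second) terms of Equation (5) for the region [lo, hi], in nits."""
    first = -math.log(simpson(prior, lo, hi))
    second = simpson(neg_log_lik, lo, hi) / (hi - lo)  # uniform average over the region
    return first, second


# Hypothetical example: 20 Bernoulli trials with 6 successes, uniform prior on theta.
n, k = 20, 6


def prior_theta(t: float) -> float:
    return 1.0


def nll_theta(t: float) -> float:
    return -(k * math.log(t) + (n - k) * math.log(1 - t))


# The same model written in terms of phi = theta**2, i.e. theta = sqrt(phi).
def prior_phi(p: float) -> float:
    return 1.0 / (2.0 * math.sqrt(p))  # uniform prior on theta, times |d theta / d phi|


def nll_phi(p: float) -> float:
    return nll_theta(math.sqrt(p))


lo, hi = 0.125, 0.475  # coding region in theta; it maps to [lo**2, hi**2] in phi
first_t, second_t = two_part_lengths(prior_theta, nll_theta, lo, hi)
first_p, second_p = two_part_lengths(prior_phi, nll_phi, lo ** 2, hi ** 2)

print(f"first part:  {first_t:.6f} vs {first_p:.6f}")
print(f"second part: {second_t:.6f} vs {second_p:.6f}")
```

The first parts agree, because the prior probability mass of a region does not depend on how the parameter is written. The second parts differ, because an average that is uniform in $\phi$ weights larger values of $\theta$ more heavily than an average that is uniform in $\theta$.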


Edmund Lam
2000-12-04