next up previous contents
Next: Extension onto multi-variate distributions Up: MMLD Previous: MMLD

   
Derivation of optimal univariate coding region

As in section [*], the first step to finding the optimal coding region R is to specify the message length expression. The message length expression is an invariant version of equation ([*]).


 \begin{displaymath}
\textrm{MesgLen~} = -\log_e \int_R h(\theta) d\theta - \frac...
...t_R h(\theta)} \int_R h(\theta) \log_e f(x\vert\theta) d\theta
\end{displaymath} (19)

The expression is then minimised with respect to $\theta$.


 \begin{displaymath}
\begin{split}
0 &= \frac{-h(\theta)}{\int_R h(\theta) d\thet...
...eta} \int_R h(\theta)\log_e f(x\vert\theta) d\theta
\end{split}\end{displaymath} (20)

This can be interpreted as the ``negative-log-likelihood'' at the boundary of the coding region is equal to the expected (with respect to the prior) ``negative-log-likelihood'' throughout the region plus one [#!Dowe:private!#]. The optimal coding region is calculated from repeatedly solving equation ([*]) until a convergent coding region is found [#!Dowe:private!#]. This convergent coding region is the optimal coding region R [#!Dowe:private!#]. Therefore, an efficient algorithm to find the region would be as follows -

1.
Let the region Ri be the interval [ai, bi] and i = 0
2.
Let the initial region R0 be the maximum likelihood point (ie $a_0 = b_0 = \theta_{\textrm{maximum~likelihood}}$)
3.
Find the expected negative-log-likelihood value within the region.

\begin{displaymath}\textrm{Avg~} = -\bigg[\int_{R_i} h(\theta) d\theta \bigg]^{-1}
\int_{R_i} h(\theta)\log_e f(x\vert\theta) d\theta
\end{displaymath} (21)

4.
Find ai+1 such that $\textrm{Avg~} + 1 = - \log_e f(x\vert a_{i+1})$
5.
Find bi+1 such that $\textrm{Avg~} + 1 = - \log_e f(x\vert b_{i+1})$
6.
Until the region Ri converges (to a steady state), i = i+1 then go to step 3.

In practice, this method converges to a solution (the optimal region R) steadily. This numerical algorithm has shown itself to be numerically stable and well-behaved in experiments.

In numerical experiments later in this thesis, Simpson's rule has been used to integrate the expected ``negative-log-likelihood'' function. First order integration techniques (such as midpoint or trapezoidal rule) results in an unacceptably slow convergence rate compared with second order techniques. Step 4 was achieved using the bisection algorithm within the interval [amin, ai] and similarly for step 5 (within an interval of [bi, bmax]) where amin is the smallest value within the parameter-space and bmax is the largest. For the binomial problem later, amin = 0 and bmax = 1.


next up previous contents
Next: Extension onto multi-variate distributions Up: MMLD Previous: MMLD
Edmund Lam
2000-12-04