Extension onto multi-variate distributions

The message length expression can also be extended to multi-variate distributions, although this requires a change in notation and a clarification of the meaning of certain variables. As for univariate distributions, the length of the first part of the message is


 \begin{displaymath}
-\log_e h(\vec{\theta})
\end{displaymath} (43)

where $\vec{\theta}$ is an n-dimensional vector that fully describes the model. As in the one-dimensional case, $h(\vec{\theta})$ is approximated over a set of values. However, that set is now a hyper-cube rather than an interval (an interval being a hyper-cube of dimension one).

The hypercube has sides of length $s_1(\vec{\theta}), s_2(\vec{\theta}), \ldots, s_n(\vec{\theta})$. For ease of calculation, a symmetric cube is assumed ($s_1(\vec{\theta}) = s_2(\vec{\theta}) = \ldots = s_n(\vec{\theta})$), and the prior $h(\vec{\theta})$ is assumed to be uniform over the cube. This means that the volume of the cube is $s(\vec{\theta}) = \prod_{i=1}^n s_i(\vec{\theta}) = s_1(\vec{\theta})^n$. Furthermore, it is assumed that the parameters $\theta_i$ are independent of each other. Therefore, the length of the message (c.f. ([*])) is


 \begin{displaymath}
\textrm{MesgLen~} = -\log_e \int_{cube}
h(\vec{\theta}^\prime) d\vec{\theta}^\prime - \frac{1}{s(\vec{\theta})} \int_{cube}
\log_e f(\vec{x}\vert\vec{\theta}^\prime) d\vec{\theta}^\prime
\end{displaymath} (44)

Applying the midpoint integration rule, the first part of the message can be approximated by


\begin{displaymath}-\log_e \int_{cube} h(\vec{\theta}^\prime)d\vec{\theta}^\prime \approx -\log_e h(\vec{\theta}) s(\vec{\theta})
\end{displaymath} (45)

As before (subsection [*]), the techniques suggested by Wallace [Wallace:2000] can be applied to ensure that $h(\vec{\theta}) s(\vec{\theta})$ does not exceed 1.
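The quality of the midpoint approximation can be checked numerically. The following is a minimal sketch only: the independent zero-mean Gaussian prior, the point `theta` and the side length `side` are illustrative assumptions that do not come from the text, chosen because the exact prior mass in the cube then factorises into one-dimensional error-function terms.

```python
import math

def gaussian_prior(theta, sigma=1.0):
    """Independent zero-mean Gaussian prior density (an assumed, illustrative prior)."""
    n = len(theta)
    norm = (2.0 * math.pi * sigma ** 2) ** (-n / 2.0)
    return norm * math.exp(-sum(t * t for t in theta) / (2.0 * sigma ** 2))

def normal_cdf(v, sigma=1.0):
    """Cumulative distribution function of a zero-mean Gaussian."""
    return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))

def cube_mass_exact(theta, side, sigma=1.0):
    """Exact prior mass in the symmetric cube centred on theta.

    The mass factorises over the axes because the assumed prior is independent.
    """
    mass = 1.0
    for t in theta:
        mass *= normal_cdf(t + side / 2.0, sigma) - normal_cdf(t - side / 2.0, sigma)
    return mass

def cube_mass_midpoint(theta, side, sigma=1.0):
    """Midpoint rule: h(theta) * s(theta), with volume s(theta) = side ** n."""
    return gaussian_prior(theta, sigma) * side ** len(theta)

theta = (0.3, -0.5)   # hypothetical two-parameter model
side = 0.2            # hypothetical common side length of the cube

exact = -math.log(cube_mass_exact(theta, side))
approx = -math.log(cube_mass_midpoint(theta, side))
print("exact first part   :", exact)
print("midpoint first part:", approx)
```

For a cube this small the two lengths differ only slightly, and the difference grows with the side length, which is the regime where the higher order terms become relevant.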

As with the one-dimensional case (subsection [*]), the Taylor series expansion of the negative log-likelihood is taken:


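 \begin{displaymath}
\begin{split}
-\log_e f(\vec{x}\vert\vec{\theta}^\prime) &= -\log_e f(\vec{x}\vert\vec{\theta}) + \sum_{i=1}^n (\theta_i^\prime - \theta_i) \frac{\partial}{\partial\theta_i} (-\log_e f(\vec{x}\vert\vec{\theta})) \\
& \quad + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (\theta_i^\prime - \theta_i)(\theta_j^\prime - \theta_j) \frac{\partial^2}{\partial\theta_i \partial\theta_j} (-\log_e f(\vec{x}\vert\vec{\theta})) \\
& \quad + \frac{1}{6} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n (\theta_i^\prime - \theta_i)(\theta_j^\prime - \theta_j)(\theta_k^\prime - \theta_k) \frac{\partial^3}{\partial\theta_i \partial\theta_j \partial\theta_k} (-\log_e f(\vec{x}\vert\vec{\theta})) \\
& \quad + \frac{1}{24} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \sum_{l=1}^n (\theta_i^\prime - \theta_i)(\theta_j^\prime - \theta_j)(\theta_k^\prime - \theta_k)(\theta_l^\prime - \theta_l) \frac{\partial^4}{\partial\theta_i \partial\theta_j \partial\theta_k \partial\theta_l} (-\log_e f(\vec{x}\vert\vec{\theta})) \\
& \quad + O(\theta_n^5)
\end{split}\end{displaymath} (46)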

The expectations (i.e. the integrals over the hypercube) of the odd-order terms evaluate to zero, as each is an odd function about $\vec{\theta}$ along at least one axis. In the second order case, the terms with $i \ne j$ also evaluate to zero, as they are odd functions about $\vec{\theta}$ along two axes. In the fourth order case there are two scenarios in which the coefficients are non-zero. The first is when $i=j=k=l$ and the other is when the indices form two pairs, $(i=j) \ne (k=l)$. The resulting message length expression is


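 \begin{displaymath}
\begin{split}
\textrm{MesgLen} &= -\log_e h(\vec{\theta})s(\vec{\theta}) - \log_e f(\vec{x}\vert\vec{\theta}) + \sum_{i=1}^n \frac{s_i(\vec{\theta})^2}{24} F_i(\vec{\theta}) \\
& \quad + \sum_{i=1}^n \frac{s_i(\vec{\theta})^4}{1920} G_{ii}(\vec{\theta}) + \sum_{i=1}^n \sum_{j=1, i \ne j}^n \frac{s_i(\vec{\theta})^2 s_j(\vec{\theta})^2}{3456} G_{ij}(\vec{\theta})
\end{split}\end{displaymath} (47)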

where

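 \begin{displaymath}
\begin{split}
\textrm{Let~~} F_i(\vec{\theta}) &= E_{\vec{x}} \Bigg[ \frac{\partial^2}{\partial\theta_i^2} -\log_e f(\vec{x}\vert\vec{\theta}) \Bigg] \\
G_{ij}(\vec{\theta}) &= E_{\vec{x}} \Bigg[ \frac{\partial^4}{\partial\theta_i^2 \partial\theta_j^2} -\log_e f(\vec{x}\vert\vec{\theta}) \Bigg] \\
G_{ii}(\vec{\theta}) &= E_{\vec{x}} \Bigg[ \frac{\partial^4}{\partial\theta_i^4} -\log_e f(\vec{x}\vert\vec{\theta}) \Bigg]
\end{split}\end{displaymath} (48)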

and assuming that $F_i(\vec{x}, \vec{\theta}) \approx F_i(\vec{\theta})$ and $G_{ij}(\vec{x}, \vec{\theta}) \approx G_{ij}(\vec{\theta})$.
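The symmetry argument used for the expansion can be made explicit through the moments of the displacement from the centre of the cube. As a sketch, write $\delta_i = \theta_i^\prime - \theta_i$ (a symbol introduced here for illustration only) and assume each $\delta_i$ is uniformly distributed over $[-s_i/2, s_i/2]$, independently of the others. The standard moments of a uniform variable then give:

```latex
\begin{displaymath}
\begin{split}
E[\delta_i] &= E[\delta_i^3] = 0 \\
E[\delta_i \delta_j] &= E[\delta_i]E[\delta_j] = 0 \quad (i \ne j) \\
E[\delta_i^2] &= \frac{1}{s_i} \int_{-s_i/2}^{s_i/2} \delta^2 \, d\delta = \frac{s_i^2}{12} \\
E[\delta_i^4] &= \frac{1}{s_i} \int_{-s_i/2}^{s_i/2} \delta^4 \, d\delta = \frac{s_i^4}{80} \\
E[\delta_i^2 \delta_j^2] &= E[\delta_i^2]E[\delta_j^2] = \frac{s_i^2 s_j^2}{144} \quad (i \ne j)
\end{split}
\end{displaymath}
```

Combining the last moment with the $1/4!$ Taylor coefficient gives $\frac{1}{24} \cdot \frac{1}{144} = \frac{1}{3456}$, which appears to be the origin of the denominator in the cross term $G_{ij}(\vec{\theta})$.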

The message length can then be minimised with respect to the spacing parameter $s(\vec{\theta})$. Since the parameters are independent, this can be done by minimising the message length with respect to $s_i(\vec{\theta})$ for every $i$:


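 \begin{displaymath}
\begin{split}
0 &= \frac{-1}{s_i(\vec{\theta})} + \frac{s_i(\vec{\theta})}{12} F_i(\vec{\theta}) + \frac{s_i(\vec{\theta})^3}{480} G_{ii}(\vec{\theta}) \\
& \quad + \sum_{j=1, i \ne j}^n \frac{s_i(\vec{\theta}) s_j(\vec{\theta})^2}{864} G_{ij}(\vec{\theta})
\end{split}\end{displaymath} (49)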

Generally speaking, this is a set of n non-linear equations, which is difficult to solve in practice, even though a unique solution exists theoretically. However, the hypercube was earlier assumed to be symmetrical. Therefore, $s_1(\vec{\theta}) = s_2(\vec{\theta}) = \ldots = s_n(\vec{\theta})$ and the set of n non-linear equations collapses into a single non-linear equation. The matrix $G_{ij}$ has also been assumed to be symmetrical. This is reasonable as the $\theta_i$ are assumed to be independent of each other.

Therefore, the optimal spacing can be derived by solving a single instance of equation ([*]), which can be reformulated (with the substitution $z_i(\vec{\theta}) = s_i(\vec{\theta})^2$) as


 \begin{displaymath}
\begin{split}
0 &= z_i(\vec{\theta})^2 \Bigg[\frac{G_{ii}(\vec{\theta})}{480} + \sum_{j=1, i \ne j}^n \frac{G_{ij}(\vec{\theta})}{864} \Bigg] + z_i(\vec{\theta})\frac{F_i(\vec{\theta})}{12} - 1
\end{split}\end{displaymath} (50)

Solving this quadratic for $z_i(\vec{\theta})$ gives


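 \begin{displaymath}
\begin{split}
z_i(\vec{\theta}) &= \frac{-20 F_i(\vec{\theta}) \pm \sqrt{400 F_i(\vec{\theta})^2 + 480 G_{ii}(\vec{\theta}) + \sum_{j=1, i \ne j}^n \frac{800}{3} G_{ij}(\vec{\theta})}}{G_{ii}(\vec{\theta}) + \sum_{j=1, i \ne j}^n \frac{5}{9} G_{ij}(\vec{\theta})}
\end{split}\end{displaymath} (51)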

As a consistency check, it has been verified that when n=1, the message length and spacing equations collapse back to the one-dimensional fourth order case. It is trivial to check by noting the following changes


Edmund Lam
2000-12-04