The 2nd (data) part of message,
-ln f(x|theta)
Define:
d2
F(theta) = Ex( -------- -ln f(x|theta) )
d theta2
NB. f(x|theta) = P(x|theta),
i.e. prob' of data given theta, for discrete data.Ex, expectation.The first (estimate) part of message:
h(theta) is prior probability density
function of theta.
State theta to ±s/2, assume s small, & h(theta) does not vary much over [theta-s/2, theta+s/2]
- ln( h(theta).s ) nits
theta' = theta + t, where -s/2<t<s/2
- ln f(x|theta')
= - ln f(x|theta + t)
d
= -ln f(x|theta) +t -------(-ln f(x|theta))
d theta
1 d2
+ - t2 --------(-ln f(x|theta)) + ...
2 d theta2
s2 d2
-ln f(x|theta) + --.--------( -ln f(x|theta))
24 d theta2
Add two parts of message together:
- ln(h(theta).s) - ln f(x|theta) s2 d2 + --.--------( -ln f(x|theta)) 24 d theta2differentiate w.r.t. s and set to zero
s2 = 12 / F(x,theta)
d2
where F(x,theta) = -------- -ln( f(x|theta))
d theta2
(NB. F(x,theta) is not F(theta), but the two are related...)
But this depends on x, which the receiver does not know.S2 = 12/( Ex f(x|theta).F(x,theta) ) = 12/F(theta)as x ranges over the data-space X. Both transmitter and receiver can evaluate F(theta).
1
- ln h(theta) -ln f(x|theta) + -ln F(theta)
2
1 1 F(x,theta)
- -ln 12 + -.----------
2 2 F(theta)
"what is usually done is to replace the last term [...] by 1/2"
(-Farr 1999 p41),
a reasonable approximation if
F(x,theta)-F(theta) is small over [theta-s/2, theta+s/2].
1
- ln h(theta) - ln f(x|theta) + -ln F(theta)
2
1 1
- -ln 12 + -
2 2
theta = <theta1, theta2, ..., thetan,>
F(x,theta)ij
d2
= -----------------( -ln f(x|theta))
d thetai d thetaj
F(theta) = SUMx:X f(x|theta).F(x,theta)
F(x,theta) and F(theta) are n×n matrices.
1
-ln h(theta) - ln f(x|theta) + -ln F(theta)
2
+ -(1 + ln(kn)) nits
where
kn are lattice constants
(re partitioning parameter space),
k1=1/12 and
kn->1/(2 pi e) = 0.0585498n->infinity
(Farr 1999 p43).
Strict MML (SMML) makes no simplifying assumptions, but may be mathematically and algorithmically difficult.
Some sources: