[01] >>

Fisher Information

Model complexity, -log P(H), depends on the precision to which the parameter(s) are stated
general form based on Fisher information
Fisher information shows the sensitivity of -log P(D|H) to the estimate
high sensitivity: [_________________________]
low sensitivity: [_________________________]

Online at http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/ including links to other resources.

<< [02] >>

Fisher (one parameter)

The 2nd (data) part of the message has length -ln f(x|theta).

Define:

$$F(\theta) = E_x\left[ \frac{d^2}{d\theta^2} \left( -\ln f(x|\theta) \right) \right]$$

NB. f(x|theta) = P(x|theta), i.e. the probability of the data given theta, for discrete data.
NB. E_x denotes expectation over the data x.
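
As an illustration of the definition, the following sketch evaluates F(theta) for a binomial model by summing over the whole data space, and compares it with the closed form n/(theta(1-theta)). The binomial model and the function names are assumptions chosen for the example; they are not part of the slides.

```python
import math

def binomial_pmf(k, n, theta):
    # f(k|theta): probability of k successes in n trials
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

def second_deriv_neg_log_f(k, n, theta):
    # d^2/dtheta^2 of -ln f(k|theta); the binomial coefficient drops out
    return k / theta**2 + (n - k) / (1 - theta)**2

def fisher_information(n, theta):
    # F(theta) = E_x[ d^2/dtheta^2 (-ln f(x|theta)) ], x ranging over 0..n
    return sum(binomial_pmf(k, n, theta) * second_deriv_neg_log_f(k, n, theta)
               for k in range(n + 1))

def fisher_closed_form(n, theta):
    # Known closed form for the binomial, used here only as a check
    return n / (theta * (1 - theta))

if __name__ == "__main__":
    print(fisher_information(10, 0.3), fisher_closed_form(10, 0.3))
```

The two values should agree up to floating-point rounding, which suggests the expectation is being taken as the definition intends.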

<< [03] >>

Prior

The first (estimate) part of message:

h(theta) is prior probability density function of theta.

State theta to ±s/2, assume s small, & h(theta) does not vary much over [theta-s/2, theta+s/2]

Cost to state the estimate is: -ln( h(theta).s ) nits, since h(theta).s approximates the prior probability of the interval.

NB. The integral of h(theta) over all theta is 1.
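
A small sketch of the cost of the first part, assuming a uniform prior h(theta) = 1 on [0, 1] purely for illustration. It shows that halving the precision interval s adds ln 2 nits to the cost of stating the estimate.

```python
import math

def estimate_cost_nits(h_theta, s):
    # Cost of stating theta to precision s: -ln(h(theta) * s),
    # valid when s is small and h is roughly constant over the interval
    return -math.log(h_theta * s)

if __name__ == "__main__":
    coarse = estimate_cost_nits(1.0, 0.1)    # uniform prior, s = 0.1
    fine = estimate_cost_nits(1.0, 0.05)     # same prior, s halved
    print(coarse, fine, fine - coarse)       # difference is ln 2
```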

<< [04] >>

Rules of the game

Transmitter sends an estimate theta from the code book
the estimate represents the interval [theta-s/2, theta+s/2]
choose theta to be optimal on average over the interval
a value in the interval is written theta' = theta + t, where -s/2 < t < s/2

<< [05] >>

theta' = theta + t, where -s/2<t<s/2

$$\begin{aligned}
-\ln f(x|\theta') &= -\ln f(x|\theta + t) \\
&= -\ln f(x|\theta) + t \, \frac{d}{d\theta}\left(-\ln f(x|\theta)\right) + \frac{1}{2} t^2 \, \frac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right) + \dots
\end{aligned}$$

-- Taylor expansion

<< [06] >>

Average over t in [-s/2, s/2]: the linear term vanishes; the integral of t² over [-s/2, s/2] is s³/12, so the average of t² is s²/12.

average is:

$$-\ln f(x|\theta) + \frac{s^2}{24} \cdot \frac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right)$$
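
The averaging step can be checked numerically. The sketch below assumes a binomial likelihood with hypothetical values (k = 3 successes in n = 10 trials, theta = 0.4, s = 0.05); it averages -ln f(x|theta + t) over the interval with a midpoint rule and compares the result with the second-order approximation.

```python
import math

def neg_log_f(k, n, theta):
    # -ln f(k|theta) for a binomial model
    return -(math.log(math.comb(n, k))
             + k * math.log(theta) + (n - k) * math.log(1 - theta))

def second_deriv(k, n, theta):
    # d^2/dtheta^2 of -ln f(k|theta)
    return k / theta**2 + (n - k) / (1 - theta)**2

def averaged_numeric(k, n, theta, s, steps=10000):
    # Midpoint-rule average of -ln f(k|theta + t) for t in [-s/2, s/2]
    total = 0.0
    for i in range(steps):
        t = -s / 2 + (i + 0.5) * s / steps
        total += neg_log_f(k, n, theta + t)
    return total / steps

def averaged_approx(k, n, theta, s):
    # Second-order result: -ln f + (s^2 / 24) * second derivative
    return neg_log_f(k, n, theta) + (s**2 / 24) * second_deriv(k, n, theta)

if __name__ == "__main__":
    print(averaged_numeric(3, 10, 0.4, 0.05), averaged_approx(3, 10, 0.4, 0.05))
```

Here theta = 0.4 is deliberately not the maximum-likelihood value, so the first derivative is non-zero and the example does exercise the claim that the linear term averages out.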

<< [07] >>

Precision, s

Add two parts of message together:


$$-\ln\left(h(\theta)\, s\right) - \ln f(x|\theta) + \frac{s^2}{24} \cdot \frac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right)$$

differentiate w.r.t. s and set to zero
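
Written out, the differentiation step is short. Only the terms -ln s (from the first part) and the s² term (from the averaged second part) depend on s:

```latex
\frac{\partial}{\partial s}\left[ -\ln h(\theta) - \ln s - \ln f(x|\theta)
  + \frac{s^2}{24}\,\frac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right) \right]
= -\frac{1}{s} + \frac{s}{12}\,\frac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right) = 0
\quad\Longrightarrow\quad
s^2 = \frac{12}{\dfrac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right)}
```

This is a minimum provided the second derivative is positive, which is the usual case near a good estimate.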

<< [08] >>

$$s^2 = \frac{12}{F(x,\theta)}$$

where

$$F(x,\theta) = \frac{d^2}{d\theta^2}\left(-\ln f(x|\theta)\right)$$

(NB. F(x,theta) is not F(theta), but the two are related...)

But this depends on x, which the receiver does not know.
Have to use the expected quantity.

$$s^2 = \frac{12}{\sum_{x \in X} f(x|\theta)\, F(x,\theta)} = \frac{12}{E_x\, F(x,\theta)} = \frac{12}{F(\theta)}$$

as x ranges over the data-space X. Both transmitter and receiver can evaluate F(theta).
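
A sketch of the resulting precision, assuming a binomial model for which F(theta) = n/(theta(1-theta)); the model and the function names are illustrative assumptions. It shows s shrinking as the amount of data grows.

```python
import math

def fisher_binomial(n, theta):
    # Expected Fisher information for n binomial trials (assumed model)
    return n / (theta * (1 - theta))

def optimal_precision(fisher):
    # s^2 = 12 / F(theta), so s = sqrt(12 / F(theta))
    return math.sqrt(12.0 / fisher)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, optimal_precision(fisher_binomial(n, 0.5)))
```

For small n the computed s can be a large fraction of the parameter range, which is one sign that the small-s assumption behind the approximation is under strain.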

<< [09] >>

The Message Length

$$-\ln h(\theta) - \ln f(x|\theta) + \frac{1}{2}\ln F(\theta) - \frac{1}{2}\ln 12 + \frac{1}{2} \cdot \frac{F(x,\theta)}{F(\theta)}$$

"what is usually done is to replace the last term [...] by 1/2" (Farr 1999, p41), a reasonable approximation if F(x,theta) is close to F(theta) over [theta-s/2, theta+s/2].

msgLen ~

$$-\ln h(\theta) - \ln f(x|\theta) + \frac{1}{2}\ln F(\theta) - \frac{1}{2}\ln 12 + \frac{1}{2}$$
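
The one-parameter message length can be evaluated directly. The sketch below assumes a binomial model with a uniform prior h(theta) = 1 on [0, 1] and hypothetical data (3 successes in 10 trials); none of these choices come from the slides.

```python
import math

def neg_log_f(k, n, theta):
    # -ln f(k|theta) for a binomial model
    return -(math.log(math.comb(n, k))
             + k * math.log(theta) + (n - k) * math.log(1 - theta))

def fisher_binomial(n, theta):
    # Expected Fisher information for n binomial trials
    return n / (theta * (1 - theta))

def msg_len(k, n, theta, h_theta=1.0):
    # -ln h - ln f + (1/2) ln F(theta) - (1/2) ln 12 + 1/2, in nits
    return (-math.log(h_theta) + neg_log_f(k, n, theta)
            + 0.5 * math.log(fisher_binomial(n, theta))
            - 0.5 * math.log(12) + 0.5)

if __name__ == "__main__":
    for theta in (0.1, 0.3, 0.5, 0.9):
        print(theta, msg_len(3, 10, theta))
```

Scanning theta over a grid and taking the smallest value gives a rough estimate under this approximation; an estimate far from the data, such as theta = 0.9 here, costs many more nits.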

<< [10] >>

Fisher (multiple parameters)

theta = <theta₁, theta₂, ..., theta_n>

$$F(x,\theta)_{ij} = \frac{\partial^2}{\partial\theta_i \, \partial\theta_j}\left(-\ln f(x|\theta)\right)$$


F(theta) = SUM_x:X f(x|theta).F(x,theta)

F(x,theta) and F(theta) are n×n matrices.
The Fisher information is the determinant of F(theta).
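
A two-parameter sketch, assuming a single trial of a three-state (trinomial) model with free parameters p1 and p2 and p3 = 1 - p1 - p2. The model is an assumption chosen because the data space has only three points, so the sum over x can be written out in full. The determinant should come out as 1/(p1.p2.p3).

```python
def hessian_neg_log_f(x, p1, p2):
    # Matrix of second derivatives of -ln f(x|p1,p2), outcome x in {0, 1, 2}
    p3 = 1.0 - p1 - p2
    if x == 0:                       # -ln p1
        return [[1.0 / p1**2, 0.0], [0.0, 0.0]]
    if x == 1:                       # -ln p2
        return [[0.0, 0.0], [0.0, 1.0 / p2**2]]
    c = 1.0 / p3**2                  # -ln(1 - p1 - p2)
    return [[c, c], [c, c]]

def fisher_matrix(p1, p2):
    # F(theta) = SUM over x of f(x|theta) * F(x,theta)
    probs = [p1, p2, 1.0 - p1 - p2]
    F = [[0.0, 0.0], [0.0, 0.0]]
    for x, fx in enumerate(probs):
        H = hessian_neg_log_f(x, p1, p2)
        for i in range(2):
            for j in range(2):
                F[i][j] += fx * H[i][j]
    return F

def det2(F):
    # Determinant of a 2x2 matrix
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]

if __name__ == "__main__":
    F = fisher_matrix(0.2, 0.3)
    print(F, det2(F), 1.0 / (0.2 * 0.3 * 0.5))
```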

<< [11] >>
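msgLen ~

$$-\ln h(\theta) - \ln f(x|\theta) + \frac{1}{2}\ln |F(\theta)| + \frac{n}{2}\left(1 + \ln k_n\right) \text{ nits}$$

where |F(theta)| is the determinant of F(theta). For n=1 and k₁=1/12 the last term is 1/2 - (1/2) ln 12, as in the one-parameter case.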

where k_n are lattice constants (re partitioning the parameter space), k₁ = 1/12 and k_n -> 1/(2 pi e) = 0.0585498... as n -> infinity (Farr 1999 p43).
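
A sketch of the multi-parameter formula, under the assumption that the constant term is (n/2)(1 + ln k_n), which reduces to the one-parameter constants 1/2 - (1/2) ln 12 when n = 1 and k₁ = 1/12. Only the two lattice values quoted above are used; the function names and the sample inputs are hypothetical.

```python
import math

K1 = 1.0 / 12.0                              # lattice constant for n = 1
K_INF = 1.0 / (2.0 * math.pi * math.e)       # limit of k_n as n -> infinity

def lattice_term(n, k_n):
    # Assumed form of the constant term: (n/2) * (1 + ln k_n)
    return 0.5 * n * (1.0 + math.log(k_n))

def msg_len_multi(neg_ln_prior, neg_ln_lik, det_fisher, n, k_n):
    # -ln h - ln f + (1/2) ln det F(theta) + lattice term, in nits
    return (neg_ln_prior + neg_ln_lik
            + 0.5 * math.log(det_fisher) + lattice_term(n, k_n))

if __name__ == "__main__":
    print(K_INF)
    print(lattice_term(1, K1), 0.5 - 0.5 * math.log(12))
    # Illustrative inputs only: n = 2 with the limiting k_n as a stand-in
    print(msg_len_multi(0.0, 5.0, 33.3, 2, K_INF))
```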

<< [12]

Summary

General form for

message length and parameter accuracy
based on Fisher information.

NB. An approximation; it may break down if

h(theta) varies rapidly in region of estimate
e.g. for small amounts of data

Strict MML (SMML) makes no simplifying assumptions, but may be mathematically and algorithmically difficult.

Some sources:

C. S. Wallace & P. R. Freeman. Estimation and Inference by Compact Coding. J. Royal Stat. Soc. 49(3) pp240-265, 1987.
G. Farr & C. S. Wallace. The Complexity of Strict Minimum Message Length Inference. Computer Journal 45(3) pp285-292, 2002; also TR97/321, Department of Computer Science, Monash University, Aug 1997.
G. Farr. Information Theory and MML Inference. School of Computer Science and Software Engineering, 1999.
Also see LA's bibliography.