filler
^up^ [01] >>

Multistate and Multinomial Distributions

Online at   http://www.csse.monash.edu.au/~lloyd/Archive/2005-09-Mstate/index.shtml   with hyper-links to other resources.

[Recall] general message length form & Fisher:

Model complexity, -log(H), depends on precision parameter(s) are stated to.

F(T) = Ex det'( d2/d Ti Tj { -ln f(x|T)})     (the Fisher).

msgLen ~
                       1
-ln h(T) - ln f(x|T) + - ln F(T)
                       2

               1         1
             - - ln 12 + -
               2         2
T estimate a.k.a. theta,   x data,   Ex expectation,   h() prior on param(s),   f(x|T) likelihood.

filler
<< [02] >>

e.g. 3-States, M=3


filler
<< [03] >>
fig Fisher information for two parameters

filler
<< [04] >>

Negative log likelihood

-log LH =

-n1 log T1 - n2 log T2 - n3 log(1-T1-T2)

The Fisher information is the expected value of the determinant of the matrix of 2nd derivatives of the -log likelihood.


filler
<< [05] >>

1st derivates

d/d T1 {-log LH}  = -n1/T1 + n3/(1-T1-T2)

d/d T2 {-log LH}  = -n2/T2 + n3/(1-T1-T2)

2nd derivates

d2/d T12 {-log LH} = n1/T12 + n3/(1-T1-T2)2

d2/d T22 {-log LH} = n1/T22 + n3/(1-T1-T2)2

filler
<< [06] >>

off-diagonal entries

  d2/d T1 d T2 {-log LH} = n3/(1-T1-T2)2

= d2/d T2 d T1 {-log LH}

symmetric, as it happens.

The expection of n1 over the data space is N.T1, similarly for n2 and n3, so the Fisher is ...


filler
<< [07] >>

Fisher:

  |
  |
  |
  |
N/T1+N/T3   N/T3

N/T3        N/T2+N/T3
|
|
|
|

filler
<< [08] >>
  N2
= --
  T32
|
|
|
|
(1-T2)/T1      1

1      (1-T1)/T2
|
|
|
|

filler
<< [09] >>
   N2  (1-T2) (1-T1)
= ---.(------.------ - 1)
  T32    T1     T2


= (N2/T32).((1-T1-T2)/(T1 T2)

= N2/(T1.T2.T3)

-- Fisher for 3-states --

Putting this back in the general form gives our 2-part message length

filler
<< [10] >>
L.Allison

M-States

[It can be shown] that for M-states,

i.e. M-1 parameters, and for probabilities

T1, T2, ..., TM-1, & TM = 1-T1-...-TM-1,

T = <T1,...,TM-1>, that

F(T) = NM-1/(T1.T2...TM).
Putting this back in the general form gives our 2-part message length.

filler
<< [11] >>

The MML estimator

Assuming a uniform prior on T, the only parts of the general form that depend on the estimate for T are shown in red:
                       1
-ln h(T) - ln f(x|T) + - ln F(T)
                       2

                  1         1
                - - ln 12 + -
                  2         2
differentiate these parts w.r.t. Ti   . . .

filler
<< [12] >>

don't forget, TM = 1 - T1 - ... - TM-1


d msgLen
---     = 
d Ti

  ni
- --  
  Ti

  nM
+ --  
  TM

   1
- ---  
  2Ti

   1
+ --- = 0
  2TM
 

ni + 1/2
-------- =  
   Ti

nM + 1/2
---------
   TM

  for i=1..m-1
So
Ti = (ni + 1/2) / (N + M/2),  i=1..m

filler
<< [13]

Summary

For an M-state multistate distribution:

NB.
The no-inference, adaptive-code uses frequencies+1, efficient code, conservative "estimator".
The MML estimator uses frequences+1/2, good code, good estimator.
The maximum likelihood uses raw frequencies, no code specified, too aggressive estimator.

References


© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.
Created with "vi (IRIX)",   charset=iso-8859-1