Multistate and Multinomial Distributions


The MML estimator for an M-state distribution gives Ti = (ni + 1/2) / (N + M/2), where ni is the number of observations of state i during a total of N observations, N = ∑i=1..M ni.
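As a minimal sketch (the function name and the sample counts are illustrative, not from this page), the estimator is a one-liner:

```python
def mml_estimate(counts):
    """MML estimates Ti = (ni + 1/2) / (N + M/2) for an M-state distribution."""
    M = len(counts)
    N = sum(counts)
    return [(n + 0.5) / (N + M / 2.0) for n in counts]

print(mml_estimate([3, 1, 0]))  # M=3, N=4: [0.636..., 0.272..., 0.090...]
```

Note that, unlike the maximum-likelihood estimate ni/N, every state gets a strictly positive probability, even a state never observed.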

In general, the volume of the uncertainty region of the MML estimate for k parameters, T = <T1,T2, ...,Tk>, is approximately sqrt(12^k/F(T)), where F(T) is the [Fisher information]. Note that k = M-1 for the multistate distribution.

3-States, M=3

A 3-state source has two parameters, T1 and T2, each in [0,1]; i.e. T = <T1,T2>. It is convenient to define T3 = 1-T1-T2, also in [0,1], but T3 is not a third (free) parameter. We observe n1 occurrences of state 1, n2 of state 2 and n3 of state 3, where N = n1+n2+n3. The likelihood is LH = T1^n1 . T2^n2 . T3^n3, so ...


-log LH = -n1 log T1 - n2 log T2 - n3 log(1-T1-T2)


d/d T1 {-log LH}   = -n1/T1 + n3/(1-T1-T2)

d/d T2 {-log LH}   = -n2/T2 + n3/(1-T1-T2)



d2/dT1^2 {-log LH} = n1/T1^2 + n3/(1-T1-T2)^2

d2/dT2^2 {-log LH} = n2/T2^2 + n3/(1-T1-T2)^2


d2/dT1 dT2 {-log LH} = n3/(1-T1-T2)^2

                     = d2/dT2 dT1 {-log LH}
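These derivatives are easy to check numerically. The sketch below (sample counts and probe point are assumed for illustration) compares the analytic second derivatives of -log LH with central finite differences:

```python
import math

n1, n2, n3 = 5, 7, 3                 # assumed sample counts
t1, t2 = 0.3, 0.4                    # assumed probe point
t3 = 1 - t1 - t2
h = 1e-5

def nll(a, b):                       # -log LH as a function of T1, T2
    return -(n1*math.log(a) + n2*math.log(b) + n3*math.log(1 - a - b))

# central finite-difference approximations to the second derivatives
d11 = (nll(t1+h, t2) - 2*nll(t1, t2) + nll(t1-h, t2)) / h**2
d22 = (nll(t1, t2+h) - 2*nll(t1, t2) + nll(t1, t2-h)) / h**2
d12 = (nll(t1+h, t2+h) - nll(t1+h, t2-h)
       - nll(t1-h, t2+h) + nll(t1-h, t2-h)) / (4*h**2)

print(d11, n1/t1**2 + n3/t3**2)      # both ~ 88.9
print(d22, n2/t2**2 + n3/t3**2)      # both ~ 77.1
print(d12, n3/t3**2)                 # both ~ 33.3
```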

The expectation of n1 over the data space is N.T1, and similarly for n2 and n3, so the Fisher information is ...

F(T) = det | N/T1 + N/T3   N/T3        |
           | N/T3          N/T2 + N/T3 |

       N^2       | (1-T2)/T1   1         |
     = ---- . det|                       |
       T3^2      | 1           (1-T1)/T2 |

       N^2    (1-T2) (1-T1)
     = ---- . (------.------ - 1)
       T3^2     T1     T2

     = (N^2/T3^2) . ((1-T1-T2)/(T1 T2))

     = N^2/(T1.T2.T3)
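A quick numerical check of this determinant (the values of N, T1 and T2 below are assumed for illustration), together with the resulting uncertainty-region size sqrt(12^k/F) for k = 2:

```python
import math

N, t1, t2 = 100, 0.2, 0.5            # assumed example values
t3 = 1 - t1 - t2

# the 2x2 expected Hessian and its determinant
a, b, c = N/t1 + N/t3, N/t3, N/t2 + N/t3
F = a*c - b*b

print(F, N**2 / (t1*t2*t3))          # both 333333.3...
print(math.sqrt(12**2 / F))          # area of the uncertainty region, k = 2
```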

M-States

It can be shown that for M states, i.e. M-1 parameters, with probabilities T1, T2, ..., TM-1, and TM = 1-T1-...-TM-1, T = <T1,...,TM-1>, that F(T) = N^(M-1)/(T1.T2...TM).
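The general result can be spot-checked numerically. This sketch (M, N and the probabilities are assumed) builds the (M-1)x(M-1) expected Hessian, which has N/Ti + N/TM on the diagonal and N/TM off the diagonal, and compares its determinant with N^(M-1)/(T1.T2...TM):

```python
def det(m):
    """Determinant by Gaussian elimination (no pivoting; the Fisher
    matrix here is positive definite, so this is safe)."""
    m = [row[:] for row in m]
    d = 1.0
    for i in range(len(m)):
        d *= m[i][i]
        for j in range(i + 1, len(m)):
            f = m[j][i] / m[i][i]
            for k in range(i, len(m)):
                m[j][k] -= f * m[i][k]
    return d

N = 50
T = [0.1, 0.2, 0.3, 0.4]             # M = 4 states, probabilities sum to 1
M = len(T)
fisher = [[N/T[-1] + (N/T[i] if i == j else 0.0) for j in range(M - 1)]
          for i in range(M - 1)]

prod = 1.0
for t in T:
    prod *= t
print(det(fisher), N**(M-1) / prod)  # both ~ 5.2e7
```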

Demonstration

YET TO ADD 2-PART MESSAGE CALCULATIONS.

Use the HTML FORM below to generate a data sample for specified probabilities and length N. The `code' button calculates message lengths for various codes. NB. the approximations used may break down for very small values of N.



[HTML form: Data Generation (Alphabet, pr, N); Inference (fixed code p', maxLH).]
Don't worry if JavaScript gives a ridiculous number of decimal places.
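In the absence of the interactive form, the data-generation and code-comparison steps can be sketched as follows (the seed, probabilities and N are assumed; the two-part MML message length is omitted here, matching the note above):

```python
import math
import random

def generate(pr, N, seed=42):
    """Sample N symbols 0..M-1 with the given probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(len(pr)), weights=pr, k=N)

def len_fixed(data, p):
    """Message length, in bits, under a fixed code with probabilities p."""
    return -sum(math.log2(p[s]) for s in data)

def len_maxlh(data, M):
    """Length under the code fitted to the sample's own frequencies."""
    N = len(data)
    counts = [data.count(s) for s in range(M)]
    return -sum(n * math.log2(n / N) for n in counts if n > 0)

data = generate([0.5, 0.3, 0.2], 100)
print(len_fixed(data, [1/3, 1/3, 1/3]))  # uniform fixed code
print(len_maxlh(data, 3))                # never longer than any fixed code
```

Note that the maxLH figure omits the cost of stating the fitted probabilities, so it is an optimistic lower bound; a genuine two-part message must add that cost.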

Notes

  • C. S. Wallace & D. M. Boulton. An Information Measure for Classification. Computer Journal 11(2) pp185-194, Aug 1968
    (see the appendix) and ...
  • D. M. Boulton & C. S. Wallace. The Information Content of a Multistate Distribution. J. Theor. Biol. 23 pp269-278, 1969.
    When these papers were written there were different notions of the information content of a sequence in use in the literature. W&B showed that if the calculations were done correctly and if all information was truly taken into account then these notions gave essentially the same answer.
And a delightful piece of trivia about dice via Dean McKenzie [7/1999]:
'Several decades ago, the Harvard statistician Frederick Mosteller had an opportunity to test the [dice-tossing] model against the behavior of real dice tossed by a real person. A man named Willard H. Longcor, who had an obsession with throwing dice, came to him with an amazing offer to record the results of millions of tosses. Mosteller accepted, and some time later he received a large crate of big manila envelopes, each of which contained the results of twenty thousand tosses with a single die and a written summary showing how many runs of different kinds had occurred. "The only way to check the work was by checking the runs and then comparing the results with theory," Mosteller recalls. "It turned out [Longcor] was very accurate." Indeed, the results even highlighted some errors in the then-standard theory of the distribution of runs.'
- Peterson, I. (1998) The Jungles of Randomness. Penguin, London. pp7-8. (originally published 1998 by Wiley, New York).
© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)