[01]
>>
Consider a discrete sample space of M unordered values, e.g.
- throw = {head, tail}, M = 2
- base = {A, C, G, T}, M = 4
- roll = {1, 2, 3, 4, 5, 6}, M = 6 (NB. treated as unordered)
- amino acid = {Glycine, Alanine, Valine, Isoleucine, Leucine,
  Phenylalanine, Proline, Methionine,
  Serine, Threonine, Tyrosine, Tryptophan,
  Asparagine, Glutamine, Cysteine,
  Aspartic acid, Glutamic acid, Lysine, Arginine, Histidine}, M = 20
and sequences of these.
This document is online at
http://www.csse.monash.edu.au/~lloyd/Archive/2005-04-Fin-state/index.shtml
and contains hyper-links to other resources.
<<
[02]
>>
The distribution has M-1 parameters,
T_1, T_2, ..., T_{M-1},
i.e. M-1 degrees of freedom.
Also define
T_M = 1 - T_1 - T_2 - ... - T_{M-1}
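The definition of T_M can be illustrated with a minimal Python sketch. The function name `complete_distribution` and the example values are hypothetical, chosen only for illustration; they are not from the original slides.

```python
def complete_distribution(free_params):
    """Given the M-1 free parameters T_1..T_{M-1}, return T_1..T_M.

    The last probability is not free: it is fixed by the requirement
    that all M probabilities sum to 1.
    """
    t_last = 1.0 - sum(free_params)
    if any(t < 0 for t in free_params) or t_last < 0:
        raise ValueError("parameters must be non-negative and sum to at most 1")
    return list(free_params) + [t_last]


# Illustrative values for base = {A, C, G, T}: M = 4, so 3 free parameters.
base_probs = complete_distribution([0.1, 0.2, 0.3])
```

Here `base_probs` has four entries, the last of which is approximately 0.4 (up to floating-point rounding).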
<<
[03]
>>
Estimators
From data, the observed frequencies are
n_1, ..., n_M;
let N = SUM_{i=1..M} n_i.
Maximum likelihood:
T_{i,ML} = n_i / N
(what if n_i = 0?)
Minimum Message Length:
T_{i,MML} = (n_i + 1/2) / (N + M/2)
MinEKL estimator (minimum expected Kullback-Leibler divergence):
T_{i,MinEKL} = (n_i + 1) / (N + M)
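The three estimators can be compared on a small example. The following is a sketch only: the function name `estimates` and the counts are made up for illustration, and exact fractions are used so the results can be checked by hand.

```python
from fractions import Fraction


def estimates(counts):
    """Return (ML, MML, MinEKL) estimates for a list of observed counts.

    Assumes N = sum(counts) > 0; with no data the ML estimate is undefined.
    """
    N = sum(counts)
    M = len(counts)
    ml = [Fraction(n, N) for n in counts]
    mml = [(n + Fraction(1, 2)) / (N + Fraction(M, 2)) for n in counts]
    minekl = [Fraction(n + 1, N + M) for n in counts]
    return ml, mml, minekl


# Hypothetical counts of A, C, G, T in a sequence of length N = 4.
ml, mml, minekl = estimates([3, 0, 1, 0])
# ml     = 3/4,  0,    1/4, 0
# mml    = 7/12, 1/12, 1/4, 1/12
# minekl = 1/2,  1/8,  1/4, 1/8
```

This bears on the question of n_i = 0: the maximum likelihood estimate assigns probability zero to any value not yet observed, whereas the MML and MinEKL estimates keep every probability strictly positive.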
<<
[04]
>>
- discrete sample spaces (as seen) and also
- model of the "class" attribute in supervised classification
- sub-model in a 1st-order Markov model
- proportions of the classes in a mixture model
(unsupervised classification)
- frequency of transitions out of a state in a
Probabilistic Finite State Automaton
(PFSA, hidden Markov model, HMM) . . .
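As an illustration of the Markov-model and transition-frequency uses, here is a minimal sketch for a 1st-order Markov model over bases, where the "state" is simply the previous symbol. It assumes, for illustration only, that each state's outgoing transitions are treated as a separate multistate distribution and estimated independently with the (n_i + 1/2)/(N + M/2) formula; the function name `transition_estimates` and the sequence are hypothetical.

```python
from collections import Counter
from fractions import Fraction

ALPHABET = "ACGT"


def transition_estimates(sequence, alphabet=ALPHABET):
    """Estimate P(next | previous) for every pair of symbols.

    Each previous symbol gets its own M-state distribution, estimated
    from the counts of the symbols that followed it.
    """
    M = len(alphabet)
    counts = {a: Counter() for a in alphabet}
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1

    table = {}
    for prev in alphabet:
        N = sum(counts[prev].values())  # transitions seen out of this state
        table[prev] = {
            nxt: (counts[prev][nxt] + Fraction(1, 2)) / (N + Fraction(M, 2))
            for nxt in alphabet
        }
    return table


# Hypothetical sequence; its transitions are AC, CG, GT, TA, AC, CG, GA.
table = transition_estimates("ACGTACGA")
```

For this sequence, `table["A"]["C"]` is 5/8 and `table["A"]["A"]` is 1/8, and every row of the table sums to 1. In a hidden Markov model the states are not observed directly, so the counts would have to come from some inference procedure rather than simple tallying as done here.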
<<
[05]
>>
Note: there is a finite number of transitions out of each state of the automaton.