Problem: Given a multivariate data-set
| attributes | |||||
|---|---|---|---|---|---|
| @1 | @2 | ... | @k | ||
| things | x1 | x11 | x12 | ... | x1k |
| x2 | x21 | x22 | ... | x2k | |
| ... | ... | ... | ... | ... | |
| xn | xn1 | xn2 | ... | xnk | |
Does the data-set consist of one population or of two or more sub-populations (clusters, classes, components, species)?
Is it well described by a mixture of models, one per sub-population?
www.csse.monash.edu.au/~lloyd/Seminars/2005-II/Mixture/index.shtmlestMixture ests dataSet = let -- [estimator]->[dataSpace] -> model of dataSpace -- i.e. [estimator] -> estimator ...
Takes a list of estimators, one per component of the mixture.
memberships (Mix mixer components) =
let -- memberships|Mixture
doAll (d:ds) = prepend (doOne d) (doAll ds) -- all data
doAll [] = map (\x -> []) components
doOne datum = normalise( -- one datum
zipWith (\c -> \m ->
(pr mixer c)*(pr m datum)) [0..] components)
-- pr(c) * pr(datum|c) for class #c = m
in doAll dataSet
Given the components of the mixture, find (fit) the fractional membership weights of things (data) in (to) the components.
randomMemberships =
let
doAll seed [] = map (\_ -> []) ests
doAll seed (_:ds) = -- all data
let
doOne seed [] ans = (seed, normalise ans)
doOne seed (_:ests) ans = -- one datum
doOne (prng seed) ests
((fromIntegral(1+ seed `mod` 10)) : ans)
in let (seed2, forDatum) = doOne seed ests []
in prepend forDatum (doAll seed2 ds)
in doAll 4321 dataSet
Allocate initial pseudo-random (prng) fractional membership weights to things (data), not very interesting.
fit [] [] = [] -- Models|memberships
fit (est:ests) (mem:mems) =
(est dataSet mem) : (fit ests mems)
fitMixture mems =
Mix (freqs2model (map (foldl (+) 0) mems)) -- weights
(fit ests mems) -- components
From the membership weights, calculate mixture-weights of the components and fit components (use the given estimators) to their weighted members.
cycle mx = fitMixture (memberships mx) -- EM step cycles 0 mx = mx cycles n mx = cycles (n-1) (cycle mx) -- n x cycle in mixture( cycles ?? (fitMixture randomMemberships) )
Fit memberships to components; fit components to the memberships. Iterate, either some number of times or until convergence.
-- That's all --