|
Abstract: This report describes the current state of software, version 2003 09 (tested with ghc 6.0.1) in the project Inductive Inference using Haskell. The main aim of the project is to understand and to model(!), from a programming point of view, "statistical models" as used in artificial intelligence, data mining, inductive inference, machine learning, and statistical inference. Haskell's type-system is used as the analytical tool. Types and type-classes are given for (basic) models including probability distributions, function models including regressions, time-series models including Markov models, mixture models (unsupervised classification, clustering), classification trees (supervised classification), and operators and conversion functions on these. In addition, some important and well-known instances of statistical models and of inference algorithms for them are also given, first to make a general claim for being practical, and also to study their structures and parts. A simple algorithm is sometimes given as a proof of concept; any more sophisticated algorithm(s) can obviously be programmed in Haskell, in principle, but this is not the main aim just now. Keywords: Class, inductive inference, machine learning, model, probability, statistics, type. |
The project was initially motivated by the observation of several postgraduate students and other researchers in machine learning over more than a decade: Often a student devises a new (or extended) "model" for some kind of problem, does some hard mathematics on estimators for the model (i.e. is not just a hacker), implements the ideas in a computer program (usually in C or C++, i.e. can also do practical stuff), compares the results of the program with competitive methods (hopefully beating enough of them often enough), writes a thesis, gets a PhD, and leaves. The programs left behind, not always but often, have rudimentary I/O, use annoyingly different I/O formats from each other, are "delicate" on extreme kinds of data, come with little if any documentation, are hard to move to another platform, and are not as general as they could be. (All this is quite understandable because a postgraduate student is under a lot of pressure to finish quickly and a researcher tends to concentrates on her or his immediate problem; naturally I am guilty too.) Consequently it can be difficult for others to use these programs, or use two or more in combination, or modify or extend them. Many basic components are reinvented over and over again, sometimes with small variations and new bugs etc., e.g. I/O or basic distributions. Opportunities for generalization are lost.
It seemed that if there was a usable "theory" (model, semantics, library, prelude) of the "statistical models" that are produced in inductive inference, it might help to reduce the reinvention of basic components, and to increase sharing of components, which would lead to more reliable, more general and lighter code.
It is therefore necessary to understand just what is a "statistical model", for want of a name, from a programming point of view in artificial intelligence, data mining, inductive inference, machine learning, and/or statistical inference, call the area what you will. It is very desirable to have a theory that can be used -- type-checked, compiled, and run -- and the functional programming language Haskell is the best piece of "programming technology" for this kind of exercise. It was also obvious that polymorphic types and high-order functions would be very useful for programming with statistical models, just as they are for programming with functions. Such ideas, in less general forms, found their ways into application programs (in C, Java and Pascal) for particular inference problems[APD99,ASED00].
On examination, "what is a statistical model from a programming point of view" turns out to be an interesting question. Looking for analogies to the question and possible answers, one thinks of list processing and the body of operators that has grown up for it over 40+ years; it seems likely that programming with statistical models should be at least as rich. "Function" is to "functional programming" as "statistical model" is to what?
This first step of the project is very much a proof of concept, and deliberately makes conservative choices for some options, e.g. sticking with the standard Haskell-98 language, unless there is a strong reason to do otherwise. Note that since the first publications[All03a,All03b], the names of some functions (methods) have been revised, and the straightforward, but important, ability of models to show (print) themselves has been implemented.
The Haskell code itself might or might not grow into a useful package in its own right. It will be nice if it does. Even if it does not, the lessons learned can be incorporated into software written in other languages.
Haskell is a functional programming language development of which began in the 1990s. It features lazy evaluation, parametric polymorphic types with a type-inference algorithm, and type classes -- which allow overloading of operators. Haskell is a pure functional language, i.e. there are no side-effects within a Haskell program. Nevertheless a program can interact fully with the operating system (using monadic I/O) and can create, read and write files etc..
Most code in this report uses the Haskell-98 language [PJ03] in the interests of familiarity and standardisation. However the Haskell community is very active in research of type-systems and many extensions have been proposed, some of them being available in the "Glasgow Haskell Compiler", ghc, under the control of command-line flags. It seems possible that a new Haskell-X will be standardised in the not too distant future. Haskell-98 is powerful enough for most of the requirements of the inductive inference project at this stage, but some experiments, not described here, have investigated the use of type extensions.
Haskell has type-classes to allow ad-hoc polymorphism, i.e. where a function (operator) name is overloaded by different sequences of code to be executed depending on the types of its parameter(s). (Note in contrast that map is uniform and executes the same code regardless of the types t and u. Although note that there is also a class Functor with fmap, and that list ([]) is an instance with map.)
Much active research in functional programming centres around the best form of type classes, their generalisations and uses.
Experience with classes on imperative object-oriented languages such as C++, Java and C# may be some help to the new Haskell programmer, but perhaps less than he or she might think. The notion of an "object" as an instance of a class tends to focus the mind on a "state", but the emphasis is different in functional languages. There is also the fortunate absence of the schizophrenia of some imperative O.O. languages over "basic types" and "classes": In Haskell a value has a type and a type can belong to one or more classes; values, types and classes are in clearly separated worlds.
It is clear that computer science has not yet agreed on the perfect class system. It is also interesting to note for example that Haskell has no built-in class Function (overloading juxtaposition "f x" and "$"), nor class Tuple (overloading "fst", "snd", etc.) nor Pair, nor many other "natural" classes. Some of the reasons must be historic and some technical.
Any system for inductive inference must address the problem of overfitting: In general a more complex model will likely fit data better than a simpler model. For example, a quadratic fitted to points {(xi,yi)}, is likely to give a lower squared error than a straight line. The problem is to decide if the extra complexity of the quadratic over the line worth it. To solve the problem this work uses minimum message length (MML) inference[WB68,WF87,WF02], a Bayesian method based on information theory.
It is important to note that msglen(H) for a hypothesis (model, theory), H, includes the statement of any continuous valued parameters to (finite) optimum precision. Without this, Bayesianism often does not make sense.
It should be clear that MML could, in principle, be replaced by some other "compositional" method of assessing the complexity of models, provided that it can work on combinations of sub-models used to build larger models. (A replacement might not work as well as MML.) It is even conceivable that the method could be "abstracted out" of the algorithms. In any case, the code described works in the MML framework.
02Classes.hs defines the more important types and classes for "statistical models".
Note that wallaceIntModel, a universal model over integers >=0, is defined in this module, rather than the next, for convenience: It is used as the default model for "lengths" when a model of a data-space is converted into a model of sequences over that data-space.
03Models.hs defines some (basic) models, distributions and estimators. As noted above, wallaceIntModel should, logically, be defined in this module but is in the previous one for convenience.
Note that mixture modelling "requires" weighted estimators (the use of unweighted estimators gives biased mixtures in general). A weighted estimator accepts data and also weights, e.g. a datum can be 0.6 present; this is needed for fractional membership of a datum in a component of a mixture model. For this reason weighted and unweighted estimators are defined. (Of course an unweighted estimator could invoke the weighted one with weights of 1.0 -- at a performance penalty. The "best" organisation of estimators, weighted and unweighted, is not yet settled.)
04FnModels.hs defines function-models, including regressions.
05TSModels.hs is for time-series models.
06MixtureModels.hs defines mixture modelling, i.e. clustering, unsupervised classification, numerical taxonomy.
The main business of the module is the definition of estMixture, a simple estimator for a mixture model. It is given two parameters -- a list of weighted estimators and a data-set. In this simple example there is one estimator per component of the mixture. The estimator is therefore told in advance how many components there will be; it is not hard to remove this restriction, in principle, but our primary concern is with types and classes rather than algorithms. (The simple estimator could also be run for 1, 2, 3, ..., up to some reasonable limit number of components, and the message lengths compared, to select the best.) The estimators can be heterogeneous but they must all have the same type; they must all be models of the given data-space.
A gradient descent (~expectation maximization (EM)) algorithm is used. Initially data are allocated (fractional) memberships of components at random. Parameters of the components are (re-)estimated from the memberships. The data are re-allocated memberships of the revised components. This process is iterated "for a while". (A similar algorithm could easily be written for unweighted estimators but it would give biased estimates if it made "deterministic" allocations of memberships; stochastic allocation performs well given enough data.)
A simple test is based on a game played with a silver coin and a gold coin.
If the two-component hypothesis, mix2, has a shorter total message length than a one-component hypothesis, something fishy is going on in the game.
07CTrees.hs defines classification-trees, sometimes called decision-trees. These are also generalised to function-model trees, i.e. regression-trees.
A classification- (decision-) tree consists of leaf-nodes, CTleaf, and fork-nodes, CTfork. A leaf-node contains a model of some target output attribute(s), and a fork-node contains a test, a Splitter, on one (or potentially more) input attribute. A classification-tree is therefore an implementation of a function-model.
There is an opportunity for a class Function above. (The CTforkFM option of CTreeType is experimental and little exploited. Its purpose is to allow function-models to control mixtures over the sub-trees.)
The estimator for a classification-tree takes four parameters: an estimator for the leaf models, a function that produces lists of possible Splitters given an input data-set, an input data-set, and a corresponding output data-set. (There are cases where it would be more convenient for input and output pairs to come zipped and other cases where it is better for them to come unzipped.)
Note that the splits parameter could be removed, that is made implicit, thanks to class Splits -- see 01Utilities.hs.
Presently, the simplest zero-lookahead search algorithm is implemented: A one leaf tree is compared with each possible one-fork (+leaves) tree, and the best tree, in MML terms, is kept. If the one-leaf tree wins then that is the end of the search. Otherwise the search is continued recursively for each sub-tree with its corresponding input and output data sets. (Zero-lookahead search can form the basis of 1-lookahead search, and hence of k-lookahead search, in the obvious way.) The present MML calculations are "acceptable" but simplified from the original method[WP93] and they should be "tightened up". (The present algorithm can easily be speeded up in some obvious ways but is fast enough for our purposes, particularly on categorical data.)
Note that a leaf model can be absolutely any model of
any ouput-space -- be it discrete, continuous, multivariate --
for which there is an estimator,
or even a function-model converted to a (basic) model
(see 02Classes.hs)
which leads to a function-model tree, i.e. a regression-tree.
This is more general than classification-trees and classification-graphs
produced in CSSE to date, and more general than
C4.5
The code is an experimental prototype; there are obvious gaps to be filled, and algorithmic improvements to be made etc. but this is justified considering the primary aim. Some extensions, e.g.[FAC03], are under consideration. The greatest potential, and the biggest challenge, is to use Haskell's type-classes on this application in the best possible way.
|
Also see:
L. Allison.
Models for machine learning and data mining in functional programming.
[doi:10.1017/S0956796804005301],
|
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module SM_Utilities (module SM_Utilities) where -- export all
latticeK1 = 1/12 -- lattice
latticeK2 = 5/(36 * sqrt(3)) -- constants
-- see J.H.Conway and N.J.A.Sloane. Sphere Packings, Lattices and Groups.
-- Springer 1998 3rd edn.
data Throw = H | T deriving (Enum, Bounded, Eq, Ord, Show) -- Mr. Bernoulli
instance Splits Throw where splits = splitsBE -- Splits Throw
-- ----------------------------------------------------------------------------
assert p x = assert1 p x "assertion failed"
assert1 p x why = if p x then x else error why
log2 = log 2.0
-- logPlusBase b (-logBase b p1) (-logBase b p2) = -logBase b (p1+p2)
logPlusBase b nlPr1 nlPr2 = -- NB. this is "OK" but there are faster ways.
let (bigger, smaller) =
if nlPr1 >= nlPr2 then (nlPr1, nlPr2) else (nlPr2, nlPr1)
diff = bigger - smaller -- is non-negative
eps = b ** (-diff) -- between 0 and 1
nits = log b
in if diff * nits > 30 then smaller else smaller - (logBase b (1+eps))
-- the cutoff of 30 nits is somewhat(!) arbitrary.
-- and base e ...
logPlus nlPr1 nlPr2 = -- NB. this is "OK" but there are faster ways.
let (bigger, smaller) =
if nlPr1 >= nlPr2 then (nlPr1, nlPr2) else (nlPr2, nlPr1)
diff = bigger - smaller -- is non-negative
eps = exp (-diff) -- between 0 and 1
in if diff > 30 then smaller else smaller - (log (1+eps))
normalise xs = -- scale xs to make sum to 1.0
let (ans, finalTotal) = n xs 0
n [] total = ([], total)
n (x:xs) total =
let (rest, sum) = n xs (total+x)
in ((x/finalTotal):rest, sum)
in ans
stirling n = -- Stirling's approximation
let ans = (n + 0.5)*(log n) - n + (log(2*pi)) / 2 -- ~ logBase e (n!)
in max ans 0
stirlingBase b n = (stirling n) / (log b) -- and to base b
-- ----------------------------------------------------------------------------
prng seed = (seed * (1 + 4*37*109) + 9999) `mod` (32 * 1024)
-- Pseudo Random Number Generator
-- ([0 .. 32K-1], only a 32K cycle, really should use a better one!)
-- Linear Congruential Pseudo Random Number Generator,
-- see: D. E. Knuth, The Art of Computer Programming, Vol. 2.
-- ----------------------------------------------L.Allison--CSSE--Monash--.au--
-- The class Splits and associated type and functions are primarily to do
-- with partitioning data, as in classification-trees do, although they might
-- have more uses, e.g. in clustering/ unsupervised-classification.
class Splits t where
splits :: [t] -> [Splitter t] -- i.e. can find ways of partitioning t-lists;
-- these ways should be proposed in order of decreasing prior-probability.
data Splitter t = Splitter Int (t -> Int) (() -> String)
-- i.e. arity partition_fn description
instance Show (Splitter t) where
show (Splitter _ _ description) = description()
aritySplitter (Splitter n _ _) = n
applySplitter (Splitter _ f _) d = f d -- apply it to one datum
-- partition a data-set, ds, into n parts according to s
split s ds = partition (aritySplitter s) (map (applySplitter s) ds) ds
-- The following is not allowed :-(
-- instance (Bounded t, Enum t) => Splits t where splits = splitsBE
-- because ... The instance type must be of the form (T a b c) ...
-- Splits discrete_type ...
instance Splits Bool where splits = splitsBE -- e.g. Splits Bool
-- Splits continous type ...
instance Splits Float where splits = splitsOrd -- e.g. Splits Float
instance Splits Double where splits = splitsOrd -- e.g. Splits Double
splitsBE [] = [] -- calculate the Splitter's for a Bounded Enum type
splitsBE (d:ds) =
let { lwb = minBound `asTypeOf` d; lwbn = fromEnum lwb;
upb = maxBound `asTypeOf` d; upbn = fromEnum upb;
arity = upbn - lwbn + 1 }
in if all ((==) d) ds then [] -- all the same, no variety in data set
else [Splitter arity (\u -> (fromEnum (u `asTypeOf` d) - lwbn))
(\()->"="++(show lwb)++(if arity > 2 then ".." else "|")++(show upb))]
-- If the type is also Ord(ered, e.g. Bad|Poor|...|VG) and if
-- arity is quite big, then it might be better to use splitsOrd lwb..upb .
splitsOrd ds = -- calculate Splitter's for an Ord(ered, inc' continuous) type
let splitPoints [] = [] -- lines like the next make it all worthwhile
splitPoints ds = (medianEtc . tail . unique . msort) ds -- !
in map (\cut -> Splitter 2 (\y -> if y < cut then 0 else 1)
(\() -> "<|>=" ++ show cut)) (splitPoints ds)
-- Splits tuples ...
instance (Splits t1, Splits t2) => Splits (t1, t2) where -- e.g. Splits Pair
splits xys =
let (xs, ys) = unzip xys
in interleave (splitsAttr 0 fst xs)
(splitsAttr 1 snd ys)
-- Note, we get hierarchical dataspaces done for free.
instance (Splits t1, Splits t2, Splits t3) => Splits (t1,t2,t3) where -- Triple
splits xyzs =
let (xs, ys, zs) = unzip3 xyzs
sx = splitsAttr 0 (\(x,_,_) -> x) xs
sy = splitsAttr 1 (\(_,y,_) -> y) ys
sz = splitsAttr 2 (\(_,_,z) -> z) zs
in interleave3 sx sy sz
instance (Splits t1, Splits t2, Splits t3, Splits t4) =>
Splits (t1,t2,t3,t4) where -- Quad
splits wxyzs =
let (ws, xs, ys, zs) = unzip4 wxyzs
sw = splitsAttr 0 (\(w,_,_,_) -> w) ws
sx = splitsAttr 1 (\(_,x,_,_) -> x) xs
sy = splitsAttr 2 (\(_,_,y,_) -> y) ys
sz = splitsAttr 3 (\(_,_,_,z) -> z) zs
in interleave4 sw sx sy sz
splitsAttr n sel xs = -- selected attribute
-- get splits for nth attribute of multivariate dataset
-- NB. sel must select the nth attribute !
let prefix = "@" ++ (show n)
in map (\(Splitter n f d)->Splitter n (f.sel) (\()->prefix++d())) (splits xs)
-- ------------------------------------- make tuples instances of class Enum --
-- NB. tuples of Bounded types are already in Bounded
instance (Enum t1,Bounded t1, Enum t2,Bounded t2) => Enum (t1,t2) where -- Pair
fromEnum (v1,v2) = -- NB. (minBound, minBound') -> 0
let min1 = fromEnum (minBound `asTypeOf` v1)
min2 = fromEnum (minBound `asTypeOf` v2)
max2 = fromEnum (maxBound `asTypeOf` v2)
width = max2 - min2 + 1
in (fromEnum v1 - min1)*width + (fromEnum v2 - min2)
toEnum = -- had a big battle to remove 'ambiguous type' errors in this
let (mnB1, mnB2) = minBound `asTypeOf` (te 0) -- !
(mxB1, mxB2) = maxBound `asTypeOf` (mnB1, mnB2)
(min1, min2) = (fromEnum mnB1, fromEnum mnB2)
(max1, max2) = (fromEnum mxB1, fromEnum mxB2)
width = max2 - min2 + 1
te n = (toEnum(min1 + n `div` width), toEnum(min2 + n `mod` width))
in te
instance (Enum t1,Bounded t1, Enum t2,Bounded t2, Enum t3,Bounded t3) =>
Enum (t1,t2,t3) where -- triple
fromEnum (v1,v2,v3) = fromEnum (v1,(v2,v3))
toEnum n = let (v1,(v2,v3)) = toEnum n in (v1,v2,v3)
instance (Enum t1,Bounded t1, Enum t2,Bounded t2, -- quad
Enum t3,Bounded t3, Enum t4,Bounded t4) => Enum (t1,t2,t3,t4) where
fromEnum (v1,v2,v3,v4) = fromEnum (v1,(v2,v3,v4))
toEnum n = let (v1,(v2,v3,v4)) = toEnum n in (v1,v2,v3,v4)
instance (Enum t1,Bounded t1, Enum t2,Bounded t2, Enum t3,Bounded t3, -- five
Enum t4,Bounded t4, Enum t5,Bounded t5) => Enum (t1,t2,t3,t4,t5) where
fromEnum (v1,v2,v3,v4,v5) = fromEnum (v1,(v2,v3,v4,v5))
toEnum n = let (v1,(v2,v3,v4,v5)) = toEnum n in (v1,v2,v3,v4,v5)
-- ------------------------------9/2002--9/2003--L.Allison--CSSE--Monash--.au--
-- some ordinary but useful List functions...
unzip4 = foldr (\(w,x,y,z) ~(ws,xs,ys,zs) -> (w:ws, x:xs, y:ys, z:zs))
([], [], [], []) -- standard prelude has up to unzip3
-- given a list of heads & a list of tails (lists), put each head on its tail.
prepend heads tails = zipWith (:) heads tails
addToList n x lsts = -- add x to the nth List
let add n [] = add n [[]] -- prem' end of Lists, so lengthen!
add 0 (lst:lsts) = (x:lst) : lsts -- add x to this lst
add n (lst:lsts) = lst:(add (n-1) lsts)
in add n lsts
partition n indxs elts = -- partition elts into n parts according to indxs
let p [] [] ans = ans
p (i:is) (x:xs) ans = p is xs (addToList i x ans) -- add x to i-th part
in p indxs elts (replicate n [])
interleave [] ys = ys
interleave xs [] = xs -- interleave the elements of two lists
interleave (x:xs) (y:ys) = x : y : (interleave xs ys)
interleave3 [] ys zs = interleave ys zs
interleave3 xs [] zs = interleave xs zs
interleave3 xs ys [] = interleave xs ys
interleave3 (x:xs) (y:ys) (z:zs) = x : y : z : (interleave3 xs ys zs)
interleave4 ws xs ys zs = interleave (interleave ws ys) (interleave xs zs)
unique [] = [] -- POST: no duplicates in output
unique (x:xs) = -- PRE: any duplicates in input are adjacent
let u _ [] = []
u z (x:xs) = if z == x then u z xs else x : (u x xs)
in x : (u x xs)
merge [] bs = bs
merge as [] = as -- merge two sorted(!) lists
merge (aas@(a:as)) (bbs@(b:bs)) =
if a <= b then a : (merge as bbs) else b : (merge aas bs)
msort [] = []
msort [x] = [x] -- merge sort
msort inList = -- an (unstable) merge-sort, cos it is simple!
let (as, bs) = splt inList [] []
splt [] as bs = (as, bs)
splt (x:xs) as bs = splt xs bs (x:as)
in merge (msort as) (msort bs)
-- NB. This code for medianEtc is an improvement on the 2002 test code both
-- in complexity and also in the order in which its results are produced.
medianEtc [] = [] -- returns [median, quartiles, octiles, ...]
medianEtc s = -- PRE: s is sorted
let add _ 0 _ 0 _ ss = ss -- n1 = n2 = 0
add _ n1 s1 0 _ ss = ((n1,s1):ss) -- n1 > 0, n2 = 0
add _ 0 _ n2 s2 ss = ((n2,s2):ss) -- n1 = 0, n2 > 0
add True n1 s1 n2 s2 ss = (n1,s1) : (n2,s2) : ss -- n1 > 0, n2 > 0
add False n1 s1 n2 s2 ss = (n2,s2) : (n1,s1) : ss -- switch _2, _1
select fwd [] [] = []
select fwd [] level = select (not fwd) level [] -- down a level
-- param 3 of select is an accum'ing buffer of next level of xyz-iles
select fwd ((n,s):l1) l2 = -- NB. n == |s| (usable part), n >= 1
let n1 = n `div` 2 -- 1->0, 2->1, 3->1 ... i.e. |small| (proper)
n2 = n - n1 -- 1->1, 2->1, 3->2 ... i.e. |median:big|
small = s -- or at any rate small's 1st n1 elements
(median:big) = drop n1 s
in median : (select fwd l1 (add fwd n1 small (n2-1) big l2))
-- note the breadth-first traversal
in select True [(length s, s)] []
-- Guaranteed "OK" behaviour, alternatively might try Hoare's Find, one day.
-- Neighbouring values in the result are as similar as possible.
-- ------------------------------9/2002--9/2003--L.Allison--CSSE--Monash--.au--
test01 = print "-- test01 --"
>> print( "prng 17 ... = " ++ show( take 8 (iterate prng 17) ) )
>> (let sb1010 = stirlingBase 10 10
in print("stirlingBase 10 10 = "++show sb1010
++ ", 10**_="++show (10**sb1010) ))
>> print( "normalise [1,1,2,4] = " ++ show( normalise [1,1,2,4] ) )
>> print( "logPlus 7 7 = " ++ show( logPlus 7 7 ) )
>> print( "logPlusBase 2 7 7 = " ++ show( logPlusBase 2 7 7 ) )
>> print( "partition 2 ... ... = "
++ show( partition 2 [0,1,3,0,1] ["fst1","snd1","fth","fst2","snd2"] ) )
>> print( "msort [3,1,6,5,...] = " ++ show( msort [3,1,5,3,3,2,4] ) )
>> print( "medianEtc ... = "
++ show( medianEtc [1,2,3,4,5,6,7, 8, 9,10,11,12,13,14,15] ) )
>> print( "splits[(H,(True,..) = "
++ show( splits [(H,(True,4.0)),(H,(False,1.0 :: Double)),
(T,(True,3.0)),(H,(True, 2.0))]))
-- ----------------------------------------------------------------------------
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module SM_Classes (module SM_Classes) where -- export all
import SM_Utilities
-- SuperModel ---.--- Model dataSpace
-- |
-- |--- TimeSeries dataSpace
-- |
-- |--- FunctionModel inSpace opSpace
-- |
-- .--- ?other?
topline =
"Haskell-98: Model types & classes, L.A., CSSE, Monash, .au, 9/2002, 6/2003"
type Probability = Double
type MessageLength = Double
-- References:
--
-- L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC), pp207-215, Feb. 2003.
--
-- L. Allison. The types of models.
-- 2nd Hawaii Int. Conf. on Statistics and Related Fields (HICS), June 2003.
-- Life is tough.
class (Show sMdl) => SuperModel sMdl where -- SuperModel
prior :: sMdl -> Probability -- prior of blah-Model
msg1 :: sMdl -> MessageLength -- 1st part of any 2-part message
msg1Base :: Double -> sMdl -> MessageLength
mixture :: (Mixture mx, SuperModel (mx sMdl)) =>
mx sMdl -> sMdl -- Mixture of blah_Model -> blah_Model
prior sm = exp (-msg1 sm)
msg1 sm = - log (prior sm)
msg1Base b sm = (msg1 sm) / (log b)
mixture mx = error "mixture not defined for SuperModel"
-- NB. Any instance of SuperModel must define mixture and
-- either prior or msg1 or both.
-- In principle there might be a SuperModel not in class Show, but
-- in practice easier to require them all to be, even if only trivially.
class Model mdl where -- Model
pr :: (mdl dataSpace) -> dataSpace -> Probability
nlPr :: (mdl dataSpace) -> dataSpace -> MessageLength -- neg log prob
nlPrBase :: Double -> (mdl dataSpace) -> dataSpace -> MessageLength
msg :: SuperModel (mdl dataSpace) =>
(mdl dataSpace) -> [dataSpace] -> MessageLength
msgBase :: SuperModel (mdl dataSpace) =>
Double -> (mdl dataSpace) -> [dataSpace] -> MessageLength
msg2 :: (mdl dataSpace) -> [dataSpace] -> MessageLength
msg2Base :: Double -> (mdl dataSpace) -> [dataSpace] -> MessageLength
pr m datum = exp (- nlPr m datum) -- NB instances define pr or nlPr or both
nlPr m datum = - log (pr m datum)
nlPrBase b m datum = (nlPr m datum) / (log b)
msg m ds = (msg1 m) + (msg2 m ds) -- complete 2-part message
msgBase b m ds = (msg m ds) / (log b)
msg2 m ds = foldl (+) 0 (map (nlPr m) ds) -- part2 dataset|m (list|m)
msg2Base b m ds = (msg2 m ds) / (log b)
-- Some other properties (methods) that are possible, even desirable:
-- print (Show) yourself, generate sample data, ...
class TimeSeries tsm where -- TimeSeries
-- given a TimeSeries, ts, of a dataSeries, `predictors ts dataSeries' gives
-- a list of models for each position plus one (!) in the dataSeries,
-- each model conditional on the preceding data elements.
predictors :: (tsm dataSpace) -> [dataSpace] -> [ModelType dataSpace]
prs :: (tsm dataSpace) -> [dataSpace] -> [Probability]
nlPrs :: (tsm dataSpace) -> [dataSpace] -> [MessageLength]
nlPrsBase :: Double -> (tsm dataSpace) -> [dataSpace] -> [MessageLength]
-- any instance of TimeSeries must at least define predictors
-- and note that the output must be one longer than the dataSeries !
prs tsm dataSeries = zipWith pr (predictors tsm dataSeries) dataSeries
nlPrs tsm dataSeries = zipWith nlPr (predictors tsm dataSeries) dataSeries
nlPrsBase b tsm dataSeries =
zipWith (nlPrBase b) (predictors tsm dataSeries) dataSeries
class FunctionModel fm where -- FunctionModel
condModel :: (fm inSpace opSpace) -> inSpace -> ModelType opSpace
condPr :: (fm inSpace opSpace) -> inSpace -> opSpace -> Probability
condNlPr :: (fm inSpace opSpace) -> inSpace -> opSpace -> MessageLength
condNlPrBase :: Double ->
(fm inSpace opSpace) -> inSpace -> opSpace -> MessageLength
-- any instance of FunctionModel must at least define condModel
condPr fm i o = pr (condModel fm i) o -- cond prob, pr(o|i,fm)
condNlPr fm i o = nlPr (condModel fm i) o
condNlPrBase b fm i o = (condNlPr fm i o) / (log b)
-- -----------------------------------------------------------------L.Allison--
class Mixture mx where -- for miscellaneous weighted mixtures
mixer :: (SuperModel t) => mx t -> ModelType Int
components :: (SuperModel t) => mx t -> [t]
data (SuperModel elt) =>
MixtureType elt = Mix (ModelType Int) [elt] -- weighted averages; MixtureType
instance Mixture MixtureType where
mixer (Mix m _ ) = m
components (Mix _ es) = es
instance (SuperModel elt) => SuperModel (MixtureType elt) where
msg1 (Mix m es) =
foldl (+) (nlPr wallaceIntModel (length es - 1) + msg1 m) (map msg1 es)
-- NB. This assumes that the receiver must be told the number of components.
-- Printable
instance (SuperModel ct) => Show (MixtureType ct) where
showsPrec p (Mix m es) s =
"{Mix " ++ showsPrec p m (" : " ++ showsPrec p es (s++"}"))
-- ct is in Show because SuperModels are in Show
-- Could have a MixtureModel, a sub-class of Mixture and of Model,
-- ditto MixtureTimeSeries, MixtureFunctionModel, but cannot(?)
-- have MixtureType as an instance of Model & TimeSeries & FunctionModel.
-- In any case, mixture (Mix m cs) is in the class of its components.
-- ----------------------------------------------------------------------------
-- conversion functions
model2timeSeries m = -- :: Model of dataSpace -> TimeSeries of dataSpace
TSM (msg1 m) (\context -> m) (\()->"TSM "++show m) -- trivial but useful
-- :: Model of dataSpace -> FunctionModel of inSpace dataSpace
model2functionModel m = FM (msg1 m) (\ip -> m) (\()->"FM "++show m)
-- Model of Int -> TimeSeries of dataSpace -> Model of [dataSpace]
-- the 1st param is for "lengths"
timeSeries2model1 lenMdl tsm =
MnlPr (msg1 tsm + msg1 lenMdl) (\dataSeries ->
foldl (+) (nlPr lenMdl (length dataSeries)) (nlPrs tsm dataSeries))
(\() -> "[" ++ (showsPrec 0 lenMdl (":" ++ (show tsm) ++ "]") ) )
timeSeries2model tsm = -- TimeSeries of dataSpace -> Model of [dataSpace]
timeSeries2model1 wallaceIntModel tsm
timeSeries2functionModel tsm = -- TimeSeries of ds -> FunctionModel of [ds] ds
FM (msg1 tsm) (\dataSeries -> last(predictors tsm dataSeries))
(\()->"FM "++show tsm)
-- FunctionModel of inSpace opSpace -> Model of (inSpace, opSpace)
functionModel2model fm = MnlPr (msg1 fm) (\(i, o) -> condNlPr fm i o)
(\() -> "(_,o):"++show fm)
-- i.e. pr(o|i,fm), i assumed common knowledge!
-- FunctionModel of [dataSpace] dataSpace -> TimeSeries of dataSpace
functionModel2timeSeries fm =
TSM (msg1 fm) (condModel fm) (\()->"TSM "++show fm)
-- Model inSpace -> FunctionModel inSpace opSpace -> Model (inSpace, opSpace)
condition m fm = -- ? should the param order be v.v. ?
let nlp (i,o) = (nlPr m i) + (nlPr (condModel fm i) o)
in MnlPr (msg1 m + msg1 fm) nlp
(\() -> "(" ++ showsPrec 0 fm ("|" ++ show m ++ ")"))
-- ----------------------------------------------------------------------------
-- ModelType
data ModelType dataSpace =
MPr MessageLength (dataSpace -> Probability) (() -> String) |
-- msg len, prob fn, description
MnlPr MessageLength (dataSpace -> MessageLength) (() -> String)
-- msg len, neg log prob fn, description
instance SuperModel (ModelType dataSpace) where
msg1 (MPr mdlLen p _) = mdlLen
msg1 (MnlPr mdlLen m _) = mdlLen
mixture mx =
let nlp datum =
let (m0:ms) = components mx
mxr = mixer mx
w n [] total = total -- done
w n (m:ms) total = -- next component, m
w (n+1) ms (logPlus total ((nlPr mxr n)+(nlPr m datum)))
-- component + datum|component
in w 1 ms ((nlPr mxr 0) + (nlPr m0 datum)) -- do weighted average
in MnlPr (msg1 mx) nlp (\()->show mx)
instance TimeSeries ModelType where -- Model of ds is a trivial TimeSeries ds
predictors m dataSeries = map (\_ -> m) ((error "") : dataSeries)
-- Note that the ouput list is one longer than the input dataSeries !
instance Model ModelType where
nlPr (MPr _ p _) datum = - log (p datum)
nlPr (MnlPr _ n _) datum = n datum
instance Show (ModelType dataSpace) where
show (MPr _ _ desc) = desc()
show (MnlPr _ _ desc) = desc()
-- of course there can be other types that are instances of Model
-- ----------------------------------------------------------------------------
-- Logically, wallaceIntModel should be in module Models, but I get an
-- intermittent compile error if it is -- so there, or rather, here.
wallaceIntModel = -- Model of non-neg Int, >= 0
-- A ``universal'' code, from the Wallace tree code (Wallace & Patrick 1993).
-- The number of full binary trees of n leaves is the (n-1)th Catalan number;
-- the nth catalan number is {[2n]C[n]}/(n+1)
-- C[0]=1, C[1]=1, C[2]=2, C[3]=5, C[4]=14, ... -- n code cat' cum'
let catalans = -- NB. "cached" -- 0 0 1 1
let cats last n = -- 1 100 1 2
let twoN = 2*n -- 2 10100 2 4
n1 = n+1 -- 3 11000
nSq = n*n -- 4 1010100 5 9
-- next = last * twoN * (twoN-1) `div` nSq -- obvious
next = last * 2 * (twoN-1) `div` n -- better
in (next `div` n1) : (cats next n1)
in 1 : (cats 1 1)
cumulativeCatalans = scanl1 (+) catalans
find n posn (e:es) = if n < e then posn else find n (posn+1) es
in MnlPr 0 (\n -> ((find (assert (0<=) n) 0 cumulativeCatalans)*2+1)*log2)
(\()->"wallaceIntModel")
-- Note that it is a non-parametric model.
-- ----------------------------------------------------------------------------
-- TimeSeriesType
data TimeSeriesType dataSpace =
TSM MessageLength ([dataSpace] -> ModelType dataSpace) (() -> String)
-- i.e. context -> Model of next elt description
-- NB. context ordered last..first
instance SuperModel (TimeSeriesType dataSpace) where
msg1 (TSM mdlLen m _) = mdlLen
mixture mx =
let f context = mixture(Mix (mixer mx)
(map (\(TSM _ ftsm _) -> ftsm context) (components mx)))
in TSM (msg1 mx) f (\()->show mx)
instance TimeSeries TimeSeriesType where
predictors (TSM mdlLen f _) dataSeries =
let scan [] context = [f context] -- NB. context ordered last..first
scan (d:ds) context = (f context) : (scan ds (d:context))
in scan dataSeries []
-- This method of definition is efficient if f examines the last few datas.
-- Note that the ouput list is one longer than the input dataSeries !
instance Show (TimeSeriesType dataSpace) where
show (TSM _ _ desc) = desc()
-- of course there can be other types that are instances of TimeSeries
-- ----------------------------------------------------------------------------
-- FunctionModelType
data FunctionModelType inSpace opSpace =
FM MessageLength (inSpace -> ModelType opSpace) (() -> String)
instance SuperModel (FunctionModelType inSpace opSpace) where
msg1 (FM mdlLen m _) = mdlLen
mixture mx =
let condM inp = mixture (Mix (mixer mx)
(map (\f -> condModel f inp) (components mx)))
in FM (msg1 mx) condM (\()->show mx)
instance FunctionModel FunctionModelType where
condModel (FM mdlLen f _) = f
instance Show (FunctionModelType inSpace opSpace) where
show (FM _ _ desc) = desc()
-- of course there can be other types that are instances of FunctionModel
-- ------------------------------9/2002--6/2003--L.Allison--CSSE--Monash--.au--
test02 = -- some very simple (non-exhaustive) tests
let { p 0 = 1/2; p 1 = 1/4; p 2 = 1/8; p 3 = 1/8;
dice4 = MPr 0 p (\()->"skewed4"); -- a very(!) simple Model
values = [0, 1, 2, 3] }
in print "-- test02 --"
>> print( "pr dice4 0, 1, 2, 3 = " ++ show( map (pr dice4) values ))
>> print( "nlPr dice4 0, 1, 2, 3 = " ++ show( map (nlPr dice4) values ))
>> print( "nlPrBase 2 dice4 0, 1, 2, 3 = "
++ show( map (nlPrBase 2 dice4) values ) ++ " bits")
>> print( "msgBase 2 dice4 [0,1,2,3] = "
++ show( msgBase 2 dice4 values ) ++ " bits")
>> print( "nlPrBase 2 (timeSeries2model dice4) [0,1,2,3] = "
++ show( nlPrBase 2 (timeSeries2model dice4) values ) ++ " = "
++ show( msg2Base 2 (timeSeries2model dice4) [values] ))
-- ----------------------------------------------------------------------------
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module Models (module Models) where
import SM_Utilities
import SM_Classes
bivariate (m1, m2) = -- (Model of d1, Model of d2) -> Model of (d1, d2)
let nlp (d1, d2) = (nlPr m1 d1) + (nlPr m2 d2) -- d1, d2 assumed independent
in MnlPr ((msg1 m1) + (msg1 m2)) nlp
(\() -> "(" ++ showsPrec 0 m1 ("," ++ show m2 ++ ")") )
estBivariate (est1, est2) dataSet =
let (ds1, ds2) = unzip dataSet
in bivariate (est1 ds1, est2 ds2)
estBivariateWeighted (est1, est2) dataSet weights = -- for mixture modelling
let (ds1, ds2) = unzip dataSet
in bivariate (est1 ds1 weights, est2 ds2 weights)
-- ----------------------------------------------------------------------------
-- 2003, I moved wallaceIntModel, -- Model of non-neg Int, >= 0
-- to module Classes for "pragmatic" reasons.
quadraticIntModel = -- a simple parameterless Model of non-neg Int, >= 0
let nlPr n = if n < 0 then error "n < 0"
else log (fromIntegral ((n+1)*(n+2))) -- n >= 0
-- i.e. pr n = 1/((n+1)*(n+2))
in MnlPr 0 nlPr (\() -> "P(n)=1/((n+1)*(n+2), n>=0")
-- NB. Sum[n=0..] 1/((n+1)*(n+2)) = Sum[n=0..]{ 1/(n+1) - 1/(n+2) }
-- = 1 - 1/2 + 1/2 - 1/3 + 1/3 ... = 1, i.e. it is a prob' dist'n.
fifty50 = uniform 0 1 -- 50:50 on {0,1}
uniform lo hi = -- Model of Enum Bounded dataSpace
let (lwb, upb) = if (fromEnum hi) >= (fromEnum lo)
then (lo, hi) else (hi, lo) -- ?switch?
in MPr 0 (\_ -> 1 / fromIntegral((fromEnum upb) - (fromEnum lwb) + 1))
(\() -> "uniform[" ++ show lo ++ ".." ++ show hi ++ "]")
probs2model ps = -- :: [Probability] -> Model of Int
MPr 0 (\n -> ps !! n ) (\() -> "mState"++show ps) -- NB. MPr 0 ...
freqs2model fs = -- :: frequencies, [Num] -> Model of Int (i.e. MultiState)
-- Taking a short-cut here, see Wallace and Boulton (1968), and
-- Wallace and Freeman (1987) for more information.
let total = foldl (+) 0 fs
-- noModel estimates cost of data using the "uninformative" method
noModel [] ans = ans
noModel (f:fs) ans = noModel fs (ans - (stirling f)) -- NB base e
-- MML: part1 + part2 = noModel... + delta, so...
part1 = (noModel fs (stirling (total+1))) + delta - part2
part2 = - (foldl (+) 0 ((zipWith (*) fs (map log ps))))
-- part2 = msg Len of data given the MML model
delta = nParams * ((log(pi/6))+1) / 2 -- about 0.176 nits per parameter
nParams = fromIntegral(length fs - 1)
ps = normalise (map ((+) 0.5) fs) -- probabilities, NB +0.5 MML est'r
p n = ps !! n -- the prob' function
in MPr (if part1 > 0 then part1 else 0) p (\()->"mState"++show ps)
-- list of frequencies from a series of (weights and) Enum, Bounded type
countWeighted dataSeries weights =
let c [] _ counters = counters
c (d:ds) (w:ws) counters = c ds ws (inc counters (fromEnum d) w)
inc [] n _ = error ("count, in inc, ran off end of list "++(show n))
inc (c:cs) 0 w = (c+w) : cs -- counter, c += w
inc (c:cs) n w = c : (inc cs (n-1) w) -- w is a weight
zeroes 0 = []
zeroes n = 0 : (zeroes (n-1)) -- e.g. zeroes 2 = [0, 0]
upb = maxBound `asTypeOf` (dataSeries !! 0)
lwb = minBound `asTypeOf` (dataSeries !! 0)
typeSize = (fromEnum upb) - (fromEnum lwb) + 1
in c dataSeries weights (zeroes typeSize)
count dataSeries = -- list of frequencies from a series of Enum, Bounded type
countWeighted dataSeries (repeat 1) -- totalAssignment
-- (Bounded t, Enum t) => t -> Model of Int -> Model of t
modelInt2model egValue intModel = -- example value gives the Model's type
let fromE x = fromEnum( x `asTypeOf` egValue )
toInt x = (fromE x) - (fromE minBound)
p datum = pr intModel (toInt datum)
in MPr (msg1 intModel) p (\()->show intModel)
estMultiStateWeighted dataSet weights = -- weights and dataSet -> Model of t
modelInt2model (dataSet !! 0) (freqs2model (countWeighted dataSet weights))
estMultiState dataSet = -- :: [Enum Bounded t] -> Model of t
modelInt2model (dataSet !! 0) (freqs2model (count dataSet))
coinByPrH prH = modelInt2model H (probs2model [prH, 1-prH])
-- ------------------------------9/2002--6/2003--L.Allison--CSSE--Monash--.au--
normal eps m s = normalModel 0 eps m s -- i.e. a *fixed* Normal
normalModel m1 eps m s = -- i.e. Gaussian a.k.a. Normal
-- eps is the data measurement accuracy, +/-(eps/2).
if s < eps then -- need better integration, much earlier really!!!
error "normal: std-dev too small v. data measurement accuracy"
else MnlPr m1 (normalNlPr eps m s) -- Model of Float
(\() -> "N(" ++ show m ++ "," ++ show s ++ ")(+-" ++ show eps ++ ")")
normalNlPr eps m s x = normalNlPrDensity m s x - log eps -- neg log prob'
normalNlPrDensity m s x = -- neg log dens'
let constPart = (log(2 * pi)) / 2 + (log s)
in ( constPart + (((x-m)/s)**2)/2 )
estNormal minMean maxMean minSigma maxSigma eps xs =
-- Dealing with the normal (Gaussian) distribution is a good deal trickier
-- than sometimes realised. The programmer cannot have any prior idea of the
-- ranges of the mean and standard deviation, but the guy with the data does!
-- So we require information about the ranges to be passed in.
-- This version of an estimator uses a uniform prior for the mean over
-- the given range, and 1/sigma over the given range.
-- (In some applications it is not unknown to stretch correctness and
-- to use priors based on the population (data) statistics.)
-- Priors, mean: uniform over [minMean..maxMean]
-- sigma: 1/sigma over [minSigma..maxSigma]
-- Data is measured to +/-eps/2
-- f(x|m,s) = (1/(sqrt(2 pi) s)) exp(-sqr(x-m)/(2 sqr(s))) pr(x) density
-- L = -ln(f( )) neg log likelihood(xi)
-- = N.{ln(2 pi)/2 + ln(s)} + {SUMi (xi-m)^2}/(2 s^2)
-- 1st derivatives:
-- d L/d m = {N.m - SUMi xi}/s^2 (hence max LH est', m ~ sum/N)
-- d L/d s = -N/s - {SUMi (xi-m)^2}/s^3 (hence max LH est', s ~.../N)
-- 2nd derivatives:
-- d2 L/d m d s = zero in expectation
-- d2 L/d m2 = N/s^2
-- d2 L/d s2 = N/s^2 + 3.{SUM (xi-m)^2}/s^4 = - 2.N/s^2 in expectation
-- Fisher Info = 2.N^2 / s^4 (e.g. see Ch6 s6.2 CSW's book-draft 6/2003)
-- NB. This code is deliberately straightforward, not optimised.
let sqr z = z*z
n = fromIntegral(length xs)
xMean = (foldl (+) 0 xs) / n -- naive algorithm
xVariance
= if n <= 1 then exp((log minSigma + log maxSigma) / 2) -- geom' mean
else (foldl (\a -> \b -> a+sqr(b-mean)) 0 xs) / (n-1) -- NB n-1
mean = min maxMean (max minMean xMean) -- or error if outside limits ???
sigma = min maxSigma (max (max eps minSigma) (sqrt xVariance))
-- 1. It does not make sense to infer a standard deviation less
-- than something-like(!) the data measurement accuracy.
-- 2. A better(!) method of integrating the prior and the
-- likelihood should be used if n is small.
-- 3. Sometimes (small) data-sets are "odd", with just the one value
-- repeated in which case should you really be using the normal?
nlPriorMean = log(maxMean - minMean) -- i.e. uniform prior
-- NB. INTEGRAL[a..b] 1/x = log b - log a
nlPriorSigma = log(log maxSigma - log minSigma)+log sigma -- i.e 1/sigma
nlPrior = nlPriorMean + nlPriorSigma
-- konstant: the *2, for 2 parameters, cancels with /2 in the general msg form.
konstant = (1 + log(latticeK2))
halfLogFisher = (log 2)/2 + log n - 2 * log sigma
logFisher = 2 * halfLogFisher
-- msg1, 1st part of message, e.g. Farr 1999, or
-- e.g. http://www.csse.monash.edu.au/~lloyd/tildeMML/Notes/Fisher.html
msg1 = nlPrior + halfLogFisher + konstant
in normalModel msg1 eps mean sigma
-- ----------------------------------------------------------------------------
test03 =
let multi01 = freqs2model [1.1, 1.1, 2.2] -- test fractional freq's
coin01 = coinByPrH 0.5 -- 50:50
coin02 = estMultiState [H,H,T] -- 5:3 !
-- NB. tuples of Enum Bounded types made instances of class Enum in Utilities
twoCoin = estMultiState [(H,H),(T,T), (H,T),(H,T), (T,H), (T,H) ]
bitsMdl = freqs2model [100,100]
n01 = normal 0.1 0 1 -- NB. data measured to +/- 0.1
flts1 = [1.0, 2.0, 0.0, 1.0, -1.0, 3.0]
flts2 = take 60 (cycle flts1) -- just more of the same
norm1 = estNormal (-10.0) 10.0 0.1 10.0 0.1 flts1
norm2 = estNormal (-10.0) 10.0 0.1 10.0 0.1 flts2
in print "-- test03 --"
>> print("multi01 = " ++ show multi01
++ " = " ++ show(map (pr multi01) [0, 1, 2] ) )
>> print("pr coin01 H and T = " ++ show(map (pr coin01) [H, T] ) )
>> print("pr coin02 H and T = " ++ show(map (pr coin02) [H, T] ) )
>> print("pr twoCoin same = " ++ show(map (pr twoCoin) [(H,H),(T,T)] ) )
>> print("msgBase 2 bitsMdl [0,1]= "
++ show( msgBase 2 bitsMdl [0,1] ) ++ " bits" )
>> print("msg1Base 2 bitsMdl = "
++ show( msg1Base 2 bitsMdl ) ++ " bits" )
>> print("msg2Base 2 bitsMdl [0,1]= "
++ show( msg2Base 2 bitsMdl [0,1] ) ++ " bits" )
>> print("msg1Base 2 (mixture bitsMdl ...) = "
++ show( msg1Base 2 (mixture (Mix bitsMdl [bitsMdl, bitsMdl])) )++" bits")
>> print("wallaceIntModel 0, 1, ... = "
++ show(map (nlPrBase 2 wallaceIntModel) [0,1,2,3,4,5,6]) ++ " bits")
>> print(show n01 ++ " 0.0 1.0 = " ++ show( map (pr n01) [0.0, 1.0] ) )
>> print(show norm1 ++" flts1 = " ++show(msg1 norm1) ++"+"
++show(msg2 norm1 flts1) ++" nits")
>> print(show norm2 ++" flts2 = " ++show(msg1 norm2) ++"+"
++show(msg2 norm2 flts2) ++" nits")
-- ----------------------------------------------------------------------------
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module FnModels (module FnModels) where
import SM_Utilities
import SM_Classes
import Models
-- NB. tuples of Enum Bounded types are made instances of Enum in Utilities.
-- (weighted) estimate a FunctionModel of ipSpace opSpace
estFiniteIpFunctionWeighted estOpModel ipSeries opSeries weights =
let -- re weights: bug corrected 18/12/2003
-- select outputs and weights corr' to a given input value, v.
select v (ip:ips) (op:ops) (w:ws) selOp selW =
if v == ip then select v ips ops ws (op:selOp) (w:selW) -- pick
else select v ips ops ws selOp selW -- drop
select v _ _ _ selOp selW = (selOp, selW)
mdls = map (\v -> uncurry estOpModel
(select v ipSeries opSeries weights [] []))
[lwbIp .. upbIp]
part1 = foldl (+) 0 (map msg1 mdls)
condMdl ip = mdls !! ((fromIp ip) - (fromIp lwbIp))
fromIp ip = fromEnum (ip `asTypeOf` (ipSeries !! 0))
lwbIp = minBound `asTypeOf` (ipSeries !! 0)
upbIp = maxBound `asTypeOf` (ipSeries !! 0)
in FM part1 condMdl (\()->"{finite_FM "++show mdls++"}")
estFiniteIpFunction estOpModelWeighted ipSeries opSeries = -- unweighted
estFiniteIpFunctionWeighted estOpModelWeighted ipSeries opSeries (repeat 1)
-- outputSpace also Enum Bounded
estFiniteFunctionWeighted ipSeries opSeries weights =
estFiniteIpFunctionWeighted estMultiStateWeighted ipSeries opSeries weights
estFiniteFunction ipSeries opSeries = -- unweighted
estFiniteIpFunction estMultiStateWeighted ipSeries opSeries
-- The next one will let us infer an order_k Markov model (TimeSeries)
-- WARNING: It is assumed that |alphabet|**k is "small enough", else
-- you had better implement ppm-like context trees or similar.
-- estimate an order_k predictor FunctionModel of [dataSpace1] dataSpace2
estFiniteListFunction k inputs outputs =
if k <= 0 then
model2functionModel (estMultiState outputs) -- order zero
else -- the order, k >= 1
let select v [] _ selIps selOps = (selIps, selOps) -- done
select v _ [] selIps selOps = (selIps, selOps) -- done
select v ([]:ips) (d:ops) selIps selOps = -- c empty
select v ips ops selIps selOps -- exclude
select v (ip:ips) (op:ops) selIps selOps = -- c's matching [v,...]
if v == head ip
then select v ips ops ((tail ip):selIps) (op:selOps) -- include
else select v ips ops selIps selOps -- exclude
fms = map (\v -> uncurry (estFiniteListFunction (k-1))
(select v inputs outputs [] []))
[lwbIp .. upbIp] -- input values
part1 = foldl (+) 0 (map msg1 fms)
predictorFn [] = -- k > |input|, i.e. input too short
uniform lwbOp upbOp -- at least it's simple!!!
predictorFn (ip:ips) =
condModel (fms !! ((fromIp ip) - (fromIp lwbIp))) ips
egInput = (inputs !! 0)!! 0
upbIp = maxBound `asTypeOf` egInput
lwbIp = minBound `asTypeOf` egInput
fromIp ip = fromEnum( ip `asTypeOf` egInput )
egOutput = outputs !! 0
upbOp = maxBound `asTypeOf` egOutput
lwbOp = minBound `asTypeOf` egOutput
fromOp op = fromEnum( op `asTypeOf` egInput )
in FM part1 predictorFn (\()->"{FiniteListFunction"++show fms++"}")
-- ------------------------------9/2002--6/2003--L.Allison--CSSE--Monash--.au--
test04 = let
{ coin1 = [ H, H, H, H, T, T, T, T]; -- inputs
coin2 = [ H, H, H, T, T, T, T, H]; -- outputs
fm01 = estFiniteFunction coin1 coin2; -- 3.5:1.5 = 7:3 same:different
-- NB. tuples of Enum Bounded types are made instances of Enum in Utilities
bool2 = [(True,True), (True,False), (False,True), (False,False)];
bool2X = [ False, True, True, False ];
xor = estFiniteFunction (take 12 (cycle bool2)) -- add some noise
((take 8 (cycle bool2X)) ++ (map not bool2X));
inpts = [[H,H],[H,T],[T,H],[T,T], [H,T],[T,H]]; -- inputs of length 2
rslts = [ H, T, H, T, T, T ]; -- results
fm02 = estFiniteListFunction 2 inpts rslts -- 3:1 1:5 1:1 1:3
}
in print "-- test04 --"
>> print("fm01 = " ++ show fm01 )
>> print("fm01 H H,... H T = "
++ show( zipWith (condPr fm01) [H,H,T,T] [H,T,H,T] ))
>> print("noisy xor = " ++ show xor )
>> print("fm02 = " ++ show fm02 )
>> print("fm02 HH ... -> H = "
++ show( zipWith (condPr fm02) [[H,H],[H,T],[T,H],[T,T]] [H,H,H,H]))
-- ----------------------------------------------------------------------------
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module TSModels (module TSModels) where
import SM_Utilities
import SM_Classes
import Models
import FnModels
-- estimate a Markov model of order k, a TimeSeries of dataSpace
estMarkov k dataSeries =
-- scan makes list of input contexts for the predictor function
let scan (d:ds) context = context : (scan ds (d:context))
scan [] _ = []
-- e.g. scan "abcdef" [] -> ["","a","ba","cba","dcba","edcba"]
contexts = scan dataSeries [] -- NB. each context is "backwards"
in functionModel2timeSeries (estFiniteListFunction k contexts dataSeries)
-- --------------------------------------9/2002--L.Allison--CSSE--Monash--.au--
test05 = let
{ seq = [H,T,H,T,H,T, T,H, H,T]; -- mostly HT alternating
mm0 = estMarkov 0 seq; -- order 0
mm1 = estMarkov 1 seq; -- order 1
mm2 = estMarkov 2 seq; -- order 2
mmm = mixture(Mix fifty50 [mm0, mm1]);
test n =
let s = (take n . cycle) seq
m k = let mm = timeSeries2model (estMarkov k s)
in ", " ++ show k ++ ":" ++ show(msgBase 2 mm [s])
in print( "length=" ++ show n ++ m 0 ++ m 1 ++ m 2 ++ " bits" )
}
in print "-- test05 --"
>> print("prs (coinByPrH 0.4) [H,T,H,T] = "
++ show(prs (coinByPrH 0.4) [H,T,H,T] ))
>> print("msg1 mm0 = " ++ show( msg1 mm0 ))
>> print("msg1 mm1 = " ++ show( msg1 mm1 ))
>> print("msg1 mm2 = " ++ show( msg1 mm2 ))
>> print("prs mm0 seq = " ++ show( prs mm0 seq ))
>> print("prs mm1 seq = " ++ show( prs mm1 seq ))
>> print("prs mm2 seq = " ++ show( prs mm2 seq ))
>> print("mix mm0 mm1 = " ++ show( prs mmm seq))
>> print("mm1 = " ++ show mm1 )
>> print("mmm = " ++ show mmm )
>> test 10
>> test 100
>> test 1000
-- ----------------------------------------------------------------------------
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module MixtureModels (module MixtureModels) where
import SM_Utilities
import SM_Classes
import Models
-- estimate a mixture given a list of (component) estimators & a dataset,
-- expectation maximisation loop for a given number of components,
-- guess memberships -> mixture -> loop{ -> m'ship -> mixture },
-- membership refers to a datum fractionally "belonging to"
-- components of the mixture.
estMixture ests dataSet = -- [estimator] -> [dataSpace] -> model of dataSpace
let -- i.e. [estimator] -> estimator
memberships (Mix mixer components) = -- memberships|Mixture
let doAll (d:ds) = prepend (doOne d) (doAll ds) -- all data
doAll [] = map (\x -> []) components
doOne datum = normalise( -- one datum
zipWith (\c -> \m -> (pr mixer c)*(pr m datum)) [0..] components)
-- pr(c) * pr(datum|c) for class #c = m
in doAll dataSet
randomMemberships =
let doAll seed [] = map (\_ -> []) ests
doAll seed (_:ds) = -- all data
let doOne seed [] ans = (seed, normalise ans)
doOne seed (_:ests) ans = -- one datum
doOne (prng seed) ests ((fromIntegral(1+ seed `mod` 10)) : ans)
in let (seed2, forDatum) = doOne seed ests []
in prepend forDatum (doAll seed2 ds)
in doAll 4321 dataSet -- should use a better prng!
fit [] [] = [] -- Models|memberships
fit (est:ests) (mem:mems) = (est dataSet mem) : (fit ests mems)
fitMixture mems = Mix (freqs2model (map (foldl (+) 0) mems)) -- weights
(fit ests mems) -- components
cycle mx = fitMixture (memberships mx) -- EM step
cycles 0 mx = mx
cycles n mx = cycles (n-1) (cycle mx) -- n x cycle
in mixture( cycles 20 (fitMixture randomMemberships) )
-- ------------------------------9/2002--6/2003--L.Allison--CSSE--Monash--.au--
test06 = let
{ twoUp = (take 100 . cycle) -- (imaginary) 2-up data ...
[(H,H),(H,H),(H,H),(H,H), (T,T),(T,T),(T,T),(T,T), (H,T),(T,H)];
pairs4 = [(H,H), (H,T), (T,H), (T,T)]; -- fair coins
est2coins = estBivariateWeighted ( estMultiStateWeighted,
estMultiStateWeighted );
mix1 = estMixture [est2coins] twoUp; -- 1:1:1:1
mix2 = estMixture [est2coins, est2coins] twoUp; -- 4:1:1:4 +/-
mix3 = estMixture [est2coins, est2coins, est2coins] twoUp;
test str mm =
print(str ++ " = " ++ show mm)
>> print("pr "++str++" [HH HT TH TT] = " ++ show(map (pr mm) pairs4))
>> print("msg "++str++" twoUp = " ++ show( msg mm twoUp )
++ "=" ++ show(msg1 mm) ++ "+" ++ show(msg2 mm twoUp) ++ " nits")
>> print("msg "++str++" pairs4* = "
++ show(msg mm ((take 100 . cycle) pairs4)) ++ " nits")
}
in print "-- test06 --"
>> test "mix1" mix1 -- too simple
>> test "mix2" mix2 -- mix2 should be the best model, for twoUp
>> test "mix3" mix3 -- unnec' complex
-- ----------------------------------------------------------------------------
-- See: L. Allison. Types and classes of machine learning and data mining.
-- 26th Australasian Computer Science Conference (ACSC) pp.207-215,
-- Adelaide, February 2003
-- L. Allison. Models for machine learning and data mining in
-- functional programming. doi:10.1017/S0956796804005301
-- J. Functional Programming, 15(1), pp.15-32, Jan. 2005
-- Author: Lloyd ALLISON lloyd at bruce cs monash edu au
-- http://www.csse.monash.edu.au/~lloyd/tildeFP/II/200309/
-- This program is free software; you can redistribute it and/or modify it
-- under the terms of the GNU General Public License (GPL) as published by
-- the Free Software Foundation; either version 2 of the License, or (at
-- your option) any later version. This program is distributed in the hope
-- that it will be useful, but without any warranty, without even the implied
-- warranty of merchantability or fitness for a particular purpose. See the
-- GNU General Public License for more details. You should have received a
-- copy of the GNU General Public License with this program; if not, write to:
-- Free Software Foundation, Inc., Boston, MA 02111, USA.
module CTrees (module CTrees) where
import SM_Utilities
import SM_Classes
import Models
-- A classification tree, a.k.a. decision tree, is
-- an instance of SuperModel and FunctionModel
data CTreeType inSpace opSpace =
CTleaf (ModelType opSpace) |
CTfork MessageLength (Splitter inSpace) [CTreeType inSpace opSpace] |
CTforkFM (FunctionModelType inSpace Int) [CTreeType inSpace opSpace]
-- The last option, CTFork, is rather speculative as of 2002, 2003.
instance SuperModel (CTreeType inSpace opSpace) where
-- NB. For simplicity, this costs the *structure* at 1-bit per node
-- which is only optimal for binary trees. See Wallace & Patrick 1993.
-- It also assumes that a tree uses only Trad or only non-Trad (FM) forks.
msg1 (CTleaf leafModel) = log2 + msg1 leafModel
msg1 (CTfork fnLen f dts) = log2 + fnLen + (foldl (+) 0 (map msg1 dts))
-- pity no built-in class Function in Haskell
msg1 (CTforkFM fnMixer dts) =
log2 + (msg1 fnMixer)+(foldl (+) 0 (map msg1 dts))
instance FunctionModel CTreeType where
condModel (CTleaf leafModel) i = leafModel
condModel (CTfork fnLen f dts) i = condModel (dts !! (applySplitter f i)) i
-- only include the following to check the types
condModel (CTforkFM fnMixer dts) i = -- below is == the 1-branch method
let choice = biggest( map (\n -> condPr fnMixer i n)
[0 .. ((length dts)-1)] ) 0 (-1) 0
biggest [] posn bigVal bigPosn = bigPosn
biggest (p:ps) posn bigVal bigPosn =
if p > bigVal then biggest ps (posn+1) p posn
else biggest ps (posn+1) bigVal bigPosn
in condModel (dts !! choice) i
instance Show (CTreeType inSpace opSpace) where -- printing etc.
showsPrec d (CTleaf mdl) rest = "{CTleaf " ++ (showsPrec d mdl ("}" ++ rest))
showsPrec d (CTfork _ f subTrees) rest =
"{CTfork " ++ (show f) ++ (showsPrec d subTrees ("}" ++ rest))
showsPrec d (CTforkFM f subTrees) rest =
"{CTforkFM " ++ (show f) ++ (showsPrec d subTrees ("}" ++ rest))
-- estimate a classification tree, a CTree
estCTree estLeafMdl splits ipSet opSet =
let -- NB. estLeafMdl must be able to handle an empty data set
search ipSet opSet =
let
leaf = CTleaf leafMdl -- the simplest 1-level tree is just a leaf
leafMdl = estLeafMdl opSet
leafMsg = msg (functionModel2model leaf) (zip ipSet opSet)
-- this is lazy programming, not lazy evaluation!-)
-- search for the best (1- or) 2-level tree
alternatives [] bestML bestCTree bestIpParts bestOpParts =
(bestCTree, bestIpParts, bestOpParts) -- done
-- 2-level tree. NB simplest 0-lookahead algorithm
alternatives (sp:sps) bestML bestCTree bestIpParts bestOpParts =
let -- NB. valid, but probably better to bias towards "earlier" splts
theTree = CTfork (log (fromIntegral (length splts))) sp leaves
leaves = map CTleaf leafMdls -- one leaf ...
leafMdls = map estLeafMdl opParts -- ... per part
partNums = map (applySplitter sp) ipSet
ipParts = partition (aritySplitter sp) partNums ipSet
opParts = partition (aritySplitter sp) partNums opSet
msgLen = msg (functionModel2model theTree) (zip ipSet opSet)
in
if msgLen < bestML -- an improvement
then alternatives sps msgLen theTree ipParts opParts -- new 1
else alternatives sps bestML bestCTree bestIpParts bestOpParts -- old
splts = splits ipSet -- beware if very long (slow)
in case alternatives splts leafMsg leaf [ipSet] [opSet] of
((CTfork msgLen pf leaves), ipParts, opParts) ->
CTfork msgLen pf (zipWith search ipParts opParts) ; -- subtrees?
(t, _, _) -> t -- the single leaf wins
in search ipSet opSet
-- ------------------------------9/2002--6/2003--L.Allison--CSSE--Monash--.au--
test07 = let -- test
{ ct01_leaf1 = CTleaf (freqs2model [2,2]) ; -- 1:1
ct01_leaf2 = CTleaf (freqs2model [1,3]) ; -- 3:7
ct01 = CTforkFM
(FM 0 (\ip->MPr 0 (\op->if ip==op then 1.0 else 0.0) (\()->"?"))
(\()->"?"))
[ct01_leaf1, ct01_leaf2] ;
-- NB. ct01 (trivially) has a FunctionModel in the fork nodes
ct02_leaf0 = CTleaf (coinByPrH 0.5);
ct02_leaf1 = CTleaf (coinByPrH 0.2);
ct02 = CTfork 0 (Splitter 2 (\ip->ip) (\()->"id|{0,1}"))
[ct02_leaf0, ct02_leaf1];
ct03 = estCTree estMultiState splits ips ops ; -- infer a CTree
-- NB. above uses default splits for ips; see Utilities.
ipCases = [ (H, H, 1.0), (H, H, 2.0), (H, T, 1.0), (H, T, 2.0 :: Float),
(T, H, 1.0), (T, H, 2.0), (T, T, 1.0), (T, T, 2.0),
(H, H, -1.0), (H, H, -2.0), (H, T, -1.0), (H, T, -2.0),
(T, H, -1.0), (T, H, -2.0), (T, T, -1.0), (T, T, -2.0) ] ;
opCases = [ H, H, H, H, T, T, T, T,
H, H, H, H, H, H, H, H ] ; -- H iff @0 == H or @2 < 1.0
ips = (take 100 (cycle ipCases)) ++ ipCases ; -- ++ noise
ops = (take 100 (cycle opCases)) ++ [T,T,T,T, H,H,H,H, T,T,T,T, T,T,T,T] ;
-- Normal (Gaussian) leaves
ct04 = estCTree (estNormal (-10.0) 10.0 0.1 10.0 0.1) splits
(take 100 (cycle b2)) (take 100 (cycle cts)) ;
-- play with sep'n and with n
-- @0 => N(0,1) v. N(sep', 1); @1 irrelevant
b2 = [(True, True), (True, False), (True, True), (True, False),
(False,True), (False,False), (False,True), (False,False)] ;
cts = let c = [1.0, 1.0, -1.0, -1.0]
cplus separation = map (+separation) c
in c ++ cplus 0.9
}
in print "-- test07 --"
>> print("condPr ct01 0 0 and 1 0 = "
++ show( zipWith (condPr ct01) [0,1] [0,0]) )
>> print("condPr ct02 0 H and 1 H = "
++ show( zipWith (condPr ct02) [0,1] [H,H] ) )
>> print("splits ips = " ++ show( splits ips ) )
>> print("ct03 = " ++ show(ct03) ++ " = "
++ show(msg1 ct03) ++ " nits, data|ct03 = "
++ show( msg2 (functionModel2model ct03) (zip ips ops) ) ++ " nits")
>> print("condPr ct03 [(H,H,1.0) ... ] H = "
++ show( zipWith (condPr ct03) ipCases (take 16 (repeat H)) ) )
>> print("ct04 = " ++ show ct04 )
-- ----------------------------------------------------------------------------