[01] >>

Machine Learning, Haskell

Lloyd Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.

Abstract:   The functional programming language Haskell is applied to the area of artificial-intelligence / data-mining / inductive-inference / machine-learning / (whatever). Haskell types and type-classes are defined to specify various kinds of, for want of a name, statistical model as used in that area. Basic models (distributions) are like values, function-models (regressions) are like functions, and time-series are a little like lists. Convenient operators and conversion functions allow new models to be built from building blocks, in the usual functional-programming style.

Haskell makes a good tool to examine formally the types and behaviour of models. The result is a sort of theory of programming with models (or a rapid prototype for a data analysis system), which of course also compiles and runs. Some (simple) algorithms for classic machine learning problems are given to support a general claim to being realistic.

Presented to Dept. of Computer Science and Software Engineering (CSSE), 111 Barry St., Melbourne University, Victoria, Australia, 21 October 2003. This document can be found at  www.csse.monash.edu.au/~lloyd/Seminars/20031021-MU/index.shtml   and includes hyper-links to other resources.

<< [02] >>

Groundhog days . . .

repeat
New, bright student begins PhD in machine-learning (often using MML under CSW or associate),
 
devises a new "statistical model" for analysing data from some problem,
does some difficult Maths on estimator(s) for the model,
 
implements estimator in C (usual at Monash) or similar, reinvents data input (slightly new format of course) & display of results (ditto),
 
resulting in a program that is just robust enough to run enough tests and comparisons for 2 or 3 papers and the thesis,
 
gets PhD and departs  
until ???;   ...

<< [03] >>
... we are left with many programs that
 
have little documentation,
 
are not robust in extreme cases,
 
cannot easily cooperate with each other or be used to build new models, and
 
are not easy to generalize.
(NB. Not the students' fault, and there are exceptions.)

<< [04] >>

questions arising

 
What are the products of machine-learning research? (of AI, data mining, computational statistics,...)
 
How [ do | could ] they behave?
 
What can you do to them?
 
How can two or more of them be combined?
 
 
What are their types and classes?
 
 
(c.f. S-Plus, R, Weka, etc.)

<< [05] >>

Inspiration   (coals to Newcastle slide)

Functional Programming i.e. programming with functions
 
lambda-calculus, Lisp, . . ., SML, and Haskell-98
 
high-order functions (e.g. map, fold, zip, . . . etc.)
 
lazy-evaluation (e.g. ints = 0 : (map ((+) 1) ints))
 
polymorphic types (e.g. map :: (t->u) -> [t] -> [u])
 
automatic type-inference
 
type-classes
 
type checked & compiled (fast)
 
So, trying to create ``programming with (statistical) models'' here (using Haskell-98)...

<< [06] >>

... programming with models

we get:
 
high-order (functions of) models
e.g. estimate Classification-Tree parameterized by (estimate Leaf Model)
e.g. Classification-Tree -> Regression-Tree
 
polymorphic typed (& type-checked) models
e.g. a Model of some Bounded Enum <data-space>, say
e.g. FunctionModel of <in-space> to <op-space>
 
classes of statistical model
e.g. (basic) Model, FunctionModel, TimeSeries, . . ., ¿other?
 
(and lazy evaluation is useful in some algorithms)
NB. Some slight abuse of the type system above and following.

<< [07] >>
class Model mdl where
  pr   :: (mdl dataSpace) -> dataSpace -> Probability
  nlPr :: (mdl dataSpace) -> dataSpace -> MessageLength
  msg  :: SuperModel (mdl dataSpace) =>
          (mdl dataSpace) -> [dataSpace] -> MessageLength
  msg2 :: (mdl dataSpace) -> [dataSpace] -> MessageLength
  . . .
 
e.g.
data ModelType dataSpace =
    MPr   MessageLength (dataSpace -> Probability) ... |
    MnlPr MessageLength (dataSpace -> MessageLength) ...
 
instance Model ModelType where
  nlPr (MPr . . . )   = . . .
  nlPr (MnlPr . . . ) = . . .
  . . .
(Explain a little about MML, information, over-fitting and stopping criteria.)
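The class above can be made concrete with a small instance. A self-contained sketch follows; the name `uniformModel` and the nit-based message lengths are illustrative assumptions, not Allison's actual library:

```haskell
-- Sketch only: a cut-down ModelType and a uniform Model over any
-- Bounded Enum data-space. `uniformModel` is a hypothetical example;
-- message lengths are in nits (natural-log units).
type Probability   = Double
type MessageLength = Double

data ModelType dataSpace =
  MnlPr MessageLength (dataSpace -> MessageLength)   -- (msg1, neg log pr)

nlPr :: ModelType ds -> ds -> MessageLength
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Probability
pr m d = exp (negate (nlPr m d))

-- two-part message length: model part plus data part
msg :: ModelType ds -> [ds] -> MessageLength
msg m@(MnlPr m1 _) ds = m1 + sum (map (nlPr m) ds)

-- a uniform distribution over all values of a Bounded Enum type
uniformModel :: (Enum ds, Bounded ds) => ModelType ds
uniformModel = MnlPr 0 nl
  where
    nl d = log (fromIntegral (fromEnum (maxBound `asTypeOf` d)
                              - fromEnum (minBound `asTypeOf` d) + 1))
```

For example, `uniformModel :: ModelType Bool` gives each of `True` and `False` probability 1/2 and a data cost of one bit (log 2 nits) per datum.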

<< [08] >>

e.g.

wallaceIntModel =
  let catalans =
        let cats last n =
           . . .
           . . .
        in 1 : (cats 1 1)

      cumulativeCatalans
         = scanl1 (+) catalans

      find n posn (e:es) = . . .

  in MnlPr 0 (\n -> ((find  . . .
  n   code       pr
  0   0          1/2
  1   100        1/8
  2   10100      1/32
  3   11000      1/32
  4   1010100    1/128
 ...  . . .      . . .

-- a (universal) Model of non-negative Ints
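The elided arithmetic can be reconstructed as far as code lengths go. This is a hedged sketch, not Allison's exact source: there are catalan(k) codewords of length 2k+1 bits, so the length of n's codeword comes from the cumulative Catalan numbers; the names `codeLen` and `prInt` are illustrative:

```haskell
-- Hedged reconstruction of wallaceIntModel's code lengths: a universal
-- code over non-negative Ints whose codewords are binary trees.
-- catalan(k) codewords have length 2k+1 bits (1 of 1 bit, 1 of 3 bits,
-- 2 of 5 bits, 5 of 7 bits, ...).
catalans :: [Integer]
catalans = 1 : cats 1 1
  where
    -- C(n) = C(n-1) * 2*(2n-1) / (n+1)
    cats last' n = next : cats next (n + 1)
      where next = (last' * 2 * (2 * n - 1)) `div` (n + 1)

-- codeword length, in bits, of non-negative n
codeLen :: Int -> Int
codeLen n = 2 * k + 1
  where
    cumulativeCatalans = scanl1 (+) catalans   -- 1, 2, 4, 9, 23, ...
    k = length (takeWhile (<= toInteger n) cumulativeCatalans)

-- pr n = 2^(-codeLen n), as in the table
prInt :: Int -> Double
prInt n = 2 ** negate (fromIntegral (codeLen n))
```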

<< [09] >>

example operators on Models

bivariate (m1, m2) =
  let negLogPr (d1, d2) = (nlPr m1 d1) + (nlPr m2 d2)
  in MnlPr ((msg1 m1) + (msg1 m2)) negLogPr . . .

-- similarly bivariate estimators etc.
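A minimal self-contained version of `bivariate` (ModelType, `msg1` and `nlPr` cut down to just what the operator needs; an independence assumption between the two components):

```haskell
-- Sketch: the bivariate operator on a cut-down ModelType.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

msg1 :: ModelType ds -> Double
msg1 (MnlPr m _) = m

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

-- joint model of a pair, assuming the two components are independent:
-- neg log probabilities, and model costs, simply add
bivariate :: (ModelType a, ModelType b) -> ModelType (a, b)
bivariate (m1, m2) =
  let negLogPr (d1, d2) = nlPr m1 d1 + nlPr m2 d2
  in MnlPr (msg1 m1 + msg1 m2) negLogPr
```

E.g. the bivariate of two fair coins gives each pair of outcomes probability 1/4.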


mixture :: (Mixture mx, SuperModel (mx sMdl)) =>
mx sMdl -> sMdl
-- a weighted average of Models, and also of Time-Series and of Function-Models (i.e. of SuperModels)

<< [10] >>

examples

estMultiState :: ... => [t] -> ModelType t   -- estimator for Enum Bounded t
 
estBivariate estMultiState estMultiState   -- estimator of two coins, say
 
instance (Enum t1, Bounded t1, Enum t2, Bounded t2) => Enum (t1,t2) where ...
  -- allows *MultiState to apply directly to pairs, e.g. of coins;
  -- NB. not equivalent to the previous example.
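A hedged sketch of an estimator in the spirit of `estMultiState`: frequency counts over a Bounded Enum type with a Laplace (+1) estimate. The parameter-cost term here is a simplified (k-1)/2·log n stand-in, not the exact MML multistate message length:

```haskell
-- Sketch: a multistate estimator over any Bounded Enum type.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

estMultiState :: (Enum t, Bounded t, Eq t) => [t] -> ModelType t
estMultiState xs = MnlPr paramCost nl
  where
    vals      = [minBound .. maxBound] `asTypeOf` xs   -- the data-space
    k         = length vals
    n         = length xs
    count v   = length (filter (== v) xs)
    -- simplified stand-in for the cost of stating the parameters
    paramCost = fromIntegral (k - 1) / 2 * log (fromIntegral (n + 1))
    -- Laplace (+1) estimate of each value's probability
    nl d      = negate (log (fromIntegral (count d + 1)
                             / fromIntegral (n + k)))
```

E.g. `estMultiState [True, True, False]` estimates pr True = 3/5 and pr False = 2/5.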

<< [11] >>

Mixture Modelling (clustering)

estMixture ests dataSet =
  let memberships (Mix . . . ) =
        . . .
      fitMixture mems =
        . . .
  in mixture (cycles . . . (fitMixture randomMemberships))
 
estMixture ::
     [ [dataSpace] -> [Float] -> Model of dataSpace ]   -- wtd estimators
  -> [ dataSpace ]                                      -- training data
  -> (Mixture) Model of dataSpace
Unsupervised classification (clustering); yes, it works (slight abuse of types above). NB. Mixture modelling was chosen as an example `only' because it is important in AI, ML, DM and CSc.
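The Model side of `mixture`, plus the E-step hidden inside `memberships`, can be sketched in a few lines. Weight costs and the fitting loop are omitted; `mixtureModel` and `memberships` as written here are illustrative helpers, not the library's definitions:

```haskell
-- Sketch: a weighted average of Models, and one E-step of estMixture.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

-- weighted average of models: pr is the weighted sum of component prs
-- (the cost of stating the weights is omitted here)
mixtureModel :: [Double] -> [ModelType ds] -> ModelType ds
mixtureModel ws ms =
  MnlPr 0 (\d -> negate (log (sum (zipWith (\w m -> w * pr m d) ws ms))))

-- E-step: fractional memberships of one datum in each component
memberships :: [Double] -> [ModelType ds] -> ds -> [Double]
memberships ws ms d = map (/ total) joint
  where
    joint = zipWith (\w m -> w * pr m d) ws ms
    total = sum joint
```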

<< [12] >>

Function Models

class FunctionModel fm where
  condModel :: (fm inSpace opSpace) -> inSpace -> ModelType opSpace
  condPr    :: (fm inSpace opSpace) -> inSpace -> opSpace -> Probability
  . . .
 
data FunctionModelType inSpace opSpace =
  FM MessageLength (inSpace -> ModelType opSpace)   . . .
 
instance . . .

-- regressions, classification trees, etc..

<< [13] >>

e.g. Classification Trees

data CTreeType inSpace opSpace =
    CTleaf (ModelType opSpace)
  | CTfork MessageLength (Splitter inSpace) [CTreeType inSpace opSpace]
  | . . .   -- & maybe more
 
instance FunctionModel CTreeType where
  condModel (CTleaf leafModel) i   = leafModel
  condModel (CTfork fnLen f dts) i = condModel (dts !! (applySplitter f i)) i
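The instance above runs unchanged once the surrounding types are filled in. A self-contained sketch; the `Splitter` representation (arity plus a partitioning function) is an assumption for illustration:

```haskell
-- Sketch: condModel for classification trees, with a minimal Splitter.
data ModelType opSpace = MnlPr Double (opSpace -> Double)

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

-- hypothetical representation: arity and a partitioning function
data Splitter inSpace = Splitter Int (inSpace -> Int)

applySplitter :: Splitter inSpace -> inSpace -> Int
applySplitter (Splitter _ f) = f

data CTreeType inSpace opSpace =
    CTleaf (ModelType opSpace)
  | CTfork Double (Splitter inSpace) [CTreeType inSpace opSpace]

-- walk down the tree, choosing a subtree by the input datum
condModel :: CTreeType ip op -> ip -> ModelType op
condModel (CTleaf leafModel)    _ = leafModel
condModel (CTfork _ f subtrees) i = condModel (subtrees !! applySplitter f i) i
```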

<< [14] >>
estCTree estLeafMdl splits* ipSet opSet =
  let
    search ipSet opSet =
      let
        leaf    = CTleaf leafMdl   -- simplest tree
        leafMdl = estLeafMdl opSet
        leafMsg = msg (functionModel2model leaf) (zip ipSet opSet)+
 
        alternatives . . . =   -- more complex trees
          . . .
 
      in case alternatives (splits ipSet) . . . leaf . . . of
        . . . return the best one . . .
 
  in search ipSet opSet

* One can also define an interesting type-class, Splits, to give the partitioning functions implicitly.
+ Some lazy programming!
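The decision at the heart of `search` — keep the simplest tree unless a fork pays for itself in total message length — can be made concrete for Bool inputs and outputs. This is a hedged sketch: the one-bit leaf/fork structure cost and the adaptive-code data cost are simplifying assumptions, and `bestTree` is a hypothetical helper:

```haskell
-- Sketch: choose between a single leaf and one binary fork by
-- comparing two-part message lengths (in nits).
type MessageLength = Double

-- adaptive-code cost of a Bool sequence: -log (k! (n-k)! / (n+1)!)
msgBools :: [Bool] -> MessageLength
msgBools bs = log (fromIntegral (n + 1) * fromInteger (choose n k))
  where
    n = length bs
    k = length (filter id bs)
    choose n' k' = fact n' `div` (fact k' * fact (n' - k'))
    fact m = product [1 .. toInteger m]

bit :: MessageLength
bit = log 2

-- one bit of structure (leaf v. fork), then the data cost(s)
bestTree :: [Bool] -> [Bool] -> String
bestTree ipSet opSet =
  let leafCost    = bit + msgBools opSet
      partition p = [o | (i, o) <- zip ipSet opSet, i == p]
      forkCost    = bit + msgBools (partition False)
                        + msgBools (partition True)
  in if forkCost < leafCost then "fork" else "leaf"
```

When the input perfectly predicts the output the fork wins; when the outputs are already pure, the extra split cannot pay for itself and the leaf wins. This is MML acting as a stopping criterion.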

<< [15] >>

Time-Series

class TimeSeries tsm where
  predictors :: (tsm dataSpace) -> [dataSpace] -> [ModelType dataSpace]
  prs        :: (tsm dataSpace) -> [dataSpace] -> [Probability]
  . . .
 
data TimeSeriesType dataSpace =
  TSM MessageLength ([dataSpace] -> ModelType dataSpace)   . . .
 
instance . . .
e.g. Markov models, etc.
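A self-contained sketch of the class's behaviour: `predictors` yields one conditional model per position, each given the context so far. The example "sticky" chain (repeat the previous symbol with pr 0.9) is a hypothetical illustration:

```haskell
import Data.List (inits)

-- Sketch: a TimeSeriesType and a simple Markov-like instance over Bool.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

data TimeSeriesType ds = TSM Double ([ds] -> ModelType ds)

-- one predictor per position, conditioned on the preceding context
predictors :: TimeSeriesType ds -> [ds] -> [ModelType ds]
predictors (TSM _ ctx) ds = map ctx (inits ds)

prs :: TimeSeriesType ds -> [ds] -> [Double]
prs tsm ds =
  zipWith (\m d -> exp (negate (nlPr m d))) (predictors tsm ds) ds

-- hypothetical example: next symbol repeats the previous one with pr 0.9
sticky :: TimeSeriesType Bool
sticky = TSM 0 ctx
  where
    ctx []   = MnlPr 0 (const (log 2))             -- first symbol uniform
    ctx past = MnlPr 0 (\d -> if d == last past
                              then negate (log 0.9)
                              else negate (log 0.1))
```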

<< [16] >>

Conversions ~ generality

e.g. assuming the input-space data, i, are "common knowledge" . . .  
functionModel2model fm = MnlPr (msg1 fm)   (\(i, o) -> condNlPr fm i o)
 
functionModel2model :: (FunctionModel fm) =>
  fm ipSpace opSpace -> ModelType (ipSpace, opSpace)
 
similarly . . .  
estFunctionModel2estModel ::
     ( [ipSpace] -> [opSpace] -> FunctionModel of ipSpace opSpace )
  -> ( [ (ipSpace, opSpace) ] -> ModelType (ipSpace, opSpace) )
e.g. This can be used to turn a classification-tree into a FunctionModel- (regression-) tree. (There are more conversion functions.)
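A self-contained sketch of the first conversion, with the slide's assumption made explicit: the inputs are common knowledge, so only the conditional cost of the outputs is counted:

```haskell
-- Sketch: turn a FunctionModel into a Model of (input, output) pairs.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

data FunctionModelType ipSpace opSpace =
  FM Double (ipSpace -> ModelType opSpace)

condNlPr :: FunctionModelType ip op -> ip -> op -> Double
condNlPr (FM _ c) i o = nlPr (c i) o

fmMsg1 :: FunctionModelType ip op -> Double
fmMsg1 (FM m _) = m

-- inputs are "common knowledge": a pair costs only its conditional part
functionModel2model :: FunctionModelType ip op -> ModelType (ip, op)
functionModel2model fm = MnlPr (fmMsg1 fm) (\(i, o) -> condNlPr fm i o)
```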

<< [17]

Conclusion

Haskell is very good (types & classes, high-order, lazy) for M.L.,
maybe it can serve both programming and "scripting" purposes,
beginning to understand programming with models
(it is clear that "compositional" criteria other than MML could be substituted to deal with hypothesis- (model-) complexity).
But
there is some "tension" between the wish for static v. dynamic types, e.g. re I/O, and
some "natural" uses seem to require multi-parameter type-classes, e.g. we would like (m1,m2) to "be" a Pair of Models and also a Model of Pairs (c.f. [functions]).

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168.
Created with "vi (Linux & Solaris)"