[01]
>>
Lloyd Allison,
School of Computer Science and Software Engineering,
Monash University,
Australia 3800.
Abstract:
The functional programming language, Haskell,
is applied to the area of
artificialintelligence/ datamining/ inductiveinference/ machinelearning/
(whatever). Haskell types and typeclasses are defined
to specify various kinds of, for want of a name,
statistical model as used in that area.
Basic models (distributions) are like values,
functionmodels (regressions) are like functions, and
timeseries are a little like lists.
Convenient operators and conversion functions
allow new models to be built out of building blocks
( in the usual functional programming style).
Haskell makes a good tool to examine formally the types and behaviour
of models.
The result is a sort of theory of programming with models
(or a rapid prototype for a data analysis system),
which of course also compiles and runs.
Some (simple) algorithms for classic machine learning problems are
given to support a general claim to being realistic.
Presented to
Dept. of Computer Science and Software Engineering
(CSSE),
111 Barry St.,
Melbourne University,
Victoria, Australia,
21 October 2003.
This
document can be found at
www.csse.monash.edu.au/~lloyd/Seminars/20031021MU/index.shtml
and includes hyperlinks to other resources.
<<
[02]
>>
Groundhog days . . .
 repeat
 New, bright student begins PhD in machinelearning
(often using
MML
under CSW or associate),

 devises a new "statistical model" for analysing data from some problem,
 does some
difficult Maths
on estimator(s) for the model,

 implements
estimator in C (usual at Monash) or similar,
reinvents data input
(slightly new format of course) &
display of results (ditto),

 resulting in a program that is just robust enough to run enough
tests and comparisons
for 2 or 3 papers and the thesis,

 gets PhD and departs _{ }
 until ???; ...
<<
[03]
>>
 ... we are left with many programs that

 have little documentation,

 are not robust in extreme cases,

 cannot easily cooperate with each other or
be used to build new models, and

 are not easy to generalize.
(NB. Not the students' fault, and
there are exceptions.)
<<
[04]
>>
questions arising

 What are the products of machinelearning research?
(of AI, data mining, computational statistics,...)

 How [ do  could ] they behave?

 What can you do to them?

 How can two or more of them be combined?


 What are their types and classes?


(c.f. SPlus, R, Weka, etc..)
<<
[05]
>>
Inspiration (coals to Newcastle slide)
 Functional Programming i.e. programming with functions

 lambdacalculus, Lisp, . . ., SML, and Haskell98

 highorder functions (e.g. map, fold, zip, . . . etc.)

 lazyevaluation (e.g. ints = 0 : (map ((+) 1) ints))

 polymorphic types
(e.g. map :: (t>u) > [t] > [u])

 automatic typeinference

 typeclasses

 type checked & compiled (fast)

 So, trying to create ``programming with (statistical) models''
here (using Haskell98)...
<<
[06]
>>
... programming with models
 we get:

 highorder (functions of) models
 e.g. estimate ClassificationTree parameterized by (estimate Leaf Model)
 e.g. ClassificationTree > RegressionTree

 polymorphic typed (& typechecked) models
 e.g. a Model of some Bounded Enum <dataspace>, say
 e.g. FunctionModel of <inspace> to <opspace>

 classes of statistical model
 e.g. (basic) Model, FunctionModel, TimeSeries, . . ., ¿other?

 (and lazy evaluation is useful in some algorithms)
NB. Some slight abuse of the type system
above and following.
<<
[07]
>>
 class Model mdl where_{ }
 pr ::
(mdl dataSpace) > dataSpace > Probability_{ }
 nlPr :: (mdl dataSpace) > dataSpace > MessageLength_{ }
 msg :: SuperModel (mdl dataSpace) =>
 (mdl dataSpace) > [dataSpace] > MessageLength_{ }
 msg2 :: (mdl dataSpace) > [dataSpace] > MessageLength_{ }
 . . .

 e.g.
 data ModelType dataSpace =_{ }
 MPr
MessageLength (dataSpace > Probability) ..._{ }
 MnlPr MessageLength (dataSpace > MessageLength) ...

 instance Model ModelType where_{ }
 nlPr (MPr . . . ) = . . ._{ }
 nlPr (MnlPr . . . ) = . . .
 . . .
(Explain a little about
MML,
information, overfitting and stopping criteria.)
<<
[08]
>>
e.g.wallaceIntModel =
let catalans =
let cats last n =
. . .
. . .
in 1 : (cats 1 1)
cumulativeCatalans
= scanl1 (+) catalans
find n posn (e:es) = . . .
in MnlPr 0 (\n > ((find . . .

n  code  pr 
0 1 2 3 4 ...

0 100 10100 11000 1010100 . . .

1/2 1/8 1/32 . . .


 a (universal) Model of nonnegative Ints
<<
[09]
>>
example operators on Models
 bivariate (m1, m2) =
 let
 negLogPr (d1, d2)
= (nlPr m1 d1) + (nlPr m2 d2)_{ }
 in MnlPr ((msg1 m1)+(msg1 m2))
negLogPr
. . .
 similarly bivariate estimators etc.
 mixture ::
(Mixture mx, SuperModel (mx sMdl)) =>
 mx sMdl > sMdl
 a weighted average of Models, and
also of TimeSeries and of FunctionModels
(i.e. of SuperModels)
<<
[10]
>>
examples
 estMultiState :: ... => [t] > ModelType t
 estimator for Enum Bounded t

 estBivariate estMultiState estMultiState
 estimator of two coins, say

 instance
(Enum t1,Bounded t1, Enum t2,Bounded t2)
=>
Enum (t1,t2)
where...
 allows *MultiState to apply directly to pairs
 e.g. of coins,
NB. not equiv' to previous example.
<<
[11]
>>
Mixture Modelling (clustering)
 estMixture ests dataSet = let
 memberships (Mix . . . ) =
 . . .
 fitMixture mems =
 . . .
 in mixture( cycles . . . (fitMixture randomMemberships) )

 estMixture ::
 [ [dataSpace] > [Float] > Model of dataSpace ] ^{ }
 wtd estimators
 > [ dataSpace ] ^{ } _{ }
 training data
 > (Mixture) Model of dataSpace
unsupervised classification, clustering,
yes it works (slight abuse of types above).
NB. Mixture modelling chosen as an example `only'
because it is important in AI, ML, DM, CSc.
<<
[12]
>>
Function Models
 class FunctionModel fm where_{ }
 condModel :: (fm inSpace opSpace)
> inSpace
> ModelType opSpace_{ }
 condPr :: (fm inSpace opSpace)
> inSpace
> opSpace
> Probability
 . . .

 data FunctionModelType inSpace opSpace =
FM MessageLength (inSpace > ModelType opSpace)
. . .

 instance . . .
 regressions, classification trees, etc..
<<
[13]
>>
e.g. Classification Tress
 data_{ }CTreeType
inSpace opSpace =
 CTleaf_{ }(ModelType opSpace) 
 CTfork MessageLength
(Splitter inSpace)
[CTreeType inSpace opSpace]
 . . . 
& maybe more

 instance_{ }FunctionModel CTreeType where
 condModel_{ }(CTleaf leafModel) i
= leafModel
 condModel (CTfork fnLen f dts) i
= condModel (dts !! (applySplitter f i)) i
<<
[14]
>>
 estCTree estLeafMdl splits^{*} ipSet opSet =
 let
 search ipSet opSet =
 let
 leaf = CTleaf leafMdl
 simplest tree
 leafMdl = estLeafMdl opSet
 leafMsg = msg (functionModel2model leaf) (zip ipSet opSet)^{+}

 alternatives . . . =
 more complex trees
 . . .

 in case alternatives (splits ipSet) . . . leaf . . . in
 . . . return the best one . . .

 in search ipSet opSet
^{*} One can
also define an interesting typeclass, Splits, to give
the partitioning functions implicitly.
^{+} Some lazy programming!
<<
[15]
>>
TimeSeries
 class_{ }TimeSeries tsm where
 predictors :: (tsm dataSpace)
> [dataSpace]
> [ModelType dataSpace]_{ }
 prs :: (tsm dataSpace)
> [dataSpace]
> [Probability]
 . . .

 data TimeSeriesType dataSpace
= TSM MessageLength
([dataSpace] > ModelType dataSpace)
. . .

 instance . . .
e.g. Markov models, etc.
<<
[16]
>>
Conversions ~ generality
 e.g. assuming the inputspace data, i,
are "common knowledge" . . . _{ }
 functionModel2model fm =
MnlPr (msg1 fm)
(\(i, o) > condNlPr fm i o)

 functionModel2model fm :: (FunctionModel fm) =>
 fm ipSpace opSpace >
ModelType (ipSpace, opSpace)

 similarly . . . _{ }
 estFunctionModel2estModel ::
 ( [ipSpace] > [opSpace]
> FunctionModel of ipSpace opSpace)
 > ( [ (ipSpace, opSpace) ]
> ModelType (ipSpace, opSpace))
e.g. This can be used to turn a classificationtree into a
FunctionModel (regression) tree.
(There are more conversion functions.)
<<
[17]
Conclusion
 Haskell is very good (types & classes, highorder, lazy)
for M.L.,_{ }
 maybe it can serve both programing and
"scripting" purposes_{ },
 beginning to understand programming with models_{ }
 (it is clear that "compositional" criteria other than
MML
could be substituted to deal with hypothesis (model) complexity).
 But
 some "tension" between wish for static v. dynamic types,
e.g. re I/O,
and
 some "natural" uses seem to require multiparameter typeclasses,
e.g. would like (m1,m2) to "be" a Pair of Models and also
a Model of Pairs.
(c.f.
[functions].)