[01] >>

Machine Learning, Haskell

Lloyd Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.

Abstract:   The functional programming language Haskell is applied to the area of artificial-intelligence / data-mining / inductive-inference / machine-learning / (whatever). Haskell types and type-classes are defined to specify various kinds of, for want of a name, statistical model as used in that area. Basic models (distributions) are like values, function-models (regressions) are like functions, and time-series are a little like lists. Convenient operators and conversion functions allow new models to be built from building blocks, in the usual functional-programming style.

Haskell makes a good tool to examine formally the types and behaviour of models. The result is a sort of theory of programming with models (or a rapid prototype for a data analysis system), which of course also compiles and runs. Some (simple) algorithms for classic machine learning problems are given to support a general claim to being realistic.

Presented to Dept. of Computer Science and Software Engineering (CSSE), 111 Barry St., Melbourne University, Victoria, Australia, 21 October 2003. This document can be found at  www.csse.monash.edu.au/~lloyd/Seminars/20031021-MU/index.shtml   and includes hyper-links to other resources.

<< [02] >>

Groundhog days . . .

repeat
New, bright student begins PhD in machine-learning (often using MML under CSW or associate),
 
devises a new "statistical model" for analysing data from some problem,
does some difficult Maths on estimator(s) for the model,
 
implements estimator in C (usual at Monash) or similar, reinvents data input (slightly new format of course) & display of results (ditto),
 
resulting in a program that is just robust enough to run enough tests and comparisons for 2 or 3 papers and the thesis,
 
gets PhD and departs  
until ???;   ...

<< [03] >>
... we are left with many programs that
 
have little documentation,
 
are not robust in extreme cases,
 
cannot easily cooperate with each other or be used to build new models, and
 
are not easy to generalize.
(NB. Not the students' fault, and there are exceptions.)

<< [04] >>

questions arising

 
What are the products of machine-learning research? (of AI, data mining, computational statistics,...)
 
How [ do | could ] they behave?
 
What can you do to them?
 
How can two or more of them be combined?
 
 
What are their types and classes?
 
 
(c.f. S-Plus, R, Weka, etc.)

<< [05] >>

Inspiration   (coals to Newcastle slide)

Functional Programming i.e. programming with functions
 
lambda-calculus, Lisp, . . ., SML, and Haskell-98
 
high-order functions (e.g. map, fold, zip, . . . etc.)
 
lazy-evaluation (e.g. ints = 0 : (map ((+) 1) ints))
 
polymorphic types (e.g. map :: (t->u) -> [t] -> [u])
 
automatic type-inference
 
type-classes
 
type checked & compiled (fast)
 
So, trying to create ``programming with (statistical) models'' here (using Haskell-98)...

<< [06] >>

... programming with models

we get:
 
high-order (functions of) models
e.g. estimate Classification-Tree parameterized by (estimate Leaf Model)
e.g. Classification-Tree -> Regression-Tree
 
polymorphic typed (& type-checked) models
e.g. a Model of some Bounded Enum <data-space>, say
e.g. FunctionModel of <in-space> to <op-space>
 
classes of statistical model
e.g. (basic) Model, FunctionModel, TimeSeries, . . ., ¿other?
 
(and lazy evaluation is useful in some algorithms)
NB. Some slight abuse of the type system above and following.

<< [07] >>
class Model mdl where
  pr   :: (mdl dataSpace) -> dataSpace -> Probability
  nlPr :: (mdl dataSpace) -> dataSpace -> MessageLength
  msg  :: SuperModel (mdl dataSpace) =>
          (mdl dataSpace) -> [dataSpace] -> MessageLength
  msg2 :: (mdl dataSpace) -> [dataSpace] -> MessageLength
  . . .
 
e.g.
data ModelType dataSpace =
    MPr   MessageLength (dataSpace -> Probability) ... |
    MnlPr MessageLength (dataSpace -> MessageLength) ...
 
instance Model ModelType where
  nlPr (MPr . . . )   = . . .
  nlPr (MnlPr . . . ) = . . .
  . . .
(Explain a little about MML, information, over-fitting and stopping criteria.)
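The class above can be made concrete with a small instance. A self-contained sketch follows; the name `uniformModel` and the nit-based message lengths are illustrative assumptions, not Allison's actual library:

```haskell
-- Sketch only: a cut-down ModelType and a uniform Model over any
-- Bounded Enum data-space. `uniformModel` is a hypothetical example;
-- message lengths are in nits (natural-log units).
type Probability   = Double
type MessageLength = Double

data ModelType dataSpace =
  MnlPr MessageLength (dataSpace -> MessageLength)   -- (msg1, neg log pr)

nlPr :: ModelType ds -> ds -> MessageLength
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Probability
pr m d = exp (negate (nlPr m d))

-- two-part message length: model part plus data part
msg :: ModelType ds -> [ds] -> MessageLength
msg m@(MnlPr m1 _) ds = m1 + sum (map (nlPr m) ds)

-- a uniform distribution over all values of a Bounded Enum type
uniformModel :: (Enum ds, Bounded ds) => ModelType ds
uniformModel = MnlPr 0 nl
  where
    nl d = log (fromIntegral (fromEnum (maxBound `asTypeOf` d)
                              - fromEnum (minBound `asTypeOf` d) + 1))
```

For example, `uniformModel :: ModelType Bool` gives each of `True` and `False` probability 1/2 and a data cost of one bit (log 2 nits) per datum.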

<< [08] >>

e.g.

wallaceIntModel =
  let catalans =
        let cats last n =
           . . .
           . . .
        in 1 : (cats 1 1)

      cumulativeCatalans
         = scanl1 (+) catalans

      find n posn (e:es) = . . .

  in MnlPr 0 (\n -> ((find  . . .
  n   code       pr
  0   0          1/2
  1   100        1/8
  2   10100      1/32
  3   11000      1/32
  4   1010100    1/128
 ...  . . .      . . .

-- a (universal) Model of non-negative Ints
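The elided arithmetic can be reconstructed as far as code lengths go. This is a hedged sketch, not Allison's exact source: there are catalan(k) codewords of length 2k+1 bits, so the length of n's codeword comes from the cumulative Catalan numbers; the names `codeLen` and `prInt` are illustrative:

```haskell
-- Hedged reconstruction of wallaceIntModel's code lengths: a universal
-- code over non-negative Ints whose codewords are binary trees.
-- catalan(k) codewords have length 2k+1 bits (1 of 1 bit, 1 of 3 bits,
-- 2 of 5 bits, 5 of 7 bits, ...).
catalans :: [Integer]
catalans = 1 : cats 1 1
  where
    -- C(n) = C(n-1) * 2*(2n-1) / (n+1)
    cats last' n = next : cats next (n + 1)
      where next = (last' * 2 * (2 * n - 1)) `div` (n + 1)

-- codeword length, in bits, of non-negative n
codeLen :: Int -> Int
codeLen n = 2 * k + 1
  where
    cumulativeCatalans = scanl1 (+) catalans   -- 1, 2, 4, 9, 23, ...
    k = length (takeWhile (<= toInteger n) cumulativeCatalans)

-- pr n = 2^(-codeLen n), as in the table
prInt :: Int -> Double
prInt n = 2 ** negate (fromIntegral (codeLen n))
```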

<< [09] >>

example operators on Models

bivariate (m1, m2) =
  let negLogPr (d1, d2) = (nlPr m1 d1) + (nlPr m2 d2)
  in MnlPr ((msg1 m1) + (msg1 m2)) negLogPr . . .

-- similarly bivariate estimators etc.
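A minimal self-contained version of `bivariate` (ModelType, `msg1` and `nlPr` cut down to just what the operator needs; an independence assumption between the two components):

```haskell
-- Sketch: the bivariate operator on a cut-down ModelType.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

msg1 :: ModelType ds -> Double
msg1 (MnlPr m _) = m

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

-- joint model of a pair, assuming the two components are independent:
-- neg log probabilities, and model costs, simply add
bivariate :: (ModelType a, ModelType b) -> ModelType (a, b)
bivariate (m1, m2) =
  let negLogPr (d1, d2) = nlPr m1 d1 + nlPr m2 d2
  in MnlPr (msg1 m1 + msg1 m2) negLogPr
```

E.g. the bivariate of two fair coins gives each pair of outcomes probability 1/4.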


mixture :: (Mixture mx, SuperModel (mx sMdl)) =>
mx sMdl -> sMdl
-- a weighted average of Models, and also of Time-Series and of Function-Models (i.e. of SuperModels)

<< [10] >>

examples

estMultiState :: ... => [t] -> ModelType t   -- estimator for Enum Bounded t
 
estBivariate estMultiState estMultiState   -- estimator of two coins, say
 
instance (Enum t1, Bounded t1, Enum t2, Bounded t2) => Enum (t1,t2) where ...
  -- allows *MultiState to apply directly to pairs, e.g. of coins;
  -- NB. not equivalent to the previous example.
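A hedged sketch of an estimator in the spirit of `estMultiState`: frequency counts over a Bounded Enum type with a Laplace (+1) estimate. The parameter-cost term here is a simplified (k-1)/2·log n stand-in, not the exact MML multistate message length:

```haskell
-- Sketch: a multistate estimator over any Bounded Enum type.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

estMultiState :: (Enum t, Bounded t, Eq t) => [t] -> ModelType t
estMultiState xs = MnlPr paramCost nl
  where
    vals      = [minBound .. maxBound] `asTypeOf` xs   -- the data-space
    k         = length vals
    n         = length xs
    count v   = length (filter (== v) xs)
    -- simplified stand-in for the cost of stating the parameters
    paramCost = fromIntegral (k - 1) / 2 * log (fromIntegral (n + 1))
    -- Laplace (+1) estimate of each value's probability
    nl d      = negate (log (fromIntegral (count d + 1)
                             / fromIntegral (n + k)))
```

E.g. `estMultiState [True, True, False]` estimates pr True = 3/5 and pr False = 2/5.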

<< [11] >>

Mixture Modelling (clustering)

estMixture ests dataSet =
  let memberships (Mix . . . ) =
        . . .
      fitMixture mems =
        . . .
  in mixture (cycles . . . (fitMixture randomMemberships))
 
estMixture ::
     [ [dataSpace] -> [Float] -> Model of dataSpace ]   -- wtd estimators
  -> [ dataSpace ]                                      -- training data
  -> (Mixture) Model of dataSpace
Unsupervised classification (clustering); yes, it works (slight abuse of types above). NB. Mixture modelling was chosen as an example `only' because it is important in AI, ML, DM and CSc.
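The Model side of `mixture`, plus the E-step hidden inside `memberships`, can be sketched in a few lines. Weight costs and the fitting loop are omitted; `mixtureModel` and `memberships` as written here are illustrative helpers, not the library's definitions:

```haskell
-- Sketch: a weighted average of Models, and one E-step of estMixture.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

-- weighted average of models: pr is the weighted sum of component prs
-- (the cost of stating the weights is omitted here)
mixtureModel :: [Double] -> [ModelType ds] -> ModelType ds
mixtureModel ws ms =
  MnlPr 0 (\d -> negate (log (sum (zipWith (\w m -> w * pr m d) ws ms))))

-- E-step: fractional memberships of one datum in each component
memberships :: [Double] -> [ModelType ds] -> ds -> [Double]
memberships ws ms d = map (/ total) joint
  where
    joint = zipWith (\w m -> w * pr m d) ws ms
    total = sum joint
```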

<< [12] >>

Function Models

class FunctionModel fm where
  condModel :: (fm inSpace opSpace) -> inSpace -> ModelType opSpace
  condPr    :: (fm inSpace opSpace) -> inSpace -> opSpace -> Probability
  . . .
 
data FunctionModelType inSpace opSpace =
  FM MessageLength (inSpace -> ModelType opSpace)   . . .
 
instance . . .

-- regressions, classification trees, etc..

<< [13] >>

e.g. Classification Trees

data CTreeType inSpace opSpace =
    CTleaf (ModelType opSpace)
  | CTfork MessageLength (Splitter inSpace) [CTreeType inSpace opSpace]
  | . . .   -- & maybe more
 
instance FunctionModel CTreeType where
  condModel (CTleaf leafModel) i   = leafModel
  condModel (CTfork fnLen f dts) i = condModel (dts !! (applySplitter f i)) i
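The instance above runs unchanged once the surrounding types are filled in. A self-contained sketch; the `Splitter` representation (arity plus a partitioning function) is an assumption for illustration:

```haskell
-- Sketch: condModel for classification trees, with a minimal Splitter.
data ModelType opSpace = MnlPr Double (opSpace -> Double)

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

-- hypothetical representation: arity and a partitioning function
data Splitter inSpace = Splitter Int (inSpace -> Int)

applySplitter :: Splitter inSpace -> inSpace -> Int
applySplitter (Splitter _ f) = f

data CTreeType inSpace opSpace =
    CTleaf (ModelType opSpace)
  | CTfork Double (Splitter inSpace) [CTreeType inSpace opSpace]

-- walk down the tree, choosing a subtree by the input datum
condModel :: CTreeType ip op -> ip -> ModelType op
condModel (CTleaf leafModel)    _ = leafModel
condModel (CTfork _ f subtrees) i = condModel (subtrees !! applySplitter f i) i
```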

<< [14] >>
estCTree estLeafMdl splits* ipSet opSet =
  let
    search ipSet opSet =
      let
        leaf    = CTleaf leafMdl   -- simplest tree
        leafMdl = estLeafMdl opSet
        leafMsg = msg (functionModel2model leaf) (zip ipSet opSet)+
 
        alternatives . . . =   -- more complex trees
          . . .
 
      in case alternatives (splits ipSet) . . . leaf . . . of
        . . . return the best one . . .
 
  in search ipSet opSet

* One can also define an interesting type-class, Splits, to give the partitioning functions implicitly.
+ Some lazy programming!
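The decision at the heart of `search` — keep the simplest tree unless a fork pays for itself in total message length — can be made concrete for Bool inputs and outputs. This is a hedged sketch: the one-bit leaf/fork structure cost and the adaptive-code data cost are simplifying assumptions, and `bestTree` is a hypothetical helper:

```haskell
-- Sketch: choose between a single leaf and one binary fork by
-- comparing two-part message lengths (in nits).
type MessageLength = Double

-- adaptive-code cost of a Bool sequence: -log (k! (n-k)! / (n+1)!)
msgBools :: [Bool] -> MessageLength
msgBools bs = log (fromIntegral (n + 1) * fromInteger (choose n k))
  where
    n = length bs
    k = length (filter id bs)
    choose n' k' = fact n' `div` (fact k' * fact (n' - k'))
    fact m = product [1 .. toInteger m]

bit :: MessageLength
bit = log 2

-- one bit of structure (leaf v. fork), then the data cost(s)
bestTree :: [Bool] -> [Bool] -> String
bestTree ipSet opSet =
  let leafCost    = bit + msgBools opSet
      partition p = [o | (i, o) <- zip ipSet opSet, i == p]
      forkCost    = bit + msgBools (partition False)
                        + msgBools (partition True)
  in if forkCost < leafCost then "fork" else "leaf"
```

When the input perfectly predicts the output the fork wins; when the outputs are already pure, the extra split cannot pay for itself and the leaf wins. This is MML acting as a stopping criterion.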

<< [15] >>

Time-Series

class TimeSeries tsm where
  predictors :: (tsm dataSpace) -> [dataSpace] -> [ModelType dataSpace]
  prs        :: (tsm dataSpace) -> [dataSpace] -> [Probability]
  . . .
 
data TimeSeriesType dataSpace =
  TSM MessageLength ([dataSpace] -> ModelType dataSpace)   . . .
 
instance . . .
e.g. Markov models, etc.
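A self-contained sketch of the class's behaviour: `predictors` yields one conditional model per position, each given the context so far. The example "sticky" chain (repeat the previous symbol with pr 0.9) is a hypothetical illustration:

```haskell
import Data.List (inits)

-- Sketch: a TimeSeriesType and a simple Markov-like instance over Bool.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

data TimeSeriesType ds = TSM Double ([ds] -> ModelType ds)

-- one predictor per position, conditioned on the preceding context
predictors :: TimeSeriesType ds -> [ds] -> [ModelType ds]
predictors (TSM _ ctx) ds = map ctx (inits ds)

prs :: TimeSeriesType ds -> [ds] -> [Double]
prs tsm ds =
  zipWith (\m d -> exp (negate (nlPr m d))) (predictors tsm ds) ds

-- hypothetical example: next symbol repeats the previous one with pr 0.9
sticky :: TimeSeriesType Bool
sticky = TSM 0 ctx
  where
    ctx []   = MnlPr 0 (const (log 2))             -- first symbol uniform
    ctx past = MnlPr 0 (\d -> if d == last past
                              then negate (log 0.9)
                              else negate (log 0.1))
```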

<< [16] >>

Conversions ~ generality

e.g. assuming the input-space data, i, are "common knowledge" . . .  
functionModel2model fm = MnlPr (msg1 fm)   (\(i, o) -> condNlPr fm i o)
 
functionModel2model :: (FunctionModel fm) =>
  fm ipSpace opSpace -> ModelType (ipSpace, opSpace)
 
similarly . . .  
estFunctionModel2estModel ::
     ( [ipSpace] -> [opSpace] -> FunctionModel of ipSpace opSpace )
  -> ( [ (ipSpace, opSpace) ] -> ModelType (ipSpace, opSpace) )
e.g. This can be used to turn a classification-tree into a FunctionModel- (regression-) tree. (There are more conversion functions.)
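A self-contained sketch of the first conversion, with the slide's assumption made explicit: the inputs are common knowledge, so only the conditional cost of the outputs is counted:

```haskell
-- Sketch: turn a FunctionModel into a Model of (input, output) pairs.
data ModelType ds = MnlPr Double (ds -> Double)   -- (msg1, neg log pr), nits

nlPr :: ModelType ds -> ds -> Double
nlPr (MnlPr _ f) = f

pr :: ModelType ds -> ds -> Double
pr m d = exp (negate (nlPr m d))

data FunctionModelType ipSpace opSpace =
  FM Double (ipSpace -> ModelType opSpace)

condNlPr :: FunctionModelType ip op -> ip -> op -> Double
condNlPr (FM _ c) i o = nlPr (c i) o

fmMsg1 :: FunctionModelType ip op -> Double
fmMsg1 (FM m _) = m

-- inputs are "common knowledge": a pair costs only its conditional part
functionModel2model :: FunctionModelType ip op -> ModelType (ip, op)
functionModel2model fm = MnlPr (fmMsg1 fm) (\(i, o) -> condNlPr fm i o)
```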

<< [17]

Conclusion

Haskell is very good (types & classes, high-order, lazy) for M.L.,
maybe it can serve both programming and "scripting" purposes,
beginning to understand programming with models
(it is clear that "compositional" criteria other than MML could be substituted to deal with hypothesis- (model-) complexity).
But
there is some "tension" between the wish for static v. dynamic types, e.g. re I/O, and
some "natural" uses seem to require multi-parameter type-classes, e.g. we would like (m1,m2) to "be" a Pair of Models and also a Model of Pairs (c.f. [functions]).

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168.
Created with "vi (Linux & Solaris)"