JAVASCRIPT IS OFF
Haskell TimeSeries
Lloyd Allison ,
School of Computer Science & Software Engineering,
Monash University , Clayton, Victoria,
Australia 3800 .
Abstract :
The talk concerns 'statistical models' as used in
artificial intelligence, data mining,
inductive inference or machine learning.
Many models are naturally polymorphic.
There are useful operators,
some known and some yet to be discovered^{*} ,
for combining models to make new models.
Overfitting is a potential problem in inductive inference but there is a
natural combinational criterion , MML,
to tradeoff model complexity v. fit to data.
I suggest that statistical models, MML and functional programming
make a natural threesome.
Examples are given from "time"series analysis.
www.csse.monash.edu.au/~lloyd/Seminars/200412TimeSeries/index.shtml
To:
Prog. Lang. & Sys.
(PLASMA) research group,
Dept. of Computer Science ,
York University ,
12.1513.15,
room CS202J ,
Wed. 15 Dec. 2004.
[*] (un)known unknowns?
What should this kind of programming be called?
Q: 'Function' is to 'functional programming' as
'statistical model' is to <what>?
? Inductive programming ?
Model selection?
data

model "error" v. model complexity
Overfitting is a problem, unless...
... MML model selection!
data + model complexity
Math can be hard with (variable numbers of)
discrete and continuous parameters.
Most important abilities of a (basic) model
class Model mdl where
pr :: (mdl dataSpace)
> dataSpace > Probability
nlPr :: (mdl dataSpace)
> dataSpace > MessageLength
...
msg :: SuperModel (mdl dataSpace)
=> (mdl dataSpace) > [dataSpace]
> MessageLength
...
A "superclass" for Statistical Models
class (Show sMdl) => SuperModel sMdl where
prior :: sMdl > Probability
msg1 :: sMdl > MessageLength
...
mixture ::
(Mixture mx, SuperModel (mx sMdl))
=> mx sMdl > sMdl
...
FunctionModel (Regression)
class FunctionModel fm where
condModel :: (fm inSpace opSpace)
> inSpace > ModelType opSpace
condPr :: (fm inSpace opSpace)
> inSpace > opSpace > Probability
...
I.e. Given the values of the input (exogenous) attributes (variables)
make a conditional (model) prediction of the output (endogenous) attributes.
TimeSeries Model
class TimeSeries tsm where
predictors :: (tsm dataSpace)
> [dataSpace] > [ModelType dataSpace]
prs :: (tsm dataSpace)
> [dataSpace] > [Probability]
...
I.e. A timeseries model makes a sequence of (model) predictions
for what comes next given the preceding context.
e.g.
estMarkov k dataSeries =
let
scan (d:ds) context = ...
contexts = scan dataseries []
in functionModel2timeSeries
(estFiniteListFunction
k contexts dataSeries)
Many useful functions such as
functionModel2timeSeries ::
(FunctionModel fm)
=> fm [ds] ds
> TimeSeriesType ds
data OrBoth a b = JustL a
 JustR b
 Both a b
deriving ...
modelOrBoth ::
model Ops > model a
> model b
> model (a,b)
> model (OrBoth a b)
tsmOrBoth ::
(model a > model b > model (a,b))
> tsm Ops
> tsm a
> tsm b
> tsm (OrBoth a b)
... when it is convenient to make a model (a, b)
from the (model a)
and the (model b).
(NB some tortured types.)
e.g. optimal alignment
ACGTACGTA GT
   
AC TACATACGT
^ ^ ^
  JustL C
 Both A G
JustR G
2D dynamic programming algorithm.
dpa2D :: Int >
tsm (OrBoth a b)
> [a] > [b]
> [OrBoth a b]
Note that `a' and `b' can be different types,
discrete, continuous, multivariate, even nonEQ types.
The timeseries type, tsm , often has a "memory".
Needs timeseries models with "state".
[OrBoth a b] <> ([Ops], [a], [b]) ,
LHS & RHS equivalent, if lengths match.
(re)estimate:
[Ops] > (tsm Ops),
[a] > (tsm a),
[b] > (tsm b)
> (new) tsm (OrBoth a b)
> (new) [OrBoth a b] ... (EM) must converge.
Approximate Repeats
A generalised Lempel Ziv timeseries model
data ApproxRepeats a =
Gen a
 Jump Int
 Rpt (OrBoth a a)
deriving ...
tsmApproxRepeats ::
(model a > model a > model (a,a))
> tsm OpsAR
> tsm a
> tsm OpsOB
> tsm a
(and recall that tsmOrBoth
has a (tsm a) parameter ... yes you can.)
as
this
is to
that
subject of this talk
inductive inference
list prelude
list processing
{parser combinator}
parsing
denotational semantics of L
L
... before parser combinators etc. were invented.
Conclusion
We have ...
(the start of) a combinator library for machine learning,
a combinational denotational semantics of statistical models,
a way of "software engineering" much of AI & machine learning,
some tiny but very general programs for e.g.
timeseries analysis,
[clustering, (decision) classificationtrees ]
and
[Bayesian Networks ]
etc..
And if you have a good answer to the [question ] 
please tell me.
Reading: [more ]
(and on approximate repeats
[here ]).
©
L. A λλ ison ,
School of Computer Science and Software Engineering,
Monash University, Australia 3800.
Created with "vi (Linux & Solaris)", charset=iso88591