The project is to form a "theory" (or prelude or semantics or model(!))
of 'programming with "statistical models" from AIDMIIMLSI',
using Haskell as the analytical tool.
The project aims are, in order of priority:
- Understand and formally define exactly what "statistical models"
  (i.e. the products[1] of AIDMIIMLSI[2]) really are from a programming
  point of view, that is, how do they behave, what can be done to each one,
  and how can two or more be combined?
- Develop
  - (polymorphic) types and type-classes to define "statistical models", and
  - a useful set of operators (functions, combinators, methods) to act on them,
  i.e. a prelude (library).
- Dismember and reconstruct a few representative, important and well-known
  "statistical models" from AIDMIIMLSI and, while doing so,
  - add generality, and
  - make a broad claim to being realistic.
- Eventually, encode as much as possible of AIDMIIMLSI
  in a convenient, compact library
  while adding lightness and generality.
[1] E.g. mixture-modelling (unsupervised classification, clustering),
classification- (decision-) trees (supervised classification, expert systems),
or Bayesian/causal networks/models, etc.
[2] Definition: AIDMIIMLSI =
artificial-intelligence/ data-mining/ inductive-inference/
machine-learning/ statistical-inference/ etc., call it what you will,
i.e. somewhere between the intersection and the union of these areas.
(~ Super Thunder Sting Car Ray Bird -- Pete & Dud.)
Note that gaining understanding is the primary aim; that understanding
could, for example, be used to
(i) design better components for an existing AIDMIIMLSI platform,
(ii) design a better AIDMIIMLSI platform, or even
(iii) build a serious platform in Haskell,
but these other things are secondary, not primary, aims.
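To make the first two aims a little more concrete, here is a minimal Haskell sketch. It assumes, as a simplification that is ours and not the project's definition, that a "statistical model" of a data space `a` can be reduced to a function giving each datum a probability, with the cost of a datum measured as its negative log probability. The class `ProbModel`, the type `Dist` and the functions `nlProb`, `cost`, `bernoulli`, `pairModel` and `mixture` are hypothetical names chosen for illustration; they are not the API of the project's library.

```haskell
-- Hypothetical sketch, not the project's library: a class of
-- "statistical models" over a data space a.
class ProbModel m where
  -- Probability that the model assigns to one datum.
  prob :: m a -> a -> Double
  -- Cost of one datum as negative log probability, in nats (default method).
  nlProb :: m a -> a -> Double
  nlProb m x = negate (log (prob m x))

-- One simple instance: a model that is just a probability function.
newtype Dist a = Dist (a -> Double)

instance ProbModel Dist where
  prob (Dist f) = f

-- Total cost of a data set, assuming the data are independent.
cost :: ProbModel m => m a -> [a] -> Double
cost m = sum . map (nlProb m)

-- A Bernoulli model of Bool with P(True) = p.
bernoulli :: Double -> Dist Bool
bernoulli p = Dist (\b -> if b then p else 1 - p)

-- Combinator: independent models of a and of b give a model of pairs (a, b).
pairModel :: (ProbModel m, ProbModel n) => m a -> n b -> Dist (a, b)
pairModel ma nb = Dist (\(x, y) -> prob ma x * prob nb y)

-- Combinator: a weighted mixture of models of the same data space;
-- the weights are assumed to be non-negative and to sum to 1.
mixture :: ProbModel m => [(Double, m a)] -> Dist a
mixture wms = Dist (\x -> sum [w * prob m x | (w, m) <- wms])
```

For example, `prob (pairModel (bernoulli 0.5) (bernoulli 0.25)) (True, False)` is 0.5 * 0.75 = 0.375, and a 50:50 mixture of `bernoulli 0.2` and `bernoulli 0.6` gives `True` probability 0.4 (up to floating-point rounding). Because both combinators return an ordinary `Dist`, they compose: a mixture of pair models is itself a model.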
This exercise is to
  artificial-intelligence/ data-mining/ inductive-inference/
  machine-learning/ statistical-inference
as
- PreludeList (map, fold, zip, ...) is to list-processing.
  (Notes: imagine yourself in the late 1950s or the early 1960s
  developing Lisp (McCarthy et al.), or APL (Iverson), or similar.
  NB. PreludeList now has 40+ years of experience behind it!)
- parser combinators are to parsing.
- the mathematical semantics of Algol-60 (Mosses 1974)
  is to the programming language Algol-60.
  (Notes: understanding of Algol-60 compared to other languages.)

And, as
  a function is to functional programming,
so
  a statistical model is to <what?>-programming,
  perhaps 'inductive programming'?
  (Notes: the term I.P. was suggested by Charles Twardy (2004).)
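The analogy suggests treating models the way functional programming treats functions: as first-class values that can be passed to, returned from and combined by other functions. A minimal sketch under that assumption follows, in which an "estimator" is simply a function from a data set to a fitted model. The names `Dist`, `Estimator`, `bernoulliEst` and `pairEst` are hypothetical, and the add-one-half (Krichevsky-Trofimov-style) estimate is our choice for illustration, not something taken from the source.

```haskell
-- Hypothetical sketch: models and estimators as ordinary first-class values.
-- A model of data space a is reduced here to a probability function.
newtype Dist a = Dist { prob :: a -> Double }

-- An estimator maps a data set to a fitted model.
type Estimator a = [a] -> Dist a

-- Bernoulli model of Bool fitted by adding 1/2 to each count
-- (a Krichevsky-Trofimov-style estimate, chosen for illustration;
-- it never gives probability 0 to an unseen value).
bernoulliEst :: Estimator Bool
bernoulliEst xs = Dist (\b -> if b then p else 1 - p)
  where
    n = fromIntegral (length xs)
    t = fromIntegral (length (filter id xs))
    p = (t + 0.5) / (n + 1)

-- A combinator on estimators, in the spirit of zip for lists: estimate a
-- model of pairs by estimating each component separately, i.e. assuming
-- that the two components are independent.
pairEst :: Estimator a -> Estimator b -> Estimator (a, b)
pairEst ea eb xys = Dist (\(x, y) -> prob ma x * prob mb y)
  where
    (xs, ys) = unzip xys
    ma = ea xs
    mb = eb ys
```

For example, `prob (bernoulliEst [True, True, False]) True` is (2 + 0.5) / (3 + 1) = 0.625, and `map (\m -> prob m True) (map bernoulliEst [[True], [False]])` gives `[0.75,0.25]`: the fitted models are just values in a list, handled by the usual list functions.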
Other "possible" approaches:
| approach | why not |
| Create a new language for machine learning. | A lot of work. Hard to maintain, 'port, etc. Unlikely to improve on Haskell's notion of value and type (although a simplified, specialised subset might be a "goer"). Don't create languages without good reasons. |
| 1-1 translate some existing AIDMIIMLSI platform directly into Haskell, say. | Might be useful, but does not aid understanding. |
| the project is not primarily about... | ...because |
| Any particular application area or problem instance of AIDMIIMLSI | (if it were, it might use R, rather); it is about understanding what AIDMIIMLSI could be. |
| Software engineering | although it has aspects of a software engineering job on AIDMIIMLSI. |
| Stamp (i.e. model) collecting | No way; but it is about understanding the machinery that allows stamps to be produced, and generalized. |
| Haskell | Haskell is "just" a good tool. But it is curious that there is no built-in [class Function], or [class Pair], etc. |
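On the last point, here is a hedged sketch of what a `class Pair` and a `class Function` might look like if written by hand. These classes, their methods (`pair`, `first`, `second`, `apply`) and the example types `SP` and `K` are hypothetical, for illustration only; nothing of this kind is in the standard Prelude. The point is that code written once against such a class, like `swapP`, works for every instance, not just for the built-in tuple or function type.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Hypothetical classes; not part of the Haskell Prelude or base.

-- Anything that behaves like a pair.
class Pair p where
  pair   :: a -> b -> p a b
  first  :: p a b -> a
  second :: p a b -> b

-- The built-in tuple type is one instance.
instance Pair (,) where
  pair   = (,)
  first  = fst
  second = snd

-- A strict pair is another.
data SP a b = SP !a !b deriving Show

instance Pair SP where
  pair = SP
  first  (SP a _) = a
  second (SP _ b) = b

-- Written once against the class, usable with any instance.
swapP :: Pair p => p a b -> p b a
swapP p = pair (second p) (first p)

-- Anything that can be applied like a function.
class Function f where
  apply :: f a b -> a -> b

-- Ordinary Haskell functions are one instance.
instance Function (->) where
  apply f x = f x

-- A constant function represented as data rather than as a closure.
newtype K a b = K b

instance Function K where
  apply (K y) _ = y
```

For example, `swapP (pair 'x' True :: (Char, Bool))` is `(True,'x')`, `first (swapP (SP (1 :: Int) 'a'))` is `'a'`, and `apply (+ 1) (41 :: Int)` is `42`.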
- Seminars at:
  Monash U., Griffith U. & U. Sydney [I.P.] 2005;
  York [TSM];
  York [II] 2004.
- L. Allison.
  Coding Ockham's Razor.
  Springer, 2018, doi:10.1007/978-3-319-76433-7.
- L. Allison.
  Added Distributions for use in Clustering (Mixture Modelling), Function Models, Regression Trees, Segmentation, and mixed Bayesian Networks in Inductive Programming 1.2.
  TR 2008/224, FIT, Monash U., April 2008.
- M. B. Dale, L. Allison, P. E. R. Dale.
  Segmentation and clustering as complementary sources of information.
  Acta Oecologica, Elsevier, 31(2), pp.193-202, March-April 2007,
  doi:10.1016/j.actao.2006.09.002.
- J. Bardsley.
  Generalising Data Description for Machine Learning.
  BCS honours project, 2006.
  (Explores the use of Template Haskell
  to generate helper functions, types and class instance declarations.)
- L. Allison.
  A Programming Paradigm for Machine Learning, with a Case Study of Bayesian Networks.
  ACSC2006, pp.103-111, January 2006.
- L. Allison.
  Inductive inference 1.1.2:
  Inductive programming and a case study of Bayesian networks.
  TR 2005/177, Faculty of Info. Tech. (#75, Clayton),
  Monash University, Australia 3800, pp.18, Oct. 2005.
  (Also see TR 2004/153.)
- L. Allison.
  Models for machine learning and data mining in functional programming.
  J. Functional Programming, 15(1), pp.15-32,
  January 2005 (online 23 July 2004).
- L. Allison.
  Inductive Inference 1.1.
  TR 2004/153, School of Computer Science and Software Engineering,
  Monash University, May 2004
  (inc. mixed Bayesian networks).
- L. Allison.
  Inductive Inference 1.
  TR 2003/148, School of Computer Science and Software Engineering,
  Monash University, Dec. 2003
  (inc. .hs code).
- A seminar, 21 Oct. 2003, to
  [Dept CS & SWE, Melbourne U.].
- L. Allison.
  Types and Classes of Machine Learning and Data Mining.
  Twenty-Sixth Australasian Computer Science Conference (ACSC2003),
  pp.207-215, Adelaide, Australia, 4-7 February 2003
  (some method names have since changed).
Terminology
Terminology is a problem.
Just consider the many uses of the words "model" and "class"
in computing, mathematics and statistics.