
# plan 2002

7, 18 Feb 2002, L.A. & DLD.

### CSE454 Learning and Prediction I: Data Mining (LA) weeks 1-7

Topics include:
Elementary information theory (including noiseless coding and compression);
introduction to inductive inference and prediction;
data modelling and data mining;
introduction to Minimum Message Length (MML) inference;
clustering, mixture modelling and unsupervised classification;
supervised classification and decision trees.
Example applications may be described.

foundn/models/applics [[practical]]
------------- -------------
1: prob' (codes), info', terminology,
2: estimators, entropy, KL-dist, prob' pred'n
3: (Snob out of synch')
derive binomial ?3-ways? + Fisher,
4: give multinomial maxL MML minEKL,
[[do unsup' class'n, mix' model & use Snob]]
5: normal, give Fisher, maxL, MML
6: [Unsupervised] mixture modelling + Snob
7: other prob' distributions
8: [Supervised] decision trees 1
[[do sup' class'n and use a dec' tree]]
9: decision trees 2
10: dist'ns and codes for integers
11: KL defn on a mult, on a normal
12: codes++, Kraft, Huffman, Shannon-Fano; application advert' (for MML:-)
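
A minimal sketch of some quantities named in the plan above (entropy, KL-dist, the Kraft inequality, and the maxL / MML / minEKL multinomial estimators). The function names are illustrative only and are not taken from the course material. The MML and minEKL formulas assume a uniform prior over the multinomial parameters, and the MML one is the usual Fisher-information approximation.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_distance(p, q):
    """Kullback-Leibler distance KL(p||q) in bits.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kraft_sum(lengths):
    """Sum of 2^-l over binary code-word lengths.
    A prefix code with these lengths exists iff the sum is <= 1."""
    return sum(2.0 ** -l for l in lengths)

def multinomial_estimates(counts):
    """Three estimates of multinomial probabilities from observed counts.
    Assumption: uniform prior for the MML and minEKL estimates."""
    n, k = sum(counts), len(counts)
    max_l = [c / n for c in counts]                   # maximum likelihood
    mml = [(c + 0.5) / (n + k / 2) for c in counts]   # MML (approximation)
    min_ekl = [(c + 1) / (n + k) for c in counts]     # min expected KL-dist
    return max_l, mml, min_ekl

if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    print("entropy:", entropy(p))
    print("KL:", kl_distance([0.5, 0.5], [0.25, 0.75]))
    print("Kraft sum for lengths 1,2,2:", kraft_sum([1, 2, 2]))
    print("estimates for counts 3,1:", multinomial_estimates([3, 1]))
```

Code-word lengths of 1, 2, 2 bits match the probabilities 1/2, 1/4, 1/4 exactly, so the Kraft sum is 1 and the expected code length equals the entropy. With small counts the three estimators differ noticeably; they converge as the counts grow.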

### CSE455 Learning and Prediction II: MML Data Mining
(DLD) weeks 7 or 8-13

CSE454 is a prerequisite.

Topics include:
Foundations of inductive inference;
Minimum Message Length (MML) inference; Fisher information;
MML of specific models such as decision graphs, hidden Markov models,
linear and polynomial regression, causal models. Data mining.
Applications to be considered may include:
text and image analysis, models of protein folding, bushfire prediction, DNA
alignment and the human genome project, authorship identification for
texts, etc.

foundn/models/applics [[practical]]
------------- ------------
1: Kolmo', Solom',..., Fisher (more) derive normal, mention Poisson F'
2: decision graphs
3: time-series, segmentation, trends etc.
4: sequences, H-Markov-Ms, W+Georgeff
[[time series, seg'n &/or HMM &/or auto reg']]
5: DNA pattern discovery
6: alignment, evolutionary trees
7: protein secondary structure pred'n
8: lin'-regression, polynomials and/or polygons
[[regression, polynomials, polygons, A.N.N.]]
9: mixtures + 1st order Markov
10: causal and/or neural models
11, 12: research applics such as...
Lempel-Ziv, Wally improv', approx' repeats
images, noisy/dirty pics, Markov fields

© L. Allison,
School of Computer Science and Software Engineering,
Monash University, Australia 3800.