
# plan 2002

7, 18 Feb 2002, L.A. & DLD.

### CSE454 Learning and Prediction I: Data Mining (LA) weeks 1-7

Topics include: Elementary information theory (including noiseless coding and compression); introduction to inductive inference and prediction; data modelling and data mining; introduction to Minimum Message Length (MML) inference; clustering, mixture modelling and unsupervised classification; supervised classification and decision trees. Example applications may be described.

```
foundn/models/applics             [[practical]]
-------------                     -------------
1: prob' (codes), info', terminology,
2: estimators, entropy, KL-dist, prob' pred'n

3: (Snob out of synch')
derive binomial ?3-ways? + Fisher,
4: give multinomial maxL MML minEKL,
[[do unsup' class'n, mix' model & use Snob]]

5: normal, give Fisher, maxL, MML
6: [Unsupervised] mixture modelling + Snob

7: other prob' distributions
8: [Supervised] decision trees 1
[[do sup' class'n  and use a dec' tree]]

9: decision trees 2
10: dist'ns and codes for integers

11: KL defn on a mult, on a normal
12: codes++, Kraft, Huffman, Shannon-Fano; application advert' (for MML:-)
```
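
A minimal Python sketch of three ideas named in the plan above: entropy, KL-distance, and the Kraft inequality applied to Shannon code lengths. The function names and the example distributions `p` and `q` are illustrative assumptions, not course material.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_distance(p, q):
    """Kullback-Leibler distance KL(p || q) in bits.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shannon_code_lengths(p):
    """Integer code-word lengths ceil(-log2 p_i) for each non-zero p_i."""
    return [math.ceil(-math.log2(pi)) for pi in p if pi > 0]

def kraft_sum(lengths):
    """Sum of 2^-l over the code-word lengths; <= 1 for any prefix code."""
    return sum(2.0 ** -l for l in lengths)

# Example distributions (hypothetical, chosen so the arithmetic is exact).
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

print(entropy(p))                          # 1.75
print(kl_distance(p, q))                   # 0.25
print(shannon_code_lengths(p))             # [1, 2, 3, 3]
print(kraft_sum(shannon_code_lengths(p)))  # 1.0
```

Here the KL-distance of 0.25 bits is the expected extra cost per symbol of coding data from `p` with a code designed for `q`: 2 bits per symbol instead of the entropy, 1.75.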

### CSE455 Learning and Prediction II: MML Data Mining (DLD) weeks 7 or 8-13

CSE454 is a prerequisite.

Topics include: Foundations of inductive inference; Minimum Message Length (MML) inference; Fisher information; MML of specific models such as decision graphs, hidden Markov models, linear and polynomial regression, causal models. Data mining. Applications to be considered may include: text and image analysis, models of protein folding, bushfire prediction, DNA alignment and the human genome project, authorship identification for texts, etc.

```
foundn/models/applics             [[practical]]
-------------                     -------------

1: Kolmo', Solom',..., Fisher (more) derive normal, mention Poisson F'
2: decision graphs

3: time-series, segmentation, trends etc.
4: sequences, H-Markov-Ms, W+Georgeff
[[time series, seg'n &/or HMM &/or auto reg']]

5: DNA pattern discovery
6: alignment, evolutionary trees

7: protein secondary structure pred'n
8: lin'-regression, polynomials and/or polygons
[[regression, polynomials, polygons, A.N.N.]]

9: mixtures + 1st order Markov
10: causal and/or neural models

11, 12: research applics such as...
Lempel-Ziv, Wally improv', approx' repeats
images, noisy/dirty pics, Markov fields
```
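
A minimal Python sketch, assuming a simple adaptive code in which every count starts at 1, of how the message length of a sequence might be compared under an order-0 model and a 1st order Markov model. The function `adaptive_length`, its parameters and the toy DNA string are hypothetical, chosen for illustration only; this is not presented as the course's own method.

```python
import math
from collections import defaultdict

def adaptive_length(seq, alphabet, order=0):
    """Bits needed to transmit seq symbol by symbol with an adaptive
    order-`order` model. Each context keeps its own symbol counts,
    all initialised to 1 (an assumption of this sketch)."""
    counts = defaultdict(lambda: {a: 1 for a in alphabet})
    bits = 0.0
    for i, sym in enumerate(seq):
        # Context is the preceding `order` symbols (shorter at the start).
        ctx = seq[max(0, i - order):i]
        c = counts[ctx]
        bits += -math.log2(c[sym] / sum(c.values()))
        c[sym] += 1  # update the model after coding the symbol
    return bits

# Toy, strongly periodic DNA-like sequence (hypothetical data).
s = "ACGT" * 20

len0 = adaptive_length(s, "ACGT", order=0)
len1 = adaptive_length(s, "ACGT", order=1)
print(f"order-0: {len0:.1f} bits, order-1: {len1:.1f} bits")
```

On this periodic sequence the order-1 model yields the shorter message, because each symbol is fully determined by its predecessor; on a sequence without such structure the two lengths would be much closer.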

© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.