filler
[01] >>
^MML^

Segmentation of Ordered Data

On work by Leigh Fitzgibbon

Other debts (lacitebahpla order): Chris Wallace, Jon Oliver, Dean McKenzie, David Dowe, Rohan Baxter

© 2001, Lloyd Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168

See: L. J. Fitzgibbon, L. Allison & D. L. Dowe. Minimum message length grouping of ordered data. Proc. 11th Int. Workshop on Algorithmic Learning Theory (ALT'2000) pp56-70, Sydney, LNCS V1968, Springer-Verlag, December 2000

This document is online at  www.csse.monash.edu.au/~lloyd/Seminars/Series-Seg01/index.shtml   and contains hyper-links to more detailed notes on certain topics and to some of the references.


filler
<< [02] >>

Problem

- a difficult problem if data ranges of neigbouring segments overlap.

Simplest model: One segment.

Most complex model: One segment per element (over-fitting).

filler
<< [03] >>
e.g. Generated data, 4 segments, 3 cut-points, means & S.D.s shown
generated data for Segmentation Problem

filler
<< [04] >>

Fisher's Algorithm (1958)

Dynamic Programming: so O(K*n2)-time.

filler
<< [05] >>
2 segments, Least Squares
2 Segment Least Squares fit
finds one cut point well

filler
<< [06] >>
3 segments LSq
3 Segment Least Squares fit
2nd cut point is close

filler
<< [07] >>
4 segments LSq
4 Segment Least Squares fit
two small segments inappropriately inferred

filler
<< [08] >>
5 segments LSq
5 Segment Least Squares fit
two cut points poor, one fair, one good

filler
<< [09] >>

Stopping

But Fisher's algorithm (1958) has no stopping criterion: Least squares differences (say) decrease as k-->n.

Solution: Model has a message length (cost) . . . This prevents over-fitting.

filler
<< [10] >>
MML fit:
MML fit: 4 Segments;  correct
essentially correct

filler
<< [11] >>
Real-World Data, MML:
Lake Michigan Huron levels from W Fisher 96 years of data
W. Fisher's data: 96 years of Lake Michigan Huron

filler
<< [12] >>
Real-World Data, MML:
Segmentation Lake Michigan Huron levels from W Fisher 96 years of data

filler
<< [13] >>
MML:
Segmentation Lake Michigan Huron levels from 140 years of data
140 years of lake levels . . .

filler
<< [14]

Summary

References


© L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3168.
Created with "vi (Linux & IRIX)",   charset=iso-8859-1