Practical 2, CSE454 CSSE Monash, Semester 1, 2005
Due: CSSE general office,
noon Thursday, week 12, 26 May 2005.
Do not use Excel or other software, particularly not to produce graphs and
diagrams. This is not to be perverse; there are far fewer problems with
readability when graphs and diagrams are drawn by hand.
This practical involves using the
[C5 (on nexus)]
decision (classification) tree program
[tutorial (csse domain)].
- Take the
[Anderson / Fisher]
Iris data set and put it in a suitable form
for use with C5 -- the 5th attribute, species,
allows it to be used for 'supervised classification'.
In one page, draw the tree (nicely!), or the top levels if the whole tree is too big,
and describe what it means for the data set.
If your use of Snob in practical 1 (unsupervised classification)
found a different set (or sets) of "species" from the botanists',
take the most probable class in Snob's "best" set as
the attribute to be predicted by C5 and do the same analysis.
[5 marks]
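One way to put the data "in a suitable form" is sketched below in Python. It assumes the two-file layout described in the C5 tutorial: a .names file that gives the target attribute first and then declares every attribute, and a .data file with one comma-separated case per line. The attribute names, the species labels and the file stem `iris` are illustrative choices, not something the handout prescribes; check them against your copy of the data.

```python
import os

# Assumed names: the handout only says that the 5th attribute is the species.
ATTRIBUTES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
SPECIES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def names_file_text():
    """Text of the .names file: target attribute first, then every attribute."""
    lines = ["species.    | the attribute to be predicted", ""]
    lines += [f"{name}: continuous." for name in ATTRIBUTES]
    lines.append("species: " + ", ".join(SPECIES) + ".")
    return "\n".join(lines) + "\n"


def data_file_text(rows):
    """Text of the .data file: one comma-separated case per line."""
    return "".join(",".join(str(v) for v in row) + "\n" for row in rows)


def write_c5_files(stem, rows, directory="."):
    """Write <stem>.names and <stem>.data into `directory`."""
    with open(os.path.join(directory, stem + ".names"), "w") as f:
        f.write(names_file_text())
    with open(os.path.join(directory, stem + ".data"), "w") as f:
        f.write(data_file_text(rows))


if __name__ == "__main__":
    # A single illustrative case; replace with all 150 cases from the data set.
    sample = [(5.1, 3.5, 1.4, 0.2, "Iris-setosa")]
    print(names_file_text())
    print(data_file_text(sample))
```

Calling `write_c5_files("iris", rows)` would then leave `iris.names` and `iris.data` in the current directory, ready to be passed to C5 by file stem.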
In [../XD6/]
is (i) a program to generate any amount of a certain kind of data
and (ii) examples of this kind of data.
The data have ten binary attributes, `a1' to `a9' and `class'
where
class = (a1 & a2 & a3) or (a4 & a5 & a6) or (a7 & a8 & a9)
+/- noise.
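For orientation, here is a minimal Python sketch of data of this kind. It is not the generator supplied in the XD6 directory. It assumes that the nine attributes are independent and uniformly distributed, and that "noise" means flipping the class with a fixed probability; the function names, the `noise` parameter and its default value are all hypothetical.

```python
import random


def xd6_class(a):
    """Noise-free class for nine binary attributes a[0..8] (a1..a9)."""
    return int((a[0] and a[1] and a[2])
               or (a[3] and a[4] and a[5])
               or (a[6] and a[7] and a[8]))


def generate(n, noise=0.1, seed=0):
    """Return n rows [a1, ..., a9, class].

    Assumption: each attribute is 0 or 1 with equal probability, and the
    class is flipped with probability `noise`.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = [rng.randint(0, 1) for _ in range(9)]
        c = xd6_class(a)
        if rng.random() < noise:
            c = 1 - c  # class noise
        rows.append(a + [c])
    return rows


if __name__ == "__main__":
    for row in generate(5):
        print(",".join(str(v) for v in row))
```

Keeping the noise level as a parameter makes it easy to produce training sets at several noise levels from the same code.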
- Draw the decision (classification) tree that describes
the reduced XD':
class' = (a1' & a2') or (a3' & a4').
Identify the repeated sub-tree.
- How many nodes would the full tree for XD6 have? Why?
- Decision (classification) tree programs have difficulty
learning disjunctive functions (decision graphs do better).
Investigate the ability of C5 to learn a good model,
or the true model, for XD6, as you vary
- the size of the training data and
- the noise level (you will need to modify the generator slightly).
Write a short report on C5's performance.
You might like to include some of:
- Results on a reduced XD' data set.
- Some example trees.
- Tables of results.
- Right/wrong scores on test data.
- The "closeness" of true and inferred trees.
- Other?
[15 marks]
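Right/wrong scores can be tabulated with a few lines of Python, sketched below. The function assumes you already have, for each test case, the true class and the class predicted by the tree; extracting predictions from C5's output is not covered here, and the name `score` is hypothetical.

```python
from collections import Counter


def score(true_classes, predicted_classes):
    """Return (right, wrong, pairs) for two equal-length lists of classes.

    `pairs` counts each (true, predicted) combination, which gives the
    cells of a small confusion table.
    """
    if len(true_classes) != len(predicted_classes):
        raise ValueError("need exactly one prediction per test case")
    pairs = Counter(zip(true_classes, predicted_classes))
    right = sum(n for (t, p), n in pairs.items() if t == p)
    wrong = len(true_classes) - right
    return right, wrong, pairs


if __name__ == "__main__":
    right, wrong, pairs = score([1, 0, 1, 1], [1, 1, 1, 0])
    print("right:", right, "wrong:", wrong)
    for (t, p), n in sorted(pairs.items()):
        print("true", t, "predicted", p, "count", n)
```

Repeating this for each training-set size and noise level gives the entries for a table of results.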
© L. Allison,
School of Computer Science and Software Engineering,
Monash University, Australia 3800.