
Prac' 2, CSE454 CSSE Monash Semester 1, 2002

DRAFT: may be revised slightly, but it will certainly be based around C5.

Due 4pm Friday, week 6, 19 April 2002, at the CSSE general office.

This prac' involves using the [C5 (on nexus)] decision (classification) tree program; see the [tutorial].

  1. Use C5 to analyse one of the data sets in the .../c5/Data/ directory: in one page, draw the tree (nicely!), or its top levels if the whole tree is too big, and describe what it means for the data set.
    [5%]

    In [../XD6/] is (i) a program to generate any amount of a certain kind of data and (ii) examples of this kind of data. The data has ten binary attributes, `a1' to `a9' and `class', where   class = (a1 & a2 & a3) or (a4 & a5 & a6) or (a7 & a8 & a9), +/- noise. (A rough sketch of such a generator is given after the questions below.)

  2. Draw the decision (classification) tree that describes the reduced XD':   class' = (a1' & a2') or (a3' & a4').   Identify the repeated sub-tree.
  3. How many nodes would the full tree for XD6 have?
  4. Decision (classification) tree programs have difficulty learning disjunctive functions (decision graphs do better). Investigate the ability of C5 to learn a good model, or the true model, for XD6, as you vary
    1. the size of the training data and
    2. the noise level (you will need to modify the generator)
    Write a short report on C5's performance. You might like to include
    1. Optionally, results on a reduced XD' data set.
    2. Some example trees.
    3. Tables of results.
    4. Right/wrong scores on test data.
    5. The "closeness" of true and inferred trees.
    6. Other relevant observations?
    [15%]
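
    The sketch below is not the generator supplied in [../XD6/]; it is a minimal, hypothetical C program illustrating the class formula above and how a noise level might be applied by flipping a given percentage of class labels. The file name xd6gen.c, the command-line arguments (number of cases, noise percentage, optional seed) and the comma-separated output are assumptions about one possible interface, not the actual program's; for the prac' you should work with, and modify, the supplied generator.

        /* xd6gen.c -- hypothetical sketch of an XD6-style data generator.
         * Writes comma-separated rows: a1,...,a9,class.
         * Usage (assumed): ./xd6gen <cases> <noise%> [seed]
         */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        int main(int argc, char **argv)
        {
            long cases   = (argc > 1) ? atol(argv[1]) : 1000;   /* number of cases        */
            int  noise   = (argc > 2) ? atoi(argv[2]) : 10;     /* % of labels flipped    */
            unsigned seed = (argc > 3) ? (unsigned)atol(argv[3])
                                       : (unsigned)time(NULL);
            srand(seed);

            for (long i = 0; i < cases; i++) {
                int a[10];                                      /* a[1]..a[9] are used    */
                for (int j = 1; j <= 9; j++)
                    a[j] = rand() % 2;                          /* random binary attribute */

                /* class = (a1 & a2 & a3) or (a4 & a5 & a6) or (a7 & a8 & a9) */
                int class = (a[1] && a[2] && a[3]) ||
                            (a[4] && a[5] && a[6]) ||
                            (a[7] && a[8] && a[9]);

                if (rand() % 100 < noise)                       /* +/- noise: flip label  */
                    class = !class;

                for (int j = 1; j <= 9; j++)
                    printf("%d,", a[j]);
                printf("%d\n", class);
            }
            return 0;
        }

    For example, "./xd6gen 2000 10" would (under these assumptions) print 2000 cases with roughly 10% of class labels flipped; varying the first argument gives different training-set sizes and varying the second gives the different noise levels asked for in question 4.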


    © L. Allison, School of Computer Science and Software Engineering, Monash University, Australia 3800.