^II/Working/^
[01]
>>
Classification- (Decision-) TreesSee L. Allison, (i) TR 2003/148, and (ii) Types and Classes of Machine Learning and Data Mining, Twenty-Sixth Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed. |
^II/Working/^
<<
[02]
>>
Partitioning on input attributes
|
^II/Working/^
<<
[03]
>>
data CTreeType inSpace opSpace = CTleaf (ModelType opSpace) | CTfork MessageLength (Splitter inSpace) [CTreeType inSpace opSpace] | ... -- can also consider other non-traditional options A `Splitter' can be used to partition the input space. |
^II/Working/^
<<
[04]
>>
instance SuperModel (CTreeType inSpace opSpace) where -- NB. For simplicity only, this costs the -- structure at 1-bit per node. This is -- valid but only optimal for binary trees. msg1 (CTleaf leafModel) = log2 + msg1 leafModel msg1 (CTfork fnLen f dts) = log2 + fnLen + (foldl (+) 0 (map msg1 dts)) Message length of a tree is that of the node plus those of the sub-trees, if any. |
^II/Working/^
<<
[05]
>>
Make a Classification Tree an instance of a FunctionModel: instance FunctionModel CTreeType where condModel (CTleaf leafModel) i = leafModel condModel (CTfork fnLen f dts) i = condModel (dts !! (applySplitter f i)) i |
^II/Working/^
<<
[06]
>>
estCTree : Estimate a Classification TreeIt is convenient to have an inner, local ``search'' function: estCTree estLeafMdl splits ipSet opSet = let -- NB. estLeafMdl must be able to handle an empty data set search ipSet opSet = let . . . |
^II/Working/^
<<
[07]
>>
Simplest possible tree is a single leaf node: leaf = CTleaf leafMdl leafMdl = estLeafMdl opSet leafMsg = msg (functionModel2model leaf) (zip ipSet opSet) |
^II/Working/^
<<
[08]
>>
recursive search for the best (1- or) 2-level tree alternatives [] bestML bestCTree bestIpParts bestOpParts = (bestCTree, bestIpParts, bestOpParts) -- done alternatives ((arity,pFn):pFns) bestML bestCTree bestIpParts bestOpParts = let -- NB. the `1' below is acceptable but not optimal theTree = CTfork (log (fromIntegral (length splts))) sp leaves leaves = map CTleaf leafMdls -- one leaf ... leafMdls = map estLeafMdl opParts -- ... per part partNums = map (applySplitter sp) ipSet ipParts = partition (aritySplitter sp) partNums ipSet opParts = partition (aritySplitter sp) partNums opSet msgLen = msg (functionModel2model theTree) (zip ipSet opSet) in if msgLen < bestML then alternatives sps msgLen theTree ipParts opParts else alternatives sps bestML bestCTree bestIpParts bestOpParts |
^II/Working/^
<<
[09]
>>
splts = splits ipSet in case alternatives splts leafMsg leaf [ipSet] [opSet] of -- subtrees? ((CTfork msgLen pf leaves), ipParts, opParts) -> CTfork msgLen pf (zipWith search ipParts opParts); -- the single leaf wins? (t, _, _) -> t in search ipSet opSet -- ----------9/2002--6/2003--L.Allison--CSSE--Monash--.au-- |
^II/Working/^
<<
[10]
>>
Summary
... FunctionModel- (Regression-) TreesUse a conversion function estFunctionModel2estModel: estFunctionModel2estModel estFn ipOpPairs = functionModel2model (uncurry estFn (unzip ipOpPairs)) ft = estCTree (estFunctionModel2estModel estFnMdl) splits trainIp trainOp L. Allison. Types and Classes of Machine Learning and Data Mining, 26th Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed. © 2004 L. Allison, School of Computer Science and Software Engineering, |