See L. Allison. Types and Classes of Machine Learning and Data Mining, Twenty-Sixth Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed.
CSE454
2003
:
This document is online at
http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/
and contains hyper-links to other resources
data CTreeType inSpace opSpace =
CTleaf (ModelType opSpace) |
CTforkTrad MessageLength
(inSpace -> Int)
[CTreeType inSpace opSpace]
-- can also consider other non-traditional options
instance SuperModel (CTreeType inSpace opSpace) where -- NB. For simplicity only, this costs the -- structure at 1-bit per node. This is -- only optimal for binary trees. msg1 (CTleaf leafModel) = 1 + msg1 leafModel msg1 (CTforkTrad fnLen f dts) = 1 + fnLen + (foldl (+) 0 (map msg1 dts))
Message length of a tree is that of the node plus those of the sub-trees, if any.
Make a Classification Tree an instance of a FunctionModel:
instance FunctionModel CTreeType where
condModel (CTleaf leafModel) i = leafModel
condModel (CTforkTrad fnLen f dts) i
= condModel (dts !! (f i)) i
It is convenient to have an inner, local ``search'' function:
estCTree estLeafMdl splits ipSet opSet = let -- NB. estLeafMdl must be able to handle an empty data set search ipSet opSet = let . . .
Simplest possible tree is a single leaf node:
leaf = CTleaf leafMdl
leafMdl = estLeafMdl opSet
leafMsg = foldl (+)
(msg1 leaf) -- part 1
(map (msg2 leafMdl) opSet) -- part 2
use a partitioning function, pFn, to partition ipSet and opSet into arity ``parts''
partition arity pFn ipSet opSet =
let
p [] [] ipParts opParts
= (ipParts, opParts) -- done
p (ip:ips) (op:ops) ipParts opParts =
let
partNum = pFn ip
allocate 0 (ipP:ipPs) (opP:opPs) = -- add ip and op to
(((ip:ipP):ipPs), ((op:opP):opPs)) -- the correct parts
allocate n (ipP:ipPs) (opP:opPs) =
(\(ipPs,opPs) ->
(ipP:ipPs, opP:opPs)) (allocate (n-1) ipPs opPs)
in uncurry (p ips ops) (allocate partNum ipParts opParts)
in p ipSet opSet (replicate arity []) (replicate arity [])
recursive search for the best (1- or) 2-level tree
alternatives []
bestML bestCTree bestIpParts bestOpParts
= (bestCTree, bestIpParts, bestOpParts) -- done
alternatives ((arity,pFn):pFns)
bestML bestCTree bestIpParts bestOpParts
=
let
-- NB. the `1' below is acceptable but not optimal
theTree = CTforkTrad 1 pFn leaves -- 2-level tree
leaves = map CTleaf leafMdls
leafMdls = map estLeafMdl opParts
(ipParts, opParts) = partition arity pFn ipSet opSet
msgLen = foldl (+) (msg1 theTree)
(map (\(leafMdl,opSet) ->
foldl (+) 0 (map (msg2 leafMdl) opSet))
(zip leafMdls opParts))
in
if msgLen < bestML -- i.e. an improvement
then alternatives pFns msgLen theTree ipParts opParts -- new 1
else alternatives pFns bestML bestCTree bestIpParts bestOpParts -- old
in
case alternatives (splits ipSet) leafMsg leaf [ipSet] [opSet] of
-- subtrees?
((CTforkTrad msgLen pf leaves), ipParts, opParts) ->
CTforkTrad msgLen pf (zipWith search ipParts opParts);
-- the single leaf wins?
(t, _, _) -> t
in search ipSet opSet
-- --------------------------------------9/2002--L.Allison--CSSE--Monash--.au--
Use a conversion function estFunctionModel2estModel:
estFunctionModel2estModel estFn ipOpPairs =
functionModel2model (uncurry estFn (unzip ipOpPairs))
ft = estCTree (estFunctionModel2estModel estFnMdl)
splits trainIp trainOp
L. Allison. Types and Classes of Machine Learning and Data Mining, 26th Australasian Computer Science Conference (ACSC2003) pp207-215, Adelaide, Australia, 4-7 February 2003. Conferences in Research and Practice in Information Technology, Vol.16. Michael Oudshoorn, Ed.