Problem:
Given a data-set from a data-space,
E.g. from
data CTreeType inSpace opSpace =
CTleaf (ModelType opSpace) |
CTfork MessageLength
(Splitter inSpace)
[CTreeType inSpace opSpace]
| ...
-- can also consider other non-traditional options
A `Splitter' is used to partition the input space.
instance SuperModel (CTreeType inSpace opSpace) where -- NB. For simplicity only, this costs the -- structure at 1-bit per node. This is -- only optimal for binary trees. msg1 (CTleaf leafModel) = log2 + msg1 leafModel msg1 (CTfork fnLen f dts) = log2 + fnLen + (foldl (+) 0 (map msg1 dts))
Message length of a tree is that of the node plus those of the sub-trees, if any.
instance FunctionModel CTreeType where
condModel (CTleaf leafModel) i = leafModel
condModel (CTfork fnLen f dts) i
= condModel (dts !! (applySplitter f i)) i
It is convenient to have an inner,
local ``search'' function:
estCTree estLeafMdl splits ipSet opSet =
let
search ipSet opSet =
let
. . .
Simplest possible tree is a single
leaf node:
leaf = CTleaf leafMdl
leafMdl = estLeafMdl opSet
leafMsg = msg (functionModel2model leaf)
(zip ipSet opSet)
simple recursive search for the best (1- or) 2-level tree;
base case:
alternatives [] bestML bestCTree bestIpParts bestOpParts = (bestCTree, bestIpParts, bestOpParts) -- done
and...
...general case:
alternatives (sp:sps)
bestML bestCTree bestIpParts bestOpParts =
let
-- NB. the `1' below is acceptable but not optimal
theTree = CTfork (log (fromIntegral (length splts)))
sp leaves
leaves = map CTleaf leafMdls -- one leaf ...
leafMdls = map estLeafMdl opParts -- ... per part
partNums = map (applySplitter sp) ipSet
ipParts = partition (aritySplitter sp) partNums ipSet
opParts = partition (aritySplitter sp) partNums opSet
msgLen = msg (functionModel2model theTree)
(zip ipSet opSet)
in
if msgLen < bestML
then alternatives sps msgLen theTree ipParts opParts
else alternatives sps bestML bestCTree bestIpParts bestOpParts
splts = splits ipSet -- partitions of input space
in
case alternatives splts leafMsg leaf [ipSet] [opSet] of
-- subtrees? search continues...
((CTfork msgLen pf leaves), ipParts, opParts) ->
CTfork msgLen pf (zipWith search ipParts opParts);
-- the single leaf wins? Search is over!
(t, _, _) -> t
in search ipSet opSet
estFunctionModel2estModel estFn ipOpPairs =
functionModel2model (uncurry estFn (unzip ipOpPairs))
ft = estCTree (estFunctionModel2estModel estFnMdl)
splits trainingIp trainingOp