|
The 1-D dynamic programming problem is:
Given vertices,
vlo, vlo+1, ..., vhi-1, vhi,
directed edges (vi,vj)
for all i and j where lo≤i<j≤hi, and
an edge cost function, edgeCost i j,
find an optimal path,
[vi0,
vi1, ...,
vim],
from vlo to vhi.
This is a special case of the shortest-path problem;
here the graph is a complete, directed acyclic graph
as determined by the ordering on vertices.
It can therefore be solved by a variation
on the dynamic programming strategy which is also the basis of
Prim's
minimum spanning tree algorithm and
Dijkstra's
single-source shortest paths algorithm.
The algorithm maintains paths from
vlo to a set of "undone" vertices, initially containing
vlo+1, ..., vhi;
it also keeps a set of "done" vertices, initially empty.
The starting path from vlo to vi consists of
the single edge (vlo,vi) and is unlikely to be optimal.
The algorithm
repeatedly finds the best undone vertex, vb, that is closest
to vlo and moves it to the "done" set.
vb may also reveal shorter paths
from vlo via vb to remaining undone vertices.
Enough information is kept to trace an
optimal path backwards from vhi to vlo
when vhi becomes done.
module DPA1D (dpa1D) where
dpa1D lo hi edgeCost =
-- Given vertices lo, lo+1, ..., hi -- and directed,
-- +vely weighted edges (i,j,edgeCost i j) for all i < j,
-- return an optimal path from lo to hi with its cost,
-- i.e. ([lo, i, j, ..., hi], c) where lo < i < j < ... < hi.
let
dpa undone done =
let (b@(_,v,_)) = best undone -- closest (to lo) of the undone
in if v == hi then b:done
else dpa (removeAndUpdate b undone) (b:done)
undone = map (\j -> (lo, j, edgeCost lo j)) [lo+1..hi] -- initially
removeAndUpdate (edge@(i,j,c)) edges =
let
rmvEdge ((edge'@(_,j',_)):es) =
if j==j' then map update es -- drop edge'; seek shortcuts
else edge' : (rmvEdge es) -- else j > j'
-- assumes edge (i,j) => i < j, and that unDone sorted on j.
update edge'@(_,j',c') = -- does j help j' ?
let viaJ = c + edgeCost j j' -- ...(_,j,c)(j,j',c') ?
in if c' > viaJ then (j,j',viaJ) -- improvement or...
else edge' -- ...not
in rmvEdge edges
better (e@(_,_,c)) (e'@(_,_,c')) = if c' >= c then e else e'
best = foldl1 better
trace k edges path = -- trace back optimal soln...
if k <= lo then k:path -- ...from k down to lo
else let ((i,j,c):edges') = edges
in if j == k then trace i edges' (k:path)
else trace k edges' path
(edges @ ((e1@(i,j,cost)):_)) = dpa undone []
in (trace hi edges [], cost)
-- ---------------------------------------LA--UWA-&-U.York--2004--
|
(About [Haskell, ghc, *.hs].)
|
This problem appears in various disguises, for example
in inserting line breaks for paragraph layout and
in polygon fitting.
It also appears in the
inference and data-compression problem (Farr & Wallace 2002)
described in the next section.
A transmitter and a receiver want to share
the results of n Bernouilli trials, i.e. n throws of a coin,
carried out by the transmitter, by sending them over a data link.
They could use a fixed code, e.g. based on pr(H)=0.5
in which case the data would be sent at 1-bit per throw.
But they might be able to do better than this if the coin is biased.
The transmitter, who is the one making the trials,
is in a position to estimate θ=pr(H) and to send
the results using a code based on θ,
but the receiver cannot decode such data
unless told the estimate in a header to the message.
The transmitter and receiver need a "code-book"
for (the parameter of) the 2-state distribution.
Just considering the number of heads, #H, that could turn up,
this quantity lies in [0..n]; there are n+1 possible values for #H.
It is clearly impossible to infer pr(H) more precisely than
something like ±1/(2×(n+1)) and it turns out that
a sensible precision is much coarser than this.
The code-book contains a partition of [0..n]
(which implies a partition of [0.0..1.0]).
The transmitter conducts the trials and selects the entry
for the part of the partition into which #H falls.
The message header is the code-word for that part.
The transmitter then transmits the trial results using the
code for {H,T} based on the estimate corresponding to the part.
The receiver now has the estimate and can decode the results of the trials.
The 1-D DPA can be used
to partition [0..n] optimally(F&W 2002).
It is convenient to set lo=-1 and hi=n.
This example illustrates the correspondence between a path and a partition:
-
path | [ -1, 1, 2,5 ] |
edges | [(-1,1), (1,2), (2,5)] |
----->
----------> --------------->
-1 0 1 2 3 4 5
|
partition | [[0..1], [2..2], [3..5]] |
The cost of an edge (i,j), i.e. part [i+1..j],
is the expected message length over those
outcomes containing more than i and at most j heads.
A prior probability distribution on θ is needed
to complete the calculation;
the uniform distribution over [0.0..1.0] is used below for simplicity.
module Binomial (module Binomial) where
import DPA1D
binomial n = -- compute the SMML code-book for n Bernouilli trials
let
n' = fromIntegral n -- often need n in real ratios etc.
n1'= n'+1 -- ditto, |[0..n]|
codeTrials p h n = -- p = pr(h), h = #head, n = #trials
let t = n-h
in -(if h > 0 then (fromIntegral h)*log p else 0.0)
-(if t > 0 then (fromIntegral t)*log (1-p) else 0.0) -- no 0*inf
interval lo hi = -- expected msg len due to single interval [lo..hi]
let cases = hi-lo+1 -- |[lo..hi]|
cases' = fromIntegral cases
est = fromIntegral(lo+hi) / (2.0 * n')
in (cases' * log(n1'/cases') -- uniform prior
+ codeTrials est (cases*(hi+lo) `div` 2) (cases*n) -- m.l.
) / n1' -- in expectation
edgeCost i j = interval (i+1) j -- (i,j) covers [i+1..j]
(path, c) = dpa1D (-1) n edgeCost
parts = zip (map (+1) path) (tail path) -- partition [0..n]
in (parts, c)
-- --------------------------------------------LA--UWA-&-U.York--2004--
|
A simple main program runs the algorithm through its paces:
module Main where
import DPA1D
import Binomial
main =
putStrLn ("SMML code-book for 2-state (~binomial) distribution\n" ++
"key: n, partition, Expected msg length (bits)")
>> putStrLn( foldr (\n -> \theRest ->
let (p, msgl) = binomial n
in (show n) ++ ", " ++ (show p) ++ ", "
++ (show (msgl/log 2)) ++ "\n" ++ theRest
) ""
[1..30] ) -- a range of n
-- ---------------------------------------LA--UWA-&-U.York--2004--
|
E.g.
SMML code-book for 2-state (~binomial) distribution
key: n, partition, Expected msg length (bits)
1, [(0,1)], 1.0
2, [(0,2)], 2.0
3, [(0,0),(1,3)], 2.88
4, [(0,0),(1,3),(4,4)], 3.77
5, [(0,0),(1,4),(5,5)], 4.58
6, [(0,0),(1,5),(6,6)], 5.43
7, [(0,3),(4,7)], 6.25
8, [(0,0),(1,5),(6,8)], 7.04
9, [(0,0),(1,5),(6,9)], 7.83
10, [(0,0),(1,4),(5,9),(10,10)], 8.63
...
|
Farr and Wallace (2002) show that the SMML code-book problem
is feasible, that is computable in polynomial time,
for probability distributions having 1-D discrete parameter spaces.
"The problem in general is [...] NP-hard."
"The complexity of the trinomial [3-state] case remains open" but
they give a good heuristic for it;
the trinomial problem has two parameters,
θ0=pr(0) and θ1=pr(1),
pr(2)=1-θ0-θ1.
The normal distribution causes us two difficulties:
Its parameter space, (μ,σ), is two dimensional and
both parameters are continuous.
We can appoximate its code-book
by considering μ|σ and
by quantising the range of σ.
This is to make two simplifying assumptions:
An approximate code-book entry corresponds to
a rectangle in parameter space which is suboptimal in general, and
σ is quantised.
μ is an "origin" parameter:
There is no nett effect on the likelihood if
μ and all data are increased by the same value x.
σ is a "scale" parameter:
If μ=0 (see above), there is no nett effect on the likelihood if
σ and all data are multiplied by the same value x>0.
Taking advantage of these facts,
σ quanta are chosen so that their log values
are equally spaced and, for a given σ interval,
the range of μ is divided into some number of equal sized intervals.
The expected "cost" of using one estimate NmEst,sEst
per "patch" (mLo,mHi)×(sLo,sHi)
is the cost of sending the estimate plus
the expected KL-distance from the (untransmittable)
maximum likelihood estimate to it.
This expected KL-distance be calculated by
integration
of the KL-distance from NmaxLH to NmEst,sEst
over the patch.
Making the patches smaller reduces this cost to the data but
it also increases the cost of stating the estimates,
giving a trade-off to be optimised.
module Normal (module Normal) where
import DPA1D
normal nData mLwb mUpb sLwb sUpb nS =
let -- m mean, s sigma
nS' = fromIntegral nS
sRange = sUpb - sLwb
logSlwb = log sLwb
dLogS = (log sUpb - logSlwb) / nS'
sVal n = exp(logSlwb + (fromIntegral n)*dLogS)
sInterval i j =
let si = sVal i
sj = sVal j
in ( log(sRange/(sj-si)) -- code sEst, uniform prior
+ snd( doMean si sj )
) * (sj - si) / sRange -- expectation due to [i..j]
doMean sLo sHi = -- given [sLo...sHi], optimally divide...
let -- ...[mLwb...mUpb] into equal parts
sEst = (sLo + sHi) / 2
divide n =
let n' = fromIntegral n
dM = (mUpb - mLwb) / n'
mEst = dM / 2
eMsgL =
log n' -- code for mEst, uniform prior
+ nData * (integrateKL 0 dM sLo sHi mEst sEst)
/ ((sHi - sLo) * dM)
-- i.e. + nData * avg data "error"
in (n, eMsgL) -- divide
best (xc:xcs) = -- linear-search for the (only) min
let b (xc@(x,c)) ((xc'@(x',c')):xcs) =
if c' < c then b xc' xcs else xc
in b xc xcs
in best (map divide [1.. ]) -- doMean
(sNums, c) = dpa1D 0 nS sInterval
sVals = map sVal sNums
sIntvls = zip sVals (tail sVals)
mNs = map (\(sLo,sHi) -> fst(doMean sLo sHi)) sIntvls
in ((zip sIntvls mNs), c) -- normal
integrateKL m1Lo m1Hi s1Lo s1Hi m2 s2 = -- integrate KL over m1 & s1
let f m1 = ((m1-m2)**3) / (6*sqr s2) + m1*(log s2 - 0.5)
g s1 = (s1**3) / (6*sqr s2) - s1*(log s1 - 1)
-- NB d/dx {x.log x - x} = x/x + log x - 1 = log x
in (f m1Hi - f m1Lo)*(s1Hi - s1Lo) + (m1Hi-m1Lo)*(g s1Hi - g s1Lo)
sqr x = x*x -- maybe faster than x**2 ?
-- --------------------------------------------LA--UWA-&-U.York--2004--
|
module Main where
import DPA1D
import Normal
main =
putStrLn ("Approximate code-book for Normal (Gaussian) distribution\n"
++ "key: ((sigmaLo,sigmHi),nMuIntervals)... E |H|+n*KL_dist (nits)" )
>> putStrLn(
let (is,c) = (normal 30 0.0 2.0 0.1 1.0 100)
in (foldr (\x -> \rest -> (show x) ++ "\n" ++ rest) "" is)
++ show c
)
-- ---------------------------------------LA--UWA-&-U.York--2004--
|
E.g.
n=30; &mu 0.0..2.0; &sigma 0.1..1.0 (100 quanta):
Approximate code-book for Normal (Gaussian) distribution
key: ((sigmaLo,sigmHi),nMuIntervals)... E |H|+n*KL_dist (nits)
((0.10,0.16),24)
((0.16,0.25),15)
((0.25,0.40),10)
((0.40,0.63),6)
((0.63,1.0), 4)
4.32656332488434
|
- G. E. Farr and C. S. Wallace.
The complexity of strict minimum message length inference.
Comp. J., 45(3), pp.285-292, 2002.
- Note that they give expected message lengths
for transmitting #H only, excluding the trial results.
|
Haskell:
(:) | cons |
[x1,...] | list |
[ ] | list |
(++) | append |
\ | λ :-) |
:: | has type |
Compared
|
|
|