
The 1D dynamic programming problem is:
Given vertices,
v_{lo}, v_{lo+1}, ..., v_{hi1}, v_{hi},
directed edges (v_{i},v_{j})
for all i and j where lo≤i<j≤hi, and
an edge cost function, edgeCost i j,
find an optimal path,
[v_{i0},
v_{i1}, ...,
v_{im}],
from v_{lo} to v_{hi}.
This is a special case of the shortestpath problem;
here the graph is a complete, directed acyclic graph
as determined by the ordering on vertices.
It can therefore be solved by a variation
on the dynamic programming strategy which is also the basis of
Prim's
minimum spanning tree algorithm and
Dijkstra's
singlesource shortest paths algorithm.
The algorithm maintains paths from
v_{lo} to a set of "undone" vertices, initially containing
v_{lo+1}, ..., v_{hi};
it also keeps a set of "done" vertices, initially empty.
The starting path from v_{lo} to v_{i} consists of
the single edge (v_{lo},v_{i}) and is unlikely to be optimal.
The algorithm
repeatedly finds the best undone vertex, v_{b}, that is closest
to v_{lo} and moves it to the "done" set.
v_{b} may also reveal shorter paths
from v_{lo} via v_{b} to remaining undone vertices.
Enough information is kept to trace an
optimal path backwards from v_{hi} to v_{lo}
when v_{hi} becomes done.
module DPA1D (dpa1D) where
dpa1D lo hi edgeCost =
 Given vertices lo, lo+1, ..., hi  and directed,
 +vely weighted edges (i,j,edgeCost i j) for all i < j,
 return an optimal path from lo to hi with its cost,
 i.e. ([lo, i, j, ..., hi], c) where lo < i < j < ... < hi.
let
dpa undone done =
let (b@(_,v,_)) = best undone  closest (to lo) of the undone
in if v == hi then b:done
else dpa (removeAndUpdate b undone) (b:done)
undone = map (\j > (lo, j, edgeCost lo j)) [lo+1..hi]  initially
removeAndUpdate (edge@(i,j,c)) edges =
let
rmvEdge ((edge'@(_,j',_)):es) =
if j==j' then map update es  drop edge'; seek shortcuts
else edge' : (rmvEdge es)  else j > j'
 assumes edge (i,j) => i < j, and that unDone sorted on j.
update edge'@(_,j',c') =  does j help j' ?
let viaJ = c + edgeCost j j'  ...(_,j,c)(j,j',c') ?
in if c' > viaJ then (j,j',viaJ)  improvement or...
else edge'  ...not
in rmvEdge edges
better (e@(_,_,c)) (e'@(_,_,c')) = if c' >= c then e else e'
best = foldl1 better
trace k edges path =  trace back optimal soln...
if k <= lo then k:path  ...from k down to lo
else let ((i,j,c):edges') = edges
in if j == k then trace i edges' (k:path)
else trace k edges' path
(edges @ ((e1@(i,j,cost)):_)) = dpa undone []
in (trace hi edges [], cost)
 LAUWA&U.York2004

(About [Haskell, ghc, *.hs].)

This problem appears in various disguises, for example
in inserting line breaks for paragraph layout and
in polygon fitting.
It also appears in the
inference and datacompression problem (Farr & Wallace 2002)
described in the next section.
A transmitter and a receiver want to share
the results of n Bernouilli trials, i.e. n throws of a coin,
carried out by the transmitter, by sending them over a data link.
They could use a fixed code, e.g. based on pr(H)=0.5
in which case the data would be sent at 1bit per throw.
But they might be able to do better than this if the coin is biased.
The transmitter, who is the one making the trials,
is in a position to estimate θ=pr(H) and to send
the results using a code based on θ,
but the receiver cannot decode such data
unless told the estimate in a header to the message.
The transmitter and receiver need a "codebook"
for (the parameter of) the 2state distribution.
Just considering the number of heads, #H, that could turn up,
this quantity lies in [0..n]; there are n+1 possible values for #H.
It is clearly impossible to infer pr(H) more precisely than
something like ±1/(2×(n+1)) and it turns out that
a sensible precision is much coarser than this.
The codebook contains a partition of [0..n]
(which implies a partition of [0.0..1.0]).
The transmitter conducts the trials and selects the entry
for the part of the partition into which #H falls.
The message header is the codeword for that part.
The transmitter then transmits the trial results using the
code for {H,T} based on the estimate corresponding to the part.
The receiver now has the estimate and can decode the results of the trials.
The 1D DPA can be used
to partition [0..n] optimally(F&W 2002).
It is convenient to set lo=1 and hi=n.
This example illustrates the correspondence between a path and a partition:

path  [ 1, 1, 2,5 ] 
edges  [(1,1), (1,2), (2,5)] 
>
> >
1 0 1 2 3 4 5

partition  [[0..1], [2..2], [3..5]] 
The cost of an edge (i,j), i.e. part [i+1..j],
is the expected message length over those
outcomes containing more than i and at most j heads.
A prior probability distribution on θ is needed
to complete the calculation;
the uniform distribution over [0.0..1.0] is used below for simplicity.
module Binomial (module Binomial) where
import DPA1D
binomial n =  compute the SMML codebook for n Bernouilli trials
let
n' = fromIntegral n  often need n in real ratios etc.
n1'= n'+1  ditto, [0..n]
codeTrials p h n =  p = pr(h), h = #head, n = #trials
let t = nh
in (if h > 0 then (fromIntegral h)*log p else 0.0)
(if t > 0 then (fromIntegral t)*log (1p) else 0.0)  no 0*inf
interval lo hi =  expected msg len due to single interval [lo..hi]
let cases = hilo+1  [lo..hi]
cases' = fromIntegral cases
est = fromIntegral(lo+hi) / (2.0 * n')
in (cases' * log(n1'/cases')  uniform prior
+ codeTrials est (cases*(hi+lo) `div` 2) (cases*n)  m.l.
) / n1'  in expectation
edgeCost i j = interval (i+1) j  (i,j) covers [i+1..j]
(path, c) = dpa1D (1) n edgeCost
parts = zip (map (+1) path) (tail path)  partition [0..n]
in (parts, c)
 LAUWA&U.York2004

A simple main program runs the algorithm through its paces:
module Main where
import DPA1D
import Binomial
main =
putStrLn ("SMML codebook for 2state (~binomial) distribution\n" ++
"key: n, partition, Expected msg length (bits)")
>> putStrLn( foldr (\n > \theRest >
let (p, msgl) = binomial n
in (show n) ++ ", " ++ (show p) ++ ", "
++ (show (msgl/log 2)) ++ "\n" ++ theRest
) ""
[1..30] )  a range of n
 LAUWA&U.York2004

E.g.
SMML codebook for 2state (~binomial) distribution
key: n, partition, Expected msg length (bits)
1, [(0,1)], 1.0
2, [(0,2)], 2.0
3, [(0,0),(1,3)], 2.88
4, [(0,0),(1,3),(4,4)], 3.77
5, [(0,0),(1,4),(5,5)], 4.58
6, [(0,0),(1,5),(6,6)], 5.43
7, [(0,3),(4,7)], 6.25
8, [(0,0),(1,5),(6,8)], 7.04
9, [(0,0),(1,5),(6,9)], 7.83
10, [(0,0),(1,4),(5,9),(10,10)], 8.63
...

Farr and Wallace (2002) show that the SMML codebook problem
is feasible, that is computable in polynomial time,
for probability distributions having 1D discrete parameter spaces.
"The problem in general is [...] NPhard."
"The complexity of the trinomial [3state] case remains open" but
they give a good heuristic for it;
the trinomial problem has two parameters,
θ_{0}=pr(0) and θ_{1}=pr(1),
pr(2)=1θ_{0}θ_{1}.
The normal distribution causes us two difficulties:
Its parameter space, (μ,σ), is two dimensional and
both parameters are continuous.
We can appoximate its codebook
by considering μσ and
by quantising the range of σ.
This is to make two simplifying assumptions:
An approximate codebook entry corresponds to
a rectangle in parameter space which is suboptimal in general, and
σ is quantised.
μ is an "origin" parameter:
There is no nett effect on the likelihood if
μ and all data are increased by the same value x.
σ is a "scale" parameter:
If μ=0 (see above), there is no nett effect on the likelihood if
σ and all data are multiplied by the same value x>0.
Taking advantage of these facts,
σ quanta are chosen so that their log values
are equally spaced and, for a given σ interval,
the range of μ is divided into some number of equal sized intervals.
The expected "cost" of using one estimate N_{mEst,sEst}
per "patch" (mLo,mHi)×(sLo,sHi)
is the cost of sending the estimate plus
the expected KLdistance from the (untransmittable)
maximum likelihood estimate to it.
This expected KLdistance be calculated by
integration
of the KLdistance from N_{maxLH} to N_{mEst,sEst}
over the patch.
Making the patches smaller reduces this cost to the data but
it also increases the cost of stating the estimates,
giving a tradeoff to be optimised.
module Normal (module Normal) where
import DPA1D
normal nData mLwb mUpb sLwb sUpb nS =
let  m mean, s sigma
nS' = fromIntegral nS
sRange = sUpb  sLwb
logSlwb = log sLwb
dLogS = (log sUpb  logSlwb) / nS'
sVal n = exp(logSlwb + (fromIntegral n)*dLogS)
sInterval i j =
let si = sVal i
sj = sVal j
in ( log(sRange/(sjsi))  code sEst, uniform prior
+ snd( doMean si sj )
) * (sj  si) / sRange  expectation due to [i..j]
doMean sLo sHi =  given [sLo...sHi], optimally divide...
let  ...[mLwb...mUpb] into equal parts
sEst = (sLo + sHi) / 2
divide n =
let n' = fromIntegral n
dM = (mUpb  mLwb) / n'
mEst = dM / 2
eMsgL =
log n'  code for mEst, uniform prior
+ nData * (integrateKL 0 dM sLo sHi mEst sEst)
/ ((sHi  sLo) * dM)
 i.e. + nData * avg data "error"
in (n, eMsgL)  divide
best (xc:xcs) =  linearsearch for the (only) min
let b (xc@(x,c)) ((xc'@(x',c')):xcs) =
if c' < c then b xc' xcs else xc
in b xc xcs
in best (map divide [1.. ])  doMean
(sNums, c) = dpa1D 0 nS sInterval
sVals = map sVal sNums
sIntvls = zip sVals (tail sVals)
mNs = map (\(sLo,sHi) > fst(doMean sLo sHi)) sIntvls
in ((zip sIntvls mNs), c)  normal
integrateKL m1Lo m1Hi s1Lo s1Hi m2 s2 =  integrate KL over m1 & s1
let f m1 = ((m1m2)**3) / (6*sqr s2) + m1*(log s2  0.5)
g s1 = (s1**3) / (6*sqr s2)  s1*(log s1  1)
 NB d/dx {x.log x  x} = x/x + log x  1 = log x
in (f m1Hi  f m1Lo)*(s1Hi  s1Lo) + (m1Him1Lo)*(g s1Hi  g s1Lo)
sqr x = x*x  maybe faster than x**2 ?
 LAUWA&U.York2004

module Main where
import DPA1D
import Normal
main =
putStrLn ("Approximate codebook for Normal (Gaussian) distribution\n"
++ "key: ((sigmaLo,sigmHi),nMuIntervals)... E H+n*KL_dist (nits)" )
>> putStrLn(
let (is,c) = (normal 30 0.0 2.0 0.1 1.0 100)
in (foldr (\x > \rest > (show x) ++ "\n" ++ rest) "" is)
++ show c
)
 LAUWA&U.York2004

E.g.
n=30; &mu 0.0..2.0; &sigma 0.1..1.0 (100 quanta):
Approximate codebook for Normal (Gaussian) distribution
key: ((sigmaLo,sigmHi),nMuIntervals)... E H+n*KL_dist (nits)
((0.10,0.16),24)
((0.16,0.25),15)
((0.25,0.40),10)
((0.40,0.63),6)
((0.63,1.0), 4)
4.32656332488434

 G. E. Farr and C. S. Wallace.
The complexity of strict minimum message length inference.
Comp. J., 45(3), pp.285292, 2002.
 Note that they give expected message lengths
for transmitting #H only, excluding the trial results.

window on the wide world:
Haskell:
(:)  cons 
[x1,...]  list 
[ ]  list 
(++)  append 
\  λ :) 
::  has type 
Compared


