Learning and Prediction

Past Questions

Here are some past exam questions.


Question. This question is about coding.

Prove the Kraft Inequality:
Given a set of N possible events {e1, e2, ..., eN}, there is a prefix code for the set with code-words of lengths L1, L2, ..., LN if and only if  sum_{i=1..N} 2^(-Li) <= 1.
[12%]
Notes:
  1. Book work, look under `Coding'.
  2. NB. Must prove both directions, => and <= (a constructive sketch of the <= direction follows these notes).
  3. NB. Huffman code efficiency follows from the Kraft inequality; it is not a proof of Kraft.
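
A constructive sketch of the `<=' direction, in Python (an illustration only, not book work from the unit): sort the lengths and read each code-word off the binary expansion of the running Kraft sum.

from fractions import Fraction

def kraft_sum(lengths):
    # Sum over i of 2^(-Li); a prefix code with these lengths
    # exists if and only if this sum is <= 1.
    return sum(Fraction(1, 2 ** L) for L in lengths)

def prefix_code(lengths):
    # Constructive (<=) half of Kraft: with lengths sorted ascending,
    # each code-word is the first L bits of the binary expansion of
    # the running sum of 2^(-Lj) over the words assigned so far.
    assert kraft_sum(lengths) <= 1, "Kraft inequality violated"
    words, acc = [], Fraction(0)
    for L in sorted(lengths):
        bits = ''.join(str(int(acc * 2 ** (i + 1)) % 2) for i in range(L))
        words.append(bits)
        acc += Fraction(1, 2 ** L)
    return words

print(prefix_code([1, 2, 3, 3]))  # -> ['0', '10', '110', '111']

Because the lengths are sorted, each running sum is a multiple of 2^(-L) for the current length L, so the L-bit prefixes are distinct and no word can be a prefix of a later, longer one.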

Question. This question is about multi-state distributions and hypothesis testing.

(a) Briefly describe the three "obvious" ways of transmitting a sequence of observations drawn from a multi-state distribution whose parameters are not known in advance.

(b) What is the relationship between the message lengths of the three methods of transmitting the observations? (You do not have to prove it.)

(c) J claims to be able to read other people's minds. A test is arranged in which D tosses and observes a coin that J cannot see. J must state whether D sees a head or a tail:

  trial:       1  2  3  4  5  6  7  8  9  10
  D sees:      H  T  H  T  H  H  H  T  H  H
  J reports:   H  T  T  T  T  H  H  T  T  H
  J correct?:  y  y  n  y  n  y  y  y  n  y
Describe how minimum message length (MML) can be used to test J's claim.

(d) Estimate the message lengths, in bits, of the data under the two hypotheses (i) that J can read minds to some significant extent and (ii) that J cannot read minds. What can be concluded? Discuss the results.
[20%]
Notes:
  1. Read Boulton and Wallace, J. Theor. Biol., 1969.
  2. The example is the y2k coffee hypothesis in disguise, of course.
  3. NB. Because the `send (i) params and then (ii) data | params' method is only very slightly more expensive than the `combinatorial' method (part (b)), an easy way to estimate the reads-minds message length is to calculate the #bits for the combinatorial method; use the table of logs of factorials in the appendix. (A sketch follows these notes.)
  4. The message length under the cannot-read-minds hypothesis (zero parameters, because p = 0.5 is fixed) is of course 10 bits, assuming #trials is common knowledge and we are only interested in J's correctness.
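
A rough numerical check of notes 3 and 4 (a sketch, assuming the message encodes only J's sequence of correct/incorrect calls and that #trials is common knowledge):

from math import comb, log2

n, k = 10, 7  # trials; J was correct 7 times out of 10

# Combinatorial method: state k (one of n+1 values), then which of
# the C(n, k) equally likely arrangements of y's and n's occurred.
combinatorial = log2(n + 1) + log2(comb(n, k))

# Adaptive code: predict pr(next symbol) = (count + 1) / (seen + 2);
# for a 2-state distribution this totals exactly the same as above.
adaptive, counts, seen = 0.0, {'y': 0, 'n': 0}, 0
for sym in "yynynyyyny":
    adaptive += -log2((counts[sym] + 1) / (seen + 2))
    counts[sym] += 1
    seen += 1

# Cannot-read-minds hypothesis: p = 0.5 fixed, zero parameters,
# so the data cost exactly one bit per trial.
null = float(n)

print(f"reads-minds, combinatorial ~ {combinatorial:.2f} bits")  # ~10.37
print(f"reads-minds, adaptive      ~ {adaptive:.2f} bits")       # ~10.37
print(f"cannot-read-minds          = {null:.2f} bits")           # 10.00

Under these assumptions the fixed p = 0.5 message is slightly the shorter, i.e. seven correct calls out of ten are not enough to support J's claim.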


Question. This question is about inference by Minimum Message Length (MML).

The Bureau of Meteorology has records of the total annual rain-fall at Darwin, Cairns, Sydney, and Perth for the last 200 years.
H1:  One hypothesis is that the rain-fall at a location is well modelled by a normal distribution, N(mu,sigma), where mu and sigma depend only on the city.
H2:  A second hypothesis is that there are two or more "kinds" of year, and that the rain-fall at a location is well modelled by a normal distribution, N(mu,sigma), where mu and sigma depend on the city and on the kind of the year. Further, the kind of a year is independent of the kind of the previous year and of the next year.
H3:  A third hypothesis is as for H2, except that the "kind" of a year can depend on the kind of the previous year.

(a) Describe how each of the three hypotheses can be encoded so as to allow their posterior probabilities to be calculated using MML.

(b) What parts of the MML tool kit are relevant to calculating their message lengths? (Why and how.)

(c) Describe the outline of algorithm(s) for fitting the hypotheses to the data and for calculating their message lengths.

[24%]
Notes:
  1. (a) (H1) Normal distribution x 4; (H2) Mixture modelling ~ Snob; (H3) Mixtures + 1st order Markov model ~ Edgoose 1998.
  2. (b) Normal distribution, multi-state (re mixture and also re Markov model).
  3. (c) H1: use the MML estimator for four Normals, one per city (see the sketch below). H2: EM algorithm, ..., split/merge classes, ..., fractional membership assignment, .... H3: EM algorithm, ..., forward-backward passes, ....
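
A small illustration of the H1 case (a sketch only; the `rainfall' figures below are made up, and the sigma estimate divides by n-1 as in the Wallace-Freeman MML estimator for the Normal):

import math
import random

def mml_normal_fit(xs):
    # MML (Wallace-Freeman) estimates for a Normal: mu-hat is the
    # sample mean; sigma-hat^2 divides the squared deviations by n-1.
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    return mu, sigma

# Hypothetical data: 200 years of annual rain-fall (mm) for two cities.
random.seed(1)
rainfall = {'Darwin': [random.gauss(1700.0, 300.0) for _ in range(200)],
            'Perth':  [random.gauss(750.0, 150.0) for _ in range(200)]}

for city, xs in rainfall.items():
    mu, sigma = mml_normal_fit(xs)
    print(f"{city}: N(mu = {mu:.0f}, sigma = {sigma:.0f})")

H2 and H3 would wrap such per-class Normal fits in an EM loop, with fractional membership of years in classes for H2, and forward-backward passes over the hidden year-kinds for H3, re-estimating until the total message length stops falling.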


Appendix

Log2 table
e.g. log2(0.12) = -3.06.   NB. log2(2x) = 1 + log2(x),  log2(x^2) = 2 log2(x).
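For example, the NB identities extend the single tabulated value: log2(0.24) = 1 + log2(0.12) = -2.06, and log2(0.0144) = log2(0.12^2) = 2 × (-3.06) = -6.12.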


© L.Allison and D. L. Dowe, Computer Science and Software Engineering, Monash University, Australia 3168.