This page is a work in progress. It is a collection of possible questions for the subject. The exact details of what is covered from year to year may (will) vary slightly.
- | H | T |
---|---|---|
Coin1 | 1/2 | 1/2 |
Coin2 | 3/4 | 1/4 |
- | A | C | G | T |
---|---|---|---|---|
DNA1 | 1/4 | 1/4 | 1/4 | 1/4 |
DNA2 | 1/2 | 1/4 | 1/8 | 1/8 |
DNA3 | 1/4 | 1/8 | 1/8 | 1/2 |
DNA4 | 1/8 | 1/8 | 1/2 | 1/4 |
DNA5 | 1/8 | 1/2 | 1/4 | 1/8 |
DNA6 | 1/8 | 1/8 | 1/4 | 1/2 |
- | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Dice1 | 1/4 | 1/4 | 1/8 | 1/8 | 1/8 | 1/8 |
Dice2 | 1/8 | 1/8 | 1/8 | 1/8 | 1/4 | 1/4 |
Dice3 | 1/4 | 1/8 | 1/8 | 1/8 | 1/8 | 1/4 |
Dice4 | 1/4 | 1/4 | 1/4 | 1/8 | 1/16 | 1/16 |
A four-sided die has its faces labelled 'A', 'B', 'C', and 'D'. It is rolled 200 times and the following frequencies are counted:
A: 25, B: 25, C: 50, D: 100
(a) Define the term 'entropy' of a distribution.
(b) What entropy would you infer for the die?
(c) Define the term 'Kullback-Leibler (KL) distance' between two distributions.
(d) There was a mix-up during the experiment and the experimenter copied down these incorrect frequencies:
A: 100, B: 50, C: 25, D: 25
What is the KL distance of the distribution inferred by the experimenter from the correct distribution?
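As a quick check on parts (b) and (d), the quantities can be computed directly; this sketch assumes the inferred distribution is just the naive relative frequencies (count / 200):

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum p_i * log2(p_i)."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL distance D(p || q) = sum p_i * log2(p_i / q_i), in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Correct frequencies A:25, B:25, C:50, D:100 out of 200 rolls:
p = [25/200, 25/200, 50/200, 100/200]   # [1/8, 1/8, 1/4, 1/2]
# The experimenter's mixed-up frequencies A:100, B:50, C:25, D:25:
q = [100/200, 50/200, 25/200, 25/200]

print(entropy(p))   # 1.75 bits
print(kl(p, q))     # 0.875 bits
```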
This question is about multi-state distributions and hypothesis testing.
(a) Briefly describe the three "obvious" ways of transmitting a sequence of observations drawn from a multi-state distribution whose parameters are not known in advance.
(b) What is the relationship between the message lengths of the three methods of transmitting the observations? (You do not have to prove it.)
(c) 'J' claims to be able to read other people's minds. A test is arranged in which 'D' tosses and observes a coin that 'J' cannot see. 'J' must read D's mind and state whether 'D' sees a head or a tail:
Toss | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
D sees | H | T | H | T | H | H | H | T | H | H |
J reports | H | T | T | T | T | H | H | T | T | H |
J correct? | y | y | n | y | n | y | y | y | n | y |
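A sketch of how part (c) might be examined: compare the cost of the null hypothesis (pure guessing) with an adaptive code for the correct/incorrect sequence; the uniform prior over J's success rate, and hence the binomial adaptive-code formula -log2(s! f! / (N+1)!), are assumptions of this sketch:

```python
from math import log2, comb

# 'J' is correct on 7 of the 10 tosses (the y/n row above).
total, correct = 10, 7

# Null hypothesis: Pr(correct) = 1/2 on every toss, i.e. 1 bit per toss.
null_bits = total * 1.0

# Adaptive code with a uniform prior over the success rate:
# -log2( s! f! / (N+1)! ) = log2(N+1) + log2(C(N, s)).
s = correct
adaptive_bits = log2(total + 1) + log2(comb(total, s))

# The adaptive code is the longer message, so this run of 10 tosses
# does not support the telepathy hypothesis.
print(null_bits, adaptive_bits)
```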
You have some data on shooting attempts by a basketball player over a season. The data consist of a sequence of 0's and 1's, 1 indicating a basket (success), 0 indicating a miss. It is assumed that the probability of scoring a basket is independent of the outcome of other shots. The coach believes in the 'hot hands' hypothesis, i.e. that the player may shoot well during some parts of the season, with a high probability of success, and shoot less well at other times. The trainer, on the other hand, does not believe in this hypothesis and thinks that any variation is just caused by luck and chance.
(a) In general terms, how could this difference of opinion between the coach and the trainer be examined using MML?
(b) Formulate each hypothesis in MML terms.
This question is about inference by Minimum Message Length (MML).
(a) Describe how each of the three hypotheses can be encoded so as to allow their posterior probabilities to be calculated using MML.
(b) What parts of the MML "tool kit" are relevant to calculating their message lengths? (Why and how.)
(c) Describe the outline of algorithm(s) for fitting the hypotheses to the data and for calculating their message lengths.
(a) Briefly describe the three "obvious" ways of transmitting a sequence of observations drawn from a multi-state distribution whose parameters are not known in advance.
(b) What is the relationship between the message lengths of the three methods of transmitting the observations? (Do not prove the result.)
(c) The chairman of the Geethorn Football Club has called an emergency meeting after 16 games of the football season to discuss firing the coach because of the terrible results. The coach maintains that although the overall results for the season to date are mediocre, there has been an improvement from game 9 onwards when Fido, the club mascot who sometimes bit players on days near a full-moon, ran away. The chairman maintains that this is a lunatic suggestion and that any apparent change is merely due to chance and statistical variation, and that there is no reason to expect the rest of the season to be other than terrible.
Game | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Result (W/L) | L | W | L | L | W | L | L | L | W | W | L | W | L | L | W | L |
(d) Estimate total message lengths for the data under the two hypotheses (chairman v coach), stating any assumptions that you make. Briefly explain the reasoning behind your estimates. You may find information in the appendix helpful in estimating message lengths.
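One possible calculation for part (d) can be sketched as follows; the uniform prior (giving the binomial adaptive code) and the log2(16) charge for stating the changepoint are assumptions of this sketch, and other conventions would give slightly different numbers:

```python
from math import log2, comb

def adaptive_bits(wins, games):
    """Adaptive-code message length in bits for a win/loss sequence,
    assuming a uniform prior on the win probability:
    -log2( wins! losses! / (games+1)! )."""
    return log2(games + 1) + log2(comb(games, wins))

results = "LWLLWLLLWWLWLLWL"          # the 16 games above
wins = results.count("W")             # 6 wins in 16 games

# Chairman: one win rate for the whole season.
chairman = adaptive_bits(wins, 16)

# Coach: separate rates before and after Fido ran away (game 9 onwards).
# The changepoint is charged log2(16) bits to state which game it is.
first, second = results[:8], results[8:]
coach = (log2(16)
         + adaptive_bits(first.count("W"), 8)
         + adaptive_bits(second.count("W"), 8))

print(round(chairman, 2), round(coach, 2))
```

On these assumptions the chairman's single-rate message is the shorter of the two, so the apparent improvement from game 9 is not strong enough evidence for the coach's hypothesis.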
This question is about Probabilistic Finite State Automata (PFSAs), also known as Hidden Markov Models (HMMs). Below are seven sentences from a language L over the alphabet {A, B, C}:
CAAAB/BBAAB/CAAB/BBAB/CAB/BBB/CB/
(a) Define an efficient way of encoding a PFSA.
(b) Define an efficient way of encoding the data given a PFSA.
(c) Describe how MML and PFSAs can be used to model the sentences in languages such as L. How can MML be used to identify the best such model for L?
(d) Below are two PFSAs. Note that, for simplicity, transitions that do not occur in the above data are not shown in the second PFSA; define a sensible convention for dealing with such transitions. Estimate the message lengths of the above data under the two PFSAs. Give the message lengths of the PFSAs, of the data given each PFSA, and the totals. Briefly explain the reasoning behind your estimates.
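The mechanics of part (b) can be sketched as follows. The PFSA here is made up for illustration (it is not either of the two PFSAs from the question's diagrams); it happens to accept the seven sentences, treating '/' as an end-of-sentence symbol, and its transition probabilities are assumed, not fitted:

```python
from math import log2

# pfsa[state][symbol] = (probability, next_state); '/' ends a sentence.
# A made-up PFSA accepting {C, BB} A* B sentences, for illustration only.
pfsa = {
    "S":  {"C": (1/2, "M"), "B": (1/2, "B1")},
    "B1": {"B": (1.0, "M")},
    "M":  {"A": (1/2, "M"), "B": (1/2, "E")},
    "E":  {"/": (1.0, "S")},
}

def data_bits(symbols, start="S"):
    """Message length of the data given the PFSA: the sum of
    -log2(transition probability) over every symbol emitted."""
    bits, state = 0.0, start
    for sym in symbols:
        p, state = pfsa[state][sym]
        bits += -log2(p)
    return bits

print(data_bits("CAAAB/BBAAB/CAAB/BBAB/CAB/BBB/CB/"))   # 23.0 bits
```

Probability-1 transitions (the forced second B, and '/' at a sentence end) cost nothing; every 1/2 transition costs exactly one bit.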
You are the honours-year coordinator in CSSE, concerned with the possibility that some of the subjects taken by students are inherently more 'difficult' (or are marked more severely) than others, and that some subjects (e.g. CSE454) are easier (or marked more leniently). You believe, as a first approximation, that each student has a single basic 'ability' that largely determines how well the student would perform in a subject of a given difficulty. You also believe that a subject has a single inherent 'difficulty', and that a student's mark in a subject is mostly determined by the student's ability and the subject's difficulty.
You would like to adjust the raw marks of subjects to compensate for their difficulty. The adjusted marks for a subject should reflect the abilities of the students that took it, and a student's marks should reflect his or her ability.
Note that students do not take all subjects; it might even be possible to find two students who take no subject in common. It is also possible that one or more subjects are taken mainly by the stronger students and that others may be taken mainly by the weaker students, so it is not a good idea to force all subjects to have the same mean and variance.
(a) In general terms only, how could this be formulated as an MML problem?
(b) What is the `null hypothesis' for this problem?
(c) Give examples of simple and complex hypotheses in your framework.
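A sketch of the assumed model, mark(s, j) ≈ ability[s] - difficulty[j] + noise, with a simple alternating re-estimation of the parameters. All names, the simulated data, and the fitting step are my own; a full MML treatment would additionally charge for stating each ability, each difficulty, and the noise level:

```python
import random

random.seed(0)
students, subjects = 30, 6

# Hypothetical 'true' values, used only to simulate some marks.
ability = [random.gauss(65, 10) for s in range(students)]
difficulty = [random.gauss(0, 5) for j in range(subjects)]

# Each student takes only 3 of the 6 subjects (students do not take all subjects).
marks = {}
for s in range(students):
    for j in (s % subjects, (s + 1) % subjects, (s + 2) % subjects):
        marks[(s, j)] = ability[s] - difficulty[j] + random.gauss(0, 3)

# Alternately re-estimate abilities given difficulties and vice versa.
a_hat = [60.0] * students
d_hat = [0.0] * subjects
for _ in range(50):
    for s in range(students):
        xs = [m + d_hat[j] for (si, j), m in marks.items() if si == s]
        a_hat[s] = sum(xs) / len(xs)
    for j in range(subjects):
        xs = [a_hat[si] - m for (si, jj), m in marks.items() if jj == j]
        d_hat[j] = sum(xs) / len(xs)
```

Note the model is only identifiable up to a constant shift between abilities and difficulties; the residuals, which is what a message length would be charged on, are unaffected by that shift.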
This question is about coding.
(a) What do we mean by `efficient' in `efficient code'?
(b) Describe an efficient, universal code for the non-negative integers (>= 0), i.e. one in which large integers are reasonably likely.
(c) Show how to encode the integer 18 in your code.
(d) Show the steps by which the encoding from part (c) can be decoded to recover 18.
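One acceptable answer to parts (b)-(d) is an Elias-gamma-style code, sketched below; since 0 must be codable, this version codes m = n + 1:

```python
def encode(n):
    """Elias-gamma-style code for integers n >= 0: write m = n + 1 in
    binary (say k bits, no leading zeros), preceded by k - 1 zeros.
    Small integers get short codewords; large ones stay reasonably cheap
    (about 2*log2(n) bits)."""
    b = bin(n + 1)[2:]                # binary of m = n + 1
    return "0" * (len(b) - 1) + b

def decode(code):
    """Count the k - 1 leading zeros, then read the next k bits as m."""
    k = code.index("1") + 1           # position of the first 1 gives k
    return int(code[k - 1 : 2 * k - 1], 2) - 1

print(encode(18))            # m = 19 = 10011 (5 bits) -> "000010011"
print(decode("000010011"))   # 4 leading zeros, read 5 bits: 19 - 1 = 18
```

The code is a prefix code, so codewords can be concatenated and still decoded unambiguously.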
The Lempel-Ziv (LZ) model (1976) of strings of characters over a finite alphabet is based on the idea that a string is a mixture of 'random' characters and repeated substrings. A repeat must be an exact copy of a substring that begins earlier in the string (NB. 'exact').
(a) Describe how the LZ model could be formulated as an MML inference problem.
(b) Give two good, but not necessarily optimal, explanations of the string.
(c) Describe a different way of encoding a string which would sum over all explanations of the string under the LZ model.
(d) Briefly, what differences, if any, would you expect between inferences based on single explanations as in (b) and those based on all explanations as in (c)?
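A single explanation under this model is a parse of the string into raw characters and copies. The greedy left-to-right parse can be sketched as follows (the string "abababc" is my own example; the question's string is not reproduced here):

```python
def lz_parse(s):
    """Greedy left-to-right LZ-style parse: at each position, take the
    longest exact copy of a substring beginning earlier in the string;
    otherwise emit one raw character.  A copy is recorded as
    ('copy', start, length), a raw character as ('char', c)."""
    out, i = [], 0
    while i < len(s):
        best_len, best_start = 0, 0
        for start in range(i):        # candidate earlier starting points
            length = 0
            while (i + length < len(s)
                   and s[start + length] == s[i + length]):
                length += 1           # the copy may overlap position i
            if length > best_len:
                best_len, best_start = length, start
        if best_len >= 2:             # a copy must pay for its pointer
            out.append(("copy", best_start, best_len))
            i += best_len
        else:
            out.append(("char", s[i]))
            i += 1
    return out

print(lz_parse("abababc"))
# [('char', 'a'), ('char', 'b'), ('copy', 0, 4), ('char', 'c')]
```

For MML, each item would then be charged its code length (a flag bit plus either the character or the pointer), and the greedy parse of (b) is only one of many explanations summed over in (c).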
Consider decision (classification) trees with four binary 'input' attributes Att1, Att2, Att3, Att4, and a fifth binary attribute 'class' which is to be predicted.
(a) Devise code words for the following tree topological structures (i) and (ii):
[tree diagrams (i) and (ii) omitted]
Briefly explain your code words.
Now consider the following two trees based on the above topologies:
[tree diagrams omitted]
(b) For each leaf node of each of the two trees, state the class predicted and give a rough estimate of the probability of that class.
(c) Cost the above two trees on the same data set. Do this by stating all parts of the message. For each tree in turn, cost those parts of the message that you are able to cost (refer to the tables provided) and outline how to cost any other part(s) of the message.
(d) State which tree is the better explanation of the data, and why.
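One common convention for part (a), in the style of the Wallace-Patrick decision-tree code, can be sketched as follows; the example topologies t1 and t2 are illustrative only (not the (i) and (ii) of the omitted diagrams), and the convention that a tested attribute is unavailable deeper in the tree is an assumption:

```python
from math import log2

def tree_bits(tree, n_atts=4):
    """Topology code length in bits: 1 bit per node saying split-or-leaf,
    plus log2(number of available attributes) bits to name the attribute
    tested at each split.  A tree is ('leaf',) or ('split', left, right);
    each split uses up one attribute on the path below it."""
    if tree[0] == "leaf":
        return 1.0
    return (1.0 + log2(n_atts)
            + tree_bits(tree[1], n_atts - 1)
            + tree_bits(tree[2], n_atts - 1))

# t1: a single split with two leaves; t2: its left child splits again.
t1 = ("split", ("leaf",), ("leaf",))
t2 = ("split", ("split", ("leaf",), ("leaf",)), ("leaf",))
print(tree_bits(t1), tree_bits(t2))
```

With four attributes, t1 costs 1 + log2(4) + 1 + 1 = 5 bits; the deeper t2 costs more, as a sensible prior over topologies should arrange.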
Also see [2002].