Below are David Dowe's 15 Hons projects on offer for 2012:
------------------------------**----------------------------
(1) David Dowe 2012 Hons project #1
Title: Re-visiting entropy as time's arrow
Supervisors: David Dowe and David Paganin (Physics)
Wallace (2005, chapter 8) discusses that Charge, Parity and
Time invariance (or CPT invariance) in physics suggests that
entropy of a closed system should be just as likely to increase
when going forward in time as when going backwards in time.
This flies in the face of conventional wisdom that entropy is
supposedly time's arrow. We explore this by repeating and
extending Wallace's simulations, also noting our penchant
for predicting the future and that the past can be inferred by
Minimum Message Length (MML).
Reference: C. S. Wallace (2005), ``Statistical and Inductive
Inference by Minimum Message Length'', Springer.
``Big bang re-spun'' (or ``Original spin''), New Scientist,
15/Oct/2011, pp44-47
End David Dowe 2012 Hons project #1
------------------------------**----------------------------
------------------------------**----------------------------
(2) David Dowe 2012 Hons project #2
Title: Non-standard models of computation and universality
Supervisor: David Dowe
Zvonkin and Levin (1970) (and possibly earlier, Martin-Löf (1966))
consider the probability that a Universal Turing Machine (UTM),
U, will halt given infinitely long random input (where each bit from
the input string has a probability of 0.5 of being a 0 or a 1).
Chaitin (1975) would later call this the halting probability, Omega,
or Omega_U . Following an idea of C. S. Wallace's in private
communication (Dowe 2008a, Dowe 2011a), Barmpalias & Dowe
(to appear) consider the universality probability - namely, the
probability that a UTM, U, will retain its universality. If some input x
to U has a suffix y such that Uxy simulates a UTM, then U has
not lost its universality after input x. Barmpalias, Levin (private
communication) and Dowe (in a later simpler proof) have shown that
the universality probability, P_U, satisfies 0 < P_U < 1 for all UTMs U
and that the set of universality probabilities is dense in the interval
(0, 1).
We examine properties of the universality probability for
non-standard models of computation (e.g., DNA computing).
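For orientation, the halting probability discussed above is usually
written as below for a prefix-free UTM U, where |p| is the length in bits
of a program p. This is the standard textbook form and the notation is an
assumption here, not taken from the cited paper.

```latex
\Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|},
\qquad 0 < \Omega_U < 1 .
```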
Reference:
G. Barmpalias and D. L. Dowe, "Universality probability of a
prefix-free machine", accepted, Philosophical Transactions of the
Royal Society A
End David Dowe 2012 Hons project #2
------------------------------**----------------------------
------------------------------**----------------------------
(3) David Dowe 2012 Hons project #3
Title: Database normalisation by Minimum Message Length inference
Supervisor: David Dowe
Minimum Message Length (MML) (Wallace and Boulton, 1968) is a
universal principle in machine learning, statistics and ``data mining''
which, like Ockham's razor, gives us a theory which optimises the
trade-off between simplicity and goodness of fit. It also predicts
near optimally. Dowe and Zaidi (2010) show how to achieve
database normalisation by following the principles of MML, given
sufficient data - but it took the work only to 1NF, 2NF and 3NF.
We extend this work to higher normal forms such as BCNF,
4NF and 5NF.
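The simplicity versus goodness-of-fit trade-off mentioned above is
commonly written as a two-part message length. The following is a sketch
in generic notation (H for a candidate theory, D for the data), not the
specific coding scheme used for database normalisation.

```latex
I(H, D) \;=\; \underbrace{-\log_2 \Pr(H)}_{\text{first part: the theory}}
        \;+\; \underbrace{-\log_2 \Pr(D \mid H)}_{\text{second part: the data given the theory}},
\qquad
H_{\mathrm{MML}} \;=\; \arg\min_{H} \, I(H, D).
```

A more complex theory lengthens the first part but usually shortens the
second; the preferred theory minimises the total.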
References:
Dowe and Zaidi (2010).
D. L. Dowe (2011a),
"MML, hybrid Bayesian network graphical models, statistical consistency,
invariance and uniqueness", Handbook of the Philosophy of Science - (HPS
Volume 7) Philosophy of Statistics,
pp901-982, 1/June/2011.
End David Dowe 2012 Hons project #3
------------------------------**----------------------------
------------------------------**----------------------------
(4) David Dowe 2012 Hons project #4
Title: MML time series and Bayesian nets with discrete and continuous
attributes
Supervisor: David Dowe
The first application of MML to Bayesian nets including both discrete and
continuous-valued attributes was in Comley & Dowe (2003), refined in Comley
& Dowe (2005) [whose final camera-ready version was submitted in Oct 2003],
based on an idea in Dowe & Wallace (1998). We seek to extend this original
work to Bayesian nets which can change with time, using the mathematics of
MML time series in Fitzgibbon, Dowe et al. (2004).
Comley, J. & D.L. Dowe (2003). General Bayesian Networks and Asymmetric
Languages, Proc. 2nd Hawaii International Conf' on Statistics and Related
Fields, 5-8 June, 2003.
Comley, J. & D.L. Dowe (2005). Minimum Message Length, MDL and Generalised
Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in P.
Grunwald et al. (eds.), Advances in Minimum Description Length: Theory and
Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9
Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol. 51
No. 5 [Christopher Stewart WALLACE (1933-2004) memorial special issue],
pp523-560
Dowe, D. L. & C. S. Wallace (1998). Kolmogorov complexity, minimum message
length and inverse learning, abstract, page 144, 14th Australian
Statistical Conf' (ASC-14), Qld, 6 - 10 July 1998.
Fitzgibbon, L.J., D. L. Dowe & F. Vahid (2004). Minimum Message Length
Autoregressive Model Order Selection. In M. Palaniswami et al. (eds.),
International Conf' on Intelligent Sensing and Information Processing
(ICISIP), Chennai, India, Jan 2004, pp439-444
End David Dowe 2012 Hons project #4
------------------------------**----------------------------
------------------------------**----------------------------
(5) David Dowe 2012 Hons project #5
Title: MML clustering and mixture modelling, re-visiting Snob
Supervisor: David Dowe
The Snob program for clustering and mixture modelling using Minimum
Message Length (MML) dates back to the seminal Wallace & Boulton (1968)
paper. Up until Wallace (1990), all the Snob work on MML clustering
and mixture modelling represented the data by clusters (or groups, or
components) of multinomial and/or Normal (or Gaussian)
distributions. This was extended in a series of papers (Wallace & Dowe
1994, 1996, 1997, 2000) to include Poisson distributions (for counts)
and modelling angular data (such as protein angles) from the von Mises
circular distribution. Other extensions have included latent factor
analysis for modelling correlations within classes (Edwards & Dowe, 1998)
and varieties of spatial image models (Wallace 1998; Visser & Dowe 2007;
Visser, Dowe & Uotila, 2009) where we expect the class of a pixel to be
influenced by the classes of the neighbouring pixels - and where we
typically draw on approximations from thermal physics.
Surveys of this work are given in parts of Wallace (2005) and Dowe (2008a).
This project will involve extending the program in one or more directions.
Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol. 51
No. 5 [Christopher Stewart WALLACE (1933-2004) memorial special issue],
pp523-560
Edwards & Dowe (1998)
Visser & Dowe (2007)
Visser, Dowe & Uotila (2009)
Wallace (1990)
Wallace (1998)
Wallace & Boulton (1968)
Wallace & Dowe (1994)
Wallace & Dowe (2000)
End David Dowe 2012 Hons project #5
------------------------------**----------------------------
------------------------------**----------------------------
(6) David Dowe 2012 Hons project #6
Title: (Algorithmic) Information Theory and Measures of Intelligence
Supervisor: David Dowe
The first work devoted to the relationship between (algorithmic)
information theory (equivalently, Minimum Message Length [MML]) and
measures of intelligence appears to be Dowe and Hajek (1997, 1998),
partly in response to Searle's ``Chinese room'' argument.
More recently, Hernandez-Orallo and Dowe (Artificial Intelligence, 2010)
outlined how to use (algorithmic) information theory to devise an anytime
universal intelligence test for any subject agent
(http://users.dsic.upv.es/proy/anynt),
attracting many downloads and articles in "The Economist", "New Scientist"
and many other media outlets.
There is room for several more people in one aspect or another
of this active research project.
References:
-----------------
J. Hernandez-Orallo and D. L. Dowe (2010), "Measuring Universal
Intelligence: Towards an Anytime Intelligence Test", (the) Artificial
Intelligence journal (AIJ), Volume 174, Issue 18, December 2010,
pp1508-1539. [www.doi.org: 10.1016/j.artint.2010.09.006]
D. L. Dowe and J. Hernandez-Orallo (2012), "IQ tests are not for machines,
yet", accepted, to appear, Intelligence journal.
End David Dowe 2012 Hons project #6
------------------------------**----------------------------
------------------------------**----------------------------
(7) David Dowe 2012 Hons project #7
Title: Inferring evolution of languages
Supervisor: David Dowe
The evolution of human languages raises several interesting issues, such
as how languages evolve, how the evolution of spoken language relates to
the evolution of written language, how it is that geographical regions
of related languages can surround one or more regions of languages not
related, how populations of language speakers migrated eons ago and
possibly also how spoken language relates to DNA. Study of this area
also helps with the inference of now-extinct ancestral languages and at
least indirectly with the preservation of dying languages. The project
will use the Minimum Message Length (MML) principle (Wallace and Boulton,
1968) (Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace,
posthumous, 2005) (Comley and Dowe, 2005), building upon earlier work in
(Ooi and Dowe, 2005).
The project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.
References:
-----------------
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
OoDo2005 Ooi, J.N. and D. L. Dowe, Inferring Phylogenetic Graphs of
Natural Languages using Minimum Message Length, Proc. CAEPIA 2005,
Vol. 1, pp I:143 - I:152, Nov. 2005.
``The language paradox - why one species speaks
in so many different ways'' (or ``Powers of Babel''),
New Scientist, 10/December/2011, pp34-37.
Wall2005
WaBo1968 Wallace C.S. & Boulton, D.M. (1968)
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length
and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4,
pp270-283.
WaFr1987
End David Dowe 2012 Hons project #7
------------------------------**----------------------------
------------------------------**----------------------------
(8) David Dowe 2012 Hons project #8
Title: MML inference of support vector machines
Supervisor: David Dowe
Support Vector Machines (SVMs) are a popular approach to classification
in machine learning and "data mining". They are usually used only to
divide between two classes ("yes"/"no" or "positive"/"negative"), and
they are typically unable to give probabilities with their
predictions. They also have some arbitrariness in the choice of "kernel"
functions for specifying non-linear boundaries. Using Minimum Message
Length (MML) approaches such as those in Tan & Dowe (2004), other notions
intimated in Dowe (2007) and some previously overlooked coding
inefficiencies, we will be able to overcome all these shortcomings.
This will enable us to come up with comparatively simple SVMs which give
excellent (probabilistic) predictions on multi-class problems, possibly
using non-linear cuts.
The mathematics in this project will not be trivial.
References:
-----------------
ComleyDowe2005
D. L. Dowe (2007), Discussion following "Hedging Predictions in Machine
Learning, A. Gammerman and V. Vovk", Computer Journal, Vol. 50, No. 2,
March 2007, pp167-168
D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why
Simplicity is no Problem for Bayesians", Brit. Journal Philos. Sci.
(BJPS), Dec. 2007, pp709-754.
P. J. Tan and D. L. Dowe (2004). MML Inference of Oblique Decision Trees,
Proc. 17th Australian Joint Conf. on Artificial Intelligence (AI'04),
Dec. 2004, Lecture Notes in Artificial Intelligence (LNAI) 3339, Springer,
pp1082-1088.
Wallace2005
End David Dowe 2012 Hons project #8
------------------------------**----------------------------
------------------------------**----------------------------
(9) David Dowe 2012 Hons project #9
Title: MML inference of systems of differential equations
Supervisor: David Dowe
Many simple and complicated systems in the real world can be described
using systems of differential equations (Bernoulli, Navier-Stokes, etc.).
Although we can accurately describe and solve those equations, they often
fail to produce accurate predictions. In this project, our goal is to
create a way of inferring the system of (possibly probabilistic or
stochastic) (partial or ordinary) differential equations (with a
quantified noise term accounting for any inexactness) that describes a
real-world system, based on a set of given data. Initially we can begin by
working on a single equation with one unknown. (The noise could be due to
a number of effects, such as measurement inaccuracies or oversimplified
models.) From there, we can move progressively to more complicated
equations.
Minimum Message Length (MML) will be one of the tools used for modelling,
as it can provide ways of producing simpler models that actually fit more
closely than their more complicated counterparts produced by other
methods. The project will become increasingly CPU-intensive but will
ultimately have many real-world applications in a wide range of areas.
References:
-----------------
Wallace (2005)
Dowe (2011a)
End David Dowe 2012 Hons project #9
------------------------------**----------------------------
------------------------------**----------------------------
(10) David Dowe 2012 Hons project #10
Title: Econometric, statistical and financial time series modelling using
MML
Supervisor: David Dowe
Time series are sequences of values of one or more variables. They
are much studied in finance, econometrics, statistics and various
branches of science (e.g., meteorology).
Minimum Message Length (MML) inference (Wallace and Boulton, 1968)
(Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace,
posthumous, 2005)(Comley and Dowe, 2005) has previously been applied
to autoregressive (AR) time series (Fitzgibbon et al., 2004), other
time series (Schmidt et al., 2005) and (at least in preliminary manner)
both AR and Moving Average (MA) time series (Sak et al., 2005).
In this project, we apply MML to the Autoregressive Conditional
Heteroskedasticity (ARCH) model, in which the (standard deviations
and) variances also vary with time. Depending upon progress, we can
move on to the GARCH (Generalised ARCH) model or Peiris's Generalised
Autoregressive (GAR) models, or to inference of systems of
differential equations.
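As a rough illustration of what "variances also vary with time" means,
the sketch below simulates the simplest case, ARCH(1), and evaluates its
negative log-likelihood. The function names, the starting value x_0 = 0
and the parameter values are assumptions made for this example; it does
not include the parameter-coding part that an MML treatment would add.

```python
import math
import random


def simulate_arch1(n, a0, a1, seed=0):
    """Simulate n values from an ARCH(1) process.

    x_t = sigma_t * e_t,  sigma_t^2 = a0 + a1 * x_{t-1}^2,  e_t ~ N(0, 1).
    """
    rng = random.Random(seed)
    xs, variances = [], []
    prev = 0.0  # assumed starting value x_0 = 0
    for _ in range(n):
        var = a0 + a1 * prev * prev  # conditional variance depends on the past
        x = rng.gauss(0.0, math.sqrt(var))
        xs.append(x)
        variances.append(var)
        prev = x
    return xs, variances


def neg_log_likelihood(xs, a0, a1):
    """Negative log-likelihood (in nats) of xs under ARCH(1) with x_0 = 0."""
    total, prev = 0.0, 0.0
    for x in xs:
        var = a0 + a1 * prev * prev
        total += 0.5 * (math.log(2.0 * math.pi * var) + x * x / var)
        prev = x
    return total
```

For example, `xs, variances = simulate_arch1(500, 0.2, 0.5)` followed by
`neg_log_likelihood(xs, 0.2, 0.5)` gives the data-fit term for one
candidate parameter pair; comparing candidates would also require a cost
for stating the parameters.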
This project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.
References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).
SaDR2005
ScPL2005
Wall2005
WaBo1968
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length
and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4,
pp270-283.
WaFr1987
End David Dowe 2012 Hons project #10
------------------------------**----------------------------
------------------------------**----------------------------
(11) David Dowe 2012 Hons project #11
Title: Probabilistic machine learning (and copula selection) using MML
Supervisor: David Dowe
All too many machine learning algorithms have their success measured
by how many classifications they got right and how many they got wrong.
This common-sense measure is a fine and reasonable place to start, but
it is highly dependent upon the framing of the problem (Dowe 2011a, sec. 3).
Consider, for example, two different multiple-choice exams where one of
them combines the questions of the other into one (big) question. It turns
out that the only scoring system which remains invariant to re-framing of
the questions is log-loss scoring [where probabilities are allocated to each
prediction and the score given is the logarithm of the probability allocated
to that prediction] (Dowe 2008a, footnote 175; Dowe 2008b; Dowe 2011a,
sec. 3).
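The re-framing effect can be seen in a small made-up example: two
true/false questions scored separately, versus the same two questions
merged into one four-option question. The probabilities below are
hypothetical, and the merged framing assumes the candidate treats the two
parts as independent. The sketch shows that the log-loss score is
unchanged while the fraction right is not; it does not demonstrate that
log-loss is the only such scoring system.

```python
import math


def log2_score(prob_given_to_truth):
    """Log-loss score: log of the probability allocated to the true answer."""
    return math.log2(prob_given_to_truth)


# Hypothetical candidate: probability that Q1 is True and that Q2 is True.
# The correct answer to both questions is True.
p1, p2 = 0.9, 0.4

# Framing A: two separate questions.
separate_log = log2_score(p1) + log2_score(p2)
separate_right = int(p1 > 0.5) + int(p2 > 0.5)  # right on Q1 only
separate_fraction = separate_right / 2

# Framing B: one combined question with answers TT, TF, FT, FF.
joint = {
    (a, b): (p1 if a else 1 - p1) * (p2 if b else 1 - p2)
    for a in (True, False)
    for b in (True, False)
}
combined_log = log2_score(joint[(True, True)])
best_guess = max(joint, key=joint.get)           # most probable combined answer
combined_right = int(best_guess == (True, True))  # the one big question is wrong
combined_fraction = combined_right / 1

print(separate_log, combined_log)            # equal up to rounding
print(separate_fraction, combined_fraction)  # differ between framings
```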
One possible direction in which to take this project is using Minimum
Message Length (MML) for copula selection - a quite general modelling
approach.
References:
-----------------
D. L. Dowe (2008a), ``Foreword re C. S. Wallace'', Computer J.
D. L. Dowe (2008b).
D. L. Dowe (2011a),
"MML, hybrid Bayesian network graphical models, statistical consistency,
invariance and uniqueness", Handbook of the Philosophy of Science - (HPS
Volume 7) Philosophy of Statistics,
pp901-982, 1/June/2011.
R. Fujimaki, Y. Sogawa and S. Morinaga (2011), ``Online heterogeneous
mixture modeling with marginal and copula selection'', Proc. KDD '11,
Proceedings of the 17th ACM SIGKDD international conference on Knowledge
discovery and data mining.
End David Dowe 2012 Hons project #11
----------------------------------------------------------
----------------------------------------------------------
(12) David Dowe 2012 Hons project #12
Title: Vision enhancement and spike sorting algorithms for multiple
electrodes using MML
24-point project
Supervisors: David Dowe and Ramesh Rajan (Physiology)
Spike signals obtained from one or more electrodes receiving input from
the brain are much studied. The signals are typically a mix of one or
more component signals, and both frequencies and certainly amplitudes
can vary - and can be affected by local factors such as capacitance.
Where there are several well-separated electrodes, there might be weak
correlation between the signals, which can possibly be modelled by
latent factor analysis and/or some degree of time series delay. A
typical data-set might be large with data from over 100 electrodes,
modelling neural responses to visual stimuli.
The project will use Minimum Message Length (MML), whose generality
enables it to be used to infer any computable function (Wallace & Dowe
1999a; chap. 2 of Wallace 2005; Dowe 2011a). Co-supervision will come
from Physiology. The Department of Physiology will provide data of
varying degrees of complexity, ranging from single-electrode data with a
known number of sources, through single-electrode data with an unknown
number of sources, to electrode-array data. The project will start with a simple
problem and work towards more challenging problems, with our developing
mathematical MML models as elaborate as time permits us to. A knowledge
of physiology and/or brain signal data will be an advantage, and strong
mathematics will be essential.
References:
Dowe (2011a)
www.frontiersin.org/SearchData.aspx?sq=Spike+sorting
Quiroga (2012)
http://www.scholarpedia.org/article/Spike_sorting
Wallace (2005)
Wallace & Dowe (1999a)
Wild, Prekopcsak, Sieger, Novak and Jech (2012)
End David Dowe 2012 Hons project #12
----------------------------------------------------------
----------------------------------------------------------
(13) David Dowe 2012 Hons project #13
Title: Model selection and parameter estimation for
influenza infection
24-point project
Supervisors: David Dowe and James McCaw
Influenza viruses enter the body and replicate by infecting susceptible
cells (tissue) in the airways. Infected cells release multiple progeny
virus, leading to initial exponential growth in the viral load. Under
the classical 'target cell limited' model of within-host dynamics, this
growth gives way to a peak and then decline in viral load once the
number of susceptible (target) cells has declined to such an extent that
any given influenza virus is unlikely to find and infect one during its
lifetime. The process of invasion, replication and decline of virus is
described by a set of coupled non-linear 1st order ordinary differential
equations. The model paradigm has been widely used to explain key
biological and epidemiological phenomena such as susceptibility,
infectiousness, drug efficacy and the emergence and consequences of
drug-resistance.
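The description above does not write the equations out. A minimal sketch,
assuming the commonly used three-compartment form (target cells T,
infected cells I, free virus V) and purely illustrative parameter values
that are not fitted to any data, might look like this:

```python
def simulate_target_cell_limited(beta, delta, p, c, T0, V0, days=10.0, dt=0.001):
    """Forward-Euler integration of an assumed target-cell-limited model.

        dT/dt = -beta * T * V           (target cells become infected)
        dI/dt =  beta * T * V - delta*I (infected cells die at rate delta)
        dV/dt =  p * I - c * V          (virus produced by I, cleared at rate c)

    Returns the time grid, the viral load at each time, and the final T.
    """
    T, I, V = T0, 0.0, V0
    times, viral_load = [0.0], [V]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        infection = beta * T * V
        dT = -infection
        dI = infection - delta * I
        dV = p * I - c * V
        T += dt * dT
        I += dt * dI
        V += dt * dV
        times.append(k * dt)
        viral_load.append(V)
    return times, viral_load, T


# Illustrative placeholder parameters (per-day rates), not estimates.
times, viral_load, T_end = simulate_target_cell_limited(
    beta=2.7e-5, delta=4.0, p=1.2e-2, c=3.0, T0=4e8, V0=9.3e-2
)
peak_index = max(range(len(viral_load)), key=viral_load.__getitem__)
print("peak viral load at day", times[peak_index])
```

With these placeholder values the viral load rises, peaks once target
cells are depleted, and then declines, which is the qualitative behaviour
described above. A fixed-step Euler scheme is used only to keep the sketch
dependency-free.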
However, we know that this simple model is wrong - we have the
counter-example of influenza infections that do not resolve in
immunocompromised individuals, indicating that processes other than
target cell depletion also contribute to control and eventual
suppression of viral load. Biologically, it is clear that these 'other
processes' originate from the action of the innate and adaptive immune
systems. However, collecting relevant time-series data to differentiate
between models that include immune-responses to varying degrees is
extremely i) time-consuming, ii) costly and iii) difficult to justify if
one does not know what data are really needed to accept or reject
different model constructs.
We will use the information-theoretic techniques behind Minimum
Message Length (MML) to develop optimised strategies for choosing what
experiments to run. Through collaborations with colleagues in virology
and immunology we will then run those experiments, and use the data
and our knowledge of biological plausibility to find the best MML
inference.
The student will have some knowledge of complex systems and
non-linear analysis, gained through study in mathematics, physics,
computer science, statistics, econometrics, (electrical) engineering or
a related field. The ability to program is essential.
References:
Dowe (2011a)
Amber M. Smith and Alan S. Perelson, Influenza A virus infection
kinetics: quantitative data and models, Wiley Interdiscip Rev Syst Biol
Med (2010).
Wallace (2005)
End David Dowe 2012 Hons project #13
----------------------------------------------------------
----------------------------------------------------------
(14) David Dowe 2012 Hons project #14
Title: Software testing as we approach the technological singularity
Alternative Title:
*Can software testing stop Terminator's SkyNet from being released into the
wild?*
Supervisors: Robert Merkel and David Dowe
Hernandez-Orallo and Dowe have proposed a universal intelligence test that
can, at least theoretically, provide a measure of intelligence for any
entity, computational or biological. In a nutshell, they define
intelligence as the ability to maximise rewards across a broad range of
complex environments. Their test assumes that "rewards" are perceived in
the same way by the tester and the test participant, which may not always
be the case.
For instance, in their paper, they use the example of a chimpanzee; a
"correct answer" in the test is rewarded with a banana, which is always
perceived as a greater reward than not getting a banana. But what if our
chimp has had its fill of bananas? What if it starts pushing buttons at
random, just to see what happens? Or deliberately chooses the wrong answer
- perhaps with the hope of ending the test early? Or perhaps a
particularly cunning chimp chooses any one of the above courses of action
because it wants to disguise its true intelligence, to lull its human
jailers into a false sense of security and give it the opportunity to
escape the lab? On the other hand, what about a chimp that wants to send a
message to its tester by, for instance, alternating between very good and
obviously incorrect answers?
J. Hernandez-Orallo and D. L. Dowe (2010), "Measuring Universal
Intelligence: Towards an Anytime Intelligence Test", (the) Artificial
Intelligence journal (AIJ), Volume 174, Issue 18, December 2010,
pp1508-1539. [www.doi.org: 10.1016/j.artint.2010.09.006]
D. L. Dowe and J. Hernandez-Orallo (2012), "IQ tests are not for machines,
yet", accepted, to appear, Intelligence journal.
End David Dowe 2012 Hons project #14
----------------------------------------------------------
----------------------------------------------------------
(15) David Dowe 2012 Hons project #15
Title: Inference of Liquid State Machines
Supervisors: Asad Khan and David Dowe
See Asad Khan's Hons projects
End David Dowe 2012 Hons project #15
----------------------------------------------------------
----------------------------------------------------------
(16) David Dowe 2012 Hons project #16
Title: Other
Supervisor: David Dowe
Please see me if you have good mathematics and are
interested in issues pertaining to machine learning, statistics,
information theory, ``data mining'' and/or quantifying the notion
of intelligence.
End David Dowe 2012 Hons project #16
----------------------------------------------------------
------------------------------**----------------------------