Monash University, Clayton School of IT

BCS, BSE and BBIS Honours Projects 2010

Honours projects for the Clayton School of IT, 2010

Note that some BSE projects are 12-pts, and that BCS/BBIS projects are 24-pts. Some projects come in 24- and 12-pt versions; the latter involves less work.

Supervisors as of Friday 18th December 2009.


Driving the OptiPortal from Scientific Workflows
David Abramson
Project-id: Abramson-1
Project Description to Appear...

A simulation investigation of sensors that could be used to assist Museum visitors (12pt or 24pt)
David Albrecht & Ingrid Zukerman
Project-id: Albrecht-1

Imagine when you next visit a museum you are handed a device that not only tells you about the current exhibit that you are viewing but also makes recommendations about other exhibits that it believes you may be interested in. For such a device to be useful it will need to interact with sensors that are keeping track of your movements, and it will need to make predictions about your trajectory and interests. There are are several types of sensors, and it is too expensive to try each of them out in a real museum environment.

This project is a continuation of an existing project, where we propose to simulate the effect of different types of sensors on modelling visitors behaviour in a museum.

You must be a competent programmer and familiar with MATLAB (or at least familiarize yourself with MATLAB prior to the start of the project).

Using Bayesian Networks in Cryptanalysis (12pt or 24pt)
David Albrecht
Project-id: Albrecht-2

Bayesian Networks are graphical models that are used for probabilistic reasoning and to represent knowledge involving uncertainties. In this project it is proposed to investigate their use in the cryptanalysis of simple ciphers. The project will initially investigate the cryptanalysis of monoalphabet ciphers of English plain text, using Bayesian Networks to represent knowledge of the cipher and the English language. After this initial investigation other simple ciphers will be consider, such as horizontal and vertical shift ciphers and rotor ciphers.


Approximation-Algorithms for Strict Minimum Message Length Code-Books (12 or 24-pts)
Lloyd Allison and Graham Farr
Project-id: Allison-Farr-MML

Strict Minimum Message Length (SMML) inference is an information-theoretic criterion for machine learning. It was introduced by Wallace and Boulton and has several desirable statistical properties. The SMML code-book problem is to create an optimal code-book that gives the greatest compression of future data. There is a quadratic-time algorithm for data drawn from a binomial distribution, and a good heuristic for the trinomial distribution, but the code-book problem is NP-hard (very probably requires exponential time) in general [FW02]. The project is to devise fast approximation-algorithms, that is algorithms that run quickly and create good, but not necessarily optimal, code books. The first cases to consider will be essentially unbounded one-dimensional in character such as the geometric and Poisson distributions. Optimisations may be possible if the prior on the parameter(s) is "nice". The Gaussian (normal) and related distributions are important for continuous data. The project involves and relates to algorithm analysis, complexity, data compression, and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or GF before submitting "the form".

[FW02] G. E. Farr and C. S. Wallace.
The complexity of strict minimum message length inference.
Computer Journal, 45(3), pp.285-292, 2002 [doi:10.1093/comjnl/45.3.285]

Bayesian Evolutionary Trees (12 or 24-pts)
Lloyd Allison and Kevin Korb
Project-id: Allison-Korb-BET

Given K DNA sequences, or protein sequences, the multiple-alignment problem and the evolutionary-tree problem are "chicken and egg" problems: Given an optimal evolutionary-tree one can search for an optimal multiple-alignment of the K sequences (and we know how to do that [LW94]), and given an optimal multiple-alignment of the K sequences one can search for an optimal evolutionary-tree.

An evolutionary-tree is a special case of a Bayesian net [KN04]: a tree rather than a general DAG, with discrete variables (over {A,C,G,T} if DNA) at each node, known values at all leaves, and missing values at all internal (ancestral) nodes, e.g., see fig.4 [LW94].

The project is to modify the CaMML algorithm for Bayesian nets to infer evolutionary trees given a multiple alignment. If that is successful, the feasibility of simultaneously solving the multiple-alignment and evolutionary-tree problems will be investigated.

The project involves algorithms, probability and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or KK before submitting "the form".

References:

[LW94] L. Allison and C. S. Wallace.
The Posterior Probability Distribution of Alignments and
its Application to Parameter Estimation of Evolutionary Trees and
to Optimisation of Multiple Alignments.
Jrnl. Molec. Evol., 39(4), pp.418-430, 1994
http://www.csse.monash.edu.au/~lloyd/tildeStrings/Multiple/94.JME/
or
http://dx.doi.org/10.1007/BF00160274

[KN04] K. B. Korb and A. E. Nicholson.
Bayesian Artificial Intelligence.
Chapman and Hall / CRC, 2004.


Artificial Vision and Perception (12 or 24-pts)
Damminda Alahakoon
Project-id: Alahakoon-vis

Vision is indispensable. Building automated machines with human vision capabilities is a formidable challenge in both industry and academia. Much of human learning can be viewed as an unsupervised incremental learning process. Investigating this gives a significant contribution to machine learning. In this project our main focus is to investigate how humans accumulate knowledge incrementally by a series of objects, induce a hierarchy of concepts that summarize and organize such observations and use them in classifying future experiences.

Antepartum Fetus Health Investigations
Damminda Alahakoon
Project-id: Alahakoon-fetus

Pre-labour fetal health is vital for a successful birth. Despite the maternal involvement in assuring fetal health, there are the risky deliveries where intervention is mandatory. The only external measurement that relates directly to fetal health is fetal heart rate patterns. Analyses of this property can be used to determine ominous instances. However, these pattens are largely misinterpreted by human observers leading to unnecessary surgical procedures. This project presents a cognitive approach to fetal heart rate interpretation. Artificial knowledge acquisition means are revamped with emphasis on understanding rather than enforcing knowledge.

Bioinformatics
Damminda Alahakoon
Project-id: Alahakoon-bio

Bioinformatics research involves retrieval and analysis of a large amount of biological data such as gene expressions, DNA, RNA, and proteins. Biological data is characterised by its high volume, high dimensionality and also complexity -in terms of structure, evolving nature and biological processes it involves in. Machine learning algorithms have been widely used for solving problems in bioinformatics such as sequence alignment, gene finding and protein domain analysis. This research is an attempt to develop a connectionist system that identifies dimensional changes in biological data.


Sourcing Business Intelligence and Data Warehousing Systems
David Arnott
Project-id: Arnott-1

Business intelligence (BI) systems have been rated by the Gartner Global CIO Survey as the top technology priority for organizations for the last four years. There has been very little research conducted on the sourcing of BI systems and the data warehouses (DW) that feed them with data. Articles in industry magazines suggest that many systems are developed inhouse, many are outsourced, and some have relatively complex arrangements. This project builds on preliminary research on BI and DW sourcing:

Arnott, D. (2008). Should large-scale decision support systems be outsourced? An empirical study. In Proceedings of the 13th Asia-Pacific Decision Sciences Conference (APDSI 2008). Brisbane, Australia: Griffith University.

This project will investigate how BI and DW are sourced, whether they use different sourcing strategies to enterprise systems, and whether they themselves need different strategies. The foundation theories for the project will be transaction cost economics and the resource based view of the firm. The project could be undertaken by multiple students with each student using a different method, that is, one researcher could use a survey method and another a case study approach. The project is suitable for BBIS and BITS students who have an information systems orientation. Studies in business and management would be helpful but are not required.

Sourcing Business Intelligence and Data Warehousing Systems
David Arnott
Project-id: Arnott-2

IT governance is an area of considerable academic and industry Business intelligence (BI) systems have been rated by the Gartner Global CIO Survey as the top technology priority for organizations for the last four years. There has been very little research conducted on the governance of BI systems and the data warehouses (DW). This project builds on preliminary research on BI and DW governance:

Arnott, D. (2006). Data warehouse and business intelligence governance: An empirical study. In F. Adam, P. Brezillon, S. Carlsson, & P. Humphreys (Eds.). Creativity and innovation in decision making and decision support (pp. 711-730). London: Ludic Publishing.

This project will investigate how BI and DW are governed, whether they use different governance strategies to enterprise systems, and whether they themselves need different strategies. The foundation theories for the project will be the matrix theory of IT governance and the contingency theory of governance. The project could be undertaken by multiple students, with each student using a different method, that is, one researcher could use a survey method and another a case study approach.

The project is suitable for BBIS and BITS students who have an information systems orientation. Studies in business and management would be helpful but are not required.


Biometric Cryptosystem (12 or 24-pts)
Nandita Bhattacharjee and Bala Srinivasan
Project-id: Nandita-bio

Biometric information can serve not only for access or authetication, but also for data protection. It is possible to generate a key from the biometric information that can be used with some cryptographic algorithm to encipher and decipher data. In this project we shall study methods of generating key from finger prints or iris codes, addressing the fundamental problems of biometric cryptography of key change and distortion tolerance. Given the biometrics, a set of reliable features are extracted. In this project we shall design a system in which we do not need to store a biometric template, but only a string of error-correction data from which the biometric cannot be derived, and from which the key cannot be derived either unless the biometric is present.

References

Slime Mould Computer Graphics Model and Agent-Based Simulation (12 or 24-pts)
Alan Dorin
Project-id: Dorin-Slime

The name Slime Mould doesn't give an organism a very appealing ring to it... but these "creatures" are really fascinating! There are two types of slime moulds: plasmodial slime moulds and cellular slime moulds. This project is interested in the latter. The trait that makes these slimes so amazing is their diverse range of behaviours during a complex life cycle. They have several "phases" within a motile (moving) period that give the mould the appearance of an animal, and an immotile (static) period that gives them the appearance of a plant or fungus.

Individual cells of the mould start their lives scattered around an area. At a chemical signal triggered by starvation, they all coagulate into a single blob. This then becomes a slug called a plasmodium that may be a few inches long. This slug then wanders around seeking nutrients. When it finds a good spot, the slug turns into a reproductive structure called a fruiting body. This is a fungus-like structure with a stalk and a ball on top that produces spores. These spores are then scattered... and the life cycle starts over again.

The purpose of this project is to build an agent-based simulation of the slime mould's life cycle (to some level of abstraction) and to visualise it using computer graphics techniques. The look of the model is important! The aim is to produce an attractive but true-to-life representation of the organism's appearance and behaviour. A movie of the slime mould plasmodium is online: http://www.youtube.com/watch?v=96U-6iU8W_A Some decent illustrations of the various phases of the life cycle are online too: http://universe-review.ca/R10-18-slimemoulds.htm

See also Korb and Dorin.


Rating and ranking players and teams using MML (12 or 24 pts)
David Dowe
Project-id: Dowe-1

Ratings are used in chess, tennis, golf, other sports, etc. in order
to rank both teams and individual players. A variety of systems are
used to do the ratings and rankings - including Elo, Glicko, Sonas and
systems more concerned with prize money (over the past 12 months).

We can re-visit and improve these systems using the Minimum Message
Length (MML) principle (Wallace & Boulton, 1968)(Wallace & Freeman,
1987)(Wallace & Dowe, 1999a, which has been the Computer Journal's most
downloaded article)(Wallace, posthumous, 2005)(Comley and Dowe, 2005)
to arrive at a comparatively simple model with an improved and
near-optimal predictive accuracy on future games and contests.

This project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.

ComleyDowe2005 Comley, Joshua W. and D.L. Dowe (2005). Minimum
Message Length, MDL and Generalised Bayesian Networks with Asymmetric
Languages, Chapter 11 (pp265-294) in "Advances in Minimum Description
Length: Theory and Applications", M.I.T. Press, April 2005.
[Final camera-ready copy submitted Oct. 2003.]

DoweGardnerOppy2007 Dowe, D. L., S. Gardner and G. Oppy (2007).
Bayes not bust! Why simplicity is no problem for Bayesians. British
J. Philos. Sci., pp709-754.

Wallace2005

WallaceBoulton1968

WallaceDowe1999a

WallaceFreeman1987

Minimum Message Length analysis in the Higgs boson search (24 pts)
David Dowe and Csaba Balazs
Project-id: Dowe-2

The existence of the Higgs boson is central to the structure of the standard
model of elementary particles in Physics. The current search for the Higgs
boson involves experiments at the Fermilab Tevatron and at the CERN Large
Hadron Collider (LHC).
Simulation software is available for the outputs of the Tevatron and the LHC
for the standard particle model and for various other competing models. This
enables us to statistically analyze the signals of the presence of a Higgs
boson in the data, in advance.

The main statistical approach considered would be the Minimum Message
Length (MML) principle, an approach from Bayesian information theory which
quantitatively trades off the simplicity off a hypothesis against its
goodness of fit to the observed data. Our search problem is partly a problem
in mixture modelling (or clustering), where we wish to identify whether or not
there is sufficient evidence that there is a component of the observed data
which is best explained by the presence of the Higgs boson.

The student should have a reasonably strong background in mathematics (partial
derivatives, integration, matrix determinants, etc.) and advanced quantum
physics.

References:
D. L. Dowe, S. Gardner and G.R. Oppy (2007) "Bayes not Bust! Why Simplicity
is no Problem for Bayesians", British Journal for the Philosophy of Science
(BJPS), Dec. 2007, pp709-754.

Dowe, D. L. (2008a), "Foreword re C. S. Wallace", Computer J.,
Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial
special issue], pp523-560.

An MML-guided approach to reconstruct images from asymmetric sets of noisy discrete projections (24 pts)
A./Prof. David Dowe (Clayton I.T.) and Dr Imants Svalbe (Physics)
Project-id: dld-3

The optimal reconstruction of images from traditional computer
tomography (CT) projections in the presence of real noise is a mature
problem with several practical solutions. This project proposes use
of a combination of symmetric (Finite Radon Transform) and asymmetric
(Mojette Transform) discrete projection techniques to reconstruct
images of test data and real images, and uses Minimum Message Length
techniques to optimise the image reconstruction for a variety of assumed
noise models. It also aims to use MML-related techniques to guide the
selection of the best views to capture the minimal amount of
sufficient real data required to reconstruct variable resolution images.

References:

Comley, J. W. and D. L. Dowe (2005). Minimum Message Length and
Generalized Bayesian Networks with Asymmetric Languages, Chapter 11
(pp265-294) in "Advances in Minimum Description Length: Theory and
Applications", M.I.T. Press, April 2005.
[Final camera-ready copy submitted Oct. 2003.]

D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why
Simplicity is no Problem for Bayesians", Brit. Journal Philos. Sci.
(BJPS), Dec. 2007, pp709-754.

Guedon, JP and Normand, N., The mojette transform: the first ten years,
LNCS 3429 (2005) 79-91

Matus, F. and Flusser, J., Image representation via a finite Radon
transform, IEEE PAMI 15(10) (1993) 996-1006

Wallace2005

Minimum Message Length Support Vector Machines (24 pts)
David Dowe
Projecy-id: dld-4

Support Vector Machines (SVMs) are a popular approach to classification
in machine learning and "data mining". They are usually only used to
divide between two classes ("yes"/"no" or "positive"/"negative") and
nor are they typically able to give probabilities with their
predictions. They also have some arbitrariness in the choice of "kernel"
functions for specifying non-linear boundaries. Using Minimum Message
Length (MML) approaches such as those in Tan & Dowe (2004), other notions
intimated in Dowe (2007) and some previously overlooked coding
inefficiencies, we will be able to overcome all these shortcomings.
This will enable us to come up with comparatively simple SVMs which give
excellent (probabilistic) predictions on multi-class problems, possibly
using non-linear cuts.

The mathematics in this project will not be trivial.

References:

ComleyDowe2005

Dowe, D. L. (2008a), "Foreword re C. S. Wallace", Computer J.,
Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial
special issue], pp523-560.

D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why Simplicity
is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec. 2007,
pp709-754.

P. J. Tan and D. L. Dowe (2004). MML Inference of Oblique Decision Trees,
Proc. 17th Australian Joint Conf. on Artificial Intelligence (AI'04),
Dec. 2004, Lecture Notes in Artificial Intelligence (LNAI) 3339, Springer,
pp1082-1088.

Wallace2005

Limits in inference of games and markets (12 or 24 pts)
David Dowe
Projecy-id: dld-5

In games such as "rock, paper, scissors", we typically first seek
to maximise our returns against an opponent's best strategy, and
then we seek to exploit any systematic sub-optimalities in our
opponents' play. There is a balance between trying simultaneously
to play an optimal strategy and learn a model for the opponents'
strategies [while also being wary of the traps of elusive models
(Dowe 2008a, footnote 211)].

The project will start with just one other player and then build
up to games with several other players and games where the number
of other players is not known. We shall eventually seek to infer
the total number of players and their strategies. This has
relevance to financial markets. The ability of MML to infer when
the amount of data per parameter is very limited (e.g., Dowe &
Wallace (1997)) will be central here.

The mathematics in this project will not be trivial. An interest
in game theory is essential, a knowledge of game theory is desirable.

References:

ComleyDowe2005

Dowe, D. L. (2008a), "Foreword re C. S. Wallace", Computer J.,
Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial
special issue], pp523-560

D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why Simplicity
is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec. 2007,
pp709-754.

Dowe, D. L. and C. S. Wallace (1997).

Wallace2005

WallaceDowe1999a

MML inference of systems of differential equations (24 pts)
David Dowe
Project-id: dld-6

Many real-world systems (aerodynamic, biological, economic, financial,
meteorological, etc.) are modelled by systems of one or more
differential equations - such as Bernoulli, Navier-Stokes, etc.
Such systems are often well-known, but even then they are typically
inexact. One such example is turbulence and the difficulty in
properly accounting for it.

This project will start off simply with data from which we will infer
a differential equation with only one unknown. We will then introduce
a noise term which will enable us to model data which we can not fit
exactly - due to any number of effects, ranging from quantum mechanical
to measurement inaccuracies to the fact that the system is simply too
complicated for our model.

Our modelling will include Minimum Message Length (MML) - variously
because of a variety of optimality properties and its ability to use
simple models that fit more accurately than the more complex models
of other approaches.

The mathematics in this project will not be trivial.

References:

Comley, J. W. and D. L. Dowe (2005). Minimum Message Length and
Generalized Bayesian Networks with Asymmetric Languages, Chapter 11
(pp265-294) in "Advances in Minimum Description Length: Theory and
Applications", M.I.T. Press, April 2005.
[Final camera-ready copy submitted Oct. 2003.]

D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why Simplicity
is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec. 2007,
pp709-754.

Wallace2005

WallaceDowe1999a

Modelling language change (24 points)
Project-id: dld-7
Supervisors:
A./Prof. David Dowe (IT) and Dr Simon Musgrave (Linguistics)
(24 points)

Languages spoken by humans change over time. For example, French, Italian and
many European languages originate from Latin, but each diverges from Latin in
rather different ways. English is a more complex case, with influences from
Latin and French overlaying its basic Germanic nature. Representing language
by the sounds of spoken words, we use Minimum Message Length (MML) to model
how languages might have developed. Our aim is to reach a model which best
approximates the known history of a language group by running simulations
which assign different probabilities to different types of change, e.g.
internal variation, development of regularity by analogy with existing forms,
and external influences (borrowing). This project will be quite mathematical.

References:
Cangelosi, Angelo and Domenico Parisi (eds.) (2002). Simulating the Evolution
of Language. Berlin: Springer-Verlag.

Dowe, Gardner & Oppy (2007)

Kosmidis, Kosmas, John M. Halley, and Panos Argyrakis (2005). Language
evolution and population dynamics in a system of two interacting species.
Physica A 353: 595-612.

Nowak, Martin A., Natalia Komarova, and Partha Niyogi (2002). Computational
and evolutionary aspects of language. Nature 417: 611-617

Ooi, J.N. and D. L. Dowe (2005). Inferring Phylogenetic Graphs of
Natural Languages using Minimum Message Length, Proc. CAEPIA
2005 (11th Conf. Spanish Assoc. for Artificial Intelligence), Vol. 1,
pp I:143 - I:152, Nov. 2005; ISBN 84-96474-13-5

MML clustering and mixture modelling, re-visiting Snob (24-pts)
David Dowe
Project-id: dld-8
 
The Snob program for clustering and mixture modelling using Minimum
Message Length (MML) dates back to the seminal Wallace & Boulton (1968)
paper. Up until Wallace (1990), all the Snob work on MML clustering
and mixture modelling represented the data by clusters (or groups, or
components) of either multinomial and/or Normal (or Gaussian)
distributions. This was extended in a series of papers (Wallace & Dowe
1994, 1996, 1997, 2000) to include Poisson distributions (for counts)
and modelling angular data (such as protein angles) from the von Mises
circular distribution. Other extensions have included latent factor
analysis for modelling correlations within classes (Edwards & Dowe, 1998)
and varieties of spatial image models (Wallace 1998; Visser & Dowe 2007;
Visser, Dowe & Uotila, 2009) where we expect the class of a pixel to be
influenced by the classes of the neighbouring pixels - and where we
typically draw on approximations from thermal physics.

Surveys of this work are given in parts of Wallace (2005) and Dowe (2008a).

This project will involve extending the program in one of more directions.

Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol. 51 No. 5
[Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560

Edwards & Dowe (1998)

Visser & Dowe (2007)

Visser, Dowe & Uotila (2009)

Wallace (1990)

Wallace (1998)

Wallace & Boulton (1968)

Wallace & Dowe (1994)

Wallace & Dowe (2000)

MML time series and Bayesian nets with discrete and continuous attributes
David Dowe
Project-id: dld-9

The first application of MML to Bayesian nets including both discrete and
continuous-valued attributes was in Comley & Dowe (2003), refined in Comley
& Dowe (2005) [whose final camera-ready version was submitted in Oct 2003],
based on an idea in Dowe & Wallace (1998). We seek to enhance this original
work to Bayesian nets which can change with time, using the mathematics of
MML time series in Fitzgibbon, Dowe et al. (2004).

Comley, J. & D.L. Dowe (2003). General Bayesian Networks and Asymmetric
Languages, Proc. 2nd Hawaii International Conf' on Statistics and Related
Fields, 5-8 June, 2003.

Comley, J. & D.L. Dowe (2005). Minimum Message Length, MDL and Generalised
Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in P.
Grunwald et al. (eds.), Advances in Minimum Description Length: Theory and
Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9

Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol. 51 No. 5
[Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560

D L Dowe & C S Wallace (1998). Kolmogorov complexity, minimum message length
and inverse learning, abstract, page 144, 14th Australian Statistical Conf'
(ASC-14), Qld, 6 - 10 July 1998.

Fitzgibbon, L.J., D. L. Dowe & F. Vahid (2004). Minimum Message Length
Autoregressive Model Order Selection. In M. Palanaswami et al. (eds.),
International Conf' on Intelligent Sensing and Information Processing (ICISIP),
Chennai, India, Jan 2004, pp439-444

Evolving & quantifying varieties of intelligence (24 pts)

A./Prof. David Dowe

Project-id: dld-10

"Intelligence" is a term used variably to describe a variety of
human aptitudes (memory, mathematical ability, ability to make inductive
inferences), terrestrial non-human aptitudes and (www.SETI.org)
something sought in extra-terrestrials. Humans share much with other
humans and also with terrestrial non-humans, but how much must humans
share with extra-terrestrials or one extra-terrestrial with another?
Dowe & Hajek (1997, 1998) discuss the relevance of the
information-theoretic Minimum Message Length (MML) principle (Wallace
& Boulton, 1968; Wallace & Freeman, 1987; Wallace, 2005) to the
inductive inference part of intelligence, and Hernandez-Orallo &
Minaya-Collado (1998) discuss the relevance of Kolmogorov complexity to
intelligence - and MML is intimately related to Kolmogorov complexity
(Wallace and Dowe, 1999a).
This project concerns the use of information theory, MML and
Kolmogorov complexity to recognise signs of intelligence and ways of
communicating with it. This will inevitably entail measuring - or
attempting to measure - intelligence. Another direction in which
the project might head is to study how intelligence of a system increases
(or possibly doesn't increase) when there is communication involving more
than one intelligent entity (Dowe, 2008a).
This mathematical project might also involve the study of
communications between terrestrial non-humans.

D. L. Dowe (2008a), ``Foreword re C. S. Wallace'', Computer Journal
http://comjnl.oxfordjournals.org/content/vol51/issue5

 

Towards an online Atlas for Graph Theory and Combinatorics
Paul Bonnington (eResearch Centre)
Graham Farr (Clayton School of IT)
Ian Wanless (School of Mathematical Sciences)
Project Id: Farr-Atlas

Researchers in graph theory often want to determine various important properties of specific graphs of modest size: chromatic number, size of maximum clique or independent set, shortest/longest cycle, diameter, girth, genus, automorphism group, etc (most of which you will not have met yet). Such information can be used to formulate or disprove hypotheses, and is very valuable to workers in the field. However, there are huge numbers of different graphs, even of modest size, and there are also very many different parameters which are of interest. Some of these parameters may be very hard to compute. Technology can help a lot here, in storage of large numbers of graphs and their parameters, giving access to this information from anywhere in the world, and computing the frequently NP-hard (or worse) parameters, if they are not already available.

The aim of this project is to build a system which:

As well as system-building, the project will involve doing a survey of existing systems that do parts of the job required, as well as a literature review.

Many decisions will need to be made in the course of the project. Some of these will relate to the relative emphasis given to (a) data management, and using the data on existing graphs to answer user queries about those graphs, and (b) computation, so that queries about new graphs can be answered.

We anticipate giving more emphasis to (a). We also need to decide what graph parameters are to be stored with a graph in the database, and how sophisticated the front end will be (probably primitive, to begin with).

The platform development principles of the Message Lab will be a feature of this project: http://www.messagelab.monash.edu.au/

Prerequisites: good results in: mathematics subjects; Algorithms and Data Structures; Analysis and Design of Algorithms. Some knowledge of databases.

You must talk to Graham Farr before nominating this project.

[See: Allison & Farr project]

Organising better conferences (12-24pts)
Maria Garcia de la Banda and Mark Wallace
Project-id: WGB-conf

One of the most crucial phases when chairing a reviewed conference is to assign reviewers to the submitted papers. This is usually done by asking reviewers to indicate, for each paper, whether they want it, could cope with it, do not want it, and have a conflict of interests with its authors. With this information (and perhaps some extra info such as a maximum/minimum number of papers to be assigned to a given reviewer), most modern conference-support tools (such as Easy-Chair) perform an initial assignment. Initial because it is usually so bad that the Chair has to perform extensive modifications to it, often based on information that might not even have been taken into account by the tool, such as the specific research area of the reviewer, its quality, etc. Such modifications tend to be extremely painful and time consuming due, mainly, to the lack of support from the tool.

The aim of this project is twofold. The first is to apply optimisation techniques that yield a better result than those in current state of the art conference-support tools. In particular, the tool should allow the chair to choose among different kind of objective functions and visualise the resulting assignments. The second aim is to provide a graphical interface that somehow helps Chairs modify the initial assignment. Since we do not know yet what is the best optimisation function not how to support Chairs when modifying the tool, the project should be a lot of fun!

It would be extremely useful (though not absolutely necessary) for the student to have taken FIT3082 Programming Languages and Paradigms (and obtained a good mark).


Applying architectural drawing techniques to scientific visualisation (24-pts)
Wojtek Goscinski
Project-id: Goscinski-1

Volume rendering is a computer graphics technique that is used by scientists to visualise 3D image data. It is becoming an important tool due to the increasing availability of machines that produce 3D data, such as common CT machines, MRI machines, and also due to the introduction of new generation imaging facilities such as the upcoming Imaging and Medical Therapies Beamline at the Australian Synchrotron. Despite the increasing use of volume rendering, viewing a 3D volume on a 2-dimensional screen remains challenging – it is difficult to gauge shape and structure, and it is easy to misrepresent an object. This project proposed that lessons might be learned from other professions that have been representing 3D geometry in 2D for many years. For example, architects are trained to express a building on paper using a range of explicit and implicit techniques such as line weight, shadow, fading, detail and perspective. These techniques are applied carefully to make a drawing read in 3D whilst retaining the simplicity and scientific rigour of a 2D drawing. Few of these techniques are used effectively in scientific visualisation. This project will investigate how architectural drawing can help scientists view their data and develop tools to support this. The outcome will be tested using detailed synchrotron data sets. Student should have an interest in computer graphics, scientific visualisation, medical imaging, or design.


Multi-dimensional Graph Neuron (GN) Array for Wireless Sensor Networks (WSN) (12 or 24 points)
Asad Khan
Project-id:Khan-gn

GN is a highly-scalable associative memory algorithm1 capable of handling multiple streams of input, which are processed and matched with the historical data (available within the network) in real time. It is capable of utilising a large number of low performance processors in connected configurations. Hence GN is highly suited for resource constrained devices such as Berkeley motes. The project will involve generalising the existing GN implementation, which handles 1-dimensional sensory inputs, to a multi-dimensional version capable of analysing 3 or higher dimensional sensory inputs to a WSN. The 24 point project will also investigate whether the algorithm suffers from curse of dimensionality. This project is available to students with programming background in C/C++ or Java.
1. Nasution and Khan, A Hierarchical Graph Neuron Scheme

for Real-Time Pattern Recognition, http://ieeexplore.ieee.org/iel5/72/4359168/04359217.pdf?arnumber=4359217

Evolution of Complexity 12 or 24 point project
Kevin Korb and Alan Dorin
Project id: korb-1

This project will explore the possibilities for implementing creative evolution in silico, responding to the challenge by Bedau et al (2000) to reproduce "open-ended evolution". We will begin by examining previous attempts and Bedau's evolutionary activity statistics (Bedau et al 1998) for assessing them.

We will develop our own measures of the growth of complexity in organism structure and behaviour, and apply them to:

  1. the paleontological record,
  2. the results of existing Artificial Life simulations, and
  3. new simulations based on existing work by Yaeger (2006) and the supervisors.

Students are requested to do some preliminary reading (e.g. at least (Levy 1997) and (Bedau et al 2000)) prior to commencing the project. Enrollment in the Honours unit FIT4012 is strongly encouraged as it will provide background material of direct relevance to the topic.

References:

Learning Dynamic Bayesian Networks 12 or 24 point project
Kevin Korb and Ann Nicholson
Project id: korb-2

Static Bayesian networks (BNs) represent causal processes which may be repeated, but which do not change upon repetition. While simple in concept, they are nevertheless powerful representations for dealing with systems showing significant uncertainty about their outcomes, and they are widely employed in health care, ecological and environmental sciences, manufacturing industries, and government (see www.norsys.com/clients.htm). A great deal of effort has been put into automating the learning of Bayesian networks from data, including at Monash, yielding our CaMML program (Causal discovery via MML).

Dynamic Bayesian networks (DBNs) represent causal systems which change over time, yielding time series data from their observation. Examples include all kinds of economic markets, such as stock markets and currencies, weather and climate systems, and human disease processes. This project will investigate ways of extending CaMML to learn DBNs from observational data.

Evolution of Utility 12 or 24 point project
Kevin Korb and Steven Mascaro
Project id: korb-3

Evolutionary Artificial Life is a powerful simulation tool that has been applied to a wide range of practical and theoretical problems in science. Among those are the evolution of cooperation (Axelrod), complex social behavior (Epstein and Axtell), the evolution of altruism (Mascaro et al), the evolution of language (Kirby), and the evolution of cognition (Yeager).

In this project we propose a new target for simulated experimental evolution: utility. Utility is a key ingredient in rational decision making. Understanding how it evolves and, in particular, evolutionary tradeoffs between utility evolution and improvements in cognition are likely to shed light upon a large variety of issues in the evolution of cognition and behavior generally.

Some familiarity with artificial life and artificial intelligence (especially neural networks and Bayesian decision making) would be an advantage.

Parallel CaMML 12 or 24 point project
Kevin Korb, Ann Nicholson and Steven Mascaro
Project id: korb-4

CaMML (Causal discovery via Minimum Message Length) "data mines" observational data to find the causal Bayesian network most probable in light of the data. It has been successfully applied to a large range of problems. As those problems grow in the number of variables and the complexity of their interrelation, either the time required for discovery or the inaccuracy of the result increases exponentially. In order to increase the range of tractable problems we propose to redesign CaMML to take advantage of parallel supercomputing. As a part of the project we will investigate empirically the relationship between problem size, sample size, time and accuracy.

See also: Alison and Korb and Nicholson and Korb


Jon McCormack's Projects:

Jon McCormack's projects are in the area of computer graphics, human-computer interaction, adaptive intelligence and evolutionary computing. Current project offerings:

Please see Jon McCormack's honours project page for further details on his honours projects.
See also: Porter and McCormack

Human Comprehension of Organisation Charts (24-pts)
Kim Marriott
Project_id: Marriott-Charts

Organisation charts are one of the most common kind of diagrams. However, comparatively little is known about how people understand or use them. In this project eye-tracking equipment will be used to determine how the focus of attention moves around the elements in an organisation chart when it is first seen, and then used in subsequent tasks. Based on this the project will be to develop a model for focus of attention changes when viewing organisation charts and to test the model with eye tracking data. [The project can also be offered for other sorts of visual notations such as UML notations, advanced mathematics, etc]

Automatic Program Code Layout (24-pts)
Kim Marriott and Peter Moulder
Project_id: Marriott-Figure

Automatic document layout is increasingly important topic because of the need to adapt a document's layout to different viewing environments and to dynamically generated content. An important part of technical document layout is how to layout program code, i.e. where to break lines of code if the line is too long. Based on this the project will be to develop a function for measuring the quality of layout of the code and incorporate this into an automatic document layout tool.

Automatic Equation Layout (24-pts)
Kim Marriott and Peter Moulder
Project_id: Marriott-AEL

Automatic document layout is increasingly important topic because of the need to adapt a document's layout to different viewing environments and to dynamically generated content. An important part of technical document layout is how to layout equations and in particular where to break equations if the line is too long. The project will be to develop a function for measuring the quality of layout of the equation and incorporate this into an automatic document layout tool.

Layout of large SBGN diagrams
Kim Marriott and Michael Wybrow
Project_id: Marriott-SBGN

SBGN (Systems Biology Graphical Notation) is a new graphical standard for visualizing the processes in living organisms (http://sbgn.org/). Good automatic layout is required to support the use of SBGN in computer visualization applications. However currently there is no good method for laying out large SBGN diagrams. The project is to extend the automatic layout engine underlying the diagramming tool Dunnart (http://www.csse.monash.edu.au/~mwybrow/dunnart/) to support layout of large SBGN diagrams.

The project requires someone who is interested in algorithms and is not scared by mathematics.

Note that the 24 pt projects can potentially be modified into 12 pt projects (to be negotiated on a case by case basis).

Bernd Meyer's Projects:
BM1 Recruitment in Ant Colonies (biology)
BM2 Communication in Ant Colonies (biology)
BM3 Adaptiveness of Slime Molds (biology)
BM4 High Performance Simulation of Biological Systems on GPUs: Supercomputing on your Desktop
BM5 Swarm Robotics
BM6 Physarum Solver made useful
BM7 Self-organizing Stable Marriages
Please see Assoc. Prof Bernd Meyer's honours project page for details of his projects. There projects are 24 pts but can potentially be modified into 12 pts (to be negotiated on a case by case basis).

Bayesian Poker (12 or 24-pts)
Ann Nicholson, Kevin Korb and Steven Mascaro
Project-id: Nic-Korb-Poker

The Monash Bayesian poker project was started by Kevin Korb in 1993. Since then 5 honours students have worked on various aspects of the project. Our Bayesian poker player (BPP) was originally developed to play 5-card stud poker, using Bayesian network technology. Most recently, BPP has been converted to play Texas Hold'em Poker, the main online form of poker, and re-written in python. BPP has been entered in the inaugural world automated poker playing competition at the main American Artificial Intelligence conference AAAI-2006 and we plan to compete again in 2007. We have developed a simple GUI interface that will allow people to play against BPP via the internet. This project offers lots of options for making BPP a better poker player including improving its bluffing strategies and its opponent modelling. We have access to the AAAI-2006 tournament dgame logs, which provide a rich source of data for automated learning of opponent modelling. Or you might be interested in developing a much more interesting playing interface.

See http://www.csse.monash.edu.au/bai/poker/poker.html for links to Honours projects, online versions of research papers, and to play against the latest version of BPP online.

See also: Korb and Nicholson

GUI for Cortical Networks (24 points)
Andrew Paplinski
Project-id:Paplinski-GUI

Artificial Cortical Networks (ACN) also known as Multimodal Self-Organizing Networks are used to model mapping of the multidimensional sensory information e.g. speach onto two-dimensional areas as in human cortex. Recent applications of ACNs are presented in:

The modelling software is written in MATLAB and the first objective of the project is to add GUI in Simulink. Subsequently a student is expected to complete a specific modelling task using ACNs.


An organic modelling operation
Ben Porter and Jon McCormack
Project-id: Porter-Model

3D modelling is the process of constructing a virtual form for use in diverse fields such as film, computer games, computer aided engineering, architecture, physical simulation, or other interactive software. Many different packages and methods exist for modelling, one such package is Blender [1], a free and open-source modelling, animation and compositing tool.

It is often desirable to create organic forms, for example a giant CG kraken for a pirate movie. However, this is usually a very complex process, as 3d modelling tools are very general and have limited support for the kinds of patterns and shapes that can characterise organic quality. The aim of this project is to take a commonly occurring feature of organic shape and design and implement a tool to support the easy modelling of that feature.

One such recurring motif of many organic forms is covering symmetry. This occurs when a surface is literally covered in a repeated object, such as a tentacle being covered by suckers. A tool that supports covering symmetry would ideally take as input: an arbitrary surface S, an arbitrary shape C, and some parameters (such as covering density, arrangement style, etc.) and then would output the shape of C covering S.

The covering operation can be implemented in a multitude of ways and this project would investigate some of these in a very hands on and practical way. The desired outcome would be an easy to use tool integrated into a 3d modelling package (such as Blender), along with a set of case studies (3d models) resulting from the use of tool. As such, experience in (and enthusiasm for) computer graphics and programming is a requirement for this project.

For more information see this web page.


M-Banking: Evaluating User Adoption Intentions
Md Mahbubur Rahim
Project-id: Rahim-1
In today’s business climate, financial organisations are competing based on the promise of delivering improved services to retain and attract customers. One of the mechanisms that are employed by banks is the use of mobile banking. This project aims to investigate the factors that affect the decision of individuals to adopt mobile devices to conduct banking operations. Several theoretical frameworks drawn from the innovation, marketing, and e-commerce literature streams will be consulted to identify the factors.
Factors Affecting Use of Online Social Networking Services by Australian University Students
Md Mahbubur Rahim
Project-id: Rahim-2
This proposed project would analyse the crucial factors affecting university students’ use of online social networking sites (e.g. Facebook). The project would investigate the effect of the desire to: (i) connect with each other, (ii) earn money, and (iii) share files, as possible leads in university student’s motivations to use such sites. Data will be collected through a survey and the motivations and the factors which persuade student sue such sites will be identified. The findings will be analysed by referring to various theoretical frameworks.
Generation-Y: What do they think about Mobile Payment?
Md Mahbubur Rahim
Project-id: Rahim-3
Payment method has undergone a drastic change in line with technology and science development. Mobile payments are payment for goods, services, and bills/invoices with a mobile device like mobile phones and personal digital assistant, leveraging on wireless and other communication technologies. However, the acceptance and usage of mobile payment is relatively low albeit high penetration of mobile phone services in Australia. This project would investigate the Generation-Y’s perception and intention towards mobile payment using a modified Technology Acceptance Model (TAM). It will involve a survey among students for data collection.
Online Buying Behaviour of Australian Internet Savvy Students
Md Mahbubur Rahim
Project-id: Rahim-4
Internet shopping, also known as online shopping is a growing retail concept. With the growing number of Internet users in Australia, there are large amounts of brick-and-mortar companies that are slowly adapting to online business. One attractive online buyer segment is the Student Internet Users. This project would examine the factors that affect Australian university students’ online buying behaviour. The project will also look at the effect of various factors such as Information Technology Adeptness, WUFs and Privacy & Security on online purchase decision. The project would use a survey for data collection.
Evaluating Use of B2E Portals: A Test of Theory of Consumption Values
Md Mahbubur Rahim
Project-id: Rahim-5
Business-to-Employee (B2E) portals represent a new generation of e-business systems that has received a growing attention from large organisations. However, its adoption and subsequent use has not been thoroughly examined. This project aims to apply the notion of consumption values for explaining the degree of usage of B2E portals in large organisations. A combination of online survey and qualitative case studies will be an integral part of this project.
Understanding Motivations of Customers for Online Shopping in E-marketplaces
Md Mahbubur Rahim
Project-id: Rahim-6
There is a growing tend for customers to participate in online shopping from many consumer-oriented e-marketplaces. However, limited research attention has been paid to understand the motivations for consumers to participate in online shopping. This study among other factors would examine the role of enjoyment that encourages retail customers to participate in online shopping. A number of theories from business value and social psychology literature will be consulted. The research would involve semi-structured interviews and survey among student population.
Assessing the Success of Web-based EDI portals: A Comparative Study
Md Mahbubur Rahim
Project-id: Rahim-7
In recent years, web-enabled EDI systems have gained popularity and form the backbone of business-to-business e-commerce initiatives within supply chain partners. This is particularly true for the Australian automotive industry in which companies like Ford Auto is using web-enabled EDI systems with its small supplier community. This project aims to analyse the perceptions of the suppliers about the success of such systems. It involves use of case studies and a pilot survey to measure how successful is the implementation of web-based EDI systems. Various theoretical frameworks drawn from the IT and eBusiness literature streams are applied to operationalise the concept of success. The variations in the extent of success are investigated and some guidelines will be suggested.
Evaluating Citizen’s Satisfaction with Online Tax Systems in Australia
Md Mahbubur Rahim
Project-id: Rahim-8
E-government is increasingly becoming more important in today’s world due to its effectiveness and applicability in various areas. Online tax filing is one of the e-government services that is becoming ever more important for public to perform their responsibility to the country through tax payment. Despite the growing publicity, the adoption of these systems is less than satisfactory in Australia. This study aims to address four research concerns: a) what are the constructs to measure citizens’ satisfaction with online tax systems, b) how to measure citizen’s level of satisfaction with such systems, c) is the satisfaction related to demographic factors? and d) is satisfaction with online tax systems help increase government image? Data will be collected from a survey. The project also involves a satisfaction model development by reviewing relevant literature streams. The findings will serve as useful guidelines for strategy development in promoting e-government services, particularly tax e-filing service.
A Study of E-government Services: Australian Local Government Context
Md Mahbubur Rahim
Project-id: Rahim-9
Australian government is making considerable investment to promote online services to citizens through various forms of e-government services. However, little attention has been paid at the local government context. Hence, this study seeks to determine the level of online services offered by local councils via their websites to citizens, suppliers and other stakeholders. Drawing upon a range of theoretical frameworks, a model will be developed to determine how stakeholders use such services and how satisfied they are with such services. The project would involve a mixture of qualitative interviews with various stakeholders, visiting council websites, and distributing of survey questionnaire among selected stakeholders. The findings will serve as useful for promoting e-government services at the local government level.

Feature Weighting in Content Based Image Retrieval (24 or 12 pts)
Sid Ray
Project-id: Ray-feature

The purpose of a Content-Based Image Retrieval (CBIR) system is to retrieve images from a database such that the visual contents of retrieved images are similar to that of a query image. Often the image similarities are computed from the representation of visual contents such as colour, texture and shape, in terms of a multi-dimensional feature vector. Understandably, the features cannot be assumed to have equal weights. More important features deserve more weights and less important features deserve less weights. These weights can be assigned fully automatically from feature variation statistics in the database images. Better still is to use the so called relevance feedback (RF) mechanism in which the weights are modified iteratively based on users' response regarding the relevance of the retrieved images. The aim of this project is to investigate the feature weighting problem for both cases, namely, without RF and with RF.

Small Sample Size Effects on CBIR Accuracy (24 or 12 pts)
Sid Ray
Project-id: Ray-sample

In recent years Content-Based Image Retrieval (CBIR) has become one of the most active areas of research in image data mining. In studies involving the design of a CBIR system often the question arises whether the data set sizes used are big enough to rely on the accuracy achieved. The aim of this second project in CBIR is to study the impact, on accuracy, of the number of samples in the database, the number of samples per semantic category, the number of retrieved images returned to user for RF, the number of features used, and their relative sizes.

Hierarchical Feature Selection for Pattern Recognition (24 or 12 pts)
Sid Ray
Project-id: Ray-hierarchy

In statistical pattern recognition, often patterns are represented by a large number of numerical features. Although there is no conceptual justification in reducing the number of features to a small number, in practical problem solving, this becomes a necessary step due to the wellknown phenomenon of the 'curse of dimensionality' of the feature vector on the complexity of the pattern classifier. The aim of the proposed project is to develop a hierarhical feature selection paradigm that is well-suited to multi-class pattern recognition problems.

The project will comprise (i) study of some existing feature selection criteria followed by an experimental investigation of them, (ii) analysis of the above results leading to, hopefully, a hierarchical feature selection paradigm, and (iii) development of an interactive software tool for the above paradigm.

The methodology developed will be such that depending on the classification accuracy arrived at a certain stage, the user will have the option of increasing or decreasing the dimensionality value. The software tool will include procedures for displaying the distribution of pattern samples in different feature spaces obtained by different feature selection methods.

Texture Analysis of Images (24 pts)
Sid Ray
Project-id: Ray-texture

Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of segmentation and classification of images. The proposed project will deal with both of these problems. It will involve (i) investigating existing texture analysis methods from the point of view of their theoretical soundness as textural measures and (ii) investigating their practical applicability.

Individual dolphin dorsal fin identification (24 or 12 pts)
Sid Ray, David Dowe and Kate Charlton (School of Biological Sciences)
Project-id: Ray-fin

Port Phillip Bay, Victoria, is home to a small and genetically unique population of bottlenose dolphins, Tursiops sp. Individuals within this population have been identified via photo-identification of the dolphins' dorsal fin. Dorsal fins show a unique shape and through time develop permanent marks and notches. Currently, the digital images are being assessed for these permanent marks and assigned an identity by human 'eye'. This can be incredibly time consuming, requires permanent notches on the fin and a trained eye to pick-up small notches. The aim of the project is to create a methodology/software that not only uses the notches but shape of the fin to assign identity with high levels of accuracy.

Some references:

Development of collection-specific texture features for image retrieval (12 or 24-pts).
David Squire
Project-id: Squire

Independent Component Analysis (ICA) is a statistical technique for discovering hidden factors underlying a set of random variables. ICA can be used to discover filters that can be used to characterize visual textures in a set of images. The filters produced by ICA are adapted to the statistics of the set of images used to derive them. It should thus be possible to use ICA to derive a set of filters specifically tuned to the textures in a collection of images from a restricted domain. We will investigate the discovery and selection of a set of ICA filters for image collections from domains such as dermatology.

The system will be developed within the framework of the GNU Image Finding Tool (GIFT) (http://www.gnu.org/software/gift/). The GIFT is an open framework for content-based image retrieval. Use of the GIFT framework will means that researchers working on a project can focus on the problems of immediate interest and importance to the project, rather than having to develop and entire image retrieval system, including user interface, feature extractors, indexing tools, database accessor, user interface, and feedback/learning system from scratch.


Efficient nearest-Neighbour (NN) Searching in Higher Dimensional Spaces by using Minimal Spanning Trees(12 or 24-pts)
Peter Tischer
Project-id: Tischer-1

In one dimension, when we want to find the closest matching item to a given item it is natural to use binary search. Once we go beyond one dimension there is no known optimal solution. In many situations such as in case-based reasoning and in coding audio and video, there is a need to search a large collection of higher-dimensional vectors to find the one which is closest to the query item. This project will investigate the use of Minimal Spanning Trees to find a starting item for the search and to control how the search proceeds.

Mixture Modeling used in Analysis of DNA Microarray Data (24-pts)
Peter Tischer
Project-id: Tischer-2

Mixture modeling, also known as clustering, involves taking a set of observations about things and deciding whether the things all belong to one population or whether the things come from a number of different sub-populations. DNA microarray data is a 2D array where the rows or columns may represent either a particular gene or represent the subject of an experiment. The entries in the microarray represent the activity of that gene for that subject. For instance, the entries in the first row might represent gene activities for the first subject while the entries in the fourth column might represent gene activities for the fourth gene. With bi-clustering we are interested in finding groups of genes and groups of people for what the group of genes behaves similarly. If we could re-order the rows and columns of a DNA micro-array, a bi-cluster would appear as a region of cells that show the same gene activity.

The MML (Minimal Message Length) criterion was proposed by Professor Chris Wallace foundation professor of Computer Science at Monash University. MML has been used to create a clustering program called 'snob'. 'snob' has proven to be very effective in identifying the correct number of clusters but it is not used widely in bioinformatics where people continue to use a variety of techniques that are generally not able to identify correctly the number of clusters.

The aim of this project is to apply the MML approach and snob to the mixture modelling problems that arise in the analysis of DNA microarray data. It would be of particular interest to extend the snob approach to bi-clustering.

Measuring the Complexity of Networks by Encoding them (12 or 24-pts)
Peter Tischer
Project-id: Tischer-3

In applications where we are trying to trade-off the complexity of a network against the ability of the network to model what occurs in a given situation, we need to come up with a measure of the complexity of the network. One way of measuring the complexity is to ask how many bits would be needed if we were to transmit a message that would enable the receiver to reconstruct the network in its entirety. Such a message is a lossless compression of the network. This project aims to explore how to apply standard data compression techniques to the problem of encoding networks.

Highly Effective One-Pass Lossless Compression of Greyscale Images (12 or 24-pts)
Peter Tischer
Project-id: Tischer-4

A lossless compression of a grey-scale image allows the image to be stored in a minimal number of bits and for the image to be reconstructed that is bit-for-bit identical with the original image. A number of years ago a postgraduate student in this school developed the world's second best lossless compression of grey-scale images. (This same person had also developed the world's best lossless image compressor). This code, called glicbawls, was designed for the International Obfuscated C Programming competition. The fundamental approach in glicbawls is one-pass but the source code of the program itself is impossible to understand and very much tuned to the coding of natural images.

The aim of the project is to produce a version of glicbawls which is easy to understand and to modify and to explore how glicbawls can be tuned to perform well on particular classes of images, particularly medical images.

Semi-supervised learning and image segmentation (12 or 24-pts)
Peter Tischer
Project-id: Tischer-5

In the semi-supervised learning approach to image segmentation, the user labels certain pixels as belonging to particular segments and the program then decides how to label the remaining pixels based on their spatial proximity and the similarity of their pixel values to labelled pixels. A difficulty arises in trying to assign pixels to particular segments if there is more than one reasonable candidate segment. This project will investigate a probabilistic approach to propagating segment labels from user-labelled points to unlabelled points.

Applying 'snob' to image segmentation (12 or 24-pts)
Peter Tischer
Project-id: Tischer-6

'snob' is a program that was developed by two people in the Department of Computer Science at Monash University. 'snob' aims to make class distinctions. That is, it looks at a number of observations about things and decides whether those things belong to one, two or a number of classes. A particular strength of 'snob' is that it is able to come up with a reasonable estimate of the number of classes that are actually present in the data. For instance, 'snob' when presented with data randomly generated by a particular distribution might decide that there is really only one class present.

The fundamental problem in image processing is to decide which pixels belong together because they represent parts of the same object. 'snob' is good at putting things into classes but it fails with image segmentation because 'snob' assumes that things to be classified are statistically independent whereas in image segmentation the values of pixels are highly statistically dependent on the values of neighbouring pixels.

The aim of this project is to extract certain key information from an image and to present that information to 'snob' which will decide how many classes are present. This information, along with the key information, will then be used by an image segmentation algorithm to produce a segmentation with the same number of segments as the number of classes that 'snob' judged were present in the key information.

Integrating Information Visualisation and Clustering (12 or 24-pts)
Peter Tischer
Project-id: Tischer-7

Clustering is the process of putting objects into the same cluster or class as other objects which share the same properties. The 'snob' program is probably the world's best program at discovering an appropriate clustering of a given body of observations about objects. There are two ways in which it might be able to make snob easier to use, more efficient and more effective. The default action is for 'snob' to try to assign objects to classes without an initial guess as to the best clustering. In some cases a user might be able to look at a visualisation of the data and decide on an appropriate initial clustering and then let 'snob' try to optimise the user-provided initial clustering. This may result in significantly reduced run-times and might help 'snob' find clusterings it might miss in its search process.

Having produced a clustering, it would be convenient if a user could view a representation of the data with the clustering superimposed to see whether the clustering seems appropriate. In providing an initial clustering or assessing the snob-provided clustering, there is a need to look at data which may exist in many more than the two or three spatial dimensions than we can display on conventional graphics devices. This process of taking higher-dimensional data and converting it to a form which can be displayed on conventional graphics devices is known as information visualisation.

The aim of the project is to integrate 'snob' with an information visualisation package so that a user can view the data via the information visualisation package.


Efficient Local Search
Mark Wallace
Project-id: Wallace-1

Local search is the most scalable approach for solving constrained optimisation problems, but each problem requires specific move operators to be coded. In Comet [VHM05] an approach based on "invariants" has been implemented for compiling a problem statement into a local search algorithm. Building on invariants, this project will design and implement a local search compiler which efficiently handles logical conditions. The approach will be tested on practical optimisation problem currently being tackled by the CTI-Monash Centre for Optimisation in travel, transport and logistics.
[VHM05] Constraint-Based Local Search, by Pascal Van Hentenryck and Laurent Michel. MIT Press, 2005

Travelling Salesperson as a Constraint
Mark Wallace
Project-id: Wallace-2

To solve complex problems Constraint Programming often uses subproblems as "global" constraints. A global constraint is an approximations of the subproblem, which can be solved quickly – a global constraint may be invoked many times while solving the original problem. This project will design a global constraint for the travelling salesperson problem (TSP) who must vist a set of cities and return to base using the shortest possible route. Our approximation to the TSP uses nodes representing clusters of "cities" – where the TSP for each cluster has already been approximated. The global constraint will be implemented in the ECLiPSe constraint programming system [AW07] The practical usefulness of the new TSP global constraint will be tested on a number of practical problems currently being tackled by the CTI-Monash Centre for Optimisation in travel, transport and logistics.
[AW07] Apt, K. R. and Wallace, M. Constraint Logic Programming Using Eclipse. Cambridge University Press, 2007.


Probabilistic classification learning (24-pts)
Geoff Webb
Project-id: Webb-1

Averaged One Dependence Estimators (AODE) is one of the more popular new machine learning algorithms proposed in the past five years. This project will examine extensions to AODE, either to improve its accuracy by extending it to higher-order estimators, or to increase the range of tasks to which it can be applied, by addressing its computational limitations when applied to high-dimensional data. This is a well-defined project with the potential for high-impact outcomes which will provide a motivated student with scope to take initiative and develop advanced research skills.

Bioinformatics (24-pts)
Geoff Webb
Project-id: Webb-2

Computational techniques are becoming increasingly central to advanced biological research. In this project the student will join a team of computer scientists and biologists working together to address significant questions about how one of the fundamental building blocks of life, proteins, function. There is no requirement for a background in biology.

Last update: Wednesday, January 20, 2010 6:03 PM