Honours projects for the Clayton School of IT, 2010
Note that some BSE projects are 12-pts, and that BCS/BBIS projects are 24-pts. Some projects come in 24- and 12-pt versions; the latter involves less work.
Imagine when you next visit a museum you are handed a device that not only tells you about the current exhibit that you are viewing but also makes recommendations about other exhibits that it believes you may be interested in. For such a device to be useful it will need to interact with sensors that are keeping track of your movements, and it will need to make predictions about your trajectory and interests. There are are several types of sensors, and it is too expensive to try each of them out in a real museum environment.
This project is a continuation of an existing project, where we propose to simulate the effect of different types of sensors on modelling visitors behaviour in a museum.
You must be a competent programmer and familiar with MATLAB (or at least familiarize yourself with MATLAB prior to the start of the project).
Bayesian Networks are graphical models that are used for probabilistic reasoning and to represent knowledge involving uncertainties. In this project it is proposed to investigate their use in the cryptanalysis of simple ciphers. The project will initially investigate the cryptanalysis of monoalphabet ciphers of English plain text, using Bayesian Networks to represent knowledge of the cipher and the English language. After this initial investigation other simple ciphers will be consider, such as horizontal and vertical shift ciphers and rotor ciphers.
Strict Minimum Message Length (SMML) inference is an information-theoretic criterion for machine learning. It was introduced by Wallace and Boulton and has several desirable statistical properties. The SMML code-book problem is to create an optimal code-book that gives the greatest compression of future data. There is a quadratic-time algorithm for data drawn from a binomial distribution, and a good heuristic for the trinomial distribution, but the code-book problem is NP-hard (very probably requires exponential time) in general [FW02]. The project is to devise fast approximation-algorithms, that is algorithms that run quickly and create good, but not necessarily optimal, code books. The first cases to consider will be essentially unbounded one-dimensional in character such as the geometric and Poisson distributions. Optimisations may be possible if the prior on the parameter(s) is "nice". The Gaussian (normal) and related distributions are important for continuous data. The project involves and relates to algorithm analysis, complexity, data compression, and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or GF before submitting "the form".
[FW02] G. E. Farr and C. S. Wallace.
The complexity of strict minimum message length inference.
Computer Journal, 45(3), pp.285-292, 2002 [doi:10.1093/comjnl/45.3.285]
Given K DNA sequences, or protein sequences, the multiple-alignment problem and the evolutionary-tree problem are "chicken and egg" problems: Given an optimal evolutionary-tree one can search for an optimal multiple-alignment of the K sequences (and we know how to do that [LW94]), and given an optimal multiple-alignment of the K sequences one can search for an optimal evolutionary-tree.
An evolutionary-tree is a special case of a Bayesian net [KN04]: a tree rather than a general DAG, with discrete variables (over {A,C,G,T} if DNA) at each node, known values at all leaves, and missing values at all internal (ancestral) nodes, e.g., see fig.4 [LW94].
The project is to modify the CaMML algorithm for Bayesian nets to infer evolutionary trees given a multiple alignment. If that is successful, the feasibility of simultaneously solving the multiple-alignment and evolutionary-tree problems will be investigated.
The project involves algorithms, probability and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or KK before submitting "the form".
[LW94] L. Allison and C. S. Wallace.
The Posterior Probability Distribution of Alignments and
its Application to Parameter Estimation of Evolutionary Trees and
to Optimisation of Multiple Alignments.
Jrnl. Molec. Evol., 39(4), pp.418-430, 1994
http://www.csse.monash.edu.au/~lloyd/tildeStrings/Multiple/94.JME/
or
http://dx.doi.org/10.1007/BF00160274
[KN04] K. B. Korb and A. E. Nicholson.
Bayesian Artificial Intelligence.
Chapman and Hall / CRC, 2004.
Vision is indispensable. Building automated machines with human vision capabilities is a formidable challenge in both industry and academia. Much of human learning can be viewed as an unsupervised incremental learning process. Investigating this gives a significant contribution to machine learning. In this project our main focus is to investigate how humans accumulate knowledge incrementally by a series of objects, induce a hierarchy of concepts that summarize and organize such observations and use them in classifying future experiences.
Pre-labour fetal health is vital for a successful birth. Despite the maternal involvement in assuring fetal health, there are the risky deliveries where intervention is mandatory. The only external measurement that relates directly to fetal health is fetal heart rate patterns. Analyses of this property can be used to determine ominous instances. However, these pattens are largely misinterpreted by human observers leading to unnecessary surgical procedures. This project presents a cognitive approach to fetal heart rate interpretation. Artificial knowledge acquisition means are revamped with emphasis on understanding rather than enforcing knowledge.
Bioinformatics research involves retrieval and analysis of a large amount of biological data such as gene expressions, DNA, RNA, and proteins. Biological data is characterised by its high volume, high dimensionality and also complexity -in terms of structure, evolving nature and biological processes it involves in. Machine learning algorithms have been widely used for solving problems in bioinformatics such as sequence alignment, gene finding and protein domain analysis. This research is an attempt to develop a connectionist system that identifies dimensional changes in biological data.
Business intelligence (BI) systems have been rated by the Gartner Global CIO Survey as the top technology priority for organizations for the last four years. There has been very little research conducted on the sourcing of BI systems and the data warehouses (DW) that feed them with data. Articles in industry magazines suggest that many systems are developed inhouse, many are outsourced, and some have relatively complex arrangements. This project builds on preliminary research on BI and DW sourcing:
Arnott, D. (2008). Should large-scale decision support systems be outsourced? An empirical study. In Proceedings of the 13th Asia-Pacific Decision Sciences Conference (APDSI 2008). Brisbane, Australia: Griffith University.
This project will investigate how BI and DW are sourced, whether they use different sourcing strategies to enterprise systems, and whether they themselves need different strategies. The foundation theories for the project will be transaction cost economics and the resource based view of the firm. The project could be undertaken by multiple students with each student using a different method, that is, one researcher could use a survey method and another a case study approach. The project is suitable for BBIS and BITS students who have an information systems orientation. Studies in business and management would be helpful but are not required.
IT governance is an area of considerable academic and industry Business intelligence (BI) systems have been rated by the Gartner Global CIO Survey as the top technology priority for organizations for the last four years. There has been very little research conducted on the governance of BI systems and the data warehouses (DW). This project builds on preliminary research on BI and DW governance:
Arnott, D. (2006). Data warehouse and business intelligence governance: An empirical study. In F. Adam, P. Brezillon, S. Carlsson, & P. Humphreys (Eds.). Creativity and innovation in decision making and decision support (pp. 711-730). London: Ludic Publishing.
This project will investigate how BI and DW are governed, whether they use different governance strategies to enterprise systems, and whether they themselves need different strategies. The foundation theories for the project will be the matrix theory of IT governance and the contingency theory of governance. The project could be undertaken by multiple students, with each student using a different method, that is, one researcher could use a survey method and another a case study approach.
The project is suitable for BBIS and BITS students who have an information systems orientation. Studies in business and management would be helpful but are not required.
Biometric information can serve not only for access or authetication, but also for data protection. It is possible to generate a key from the biometric information that can be used with some cryptographic algorithm to encipher and decipher data. In this project we shall study methods of generating key from finger prints or iris codes, addressing the fundamental problems of biometric cryptography of key change and distortion tolerance. Given the biometrics, a set of reliable features are extracted. In this project we shall design a system in which we do not need to store a biometric template, but only a string of error-correction data from which the biometric cannot be derived, and from which the key cannot be derived either unless the biometric is present.
The name Slime Mould doesn't give an organism a very appealing ring to it... but these "creatures" are really fascinating! There are two types of slime moulds: plasmodial slime moulds and cellular slime moulds. This project is interested in the latter. The trait that makes these slimes so amazing is their diverse range of behaviours during a complex life cycle. They have several "phases" within a motile (moving) period that give the mould the appearance of an animal, and an immotile (static) period that gives them the appearance of a plant or fungus.
Individual cells of the mould start their lives scattered around an area. At a chemical signal triggered by starvation, they all coagulate into a single blob. This then becomes a slug called a plasmodium that may be a few inches long. This slug then wanders around seeking nutrients. When it finds a good spot, the slug turns into a reproductive structure called a fruiting body. This is a fungus-like structure with a stalk and a ball on top that produces spores. These spores are then scattered... and the life cycle starts over again.
The purpose of this project is to build an agent-based simulation of the slime mould's life cycle (to some level of abstraction) and to visualise it using computer graphics techniques. The look of the model is important! The aim is to produce an attractive but true-to-life representation of the organism's appearance and behaviour. A movie of the slime mould plasmodium is online: http://www.youtube.com/watch?v=96U-6iU8W_A Some decent illustrations of the various phases of the life cycle are online too: http://universe-review.ca/R10-18-slimemoulds.htm
See also Korb and Dorin.
Ratings are used in chess, tennis, golf, other sports, etc. in order
to rank both teams and individual players. A variety of systems are
used to do the ratings and rankings - including Elo, Glicko, Sonas and
systems more concerned with prize money (over the past 12 months).
We can re-visit and improve these systems using the Minimum Message
Length (MML) principle (Wallace & Boulton, 1968)(Wallace & Freeman,
1987)(Wallace & Dowe, 1999a, which has been the Computer Journal's
most
downloaded article)(Wallace, posthumous, 2005)(Comley and Dowe, 2005)
to arrive at a comparatively simple model with an improved and
near-optimal predictive accuracy on future games and contests.
This project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.
ComleyDowe2005 Comley, Joshua W. and D.L. Dowe (2005). Minimum
Message Length, MDL and Generalised Bayesian Networks with Asymmetric
Languages, Chapter 11 (pp265-294) in "Advances in Minimum Description
Length: Theory and Applications", M.I.T. Press, April 2005.
[Final camera-ready copy submitted Oct. 2003.]
DoweGardnerOppy2007 Dowe, D. L., S. Gardner and G. Oppy (2007).
Bayes not bust! Why simplicity is no problem for Bayesians. British
J. Philos. Sci., pp709-754.
Wallace2005
WallaceBoulton1968
WallaceDowe1999a
WallaceFreeman1987
The existence of the Higgs boson is central to the structure of the standard
model of elementary particles in Physics. The current search for the Higgs
boson involves experiments at the Fermilab Tevatron and at the CERN Large
Hadron Collider (LHC).
Simulation software is available for the outputs of the Tevatron and the
LHC
for the standard particle model and for various other competing models.
This
enables us to statistically analyze the signals of the presence of a Higgs
boson in the data, in advance.
The main statistical approach considered would be the Minimum Message
Length (MML) principle, an approach from Bayesian information theory which
quantitatively trades off the simplicity off a hypothesis against its
goodness of fit to the observed data. Our search problem is partly a problem
in mixture modelling (or clustering), where we wish to identify whether
or not
there is sufficient evidence that there is a component of the observed
data
which is best explained by the presence of the Higgs boson.
The student should have a reasonably strong background in mathematics
(partial
derivatives, integration, matrix determinants, etc.) and advanced quantum
physics.
References:
D. L. Dowe, S. Gardner and G.R. Oppy (2007) "Bayes not Bust! Why Simplicity
is no Problem for Bayesians", British Journal for the Philosophy of
Science
(BJPS), Dec. 2007, pp709-754.
Dowe, D. L. (2008a), "Foreword re C. S. Wallace", Computer J.,
Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial
special issue], pp523-560.
The optimal reconstruction of images from traditional computer
tomography (CT) projections in the presence of real noise is a mature
problem with several practical solutions. This project proposes use
of a combination of symmetric (Finite Radon Transform) and asymmetric
(Mojette Transform) discrete projection techniques to reconstruct
images of test data and real images, and uses Minimum Message Length
techniques to optimise the image reconstruction for a variety of assumed
noise models. It also aims to use MML-related techniques to guide the
selection of the best views to capture the minimal amount of
sufficient real data required to reconstruct variable resolution images.
References:
Comley, J. W. and D. L. Dowe (2005). Minimum Message Length and
Generalized Bayesian Networks with Asymmetric Languages, Chapter 11
(pp265-294) in "Advances in Minimum Description Length: Theory and
Applications", M.I.T. Press, April 2005.
[Final camera-ready copy submitted Oct. 2003.]
D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why
Simplicity is no Problem for Bayesians", Brit. Journal Philos. Sci.
(BJPS), Dec. 2007, pp709-754.
Guedon, JP and Normand, N., The mojette transform: the first ten years,
LNCS 3429 (2005) 79-91
Matus, F. and Flusser, J., Image representation via a finite Radon
transform, IEEE PAMI 15(10) (1993) 996-1006
Wallace2005
Support Vector Machines (SVMs) are a popular approach to classification
in machine learning and "data mining". They are usually only
used to
divide between two classes ("yes"/"no" or "positive"/"negative")
and
nor are they typically able to give probabilities with their
predictions. They also have some arbitrariness in the choice of "kernel"
functions for specifying non-linear boundaries. Using Minimum Message
Length (MML) approaches such as those in Tan & Dowe (2004), other notions
intimated in Dowe (2007) and some previously overlooked coding
inefficiencies, we will be able to overcome all these shortcomings.
This will enable us to come up with comparatively simple SVMs which give
excellent (probabilistic) predictions on multi-class problems, possibly
using non-linear cuts.
The mathematics in this project will not be trivial.
References:
ComleyDowe2005
Dowe, D. L. (2008a), "Foreword re C. S. Wallace", Computer J.,
Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial
special issue], pp523-560.
D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why
Simplicity
is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec.
2007,
pp709-754.
P. J. Tan and D. L. Dowe (2004). MML Inference of Oblique Decision Trees,
Proc. 17th Australian Joint Conf. on Artificial Intelligence (AI'04),
Dec. 2004, Lecture Notes in Artificial Intelligence (LNAI) 3339, Springer,
pp1082-1088.
Wallace2005
In games such as "rock, paper, scissors", we typically first
seek
to maximise our returns against an opponent's best strategy, and
then we seek to exploit any systematic sub-optimalities in our
opponents' play. There is a balance between trying simultaneously
to play an optimal strategy and learn a model for the opponents'
strategies [while also being wary of the traps of elusive models
(Dowe 2008a, footnote 211)].
The project will start with just one other player and then build
up to games with several other players and games where the number
of other players is not known. We shall eventually seek to infer
the total number of players and their strategies. This has
relevance to financial markets. The ability of MML to infer when
the amount of data per parameter is very limited (e.g., Dowe &
Wallace (1997)) will be central here.
The mathematics in this project will not be trivial. An interest
in game theory is essential, a knowledge of game theory is desirable.
References:
ComleyDowe2005
Dowe, D. L. (2008a), "Foreword re C. S. Wallace", Computer J.,
Vol. 51 No. 5 [Christopher Stewart WALLACE (1933-2004) memorial
special issue], pp523-560
D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why
Simplicity
is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec.
2007,
pp709-754.
Dowe, D. L. and C. S. Wallace (1997).
Wallace2005
WallaceDowe1999a
Many real-world systems (aerodynamic, biological, economic, financial,
meteorological, etc.) are modelled by systems of one or more
differential equations - such as Bernoulli, Navier-Stokes, etc.
Such systems are often well-known, but even then they are typically
inexact. One such example is turbulence and the difficulty in
properly accounting for it.
This project will start off simply with data from which we will infer
a differential equation with only one unknown. We will then introduce
a noise term which will enable us to model data which we can not fit
exactly - due to any number of effects, ranging from quantum mechanical
to measurement inaccuracies to the fact that the system is simply too
complicated for our model.
Our modelling will include Minimum Message Length (MML) - variously
because of a variety of optimality properties and its ability to use
simple models that fit more accurately than the more complex models
of other approaches.
The mathematics in this project will not be trivial.
References:
Comley, J. W. and D. L. Dowe (2005). Minimum Message Length and
Generalized Bayesian Networks with Asymmetric Languages, Chapter 11
(pp265-294) in "Advances in Minimum Description Length: Theory and
Applications", M.I.T. Press, April 2005.
[Final camera-ready copy submitted Oct. 2003.]
D. L. Dowe, S. Gardner and G. R. Oppy (2007) "Bayes not Bust! Why
Simplicity
is no Problem for Bayesians", Brit. Journal Philos. Sci. (BJPS), Dec.
2007,
pp709-754.
Wallace2005
WallaceDowe1999a
Languages spoken by humans change over time. For example, French, Italian
and
many European languages originate from Latin, but each diverges from Latin
in
rather different ways. English is a more complex case, with influences from
Latin and French overlaying its basic Germanic nature. Representing language
by the sounds of spoken words, we use Minimum Message Length (MML) to model
how languages might have developed. Our aim is to reach a model which best
approximates the known history of a language group by running simulations
which assign different probabilities to different types of change, e.g.
internal variation, development of regularity by analogy with existing forms,
and external influences (borrowing). This project will be quite mathematical.
References:
Cangelosi, Angelo and Domenico Parisi (eds.) (2002). Simulating the Evolution
of Language. Berlin: Springer-Verlag.
Dowe, Gardner & Oppy (2007)
Kosmidis, Kosmas, John M. Halley, and Panos Argyrakis (2005). Language
evolution and population dynamics in a system of two interacting species.
Physica A 353: 595-612.
Nowak, Martin A., Natalia Komarova, and Partha Niyogi (2002). Computational
and evolutionary aspects of language. Nature 417: 611-617
Ooi, J.N. and D. L. Dowe (2005). Inferring Phylogenetic Graphs of
Natural Languages using Minimum Message Length, Proc. CAEPIA
2005 (11th Conf. Spanish Assoc. for Artificial Intelligence), Vol. 1,
pp I:143 - I:152, Nov. 2005; ISBN 84-96474-13-5
Surveys of this work are given in parts of Wallace (2005) and Dowe (2008a).
This project will involve extending the program in one of more directions.
Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol.
51 No. 5
[Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560
Edwards & Dowe (1998)
Visser & Dowe (2007)
Visser, Dowe & Uotila (2009)
Wallace (1990)
Wallace (1998)
Wallace & Boulton (1968)
Wallace & Dowe (1994)
Wallace & Dowe (2000)
MML time series and Bayesian nets with discrete and continuous attributes
David Dowe
Project-id: dld-9
The first application of MML to Bayesian nets including both discrete
and
continuous-valued attributes was in Comley & Dowe (2003), refined in
Comley
& Dowe (2005) [whose final camera-ready version was submitted in Oct
2003],
based on an idea in Dowe & Wallace (1998). We seek to enhance this
original
work to Bayesian nets which can change with time, using the mathematics
of
MML time series in Fitzgibbon, Dowe et al. (2004).
Comley, J. & D.L. Dowe (2003). General Bayesian Networks and Asymmetric
Languages, Proc. 2nd Hawaii International Conf' on Statistics and Related
Fields, 5-8 June, 2003.
Comley, J. & D.L. Dowe (2005). Minimum Message Length, MDL and Generalised
Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in
P.
Grunwald et al. (eds.), Advances in Minimum Description Length: Theory
and
Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9
Dowe, D. L. (2008a), ``Foreword re C. S. Wallace'', Computer J., Vol.
51 No. 5
[Christopher Stewart WALLACE (1933-2004) memorial special issue], pp523-560
D L Dowe & C S Wallace (1998). Kolmogorov complexity, minimum message
length
and inverse learning, abstract, page 144, 14th Australian Statistical Conf'
(ASC-14), Qld, 6 - 10 July 1998.
Fitzgibbon, L.J., D. L. Dowe & F. Vahid (2004). Minimum Message Length
Autoregressive Model Order Selection. In M. Palanaswami et al. (eds.),
International Conf' on Intelligent Sensing and Information Processing (ICISIP),
Chennai, India, Jan 2004, pp439-444
Evolving & quantifying varieties of intelligence (24 pts)
"Intelligence" is a term used variably to describe a variety
of
human aptitudes (memory, mathematical ability, ability to make inductive
inferences), terrestrial non-human aptitudes and (www.SETI.org)
something sought in extra-terrestrials. Humans share much with other
humans and also with terrestrial non-humans, but how much must humans
share with extra-terrestrials or one extra-terrestrial with another?
Dowe & Hajek (1997, 1998) discuss the relevance of the
information-theoretic Minimum Message Length (MML) principle (Wallace
& Boulton, 1968; Wallace & Freeman, 1987; Wallace, 2005) to the
inductive inference part of intelligence, and Hernandez-Orallo &
Minaya-Collado (1998) discuss the relevance of Kolmogorov complexity to
intelligence - and MML is intimately related to Kolmogorov complexity
(Wallace and Dowe, 1999a).
This project concerns the use of information theory, MML and
Kolmogorov complexity to recognise signs of intelligence and ways of
communicating with it. This will inevitably entail measuring - or
attempting to measure - intelligence. Another direction in which
the project might head is to study how intelligence of a system increases
(or possibly doesn't increase) when there is communication involving more
than one intelligent entity (Dowe, 2008a).
This mathematical project might also involve the study of
communications between terrestrial non-humans.
D. L. Dowe (2008a), ``Foreword re C. S. Wallace'', Computer Journal
http://comjnl.oxfordjournals.org/content/vol51/issue5
Researchers in graph theory often want to determine various important properties of specific graphs of modest size: chromatic number, size of maximum clique or independent set, shortest/longest cycle, diameter, girth, genus, automorphism group, etc (most of which you will not have met yet). Such information can be used to formulate or disprove hypotheses, and is very valuable to workers in the field. However, there are huge numbers of different graphs, even of modest size, and there are also very many different parameters which are of interest. Some of these parameters may be very hard to compute. Technology can help a lot here, in storage of large numbers of graphs and their parameters, giving access to this information from anywhere in the world, and computing the frequently NP-hard (or worse) parameters, if they are not already available.
The aim of this project is to build a system which:
As well as system-building, the project will involve doing a survey of existing systems that do parts of the job required, as well as a literature review.
Many decisions will need to be made in the course of the project. Some of these will relate to the relative emphasis given to (a) data management, and using the data on existing graphs to answer user queries about those graphs, and (b) computation, so that queries about new graphs can be answered.
We anticipate giving more emphasis to (a). We also need to decide what graph parameters are to be stored with a graph in the database, and how sophisticated the front end will be (probably primitive, to begin with).
The platform development principles of the Message Lab will be a feature of this project: http://www.messagelab.monash.edu.au/
Prerequisites: good results in: mathematics subjects; Algorithms and Data Structures; Analysis and Design of Algorithms. Some knowledge of databases.
You must talk to Graham Farr before nominating this project.
One of the most crucial phases when chairing a reviewed conference is to assign reviewers to the submitted papers. This is usually done by asking reviewers to indicate, for each paper, whether they want it, could cope with it, do not want it, and have a conflict of interests with its authors. With this information (and perhaps some extra info such as a maximum/minimum number of papers to be assigned to a given reviewer), most modern conference-support tools (such as Easy-Chair) perform an initial assignment. Initial because it is usually so bad that the Chair has to perform extensive modifications to it, often based on information that might not even have been taken into account by the tool, such as the specific research area of the reviewer, its quality, etc. Such modifications tend to be extremely painful and time consuming due, mainly, to the lack of support from the tool.
The aim of this project is twofold. The first is to apply optimisation techniques that yield a better result than those in current state of the art conference-support tools. In particular, the tool should allow the chair to choose among different kind of objective functions and visualise the resulting assignments. The second aim is to provide a graphical interface that somehow helps Chairs modify the initial assignment. Since we do not know yet what is the best optimisation function not how to support Chairs when modifying the tool, the project should be a lot of fun!
It would be extremely useful (though not absolutely necessary) for the student to have taken FIT3082 Programming Languages and Paradigms (and obtained a good mark).
Volume rendering is a computer graphics technique that is used by scientists to visualise 3D image data. It is becoming an important tool due to the increasing availability of machines that produce 3D data, such as common CT machines, MRI machines, and also due to the introduction of new generation imaging facilities such as the upcoming Imaging and Medical Therapies Beamline at the Australian Synchrotron. Despite the increasing use of volume rendering, viewing a 3D volume on a 2-dimensional screen remains challenging – it is difficult to gauge shape and structure, and it is easy to misrepresent an object. This project proposed that lessons might be learned from other professions that have been representing 3D geometry in 2D for many years. For example, architects are trained to express a building on paper using a range of explicit and implicit techniques such as line weight, shadow, fading, detail and perspective. These techniques are applied carefully to make a drawing read in 3D whilst retaining the simplicity and scientific rigour of a 2D drawing. Few of these techniques are used effectively in scientific visualisation. This project will investigate how architectural drawing can help scientists view their data and develop tools to support this. The outcome will be tested using detailed synchrotron data sets. Student should have an interest in computer graphics, scientific visualisation, medical imaging, or design.
GN is a highly-scalable associative memory algorithm1 capable of handling
multiple streams of input, which are processed and matched with the historical
data (available within the network) in real time. It is capable of utilising
a large number of low performance processors in connected configurations.
Hence GN is highly suited for resource constrained devices such as Berkeley
motes. The project will involve generalising the existing GN implementation,
which handles 1-dimensional sensory inputs, to a multi-dimensional version
capable of analysing 3 or higher dimensional sensory inputs to a WSN. The
24 point project will also investigate whether the algorithm suffers from
curse of dimensionality. This project is available to students with programming
background in C/C++ or Java.
1. Nasution and Khan, A Hierarchical Graph Neuron Scheme
This project will explore the possibilities for implementing creative evolution in silico, responding to the challenge by Bedau et al (2000) to reproduce "open-ended evolution". We will begin by examining previous attempts and Bedau's evolutionary activity statistics (Bedau et al 1998) for assessing them.
We will develop our own measures of the growth of complexity in organism structure and behaviour, and apply them to:
Students are requested to do some preliminary reading (e.g. at least (Levy 1997) and (Bedau et al 2000)) prior to commencing the project. Enrollment in the Honours unit FIT4012 is strongly encouraged as it will provide background material of direct relevance to the topic.
References:
Static Bayesian networks (BNs) represent causal processes which may be repeated, but which do not change upon repetition. While simple in concept, they are nevertheless powerful representations for dealing with systems showing significant uncertainty about their outcomes, and they are widely employed in health care, ecological and environmental sciences, manufacturing industries, and government (see www.norsys.com/clients.htm). A great deal of effort has been put into automating the learning of Bayesian networks from data, including at Monash, yielding our CaMML program (Causal discovery via MML).
Dynamic Bayesian networks (DBNs) represent causal systems which change over time, yielding time series data from their observation. Examples include all kinds of economic markets, such as stock markets and currencies, weather and climate systems, and human disease processes. This project will investigate ways of extending CaMML to learn DBNs from observational data.
Evolutionary Artificial Life is a powerful simulation tool that has been applied to a wide range of practical and theoretical problems in science. Among those are the evolution of cooperation (Axelrod), complex social behavior (Epstein and Axtell), the evolution of altruism (Mascaro et al), the evolution of language (Kirby), and the evolution of cognition (Yeager).
In this project we propose a new target for simulated experimental evolution: utility. Utility is a key ingredient in rational decision making. Understanding how it evolves and, in particular, evolutionary tradeoffs between utility evolution and improvements in cognition are likely to shed light upon a large variety of issues in the evolution of cognition and behavior generally.
Some familiarity with artificial life and artificial intelligence (especially neural networks and Bayesian decision making) would be an advantage.
CaMML (Causal discovery via Minimum Message Length) "data mines" observational data to find the causal Bayesian network most probable in light of the data. It has been successfully applied to a large range of problems. As those problems grow in the number of variables and the complexity of their interrelation, either the time required for discovery or the inaccuracy of the result increases exponentially. In order to increase the range of tractable problems we propose to redesign CaMML to take advantage of parallel supercomputing. As a part of the project we will investigate empirically the relationship between problem size, sample size, time and accuracy.
See also: Alison and Korb and Nicholson and Korb
Jon McCormack's projects are in the area of computer graphics, human-computer interaction, adaptive intelligence and evolutionary computing. Current project offerings:
Organisation charts are one of the most common kind of diagrams. However, comparatively little is known about how people understand or use them. In this project eye-tracking equipment will be used to determine how the focus of attention moves around the elements in an organisation chart when it is first seen, and then used in subsequent tasks. Based on this the project will be to develop a model for focus of attention changes when viewing organisation charts and to test the model with eye tracking data. [The project can also be offered for other sorts of visual notations such as UML notations, advanced mathematics, etc]
Automatic document layout is increasingly important topic because of the need to adapt a document's layout to different viewing environments and to dynamically generated content. An important part of technical document layout is how to layout program code, i.e. where to break lines of code if the line is too long. Based on this the project will be to develop a function for measuring the quality of layout of the code and incorporate this into an automatic document layout tool.
Automatic document layout is increasingly important topic because of the need to adapt a document's layout to different viewing environments and to dynamically generated content. An important part of technical document layout is how to layout equations and in particular where to break equations if the line is too long. The project will be to develop a function for measuring the quality of layout of the equation and incorporate this into an automatic document layout tool.
SBGN (Systems Biology Graphical Notation) is a new graphical standard for visualizing the processes in living organisms (http://sbgn.org/). Good automatic layout is required to support the use of SBGN in computer visualization applications. However currently there is no good method for laying out large SBGN diagrams. The project is to extend the automatic layout engine underlying the diagramming tool Dunnart (http://www.csse.monash.edu.au/~mwybrow/dunnart/) to support layout of large SBGN diagrams.
The project requires someone who is interested in algorithms and is not scared by mathematics.
The Monash Bayesian poker project was started by Kevin Korb in 1993. Since then 5 honours students have worked on various aspects of the project. Our Bayesian poker player (BPP) was originally developed to play 5-card stud poker, using Bayesian network technology. Most recently, BPP has been converted to play Texas Hold'em Poker, the main online form of poker, and re-written in python. BPP has been entered in the inaugural world automated poker playing competition at the main American Artificial Intelligence conference AAAI-2006 and we plan to compete again in 2007. We have developed a simple GUI interface that will allow people to play against BPP via the internet. This project offers lots of options for making BPP a better poker player including improving its bluffing strategies and its opponent modelling. We have access to the AAAI-2006 tournament dgame logs, which provide a rich source of data for automated learning of opponent modelling. Or you might be interested in developing a much more interesting playing interface.
See http://www.csse.monash.edu.au/bai/poker/poker.html for links to Honours projects, online versions of research papers, and to play against the latest version of BPP online.
Artificial Cortical Networks (ACN) also known as Multimodal Self-Organizing Networks are used to model mapping of the multidimensional sensory information e.g. speach onto two-dimensional areas as in human cortex. Recent applications of ACNs are presented in:
The modelling software is written in MATLAB and the first objective of the project is to add GUI in Simulink. Subsequently a student is expected to complete a specific modelling task using ACNs.
3D modelling is the process of constructing a virtual form for use in diverse fields such as film, computer games, computer aided engineering, architecture, physical simulation, or other interactive software. Many different packages and methods exist for modelling, one such package is Blender [1], a free and open-source modelling, animation and compositing tool.
It is often desirable to create organic forms, for example a giant CG kraken for a pirate movie. However, this is usually a very complex process, as 3d modelling tools are very general and have limited support for the kinds of patterns and shapes that can characterise organic quality. The aim of this project is to take a commonly occurring feature of organic shape and design and implement a tool to support the easy modelling of that feature.
One such recurring motif of many organic forms is covering symmetry. This occurs when a surface is literally covered in a repeated object, such as a tentacle being covered by suckers. A tool that supports covering symmetry would ideally take as input: an arbitrary surface S, an arbitrary shape C, and some parameters (such as covering density, arrangement style, etc.) and then would output the shape of C covering S.
The covering operation can be implemented in a multitude of ways and this project would investigate some of these in a very hands on and practical way. The desired outcome would be an easy to use tool integrated into a 3d modelling package (such as Blender), along with a set of case studies (3d models) resulting from the use of tool. As such, experience in (and enthusiasm for) computer graphics and programming is a requirement for this project.
For more information see this web page.
The purpose of a Content-Based Image Retrieval (CBIR) system is to retrieve images from a database such that the visual contents of retrieved images are similar to that of a query image. Often the image similarities are computed from the representation of visual contents such as colour, texture and shape, in terms of a multi-dimensional feature vector. Understandably, the features cannot be assumed to have equal weights. More important features deserve more weights and less important features deserve less weights. These weights can be assigned fully automatically from feature variation statistics in the database images. Better still is to use the so called relevance feedback (RF) mechanism in which the weights are modified iteratively based on users' response regarding the relevance of the retrieved images. The aim of this project is to investigate the feature weighting problem for both cases, namely, without RF and with RF.
In recent years Content-Based Image Retrieval (CBIR) has become one of the most active areas of research in image data mining. In studies involving the design of a CBIR system often the question arises whether the data set sizes used are big enough to rely on the accuracy achieved. The aim of this second project in CBIR is to study the impact, on accuracy, of the number of samples in the database, the number of samples per semantic category, the number of retrieved images returned to user for RF, the number of features used, and their relative sizes.
In statistical pattern recognition, often patterns are represented by a large number of numerical features. Although there is no conceptual justification in reducing the number of features to a small number, in practical problem solving, this becomes a necessary step due to the wellknown phenomenon of the 'curse of dimensionality' of the feature vector on the complexity of the pattern classifier. The aim of the proposed project is to develop a hierarhical feature selection paradigm that is well-suited to multi-class pattern recognition problems.
The project will comprise (i) study of some existing feature selection criteria followed by an experimental investigation of them, (ii) analysis of the above results leading to, hopefully, a hierarchical feature selection paradigm, and (iii) development of an interactive software tool for the above paradigm.
The methodology developed will be such that depending on the classification accuracy arrived at a certain stage, the user will have the option of increasing or decreasing the dimensionality value. The software tool will include procedures for displaying the distribution of pattern samples in different feature spaces obtained by different feature selection methods.
Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of segmentation and classification of images. The proposed project will deal with both of these problems. It will involve (i) investigating existing texture analysis methods from the point of view of their theoretical soundness as textural measures and (ii) investigating their practical applicability.
Port Phillip Bay, Victoria, is home to a small and genetically unique population of bottlenose dolphins, Tursiops sp. Individuals within this population have been identified via photo-identification of the dolphins' dorsal fin. Dorsal fins show a unique shape and through time develop permanent marks and notches. Currently, the digital images are being assessed for these permanent marks and assigned an identity by human 'eye'. This can be incredibly time consuming, requires permanent notches on the fin and a trained eye to pick-up small notches. The aim of the project is to create a methodology/software that not only uses the notches but shape of the fin to assign identity with high levels of accuracy.
Some references:Independent Component Analysis (ICA) is a statistical technique for discovering hidden factors underlying a set of random variables. ICA can be used to discover filters that can be used to characterize visual textures in a set of images. The filters produced by ICA are adapted to the statistics of the set of images used to derive them. It should thus be possible to use ICA to derive a set of filters specifically tuned to the textures in a collection of images from a restricted domain. We will investigate the discovery and selection of a set of ICA filters for image collections from domains such as dermatology.
The system will be developed within the framework of the GNU Image Finding Tool (GIFT) (http://www.gnu.org/software/gift/). The GIFT is an open framework for content-based image retrieval. Use of the GIFT framework will means that researchers working on a project can focus on the problems of immediate interest and importance to the project, rather than having to develop and entire image retrieval system, including user interface, feature extractors, indexing tools, database accessor, user interface, and feedback/learning system from scratch.
In one dimension, when we want to find the closest matching item to a given item it is natural to use binary search. Once we go beyond one dimension there is no known optimal solution. In many situations such as in case-based reasoning and in coding audio and video, there is a need to search a large collection of higher-dimensional vectors to find the one which is closest to the query item. This project will investigate the use of Minimal Spanning Trees to find a starting item for the search and to control how the search proceeds.
Mixture modeling, also known as clustering, involves taking a set of observations about things and deciding whether the things all belong to one population or whether the things come from a number of different sub-populations. DNA microarray data is a 2D array where the rows or columns may represent either a particular gene or represent the subject of an experiment. The entries in the microarray represent the activity of that gene for that subject. For instance, the entries in the first row might represent gene activities for the first subject while the entries in the fourth column might represent gene activities for the fourth gene. With bi-clustering we are interested in finding groups of genes and groups of people for what the group of genes behaves similarly. If we could re-order the rows and columns of a DNA micro-array, a bi-cluster would appear as a region of cells that show the same gene activity.
The MML (Minimal Message Length) criterion was proposed by Professor Chris Wallace foundation professor of Computer Science at Monash University. MML has been used to create a clustering program called 'snob'. 'snob' has proven to be very effective in identifying the correct number of clusters but it is not used widely in bioinformatics where people continue to use a variety of techniques that are generally not able to identify correctly the number of clusters.
The aim of this project is to apply the MML approach and snob to the mixture modelling problems that arise in the analysis of DNA microarray data. It would be of particular interest to extend the snob approach to bi-clustering.
In applications where we are trying to trade-off the complexity of a network against the ability of the network to model what occurs in a given situation, we need to come up with a measure of the complexity of the network. One way of measuring the complexity is to ask how many bits would be needed if we were to transmit a message that would enable the receiver to reconstruct the network in its entirety. Such a message is a lossless compression of the network. This project aims to explore how to apply standard data compression techniques to the problem of encoding networks.
A lossless compression of a grey-scale image allows the image to be stored in a minimal number of bits and for the image to be reconstructed that is bit-for-bit identical with the original image. A number of years ago a postgraduate student in this school developed the world's second best lossless compression of grey-scale images. (This same person had also developed the world's best lossless image compressor). This code, called glicbawls, was designed for the International Obfuscated C Programming competition. The fundamental approach in glicbawls is one-pass but the source code of the program itself is impossible to understand and very much tuned to the coding of natural images.
The aim of the project is to produce a version of glicbawls which is easy to understand and to modify and to explore how glicbawls can be tuned to perform well on particular classes of images, particularly medical images.
In the semi-supervised learning approach to image segmentation, the user labels certain pixels as belonging to particular segments and the program then decides how to label the remaining pixels based on their spatial proximity and the similarity of their pixel values to labelled pixels. A difficulty arises in trying to assign pixels to particular segments if there is more than one reasonable candidate segment. This project will investigate a probabilistic approach to propagating segment labels from user-labelled points to unlabelled points.
'snob' is a program that was developed by two people in the Department of Computer Science at Monash University. 'snob' aims to make class distinctions. That is, it looks at a number of observations about things and decides whether those things belong to one, two or a number of classes. A particular strength of 'snob' is that it is able to come up with a reasonable estimate of the number of classes that are actually present in the data. For instance, 'snob' when presented with data randomly generated by a particular distribution might decide that there is really only one class present.
The fundamental problem in image processing is to decide which pixels belong together because they represent parts of the same object. 'snob' is good at putting things into classes but it fails with image segmentation because 'snob' assumes that things to be classified are statistically independent whereas in image segmentation the values of pixels are highly statistically dependent on the values of neighbouring pixels.
The aim of this project is to extract certain key information from an image and to present that information to 'snob' which will decide how many classes are present. This information, along with the key information, will then be used by an image segmentation algorithm to produce a segmentation with the same number of segments as the number of classes that 'snob' judged were present in the key information.
Clustering is the process of putting objects into the same cluster or class as other objects which share the same properties. The 'snob' program is probably the world's best program at discovering an appropriate clustering of a given body of observations about objects. There are two ways in which it might be able to make snob easier to use, more efficient and more effective. The default action is for 'snob' to try to assign objects to classes without an initial guess as to the best clustering. In some cases a user might be able to look at a visualisation of the data and decide on an appropriate initial clustering and then let 'snob' try to optimise the user-provided initial clustering. This may result in significantly reduced run-times and might help 'snob' find clusterings it might miss in its search process.
Having produced a clustering, it would be convenient if a user could view a representation of the data with the clustering superimposed to see whether the clustering seems appropriate. In providing an initial clustering or assessing the snob-provided clustering, there is a need to look at data which may exist in many more than the two or three spatial dimensions than we can display on conventional graphics devices. This process of taking higher-dimensional data and converting it to a form which can be displayed on conventional graphics devices is known as information visualisation.
The aim of the project is to integrate 'snob' with an information visualisation package so that a user can view the data via the information visualisation package.
Local search is the most scalable approach for solving constrained optimisation
problems, but each problem requires specific move operators to be coded.
In Comet [VHM05] an approach based on "invariants" has been implemented
for compiling a problem statement into a local search algorithm. Building
on invariants, this project will design and implement a local search compiler
which efficiently handles logical conditions. The approach will be tested
on practical optimisation problem currently being tackled by the CTI-Monash
Centre for Optimisation in travel, transport and logistics.
[VHM05] Constraint-Based Local Search, by Pascal Van Hentenryck and Laurent Michel.
MIT Press, 2005
To solve complex problems Constraint Programming often uses subproblems
as "global" constraints. A global constraint is an approximations
of the subproblem, which can be solved quickly – a global constraint may
be invoked many times while solving the original problem.
This project will design a global constraint for the travelling salesperson problem
(TSP) who must vist a set of cities and return to base using the shortest
possible route. Our approximation to the TSP uses nodes representing clusters
of "cities" – where the
TSP for each cluster has already been approximated. The global constraint will
be implemented in the ECLiPSe constraint programming system [AW07]
The practical usefulness of the new TSP global constraint will be tested on a
number of practical problems currently being tackled by the CTI-Monash
Centre for Optimisation in travel, transport and logistics.
[AW07] Apt, K. R. and Wallace, M. Constraint Logic Programming Using Eclipse.
Cambridge University Press, 2007.
Averaged One Dependence Estimators (AODE) is one of the more popular new machine learning algorithms proposed in the past five years. This project will examine extensions to AODE, either to improve its accuracy by extending it to higher-order estimators, or to increase the range of tasks to which it can be applied, by addressing its computational limitations when applied to high-dimensional data. This is a well-defined project with the potential for high-impact outcomes which will provide a motivated student with scope to take initiative and develop advanced research skills.
Computational techniques are becoming increasingly central to advanced biological research. In this project the student will join a team of computer scientists and biologists working together to address significant questions about how one of the fundamental building blocks of life, proteins, function. There is no requirement for a background in biology.
Last update: Wednesday, January 20, 2010 6:03 PM