BCS and BSE Honours Projects 2007

Honours projects for the Computer Science (BCS) and Software Engineering with Honours (BSE, CSE4402) for 2007.

Note that some BSE projects are 12-pts, and that BCS projects are 24-pts. Some projects come in 24- and 12-pt versions; the latter involves less work.

Supervisors as of Friday 2nd March 2007.


A Smart Client for Solving Computational Inversion Problems (12 or 24 pts)
David Abramson
Project-id: Abramson-inverse
Inverse problems are very common in science and engineering. They can occur when we only know the outputs of a process and want to know the input values that caused them. In geology, an example might be to decide what parameters values, or initial conditions, are needed to drive a model of the earths crust to produce realistic output. Inverse problems are difficult to solve, however, most solutions combine non-linear optimization with computational modeling. We have developed a family of tools called Nimrod the can be used to solve inverse problems. Nimrod allows a user to specify some design goal, and the system searches the parameter space automatically.
Normally, it is necessary to convert the output of the computation to a single objective cost function value. However, many of our users have indicated that whilst they cannot objectively give a performance metric for a given solution, they recognise a good solution when they see one. This suggests that subjective analysis by users might be useful as part of the design approach. We have been experimenting with the design of such systems in which the user sees a number of solutions, and then ranks them for evaluation by a rank-driven optimization algorithm. Thus, the expert user is an integral part of the evaluation process rather than blindly using a deterministic mathematical equation.
In this project we plan to build a Smart Client that uses Nimrod to generate multiple solutions using computational models, visualizes them, and then display them for a user to assess. [nimrod(click)]
 
High-performance protein crystallography using GRID computing (12 or 24 pts)
David Abramson and Ashley Buckle (Medicine)
Project-id: Abramson-Buckle
X-ray crystallography is the most powerful technique for determining the 3-dimensional structures of proteins, but is increasingly dependant on high performance computing. The aim of this project is to develop solutions that will enable a wide range of crystallographic calculations to be performed on available GRID resources.

Applying MML to SVMs (12 or 24-pts)
David Albrecht and Peter Tischer
Project-id: Alb-MMLSVM

The Minimum Message (MML) inductive inference approach to machine learning
seeks the shortest explanation of the data by encoding the data in a two-part
message - the first part states a model and the second part gives the details
of the data using the stated model. MML has been successfully used in a wide
range of problems including, clustering and unsupervised classification;
time series; image processing; and data compression.

Support Vector Machines (SVM) are a very popular and important method
for doing classification in machine learning. They been successfully
applied to a number of applications including classification of DNA
microarray expression data; detection of microcalcifications in
mammograms; face detection; novelty detection; and text classification.
However, there are difficulties with the traditional approaches to SVMs.
The main one is that SVMs do not provide probabilities with their
classifications.

The aim of this project is to investigate how to apply MML
inductive inference to SVMs to develop a method of probabilistic
classification that combines the benefits of these two general methods.


Approximation-Algorithms for Strict Minimum Message Length Code-Books (12 or 24-pts)
Lloyd Allison and Graham Farr
Project-id: Allison-Farr-MML
Strict Minimum Message Length (SMML) inference is an information-theoretic criterion for machine learning. It was introduced by Wallace and Boulton and has several desirable statistical properties. The SMML code-book problem is to create an optimal code-book that gives the greatest compression of future data. There is a quadratic-time algorithm for data drawn from a binomial distribution, and a good heuristic for the trinomial distribution, but the code-book problem is NP-hard (very probably requires exponential time) in general [FW02]. The project is to devise fast approximation-algorithms, that is algorithms that run quickly and create good, but not necessarily optimal, code books. The first cases to consider will be essentially unbounded one-dimensional in character such as the geometric and Poisson distributions. Optimisations may be possible if the prior on the parameter(s) is "nice". The Gaussian (normal) and related distributions are important for continuous data. The project involves and relates to algorithm analysis, complexity, data compression, and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or GF before submitting "the form".

[FW02] G. E. Farr and C. S. Wallace.
The complexity of strict minimum message length inference.
Computer Journal, 45(3), pp.285-292, 2002 [doi:10.1093/comjnl/45.3.285]

Bayesian Evolutionary Trees (12 or 24-pts)
Lloyd Allison and Kevin Korb
Project-id: Allison-Korb-BET

Given K DNA sequences, or protein sequences, the multiple-alignment problem and the evolutionary-tree problem are "chicken and egg" problems: Given an optimal evolutionary-tree one can search for an optimal multiple-alignment of the K sequences (and we know how to do that [LW94]), and given an optimal multiple-alignment of the K sequences one can search for an optimal evolutionary-tree.

An evolutionary-tree is a special case of a Bayesian net [KN04]: a tree rather than a general DAG, with discrete variables (over {A,C,G,T} if DNA) at each node, known values at all leaves, and missing values at all internal (ancestral) nodes, e.g., see fig.4 [LW94].

The project is to modify the CaMML algorithm for Bayesian nets to infer evolutionary trees given a multiple alignment. If that is successful, the feasibility of simultaneously solving the multiple-alignment and evolutionary-tree problems will be investigated.

The project involves algorithms, probability and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or KK before submitting "the form".

References:

[LW94] L. Allison and C. S. Wallace.
The Posterior Probability Distribution of Alignments and
its Application to Parameter Estimation of Evolutionary Trees and
to Optimisation of Multiple Alignments.
Jrnl. Molec. Evol., 39(4), pp.418-430, 1994
http://www.csse.monash.edu.au/~lloyd/tildeStrings/Multiple/94.JME/
or
http://dx.doi.org/10.1007/BF00160274

[KN04] K. B. Korb and A. E. Nicholson.
Bayesian Artificial Intelligence.
Chapman and Hall / CRC, 2004.



Biometric Cryptosystem (12 or 24-pts)
Nandita Bhattacharjee and Bala Srinivasan
Project-id: Nandita-bio
Biometric information can serve not only for access or authetication, but also for
data protection. It is possible to generate a key from the biometric information that can be
used with some cryptographic algorithm to encipher and decipher data. In this project we shall study
methods of generating key from finger prints, addressing the fundamental problems of biometric
cryptography of key change and distortion tolerance. Given the biometrics, a set of reliable features are extracted. In this project we shall design a system in which we do not need to store a biometric
template, but only a string of error-correction data from which the biometric cannot be
derived, and from which the key cannot be derived either unless the biometric is present.

References:

Uludag, U.; Pankanti, S.; Prabhakar, S.; Jain, A.K.;"Biometric cryptosystems: issues and challenges",
Proceedings of the IEEE, Volume 92, Issue 6, June 2004 Page(s):948 - 960.


Group Selection: An investigation into the potential for the evolution of virtual ecosystems (12 or 24-pts)
Alan Dorin and Jon McCormack
Project-id: Dorin-McCormack-Eco
The role of group selection is somewhat controversial amongst evolutionary biologists. Group selection refers to the "process of genetic change brought about or maintained by the differential extinction and/or proliferation of populations" (Wade 1976, 1977). The aim of this project is to construct a computer simulation and visualisation of a set of populations of software agents, each with its own inter-agent and agent-environment interactions. This set of ecosystem simulations will be used to explore the potential for ecosystem replication and evolution. That is, the project investigates the evolutionary process as it applies to *groups* of agents, rather than the usual scenario in which evolution acts upon individual organisms in a population.

Important questions that will be addressed by the student include:
What are the necessary and sufficient conditions for the evolution of virtual ecosystems?
How does group selection compare against individual selection under different circumstances?

New computer graphics visualisation tools will need to be developed in order to interpret the results of the simulations conducted during the course of the project. Students will require a sound programming knowledge (C/C++ preferred, CSE2305, CSE3308) and experience with computer graphics programming (OpenGL, CSE3313) and multimedia (CSE2325/3325).

 

Wade, M.J. 1976. Group selection among laboratory populations of Tribolium. Proceedings of the National Academy of Sciences U.S.A. 73: 4604—4607

Wade, M.J. 1977. An experimental study of group selection. Evolution 31: 134—153

Some introductory reading:

Dawkins, R., The Blind Watchmaker. 1987, New York: W.W. Norton & Co.

Epstein, J.M. and R. Axtell, Growing Artificial Societies, Social Science from the Bottom Up. 1996, Washington D.C.: Brookings Institution & MIT Press.

Yaeger, L. Computational Genetics, Physiology, Metabolism, Neural Systems, Learning, Vision and Behavior or Polyworld: Life in a New Context. in Artificial Life III. 1992: Addison-Wesley.


Rating and ranking players and teams using MML (12 or 24 pts)
David Dowe
Project-id: Dowe-ranking
Ratings are used in chess, tennis, golf, other sports, etc. in order to rank both teams and individual players. A variety of systems are used to do the ratings and rankings - including Elo, Glicko and systems more concerned with prize money (over the past 12 months).
We can re-visit and improve these systems using the Minimum Message Length (MML) principle (Wallace and Boulton, 1968)(Wallace and Freeman, 1987)(Wallace and Dowe, 1999a, which is the Computer Journal's most downloaded article)(Wallace, posthumous, 2005)(Comley and Dowe, 2005) to arrive at a comparatively simple model with an improved and near-optimal predictive accuracy on future games and contests.
This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.
References: CoDo2005 Comley, Joshua W. and D.L. Dowe (2005). Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in P. Gru:nwald, I. J. Myung and M. A. Pitt (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9 [Final camera-ready copy submitted Oct. 2003.]
Wall2005; WaBo1968; WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283. [This is the Computer Journal's most downloaded article.] WaFr1987.
Econometric, statistical and financial time series modelling using MML (24 pts)
David Dowe
Project-id: Dowe-econ
Time series are sequences of values of one or more variables. They are much studied in finance, econometrics, statistics and various branches of science (e.g., meteorology, etc.).
Minimum Message Length (MML) inference (Wallace and Boulton, 1968) (Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace, posthumous, 2005)(Comley and Dowe, 2005) has previously been applied to autoregressive (AR) time series (Fitzgibbon et al., 2004), other time series (Schmidt et al., 2005) and (at least in preliminary manner) both AR and Moving Average (MA) time series (Sak et al., 2005).
In this project, we apply MML to the Autoregressive Conditional Heteroskedasticity (ARCH) model, in which the (standard deviations and) variances also vary with time. Depending upon progress, we can move on to the GARCH (Generalised ARCH) model or Peiris's Generalised Autoregressive (GAR) models, or to inference of systems of differential equations.
This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.
References: CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).
SaDR2005; ScPL2005; Wall2005; WaBo1968; WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283. [This is the Computer Journal's most downloaded article.] WaFr1987
Using MML to infer language history and evolution (and relation to DNA) (24 pts)
David Dowe
Project-id: Dowe-history
The evolution of human languages raises several interesting issues, such as how languages evolve, how the evolution of spoken language relates to the evolution of written language, how it is that geographical regions of related languages can surround one or more regions of languages not related, how populations of language speakers migrated eons ago and possibly also how spoken language relates to DNA. Study of this area also helps with the inference of now-extinct ancestral languages and at least indirectly with the preservation of dying languages. The project will use the Minimum Message Length (MML) principle (Wallace and Boulton, 1968) (Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace, posthumous, 2005) (Comley and Dowe, 2005), building upon earlier work in (Ooi and Dowe, 2005).
The project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.
References: CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
OoDo2005 Ooi, J.N. and D. L. Dowe, Inferring Phylogenetic Graphs of Natural Languages using Minimum Message Length, Proc. CAEPIA 2005, Vol. 1, pp I:143 - I:152, Nov. 2005.
Wall2005; WaBo1968 Wallace C.S. & Boulton, D.M. (1968)
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283. [This is the Computer Journal's most downloaded article.] WaFr1987.
Recognising, communicating with & quantifying varieties of intelligence (24 pts)
David Dowe
Projecy-id: Dowe-intell
"Intelligence" is a term used variably to describe a variety of human aptitudes (memory, mathematical ability, ability to make inductive inferences), terrestrial non-human aptitudes and (www.SETI.org) something sought in extra-terrestrials. Humans share much with other humans and share at least a planet with terrestrial non-humans, but how much must humans share with extra-terrestrials or one extra-terrestrial with another?
Dowe and Hajek (1997, 1998) discuss the relevance of the information-theoretic Minimum Message Length (MML) principle (Wallace and Boulton, 1968)(Wallace and Freeman, 1987)(Wallace, 2005) to the inductive inference part of intelligence, and Hernandez-Orallo and Minaya-Collado (1998) discuss the relevance of Kolmogorov complexity to intelligence - and MML is intimately related to Kolmogorov complexity (Wallace and Dowe, 1999a).
This project concerns the use of information theory, MML and Kolmogorov complexity to recognise signs of intelligence and ways of communicating with it. This will inevitably entail measuring - or attempting to measure - the intelligence. Another direction in which the project might head is to study how intelligence of a system increases (or possibly doesn't increase) when there is communication involving more than one intelligent entity. The project might also involve the study of communications between terrestrial non-humans such as, e.g., marmosets and dolphins. The project will require strong mathematics.
An MML-guided approach to reconstruct variable resolution images from asymmetric sets of noisy discrete projections (24 pts)
David Dowe and Imants Svalbe (Physics)
Projecy-id: dld-phys
The optimal reconstruction of images from traditional computer tomography (CT) projections in the presence of real noise is a mature problem with several practical solutions. Images where the spatial resolution becomes higher over pre-defined regions of interest may offer a means to reduce the patient x-ray radiation dose while maximising the relevant anatomic detail relevant for clinical applications. Variable resolution imaging requires adaptive projected views instead of using the typical uniformly-sampled projections seen in conventional CT. This project proposes use of a combination of symmetric (Finite Radon Transform) and asymmetric (Mojette Transform) discrete projection techniques to reconstruct variable resolution images of test data and real images, and uses Minimum Message Length techniques to optimise the image reconstruction for a variety of assumed noise models. It also aims to use MML-related techniques to guide the selection of the best views to capture the minimal amount of sufficient real data required to reconstruct variable resolution images.

Chromatic roots of Boolean functions (12 or 24-pts)
Graham Farr
Project-id: Farr-Chromatic
A colouring of a graph G is an assignment of a colour to each vertex so that adjacent vertices receive different colours. Sometimes one wants to count the number of colourings of a graph with a given number q of available colours. The function P(G;q) of G and q that does this is called the chromatic polynomial of G. This function plays an important role in the theory of counting problems on graphs.
The chromatic polynomial of any graph has a number of zeros, i.e., real or complex numbers z such that P(G;z)=0. These zeros are often known as chromatic roots. The patterns formed by chromatic roots in the complex plane can be very intriguing, and many important unsolved problems in graph colouring amount to questions about these roots.
Some of the theory of chromatic polynomials has been generalised to arbitrary Boolean functions by Kung in 1980. However, no-one has yet investigated the chromatic roots of Boolean functions.
Aim: To investigate the chromatic roots of Boolean functions. This will entail: learning about chromatic polynomials and their extension to Boolean functions; writing programs to compute these generalised chromatic polynomials of Boolean functions; using existing software to find and plot the complex zeros of these functions; writing programs to generate large numbers of Boolean functions of various kinds, in order to investigate the behaviour of their chromatic roots; trying to form hypotheses about the behaviour of the roots; and perhaps even trying to prove some theorems.
You must discuss this project with GF before you nominate it.
[See also: Allison & Farr project]

How many rules are needed to express propagation? (24-pts)
Mark Wallace and Maria Garcia de la Banda
Project-id: WGB-prop
Constraint inference is widely applied for efficiently solving combinatorial problems. Typically, during problem solving, inference is performed at each step in the search for a solution. Instead of running the generic inference process, it is theoretically possible simply to execute a single inference rule tailored to the constraint of interest in order to achieve the same inference. However, the number of alternatives needed for each inference rule is believed to be exponential in the number of variables in the constraint. In this project we explore whether this assumption is true. In particular, we want to explore how many rules are actually needed under different assumptions about the power of the inference. There are to date no published results on this topic, so we can hope that this project will produce a new and publishable result. This also means candidate students should have a solid mathematical background.

References

Le Provost, T., Wallace, M., Domain independent propagation.
Proc. Int.Conf. on 5th Gen. Comp. Systems, pages 1004--1011, 1992
http://citeseer.ist.psu.edu/leprovost92domain.html M. Maher. Propagation completeness of reactive constraints.
Proc. ICLP'02, pages 148-162, 2002

 
The Best Way to Schedule (12 or 24-pts)
Maria Garcia de la Banda and Mark Wallace
Project-id: GBW-schedule

Scheduling problems appear in all walks of life: from airlines to hospitals, and from offices to sports leagues. They involve performing a set of jobs, each requiring a sequence of tasks to be completed using a set of resources under some time constraints. The specific resource and duration for each task is fixed, and two tasks can't use the same resource at the same time. As a simple example, consider a computer game in which the components of a gravity gun, a radiation suit and a bullet-time disruptor need to be built according to the following schedule:

           |     Task_1     |     Task_2     |    Task_3
-------------------------------------------------------------
Grav. Gun  |  6 hours at M1 | 50 hours at M4 | 17 hours at M3
Rad. Suit  | 25 hours at M2 | 15 hours at M1 | 10 hours at M4
Disruptor  | 16 hours at M1 |  5 hours at M2 | 20 hours at M3

where M1,M2,M3, and M4 are four machines. The aim is to schedule the tasks to complete all jobs in the minimum amount of time. These problems can be modelled by associating a variable with the start time of each task (9 variables in our example above), and searching for the best compatible start times. Alternatively, they can be modelled by associating a variable with each pair of tasks on each resource (6 variables), and searching for the best task-orders. This project addresses the assumption that underlies many scheduling algorithms: that deciding task-orders is better. When, if ever, is this assumption false?

Ref: An Algorithm for Solving the Job Shop Problem, Carlier and Pinson, Management Science, Feb. 1989.

Automatic analysis of "no holes" in constraint programs (12 or 24-pts)
Maria Garcia de la Banda, Kim Marriott
Project-id: GBM-constraint
Constraint satisfaction problems involve a set of variables, their domains (i.e., their set of possible values), and a set of constraints determining the allowed combination of values. For example, the N-queens problem tries to place N queens on an NxN chessboard so that they cannot take each other. It can be modelled with Q1,...,QN variables (the N queens), each Qi with domain {1,...,N} (the N columns in which Qi can appear), and constraints ensuring no two queens are in the same column or diagonal.
Solvers collect the constraints and determine their effect on the variables. Finite domain (FD) solvers handle variables with finite domains (e.g., each Qi above has a finite domain of N elements). FD solvers often handle constraints by using either domain or bounds propagation. While the latter is usually less powerful, it is also faster. [1] has shown that domain propagation can be replaced by the more efficient bounds propagation without loss of power if no constraint creates a "hole" in the domain of any variable. While [1] proposes a program analysis to automatically infer the "no hole" property from the problem, the analysis was not implemented. Thus, it is not known how effective it is in practice.
The aim of this project is to implement the analysis, systematically test it on a wide range of problems, determine whether it is accurate enough to effectively guide a domain-to-bounds transformation, and if not, study possible extensions.
[1] Schulte and Stuckey. When do bounds and domain propagation lead to the same search space. ACM. PPDP 2001.

Expert Course Advice - Mechanically (12 or 24-pts)
John Hurst
Project-id: Hurst-Expert

As part of an on-going process to better utilize the advantages of information technology in a university learning environment, a recent student project looked at ways of extracting course and unit information from publically available handbooks, and turning this into an "expert system" that could provide course advice to students.

One outcome from this project was the automatic rendition of a Prolog program that was tailored to a given student at some arbitrary point in his or her course. Knowing what units had been completed, what units the course structure required, and what units might be undertaken in the next semester, the program would not only supply the student with a list of possible units that could be taken over the next semester, but also would show what units had to be taken in order to complete the course with (say) a specialization in a particular field.

The project showed the feasibility of this approach, but did not deliver a workable prototype. This project would be to take the existing work, and (re)engineer it to the point of completing a workable prototype that could be employed via say a web interface, or faculty kiosk.

Knowledge of Prolog would be useful, although not essential. A copy of the thesis may be found at http://www.csse.monash.edu.au/~ajh/research/PfreyThesis.pdf


Implementation of a Password Capability Unix Filesystem (12 or 24-pts)
Carlo Kopp
Project-id: Kopp-walnut
The Walnut password capability operating system has demonstrated the utility and unique access control and security properties inherent in this class of operating system. The aim of this project is to exploit earlier research effort in this area to implement and test an alternative filesystem for Unix (Linux) platforms. This project is suitable for a student with prior Linux kernel experience.
Needs a good Linux-internals background. BCS/BDS student, or a strong BSE student.
Suburban Ad Hoc Networks (24-pts)  
Carlo Kopp & Ron Pose
Project-id: Kopp-sahn
The Suburban Ad Hoc Networking group focusses its research activities on techniques for implementing Suburban Ad Hoc Networks. These are self organising, quasi-static ad hoc (typically wireless) networks which provide an alternative technology for providing high speed digital connectivity to households, small businesses and distributed campuses. Specific areas of research interest include security, low level routing protocols, access controls and propagation behaviour. Given the broad scope of the research performed in this area, there is considerable choice in project topics. Students should consult Dr Ronald Pose or Dr Carlo Kopp for suitable project topics.
See [/research/san/].
Enhancement of Point to Point Protocol with Forward Error Control capability. (24-pts)
Carlo Kopp
Project-id: Kopp-ptp
he topic has the potential to also produce a good journal paper if done properly. The project requires some theoretical work to survey the plethora of FEC codes and codecs, and modification of existing open source PPP code to include LCP changes and the FEC codec. C language skills and some comms understanding required. The main requirement is a smart student. Potential for code release into the GPL domain - PPP is widely used in various open source OS'.

Experimental Ethics (12 or 24-pts)
Kevin Korb & Ann Nicholson
Project-id: Korb-ethics
This project will use evolutionary Artificial Life techniques to examine the evolution of ethical and social behavior. Classically, related techniques have been applied to game-theoretic problems such as the iterated prisoner's dilemma (IPD). Here we will evolve some varieties of altruistic behavior (self-sacrifice, resource sharing, predator warnings, reciprocal sharing, communication, etc.) and of selfish behavior (theft, thuggery, deception, etc.). The ethical merits of different actions (assessed in terms of expected utility) and their fitness and evolutionary stability in various environments will be the main foci of experimental simulations. We may also look at the evolution of utility itself.
Bayesian Poker (12 or 24-pts)
Ann Nicholson and Kevin Korb
Project-id: Nic-Korb-Poker
The Monash Bayesian poker project was started by Kevin Korb in 1993. Since then 5 honours students have worked on various aspects of the project. Our Bayesian poker player (BPP) was originally developed to play 5-card stud poker, using Bayesian network technology. Most recently, BPP has been converted to play Texas Hold'em Poker, the main online form of poker, and re-written in python. BPP has been entered in the inaugural world automated poker playing competition at the main American Artificial Intelligence conference AAAI-2006 and we plan to compete again in 2007. We have developed a simple GUI interface that will allow people to play against BPP via the internet. This project offers lots of options for making BPP a better poker player including improving its bluffing strategies and its opponent modelling. We have access to the AAAI-2006 tournament dgame logs, which provide a rich source of data for automated learning of opponent modelling. Or you might be interested in developing a much more interesting playing interface.

See http://www.csse.monash.edu.au/~annn/poker/poker.html for links to Honours projects, online versions of research papers, and to play against the latest version of BPP online.

Causal Discovery (12 or 24-pts)
Kevin Korb
Project-id: Korb-cause
Causal discovery algorithms learn causal Bayesian networks from data. The oldest of them dates from 1991. At Monash we have developed CaMML (Causal discovery via Minimum Message Length), which "data mines" observational data to find the causal model most probable in light of the data. In this honours project you may choose any one of a number of possible specific problems to investigate, including:
- The proper evaluation of causal discovery algorithms. The common uses of predictive accuracy, unweighted edit distance and Kullback-Leibler divergence are all demonstrably inadequate. We have an alternative "Causal Kullback-Leibler" which is superior, but needs further work. This is also related to metrics of causal power that we are developing, which assess how much causal influence one variable has over another.
- Integrating latent (unobserved) variables into our causal discovery algorithms.
- The philosophy of token causality explicated in terms of causal models (Twardy & Korb "A Criterion of Probabilistic Causation" Phil of Sci, 2004; Korb, Twardy, Handfield and Oppy "Causal Reasoning with Causal Models" Synthese, submitted)
- The mathematics of causal intervention (Korb & Nyberg "The Power of Intervention" Minds and Machines, 2006; Korb, Hope, Nicholson and Axnick "Varieties of Causal Intervention" PRICAI, 2004)
Classification Tree Search (12 or 24-pts)
Kevin Korb
Project-id: Korb-tree
Classification trees are a well-established data mining technique, for supervised machine learning. At Monash we have programs which do greedy search for MML trees and graphs with lookahead; we have also used genetic algorithm search. This project will apply Markov Chain Monte Carlo search to estimating the posterior probability distribution over trees and graphs, which should be more effective. If there is opportunity we can also look at an ensemble search, which finds the best collection of trees and graphs for weighted predictions.
 
See also: Nicholson and Korb and Allison and Korb projects

Evolutionary Models of Global Self-Regulation (12 or 24-pts)
Jon McCormack
Project-id: McCormack-Evo

The 'Daisyworld' model of Ecologist James Lovelock, first published in 1983, is a simple model of planetary self-regulation, bi-stability and homeostasis. It is considered a 'parable' of planetary self-regulation, achieved via feedback between life and its environment. The original model had a simple gray plant, populated by two species of daisy: one black and one white. The number and proportion of the daisy species alter their local temperature in opposite directions (a white daisy reflects solar radiation, a black one absorbs it). The Daisyworld model demonstrated a global regulation of planet surface temperature in response to changes in the amount of solar radiation falling on the planet. This is achieved by a feedback loop between planetary albedo and daisy growth rates.

In 1998, a paper by Robertson and Robinson argued that if there were variation of optimum growth temperature within the daisy population, then the population should adapt towards the prevailing conditions. Robertson and Robinson showed that high rates of adaptation destroyed temperature regulation in Daisyworld. Lenton and Lovelock responded in a 2001 paper, claiming that with bounds on the range of adaptation, regulation was maintained. However, their model showed wide variation for different runs, with the mean of 10 separate runs showing only 'minor temperature regulation'.

In this project you will develop an evolutionary Daisyworld model in order to test the effects of adaptation on self-regulation. The goal is to better understand the effects of mutation rate and adaptive bounds on this model.

Reference: Lenton, T. M. and J. E. Lovelock (2001). "Daisyworld revisited: quantifying biological effects on planetary self-regulation." Tellus 53B: 288-305
[see also Evolutionary ecosystem project co-supervised with Alan Dorin]
An Agent-Based Model of Mexican Waves (12-pts)
Jon McCormack
Project-id: McCormack-Waves

The 2006/2007 summer cricket season was imbued with controversy as Cricket Australia banned the 'Mexican wave' across stadiums in Australia. Many fans reacted angrily, defying the new laws, potentially facing eviction from the ground and heavy fines. A campaign to 'save the wave' was quickly begin, with commentators saying the ban had gone too far, ruining people's enjoyment of the game, and participation as spectators.

The rational given to banning the Mexican wave is that, when throwing their arms in the air, some people choose to throw objects that fall on the people below, potentially risking serious injury.

For this project you will develop an agent-based model of crowd behaviour, designed to test theories about the initiation and propagation of the Mexican wave in stadium crowds. You will need to model agents (representing the people in the stadium) as well as the physical layout of the stadium itself.

Some questions to be asked of the model:

Superformula (12 or 24-pts)
Jon McCormack
Project-id: McCormack-Super

The so-called 'superformula' is generic geometric transformation developed by the engineer Johan Gielis. It is based on the idea that relatively simple parameterised formula can represent a wide variety of geometric shapes. For example, the circle, square, and ellipse are all members of the set |x/a|^n + |y/b|^n = 1. The superformula can represent a large number of natural profiles, particularly those found in plants. The original formulation was confined to two-dimensional space. The aim of this project is to develop a three-dimensional modelling system based on the use of the superformula. Some possibilities include the use of generalised cylinders (using superformula representations for profile and carrier curves), or implicit surface representations. The system developed should be applied to practical modelling of a variety of organic shapes.

For this project you will need to have successfully completed CSE3313 Computer Graphics. Knowledge of OpenGL and a hunger for mathematical visualisation would also be helpful.

Reference: Gielis, J. "A generic geometric transformation that unifies a wide range of natural and abstract shapes", American Journal of Botany 90(3): 333-338. 2003.

 


Human Comprehension of Organisation Charts (24-pts)
Kim Marriott
Project_id: Marriott-Charts
Organisation charts are one of the most common kind of diagrams. However, comparatively little is known about how people understand or use them. In this project eye-tracking equipment will be used to determine how the focus of attention moves around the elements in an organisation chart when it is first seen, and then used in subsequent tasks. Based on this the project will be to develop a model for focus of attention changes when viewing organisation charts and to test the model with eye tracking data. [The project can also be offered for other sorts of visual notations such as UML notations, advanced mathematics, etc]
 
Better Figure Placement in Documents (24-pts)
Kim Marriott
Project_id: Marriott-Figure
Automatic document layout is increasingly important topic because of the need to adapt a document's layout to different viewing environments and to dynamically generated content. An important part of document layout is where to place floating figures in the document. In this project eye-tracking equipment will be used to determine where people expect the figures to occur. Based on this the project will be to develop a function for measuring the quality of layout of the figures in a particular document and incorporate this into an automatic document layout tool.
 
Also see [G de la B] & M.

Bayesian Networks for Epidemiology (12 or 24-pts)
Ann Nicholson and Robin Bell (Women's Health Program, Dept. of Medicine)
Project-id: Nich-Bell-epidemiology
The project is to apply Bayesian Networks (BNs) to women's health epidemiological data to learn factors that are predictors of "well-being". Models will be implemented in CaMML [CaMML], a successful BN learner being developed at Monash, see [CaMML(click)]. The project will involve extending CaMML in various ways, applying it to epidemiology data, and incorporating expert knowledge into the networks.
Knowledge Engineering Dynamic Bayesian Networks for Ecological Risk Assessment (12 or 24 pts)
Ann Nicholson and Carmel Pollino (Monash Water Studies)
Project-id: Nich-Poll-ecology
Bayesian Networks (BNs) are graphical models for probabilistic reasoning, which are now widely accepted in the AI community as intuitively appealing and practical representations for reasoning under uncertainty. One use of BNs is for prediction, and within that general task, for the problem of risk assessment. We have two current projects in ecological risk assessment: (a) assessing the risk for native fish populations of water management interventions and (b) risk management of tropical seagrass. However, to date, the BN modeling in these projects has been done with so-called "static" BNs, where there is no explicit representation of time. Clearly, temporal modelling in such prediction tasks is very important, and it could (and should) be done using an existing sextension to BNs, called Dynamic Bayesian networks. The aim of this project is to develop knowledge engineering techniques and methodologies for Dynamic Bayesian networks, using the ecological risk assessment domains as case studies.
 
 
 
 
 
Also see [Korb & Nicholson].

Integrating auditory and visual stimuli using Self-Organizing Cortical Neural Networks (SoCNN) (12 or 24 pts)
Andrew Paplinski
Project-id:Paplinski-SoCNN
The project continues the development of a computer model of an integration of audio-visual information by human brain using Self-Organizing Cortical Neural Networks (SoCNN). A good starting point for the problem specification is given in our recent report:
http://www.csse.monash.edu.au/publications/2007/tr-2007-210-full.pdf
and the paper:
A P Paplinski and L Gustafsson: Feeback in multimodal self-organizing networks enhances perception of corrupted stimuli, Lecture Notes in Computer Science, Springer-Verlag, vol. 4304, pp 19-28, 2006.
The human cortex can be thought of as a linked two-dimensional hierarchical associative memory system. Such a structure is modeled by our SoCNN. By now we are able to model integration of phonemes and related phonetic symbols/letters. This project will concentrate on the integration of spoken and written words using SoCNN.

Dependable Software Services (12 or 24-pts)
Sita Ramakrishnan
Project-id: SR-services
As Service-oriented architecture (SOA) is being considered increasingly in critical applications by distributed enterprises, the provision of dependable services is becoming an important area of research. In this project, we consider dependability attributes of a software service in a sustainable manufacturing domain. Sustainable manufacturing is the employment of eco-efficient technologies and industry standards for engineering manufacturing systems to minimise environmental burdens of green house emissions and energy use. Dependability attributes can be expressed in terms of timeliness, availability and reliability of the correct service, and trustworthiness attributes such as confidentiality, integrity and maintainability of deployed services.
Providing a predictable level of dependability for services which have been composed from services from various providers is an important requirement. Standards such as SOAP, UDDI and WSDL [1] are being adopted by major web service providers. These services need to establish and adhere to standards and hence, the dependability of service will become a differentiating point for services. Existing standards do not provide sufficient information to make decisions about how dependable and available the various services & components are, and how they fail. Some of the challenges that we will be dealing with respect to component interactions and composition of services in the sustainable manufacturing domain are: how to ensure trust in correctness when dealing with global partners and services composed using their services how to ensure reliability when composing services from these partners who are outside the control of the system ...[see SR for more information]

Feature Weighting in Content Based Image Retrieval (24 or 12 pts)
Sid Ray
Project-id: Ray-feature
The purpose of a Content-Based Image Retrieval (CBIR) system is to retrieve images from a database such that the visual contents of retrieved images are similar to that of a query image. Often the image similarities are computed from the representation of visual contents such as colour, texture and shape, in terms of a multi-dimensional feature vector. Understandably, the features cannot be assumed to have equal weights. More important features deserve more weights and less important features deserve less weights. These weights can be assigned fully automatically from feature variation statistics in the database images. Better still is to use the so called relevance feedback (RF) mechanism in which the weights are modified iteratively based on users' response regarding the relevance of the retrieved images. The aim of this project is to investigate the feature weighting problem for both cases, namely, without RF and with RF.
Small Sample Size Effects on CBIR Accuracy (24 or 12 pts)
Sid Ray
Project-id: Ray-sample
In recent years Content-Based Image Retrieval (CBIR) has become one of the most active areas of research in image data mining. In studies involving the design of a CBIR system often the question arises whether the data set sizes used are big enough to rely on the accuracy achieved. The aim of this second project in CBIR is to study the impact, on accuracy, of the number of samples in the database, the number of samples per semantic category, the number of retrieved images returned to user for RF, the number of features used, and their relative sizes.
Hierarchical Feature Selection for Pattern Recognition (24 or 12 pts)
Sid Ray
Project-id: Ray-hierarchy
In statistical pattern recognition, often patterns are represented by a large number of numerical features. Although there is no conceptual justification in reducing the number of features to a small number, in practical problem solving, this becomes a necessary step due to the wellknown phenomenon of the 'curse of dimensionality' of the feature vector on the complexity of the pattern classifier. The aim of the proposed project is to develop a hierarhical feature selection paradigm that is well-suited to multi-class pattern recognition problems.
The project will comprise (i) study of some existing feature selection criteria followed by an experimental investigation of them, (ii) analysis of the above results leading to, hopefully, a hierarchical feature selection paradigm, and (iii) development of an interactive software tool for the above paradigm.
The methodology developed will be such that depending on the classification accuracy arrived at a certain stage, the user will have the option of increasing or decreasing the dimensionality value. The software tool will include procedures for displaying the distribution of pattern samples in different feature spaces obtained by different feature selection methods.
Texture Analysis of Images (24 pts)
Sid Ray
Project-id: Ray-texture
Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of segmentation and classification of images. The proposed project will deal with both of these problems. It will involve (i) investigating existing texture analysis methods from the point of view of their theoretical soundness as textural measures and (ii) investigating their practical applicability.
Individual dolphin dorsal fin identification (24 or 12 pts)
Sid Ray, David Dowe and Kate Charlton (School of Biological Sciences)
Project-id: Ray-fin
Port Phillip Bay, Victoria, is home to a small and genetically unique population of bottlenose dolphins, Tursiops sp. Individuals within this population have been identified via photo-identification of the dolphins' dorsal fin. Dorsal fins show a unique shape and through time develop permanent marks and notches. Currently, the digital images are being assessed for these permanent marks and assigned an identity by human 'eye'. This can be incredibly time consuming, requires permanent notches on the fin and a trained eye to pick-up small notches. The aim of the project is to create a methodology/software that not only uses the notches but shape of the fin to assign identity with high levels of accuracy.
Some references:
1. http://www.dolphinresearch.org.au

2. J. D. Adams, T. Speakman, E. Zolman, and L. H. Schwacke, "Automatic
Image Matching, Cataloging, and Analysis for Photo-Identification
Research," Aquatic Mammals, Vol. 32, No. 3, pp. 374-384, 2006.
(pdf file available from Sid on request)

3. C. Gope, N. Kehtarnavaz, G. Hillman, and B. Wursig, "An Affine
Invariant curve matching method for photo-identification of marine
mammals, Pattern Recognition, Vol. 38, pp. 125-132, 2005.
(pdf file available from Sid on request)


Development of collection-specific texture features for image retrieval (12 or 24-pts).
David Squire
Project-id: Squire

Independent Component Analysis (ICA) is a statistical technique for discovering hidden factors underlying a set of random variables. ICA can be used to discover filters that can be used to characterize visual textures in a set of images. The filters produced by ICA are adapted to the statistics of the set of images used to derive them. It should thus be possible to use ICA to derive a set of filters specifically tuned to the textures in a collection of images from a restricted domain. We will investigate the discovery and selection of a set of ICA filters for image collections from domains such as dermatology.

The system will be developed within the framework of the GNU Image Finding Tool (GIFT) (http://www.gnu.org/software/gift/). The GIFT is an open framework for content-based image retrieval. Use of the GIFT framework will means that researchers working on a project can focus on the problems of immediate interest and importance to the project, rather than having to develop and entire image retrieval system, including user interface, feature extractors, indexing tools, database accessor, user interface, and feedback/learning system from scratch.

[See also Squire/Tischer project]

 


Bi-clustering of Microarray Data (12 or 24-pts)
Peter Tischer
Project-id: Tischer-cluster
DNA Microarray data can be regarded as a two-dimensional array where
each cell of the array gives the level of activation of a gene. Each column
may represent a particular gene while each row may represent a particular
subject. The aim of bi-clustering DNA microarray data is to identify
groups of genes which behave in a similar way in a group of subjects.
For example, some of the subjects may all have a particular disease
and we may be intereseted in discovering which genes are active when people
have this disease.

This project involves evaluating and comparing a number of techniques
which have been published in the literature. It also involves
implementing some techniques that make use of Minimal Cost Spanning
Trees. This project does not require any special prerequisite knowledge
and could be taken as a 12 point BSE honours project.

Triangle-Projection and its Use in Scientific Visualisation (12 or 24-pts)
Peter Tischer
Project-id: Tischer-vis
In Scientific Visualisation we are interested in taking high dimensional
data and projecting it to two or three dimensions. The 2 or 3 dimensional
data can be displayed as an image and the human visual system is very good
at detecting patterns in pictures. The aim of the project is to implement
and evaluate an approach for projecting higher dimensional data to two
dimensions. It aims to take points which are close together in the
higher dimensional space and keep the projections close together in the
two-dimensional space.

This project does not require students to have any background in computer
graphics and can be taken as a 12 point BSE honours project.

Minimal Cost Spanning Graphs (12 or 24-pts)
Peter Tischer
Project-id: Tischer-mcst
Minimal Cost Spanning Trees (MCSTs) are useful in scientific visualisation,
designing networks, and in hierarchical clustering of data. Minimal Cost
Spanning Graphs (MCSGs) are a generalization of MCSTs. The aim of the
project is to consider how MCSGs can be computed efficiently and how they
can be used in place of MCTs in areas like fault-tolerant networks,
scientific visualisation and hierarchical clustering of data.

This project can be taken as a 12 point BSE honours project.

Evaluating Ways of Evaluating Classifiers (24-pts)
Peter Tischer
Project-id: Tischer-class
In supervised classification we are given information about a number
of items. Each item has a same number of attribute values and it also has
a special attribute, its class label. A classifier is a way of constructing
a function which can be given the attributes for an item and returns a value
for the class label. For example, a spam filter might be given an e-mail
and return a value indicating whether the e-mail is spam or not.

Many different ways of constructing classifiers have been proposed.
However, it is difficult to compare different schemes for constructing
classifiers as the classifiers might be applied to problems with different
numbers of classes. The aim of this project is to implement two methods
for evaluating classifiers and to compare these evaluation techniques against
techniques published in the literature. One of the new evaluation techniques
involves the straightforward application of Minimal Message Length Inductive
Inference (MML).

This project does not require any special background knowledge at third
year level.

Segmenting Skin Images as part of Skin Cancer Diagnosis (12 or 24-pts)
David Squire and Peter Tischer
Project-id: Squire-skin
The aim of this project is to use a database of dermatological images as an aid in detecting and diagnosing skin cancers. This is an example of a content-based image retrieval problem. Given a specific image, the aim is to find those images in the database which are the most relevant. This process can be aided greatly if the user can indicate regions-of-interest so that the image in areas other than the regions-of-interest can be ignored. The aim of the program is to make it easy to specify the region-of-interest without having to select each pixel manually.
 

Inferring biological function from the evolutionary tree (24-pts).
Geoff Webb
Project-id: Webb

This project will involve working with a team of computer scientists and
biologists to analyse biological data about proteins and how they have
evolved in order to infer how they function.
Data mining of protein refolding databases
Geoff Webb and Ashley Buckle (Medicine)
Project-id: Webb-Buckle

The aim of this project is to analyse a large dataset of proteins that are used in refolding experiments, in order to improve our understanding of the refolding process. We hope to discover relationships between a proteins characteristics and its behaviour in the laboratory. We are also very interested using this information to provide scientists with real experimental protocols in a completely automated fashion. See http://refold.med.monash.edu.au.