Honours projects for the Computer Science (BCS) and Software Engineering with Honours (BSE, CSE4402) for 2007.
Note that some BSE projects are 12-pts, and that BCS projects are 24-pts. Some projects come in 24- and 12-pt versions; the latter involves less work.
The Minimum Message (MML) inductive inference approach to machine learning
seeks the shortest explanation of the data by encoding the data in a two-part
message - the first part states a model and the second part gives the details
of the data using the stated model. MML has been successfully used in a wide
range of problems including, clustering and unsupervised classification;
time series; image processing; and data compression.
Support Vector Machines (SVM) are a very popular and important method
for doing classification in machine learning. They been successfully
applied to a number of applications including classification of DNA
microarray expression data; detection of microcalcifications in
mammograms; face detection; novelty detection; and text classification.
However, there are difficulties with the traditional approaches to SVMs.
The main one is that SVMs do not provide probabilities with their
classifications.
The aim of this project is to investigate how to apply MML
inductive inference to SVMs to develop a method of probabilistic
classification that combines the benefits of these two general methods.
[FW02] G. E. Farr and C. S. Wallace.
The complexity of strict minimum message length inference.
Computer Journal, 45(3), pp.285-292, 2002 [doi:10.1093/comjnl/45.3.285]
Given K DNA sequences, or protein sequences, the multiple-alignment problem and the evolutionary-tree problem are "chicken and egg" problems: Given an optimal evolutionary-tree one can search for an optimal multiple-alignment of the K sequences (and we know how to do that [LW94]), and given an optimal multiple-alignment of the K sequences one can search for an optimal evolutionary-tree.
An evolutionary-tree is a special case of a Bayesian net [KN04]: a tree rather than a general DAG, with discrete variables (over {A,C,G,T} if DNA) at each node, known values at all leaves, and missing values at all internal (ancestral) nodes, e.g., see fig.4 [LW94].
The project is to modify the CaMML algorithm for Bayesian nets to infer evolutionary trees given a multiple alignment. If that is successful, the feasibility of simultaneously solving the multiple-alignment and evolutionary-tree problems will be investigated.
The project involves algorithms, probability and machine learning. You should have good results in mathematics, algorithms and data structures, and formal methods, and must have discussed the project with LA or KK before submitting "the form".
[LW94] L. Allison and C. S. Wallace.
The Posterior Probability Distribution of Alignments and
its Application to Parameter Estimation of Evolutionary Trees and
to Optimisation of Multiple Alignments.
Jrnl. Molec. Evol., 39(4), pp.418-430, 1994
http://www.csse.monash.edu.au/~lloyd/tildeStrings/Multiple/94.JME/
or
http://dx.doi.org/10.1007/BF00160274
[KN04] K. B. Korb and A. E. Nicholson.
Bayesian Artificial Intelligence.
Chapman and Hall / CRC, 2004.
References:
Uludag, U.; Pankanti, S.; Prabhakar, S.; Jain, A.K.;"Biometric cryptosystems:
issues and challenges",
Proceedings of the IEEE, Volume 92, Issue 6, June 2004 Page(s):948 - 960.
Important questions that will be addressed by the student include:
What are the necessary and sufficient conditions for the evolution of virtual
ecosystems?
How does group selection compare against individual selection under different
circumstances?
New computer graphics visualisation tools will need to be developed in order
to interpret the results of the simulations conducted during the course of
the project. Students will require a sound programming knowledge (C/C++ preferred,
CSE2305, CSE3308) and experience with computer graphics programming (OpenGL,
CSE3313) and multimedia (CSE2325/3325).
Wade, M.J. 1976. Group selection among laboratory populations of Tribolium. Proceedings of the National Academy of Sciences U.S.A. 73: 4604—4607
Wade, M.J. 1977. An experimental study of group selection. Evolution 31: 134—153
Some introductory reading:
Dawkins, R., The Blind Watchmaker. 1987, New York: W.W. Norton & Co.
Epstein, J.M. and R. Axtell, Growing Artificial Societies, Social Science from the Bottom Up. 1996, Washington D.C.: Brookings Institution & MIT Press.
Yaeger, L. Computational Genetics, Physiology, Metabolism, Neural Systems, Learning, Vision and Behavior or Polyworld: Life in a New Context. in Artificial Life III. 1992: Addison-Wesley.
References
Le Provost, T., Wallace, M., Domain independent propagation.
Proc. Int.Conf. on 5th Gen. Comp. Systems, pages 1004--1011, 1992
http://citeseer.ist.psu.edu/leprovost92domain.html M. Maher. Propagation completeness
of reactive constraints.
Proc. ICLP'02, pages 148-162, 2002
Scheduling problems appear in all walks of life: from airlines to hospitals, and from offices to sports leagues. They involve performing a set of jobs, each requiring a sequence of tasks to be completed using a set of resources under some time constraints. The specific resource and duration for each task is fixed, and two tasks can't use the same resource at the same time. As a simple example, consider a computer game in which the components of a gravity gun, a radiation suit and a bullet-time disruptor need to be built according to the following schedule:
| Task_1 | Task_2 | Task_3
-------------------------------------------------------------
Grav. Gun | 6 hours at M1 | 50 hours at M4 | 17 hours at M3
Rad. Suit | 25 hours at M2 | 15 hours at M1 | 10 hours at M4
Disruptor | 16 hours at M1 | 5 hours at M2 | 20 hours at M3
where M1,M2,M3, and M4 are four machines. The aim is to schedule the tasks to complete all jobs in the minimum amount of time. These problems can be modelled by associating a variable with the start time of each task (9 variables in our example above), and searching for the best compatible start times. Alternatively, they can be modelled by associating a variable with each pair of tasks on each resource (6 variables), and searching for the best task-orders. This project addresses the assumption that underlies many scheduling algorithms: that deciding task-orders is better. When, if ever, is this assumption false?
As part of an on-going process to better utilize the advantages of information technology in a university learning environment, a recent student project looked at ways of extracting course and unit information from publically available handbooks, and turning this into an "expert system" that could provide course advice to students.
One outcome from this project was the automatic rendition of a Prolog program that was tailored to a given student at some arbitrary point in his or her course. Knowing what units had been completed, what units the course structure required, and what units might be undertaken in the next semester, the program would not only supply the student with a list of possible units that could be taken over the next semester, but also would show what units had to be taken in order to complete the course with (say) a specialization in a particular field.
The project showed the feasibility of this approach, but did not deliver a workable prototype. This project would be to take the existing work, and (re)engineer it to the point of completing a workable prototype that could be employed via say a web interface, or faculty kiosk.
Knowledge of Prolog would be useful, although not essential. A copy of the thesis may be found at http://www.csse.monash.edu.au/~ajh/research/PfreyThesis.pdf
See http://www.csse.monash.edu.au/~annn/poker/poker.html for links to Honours projects, online versions of research papers, and to play against the latest version of BPP online.
The 'Daisyworld' model of Ecologist James Lovelock, first published in 1983, is a simple model of planetary self-regulation, bi-stability and homeostasis. It is considered a 'parable' of planetary self-regulation, achieved via feedback between life and its environment. The original model had a simple gray plant, populated by two species of daisy: one black and one white. The number and proportion of the daisy species alter their local temperature in opposite directions (a white daisy reflects solar radiation, a black one absorbs it). The Daisyworld model demonstrated a global regulation of planet surface temperature in response to changes in the amount of solar radiation falling on the planet. This is achieved by a feedback loop between planetary albedo and daisy growth rates.
In 1998, a paper by Robertson and Robinson argued that if there were variation of optimum growth temperature within the daisy population, then the population should adapt towards the prevailing conditions. Robertson and Robinson showed that high rates of adaptation destroyed temperature regulation in Daisyworld. Lenton and Lovelock responded in a 2001 paper, claiming that with bounds on the range of adaptation, regulation was maintained. However, their model showed wide variation for different runs, with the mean of 10 separate runs showing only 'minor temperature regulation'.
In this project you will develop an evolutionary Daisyworld model in order to test the effects of adaptation on self-regulation. The goal is to better understand the effects of mutation rate and adaptive bounds on this model.
The 2006/2007 summer cricket season was imbued with controversy as Cricket Australia banned the 'Mexican wave' across stadiums in Australia. Many fans reacted angrily, defying the new laws, potentially facing eviction from the ground and heavy fines. A campaign to 'save the wave' was quickly begin, with commentators saying the ban had gone too far, ruining people's enjoyment of the game, and participation as spectators.
The rational given to banning the Mexican wave is that, when throwing their arms in the air, some people choose to throw objects that fall on the people below, potentially risking serious injury.
For this project you will develop an agent-based model of crowd behaviour, designed to test theories about the initiation and propagation of the Mexican wave in stadium crowds. You will need to model agents (representing the people in the stadium) as well as the physical layout of the stadium itself.
Some questions to be asked of the model:
The so-called 'superformula' is generic geometric transformation developed by the engineer Johan Gielis. It is based on the idea that relatively simple parameterised formula can represent a wide variety of geometric shapes. For example, the circle, square, and ellipse are all members of the set |x/a|^n + |y/b|^n = 1. The superformula can represent a large number of natural profiles, particularly those found in plants. The original formulation was confined to two-dimensional space. The aim of this project is to develop a three-dimensional modelling system based on the use of the superformula. Some possibilities include the use of generalised cylinders (using superformula representations for profile and carrier curves), or implicit surface representations. The system developed should be applied to practical modelling of a variety of organic shapes.
For this project you will need to have successfully completed CSE3313 Computer Graphics. Knowledge of OpenGL and a hunger for mathematical visualisation would also be helpful.
Reference: Gielis, J. "A generic geometric transformation that unifies a wide range of natural and abstract shapes", American Journal of Botany 90(3): 333-338. 2003.
2. J. D. Adams, T. Speakman, E. Zolman, and L. H. Schwacke, "Automatic
Image Matching, Cataloging, and Analysis for Photo-Identification
Research," Aquatic Mammals, Vol. 32, No. 3, pp. 374-384, 2006.
(pdf file available from Sid on request)
3. C. Gope, N. Kehtarnavaz, G. Hillman, and B. Wursig, "An Affine
Invariant curve matching method for photo-identification of marine
mammals, Pattern Recognition, Vol. 38, pp. 125-132, 2005.
(pdf file available from Sid on request)
The system will be developed within the framework of the GNU Image Finding Tool (GIFT) (http://www.gnu.org/software/gift/). The GIFT is an open framework for content-based image retrieval. Use of the GIFT framework will means that researchers working on a project can focus on the problems of immediate interest and importance to the project, rather than having to develop and entire image retrieval system, including user interface, feature extractors, indexing tools, database accessor, user interface, and feedback/learning system from scratch.
This project involves evaluating and comparing a number of techniques
which have been published in the literature. It also involves
implementing some techniques that make use of Minimal Cost Spanning
Trees. This project does not require any special prerequisite knowledge
and could be taken as a 12 point BSE honours project.
This project does not require students to have any background in computer
graphics and can be taken as a 12 point BSE honours project.
This project can be taken as a 12 point BSE honours project.
Many different ways of constructing classifiers have been proposed.
However, it is difficult to compare different schemes for constructing
classifiers as the classifiers might be applied to problems with different
numbers of classes. The aim of this project is to implement two methods
for evaluating classifiers and to compare these evaluation techniques against
techniques published in the literature. One of the new evaluation techniques
involves the straightforward application of Minimal Message Length Inductive
Inference (MML).
This project does not require any special background knowledge at third
year level.