CSSE / Monash Honours Research Projects 2002
About CSSE Courses Our People Research Student Information Community Links Internal Information

Computer Science, Digital Systems

(NB: This list may change up until mid-February 2002. Supervisors may also offer projects that are not explicitely mentioned in this list. It pays to talk to potential supervisors who research in areras that you are interested in.)

These are the projects being offered for honours research projects for Honours students in 2002.

Notes

Supervisors


The CSSE Interactive Video Wall

David Abramson

The School of CSSE operates a MPEG-2 low latency video wall connection between the staff common rooms at the Caulfield and Clayton campuses. The system consists of a set of FVC CODECS, high output data projectors and cameras.

This project concerns development of a number of supporting systems for the video wall. These include technologies for optimizing the field of view, the addition of software to detect when users are present and also for varying the network bandwidth requirements automatically. Students will be exposed to some leading edge digital video equipment and will be required to interface with the University networking hardware. The project will may involve both hardware and software, depending on the skills that the student has.

Multi-user games for an interactive video wall

David Abramson

The School of CSSE operates a MPEG-2 low latency video wall connection between the staff common rooms at the Caulfield and Clayton campuses. The system consists of a set of FVC CODECS, high output data projectors and cameras.

This project will explore ways of supporting conventional group games like board games, cards and chess, etc, across the video wall. This would allow users to play games where no physical contact is possible.

Master-Slave Parallel Processing using Nimrod

David Abramson

This project concerns extensions to a parallel computing paradigm embodied in the tools like Nimrod and Clustor. These tools support the parallel execution of computational models which are totally independent. Consequently, there is no interprocess communication required and data is usually transmitted via the file system. However, there are a class or problems that can be supported by minimal communication between a master process and a set of slaves. For example, a parallel genetic algorithm might perform most of its computations by separate slave processes, but still require occasional communication with a master which recombined the results from the slaves and redistributes them.

This project will address efficient mechanisms for allowing a master-slave type parallel program to execute under Nimrod, thereby extending its capabilities.


Iterative algorithms for Support Vector Machines generation

David Albrecht

Joint supervisor: Adam Kowalczyk and Herman Ferra (Telstra Research Labs)

Recently, Telstra Research Laboratories has been investigating a machine learning method known as Support Vector Machines (SVMs). They have shown that SVMs possess interesting properties and have demonstrated that they give excellent performance in a number of practical applications, in particular text categorisation. In typical applications, labelled examples are scarce, while unlabelled examples are plentiful. Learning algorithms which can effectively utilise the unlabelled examples will have a definite advantage. In this collaborative project with Telstra Research Laboratories, we will investigate developments of algorithms suitable for such learning paradigm. We shall focus on transductive inference using Support Vector Machines in particular.

Clustering (alias Supervised Classification, Mixture Modelling, etc.)

Lloyd Allison

The project is to add an MML clustering "plugin" to the CDMS data-mining system. This can be done in one of two ways
  1. in Java using the CDMS library of probability distributions etc.
  2. by modifying an existing `C' program and joining it to CDMS through the C:Java interface.
It could well be worthwhile to do both.

Once the basic clustering plugin is added and thoroughly tested, extensions to e.g. [1-D time-series] data and higher-dimensions, e.g. images, could be considered.

Graph Based Models for Data Mining

Lloyd Allison and Trevor Dix

Workers in CSSE have investigated MML [decision-trees], decision graphs (more properley classification-trees etc.) and related structures for data mining for the `supervised classification problem'. (For references, search for Dtree or for Dgraph in the [bibliography].)

The project is to develop a "plugin" for this kind of Model for the CDMS data-mining system.

Artificial Neural Net for CDMS

Lloyd Allison and David Dowe

An artificial neural network (ANN) "plugin" is being developed for the CDMS data-mining system. The project is to develop
  1. the MML theory (and its implementation) of ANNs,
    1. which will allow us to choose between "simple" and "complex" ANNs thus
    2. preventing over-fitting,
    and
  2. the control and visualization of the ANN through the CDMS GUI.

Bioinformatics: Analysing Microarray Data

Lloyd Allison and Trevor Dix

A `micro-array' experiment can show the `level of expression' of tens of thousands of genes on a single microscope slide. Unfortunately the "noise level" in such experiments is high, so they are usually replicated several times for each set of experimental conditions but even so correctly dealing with the noise is difficult. Microarrays are often used to compare gene expression levels of an organism under two or more conditions. e.g. 1: Under normal conditions and when some control gene is mutated. e.g. 2: At different times in a life-cycle, or in some other cycle.

The project is to create tools in the CDMS data-mining system to analyse micro-array data. This may involve developing statistical models of the problem, working out the (MML) complexity of such models, and implementing the above as a plugin to CDMS with controls and visualization through the CDMS GUI.

Parallel Programming Language [-LA]

Lloyd Allsion

A simple notion of `parallel-process' consists of:
  1. Channel, (typed) along which Values are sent and received.
  2. I/O action, to send or receive a Value on a channel.
  3. Process which carries out a sequences of actions and may stop.
  4. Composition of processes, in parallel or by non-deterministic choice.
Note that channels, actions and processes are Values and can be passed through channels.

A proof-of-concept, untyped interpreter exists. The project is to

  1. develop a practical, typed system, including
    1. execution in more than one (Unix-)process and
    2. on more than one computer,
    and
  2. investigate the use of the system in
    1. typical resource-management ~ operating-system tasks, and/or
    2. generalized search and optimization algorithms.

Overloading and/or Classes in Functional Programming

Lloyd Allsion

A light-weight functional programming language is being developed. It has a parametric polymorphic type system. The project is to explore one or more ways of incorporating overloaded operators and functions
  1. by a class system, as in Haskell where
    1. each Value has a Type,
    2. each Type can belong to one or more Classes, and
    3. each Class is specified by the operators and functions that must be defined for Types in that Class,
  2. by declaring certain operators and/or functions to be overloaded.

The system is intended to be used

  1. for general purpose programming, and
  2. (coincidentally) as the basis of a scripting language in the CDMS data mining system.

TRANSMISSION CONTROL PROTOCOL (TCP) FLOW CONTROL SIMULATION AND VISUALIZATION

Jim Breen

TCP is probably the most important protocol in the Internet, and its ability to respond appropriately to errors and packet loss is of vital importance. TCP's flow control mechanisms are fully understood by very few people, and are in fact quite difficult to teach. This proposed project is to develop a simulation of TCP that can be used as a teaching and demonstration tool. The simulation would be embedded in a visualization environment that would enable the user to: (a) select particular connection conditions between the two TCP clients (RTT, packet loss, available bandwidth) (b) select different TCP implementation options (Reno, Reno 2, Long Fat Pipes, etc.) (c) make adjustments to these while the simulation is running and observe the packet-by-packet operations of TCP and its reaction to the changing environment. The project would involve developing or modifying an implementation of TCP, including all the latest extensions (something Microsoft has not yet done).

Generic interactive theorem proving

John Crossley with Iman Poernomo

This project involves the development of an environment for (visual) interactive theorem proving. As a basic requirement, the environment should permit the user to write proofs in traditional predicate logic. However, logicians study many other logics. The design of our environment should be generic enough to represent proofs of some of these other logical systems. This will require approaching the project with a meta-level view. Thus it will require NOT the implementing of well-known logical rules but a general framework for taking (almost) anything that could be regarded as a logical rule and automatically encoding all instances of it. The student will be expected to

The student has a choice of implementation language: .NET, C# or Java. It is assumed that, on completion of the project, the student will have a good working knowledge of XML and reflection. However, no prior knowledge of these concepts is assumed. A familiarity with at least the formal logic discussed in Formal Methods I is essential.


Assembling Genomic Tag Sites

Trevor Dix

This is a bioinformatics project applying computer science to analyse data from microbiology, in this case genomic DNA information. Fortunately, we can view this problem purely as computer science. The problem involves assembling fragments of DNA in a consistent way. The fragments have been marked at so called tag sites. In effect, we need to align these sites. Trouble is the data is noisy. Some sites should have tags and don't; other sites are wrongly marked. Fortunately we can apply the minimum-message-length (MML) principle that was developed at Computer Science at Monash (CSSE) to this problem. We can use MML as we search for the best explanation of the observed data, including the noise. We also need to be able to view the results.

Clustering Microarray Data

Trevor Dix and Lloyd Allison

New technology allows tens of thousands of test tubes to be replaced by a single microscope slide - a `microarray'. There are two types - one using VLSI-like masks to built up arrays of molecules on a slide, the other using inkjet printer-like techniques to print a library of molecules onto a slide. Each dot on the microarray binds to complementary messenger RNA, mRNA, labelled with a flourescent dye. Spots where labelled mRNA is bound are detected using a scanner. Microarrays are of great interest to the Victorian Bioinformatics Consortium of which several CSSE staff are members. Microarrays can be used to reveal gene expression under varied experimental conditions. The problem is that the data are very "noisy" with considerable experimental error, variation and noise. The project is to use machine learning techniques developed in CSSE to extract significant results from the raw data.

 

Alan Dorin

The projects Alan Dorin and Jon McCormack are offering in graphics, animation and artificial life can be found at: [ http://www.cs.monash.edu.au/~cema/projects/hons2001/hons.html].

Jon McCormack

Jon will be on OSP in semester 1 and will not offer any projects.

Linda McIver

Linda McIver undertakes research in the areas of She may also offer other projects in these areas that are not listed here.

Automated Web Navigation

Linda McIver

Navigation information is a critical component of a usable website. Many sites provide incomplete site maps, with poor categorisation, so it can be very difficult to track down the information you need. Once you are deep in the site hierarchy, it can be particularly difficult to work out where you are in relation to the rest of the site, and what else the site offers. Good navigation information provides answers 3 questions: Where am I? Where can I go from here? Where do I want to be? Much of this information can be extracted quite simply from collections of pages, in order to construct a consistent navigation mechanism for each page. This project aims to provide tools for automatic extraction of navigation information, as well as construction of customisable navigation mechanisms for a website. A natural extension of this might be an automated indexing system which uses the text on each page to construct a site map to guide the user to the desired information quickly and easily.

Visualization of Web Queries Results and Website Structure

Bernd Meyer

With the rapidly increasing size of the world-wide web, the visualization of web sites, their contents and structure is quickly becoming a neccessary tool for managing the complexity of the web. Visual representations are extremely helpful for analyzing and interpreting the results of world-wide web searches, and they can also be used to understand and optimize the structure of individual sites or intra-nets.

The project builts upon previous projects for the visualization of query results and web site structures. It will aim to create systems that provide effective visualizations based on the combination of different types of information: (a) structure-based representations and (b) contents-based visualizations. Structure-based visualizations analyze the link structure and/or traffic information and use them to produce typically network-like site maps. Contents-based visualizations analyze the text contents (and/or keywords, metatags etc.) in web pages and produce displays that make it easier to judge the similarity of two pages.

We will put particular emphasis on the utilization of innovative visualization metaphors, such as spatialization, and the questions how these can be used in a browser to support efficient explorative web searches.

Experience in Java programming is required and knowledge about interfacing with web browsers and plug-in programming would certainly be a plus.

Artificial Immune Systems for Plagiarism Detection

Bernd Meyer

Artificial Immune Systems (AIS) are a new class of nature-inspired methods that are becoming increasingly popular in security applications such as network intrusion detection and virus protection as well as for adaptive optimization.

In a broad sense, AIS are close relatives of genetic algorithms (GA). Their function is to learn to distinguish known or expected patterns (such as types of network traffic) from patterns previously not encountered. AIS algorithms are inspired by and roughly modelled after the function of the human immune system.

Many of the basic building block of AIS resemble components of GAs and AIS can indeed be interpreted as performing a kind of fast micro-evolution. However, the objective of AIS is very different from that of GA: AIS generalize, whereas genetic algorithms specialize. More precisely, AIS generate or "evolve" a large number of partial recognizers that, taken as a whole population, cover all known cases. This means that the objective of their learning process is generalization. Genetic algorithms, on the other hand, attempt to breed a population in which each individual is specialized towards the same narrowly defined objective.

AIS have previously been proposed and used for various kinds of intrusion and fraud detection, such as network intrusion, credit card fraud etc. This project will explore how the ideas of AIS can be used to support the automatic detection of plagiarized program code. The basic idea is that the system will be "immune" to previously encountered programs so that it will have an "immune reaction" for every genuinely new solution. However, in the case of plagiarized code, no immune reaction will take place, because the AIS will already have been immunized with related forms of stimuli.

Declarative Languages for Reasoning with Diagrams

Bernd Meyer and Maria Garcia de la Banda

Diagrams are extensively used not only in scientific and technical disciplines, but also in many areas of everyday communication. The long term purpose of this project is to provide computational methods and infrastructure for smart user-interfaces that communicate with the user via diagrams. These systems need to be able to interpret or "understand" diagrams that the user draws, translate them into other specifications and to generate such diagrams as output from specifications. The project aims to create a declarative logic language for computational reasoning with diagrams. Constraint logic programming (CLP) languages are a good basis for such an undertaking due to their high-level nature and their support for geometric constraints.

Three honours sub-projects in this area are immediately available. These are intended to supply the technical groundwork for diagrammatic reasoning languages.

Students wanting to tackle these projects should preferably have a working knowledge of Prolog.

Layout of Complex Diagrams with Evolutionary/Genetic Algorithms

Bernd Meyer

Automatic layout of diagrams is of growing practical importance in many application areas, but it proves to be conceptually difficult and computationally expensive. The layout of simple node-edge graphs is meanwhile, after more than a decade of dedicated research, relatively well understood, but little attention has been paid to more complex types of diagrams such as state charts or message sequence charts in UML.

The project will study the application of stochastic optimization methods, such as genetic algorithms, to the layout of more complex types of diagrams. We are particularly interested in the layout of Hi-Graphs and state charts. Important questions are the definition of appropriate layout esthetics, the design of evolutionary methods for these esthetics, and possibly the hybridization of such methods with other layout techniques.


David Dowe

David Dowe is interested in Minimum Message Length (MML) inductive inference. The MML principle is particularly useful in machine learning, statistics, "knowledge discovery" and "data mining". Both theoretical and applied projects are available, some of which are listed below, and all of which you should feel free to discuss with David Dowe. Areas of interest include clustering and mixture modelling, the von Mises circular distribution, single and multiple factor analysis, supervised learning, decision trees and decision graphs with or without leaf regressions, sequentially and spatially correlated data, protein folding, DNA string alignment, the human genome project and market forecasting. All of these would be done by MML. I am also interested in MML inference of neural nets. There is no need to have done any 3rd year subject on AI for any of these projects.

Hypothesis testing and model selection by MML and other methods

David Dowe

We compare various forms of hypothesis testing in machine learning, statistics and ``data mining''. Some and possibly all of the tests we consider will be: classical, (maximum) likelihood-based tests, (Bayesian) Minimum Message Length (MML) tests and other, non-MML, Bayesian tests. We will compare MML and other Bayesian hypothesis testing with classical hypothesis testing. Preliminary results indicate that classical hypothesis testing does not work as well as Bayesian hypothesis testing. Nonetheless, we endeavour to explain why classical hypothesis testing works as well as it does. A good mathematical background will be necessary, and a background in statistics would be an unnecessary bonus.

References :

Wallace, C.S. and D.M. Boulton (1968), An information measure for classification, Computer Journal, 11(2), pp185-194.

Wallace, C.S. and D.L. Dowe (1999). Minimum Message Length and Kolmogorov Complexity, Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp270-283. http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/ http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf

A part of a chapter in Em. Prof. Chris Wallace's book.

MML, Occam's razor, inference and prediction

David Dowe

The Minimum Message Length (MML) principle of machine learning, statistics, inductive inference, "knowledge discovery" and "data mining" is an operational form of Ockham's razor. MML (Wallace and Boulton, 1968; Wallace and Dowe, 1999) gives a quantitative trade-off between the simplicity of a theory and how well if fits the data, and seeks a simple theory that fits the data well. An attempt to discredit Ockham's razor was given by Murphy and Pazzani (1994), and this has been responded to by Needham and Dowe (2001). A more recent attempt to discredit Ockham's razor has been given by Webb (1996). This work seems to leave itself open to many responses, both in terms of theory and empirical experiments. The Hons. project is to continue the work of Needham and Dowe (2001) and, at least initially, to respond to (and refute?) Webb (1996).

A good mathematical background will be necessary.

References :

Needham, S.L. and D.L. Dowe (2001). Message Length as an Effective Ockham's Razor in Decision Tree Induction. Proc. 8th International Workshop on Artificial Intelligence and Statistics (AI+STATS 2001), pp253-260, Key West, Florida, U.S.A., Jan. 2001.

Wallace, C.S. and D.M. Boulton (1968), An information measure for classification, Computer Journal, 11(2), pp185-194.

Wallace, C.S. and D.L. Dowe (1999). Minimum Message Length and Kolmogorov Complexity, Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp270-283. http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/ http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf

C.S. Wallace and J.D. Patrick, `Coding Decision Trees', Machine Learning, 11, pp7-22, 1993.

G. I. Webb, Further Experimental Evidence against the Utility of Occam's Razor, J. Artif. Intell. Research 4 (1996), pp397-417.

MML Inference of Support Vector Machines

David Dowe (and David Albrecht).

Minimum Message Length (MML) is a universal principle in machine learning, "data mining" and statistics. It dates back to Wallace and Boulton (1968) and is closely connected to the notion of algorithmic information theory (Wallace and Dowe, 1999). An increasingly popular approach in machine learning is Support Vector Machines (SVMs) (Vapnik, 1995). SVMs have been used in various problems, including classification, regression analysis and the building of decision trees. They possess interesting properties and have been demonstrated to give excellent performance in a number of both theoretical and practical applications. Recently some work has been done in presenting SVMs in an MML framework. The aim of this project is to continue this work and to break new ground by applying the SVMs to various problems such as classification of data into multiple classes. Possible approaches will include a combinatorial coding scheme and a scheme for encoding the separating hyperplane. A good mathematical background will be necessary. Programming might be in Java, but this is negotiable.

References:

Vapnik, V.N. (1995), The Nature of Statistical Learning Theory, Springer.

Wallace, C.S. and D.M. Boulton (1968), An information measure for classification, Computer Journal, 11(2), pp185-194.

Wallace, C.S. and D.L. Dowe (1999), Minimum Message Length and Kolmogorov Complexity, Computer Journal, 42(4), pp270-283.

Well-behaved long-period pseudo-random number generators

David Dowe

Pseudo-random number generators (RNGs) are useful for computer games including dice-rolling, card-dealing and bingo, as well as for computer simulation and for search strategies such as simulated annealing and genetic algorithms. For a pseudo-RNG to be of any practical use in these and other applications, it needs to have a long period (length of cycle) and to exhibit very low levels of correlation between the bits it generates and other signs of apparently random behaviour. This project will initially involve tests of randomness (as given by D. E. Knuth and others) on an RNG of period (2^32) * (2^32 - 1) due to C.S. Wallace (1989), as well as other RNGs due to G. Marsaglia and others (see WWW site below). A main aim of this project will be to extend the Wallace generator to periods (2^32) * (2^32 - 1) * (2^32 - 3) and higher while still maintaining its randomness.

Reading : 1) C. S. Wallace, A long-period pseudo-random generator, Tech Rept #89/123, Dept of Computer Science, Monash University, February 1989.

2) Find Random Number Generator by doing the following:

      1.  Go to here
      2.  Click on " C Programming " on left of the screen
      3.  Click on " Code Snippets "
      4.  Under the heading " Portable functions and headers " click on
          " Random number functions "
      5.  Click on " Rand1.C "

MML Inference of Musical Composition Style

This project aims to develop programs for inferring something about a musical composer's composition style, just from records of pieces written. Initially, a program capable of inferring very simple aspects of composition - such as rhythm, melody and/or pitch - would be developed, and tested. The testing would include inference of style of appropriately simple computer composers and inference of style of human composers. The testing would also include listening to some of the compositions made using the inferred composition style(s). A long-term and at least initially unrealistic goal is for the program to infer a composition style which can then write pieces both more complicated than nursery rhymes and pleasant to the ear. Some basic methods have been developed for doing this inference (Macmillan, 1997) and related inference (Jansen, Dowe and Farr, 2000); and we hope to extend these. Several inference techniques are possible - among these, we intend to apply the principles of Minimum Message Length (MML) inference. Programming will almost certainly be in Java, C++ or C. A good mathematical background and an interest in Minimum Message Length (MML) will be necessary, and a background in music would be a bonus. The student would be expected to become familiar with most of the references below.

References : ------------

Jansen, A.R, D.L. Dowe and G.E. Farr (2000), Inductive Inference of Chess Player Strategy. Proceedings of the Sixth Pacific Rim International Conference on Artificial Intelligence (PRICAI'2000), Lecture Notes in Artificial Intelligence (LNAI) 1886 (Copyright Springer-Verlag), pp61-71. http://www.csse.monash.edu.au/~tonyj/pricai2000cr.ps

Macmillan, G. Brendan (1997), "MML Analysis of Melody", Master of Computing thesis, Faculty of Information Technology, Monash University, 1997.

Wallace, C.S. and D.M. Boulton (1968), An information measure for classification, Computer Journal, 11(2), pp185-194.

Wallace, C.S. and D.L. Dowe (1999). Minimum Message Length and Kolmogorov Complexity, Computer Journal (special issue on Kolmogorov complexity), Vol. 42, No. 4, pp270-283. http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/ http://www3.oup.co.uk/computer_journal/hdb/Volume_42/Issue_04/pdf/420270.pdf

Inductive Inference of Game Player Strategy.

David Dowe

This project aims to develop programs for inferring something about a game player's strategy, just from records of games played.

The sort of games considered here are board games like Chess. A long term aim is to be able to take as input a record of the chess games of (e.g.) Garry Kasparov (World Champion), and infer something (but not everything!) about his chess playing strategy. This may be an ambitious goal, but we propose to move toward it in achievable steps. Initially, a program capable of inferring very simple aspects of strategy would be developed, and tested using records of games played by appropriately simple computer players. We have developed some basic methods for doing the inference, and expect to improve on them. Several types of inference are possible; among these, we intend to apply the principles of Minimum Message Length (MML) inference. This project builds on a 1997 project by Tony Jansen. (See http://www.csse.monash.edu.au/~dld/chess.html.)

Programming will almost certainly be in C. Please consult with the supervisor on recommended subjects to take.

Evolving intelligence

David Dowe

Intelligence would appear to consist of three main components : rote learning (memory), deductive learning (logical or mathematical reasoning) and inductive inference (generalisation, abduction, induction or pattern recognition). Human-devised intelligence test questions seem to be very much about inductive inference. Inductive inference can, in turn, be thought of as being about two-part compression, where the first part of a compressed data string encodes an inferred hypothesis about the data string and the second part of the compressed string encodes the data given the hypothesis. In this project, using fitness functions related to compressive ability, we use genetic algorithm (GA) techniques to evolve an artificial intelligence which is capable of doing inductive learning and two-part compression in an artificial life environment. Reading :
D L Dowe and A R Hajek (1998). A non-behavioural, computational extension to the Turing Test, pp101-106, Proceedings of the International Conference on Computational Intelligence & Multimedia Applications (ICCIMA'98), Gippsland, Australia, February 1998.
http://www.dartmouth.edu/~phil/events/LoebnerPrize2000.html

Topological Containment

(for BCompSc/BDigSys/BSE Hons, but scaled down for BSE)

Graham Farr

The aim of this project is to write programs to assist in discovering new mathematical results and algorithms on graphs.

We say that a graph H is topologically contained in a graph G if G has a subgraph that can be obtained from H by replacing edges with paths. For example, let G be a graph whose vertices represent countries in Europe with two vertices being adjacent if the countries they represent share a land border. Let H be a graph with four vertices u, v, w, x, with each pair of vertices being adjacent, so its six edges are uv, uw, ux, vw, vx, wx. Then G topologically contains H, by the following correspondence: u = Switzerland, v = France, w = Germany, x = Italy; edges uv, uw, ux, vw and vx (in H) correspond to paths (in G) consisting of single edges joining the relevant countries (e.g., uv corresponds to the simple path Switzerland-France), but edge wx corresponds to a slightly longer path: Germany-Austria-Italy. Note that these paths in G cannot intersect, except at their endpoints. In general, the paths will often not be single edges, and may be quite long.

Topological containment has been an important concept in graph theory for decades. It has been used to characterise many important classes of graphs (trees, planar, series-parallel ...). It is therefore of interest to find efficient algorithms for determining when H is topologically contained in G. However, if G and H are both allowed to vary, as part of the input, then the problem is NP-hard.

Suppose we fix H, so the input consists just of G. When given such an input graph G, we ask whether it topologically contains our fixed graph H. This gives a new problem for each fixed H. The good news is that, for each H, the problem can be solved in polynomial time (due to Robertson & Seymour). The bad news is that the polynomials that measure the running time are so huge that the algorithms are unlikely to ever be practical in our universe. So the quest for practical algorithms continues.

Practical algorithms are known only for a small number of fixed graphs H. Work in this area seemed to stall somewhat about ten years ago, due partly to the prohibitive number of cases that had to be considered in the mathematical analysis. It should now be worth trying to write a program to do this case analysis. This may lead to new mathematical results and new practical algorithms.

This project is part of joint research planned with Prof Mike Fellows (U of Newcastle) whose book, Parameterized Complexity (co-authored with Prof Rod Downey), considers this and many other problems.

You do not need a maths degree to do this project! You should, however, have aptitute in theoretical subjects like Algorithms and Data Structures, Formal Methods I and II.

It may be sensible to use LEDA (= Library of Efficient Data types and Algorithms), which has many programs and data structures for graphs and is written in C++.

Graph Clustering via MML

(for BCompSc/BDigSys Hons)

Graham Farr

Suppose you have a large graph, which may represent a software engineering diagram, an electronic circuit, a human social network, or whatever. In many applications it is useful to be able to identify clusters within the graph: nonoverlapping sets of vertices that are more highly connected, among each other, than the graph as a whole. In the full form of the problem, we allow clusters within clusters and so on, too.

One area where this may be useful is in Graph Drawing, where we seek to represent graphs on a (usually) 2-D surface in a pleasing way, minimising things like area, numbers of bends, crossings, etc. The clusters can be drawn separately, then these drawings can be arranged with respect to each other so that the whole graph can be well drawn. Thus, clustering allows a large graph drawing problem to be broken into several smaller ones.

In this project we use the Minimum Message Length (MML) principle (based on information theory) to measure how good a clustering is. The aim will be to produce programs that find MML clusterings of graphs, compare these with clusterings obtained by other methods, and to use the programs to produce good drawings of graphs.


Faculty Timetabling

Maria Garcia de la Banda

The constraint programming (CP) paradigm has had remarkable industrial success even though it is quite recent. This is because CP languages currently provide one of the best approaches for developing efficient solutions to many combinatorial problems such as planning, scheduling, routing and investment.

This summer scholarship deals with existing room timetabling application developed for a faculty in a Germany University using the constraint logic programming language CHIP. The aim of the project is to (a) port the timetabling application to the logic programming language SICStus and (b) to modify it to deal with the timetabling requirements of the Monash IT Faculty. The student should, preferably, have a knowledge of Prolog and graphical user interfaces.

Compiling OPL to HAL

Maria Garcia de la Banda, Kim Marriott

The constraint programming (CP) paradigm has had remarkable industrial success even though it is quite recent. This is because CP languages currently provide one of the best approaches for developing efficient solutions to many combinatorial problems such as planning, scheduling, routing and investment.

OPL is a recent high-level mathematical modelling language for describing combinatorial problems. HAL is a constraint programming language which is currently being developed by the Optimization & Constraint Solving research group in collaboration with groups outside Monash University. The aim of the project is to develop a OPL-to-HAL compiler.

The student should be either 3rd or 4th year and, preferably, have a knowledge of Prolog. We are ready to fund an additional two-four weeks in order to facilitate the completion of the project.


Charles Greif

Charles Greif will not be offering any projects.

Use Of Attribute Grammars To Construct XML Translators

John Hurst

XML translators provide a mechanism for manipulating documents written in the extensible markup language XML, and performing arbitrary translations on them. XML documents represent an evolving standard for information management that allows the direct creation web documents, paper documents, and database records. Derived from the SGML (Standard General Markup Language) standard, XML provides for the separation of document structure and document presentation, and is a very flexible document protocol.

This project is to develop an attribute grammar (AG) based description of possible XML translations, in much the same style as is used in attribute grammar based compiler generators, such as Eli. Just as compilers translate from source to target representations, so this project will develop a translator that can accept an AG of the translation to be performed on arbitrary XML documents.

The Prefix-Content-Suffix Model Of XML Translation

John Hurst

AXE (http://www.csse.monash.edu.au/~ajh/research/doctech/index.html) is an example of an XML translator. It is written in Perl, and accepts a translation specification that converts XML documents into other quite arbitrary document formats. For example, it has been used to build all the web pages rooted at http://www.csse.monash.edu.au/~ajh. It uses the paradigm of specifying translations for each element as a prefix component, a nested content component, and a suffix component. It can be used as a declarative model, but becomes far more powerful when each of these components may themselves be procedural.

However, it is still a somewhat prototypical implementation, and relies upon escaping into Perl code within the translation file to perform some of the more arcane translations needed, and it uses the restrictive SAX left-to-right translation model. This project has two main goals: to rewrite AXE using Python and the DOM tree-based evaluator, and to design translation specifications that do not rely upon escaping into embedded code.

This project and the previous one may be undertaken conjointly or separately, and are not dependent upon each other, although there are obvious areas of overlap.


Argumentation Game

Kevin Korb

A game in the style of the original computer adventure games is envisioned in which the protaganist is required both to analyze a sequence arguments, identifying conclusions, spotting fallacies, etc. and to construct arguments in order to advance. The arguments will be organized into one or more themes, such as arguments in a court setting, or detective work (Sherlock Holmes style), or medical detection.

Causal Learning

Kevin Korb

This project will continue work with CaMML in using MML (Bayesian) inference for learning causal structure from statistical and experimental data. This will involve work with a team of PhD students and others. The exact role within the team is negotiable and may involve one or more of: philosophy of causality; mathematics of MML; statistics of causal inference; experimental work; creativity. Programming in JAVA will be required.

ALife Simulation

Kevin Korb

The use of Artificial Life techniques to simulate social behavior. I use ALife to investigate economic issues -- such as the nature of money, transaction costs, market efficiency, etc. -- and ethical issues -- such as the evolution of altruistic behavior. Projects will combine theory with experimentation in one or the other of these domains.

Bayesian Poker

Kevin Korb and Ann Nicholson

Poker is an ideal vehicle for testing automated reasoning under uncertainty. It introduces uncertainty both by physical randomization and by incomplete information about opponents' hands. Another source of uncertainty is the limited information available to construct psychological models of opponents, their tendencies to bluff, play conservatively, reveal weakness, etc. and the relation between their hand strengths and betting behavior. All of these uncertainties must be assessed accurately and combined effectively for any reasonable level of skill in the game to be achieved, since good decision making is highly sensitive to those tasks. We have developed a Bayesian Poker Program (BPP), which uses a Bayesian network to model the program's poker hand, the opponent's hand and the opponent's playing behavior conditioned upon the hand, and betting curves which govern play given a probability of winning. The history of play with opponents is used to improve BPP's understanding of their behavior. BPP has been compared experimentally with: a simple rule-based system; a program which depends exclusively on hand probabilities (i.e., without opponent modeling); and with human players. BPP has shown itself to be an effective player against all these opponents, barring the better humans. This project involves extending and improving BPP. Possible tasks include: Modifying the system to play "Texas Hold'em" and running it in competition with alternative AI programs which play Texas Hold'em Modifying the system to use Dynamic Bayesian Networks, to represent the play of the hand over time. Addition of bluffing to opponent model. Improved learning methods.

Cleaning Dirty Data

Kevin Korb

The booming field of "Knowledge Discovery in Databases" (KDD) or "Data Mining" combines techniques developed in database theory with machine learning. A number of techniques unique to this area have been developed for coping with the inevitable noise in data, which results in mismeasurement and missing values. KDD people typically clean their data before providing it to their inferential or statistical programs. There are excellent theoretical reasons for believing that "cleaning" data throws out good information with the bad. This project will look at the effect of cleaning data by throwing it out and the prospects for the alternative of recycling dirty bath water.

Carlo Kopp

Please refer to SAN project co-supervised with Ron Pose.


Gordon Lowe

Gordon will be on OSP in 2002 and will not offer any projects.

Kim Marriott

Kim will be on OSP in semester 1 and will not offer any projects.


Ann Nicholson

Ann will be on leave in semester 1 and will not offer any projects.


Development of High Resolution Ultrasonic Imaging System.

Andrew Paplinski, Nandita Bhattacharjee, Charles Greif

An ultrasonic imaging system comprise three subsystems:

1. The analog front-end transmits signals between the transducer array and the signal processing subsystem.

2. The digital signal processing subsystem, which applies appropriately delayed signals to each transducer element to transmit a focused ultrasonic pulse wave in the selected steering direction. The reflected signals are first processed by the analog front-end, and then appropriately delayed, weighted and summed to form the resultant beam. The beam is dynamically focused at different ranges along the direction of the transmitted ultrasonic pulse.

3. The imaging subsystem performs logarithmic compression of the pixels from the signal processing part, converts the scan lines into rectangular coordinates and displays image frames of the anatomy being scanned.

Specific projects are available to participate in the development of all three subsystems.

Modelling Autism using Self-Organizing Maps

Andrew Paplinski and Lennart Gustafsson (Lulea Uni. Tech.)

Autism is a developmental disorder first described by Kanner (1943) and Asperger (1944). Attention shift impairment and the negative response to novelty are prevalent in individuals with autism and are considered to be primary to other deficits in autism, namely, impairments in social interaction, impairments in verbal and non-verbal communication, and restricted repertoire of activities and interests. Theories on causes of autism, based on properties of artificial neural networks, have been presented by Cohen (1994) and Gustafsson (1997). Based on their work the purpose of this project is to examine how the attention shift impairment and novelty avoidance influence the self-organization of an artificial neural network and to discuss the characteristics of the resulting maps. Existing prototype model has been written in MATLAB and will be a starting point for the project.

Computer Controlled Car

Ron Pose and Lloyd Allison

See supervisors for details.

Suburban Area Networks

Ron Pose and Carlo Kopp

SANs are self organising ad hoc wireless networks in which household computers in a suburb can be networked, bypassing the fixed wiring infrastructure. Using current commodity hardware, such links can operate at speeds of up to 11 Mbit/s. Future wireless hardware may permit higher bit rates. In such a network, multiple households can participate, each using a mast mounted 2.5 or 5.8 GHz microwave link antenna. A number of possible projects are available, involving work with protocols, traffic encryption or wireless propagation.

Information Theoretic Image Thresholding

Peter Tischer and Sid Ray

Image thresholding is a quick way to split the pixels of an image into 2 classes. Often this is desirable if the image consists of a foreground region and a background region. Most thresholding techiques use the histogram of the intensity distribution and a large number of thresholding techniques make use of principles from information theory to approximate the histogram with two probability distributions - one representing foreground pixels and one representing background. The aim of the project is to investigate the mixture modeling approach to image thresholding. Of particular interest is the use of the Kullback-Leibler measure as an indicator of how well the histogram is approximated. Since mixture modeling is a particular strength of the MML approach to inductive inference, the project will also use the MML mixture modeling program, 'snob', and apply it to the problem of finding optimal mixtures with two or more probability distributions. This project does not require any specific background knowledge of image processing, information theory or MML.

Generating Support Vector Machines which minimize entropy of prediction error

Peter Tischer and David Albrecht

Support Vector Machines have been used in various machine learning problems, including classification, regression analysis and the building of decision trees. They process interesting properties and have been demonstrated to give excellent performance in a number of practical applications. Also recently, in the field of image compression, iterative techniques have been developed which generate linear predictors that minimize the entropy of the prediction error distribution. In this project we will investigate whether these recent techniques for generating linear predictors can be adapted to generate Support Vector Machines which minimize the entropy of the prediction distribution, and compare the performance of these Support Vector Machines with ones created by standard methods.

Bayesian Image Binarization

Peter Tischer

Image binarization involves taking the pixels of a 2D image and placing them into two classes. Image thresholding is a particular example of image binarization. The Iterated Conditional Mode (ICM) algorithm is a Bayesian technique for putting pixels from images into particular classes. In doing so it tries to exploit the fundamental property of images that pixels tend to be similar in value to their neighbours. As a Bayesian technique, ICM has an inbuilt model of how the value of a pixel is related to the values of its neighbours. The aim of the project is to implement the ICM algorithm and investigate how it can be modified to build up its model of interpixel dependency from the image itself. The work can begin by trying to put pixels into two classes but it should be possible to generalise the approach to include larger numbers of classes. This project does not require any specific background knowledge of image processing or Bayesian statistics.

Processing of fMRI data

Peter Tischer

fMRI stands for functional Magentic Resonance Imaging. MRIs are three dimensional images that are becoming widely used in medicine and medical research. In fMRI a sequence of 3D MRIs of the brain are taken while the subject is alternately carrying out some task or staying inactive. A primary aim of analysing fMRIs is to determine which regions of the brain are active when the task is carried out. The project involves a number of image processing tasks that are carried out in the analysis of fMRIs. Since the images are taken over a number of seconds, the subject may move. Therefore there is a need to estimate the amount of movement from one image to another and to try to compensate for the effects of motion so different 3D images can be compared. This involves a process called image registration. Both the image capture process and the motion compensation may introduce anomalous values into the data. Therefore one aim of the project would be different ways of filtering the data to remove anomalous values. As a preliminary step, the processing of fMRIs tries to identify which volume elements (voxels) of the 3D picture represent regions of the brain that were active in the task. Another aim of the project would be to investigate ways of producing binary 3D images whose voxels indicate whether a particular voxel in the original MRIs was active or inactive. Apart from identifying active voxels, the processing would also try to identify connected regions of active brain regions. This project does not require any specific background knowledge of image processing.

Segmenting MRI brain images

Peter Tischer

MRI stands for functional Magentic Resonance Imaging. MRIs are three dimensional images that are becoming widely used in medicine and medical research. An important step in the processing of MRIs of the human brain is segmenting the volume elements (voxels) into small number of classes. Typically, this means 4 classes: White Matter (WM), Grey Matter (GM), Cerebral-Spinal Fluid (CSF) and Other. One of the most effective programs for segmenting brain images is a program called FAST which uses the Iterated Conditional Mode (ICM) algorithm to produce segmentations where a particular voxel is highly likely to have similar properties to those of its neighbours. As part of its normal execution the FAST algorithm also tries to estimate the bias field associated with the particular MRI scanner that was used to capture the image. The bias field is an indication of how the magnetic field strength varied over the 3D volume in which the subject was placed. The project will investigate ways of improving the FAST program by considering different models for the bias field, for the statistical dependence of one voxel with its neighbours and for the way that the intensity within a particular tissue class changes spatially. This project does not require any specific background knowledge of image processing or Bayesian statistics.


Optimisation of fuzzy control systems using genetic and other algorithms

Bin Qiu

This project investigates genetic and other algorithms for the optimisation of fuzzy controllers in terms of membership functions and linguistic rules.

A fuzzy logic based Connection Admission Control (CAC) function

Bin Qiu

This project is concerned with Connection Admission Control (CAC) in ATM networks. The decision of source admission into a multimedia communication network is a very active research area. Analytical solutions presented so far are not adequate under a variety of network situations and source characteristics. Fuzzy logic based CAC function may provide a breakthrough, and therefore this research topic has the potential to be developed into a major research project.

Using a fuzzy logic based scheduling algorithm to provide smart and efficient load-balancing for clustered 'virtual' Internet servers.

Bin Qiu

This first part is taken from http://www.linuxvirtualserver.org/ :- "With the explosive growth of the Internet and its increasingly important role in our lives, the traffic on the Internet is increasing dramatically, which has been growing at over 100% annual rate. The workload on the servers is increasing rapidly so that servers will be easily overloaded for a short time, especially for a popular web server. To overcome the overloading problem of the servers, there are two solutions. One is the single server solution, i.e. to upgrade the server to a higher performance server, but it will soon be overloaded when requests increase so that we have to upgrade it again, the upgrading process is complex and the cost is high. The other is the multi-server solution, i.e. to build a scalable server on a cluster of servers. When load increases, we can simply add a new server or more into cluster to meet the increasing requests. However, there are several methods to construct the cluster of servers. Now the widely used one is Round-Robin DNS, which maps a single name to the different IP address in a round-robin manner; thus different clients will be mapped to different servers in the cluster for the ideal situation. In this way, the load is distributed among the servers. However, due to the caching nature of clients and hierarchical DNS system, it easily leads to dynamic load imbalance among the servers, thus it is not easy for a server to handle its peak load. ... An even better way is to use a load balancer to distribute load among servers in a cluster. The parallel services of servers can be made to appear as a virtual service on a single IP address, so that the end users see a virtual server, not a cluster of servers. The scheduling granularity is per connection, which can make a sound load balance among the servers. Fails can be masked when one server or more fail. Server management is becoming easy, and administrator can take a server or more in and out of service at any time, which won't interrupt services to users.

Load balancing can be done in two levels, application-level and IP-level. ..."

This project will pertain to IP-level load balancing.

Currently, the load balancing algorithms implemented in the LinuxVirtualServer (LVS) consist of: Round-Robin, Weighted Round-Robin, Least-Connection, and Weighted Least-Connection. These are fairly basic algorithms that provide an adequate working load-balancing solution.

Following on from the idea that network communications aims for 1) Quality Of Service (QoS) and 2) efficiency, it is proposed that by using a fuzzy logic based algorithm for load-balancing virtual servers, it will (possibly) improve both of these.

The aim would be to get the algorithm to efficiently load balance traffic between nodes in the virtual server cluster in a 'smart' fashion... whilst also making sure that the algorithm is lightweight enough to be practical. The project tasks would be as follows:

1) Research the algorithms currently in use in LVS.
2) Research alternative fuzzy logic algorithms that could be used in load balancing.
3) Assess the performance of currently implemented scheduling algorithms in LVS (by simulation). Compare these and discover under what traffic conditions each works best.
4) Investigate the performance of some new fuzzy logic based scheduling algorithm. Compare the performance of the new algorithm with the existing algorithms (by simulation). Determine the traffic conditions under which the new algorithm operates best.
5) Implement the fuzzy logic based load balancing algorithm into LVS.
6) Assess LVS in operation?

The project could be extended to do smart prediction of overloading of the load balancer and thus bring up secondary load balancers. Further, bringing in more servers into the cluster intelligently as load / traffic requires.


Image Segmentation by Clustering Methods

Sid Ray

The aim of the project will be to make a critical study of a number of existing non-hierarchical cluster analysis methods, such as, c-means, fuzzy c-means, and isodata, with a view to their application in image segmentation. Study of cluster validity will form an essential part of the project. Applications will include both monochrome and colour images.

Content-Based Image Retrieval

Sid Ray

Content-based image retrieval is a well-known method of image retrieval. The objective of this project is to devise an algorithm for retrieval of images from a large image data base based on colour quantisation and regional quantisation.

Texture Analysis of Images

Sid Ray

Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of (i) segmentation and (ii) classification of images. The proposed project will deal with both of these problems. It will involve two phases. In the first phase some existing texture analysis methods will be investigated from the point of view of their theoretical soundness as textural measures as well as their practical applicability. In the second phase attempts will be made to derive a new methodology with particular attention to its computational efficiency.

Document Image Analysis System

Sid Ray

The objective of document image analysis is to recognize text, graphics, and pictures in printed documents and to extract the intended information from them. There are two broad categories of document image analysis, namely, textual processing and graphical processing. Textual processing includes skew determination (any tilt at which the document may have been scanned), finding columns, paragraphs, text lines and words, and performing optical character recognition. Graphical processing deals with lines and symbols. The scope of the proposed project will be the aspect of text processing and it will concentrate on the development of a system for page layout analysis.

Selection of Features for Pattern Recognition

Sid Ray

Experience shows that in recognition of patterns by humans, the recognition is made based on only a few important features (or attributes) characterizing the pattern classes. Conversely, in statistical pattern recognition, the patterns are often represented by a large number of numerical features. Although there is no conceptual justification in reducing the number of features to a small number, in practical problem solving, this becomes a necessary step due to the wellknown phenomenon of the 'curse of dimensionality' of the feature vector on the complexity of the pattern classifier. The aim of the proposed project is to develop an interactive feature selection paradigm well-suited to multiclass (rather than 2-class) pattern recognition problems.

The project will comprise four components, namely, (i) theoretical comparison of existing feature selection criteria, (ii) development of software tools for the existing criteria, followed by an experimental investigation of them, (iii) analysis of the above results leading to, hopefully, a feature selection paradigm, and (iv) development of an interactive software tool for the above paradigm. The software developed will be 'interactive' in the sense that depending on the classification accuracy arrived at a certain stage, the user will have the option of changing the dimensionality value. Options will also be available to supply interactively a range of values of the dimensionality. The software tools will include procedures for displaying the distribution of pattern samples in different feature spaces obtained by different feature selection methods. 



Adaptor generation for software components

Heinz Schmidt and Ralf Reussner

In modern component-based software engineering (CBSE), and particular in peer-to-peer (P2P) computing, software components often have to be adapted. The reason is, that in CBSE components are often re-used in contexts, where they have not been developed for. In P2P components have to be adapted, because ften a systems have to interoperate instantaneously with other systems, often unforeseen by the system's developer. In both areas, the adaptation needed to make a component interoperable in a specific enivironment is unforeseen by the component developer. Hence, it has to be performed automatically by the component user. The TrustME project of the CRC for Enterprise Distributed Systems Technology Pty Ltd (DSTC) deals with extensions of industrial middleware platforms (like Microsoft's .NET or Sun Microsystems EJB). In this project, we developed first approaches for automatic adaptor generation. In this particular student project, the student is expected to work in the level of components adaptation. Therefore, an understanding of finite state machines is required (or the willingness to aquire such knowledge). In particular, the student is expected, to enhance an existing API of a component configuration library protocol adaptation mechanisms. This includes the implementation and enhancement of existing algorithms for protocol adaptation.


Image compression via context coding of error bitplanes

Rod Worley

Image data may be compressed by predicting pixel values, forming an image of the differences of the true and predicited values. The error image may be coded efficiently by many techniques.

This project will look at the efficiency of compressing the error image by converting it to bitplanes and using context coding to compress the bitplane data.


Chintha Tellambura

Chinta will not offer any projects.

Henry Wu

Henry will be on OSP in 2002 and will not offer any projects.

Ingrid Zukerman

A short-answer evaluation system

Ingrid Zukerman, Norm Eizenberg (Dept. of Anatomy and Biology, University of Melbourne)

This project consists of developing a system which will give feedback to students' short answers to questions. The project consists of the following components: The main domain of implementation is Medicine, where a repository of questions and answers is already available. A Computer Science application is also possible. The student undertaking this project should have some grounding in statistics.

 
Disclaimer