Computer Science, Digital Systems and BCSE Honours Research Projects 1999

These are the research projects being offered to Honours students in 1999.

Notes

Supervisors


Debugger for Highly Parallel Systems

David Abramson

This project involves the design and implementation of a debugger for parallel computer systems. The work will build on an existing project in the School which is funded by the ARC, in the area of relative debugging. The student will augment an existing debugger with features for supporting parallel execution, a graphical user interface and also the visualisation of data structures.

The project requires skills in parallel computing, Unix and C, as well as familiarity with existing debuggers such as gdb.

Some more details can be obtained from [http://www.dgs.monash.edu.au/research/guard/].


Typogenetics

David Albrecht

Typogenetics, which is short for Typographical Genetics, was developed by D.R. Hofstadter to capture some of the concepts of molecular genetics in a typographical system. The system involves manipulating strings, called strands, consisting of four characters A, C, G, and T. These strands are operated on by enzymes which are sequences of basic operations. In turn, the enzymes are created from strands.

Thus, given a strand, we can create an enzyme which can then be applied to the original strand to produce a new generation of strands. These new strands can then be used to create new enzymes, which can in turn be applied to the current generation of strands to produce the next generation, and so on. If at any stage we have two copies of the original strand in one generation, we say that the original strand is self-reproducing. If, on the other hand, we are at some stage left with no strands, we say the original strand is self-destroying.
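To make the mechanics concrete, here is a minimal sketch (in Java) of how strands and enzymes might be represented. The operation set and the duplet-to-operation table used below are invented for illustration and are much smaller than Hofstadter's actual rules.

    import java.util.*;

    // A toy sketch of Typogenetics-style strands and enzymes.  The operations
    // and the duplet-to-operation table below are illustrative only;
    // Hofstadter's full system has many more operations.
    public class TypoSketch {
        enum Op { MOVE_RIGHT, DELETE, INSERT_A, INSERT_T }

        // Translate a strand into an enzyme by reading it two bases at a time.
        static List<Op> toEnzyme(String strand) {
            List<Op> enzyme = new ArrayList<>();
            for (int i = 0; i + 1 < strand.length(); i += 2) {
                switch (strand.substring(i, i + 2)) {
                    case "AC": enzyme.add(Op.MOVE_RIGHT); break;
                    case "CG": enzyme.add(Op.DELETE);     break;
                    case "GT": enzyme.add(Op.INSERT_A);   break;
                    default:   enzyme.add(Op.INSERT_T);   break;
                }
            }
            return enzyme;
        }

        // Apply an enzyme to a strand, starting with the cursor on the first base.
        static String apply(List<Op> enzyme, String strand) {
            StringBuilder s = new StringBuilder(strand);
            int pos = 0;
            for (Op op : enzyme) {
                if (pos >= s.length()) break;                     // enzyme falls off the strand
                switch (op) {
                    case MOVE_RIGHT: pos++;                         break;
                    case DELETE:     s.deleteCharAt(pos);           break;
                    case INSERT_A:   s.insert(pos + 1, 'A'); pos++; break;
                    case INSERT_T:   s.insert(pos + 1, 'T'); pos++; break;
                }
            }
            return s.toString();
        }

        public static void main(String[] args) {
            String strand = "ACGTAC";
            List<Op> enzyme = toEnzyme(strand);
            System.out.println(strand + " -> " + apply(enzyme, strand));
        }
    }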

Interesting questions are:

The aim of this project is to investigate the field of Typogenetics, and to answer some of these questions. In particular, it is envisaged that the project will involve the development of a system that allows a user to:

(No knowledge of biology/biochemistry is required!)

Wumpus Hunters

David Albrecht

Hunt the Wumpus was an early computer game created by Gregory Yob. Since then it has been used in AI as a testbed environment for intelligent agents. The aim of this project is to use this game as a testbed for developing techniques that learn users' profiles. The reason this game has been chosen is that it is a simplified version of a domain we are planning to investigate in our work on user profiles. This project will involve creating agents that can operate in an existing Wumpus World simulation written in C++, and implementing a system that performs user profiling. These agents will need to be able to

Students undertaking this project should be able to program in C++, and it is desirable that they have done CSC3091 (Artificial Intelligence).

Data Mining Software

Lloyd Allison

There is a research project to develop a data mining software library and graphical user interface. A mixture of Java and C is being used. Graphical components will be written in Java (Sun Microsystems). Numerical routines will be written in Java or C as appropriate and this may include adapting existing code. There may be opportunities for honours projects for good programmers in this area. The ability to work with others is important.

java2c

Lloyd Allison

The project is to write a translator from Java to C. It would then be possible to get the software engineering advantages of writing in Java with the speed of C. There would be losses in some areas, e.g. array bounds checking and security. The aim is to retain as much as possible of the structure of the Java source program in the translated C, not just to treat C as a `machine code'. There may well be java2c translators in existence, but this would not necessarily rule out the project; a quick search failed to find any public translators.

Distributed, Interactive 3D World.

Lloyd Allison

Previous experience and demonstrated ability in interactive computer graphics is a necessary prerequisite for this project. The aim is to provide a 3-D world, distributed over many servers and clients, and to investigate and solve the problems of maintaining consistent views and time-lines for the users. Jamie Cameron's 1995 "indoors" world may provide some inspiration [http://www.cs.monash.edu.au/hons/projects_1995/Jamie.Cameron/index.html]. In 1997 Eugene Ware produced an "outdoors" world programmed in Java and VRML (not available in '95). The aim is to develop this world further.

Combinatorial Workbench in Java

Lloyd Allison and Graham Farr

The aim of the project is to build a graphical environment in which objects such as graphs and networks can be interactively edited and manipulated. Information about the object, such as vertex degrees, cycle lengths, diameters etc. is to be displayed, and this displayed information must be updated as changes are made. The system must enable the user to define their own type of combinatorial object, and to specify what sort of information is to be displayed and maintained about the objects. Object-oriented design will be important. Programming will be in Java.


Jim Breen

Jim Breen will be on sabbatical in Semester 1 1999, and hence will not be offering projects to students starting in March.


Information Based Assessment

Angela Carbone and David Dowe

This project will involve developing a multi-threaded servlet in Java and a client applet with a GUI. The servlet is required to generate and assess multiple choice questions using a questions database. The questionnaire is then forwarded on to the client applet in a form determined by servlet settings and user privileges. The applet sends user responses back to the servlet for evaluation. This system allows for multiple simultaneous use of the questions server whilst maintaining data security and verifying user identity.

Although the project will involve considerable Java programming (around an existing framework) the research will be mainly concerned with electronic assessment methods that seek to address inherent problems associated with multiple choice tests.

The assessment methods should foster the students' cognitive abilities and measure their true state of knowledge. One of the assessment methods for the multiple choice responses will incorporate a scheme which allows students to express their level of confidence in their answers and discriminates between students' knowledge states. This method is aimed at encouraging honesty.

To enhance the feedback to students and academics the scoring system not only calculates a mark but also calculates a self estimation factor, and a trust factor. The self estimation is the overall mark that the student expects to achieve based on the given responses. If the student's confidence levels are appropriate and justified, the self estimation will closely match the actual overall mark. The trust factor is a measure of the trustworthiness of the given information. A trust factor of 100% indicates that optimal results can be obtained by completely adopting the student's answers. Conversely, a trust factor of 0% indicates that no information about the actual correct answers can be gained from the student.
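As a rough illustration only, the sketch below shows one way such quantities could be computed. The particular formulas (a logarithmic score for confidence, an expected-mark self estimation, and trust measured as performance above chance) are assumptions made for this example, not the scheme the project will necessarily adopt.

    // A minimal sketch of confidence-weighted marking, assuming a logarithmic
    // score for confidence and a simple better-than-chance notion of trust.
    // These formulas are illustrative only, not the project's actual scheme.
    public class AssessmentSketch {
        // correct[i]   : whether the student's choice on question i was right
        // confidence[i]: the probability the student attached to that choice
        static double mark(boolean[] correct, double[] confidence) {
            double total = 0.0;
            for (int i = 0; i < correct.length; i++) {
                double p = correct[i] ? confidence[i] : 1.0 - confidence[i];
                total += 1.0 + Math.log(p) / Math.log(2.0);   // 1 + log2(p)
            }
            return total;
        }

        // Self estimation: the mark the student expects given the stated confidences.
        static double selfEstimation(double[] confidence) {
            double total = 0.0;
            for (double p : confidence)
                total += 1.0 + (p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2.0);
            return total;
        }

        // Trust factor: how far above chance the student's answers are (0..1).
        static double trustFactor(boolean[] correct, int numOptions) {
            double right = 0;
            for (boolean c : correct) if (c) right++;
            double accuracy = right / correct.length, chance = 1.0 / numOptions;
            return Math.max(0.0, (accuracy - chance) / (1.0 - chance));
        }

        public static void main(String[] args) {
            boolean[] correct = { true, true, false, true };
            double[] conf     = { 0.9, 0.6, 0.7, 0.8 };
            System.out.printf("mark=%.2f selfEst=%.2f trust=%.2f%n",
                    mark(correct, conf), selfEstimation(conf), trustFactor(correct, 4));
        }
    }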

The application will be Web based with protection against unauthorised access through server and Java implemented security.


A multi-language anti-compiler

Damian Conway

Many of the programming mistakes made by novice programmers result from their inability to correctly translate an algorithm to code. This project aims to produce software which assists them in overcoming such problems by automatically translating their erroneous code back to algorithmic language (with annotations indicating possible errors).

In taking on this project all of the following would be of definite advantage:

LlamaCard

Damian Conway

Hypercard, Supercard, and Metacard are all hypermedia, stack-based, WYSIWYG, persistent, event-driven, rapid software development systems. Each provides extensive customizability through a proprietary scripting language.

The aim of this project is to create a free implementation of a similar application (LlamaCard), using Perl as the scripting language and various standard Perl libraries to help implement the GUI, so that the entire package is platform-independent.

Experience with one of the above applications and with Perl programming would be highly advantageous.

Implementing the DEMUN mark-up language

Damian Conway

DEMUN (Distance Education Mark-up Notation) is a tag-based text annotation system for building complex, highly interlinked, interactive WWW-based course materials from simple ASCII text files.

The aim of this project is to build a DEMUN-to-HTML converter that is capable of automatically and incrementally generating:

  • cross-referenced pages (often from incomplete information)
  • computer-generated summaries and indices,
  • interactive (CGI) forms for testing and sample exam materials.

    Implementation would probably be in the Perl language, so experience in Perl would be a significant advantage.


    John Crossley

    John Crossley will be on leave in second semester 1999, and hence will not be offering projects.


    New Technique for Assembling Genomic Restriction Maps

    Trevor Dix

    This project builds on some existing work done in C++ where a different and (hopefully) better approach was tried. This project performs an AI search to piece together data which happens to come from the Human Genome project. There is a need to view results of the search, so a GUI using either Java or tcl/tk will probably also be developed. (No knowledge of biology/biochemistry is required!)

    Joyce-Linda Implementation Using Pthreads and PVM

    Trevor Dix

    Joyce-Linda is a language that forces the use of parallel processes. The Linda communication model provides a convenient mechanism for many sorts of parallel programs. This project involves changing the Joyce-Linda compiler and runtime support code to run with different packages, including running across a network.


    Reactive Particle System

    Alan Dorin

    This project involves the design & construction of an animation tool for the production of visual effects utilizing large numbers of tiny objects known as 'particles'. Particles have been used in the past to model clouds, steam, fog, fire, sparks and similar real world phenomena. In this project, not only are the particles subject to global external forces such as gravity and the effects of wind, they may also interact with one another in a manner akin to chemical interactions.

    For example, a number of blue particles may hang freely and unchanging in space. When a red particle is introduced to the system, the blue particles are pulled towards it. When they come into contact with the red particle they too become red and are thrown violently away from the other red particles in the vicinity, producing a dazzling blue implosion followed by an explosion of red streaks.

    The student should have a thorough understanding of C or C++ and should have completed 3rd year graphics. It is suggested that the student should also participate in the Graphics and Artificial Life course at Honours level (CSC415).

    SUGGESTED INTRODUCTORY READING:

    Project

    Alan Dorin

    Alan Dorin may also be offering another project in graphics and artificial life.

    David Dowe

    David Dowe is interested in Minimum Message Length (MML) inductive inference. The MML principle is particularly useful in machine learning, statistics, "knowledge discovery" and "data mining". Both theoretical and applied projects are available, some of which are listed below, and all of which you should feel free to discuss with David Dowe. Areas of interest include clustering and mixture modelling, the von Mises circular distribution, single and multiple factor analysis, supervised learning, decision trees and decision graphs with or without leaf regressions, sequentially and spatially correlated data, protein folding, DNA string alignment, the human genome project and market forecasting. All of these would be done by MML. I am also interested in MML inference of neural nets. There is no need to have done any 3rd year subject on AI for any of these projects.
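    All of these projects rest on the same two-part message idea which, stated informally in LaTeX notation, is

        \mathrm{MsgLen}(H, D) \;=\; -\log_2 \Pr(H) \;-\; \log_2 \Pr(D \mid H)

    where the first term is the length (in bits) of stating a hypothesis H and the second is the length of encoding the data D assuming H; MML prefers the hypothesis that minimises the total.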

    Supervised and unsupervised learning from correlated data

    David Dowe

    Researchers in machine learning, statistics, "knowledge discovery", "data mining" and prediction are interested in both unsupervised learning (clustering) of data, and supervised learning. Much data of interest has sequential or spatial correlations in it, such as financial price time series or images of a mining or other site.

    Much work has been done at Monash in the areas of unsupervised and supervised learning by Wallace, Dowe and others for uncorrelated data. Some work has also been done at Monash using these techniques for correlated data. This project will head in that direction.

    A reasonably strong mathematical background will be required. Programming will almost certainly be in C. This project is closely related to a project done by Russell Edwards in 1997. It has the potential to go in either parallel or tangential directions. There is no need to have done any 3rd year subject on AI. However, if doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

    Prediction Problems

    David Dowe and Lloyd Allison

    A string of text or any other body of data will compress if and only if it is not random. Prediction problems often involve inferring the value of one body of data from another, such as using interest rates to predict share prices or amino acid sequence to predict protein secondary structure. In this project, we initially consider an abstract mathematical problem of aligning two sequences with alphabets alphabet1 and alphabet2 respectively, where one of the sequences can be assumed to be random and the other can be modelled as depending on it. Some preliminary mathematics has already been developed and partially implemented for this problem using Minimum Message Length (MML), where we consider a random sequence of amino acids and a non-random dependent sequence of local protein structures. Financial markets are known not to be totally unpredictable, and the above general modelling lends itself well to financial, protein and other prediction problems. There is no need to have done any 3rd year subject on AI. However, if doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

    Inference of Probabilistic Finite State Automata

    David Dowe

    Finite state machines can be used to model grammars or syntax. Some bodies of data can reasonably be assumed to have come from some underlying, but unknown, grammar (or finite state machine). When the data is of great interest to us, we will be interested in inferring the finite state automaton from which the data came. This project will use the Minimum Message Length (MML) principle and will be quite mathematical in nature. It will build upon work done at Monash by Wallace and Georgeff (1983) and more recently by some of their collaborators.

    Artificial sample data will initially come from generating data from some model and then seeing how well the program can discover it. Real sample data to be analysed will come from DNA, proteins and speech patterns. There is no need to have done any 3rd year subject on AI.
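    As a rough sketch of the "data" half of such a message, the fragment below scores a string against a small hand-built PFSA by summing -log2 of the probabilities of the transitions the string takes. The machine itself and the flat cost charged for stating it are illustrative assumptions only, not the careful MML coding of Wallace and Georgeff.

        // A sketch of scoring a candidate probabilistic finite state automaton (PFSA)
        // by message length: the data cost is the sum of -log2(probability) over the
        // transitions the string takes.  The machine below and the crude cost charged
        // for stating it are illustrative assumptions only.
        public class PfsaSketch {
            // Two states, alphabet {a, b}; entry [s][c] gives next state / probability.
            static int[][]    nextState = { { 0, 1 }, { 0, 1 } };
            static double[][] prob      = { { 0.7, 0.3 }, { 0.5, 0.5 } };

            static double log2(double x) { return Math.log(x) / Math.log(2.0); }

            // Bits needed to encode the string, given the machine (start in state 0).
            static double dataCost(String data) {
                int state = 0;
                double bits = 0.0;
                for (char c : data.toCharArray()) {
                    int sym = c - 'a';
                    bits  += -log2(prob[state][sym]);
                    state  = nextState[state][sym];
                }
                return bits;
            }

            public static void main(String[] args) {
                String data = "aababaabba";
                double modelCost = 8.0;   // crude stand-in for the bits needed to state the machine
                System.out.printf("model %.1f + data %.2f = %.2f bits%n",
                        modelCost, dataCost(data), modelCost + dataCost(data));
            }
        }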

    A strong mathematical background will be required. Programming will almost certainly be in C. If doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

    Probabilistic Football Tipping System.

    David Dowe (and Graham Farr and John Hurst).

    This project is provisional. If you wish to nominate this project, it is necessary that you first consult Dr. David Dowe to obtain his consent.

    Attaching to predictions an indication of how certain the predictor is, and rewarding such predictions properly, are important issues in many fields. This project focuses on football tipping because it is topical, accessible and may be useful in teaching. The project is partly software engineering, and partly implementation of ideas concerning prediction and inference.

    For the last three years, the Department of Computer Science has run a football tipping competition in which participants must nominate, for each game, not only which team they think will win, but a probability that that team will win. Tips are scored according to a simple formula, and the theory is linked to information theory and gambling theory. This year the competition was extended to allow participants to nominate a mean and standard deviation for the margin of each game. Again, there is a soundly based way to score such tips. The competition is currently run using software written in C++ (with a curses interface) by John Hurst. The software is written as a literate program (nutweb), and managed by a version control system (RCS), currently at www.csse.monash.edu.au/~footy/ .
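    For reference, a score of the kind described works roughly as follows: a tip of probability p earns 1 + log2(p) if the nominated team wins and 1 + log2(1 - p) if it loses, so a noncommittal tip of 0.5 scores zero. The snippet below illustrates this form of score; the competition's exact formula, and the scoring of margin tips, may differ.

        // A small sketch of a probabilistic tipping score of the kind described above:
        // a tip of probability p earns 1 + log2(p) if the nominated team wins and
        // 1 + log2(1 - p) if it loses, so a "sit on the fence" tip of 0.5 scores zero.
        public class TippingScore {
            static double log2(double x) { return Math.log(x) / Math.log(2.0); }

            static double score(double p, boolean teamWon) {
                return 1.0 + log2(teamWon ? p : 1.0 - p);
            }

            public static void main(String[] args) {
                System.out.println(score(0.8, true));    // confident and right: about  0.68
                System.out.println(score(0.8, false));   // confident and wrong: about -1.32
                System.out.println(score(0.5, true));    // no opinion:                 0.0
            }
        }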

    The aim of the project is to implement new probabilistic football tipping ideas in software, and to extend the software so that the competition can be run over the World Wide Web.

    In more detail, the main tasks of the project are to:

    Programming in Java and C++ will be required. Prior knowledge of Australian Rules football is not essential. This project builds upon a 1997 project by Matt Doran.

    MML Inference of Neural Networks

    David Dowe

    Neural networks (NNs) are a technique in machine learning and "data mining" which are often useful but which have some notoriety for over-fitting the training data and then predicting comparatively poorly, for needing a lot of human tuning and for being "black box" predictors which obscure any semblance of an underlying model. This project is concerned with using Minimum Message Length (MML) to correct these problems for relatively simple neural networks. MML is robust to over-fitting noise, and fits explicitly parameterised models which predict well. The project will entail using MML to model the logistic or sigmoid function in NNs under the guidance of the supervisor, and to use MML to balance the cost of the number of hidden layer nodes and the inter-connection weights with the goodness of fit to the data. Some acquaintance with NNs and a reasonably strong mathematical or statistical background are essential.

    Inverse learning using MML Inference and Kolmogorov complexity

    David Dowe

    Minimum Message Length (MML) is an accurate technique of data fitting with, in principle, all the expressibility, generality and universality of a Universal Turing Machine (UTM).

    According to both the principles of MML and of Kolmogorov complexity, the most likely theory to infer from a body of data is that which gives rise to the shortest two-part compression of the data. However, on occasions, this will not give rise to a simple explicit model of the variable we wish to predict explained as a function of the other, explanatory, variables. One of many cases in point is when the variable of most interest to us is unobserved but known to come from a distribution between 0 and 1; but we do observe a second variable which functionally depends upon it. We have to balance our prior beliefs regarding the likely values of the variable of interest against the values which would have been more likely to cause the observed value of the second variable.

    D L Dowe and C S Wallace (1998). Kolmogorov complexity, minimum message length and inverse learning, abstract, page 144, 14th Australian Statistical Conference (ASC-14), Gold Coast, Qld, 6 - 10 July 1998.

    Wallace, C.S. and D.L. Dowe (1999). Minimum Message Length and Kolmogorov Complexity, to appear, Computer Journal.

    Evolving intelligence

    David Dowe (possibly with Alan Dorin)

    Intelligence would appear to consist of three main components : rote learning (memory), deductive learning (logical or mathematical reasoning) and inductive inference (generalisation, abduction, induction or pattern recognition). Human-devised intelligence test questions seem to be very much about inductive inference. Inductive inference can, in turn, be thought of as being about two-part compression, where the first part of a compressed data string encodes an inferred hypothesis about the data string and the second part of the compressed string encodes the data given the hypothesis. In this project, using fitness functions related to compressive ability, we use genetic algorithm (GA) techniques to evolve an artificial intelligence which is capable of doing inductive learning and two-part compression in an artificial life environment.

    Reading : D L Dowe and A R Hajek (1998). A non-behavioural, computational extension to the Turing Test, pp101-106, Proceedings of the International Conference on Computational Intelligence & Multimedia Applications (ICCIMA'98), Gippsland, Australia, February 1998.

    relevant web site.

    Well-behaved long-period pseudo-random number generators

    David Dowe

    Pseudo-random number generators (RNGs) are useful for computer games including dice-rolling, card-dealing and bingo, as well as for computer simulation and for search strategies such as simulated annealing and genetic algorithms. For a pseudo-RNG to be of any practical use in these and other applications, it needs to have a long period (length of cycle) and to exhibit very low levels of correlation between the bits it generates and other signs of apparently random behaviour. This project will initially involve tests of randomness (as given by D. E. Knuth and others) on an RNG of period (2^32) * (2^32 - 1) due to C.S. Wallace (1989), as well as other RNGs due to G. Marsaglia and others (see WWW site below). A main aim of this project will be to extend the Wallace generator to periods (2^32) * (2^32 - 1) * (2^32 - 3) and higher while still maintaining its randomness.
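    As a small illustration of the kind of test involved, the sketch below applies the simplest Knuth-style check, a chi-square equidistribution test, with java.util.Random standing in for the Wallace and Marsaglia generators that the project would actually study; the acceptance band quoted in the comment is only a rough rule of thumb.

        import java.util.Random;

        // A minimal sketch of the simplest Knuth-style test of randomness: the
        // chi-square equidistribution test.  java.util.Random stands in here for
        // the generators under study.
        public class ChiSquareTest {
            public static void main(String[] args) {
                final int BINS = 64, N = 1_000_000;
                long[] count = new long[BINS];
                Random rng = new Random(42);          // generator under test
                for (int i = 0; i < N; i++)
                    count[rng.nextInt(BINS)]++;

                double expected = (double) N / BINS, chi2 = 0.0;
                for (long c : count) {
                    double d = c - expected;
                    chi2 += d * d / expected;
                }
                // For 63 degrees of freedom the statistic should usually fall near 63;
                // values far outside roughly 40..90 suggest the generator is suspect.
                System.out.printf("chi-square = %.1f (df = %d)%n", chi2, BINS - 1);
            }
        }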

    Reading : 1) C. S. Wallace, A long-period pseudo-random generator, Tech Rept #89/123, Dept of Computer Science, Monash University, February 1989.

    2) Find Random Number Generator by doing the following:

          1.  Go to here
          2.  Click on " C Programming " on left of the screen
          3.  Click on " Code Snippets "
          4.  Under the heading " Portable functions and headers " click on
              " Random number functions "
          5.  Click on " Rand1.C "
    

    Machine learning of profiteering from inefficiencies in artificial markets

    David Dowe

    The Efficient Markets Hypothesis (EMH) asserts, even in its weakest, most innocuous forms, that in a market with no insider trading and all participants having access to the same public knowledge, no trading strategy can be expected in the long term to outperform the market average. This misconception is due at least in part to the intractability of finding patterns in data that might appear to be random both on the surface and even after some analysis. Dowe and Korb (1996) have argued why markets must almost always be inefficient, and Dowe, Korb and Stillwell (in preparation) are demonstrating with empirical real-world data that practice indeed seems to follow theory. In this project, we instead create artificial, simulated markets, with agents trading according to various strategies and price being driven by a combination of some underlying function and the trading strategies of the participating agents. The student will use Minimum Message Length (MML) or related machine learning techniques to discover inefficiencies in such artificial markets and also to discover trading strategies which beat the market average in the long run.


    Constraint-based interactive tool for 3D orthogonal graph drawing

    Graham Farr and Kim Marriott

    Graphs and networks are useful models of many different kinds of information: electronic circuits, software engineering diagrams, communications networks, the WWW etc. Many applications require such graphs to be laid out in space in some way, perhaps for display as a diagram or for implementing as a circuit in several layers in VLSI. In some such cases, it is important to represent the graph in 3 dimensions, and also that any angles where edges meet or bend be right angles. Such a drawing is called a 3D orthogonal graph drawing.

    For an example of such a drawing, see http://www.cs.newcastle.edu.au/~richard/phd/k7-wood.gif. (The graph drawing underlying this picture was found by local PhD student David Wood, and the picture itself was made by Richard Webber (PhD student, University of Newcastle).)

    The aim of the project is to develop a tool which allows 3D graph drawings to be manipulated interactively. The tool will use the C++/Java constraint solving toolkit QOCA to ensure that the constraints of orthogonality etc are properly handled. A graphical editor for 3D orthogonal graph drawings will need to be written, probably in VRML. (Prior experience in VRML is not necessary.)

    The tool should provide an interesting case study in the application of QOCA, and should be useful in current research on 3-dimensional orthogonal graph drawing. (Indeed, its desirability became apparent in research by David Wood, supervised by Graham Farr.)

    Algorithms for Strict Minimum Message Length Inference

    Graham Farr and Chris Wallace

    Strict Minimum Message Length (SMML) is a criterion (due to Wallace) by which models of data can be assessed. It can be shown to have many desirable properties, and is important in the theory of machine learning. It is, however, very difficult to compute, except in the simplest cases. In the binomial case, we have an exact and efficient algorithm, but the trinomial case is significantly harder, and may well be NP-hard. The aim of this project will be to implement and study some algorithms for cases such as the binomial, trinomial and normal, and thus hopefully shed light on the theory. A strong mathematical background is required.

    Algorithmic problems on DNA sequences

    Graham Farr

    For many purposes, it is convenient to regard DNA as a long sequence of symbols, where each symbol is one of C, G, A, T. It is currently not possible for scientists to determine the entire sequence for complex life forms. Small fragments of the sequence can be found, but then there is the problem of piecing the fragments together in the correct order to determine the entire sequence.

    This project looks at a class of algorithms for this problem based on a divide-and-conquer approach. Algorithms will be implemented and studied experimentally.

    Efficient Planarisation of Graphs.

    Graham Farr

    Many applications require graphs to be drawn on a two-dimensional surface (e.g. CASE diagrams, circuit layouts). The graphs of interest are often not planar, so any planar drawing will have edges that cross each other. In order to make the drawing clear and economical, it is desirable to draw as much of the graph as possible in the plane. We would like to be able to take any graph as input, and find its Maximum Planar Subgraph (i.e. the planar subgraph which has the most edges). This problem is NP-hard, so we look for practical heuristics. The aim of this project is to implement several heuristics for the problem and study their performance. The emphasis will be on graphs which are sparse (i.e. low average vertex degree), since many graphs in practical drawing applications are of this type.
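    As a sketch of the simplest heuristic family, the fragment below adds edges one at a time and keeps an edge only if the subgraph built so far stays planar; for brevity the Euler bound m <= 3n - 6 stands in for a genuine planarity test (such as the one LEDA provides), so this is only a crude illustration of the shape of such a heuristic.

        import java.util.*;

        // A sketch of a greedy maximal-planar-subgraph heuristic: consider edges one
        // at a time and keep an edge only if the subgraph built so far stays planar.
        // The Euler bound used below is only a cheap necessary condition; a real
        // implementation would call a proper planarity test.
        public class GreedyPlanarise {
            static boolean staysPlanar(int n, int keptEdges) {
                return n < 3 || keptEdges <= 3 * n - 6;    // stand-in for a planarity test
            }

            // edges[i] = {u, v}; returns the edges kept by the greedy heuristic.
            static List<int[]> planarSubgraph(int n, int[][] edges) {
                List<int[]> kept = new ArrayList<>();
                for (int[] e : edges)
                    if (staysPlanar(n, kept.size() + 1))
                        kept.add(e);
                return kept;
            }

            public static void main(String[] args) {
                // K5: the complete graph on 5 vertices (10 edges, not planar).
                List<int[]> edges = new ArrayList<>();
                for (int u = 0; u < 5; u++)
                    for (int v = u + 1; v < 5; v++)
                        edges.add(new int[] { u, v });
                List<int[]> kept = planarSubgraph(5, edges.toArray(new int[0][]));
                System.out.println("kept " + kept.size() + " of " + edges.size() + " edges");
            }
        }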

    Some background in discrete mathematics would help.

    Programming will be in C++ and will build on the Leda package for combinatorial computing.

    The project is part of work I am doing with Prof. Peter Eades (University of Newcastle).

    Inductive Inference of Game Player Strategy.

    Graham Farr (and David Dowe)

    This project aims to develop programs for inferring something about a game player's strategy, just from records of games played.

    The sort of games considered here are board games like Chess. A long term aim is to be able to take as input a record of the chess games of (e.g.) Garry Kasparov (World Champion), and infer something (but not everything!) about his chess playing strategy. This may be an ambitious goal, but we propose to move toward it in achievable steps. Initially, a program capable of inferring very simple aspects of strategy would be developed, and tested using records of games played by appropriately simple computer players. We have developed some basic methods for doing the inference, and expect to improve on them. Several types of inference are possible; among these, we intend to apply the principles of Minimum Message Length (MML) inference. This project builds on a 1997 project by Tony Jansen.

    Programming will almost certainly be in C. If doing this project, you are strongly encouraged to take the 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML. (See http://www.csse.monash.edu.au/~dld/chess.html.)

    Possible Joint Projects with AMRL & DSTO

    (It is not clear whether these projects will run. See Graham Farr if interested.)

    Analysis of temporal sequences from human-machine interfaces.

    Human factors scientists at AMRL are investigating how equipment operators interact with control interfaces. They collect multivariate time series data from experimental studies, and then examine this data in order to evaluate and improve interfaces and operating procedures. The aim of the project is to analyse such data using Minimum Message Length (MML) inference with the objective of producing metrics of similarity of sequences and grammars of task activity.

    Search and optimisation in complex models.

    In order to study how assets can best be used, without (or prior to) going to the expense of full trials, AMRL uses complex computer-based models. The study of such devices and systems using these models gives rise to search and optimisation problems, for which suitable algorithms must be designed, implemented and studied.


    Image tracking implemented in VHDL

    Charles Greif

    Image tracking is a computationally expensive process. In recent times there have been a number of attempts to implement image tracking in hardware. Some recent results are very encouraging, leading us to believe that an implementation in an FPGA, using VHDL and the Mentor Graphics ECAD software, will be quite successful. The hardware component of this project would involve the acquisition of an off-the-shelf FPGA board to which the memory (to hold the images) has to be interfaced.


    Document definition in SGML and XML

    John Hurst

    SGML (or "Standard Generalized Markup Language") is a standard for defining document class and structure, and there have been a number of tools to automate the processing of such documents (see for example: http://www.sgmltools.org).

    Unfortunately, SGML is not suitable for delivering documents to the web, and a new standard, XML (or "eXtensible Markup Language") has arisen. XML is a subset of SGML, and is designed specifically to facilitate the delivery of documents on the web. For example, automatic conversion of XML documents to HTML is possible, as is conversion to LaTeX/TeX, groff, rtf or even just plain ASCII text!

    These all depend upon two basic tools, modelled fairly closely on compiler technology: a parser that determines that a document is "well-formed" and "valid"; and a back-end that translates to the document form required. Just as with compiler technology, much research has been invested into making these two processes table driven, and this project is about investigating these two aspects.

    DTD, or "Document Type Definition" files describe the structure of documents, and how they are to be parsed. DSSSL, or "Document Style Semantics and Specification Language" files are about how the documents are rendered into presentation form (TeX, HTML, etc.). In addition, there are other standards such as XLL ("XLink Language") that provide models for document analysis, search and retrieval operations to the web.
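    To make the DTD idea concrete, a minimal (invented) document type and a conforming XML instance might look like this; the element names are purely illustrative.

        <!-- notes.dtd: a minimal, invented document type for lecture notes -->
        <!ELEMENT notes   (title, section+)>
        <!ELEMENT section (heading, para*)>
        <!ELEMENT title   (#PCDATA)>
        <!ELEMENT heading (#PCDATA)>
        <!ELEMENT para    (#PCDATA)>
        <!ATTLIST section id ID #REQUIRED>

        <?xml version="1.0"?>
        <!DOCTYPE notes SYSTEM "notes.dtd">
        <notes>
          <title>Week 1</title>
          <section id="s1">
            <heading>Introduction</heading>
            <para>Documents are parsed against the DTD before being rendered.</para>
          </section>
        </notes>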

    The project is to investigate the use of XML as a document paradigm, to define appropriate DTD and DSSSL prototypes, and to write code to produce working models of the tools described above. Real life examples will be drawn from the university environment. A knowledge of HTML (and to a lesser extent, LaTeX, TeX, and rtf) formats would be helpful, although not essential.

    Program Maintenance and Literate Programming

    John Hurst

    The most expensive phase in the software life cycle is program maintenance, during which programs typically get modified so frequently that this phase may account for up to 70% of their total development cost. A major factor in this cost is the need for software engineers to re-engineer some (rarely all) of the program code, either to fix bugs or to develop new functionality. In both of these cases, having access to the design decisions made during initial and subsequent program development can be invaluable in terms of understanding the code.

    Literate Programming offers a mechanism to maintain such information. Originally proposed by Donald Knuth in 1984, the idea has seen a resurgence of activity with the advent of "second generation" (language independent) and "third generation" (WEB based) literate programming tools. The use of advanced macro processing features can also ease the program maintenance task, through appropriate revision control and platform independence mechanisms.

    This project is about exploring these issues, and developing a state-of-the-art literate programming tool.


    Experimental Ethics

    Kevin Korb and Ann Nicholson

    This project will use Artificial Life techniques to apply a utilitarian account of ethical virtue to examine various ethical issues and principles experimentally. Classically, related techniques have been applied to game-theoretic problems such as the iterated prisoner's dilemma. In this project we will examine some selection of issues in A-life settings, such as: the social value of altruism, cooperation and selfish behavior; the effects of euthanasia under various circumstances; property rights and their violation; racial discrimination. In general, the aim is not to establish that some principle is right or wrong, but instead to find characterizations of both those A-life environments in which a kind of behavior is socially beneficial and those in which it is socially harmful.

    Bayesian Poker

    Kevin Korb and Ann Nicholson

    Poker is an ideal vehicle for testing automated reasoning under uncertainty. It introduces uncertainty both by physical randomization and by incomplete information about opponents' hands. Another source of uncertainty is the limited information available to construct psychological models of opponents, their tendencies to bluff, play conservatively, reveal weakness, etc. and the relation between their hand strengths and betting behavior. All of these uncertainties must be assessed accurately and combined effectively for any reasonable level of skill in the game to be achieved, since good decision making is highly sensitive to those tasks. We have developed a Bayesian Poker Program (BPP), which uses a Bayesian network to model the program's poker hand, the opponent's hand and the opponent's playing behavior conditioned upon the hand, and betting curves which govern play given a probability of winning. The history of play with opponents is used to improve BPP's understanding of their behavior. BPP has been compared experimentally with: a simple rule-based system; a program which depends exclusively on hand probabilities (i.e., without opponent modeling); and with human players. BPP has shown itself to be an effective player against all these opponents, barring the better humans.

    This project involves extending and improving BPP. Possible tasks include:

    Missing Data

    Kevin Korb

    Machine learning techniques of many different kinds have been applied very successfully to a great range of problems involving generalization from data. Their success has led to the interest of industry, resulting in the applied field of "Knowledge Discovery in Databases" (KDD) or "Data Mining." A recurring difficulty in applying machine learning to real-world data, however, is the pervasiveness of error in the data. In this project we will examine a particular form of data error, namely instances where attribute values for a sample are missing. We shall perform a comparative experimental analysis of different proposed methods for coping with missing data values and look at ways of improving upon them using Bayesian techniques.

    Telerobotics

    Gordon Lowe

    Robots are generally programmed by leading them through the required motion with a teach pendant. Off-line programming, which is less common, is available on two of our industrial robots and permits the bulk of a program to be written without moving the robot. Until now, viewing the operation of a robot has required presence at the robotic cell; by using image data sent via the Internet it is possible to observe and control robot tasks from any connected terminal.

    Users will require images of the robot work cell for controlling moves and monitoring robot execution, and an interface to the robot server for loading, compiling and controlling robot programs. The telerobotic system (Figure 1) works as follows: a request from an operator to the HTTPD server launches a CGI script that communicates with the robot and image servers to perform the requested move and obtain new pictures. The robot, robot controller and robot server are existing equipment.

    It is proposed that user interaction be handled via the Common Gateway Interface (CGI): physical devices are controlled by reading input from an HTML form, performing the required operation, and feeding the result back on a returned HTML page. Java offers possibilities for increasing the sophistication of the interface.

    The first phase will be the development of the image server, including a three-camera system, connected to the image server, for three-dimensional viewing of one robot cell. This work will present images on an HTML page for teleautonomous control using discrete commands.

    Later work involves interface controls for trajectory control and motion kinematics, and an off-line programming facility attached to the HTTP server. The server must support multiple observers and off-line programmers with a single active user.

    Real Time Torque Control

    Gordon Lowe

    This project involves simulation and monitoring of real-time torque control for a 6-axis robot. At present, torque control in robot manipulators is compensated for by overdamping for worst-case situations. Robots which can plan a trajectory and control motion by applying real-time torque control will be able to move much faster. This project will involve programming torque expressions on a "cluster of PCs" and measuring performance.

    Soccer Robots

    Gordon Lowe

    In 1999 the 3rd year project is to build a prototype soccer robot. Opportunity exists for up to 2 students to work on the intelligence of the soccer robots. This includes: Since only one soccer robot will be completed by the end of the year, most of this work will have to be verified through simulation.

    Fruit Picking

    Gordon Lowe

    Robotic fruit picking has been researched for a number of years now with little commercial success. As I see it, the big problems are the correct identification of fruit, which includes locating it and identifying obstructions. Other problems relate to gripping and removal of fruit from the tree.

    Projection Program for QOCA

    Kim Marriott

    QOCA is a C++/Java constraint programming toolkit developed at Monash for graphical applications. It needs a facility to simplify arithmetic equality and inequality constraints by projecting them onto one or two variables of interest. The algorithm to be used will be based on that used in CLP(R). The project requires some knowledge of numerical computing.

    Jon McCormack

    Jon McCormack is interested in computer graphics and animation. He is now part-time with the CSC Department. The projects he is offering can be found at: [ http://www.cs.monash.edu.au/~jonmc/hons98.html].


    Application of Artificial Intelligence Techniques to the Solution of Flutter Equations

    Ann Nicholson (and Peter Tischer)

    Currently, when performing analytical flutter clearances for the RAAF, a technique is used which, whilst giving an accurate estimate for the flutter speed, can result in non-physical solutions remote from a flutter point. It would be desirable to implement a technique which gives realistic solutions throughout the range of desired airspeeds for two reasons: i) greater faith in the results in the mind of the customer; and ii) such physically realistic solutions can be used in flight-flutter trials to monitor the performance of the mathematical model well before a potential flutter region is approached.

    A method of solution which achieves the above aims does exist, but, for a number of reasons, it is computationally intensive and, therefore, time consuming. One of the main difficulties with this method is that sufficiently small increments in airspeed must be chosen such that the results at previous airspeeds can be used to predict a possible solution from which to start the iterations for the next airspeed. The question is then: what is a sufficiently small increment? In early implementations of the technique, a uniform, very small increment was used for all of the roots to be solved. What is really sufficiently small for any given mode, however, depends very much upon the behaviour of that mode in the region of interest. In a previous Honours project, a student developed some algorithms to vary the airspeed increments for any given mode as the airspeed is increased. This Honours project will build on that work by:

    The current implementation is written in C++ using an existing mathematical library. It is desirable that the student undertaking this project should have done CSC3091 (Artificial Intelligence).

    Parallelised Genetic Algorithm

    Ann Nicholson and David Abramson
    Joint Supervisor: Dr. Shane Dunn (DSTO)

    Background

    Work is being carried out at the Department of Defence's Aeronautical and Maritime Research Laboratory (AMRL) on processes which can be used to develop mathematical models of physical systems which will give better predictive capabilities than are typical of current models. The approach to this problem has been to incorporate experimental data into the model making process from the very beginning; in effect, creating a model such that its results must conform to known experimental data. Typical mathematical modelling processes may have a mechanism by which they are 'tuned' to better correlate with experimental data, but this only occurs as a final step. Such 'tuning' of models can lead to considerable problems arising from issues of model complexity and from errors due to early approximations in the modelling process going undetected.

    A technique has been developed by which creating the mathematical model is treated as a highly complex, non-linear optimisation process. All of the physical parameters required to define the model can be considered as unknown. Depending upon the model, the number of properties to be determined can vary from a few to hundreds. The best approach which has been found to-date to tackle such a problem is by use of Genetic Algorithms (GAs).

    Genetic Algorithms have been used with considerable success at AMRL on the problem of deriving optimal structural dynamic models for aircraft structures. Such a problem involves optimising of the order of 100 parameters (i.e. carrying out an optimisation in 100-dimensional space). Experimental data is collected during the shaking of an aircraft in a ground vibration test (GVT) and then a GA is used to create a model which gives good correlation with these data. Inherent to such a process is the issue of model complexity (i.e. how many parameters can be determined given a set of experimental data); this is also addressed in the model optimisation process, but outside of the GA.

    Parallelised Genetic Algorithm

    Genetic Algorithms can be relatively efficient at solving highly complex problems, but this is relative to more traditional optimisation approaches, for which many such problems can be shown to be effectively intractable. So, whilst GAs can solve difficult problems, they can still take a very long time. A big advantage of GAs, however, is that they are inherently parallel processes; the same problem (just with different properties) is solved over and over again. Such problems could therefore be solved more quickly by distributing these processes over a number of computer processors. Computer networks exist in many organisations these days where PC-type machines spend a great deal of their time idle, both during the working day and, if left turned on, overnight. In the DSTO network, there are hundreds of such machines, all interconnected over an internal network. Solving such difficult problems would be greatly sped up if even a fraction of this spare processing capacity could be harnessed.

    A means of distributing such tasks around a network has been developed at the Oak Ridge National Laboratory in the US. This system is called a "Parallel Virtual Machine" (PVM) (http://www.epm.ornl.gov/pvm/pvm_home.html). The proposed project would involve investigating how such a PVM could be set up for a collection of PCs using Windows NT and Windows 95/98, and hopefully demonstrating a simple parallelised process on two or more such machines. It would then investigate how the GAs described above could optimally be distributed amongst a range of machines over a network which will have differing processor speeds and differing amounts of spare processing capacity. A means by which the system could remain sufficiently robust to handle machines coming on and off line during processing would also be required.

    How much of the above proposal would be achieved over the course of a final year project will clearly be dependent on the problems which are encountered during the course of the project.

    Medical Engineering - Ambulation Monitoring and Fall Detection

    Ann Nicholson
    Joint Supervisor: Dr. Ian Brown (ECSE)

    This project involves monitoring the stepping pattern of elderly people or patients undergoing rehabilitation, building upon previous Honours projects (James Davies 1995, Ryan McGowan 1997). The existing system obtains step data using foot-switches and downloads it to a computer, where a Dynamic Belief Network processes the data and updates its beliefs as to the person's walking status. The existing system uses the Netica API software for developing belief networks.

    This project would involve several extensions to the basic existing hardware and software implementation.

    This project would suit a BCSE student who has completed CSC2091/3091 Artificial Intelligence and who can program in C. In addition, a non-engineering student could also do this project, focusing on the software aspects.


    Andrew Paplinski

    Andrew Paplinski will be on sabbatical in 2nd semester so will not be offering a project in 1999.


    Computer Controlled (model) Car.

    Ron Pose, Lloyd Allison

    This continues a project started in 1997 to have a computer drive a radio controlled model car. An SGI Indy with IndyCam is controlling the car and can drive it at modest speed around predetermined courses. The aims are to (i) increase speed, (ii) improve path recognition and overall "planning", and (iii) allow for moving objects ("pedestrians", other cars). A further aim is to place a TV camera and transmitter on the car and to work from a driver's-eye view. Such a miniature transmitter may be available in 1999.


    Local Segmentation of Images

    Peter Tischer

    If we select a pixel at random from a picture, it is highly likely that the pixel will be in the same segment as its nearest neighbouring pixels. The next most likely outcome is that a pixel and its nearest neighbours lie in no more than two segments; that is, the block straddles the boundary between two regions. If a pixel and all its neighbours are in the same segment, then we may use all the pixel values in subsequent processing of the current pixel. If we determine that some neighbouring pixel values are not in the same segment as the current pixel, then we should exclude those pixels from processing of the current pixel.

    The aim of this project is to investigate ways of looking at small blocks of pixels and deciding whether the pixels in the block all belong to the same segment or whether they belong to at least two different segments. Minimal Message Length (MML) Inductive Inference gives us a way of choosing between two possible models or explanations for a body of data. In this case, MML techniques will be used to decide whether a block of pixels can be best described by saying that all pixels in the block belong to the same segment or they belong to at least two different segments.
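    A crude sketch of this comparison is given below: pixel values in a segment are coded as Gaussian deviations from the segment mean, and the one-segment cost is compared with a two-segment cost that also pays one bit per pixel for the mask. The fixed parameter cost and the split-at-the-mean rule are simplifying assumptions for illustration, not the MML formulation the project would develop.

        // A crude sketch of MML-style local segmentation: compare the cost of coding
        // a small block of pixel values as one Gaussian segment against the cost of
        // splitting it into two segments (plus one bit per pixel for the mask).
        public class LocalSegmentationSketch {
            static final double PARAM_BITS = 16.0;   // assumed cost of stating a mean and spread
            static final double LOG2 = Math.log(2.0);

            // Bits to code the values as deviations from their own mean.
            static double gaussianCost(double[] xs) {
                if (xs.length == 0) return 0.0;
                double mean = 0, var = 0;
                for (double x : xs) mean += x;
                mean /= xs.length;
                for (double x : xs) var += (x - mean) * (x - mean);
                double sd = Math.max(Math.sqrt(var / xs.length), 0.5);   // floor avoids log(0)
                double bits = 0;
                for (double x : xs) {
                    double p = Math.exp(-(x - mean) * (x - mean) / (2 * sd * sd))
                               / (sd * Math.sqrt(2 * Math.PI));
                    bits += -Math.log(p) / LOG2;
                }
                return bits;
            }

            public static void main(String[] args) {
                double[] block = { 20, 22, 21, 19, 200, 198, 202, 199 };   // straddles an edge

                double oneSegment = PARAM_BITS + gaussianCost(block);

                // Two-segment model: split at the block mean, code each half separately.
                double mean = 0;
                for (double x : block) mean += x;
                mean /= block.length;
                int low = 0;
                for (double x : block) if (x < mean) low++;
                double[] a = new double[low], b = new double[block.length - low];
                int i = 0, j = 0;
                for (double x : block) { if (x < mean) a[i++] = x; else b[j++] = x; }
                double twoSegments = 2 * PARAM_BITS + block.length     // mask: one bit per pixel
                                     + gaussianCost(a) + gaussianCost(b);

                System.out.printf("one segment: %.1f bits, two segments: %.1f bits%n",
                                  oneSegment, twoSegments);
            }
        }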

    The process of classifying blocks of pixels as consisting of only one segment or at least two segments can be termed local segmentation. Local segmentation is of fundamental importance in an image coding technique called Block Truncation Coding. In addition, local segmentation can be used in algorithms for edge detection, segmentation and for noise removal.

    Compression of Binary Images of Text.

    Peter Tischer

    Binary images are pictures whose pixels are either black or white. Binary images are often used to represent faxes or pages of documents. The fax Group 3 and Group 4 compression algorithms were designed to compress binary images which consist of pages of text. These algorithms are quick and achieve reasonable results on pages of text, but they perform poorly on other kinds of binary images.

    The algorithms in the JBIG standard were designed to adapt to the characteristics of the binary image and perform very well across a wide range of binary images. The JBIG algorithms perform better on images of text than the Group 3 and Group 4 algorithms and can still achieve good compression on other kinds of binary images, e.g. scanned pictures and dithered pictures. The JBIG algorithms are more expensive to implement and they make use of a form of arithmetic coding which has been patented by IBM.

    The aim of this project is to develop a way of compressing binary images of text that will deliver as much compression as the Group 3 and Group 4 algorithms, will adapt to the characteristics of the binary image and will deliver better throughput rates than the JBIG algorithm. Ideas to be investigated include: transforming the binary image, use of run-length coding, use of adaptive modeling, Rice coding and chain coding.
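    As a starting point, the sketch below shows run-length coding of a single row of a binary image; a real coder would then entropy-code the run lengths, for example with Rice or arithmetic coding as mentioned above.

        import java.util.*;

        // A minimal sketch of run-length coding for one row of a binary image: the
        // row is reduced to the lengths of its alternating runs (starting with white,
        // as in the Group 3 convention).
        public class RunLengthSketch {
            static List<Integer> encodeRow(int[] row) {
                List<Integer> runs = new ArrayList<>();
                int current = 0, length = 0;              // assume rows start with white (0)
                for (int pixel : row) {
                    if (pixel == current) {
                        length++;
                    } else {
                        runs.add(length);
                        current = pixel;
                        length = 1;
                    }
                }
                runs.add(length);
                return runs;
            }

            public static void main(String[] args) {
                int[] row = { 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0 };
                System.out.println(encodeRow(row));       // prints [3, 2, 4, 1, 2]
            }
        }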

    Robust Linear Regression and MML

    Peter Tischer

    Linear Regression is concerned with finding the best straight line model for a body of data. The classical techniques for linear regression are very sensitive to the presence of anomalous data. For example, a single anomalous observation can cause the least squares method to find a straight line fit that is arbitrarily bad with respect to the other data. Robust regression is concerned with linear regression techniques that are relatively unaffected by the presence of anomalous observations. There are robust regression techniques known where the effect of anomalous observations is bounded even when as much as 50% of the observations are contaminated. Unfortunately, algorithms for implementing these techniques can be extremely computationally intensive and do not guarantee that the best robust fit has been found. The project involves investigating a number of possible techniques for robust regression. As well, the project could investigate using the Minimal Message Length (MML) criterion as a way of comparing two different straight line fits of the same data.
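    To illustrate what "robust" means here, the sketch below implements one simple, well-known robust estimator, the Theil-Sen fit (the median of all pairwise slopes). It tolerates a substantial fraction of anomalous observations, though not the 50% achieved by methods such as least median of squares, and it is shown only as an illustration, not as the technique the project would settle on.

        import java.util.*;

        // A sketch of the Theil-Sen robust fit: the slope is the median of all
        // pairwise slopes and the intercept the median residual.  A single wild
        // observation barely moves it, unlike a least-squares fit.
        public class TheilSenSketch {
            static double median(double[] v) {
                double[] s = v.clone();
                Arrays.sort(s);
                int n = s.length;
                return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
            }

            // Returns {slope, intercept} of the Theil-Sen line through (x[i], y[i]).
            static double[] fit(double[] x, double[] y) {
                List<Double> slopes = new ArrayList<>();
                for (int i = 0; i < x.length; i++)
                    for (int j = i + 1; j < x.length; j++)
                        if (x[i] != x[j])
                            slopes.add((y[j] - y[i]) / (x[j] - x[i]));
                double[] s = new double[slopes.size()];
                for (int k = 0; k < s.length; k++) s[k] = slopes.get(k);
                double slope = median(s);

                double[] residuals = new double[x.length];
                for (int i = 0; i < x.length; i++) residuals[i] = y[i] - slope * x[i];
                return new double[] { slope, median(residuals) };
            }

            public static void main(String[] args) {
                double[] x = { 1, 2, 3, 4, 5, 6 };
                double[] y = { 2.1, 4.0, 5.9, 8.1, 9.9, 60.0 };   // last point is anomalous
                double[] line = fit(x, y);
                System.out.printf("y = %.2f x + %.2f%n", line[0], line[1]);
            }
        }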

    Low Bit Rate Lossy Image Compression via Block Truncation Coding

    Peter Tischer

    Block Truncation Coding (BTC) is a lossy method for compressing image data. It can also be applied to the lossy compression of sound data. BTC methods are relatively inexpensive to implement and BTC-coded images can be decoded very efficiently but they require relatively high bit rates to achieve a particular level of image quality.
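    For concreteness, the sketch below shows the classic single-block BTC step: each pixel is reduced to one bit (above or below the block mean) together with two reconstruction levels chosen to preserve the block's mean and variance. This is the standard textbook scheme; the project would go beyond it with prediction, interpolation and better coding.

        // A sketch of the classic single-block BTC step: quantise each pixel to one
        // bit (above or below the block mean) and choose two output levels a and b
        // that preserve the block's mean and variance.
        public class BtcSketch {
            public static void main(String[] args) {
                int[] block = { 34, 36, 200, 202, 33, 37, 201, 199,
                                35, 34, 198, 203, 36, 35, 202, 200 };   // a 4x4 block with an edge
                int n = block.length;

                double mean = 0, var = 0;
                for (int p : block) mean += p;
                mean /= n;
                for (int p : block) var += (p - mean) * (p - mean);
                double sd = Math.sqrt(var / n);

                int q = 0;                                   // pixels at or above the mean
                boolean[] bitmap = new boolean[n];
                for (int i = 0; i < n; i++) {
                    bitmap[i] = block[i] >= mean;
                    if (bitmap[i]) q++;
                }

                // Output levels preserving the block mean and variance.
                double a = mean - sd * Math.sqrt((double) q / (n - q));
                double b = mean + sd * Math.sqrt((double) (n - q) / q);

                System.out.printf("mean=%.1f sd=%.1f  low level=%.1f high level=%.1f%n", mean, sd, a, b);
                for (int i = 0; i < n; i++)
                    System.out.print(Math.round(bitmap[i] ? b : a) + (i % 4 == 3 ? "\n" : " "));
            }
        }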

    The aim of this project is to investigate the use of prediction and interpolation in BTC coding. In addition, the coding may either use a combination of Rice coding and run length coding or it may use arithmetic coding. The aim is to see how well images can be encoded at the kinds of bit rates that are used in digital television. In video sequence coding, images might be encoded as the difference from a previous image or an image might be coded without referring to any other images. It would be interesting to see how well a BTC-based coding scheme can perform on both kinds of images.

    Objective Criterion for Comparing Image Segmentations

    Peter Tischer

    Image segmentation is one of the fundamental problems of image processing. There is little consensus on what the best algorithms are and how different algorithms compare. Strictly speaking, criteria do exist for measuring the differences between two segmentations of the same image, but there is no consensus on which criteria are good, or even that good criteria exist.

    The aim of this project is to use Minimal Message Length (MML) techniques to come up with a good objective criterion for comparing segmentations of the same image. The approach involves constructing a two-part message for each segmentation. The first part of the message allows us to recover the segmentation of the image. Once the segmentation has been recovered, it is used in conjunction with the information in the second part of the message to recover all the information in the original image. The MML criterion suggests that the two-part message with the shortest overall length corresponds to the best segmentation.

    The project involves looking at ways of encoding a segmentation and then using the segment information to encode the image. One approach is to associate a segment number with each pixel and to form an image of the segment numbers. This image we can call the segment map. Segment maps can be regarded as being generalised forms of binary images and effective techniques for the lossless coding of binary images may be generalised and applied to segment maps.


    Optimisation of fuzzy control systems using genetic and other algorithms

    Bin Qiu

    This project investigates genetic and other algorithms for the optimisation of fuzzy controllers in terms of membership functions and linguistic rules.

    A fuzzy logic based Connection Admission Control (CAC) function

    Bin Qiu

    This project is concerned with Connection Admission Control (CAC) in ATM networks. The decision of source admission into a multimedia communication network is a very active research area. Analytical solutions presented so far are not adequate under a variety of network situations and source characteristics. A fuzzy logic based CAC function may provide a breakthrough, and therefore this research topic has the potential to be developed into a major research project.

    An efficient architecture for fuzzy rule processing

    Bin Qiu

    Existing processor architectures are mostly designed and optimised for crisp logic processing. This project seeks to establish and simulate some alternative processor architectures dedicated to fuzzy rule processing. It involves analytical study, VHDL modeling, simulation and FPGA implementation if possible.


    Image Thresholding by Probabilistic Distance Criteria

    Sid Ray

    Thresholding is one of the most popular approaches to image segmentation. In this approach, all the pixels having a certain range of some image property, say, intensity, are considered to belong to one group. Connected regions in these groups lead to the desired segmentation. In the past, the probabilistic entropy measure of Shannon, defined in the context of Information Theory, has been successfully applied in determining the gray level threshold between the "object" and the "background" regions of an image, assuming separate probability distributions for the object pixels and the background pixels. Image segmentation methods also exist in which non-probabilistic entropic criteria, devised in the fuzzy set-theoretic framework, have been used.

    An Honours project carried out in 1997 studied two probabilistic distance criteria, namely the Bhattacharyya distance and the Kullback-Leibler divergence, as image thresholding criteria. The purpose of the proposed project will be to investigate probabilistic distance-based image segmentation methods further and to apply them to both monochrome and colour images.
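
    One possible way to turn the Bhattacharyya distance into a thresholding criterion (assuming Gaussian class models estimated from the grey-level histogram) is sketched below; whether the distance should be maximised, or used in some other formulation, is exactly the kind of question the project would examine. The names are invented for the example.

        // For every candidate threshold t, model the pixels below and above t as two
        // Gaussians estimated from the grey-level histogram and compute the
        // Bhattacharyya distance between them.  Here the threshold giving the
        // largest distance is returned.
        public class BhattacharyyaThreshold {

            // Bhattacharyya distance between two univariate Gaussians.
            static double bhattacharyya(double m1, double v1, double m2, double v2) {
                return 0.25 * (m1 - m2) * (m1 - m2) / (v1 + v2)
                     + 0.5 * Math.log((v1 + v2) / (2.0 * Math.sqrt(v1 * v2)));
            }

            // hist[g] = number of pixels with grey level g (0..255).
            static int selectThreshold(int[] hist) {
                int best = -1;
                double bestD = Double.NEGATIVE_INFINITY;
                for (int t = 1; t < 255; t++) {
                    double[] lo = moments(hist, 0, t);
                    double[] hi = moments(hist, t + 1, 255);
                    if (lo[0] < 2 || hi[0] < 2) continue;   // need both classes populated
                    double d = bhattacharyya(lo[1], lo[2], hi[1], hi[2]);
                    if (d > bestD) { bestD = d; best = t; }
                }
                return best;
            }

            // Returns {count, mean, variance} of grey levels in [from, to].
            static double[] moments(int[] hist, int from, int to) {
                double n = 0, sum = 0, sumSq = 0;
                for (int g = from; g <= to; g++) {
                    n += hist[g];
                    sum += (double) g * hist[g];
                    sumSq += (double) g * g * hist[g];
                }
                if (n == 0) return new double[] { 0, 0, 0 };
                double mean = sum / n;
                double var = Math.max(sumSq / n - mean * mean, 1e-6); // avoid zero variance
                return new double[] { n, mean, var };
            }
        }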

    Image Segmentation by Clustering Methods

    Sid Ray

    The aim of the project will be to make a critical study of a number of existing non-hierarchical cluster analysis methods, such as c-means, fuzzy c-means and isodata, with a view to their application in image segmentation. The study of cluster validity will form an essential part of the project. Applications will include both monochrome and colour images.
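
    For concreteness, a minimal c-means loop over grey levels is sketched below; fuzzy c-means and isodata replace the hard assignment and the fixed number of clusters respectively, but share the same alternating assign/update structure. The code is illustrative only.

        // Minimal c-means (k-means) clustering of grey levels, assigning each pixel
        // to the nearest of c cluster centres; connected regions of equal label
        // would then give the segmentation.
        public class CMeansSegmenter {

            // pixels: grey levels 0..255; returns a cluster label per pixel.
            static int[] segment(int[] pixels, int c, int iterations) {
                double[] centre = new double[c];
                for (int j = 0; j < c; j++)                    // spread initial centres
                    centre[j] = 255.0 * (j + 0.5) / c;

                int[] label = new int[pixels.length];
                for (int it = 0; it < iterations; it++) {
                    double[] sum = new double[c];
                    int[] count = new int[c];
                    for (int i = 0; i < pixels.length; i++) {
                        int best = 0;
                        double bestDist = Double.MAX_VALUE;
                        for (int j = 0; j < c; j++) {
                            double d = Math.abs(pixels[i] - centre[j]);
                            if (d < bestDist) { bestDist = d; best = j; }
                        }
                        label[i] = best;
                        sum[best] += pixels[i];
                        count[best]++;
                    }
                    for (int j = 0; j < c; j++)                // update centres
                        if (count[j] > 0) centre[j] = sum[j] / count[j];
                }
                return label;
            }
        }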

    Content-Based Image Retrieval

    Sid Ray

    Content-based image retrieval indexes and retrieves images by their visual content rather than by textual annotation. The objective of this project is to devise an algorithm for retrieving images from a large image database using colour quantisation and regional quantisation.
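
    A very simple version of the colour-quantisation idea is sketched below: each image is reduced to a coarse RGB histogram and images are ranked by histogram intersection with the query. Regional quantisation would compute such a histogram per image region and concatenate them. The bin counts and method names are assumptions made for the example.

        // Sketch of colour-quantisation-based retrieval: each image is reduced to a
        // coarse RGB histogram (here 4x4x4 = 64 bins), and images are ranked by
        // histogram intersection with the query.
        public class ColourHistogramRetrieval {

            // rgb: packed 0xRRGGBB pixels; returns a normalised 64-bin histogram.
            static double[] histogram(int[] rgb) {
                double[] h = new double[64];
                for (int p : rgb) {
                    int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                    h[(r >> 6) * 16 + (g >> 6) * 4 + (b >> 6)]++;   // 2 bits per channel
                }
                for (int i = 0; i < 64; i++) h[i] /= rgb.length;
                return h;
            }

            // Histogram intersection similarity in [0, 1]; 1 means identical histograms.
            static double similarity(double[] a, double[] b) {
                double s = 0.0;
                for (int i = 0; i < a.length; i++) s += Math.min(a[i], b[i]);
                return s;
            }
        }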

    Texture Analysis of Images

    Sid Ray

    Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of (i) segmentation and (ii) classification of images. The proposed project will deal with both of these problems. It will involve two phases. In the first phase some existing texture analysis methods will be investigated from the point of view of their theoretical soundness as textural measures as well as their practical applicability. In the second phase attempts will be made to derive a new methodology with particular attention to its computational efficiency.
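
    As one concrete example of an existing texture measure that the first phase might examine, the sketch below builds a grey-level co-occurrence matrix for a single horizontal offset and derives the classical contrast and energy features from it; the choice of offset and quantisation is illustrative only.

        // Grey-level co-occurrence matrix (GLCM) for a fixed horizontal offset, from
        // which features such as contrast and energy can be computed and fed to a
        // segmenter or classifier.
        public class GlcmTexture {

            // image[y][x] holds grey levels already quantised to 0..levels-1.
            static double[][] glcm(int[][] image, int levels) {
                double[][] m = new double[levels][levels];
                int pairs = 0;
                for (int y = 0; y < image.length; y++) {
                    for (int x = 0; x + 1 < image[y].length; x++) {
                        m[image[y][x]][image[y][x + 1]]++;   // neighbour one pixel to the right
                        pairs++;
                    }
                }
                for (int i = 0; i < levels; i++)
                    for (int j = 0; j < levels; j++) m[i][j] /= pairs;
                return m;
            }

            static double contrast(double[][] m) {
                double c = 0.0;
                for (int i = 0; i < m.length; i++)
                    for (int j = 0; j < m.length; j++) c += (i - j) * (i - j) * m[i][j];
                return c;
            }

            static double energy(double[][] m) {
                double e = 0.0;
                for (double[] row : m) for (double v : row) e += v * v;
                return e;
            }
        }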

    Document Image Analysis System

    Sid Ray

    The objective of document image analysis is to recognize text, graphics, and pictures in printed documents and to extract the intended information from them. There are two broad categories of document image analysis, namely textual processing and graphical processing. Textual processing includes skew determination (detecting any tilt at which the document may have been scanned), finding columns, paragraphs, text lines and words, and performing optical character recognition. Graphical processing deals with lines and symbols. The proposed project will be confined to textual processing and will concentrate on the development of a system for page layout analysis.
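
    As an illustration of the skew-determination step, the sketch below searches a small range of candidate angles and picks the one whose horizontal projection profile has the greatest variance (i.e. the sharpest peaks at text lines). The angle range, step size and shear approximation are assumptions made for the example.

        // Skew determination by projection profiles: for each candidate angle the
        // page is (approximately) sheared, the row-sum profile of black pixels is
        // computed, and the angle whose profile has the greatest variance is taken
        // as the skew estimate.
        public class SkewDetector {

            // binary[y][x] is true for a black (ink) pixel.
            static double estimateSkewDegrees(boolean[][] binary) {
                double bestAngle = 0.0, bestScore = -1.0;
                for (double angle = -5.0; angle <= 5.0; angle += 0.25) {
                    double score = profileVariance(binary, Math.toRadians(angle));
                    if (score > bestScore) { bestScore = score; bestAngle = angle; }
                }
                return bestAngle;
            }

            static double profileVariance(boolean[][] binary, double theta) {
                int h = binary.length, w = binary[0].length;
                int[] profile = new int[2 * h];                 // room for sheared rows
                for (int y = 0; y < h; y++)
                    for (int x = 0; x < w; x++)
                        if (binary[y][x]) {
                            int row = (int) Math.round(y + x * Math.tan(theta)) + h / 2;
                            if (row >= 0 && row < profile.length) profile[row]++;
                        }
                double mean = 0.0;
                for (int v : profile) mean += v;
                mean /= profile.length;
                double var = 0.0;
                for (int v : profile) var += (v - mean) * (v - mean);
                return var / profile.length;
            }
        }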

    Selection of Features for Pattern Recognition

    Sid Ray

    Experience shows that human recognition of patterns is based on only a few important features (or attributes) characterizing the pattern classes. In contrast, in statistical pattern recognition the patterns are often represented by a large number of numerical features. Although there is no conceptual justification for reducing the number of features to a small number, in practical problem solving this becomes a necessary step because of the well-known 'curse of dimensionality': the complexity of the pattern classifier grows rapidly with the dimensionality of the feature vector. The aim of the proposed project is to develop an interactive feature selection paradigm well-suited to multiclass (rather than 2-class) pattern recognition problems.

    The project will comprise four components, namely (i) a theoretical comparison of existing feature selection criteria, (ii) development of software tools for the existing criteria, followed by an experimental investigation of them, (iii) analysis of the above results, leading, hopefully, to a feature selection paradigm, and (iv) development of an interactive software tool for the above paradigm. The software will be 'interactive' in the sense that, depending on the classification accuracy achieved at a given stage, the user will have the option of changing the dimensionality value; it will also be possible to supply a range of dimensionality values interactively. The software tools will include procedures for displaying the distribution of pattern samples in the different feature spaces obtained by different feature selection methods.
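
    A minimal sketch of one standard search strategy, sequential forward selection, is given below; the separability criterion is left behind an interface so that the criteria compared in component (i) could be plugged in. The names are invented for the example.

        import java.util.ArrayList;
        import java.util.List;

        // Sequential forward selection: starting from the empty set, the feature
        // whose addition most improves a separability criterion is added at each
        // step until the requested dimensionality is reached.  The criterion (e.g.,
        // Bhattacharyya distance, Mahalanobis distance, or cross-validated
        // classifier accuracy) is supplied through an interface.
        public class ForwardSelection {

            interface Criterion {
                // Larger is better; evaluates the class separability of a feature subset.
                double score(List<Integer> featureSubset);
            }

            static List<Integer> select(int numFeatures, int targetDim, Criterion crit) {
                List<Integer> chosen = new ArrayList<Integer>();
                boolean[] used = new boolean[numFeatures];
                while (chosen.size() < targetDim) {
                    int bestFeature = -1;
                    double bestScore = Double.NEGATIVE_INFINITY;
                    for (int f = 0; f < numFeatures; f++) {
                        if (used[f]) continue;
                        chosen.add(f);                       // try adding feature f
                        double s = crit.score(chosen);
                        chosen.remove(chosen.size() - 1);
                        if (s > bestScore) { bestScore = s; bestFeature = f; }
                    }
                    if (bestFeature < 0) break;              // no unused features left
                    used[bestFeature] = true;
                    chosen.add(bestFeature);
                }
                return chosen;
            }
        }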


    Rod Worley

    Rod Worley will be on long service leave until June 26, 1999 and hence will not be offering projects to students starting in March.


    Low bit rate coding of speech signals

    Henry Wu

    Objectives include

    a) an investigation of speech coding techniques at low bit rates;

    b) an implementation and evaluation of a 16 kbps or an 8 kbps speech codec, in either software or hardware.
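
    To give a feel for the constraint, the sketch below works out the bit budget per frame, assuming 8 kHz sampled speech and 20 ms frames; at 8 kbps only about one bit per sample is available, which is why low-rate coders transmit model parameters (LPC coefficients, pitch, gains, excitation indices) rather than individual samples. The frame length is an assumption made for the example.

        // Back-of-the-envelope bit budget for a frame-based codec, assuming 8 kHz
        // sampled speech and 20 ms frames (160 samples).
        public class BitBudget {
            public static void main(String[] args) {
                int sampleRateHz = 8000;
                double frameMs = 20.0;
                int[] targetRatesBps = { 16000, 8000 };

                int samplesPerFrame = (int) (sampleRateHz * frameMs / 1000.0);  // 160
                for (int rate : targetRatesBps) {
                    double bitsPerFrame = rate * frameMs / 1000.0;
                    System.out.println(rate / 1000 + " kbps: " + bitsPerFrame
                            + " bits per " + samplesPerFrame + "-sample frame ("
                            + (bitsPerFrame / samplesPerFrame) + " bits/sample)");
                }
            }
        }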


    Text summarization

    Ingrid Zukerman

    This project, which is a continuation of a project undertaken in 1998, considers the problem of automatically generating summaries of documents on the WWW. The student will use simple statistical methods to identify ``important information'' in the articles. This information will be collated into summaries using simple linguistic techniques. This year's project uses a corpus of documents obtained from Telstra, which are more homogeneous than those used in last year's work. This feature is expected to have some impact on the summarization techniques being applied.

    A student undertaking this project should have some grounding in statistics.
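
    A minimal sketch of the kind of simple statistical scoring the description mentions is given below: words are weighted by their frequency in the document, sentences are scored by the average weight of their words, and the highest-scoring sentences are kept. Stop-word removal, normalisation against a background corpus and the linguistic collation step are omitted; all names are illustrative.

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Frequency-based sentence scoring for extractive summarisation.
        public class FrequencySummariser {

            static List<String> summarise(List<String> sentences, int keep) {
                // Count word frequencies over the whole document.
                Map<String, Integer> freq = new HashMap<String, Integer>();
                for (String s : sentences)
                    for (String w : s.toLowerCase().split("\\W+"))
                        if (w.length() > 0)
                            freq.put(w, freq.containsKey(w) ? freq.get(w) + 1 : 1);

                // Score each sentence by its average word frequency.
                double[] score = new double[sentences.size()];
                for (int i = 0; i < sentences.size(); i++) {
                    String[] words = sentences.get(i).toLowerCase().split("\\W+");
                    int n = 0;
                    for (String w : words)
                        if (w.length() > 0) { score[i] += freq.get(w); n++; }
                    if (n > 0) score[i] /= n;
                }

                // Keep the highest-scoring sentences.
                List<String> summary = new ArrayList<String>();
                for (int k = 0; k < keep; k++) {
                    int best = -1;
                    for (int i = 0; i < sentences.size(); i++)
                        if (score[i] >= 0 && (best < 0 || score[i] > score[best])) best = i;
                    if (best < 0) break;
                    score[best] = -1;                  // mark as used
                    summary.add(sentences.get(best));
                }
                return summary;
            }
        }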

    Dialogue Generation

    Ingrid Zukerman and Ann Nicholson

    This project continues similar projects undertaken in previous years. It deals with the development of complex agents which interact with a user. These agents should combine planning and problem-solving capabilities as well as emotion-related features. The focus of the project is to study the interaction between an agent's emotional state and its resources (e.g., planning ability, memory) with the goal of carrying out a dialogue (in limited form) with the user.

    A student undertaking this project should have previously taken CSC3309 (Artificial Intelligence).

    SPIM Tutor

    Ingrid Zukerman and Ron Pose

    This project is a continuation of a 1998 project. It consists of designing and implementing a computerized tutor based on an animation process which illustrates the workings of SPIM, a MIPS simulator which is used for teaching computer architecture at first and second year level. The animation, which was implemented in 1998, highlights the different components of the architecture and the data flow for a small set of MIPS instructions. The animator may be accessed at this web site. The computerized tutor will determine parts of the subject in which the student's knowledge is inadequate. This will be done based on traces of the data path clicked by the student. The tutor will then decide on appropriate help actions, e.g., presenting explanations of architecture components or giving hints about how to proceed. Some of the data collection software required to support the tutor was implemented over the summer of 1999.

    Other Projects

    Ingrid Zukerman

    Projects proposed by students will also be considered, in particular in the area of Natural Language Processing.



    Joint Earth Sciences Honours projects.

    Epsilon Lab and/or AGCRC Graduate Projects 1999:

    There is a possibility of some funding with these projects. Interested students should speak to Mark Jessell, and also to Ann Nicholson, the honours coordinator, before nominating these projects in their preference list.

    Mark Jessell
    Australian Geodynamics Cooperative Research Centre
    Dept of Earth Sciences 
    Monash University, Clayton, VIC, 3168 
    Australia
    
    mark@earth.monash.edu.au
    
    Tel (61)(3) 9905 4902 
    Fax (61)(3) 9905 4903
    
    Home-Page. List Not Complete (5/2/99)