Computer Science, Digital Systems and BCSE Honours Research Projects Midyear 1999

(13/7/99: currently being updated - not final list)

These are the projects being offered for honours research projects for Honours students starting midyear 1999.

Notes

Supervisors


Debugger for Highly Parallel Systems

David Abramson

This project involves the design and implementation of a debugger for parallel computer systems. The work will build on an existing project in the School which is funded by the ARC, in the area of relative debugging. The student will augment an existing debugger with features for supporting parallel execution, a graphical user interface and also the visualisation of data structures.

The project requires skills in parallel computing, Unix and C, as well as exisiting debuggers like gdb.

Some more details can be obtained from [http://www.dgs.monash.edu.au/research/guard/].


 Hunt the Wumpus

David Albrecht

Hunt the Wumpus is a computer game that was created in the early 70's. Ken Thompson wrote a version for his son to help him learn to read and use a computer, it later appeared in 5th edition UNIX (1974). Gregory Yob also wrote a version which appeared in Creative Computing Volume 1 (1974/5). Since then it has been used in AI as a testbed environment for intelligent agents. The aim of this project is to use this game as a testbed for developing techniques that learn users' profiles. The reason this game has been chosen is that it is simplified version of a domain we are planning to investigate in our work on user profiles.

This project will involve creating agents that can operate in an existing Wumpus World simulation written in C++, and implementing a system that performs user profiling. These agents will need to be able to

Students undertaking this project should be able to program in C++, and it is desirable that they have done CSC3091 (Artificial Intelligence). 

Data Mining Software

Lloyd Allison

There is a research project to develop a data mining software library and graphical user interface. A mixture of Java and C is being used. Graphical components will be written in Java (Sun Microsystems). Numerical routines will be written in Java or C as appropriate and this may include adapting existing code. There may be opportunities for honours projects for good programmers in this area. The ability to work with others is important.

java2c

Lloyd Allison

The project is to write a translator from Java to C. It would then be possible to get the Software Engineering advantages of writing in Java with the speed of C. There would be losses in some areas, e.g. array bounds checking and security. The aim is to retain as much of the structure of the Java source program in the translated C, not just to treat C as a `machine code'. There may well be java2c translators in existence but this would not necessarily abort the project; a quick search failed to find any public translators.

Distributed, Interactive 3D World.

Lloyd Allison

Previous experience and demonstrated ability in interactive computer graphics is a necessary prerequisite for this project. The aim is to provide a 3-D world, distributed over many servers and clients, and to investigate and solve the problems of maintaining consistent views and time-lines for the users. Jamie Cameron's 1995 "indoors" world may provide some inspiration [http://www.cs.monash.edu.au/hons/projects_1995/Jamie.Cameron/index.html] In 1997 Eugene Ware produced an "outdoors" world programmed in Java and VRML (not available in '95). The aim is to develop this world further.

Combinatorial Workbench in Java

Lloyd Allison and Graham Farr

The aim of the project is to build a graphical environment in which objects such as graphs and networks can be interactively edited and manipulated. Information about the object, such as vertex degrees, cycle lengths, diameters etc. is to be displayed, and this displayed information must be updated as changes are made. The system must enable the user to define their own type of combinatorial object, and to specify what sort of information is to be displayed and maintained about the objects. Object-oriented design will be important. Programming will be in Java. 

Multi-Lingual Dictionary System

Jim Breen

This project involves the development of search/display software (either for X-Windows or NT/Windows) for the new JMdict (Japanese-Multilingual Dictionary) project. JMdict is an XML-based project which combines (currently) Japanese, English and German dictionary material. Volunteers around the world are preparing material in Finnish, French and Spanish. The file is coded using Unicode in the UTF8 encapsulation (as per the XML spec).

For further information about the overall project, see:

http://www.dgs.monash.edu.au/~jwb/japanese.html#jmdict_proj

Some of the tasks in the dictionary system include:

While some familiarity with Chinese or Japanese might help overcoming stage fright, it is not essential.

PS: the last student to do a project using this dictionary won a laptop as a prize for it in an ACM programming competition. See http://www.acm.org/jquest/webquest1.html.


A multi-language anti-compiler

Damian Conway

Many of the programming mistakes made by novice programmers result from their inability to correctly translate an algorithm to code. This project aims to produce software which assists them in overcoming these such problems by automatically translating their erroneous code back to algorithmic language (with annotations indicating possible errors).

In taking on this project all of the following would be of definite advantage:

Implementing the DEMUN mark-up language

Damian Conway

DEMUN (Distance Education Mark-up Notation) is a tag-based text annotation system for building complex, highly interlinked, interactive WWW-based course materials from simple ASCII text files.

The aim of this project is to build a DEMUN-to-HTML converter that is capable of automatically and incrementally generating:

  • cross-referenced pages (often from incomplete information)
  • computer-generated summaries and indices,
  • interactive (CGI) forms for testing and sample exam materials.
  • Implementation would probably be in the Perl language, so experience in Perl would be a significant advantage. 


    John Crossley

    John Crossley will be on leave in second semester 1999, and hence will not be offering projects.


    New Technique for Assembling Genomic Restriction Maps

    Trevor Dix

    This project builds on some existing work done in C++ where a different and (hopefully) better approach was tried. This project performs an AI search to piece together data which happens to come from the Human Genome project. There is a need to view results of the search, so a GUI using either Java or tcl/tk will probably also be developed. (No knowledge of biology/biochemistry is required!)

    Joyce-Linda Implementation Using Pthreads and PVM

    Trevor Dix

    Joyce-Linda is a language that forces the use of parallel processes. The Linda communication model is provides a convenient mechanism for many sorts of parallel programs. This project involves changing the Joyce-Linda compiler and runtime support code to run with different packages, including running across a network.
     

    Project

    Alan Dorin

    Alan Dorin may also be offering another project in graphics and artificial life. 

    David Dowe

    David Dowe is interested in Minimum Message Length (MML) inductive inference. The MML principle is particularly useful in machine learning, statistics, "knowledge discovery" and "data mining". Both theoretical and applied projects are available, some of which are listed below, and all of which you should feel free to discuss with David Dowe. Areas of interest include clustering and mixture modelling, the von Mises circular distribution, single and multiple factor analysis, supervised learning, decision trees and decision graphs with or without leaf regressions, sequentially and spatially correlated data, protein folding, DNA string alignment, the human genome project and market forecasting. All of these would be done by MML. I am also interested in MML inference of neural nets. There is no need to have done any 3rd year subject on AI for any of these projects.

    Prediction Problems

    David Dowe and Lloyd Allison

    A string of text or any other body of data will compress if and only if it is not random. Prediction problems often involve inferring the value of one body of data from another, such as using interest rates to predict share prices or amino acid sequence to predict protein secondary structure. In this project, we initially consider an abstract mathematical problem of aligning two sequences with alphabets alphabet1 and alphabet2 respectively, where one of the sequences can be assumed to be random and the other can be used to depend on it. Some preliminary mathematics has already been developed and partially implemented for this problem using Minimum Message Length (MML), where we consider a random sequence of amino acids and a non-random dependent sequence of local protein structures. Financial markets are known not to be totally unpredictable, and the above general modelling lends itself well to financial, protein and other prediction problems. There is no need to have done any 3rd year subject on AI. However, if doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

    Inference of Probabilistic Finite State Automata

    David Dowe

    Finite state machines can be used to model grammars or syntax. Some bodies of data can reasonably be assumed to have come from some underlying, but unknown, grammar (or finite state machine). When the data is of great interest to us, we will be interested in inferring the finite state automaton from which the data came. This project will use the Minimum Message Length (MML) principle and will be quite mathematical in nature. It will build upon work done at Monash by Wallace and Georgeff (1983) and more recently by some of their collaborators.

    Artificial sample data will initially come from generating data from some model and then seeing how well the program can discover it. Real sample data to be analysed will come from DNA, proteins and speech patterns. There is no need to have done any 3rd year subject on AI.

    A strong mathematical background will be required. Programming will almost certainly be in C. If doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

    Probabilistic Football Tipping System.

    David Dowe (and Graham Farr and John Hurst).

    This project is provisional. If you wish to nominate this project, it is necessary that you first consult Dr. David Dowe to obtain his consent."

    Attaching to predictions an indication of how certain the predictor is, and rewarding such predictions properly, are important issues in many fields. This project focuses on football tipping because it is topical, accessible and may be useful in teaching. The project is partly software engineering, and partly implementation of ideas concerning prediction and inference.

    For the last three years, the Department of Computer Science has run a football tipping competition in which participants must nominate, for each game, not only which team they think will win, but a probability that that team will win. Tips are scored according to a simple formula, and the theory is linked to information theory and gambling theory. This year the competition was extended to allow participants to nominate a mean and standard deviation for the margin of each game. Again, there is a soundly based way to score such tips. The competition is currently run using software written in C++ (with a curses interface) by John Hurst. The software is written as a literate program (nutweb), and managed by a version control system (RCS), currently at www.csse.monash.edu.au/~footy/ .

    The aim of the project is to implement new probabilistic football tipping ideas in software, and to extend the software so that the competition can be run over the World Wide Web.

    In more detail, the main tasks of the project are to:

    Programming in Java and C++ will be required. Prior knowledge of Australian Rules football is not essential. This project builds upon a 1997 project by Matt Doran.

    MML Inference of Neural Networks

    David Dowe

    Neural networks (NNs) are a technique in machine learning and "data mining" which are often useful but which have some notoriety for over-fitting the training data and then predicting comparatively poorly, for needing a lot of human tuning and for being "black box" predictors which obscure any semblance of an underlying model. This project is concerned with using Minimum Message Length (MML) to correct these problems for relatively simple neural networks. MML is robust to over-fitting noise, and fits explicitly parameterised models which predict well. The project will entail using MML to model the logistic or sigmoid function in NNs under the guidance of the supervisor, and to use MML to balance the cost of the number of hidden layer nodes and the inter-connection weights with the goodness of fit to the data. Some acquaintance with NNs and a reasonably strong mathematical or statistical background are essential.

    Well-behaved long-period pseudo-random number generators

    David Dowe

    Pseudo-random number generators (RNGs) are useful for computer games including dice-rolling, card-dealing and bingo, as well as for computer simulation and for search strategies such as simulated annealing and genetic algorithms. For a pseudo-RNG to be of any practical use in these and other applications, it needs to have a long period (length of cycle) and to exhibit very low levels of correlation between the bits it generates and other signs of apparently random behaviour. This project will initially involve tests of randomness (as given by D. E. Knuth and others) on an RNG of period (2^32) * (2^32 - 1) due to C.S. Wallace (1989), as well as other RNGs due to G. Marsaglia and others (see WWW site below). A main aim of this project will be to extend the Wallace generator to periods (2^32) * (2^32 - 1) * (2^32 - 3) and higher while still maintaining its randomness.

    Reading : 1) C. S. Wallace, A long-period pseudo-random generator, Tech Rept #89/123, Dept of Computer Science, Monash University, February 1989.

    2) Find Random Number Generator by doing the following:

          1.  Go to here
          2.  Click on " C Programming " on left of the screen
          3.  Click on " Code Snippets "
          4.  Under the heading " Portable functions and headers " click on
              " Random number functions "
          5.  Click on " Rand1.C "

    Machine learning of profiteering from inefficiencies in artificial markets

    David Dowe

    The Efficient Markets Hypothesis (EMH) asserts, even in its weakest most innocuous forms, that in a market situation with no insider trading and all having access to the same public knowledge, supposedly no trading strategy can be expected in the long-term to outperform the market average. The cause of this misconception is due at least in part to the intractability of finding patterns in data that might appear to be random both on the surface and even after some analysis. Dowe and Korb (1996) have argued why markets must almost always be inefficient, and Dowe, Korb and Stillwell (in preparation) are demonstrating with empirical real-world data that practice indeed seems to follow theory. In this project, we instead create artificial, simulated markets with market agents trading with some strategies and price being driven by a combination of some underlying function and the trading strategies of the participant agents. The student will use Minimum Message Length (MML) or related machine learning techniques to discover inefficiencies in such artificial markets and also to discover trading strategies which beat the market average in the long run. 

    Algorithms for Strict Minimum Message Length Inference

    Graham Farr and Chris Wallace

    Strict Minimum Message Length (SMML) is a criterion (due to Wallace) by which models of data can be assessed. It can be shown to have many desirable properties, and is important in the theory of machine learning. It is, however, very difficult to compute, except in the simplest cases. In the binomial case, we have an exact and efficient algorithm, but the trinomial case is significantly harder, and may well be NP-hard. The aim of this project will be to implement and study some algorithms for cases such as the binomial, trinomial and normal, and thus hopefully shed light on the theory. A strong mathematical background is required.

    Algorithmic problems on DNA sequences

    Graham Farr

    For many purposes, it is convenient to regard DNA as a long sequence of symbols, where each symbol is one of C, G, A, T. It is currently not possible for scientists to determine the entire sequence for complex life forms. Small fragments of the sequence can be found, but then there is the problem of piecing the fragments together in the correct order to determine the entire sequence.

    This project looks at a class of algorithms for this problem based on a divide-and-conquer approach. Algorithms will be implemented and studied experimentally.

    Inductive Inference of Game Player Strategy.

    Graham Farr (and David Dowe)

    This project aims to develop programs for inferring something about a game player's strategy, just from records of games played.

    The sort of games considered here are board games like Chess. A long term aim is to be able to take as input a record of the chess games of (e.g.) Garry Kasparov (World Champion), and infer something (but not everything!) about his chess playing strategy. This may be an ambitious goal, but we propose to move toward it in achievable steps. Initially, a program capable of inferring very simple aspects of strategy would be developed, and tested using records of games played by appropriately simple computer players. We have developed some basic methods for doing the inference, and expect to improve on them. Several types of inference are possible; among these, we intend to apply the principles of Minimum Message Length (MML) inference. This project builds on a 1997 project by Tony Jansen.

    Programming will almost certainly be in C. If doing this project, you are strongly encouraged to take the 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML. (See http://www.csse.monash.edu.au/~dld/chess.html.)

    Possible Joint Projects with AMRL & DSTO

    (It is not clear whether these projects will run. See Graham Farr if interested.)

    Analysis of temporal sequences from human-machine interfaces.

    Human factors scientists at AMRL are investigating how equipment operators interact with control interfaces. They collect multivariate time series data from experimental studies, and then examine this data in order to evaluate and improve interfaces and operating procedures. The aim of the project is to analyse such data using Minimum Message Length (MML) inference with the objective of producing metrics of similarity of sequences and grammars of task activity.

    Search and optimisation in complex models.

    In order to study how assests can be best used, without (or prior to) going to the expense of full trials, AMRL uses complex computer-based models. The study of such devices and systems using these models gives rise to search and optimisation problems, for which suitable algorithms must be designed, implemented and studied.


    Charles Greif


    Document definition in SGML and XML

    John Hurst

    SGML (or "Standardised General Markup Language") is a standard for defined document class and structure, and there have been a number of tools to automate the processing of such documents (see for example: http://www.sgmltools.org). 

    Unfortunately, SGML is not suitable for delivering documents to the web, and a new standard, XML (or "eXtensible Markup Language") has arisen. XML is a subset of SGML, and is designed specifically to facilitate the delivery of documents on the web. For example, automatic conversion of XML documents to HTMl is possible, as is conversion to LaTeX/TeX, groff, rtf or even just plain ASCII text!

    These all depend upon two basic tools, modelled fairly closely on compiler technology: a parser that determines that a document is "well-formed" and "valid"; and a back-end that translates to the document form required. Just as with compiler technology, much research has been invested into making these two processes table driven, and this project is about investigating these two aspects.

    DTD, or "Document Type Definition" files describe the structure of documents, and how they are to be parsed. DSSSL, or "Document Style Semantics and Specification Language" files are about how the documents are rendered into presentation form (TeX, HTML, etc.). In addition, there are other standards such as XLL ("XLink Language") that provide models for document analysis, search and retrieval operations to the web.

    The project is to investigate the use of XML as a document paradigm, to define appropriate DTD and DSSSL prototypes, and to write code to produce working models of the tools described above. Real life examples will be drawn from the university environment. A knowledge of HTML (and to a lesser extent, LaTeX, TeX, and rtf) formats would be helpful, although not essential.

    Program Maintenance and Literate Programming

    John Hurst

    The most expensive phase in the software life cycle is program maintenance, during which programs typically get modified so frequently that this phase may account for up to 70\% of their total development cost. A major factor in this cost is the need for software engineers to re-engineer some (rarely all) of the program code to either fix bugs, or to develop new functionality. In both of these cases, having access to the design decisions made during initial and subsequent program development can be invaluable in terms of understanding the code.

    Literate Programming offers a mechanism to maintain such information. Originally proposed by Donald Knuth in 1984, the idea has seen a resurgence of activity with the advent of "second generation" (language independent) and "third generation" (WEB based) literate programming tools. The use of advanced macro processing features can also ease the program maintenance task, through appropriate revision control and platform independence mechanisms.

    This project is about exploring these issues, and developing a state-of-the-art literate programming tool.


    Experimental Ethics

    Kevin Korb and Ann Nicholson

    This project will use Artificial Life techniques to apply a utilitarian account of ethical virtue to examine various ethical issues and principles experimentally. Classically, related techniques have been applied to game-theoretic problems such as the iterated prisoner's dilemma. In this project we will examine some selection of issues in A-life settings, such as: the social value of altruism, cooperation and selfish behavior; the effects of euthanasia under various circumstances; property rights and their violation; racial discrimination. In general, the aim is not to establish that some principle is right or wrong, but instead to find characterizations of both those A-life environments in which a kind of behavior is socially beneficial and those in which it is socially harmful.

    Bayesian Poker

    Kevin Korb and Ann Nicholson

    Poker is an ideal vehicle for testing automated reasoning under uncertainty. It introduces uncertainty both by physical randomization and by incomplete information about opponents' hands. Another source of uncertainty is the limited information available to construct psychological models of opponents, their tendencies to bluff, play conservatively, reveal weakness, etc. and the relation between their hand strengths and betting behavior. All of these uncertainties must be assessed accurately and combined effectively for any reasonable level of skill in the game to be achieved, since good decision making is highly sensitive to those tasks. We have developed a Bayesian Poker Program (BPP), which uses a Bayesian network to model the program's poker hand, the opponent's hand and the opponent's playing behavior conditioned upon the hand, and betting curves which govern play given a probability of winning. The history of play with opponents is used to improve BPP's understanding of their behavior. BPP has been compared experimentally with: a simple rule-based system; a program which depends exclusively on hand probabilities (i.e., without opponent modeling); and with human players. BPP has shown itself to be an effective player against all these opponents, barring the better humans.

    This project involves extending and improving BPP. Possible tasks include:

    Missing Data

    Kevin Korb

    Machine learning techniques of many different kinds have been applied very successfully to a great range of problems involving generalization from data. Their success has led to the interest of industry, resulting in the applied field of "Knowledge Discovery in Databases" (KDD) or "Data Mining." A recurring difficulty in applying machine learning to real-world data, however, is the pervasiveness of error in the data. In this project we will examine a particular form of data error, namely instances where attribute values for a sample are missing. We shall perform a comparative experimental analysis of different proposed methods for coping with missing data values and look at ways of improving upon them using Bayesian techniques. 

    Telerobotics

    Gordon Lowe

    Robots generally are programmed by leading through the required motion with a teach pendant. Off-line programming which is less common, is available on two of our industrial robots permits the bulk of programming to be written without accessing robot motion. Viewing operation of robot has to now required presence at the robotic cell, by using image data via the internet it is possible to observe and control robot tasks from any connected terminal.

    Users will require images of the robot work cell for controlling moves and monitoring robot execution, and interface to the robot server for loading, compiling and control of robot. The Telerobotic System, Figure 1. shows how a request from an operator to the HTTPD server launches a CGI script that communicates with the robot image servers to perform the requested move and obtain new pictures. The robot, robot controller and robot server are existing equipment.

    User interaction is proposed by using Common Gateway Interface (CGI) to control physical devices by reading input from an HTML form, performing the required operation, and feeding back the result on a returned HTML page. Java offers possibilities for increasing the sophistication of the interface.

    The image server development including three camera system for one robot cell three dimensional viewing connected to image server will be the first phase. This work will present images to HTML page for Teleautonomous control in discrete command control.

    This work involves interface controls for trajectory control and motion kinematics, and off-line programming facility to http server. Multiple observers and off-line programmers with one active user are the conditions to be satisfied through the server.

    Real Time Torque Control

    Gordon Lowe

    This project involves simulation and monitoring of real time torque control for a 6 axis robot. At present torque control in robot manipulators is compensated for by overdamping for worst case situations. Robots which can plan a trajector and control motion by applying real-time torque control will be able to move much faster. This project will involve programming torque expressions on a ~cluster of PC's" and measuring performance.

    Soccer Robots

    Gordon Lowe

    In 1999 the 3rd year project is to build a prototype soccer robot. Opportunity exists for up to 2 students to work on the intelligence of the the soccer robots. This includes: Since only one soccer robot will be completed by the end of the year most of this work will have to be verified through simulation.

    Fruit Picking

    Gordon Lowe

    Robotic fruit picking has been researched for a number of years now with little commercial success. As I see it the big problems are correct identification of fruit, which includes location and identification of obstructions. Other problems relate to gripping and removal of fruit from the tree. 

    Projection Program for QOCA

    Kim Marriott QOCA is a C++/Java constraint programming toolkit developed at Monash for graphical applications. It needs a facility to simplify arithemetic equality and inequality constraints on to one or two variables of interest. The algorithm to be used will be based on that used in CLP(R). The project requires some knowledge of numerical computing. 

    Jon McCormack

    Jon McCormack is interested in computer graphics and animation. He is now part-time with the CSC Department. The projects he is offering can be found at: [ http://www.cs.monash.edu.au/~jonmc/hons98.html].


    Application of Artificial Intelligence Techniques to the Solution of Flutter Equations

    Ann Nicholson (and Peter Tischer)

    Currently, when performing analytical flutter clearances for the RAAF, a technique is used which, whilst giving an accurate estimate for the flutter speed, can result in non-physical solutions remote from a flutter point. It would be desirable to implement a technique which gives realistic solutions throughout the range of desired airspeeds for two reasons: i) greater faith in the results in the mind of the customer; and ii) such physically realistic solutions can be used in flight-flutter trials to monitor the performance of the mathematical model well before a potential flutter region is approached.

    A method of solution which achieves the above aims does exist, but, for a number of reasons, it is computationally intensive and, therefore, time consuming. One of the main difficulties with this method is that sufficiently small increments in airspeed must be chosen such that the results at previous airspeeds can be used to predict a possible solution from which to start the iterations for the next airspeed. The question is then, what is a sufficiently small increment? In early implementations of technique, a uniform, very small increment was used for all of the roots to be solved. What is really sufficiently small, for any given mode, however, depends very much upon the behaviour of that mode in the region of interest. In a previous Honours project, a student developed some algorithms to vary the airspeed increments for any given mode as the airspeed is increased. This honours project will build on that work by:

    The current implementation is written in C++ using an existing mathematical library. It is desirable that that student undertaking this project should have done CSC3091 (Artificial Intelligence).

    Medical Engineering - Ambulation Monitoring and Fall Detection

    Ann Nicholson 
    Joint Supervisor: Dr. Ian Brown (ECSE)

    This project involves monitoring the stepping pattern of elderly people or patients undergoing rehabilitation, building upon previous Honours projects (James Davies 1995, Ryan McGowan 1997). The existing system obtains step data using foot-switches and downloads it to a computer, where a Dynamic Belief Network processes the data, and updates its beliefs as to the persons walking status. The existing system uses the Netica API software for developing belief networks.

    This project would involve several extensions to the basic existing hardware and software implementation.

    This project would suit a BCSE student who has completed CSC2091/3091 Artificial Intelligence and who can program in C. In addition, non-engineering student could also do this project, focusing on the software aspects.


    Andrew Paplinski

    Andrew Paplinski will be on sabbatical in 2nd semester so will not be offering a project in 1999.


    Ron Pose


    Local Segmentation of Images

    Peter Tischer

    If we select a pixel at random from a picture, it is highly likely that the pixel will be in the same segment as its nearest neighbouring pixels. The next most likely outcome is that a pixel and its nearest neighbours are likely to be in no more than two segments. That is, the block straddles the boundary between two regions. If a pixel and all its neighbours are in the same segment, then we may use all the pixel values in subsequent processing of the current pixel. If we determine that some neighbouring pixel values are not in the same segment as the current pixel, then we should exclude those pixels from processing of the current pixel.

    The aim of this project is to investigate ways of looking at small blocks of pixels and deciding whether the pixels in the block all belong to the same segment or whether they belong to at least two different segments. Minimal Message Length (MML) Inductive Inference gives us a way of choosing between two possible models or explanations for a body of data. In this case, MML techniques will be used to decide whether a block of pixels can be best described by saying that all pixels in the block belong to the same segment or they belong to at least two different segments.

    The process of classifying blocks of pixels as consisting of only one segment or at least two segments can be termed local segmentation. Local segmentation is of fundamental importance in an image coding technique called Block Truncation Coding. In addition, local segmentation can be used in algorithms for edge detection, segmentation and for noise removal.

    Robust Linear Regression and MML

    Peter Tischer

    Linear Regression is concerned with finding the best straight line model for a body of data. The classical techniques for linear regression are very sensitive to the presence of anomalous data. For example, a single anomalous observation can cause the least squares method to find a straight line fit that is arbitrarily bad with respect to the other data. Robust regression is concerned with linear regression techniques that are relatively unaffected by the presence of anomalous observations. There are robust regression techniques known where the effect of anomalous observations is bounded even when as much as 50% of the observations are contaminated. Unfortunately, algorithms for implementing these techniques can be extremely computationally intensive and do not guarantee that the best robust fit has been found. The project involves investigating a number of possible techniques for robust regression. As well, the project could investigate using the Minimal Message Length (MML) criterion as a way of comparing two different straight line fits of the same data.

    Objective Criterion for Comparing Image Segmentations

    Peter Tischer

    Image Segmentation is one of the fundamental problems of image processing. There is little consensus on what are the best algorithms and how different algorithms compare. There are no objective criteria for measuring the differences between two segmentations of the same image. Well, there are criteria but no consensus on good criteria or even that good criteria exist.

    The aim of this project is to use Minimal Message Length (MML) techniques to come up with a good objective criteria for comparing segmentations of the same image. The approach involves constructing two-part messages for each segmentation. The first part of the message allows us to recover the segmentation of the image. Once the segmentation has been recovered, it is used in conjunction with the information in the second part of the message to recover all the information in the original image. The use of the MML criterion suggests that the two-part message with the shortest overall message length is the one with the best segmentation.

    The project involves looking at ways of encoding a segmentation and then using the segment information to encode the image. One approach is to associate a segment number with each pixel and to form an image of the segment numbers. This image we can call the segment map. Segment maps can be regarded as being generalised forms of binary images and effective techniques for the lossless coding of binary images may be generalised and applied to segment maps.


    Optimisation of fuzzy control systems using genetic and other algorithms

    Bin Qiu

    This project investigates genetic and other algorithms for the optimisation of fuzzy controllers in terms of membership functions and linguistic rules.

    A fuzzy logic based Connection Admission Control (CAC) function

    Bin Qiu

    This project is concerned with Connection Admission Control (CAC) in ATM networks. The decision of source admission into a multimedia communication network is a very active research area. Analytical solutions presented so far are not adequate under a variety of network situations and source characteristics. Fuzzy logic based CAC function may provide a breakthrough, and therefore this research topic has the potential to be developed into a major research project.

    An efficient architecture for fuzzy rule processing

    Bin Qiu

    Existing processor architectures are mostly designed and optimised for crisp logic processing. This project seeks to establish and simulate some alternative processor architectures dedicated for fuzzy rule processing. It involves analytical study, VHDL modeling, simulation and FPGA implementation if possible. 

    Image Thresholding by Probabilistic Distance Criteria

    Sid Ray

    Thresholding is one of the most popular approaches to image segmentation. In this approach, all the pixels having a certain range of some image property, say, intensity, are considered to belong to one group. Connected regions in these groups lead to the desired segmentation. In the past, the probabilistic entropy measure of Shannon, defined in the context of Information Theory, has been successfully applied in determining the gray level threshold between the "object" and the "background" regions of an image, assuming separate probability distributions for the object pixels and the background pixels. Image segmentation methods also exist in which non-probabilistic entropic criteria, devised in the fuzzy set-theoretic framework, have been used.

    An Honours project carried out in the year 1997 involved the study of two probabilistic distance criteria, namely, the Bhattacharyya distance and Kullback-Leibler Divergence function, as image thresholding criteria. The purpose of the proposed project will be to make further investigations in the area of probabilistic distance-based image segmentation methods and their applications for both monochrome and colour images.

    Image Segmentation by Clustering Methods

    Sid Ray

    The aim of the project will be to make a critical study of a number of existing non-hierarchical cluster analysis methods, such as, c-means, fuzzy c-means, and isodata, with a view to their application in image segmentation. Study of cluster validity will form an essential part of the project. Applications will include both monochrome and colour images.

    Content-Based Image Retrieval

    Sid Ray

    Content-based image retrieval is a well-known method of image retrieval. The objective of this project is to devise an algorithm for retrieval of images from a large image data base based on colour quantisation and regional quantisation.

    Texture Analysis of Images

    Sid Ray

    Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of (i) segmentation and (ii) classification of images. The proposed project will deal with both of these problems. It will involve two phases. In the first phase some existing texture analysis methods will be investigated from the point of view of their theoretical soundness as textural measures as well as their practical applicability. In the second phase attempts will be made to derive a new methodology with particular attention to its computational efficiency.

    Document Image Analysis System

    Sid Ray

    The objective of document image analysis is to recognize text, graphics, and pictures in printed documents and to extract the intended information from them. There are two broad categories of document image analysis, namely, textual processing and graphical processing. Textual processing includes skew determination (any tilt at which the document may have been scanned), finding columns, paragraphs, text lines and words, and performing optical character recognition. Graphical processing deals with lines and symbols. The scope of the proposed project will be the aspect of text processing and it will concentrate on the development of a system for page layout analysis.

    Selection of Features for Pattern Recognition

    Sid Ray

    Experience shows that in recognition of patterns by humans, the recognition is made based on only a few important features (or attributes) characterizing the pattern classes. Conversely, in statistical pattern recognition, the patterns are often represented by a large number of numerical features. Although there is no conceptual justification in reducing the number of features to a small number, in practical problem solving, this becomes a necessary step due to the wellknown phenomenon of the 'curse of dimensionality' of the feature vector on the complexity of the pattern classifier. The aim of the proposed project is to develop an interactive feature selection paradigm well-suited to multiclass (rather than 2-class) pattern recognition problems.

    The project will comprise four components, namely, (i) theoretical comparison of existing feature selection criteria, (ii) development of software tools for the existing criteria, followed by an experimental investigation of them, (iii) analysis of the above results leading to, hopefully, a feature selection paradigm, and (iv) development of an interactive software tool for the above paradigm. The software developed will be 'interactive' in the sense that depending on the classification accuracy arrived at a certain stage, the user will have the option of changing the dimensionality value. Options will also be available to supply interactively a range of values of the dimensionality. The software tools will include procedures for displaying the distribution of pattern samples in different feature spaces obtained by different feature selection methods. 


    Image compression via context coding of error bitplanes

    Rod Worley

    Image data may be compressed by predicting pixel values, forming an image of the differences of the true and predicited values. The error image may be coded efficiently by many techniques.

    This project will look at the efficiency of compressing the error image by converting it to bitplanes and using context coding to compress the bitplane data.


    Henry Wu


    Text summarization

    Ingrid Zukerman

    This project, which is a continuation of a project undertaken in 1998, considers the problem of automatically generating summaries from documents on the WWW. The student will use simple statistical methods to determine ``important information'' from the articles. This information will be collated into summaries using simple linguistic techniques. This years' project uses a corpus of documents obtained from Telstra, which are more homogeneous than those used in last year's work. This feature is expected to have some impact on the summarization techniques being applied.

    A student undertaking this project should have some grounding in statistics.

    SPIM Tutor

    Ingrid Zukerman and Ron Pose

    This project is a continuation of a 1998 project. It consists of designing and implementing a computerized tutor based on an animation process which illustrates the workings of SPIM, a MIPS simulator which is used for teaching computer architecture at first and second year level. The animation, which was implemented in 1998, highlights the different components of the architecture and the data flow for a small set of MIPS instructions. The animator may be accessed at this web site. The computerized tutor will determine parts of the subject in which the student's knowledge is inadequate. This will be done based on traces of the data path clicked by the student. The tutor will then decide on appropriate help actions, e.g., presenting explanations of architecture components or giving hints about how to proceed. Some of the data collection software required to support the tutor was implemented over the summer of 1999.

    Other Projects

    Ingrid Zukerman

    Projects proposed by students will also be considered, in particular in the area of Natural Language Processing.
     


    Joint Earth Sciences Honours projects.

    Epsilon Lab and/or AGCRC Graduate Projects 1999:

    There is a possibility of some funding with these projects. Students interested should speak to Mark Jessell, and also Ann Nicholson, the honours coordinator before nominating these projects in their preference list.
    Mark Jessell
    Australian Geodynamics Cooperative Research Centre
    Dept of Earth Sciences 
    Monash University, Clayton, VIC, 3168 
    Australia
    
    mark@earth.monash.edu.au
    
    Tel (61)(3) 9905 4902 
    Fax (61)(3) 9905 4903
    Home-Page. List Not Complete (5/2/99)

     
    Disclaimer