(1) David Dowe's research interests and scope of offered projects. David Dowe is interested in Minimum Message Length (MML) inductive inference. The MML principle is particularly useful in machine learning, statistics, "knowledge discovery" and "data mining". Both theoretical and applied projects are available, some of which are listed below; feel free to discuss any of them with David Dowe. Areas of interest include clustering and mixture modelling, the von Mises circular distribution, single and multiple factor analysis, supervised learning, decision trees and decision graphs with or without leaf regressions, sequentially and spatially correlated data, protein folding, DNA string alignment, the human genome project and market forecasting. All of these would be approached using MML. I am also interested in MML inference of neural nets. There is no need to have done any 3rd year subject on AI for any of these projects. However, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML. If you have any queries, feel free to e-mail dld@cs.monash.edu.au or drop by for a chat (CS Bldg. 26, Room 103).

(2) Inference of Probabilistic Finite State Automata. David Dowe. Finite state machines can be used to model grammars or syntax. Some bodies of data can reasonably be assumed to have come from some underlying, but unknown, grammar (or finite state machine). When such data is of great interest to us, we will want to infer the finite state automaton from which it came. This project will use the Minimum Message Length (MML) principle and will be quite mathematical in nature. It will build upon work done at Monash by Wallace and Georgeff (1983) and, more recently, by some of their collaborators. Initially, artificial sample data will be generated from a known model, to see how well the program can rediscover that model.
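An MML message for this problem has two parts: a statement of the automaton, then the data encoded using it. As a minimal sketch of the second part only, the Python below assumes a hypothetical dictionary encoding of a probabilistic finite state automaton (pfsa[state][symbol] = (probability, next_state)); this representation, and the names TWO_STATE, ONE_STATE, generate and data_code_length, are illustrative inventions, not the Wallace and Georgeff formulation. It generates artificial data from a known automaton and measures how many bits each candidate automaton needs to encode it.

```python
import math
import random

# Hypothetical PFSA encoding (an assumption, not Wallace and Georgeff's):
# pfsa[state][symbol] = (probability of emitting symbol, next state).
TWO_STATE = {
    "A": {"a": (0.9, "B"), "b": (0.1, "A")},
    "B": {"b": (0.9, "A"), "a": (0.1, "B")},
}
ONE_STATE = {"S": {"a": (0.5, "S"), "b": (0.5, "S")}}


def generate(pfsa, start, length, rng):
    """Sample a string of the given length by walking the automaton from `start`."""
    state, out = start, []
    for _ in range(length):
        symbols = list(pfsa[state])
        weights = [pfsa[state][s][0] for s in symbols]
        symbol = rng.choices(symbols, weights=weights)[0]
        out.append(symbol)
        state = pfsa[state][symbol][1]
    return "".join(out)


def data_code_length(pfsa, start, data):
    """Bits to encode `data` given the automaton, i.e. -log2 P(data | PFSA)."""
    state, bits = start, 0.0
    for symbol in data:
        if symbol not in pfsa[state]:
            return math.inf  # this automaton cannot emit the symbol in this state
        prob, state = pfsa[state][symbol]
        bits -= math.log2(prob)
    return bits


if __name__ == "__main__":
    data = generate(TWO_STATE, "A", 1000, random.Random(1))
    print("two-state:", round(data_code_length(TWO_STATE, "A", data), 1))
    print("one-state:", round(data_code_length(ONE_STATE, "S", data), 1))
```

The two-state automaton encodes its own output far more cheaply than the one-state model. A full MML comparison would also add the cost of stating each automaton (more states and transitions cost more bits), and the inferred automaton is the one minimising the sum of both parts.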
Real sample data to be analysed will come from DNA, proteins and speech patterns. There is no need to have done any 3rd year subject on AI. A strong mathematical background will be required. Programming will almost certainly be in C. If doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

(3) Supervised and unsupervised learning from correlated data. David Dowe. Researchers in machine learning, statistics, "knowledge discovery", "data mining" and prediction are interested in both unsupervised learning (clustering) and supervised learning. Much data of interest has sequential or spatial correlations in it, such as financial price time series or images of a mining or other site. Wallace, Dowe and others at Monash have done much work on unsupervised and supervised learning for uncorrelated data, and some work has also been done at Monash applying these techniques to correlated data. This project will head further in that direction. A reasonably strong mathematical background will be required. Programming will almost certainly be in C. This project is closely related to a project done by Russell Edwards in 1997, and has the potential to go in either parallel or tangential directions. There is no need to have done any 3rd year subject on AI. However, if doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

(4) Inductive Logic Programming. David Dowe. This project is provisional. If you wish to nominate this project, you must first consult Dr. David Dowe to obtain his consent. Logic programs enable us to express relations between objects, such as parent(X, Y) & male(Y) -> son(Y, X). Sometimes the relations in data are not known and we wish to infer them.
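To make the son/2 example concrete, the Python sketch below applies the rule forward to a small set of facts. The people and facts (PARENT, MALE) and the function derive_son are invented for illustration. Inductive logic programming runs in the opposite direction, looking for a rule like this one given the facts, but the forward direction shows what such a rule "explains" and what it leaves over.

```python
# Hypothetical facts: parent(X, Y) means X is a parent of Y; male(Y) means Y is male.
PARENT = {("tom", "bob"), ("tom", "liz"), ("ann", "bob")}
MALE = {"tom", "bob"}


def derive_son(parent, male):
    """Apply the rule parent(X, Y) & male(Y) -> son(Y, X) to every matching fact."""
    return {(y, x) for (x, y) in parent if y in male}


if __name__ == "__main__":
    derived = derive_son(PARENT, MALE)
    print(sorted(derived))  # [('bob', 'ann'), ('bob', 'tom')]

    # Hypothetical observed son/2 facts. ("liz", "tom") is not explained by the
    # rule, so a message that uses the rule would have to list it separately.
    observed = {("bob", "tom"), ("bob", "ann"), ("liz", "tom")}
    print(sorted(observed - derived))  # [('liz', 'tom')]
```

Facts left unexplained by a candidate program are one natural place where the cost of encoding the data given that program grows.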
Even worse, sometimes the relationships that we want to infer from the data are slightly corrupted or "noisy". The project will initially involve understanding Minimum Message Length (MML) principles well enough to information-theoretically cost a logic program, and also to cost some data given such a logic program. Fortunately, MML copes well with noisy data. Given a body of data, programming will entail searching amongst a variety of logic programs, costing their message lengths (as above) in turn and trying to arrive at the one with the minimum message length. Published reading material exists in this area. If progress is going very well, constraint logic programming could be explored as an optional extension. Initially, artificial sample data will be generated from a known model, to see how well the program can rediscover that model. Real sample data to be analysed will come from DNA, proteins and other sources. A strong mathematical background will be required. Programming will almost certainly be in some version of Prolog and possibly also in C. If doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

(5) Prediction Problems. David Dowe and Lloyd Allison. A string of text or any other body of data will compress if and only if it is not random. Prediction problems often involve inferring the value of one body of data from another, such as using interest rates to predict share prices or using an amino acid sequence to predict protein secondary structure. In this project, we initially consider the abstract mathematical problem of aligning two sequences over alphabets alphabet1 and alphabet2 respectively, where one sequence can be assumed to be random and the other may depend on it.
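A much-simplified sketch of why the dependent sequence should compress: the Python below assumes the two sequences are already aligned position by position (the real problem also has to infer the alignment) and uses empirical frequencies while ignoring the cost of stating them, which a proper MML message would include. The random sequence costs its length times log2 of its alphabet size; the dependent sequence can be coded either on its own or conditionally on the random one. All function names are hypothetical.

```python
import math
from collections import Counter


def random_sequence_bits(a, alphabet_size):
    """Cost of a sequence assumed uniformly random over its alphabet."""
    return len(a) * math.log2(alphabet_size)


def bits_alone(b):
    """-log2 likelihood of b under its own symbol frequencies (parameter cost ignored)."""
    n = len(b)
    return -sum(c * math.log2(c / n) for c in Counter(b).values())


def bits_given(a, b):
    """-log2 likelihood of b given the position-aligned a, using P(b_i | a_i) frequencies."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    pair_counts = Counter(zip(a, b))
    a_counts = Counter(a)
    return -sum(c * math.log2(c / a_counts[x]) for (x, _), c in pair_counts.items())


if __name__ == "__main__":
    a, b = "xyxyxyxy", "01010101"  # here b is fully determined by a
    print(random_sequence_bits(a, 2), bits_alone(b), bits_given(a, b))
```

When b depends on a, the conditional cost bits_given(a, b) drops below bits_alone(b); when the two are unrelated, the saving vanishes and, once parameter costs are charged, the conditional model would no longer pay for itself.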
Some preliminary mathematics has already been developed and partially implemented for this problem using Minimum Message Length (MML), where we consider a random sequence of amino acids and a non-random dependent sequence of local protein structures. Financial markets are known not to be totally unpredictable, and the above general modelling lends itself well to financial, protein and other prediction problems. There is no need to have done any 3rd year subject on AI. However, if doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.

(6) Probabilistic Football Tipping System. David Dowe (and Graham Farr and John Hurst). This project is provisional. If you wish to nominate this project, you must first consult Dr. David Dowe to obtain his consent. Attaching to a prediction an indication of how certain the predictor is, and rewarding such predictions properly, are important issues in many fields. This project focuses on football tipping because it is topical, accessible and may be useful in teaching. The project is partly software engineering and partly implementation of ideas concerning prediction and inference. For the last three years, the Department of Computer Science has run a football tipping competition in which participants must nominate, for each game, not only which team they think will win but also the probability that that team will win. Tips are scored according to a simple formula, and the underlying theory is linked to information theory and gambling theory. This year the competition was extended to allow participants to nominate a mean and standard deviation for the margin of each game; again, there is a soundly based way to score such tips. The competition is currently run using software written in C++ (with a curses interface) by John Hurst.
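The scoring formula itself is not given here. As a minimal sketch, assuming a logarithmic rule of the form 1 + log2(p), where p is the probability the tipster gave to the team that actually won, the Python below scores a probabilistic tip. For margin tips it computes an illustrative log2 probability of the actual (integer) margin under a Normal distribution with the nominated mean and standard deviation; this is an assumption for illustration, not necessarily the competition's formula. Draws are not handled, and p is assumed to lie strictly between 0 and 1.

```python
import math


def probability_tip_score(p_home, home_won):
    """Assumed log-scoring rule: 1 + log2(probability given to the actual winner).

    p_home must be strictly between 0 and 1; draws are not handled here.
    """
    p = p_home if home_won else 1.0 - p_home
    return 1.0 + math.log2(p)


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def margin_tip_log_score(mean, sd, actual_margin):
    """Illustrative score: log2 P(actual integer margin) under N(mean, sd^2)."""
    upper = normal_cdf(actual_margin + 0.5, mean, sd)
    lower = normal_cdf(actual_margin - 0.5, mean, sd)
    return math.log2(upper - lower)


if __name__ == "__main__":
    print(probability_tip_score(0.5, True))   # a "don't know" tip scores 0
    print(probability_tip_score(0.75, False))  # a confident wrong tip loses 1 bit
    print(margin_tip_log_score(10.0, 20.0, 12))
```

Under this assumed rule a 50% tip always scores zero, a correct tip earns at most 1 bit, and a confident wrong tip is penalised without limit as p approaches 0, which is what rewards tipsters for stating their true degree of belief.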
The software is written as a literate program (nutweb) and managed by a version control system (RCS). The aim of the project is to implement new probabilistic football tipping ideas in software, and to extend the software so that the competition can be run over the World Wide Web. In more detail, the main tasks of the project are to:
- write a Web interface for the software, using Java;
- make the software more general, so that it can deal with prediction in other domains (such as other sports, or the stock market);
- allow users to supply their own formula for calculating predictions, if they wish;
- provide an "automatic" tipster based on a method of rating teams, like the Elo scheme for rating chess players;
- implement methods of quantifying how bold, or how cautious, a tipster is;
- implement, and study, methods of combining the predictions of individual tipsters in order to form better tipsters;
- implement measures of correlation between tipsters;
- improve the current method of calculating probabilities (based on a Normal distribution) when tipsters nominate means and standard deviations for the margins;
- possibly, try to infer the performance of teams from data on past performance.
Programming in Java and C++ will be required. There is no need to have done any 3rd year subject on AI. However, if doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML. Prior knowledge of Australian Rules football is not essential.

(7) Inductive Inference of Game Player Strategy. Graham Farr (and David Dowe). This project aims to develop programs that infer something about a game player's strategy just from records of games played. The games considered are board games like Chess. A long-term aim is to be able to take as input a record of the chess games of (e.g.) Garry Kasparov (World Champion), and infer something (but not everything!)
about his chess-playing strategy. This may be an ambitious goal, but we propose to move toward it in achievable steps. Initially, a program capable of inferring very simple aspects of strategy would be developed and tested using records of games played by appropriately simple computer players. We have developed some basic methods for doing the inference, and expect to improve on them. Several types of inference are possible; among these, we intend to apply the principles of Minimum Message Length (MML) inference. Programming will almost certainly be in C. If doing this project, you are strongly encouraged to take my 4th Year Hons. 2nd semester subject CSC423 Learning and Prediction, which is the only Hons. subject teaching any amount of MML.
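To illustrate the kind of MML-style inference described for project (7), the Python below compares two hypothetical simple strategies on a game record using a crude two-part code. This is not Wallace's MML estimator and not the project's existing methods. Each move is summarised as (k, chose_best): the number of legal moves and whether the player picked the move ranked first by some fixed evaluation function (itself an assumption). The "random" player has no parameters; the "greedy" player picks the top move with probability q, stated to one-decimal precision at a cost of log2(9) bits.

```python
import math

# Hypothetical game record: one (k, chose_best) pair per move, where k is the
# number of legal moves and chose_best says whether the player picked the move
# ranked first by some fixed evaluation function (also an assumption).


def bits_random(record):
    """Code length if the player picks uniformly among the k legal moves."""
    return sum(math.log2(k) for k, _ in record)


def bits_greedy(record, q):
    """Code length if the player picks the top-ranked move with probability q,
    otherwise one of the other k - 1 moves uniformly at random."""
    bits = 0.0
    for k, chose_best in record:
        if k == 1:
            continue  # forced move: costs nothing under either strategy
        if chose_best:
            bits -= math.log2(q)
        else:
            bits -= math.log2((1.0 - q) / (k - 1))
    return bits


def infer_strategy(record, grid=tuple(i / 10 for i in range(1, 10))):
    """Pick the hypothesis with the shorter two-part message.

    The greedy hypothesis pays log2(len(grid)) bits to state q to the grid's
    precision; the random hypothesis has no parameter to state.
    """
    best_q = min(grid, key=lambda q: bits_greedy(record, q))
    greedy_total = math.log2(len(grid)) + bits_greedy(record, best_q)
    random_total = bits_random(record)
    if greedy_total < random_total:
        return "greedy", best_q, greedy_total
    return "random", None, random_total


if __name__ == "__main__":
    record = [(20, True)] * 30 + [(20, False)] * 5
    print(infer_strategy(record))
```

Naming which of the two strategy families is in use would cost about one more bit under either hypothesis, so it is left out of the comparison. Richer strategies fit the data better but must pay for their extra parameters in the first part of the message.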