MONASH University, Information Technology, Clayton School of I.T.
[server] [courses] [staff] [students] [research] [Staff directory] [A-Z] [Site map]

BCS, BDS, BSE Honours Projects 2006

CSSE
BSE
 CSE4402
  2006
   allocation
  2005 list
gallery:
 BSE+H
 BCS/BDS

general:
BCS/BDS

Honours projects for the B. Computer Science (BCS), B. Digital Systems (BDS), and B. Software Engineering with Honours (BSE, CSE4402), for 2006. Note that BSE projects are 12-pts, and that BCS and BDS projects are 24-pts. Some projects come in 24- and 12-pt versions; the latter involves less work.

Supervisors as of Friday 3 March 2006.

Fill in a paper copy of the approriate preference form (fully and accurately!) for your degree, [/hons/2006/], by Friday of week-1 of semester. Even if you think that you have secured a specific project, still fill in the form completely.

Check www.csse.monash.edu.au/courseware/cse4402/2006/offerings.shtml from time to time for changes.


Support Vector Machines (12 or 24-pts)
David Albrecht and Lloyd Allison.
Project-id: Alb-SVM
Support Vector Machines (SVMs) are a recent machine-learning technique. SVMs can be used as 'classifiers', e.g. given medical data on a patient predict whether or not (s)he has condition X. (Other kinds of SVM can also be used to fit functions.) Unfortunately it is rarely possible to be 100% accurate in the real world. All machine-learning methods, SVMs included, must also combat over-fitting -- is a complex SVM enough of an improvement over a simple SVM to be justified? The project is to (i) investigate the 'types' (as in data-type) of SVMs, (ii) implement new cost/benefit criteria for learning classifier-SVMs within a Monash library of machine-learning routines, and (iii) compare the prediction performance of the SVMs learned against those learned by other systems.

Compression of DNA and Protein Sequences for Knowledge Discovery (24 or 12-pts)
Lloyd Allison
Project id: LA-bioinformatics
Most of the well-known file-compression algorithms fail to beat the trivial 2-bits/symbol {A->00, C->01, G->10, T->11} compression level on DNA sequences. Nor are they able to significantly compress protein sequences (where |alphabet|=20). Compression=modelling+coding, and arithmetic coding has solved the coding part of the compression problem, so better models must be necessary. A model by Allison, Edgoose and Dix [Intell. Sys. in Mol. Biol. '98] gives good compression but we want an algorithm which is much faster -- one that is suitable for interactive use on long sequences; there has been recent progress. The project is to further develop improved models and algorithms for compression of biological data to discover new features.
 
Machine Learning + Haskell + templates or generics (24 or 12-pts)
Lloyd Allison
Project id: LA-haskell
Haskell (www.haskell.org) is a lazy functional programming language. I have been using it for Bayesian machine learning, see J. Functional Programming, 15(1), pp.15-32, 2005, doi:10.1017/S0956796804005301, e.g. for classification (clustering), decision trees, time-series, mixture models, segmentation, and for [Bayesian networks] which include discrete and/or continuous and/or structured attributes (variables).
'Template-Haskell' (TH) is a Haskell library where a Haskell program can generate part of the code of other Haskell programs, and do this in a type-safe manner. That is a program can write all or part of another program.
The project is to investigate the use of TH, 'generic Haskell' and similar "meta-programming" extensions to Haskell to make it easier to process multi-variate data-sets, e.g. data held in Excel spread-sheets, 'csv' and similar file formats. Applications include data-mining Biomedical data, and Search and Rescue data on Missing People -- who, where, how found, when, in what condition, etc..
 
Large, Low-Cost Digital Display (24 or 12-pts)
Lloyd Allison
Project-id: LA-display
If you are inventive, like robotics, and can build things: The project is to design and build a very large but cheap digital display. The target size is about 2m×2m with a resolution of at least 1cm/pixel, say 40,000+ pixels. There is at least one readily available gadget from which such a large display could be constructed. Display speed is "not the highest priority", e.g., a rewrite time of several seconds could be acceptable in some applications. Small prototype(s) would be built before scaling-up. (You must discuss this project with LA before nominating it.)
 
Algorithms to Find Approximate Palindromes (24 or 12-pts)
Lloyd Allison
Project id: LA-palindromes
Palindromes (actually reverse-complementary ones) are significant for DNA and for RNA because they can fold into biologically active structures. However the typical bio-palindrome is approximate, not exact! Given a string, s, of symbols, a palindrome is a substring p=s[i..j] that reads the same forwards and backwards, i.e. either p=t;u or p=t;c;u where c is a symbol and u=rev(t) is the reverse of t. But in an approximate palindrome, p=t;u or p=t;c;u where u can be approximately, not necessarily identically, equal to rev(t). The project is to develop, test, and apply fast algorithms to find approximate palindromes in very long strings. See [http://arxiv.org/pdf/cs.DS/0412004]
 
Gnumeric + Plugins + Haskell (24 or 12-pts)
Lloyd Allison
Project id: LA-gnu
Gnumeric (www.gnumeric.org) is an open-source spread-sheet. It allows a programmer to add new 'plug-ins'; these are often written in C or C++. Recently Pang et al showed that "Haskell can be comfortably used as a statically typed extension language, and that it can support type-safe dynamic loading of plugins using dynamic types...."; see [http://www.cse.unsw.edu.au/~dons/hs-plugins/paper/]. The project is to investigate writing Haskell plugins (particularly plugins for Bayesian machine learning) for gnumeric, possibly using the work of Pang et al.
 
Machine Learning + Haskell + cluster computing (24-pts)
Lloyd Allison
Project id: LA-cluster
Haskell (www.haskell.org) is a lazy functional programming language with parametric polymorphic types, and type-classes. It has extensive libraries for many purposes. I have been using it for Bayesian machine learning, see http://dx.doi.org/10.1017/S0956796804005301. The project is to investigate cluster computing and/or parallel Haskell to speed up Haskell-based machine learning on very large data sets.
Causal-Model of Movie Data
Lloyd Allison and Kevin Korb
Project-id: LA-movies
The project is to learn a causal model to predict the success of a movie before its release. Some attributes such as genre, budget and star(s), are available before opening night. A lot of people would dearly like to be able to predict the initial and long-term 'success' of a movie - which is only really known after release.
The project is to use CaMML (Causal discovery via Minimum Message Length) to build a model of post-release attributes and pre-release attributes of movies.
Modelling Sinews, Fillets & Lattices (12 or 24-pts)
Alan Dorin
Project-id: Dorin-sinews
The components of many organisms and their secretions consist of what may loosely be labelled "sinewy networks". The junctions in these networks are often gently filleted, creating a web-like mesh of holes and curves that gives the structure a biological feel. The purpose of this project is to devise a technique for automating the process of generating the geometry of these meshes. The software will be parameterized to vary the extent to which the meshes are filletted, allowing them to resemble architectural forms of different kinds, whether they be made by animals (webs, shells, skeletons), erosion (rocks hollows and caves), plants (patterns in bark, veins on leaves, tree roots) or people (architectural lattices). The software will also allow automatic generation of the network itself and should allow the user to parameterize the form of the lattice in general terms from completely irregular, through approximately regular, to highly geometric. Meshes may be surfaces or solid structures. Images and details are online: ~aland/honoursProjectIdeas.html.

Rating and ranking players and teams using MML (12 or 24 pts)
David Dowe
Project-id: Dowe-ranking
Ratings are used in chess, tennis, golf, other sports, etc. in order to rank both teams and individual players. A variety of systems are used to do the ratings and rankings - including Elo, Glicko and systems more concerned with prize money (over the past 12 months).
We can re-visit and improve these systems using the Minimum Message Length (MML) principle (Wallace and Boulton, 1968)(Wallace and Freeman, 1987)(Wallace and Dowe, 1999a, which is the Computer Journal's most downloaded article)(Wallace, posthumous, 2005)(Comley and Dowe, 2005) to arrive at a comparatively simple model with an improved and near-optimal predictive accuracy on future games and contests.
This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.
References: CoDo2005 Comley, Joshua W. and D.L. Dowe (2005). Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, Chapter 11 (pp265-294) in P. Gru:nwald, I. J. Myung and M. A. Pitt (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9 [Final camera-ready copy submitted Oct. 2003.]
Wall2005; WaBo1968; WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283. [This is the Computer Journal's most downloaded article.] WaFr1987.
 
Econometric, statistical and financial time series modelling using MML (24 pts)
David Dowe
Project-id: Dowe-econ
Time series are sequences of values of one or more variables. They are much studied in finance, econometrics, statistics and various branches of science (e.g., meteorology, etc.).
Minimum Message Length (MML) inference (Wallace and Boulton, 1968) (Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace, posthumous, 2005)(Comley and Dowe, 2005) has previously been applied to autoregressive (AR) time series (Fitzgibbon et al., 2004), other time series (Schmidt et al., 2005) and (at least in preliminary manner) both AR and Moving Average (MA) time series (Sak et al., 2005).
In this project, we apply MML to the Autoregressive Conditional Heteroskedasticity (ARCH) model, in which the (standard deviations and) variances also vary with time. Depending upon progress, we can move on to the GARCH (Generalised ARCH) model or Peiris's Generalised Autoregressive (GAR) models, or to inference of systems of differential equations.
This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.
References: CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).
SaDR2005; ScPL2005; Wall2005; WaBo1968; WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283. [This is the Computer Journal's most downloaded article.] WaFr1987
 
Using MML to infer language history and evolution (and relation to DNA) (24 pts)
David Dowe
Project-id: Dowe-history
The evolution of human languages raises several interesting issues, such as how languages evolve, how the evolution of spoken language relates to the evolution of written language, how it is that geographical regions of related languages can surround one or more regions of languages not related, how populations of language speakers migrated eons ago and possibly also how spoken language relates to DNA. Study of this area also helps with the inference of now-extinct ancestral languages and at least indirectly with the preservation of dying languages. The project will use the Minimum Message Length (MML) principle (Wallace and Boulton, 1968) (Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace, posthumous, 2005) (Comley and Dowe, 2005), building upon earlier work in (Ooi and Dowe, 2005).
The project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.
References: CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
OoDo2005 Ooi, J.N. and D. L. Dowe, Inferring Phylogenetic Graphs of Natural Languages using Minimum Message Length, Proc. CAEPIA 2005, Vol. 1, pp I:143 - I:152, Nov. 2005.
Wall2005; WaBo1968 Wallace C.S. & Boulton, D.M. (1968)
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp270-283. [This is the Computer Journal's most downloaded article.] WaFr1987.
 
Recognising, communicating with & quantifying varieties of intelligence (24 pts)
David Dowe
Projecy-id: Dowe-intell
"Intelligence" is a term used variably to describe a variety of human aptitudes (memory, mathematical ability, ability to make inductive inferences), terrestrial non-human aptitudes and (www.SETI.org) something sought in extra-terrestrials. Humans share much with other humans and share at least a planet with terrestrial non-humans, but how much must humans share with extra-terrestrials or one extra-terrestrial with another?
Dowe and Hajek (1997, 1998) discuss the relevance of the information-theoretic Minimum Message Length (MML) principle (Wallace and Boulton, 1968)(Wallace and Freeman, 1987)(Wallace, 2005) to the inductive inference part of intelligence, and Hernandez-Orallo and Minaya-Collado (1998) discuss the relevance of Kolmogorov complexity to intelligence - and MML is intimately related to Kolmogorov complexity (Wallace and Dowe, 1999a).
This project concerns the use of information theory, MML and Kolmogorov complexity to recognise signs of intelligence and ways of communicating with it. This will inevitably entail measuring - or attempting to measure - the intelligence. Another direction in which the project might head is to study how intelligence of a system increases (or possibly doesn't increase) when there is communication involving more than one intelligent entity. The project might also involve the study of communications between terrestrial non-humans such as, e.g., marmosets and dolphins. The project will require strong mathematics.

Incorporating graph-layout techniques in interactive diagram editors (12 or 24-pts)
Tim Dwyer & Kim Marriott
Project_id: Dwyer-graph
Graph layout is an active area of computer-science research exploring techniques for automatically generating drawings of relational networks. The typical problem considered by most graph drawing algorithms is to produce a drawing from the data in a single pass. However, in practice the underlying network may be evolving or the user may want more control over the layout, so incremental methods are required. We would like a student to explore the type of interactivity required when incorporating graph drawing techniques into a popular vector-graphics drawing tool. The project will involve implementing some graph drawing algorithms, looking at ways to make them incremental and an exploratory evaluation to study the effectiveness of different types of interaction.
 
Also see [G de la B] & M.

Chromatic roots of Boolean functions (20 pts or 12 pts)
Graham Farr
Project-id: Farr
A colouring of a graph G is an assignment of a colour to each vertex so that adjacent vertices receive different colours. Sometimes one wants to count the number of colourings of a graph with a given number q of available colours. The function P(G;q) of G and q that does this is called the chromatic polynomial of G. This function plays an important role in the theory of counting problems on graphs.
The chromatic polynomial of any graph has a number of zeros, i.e., real or complex numbers z such that P(G;z)=0. These zeros are often known as chromatic roots. The patterns formed by chromatic roots in the complex plane can be very intriguing, and many important unsolved problems in graph colouring amount to questions about these roots.
Some of the theory of chromatic polynomials has been generalised to arbitrary Boolean functions by Kung in 1980. However, no-one has yet investigated the chromatic roots of Boolean functions.
Aim: To investigate the chromatic roots of Boolean functions. This will entail: learning about chromatic polynomials and their extension to Boolean functions; writing programs to compute these generalised chromatic polynomials of Boolean functions; using existing software to find and plot the complex zeros of these functions; writing programs to generate large numbers of Boolean functions of various kinds, in order to investigate the behaviour of their chromatic roots; trying to form hypotheses about the behaviour of the roots; and perhaps even trying to prove some theorems.
You must discuss this project with GF before you nominate it.

The Best Way to Schedule (12 or 24-pts)
Maria Garcia de la Banda and Mark Wallace
Project-id: GBW-schedule
Scheduling problems appear in all walks of life: from airlines to hospitals, and from offices to sports leagues. They involve performing a set of jobs, each requiring a sequence of tasks to be completed using a set of resources under some time constraints. The specific resource and duration for each task is fixed, and two tasks can't use the same resource at the same time. As a simple example, consider a computer game in which the components of a gravity gun, a radiation suit and a bullet-time disruptor need to be built according to the following schedule:
           |     Task_1     |     Task_2     |    Task_3
-------------------------------------------------------------
Grav. Gun  |  6 hours at M1 | 50 hours at M4 | 17 hours at M3
Rad. Suit  | 25 hours at M2 | 15 hours at M1 | 10 hours at M4
Disruptor  | 16 hours at M1 |  5 hours at M2 | 20 hours at M3
where M1,M2,M3, and M4 are four machines. The aim is to schedule the tasks to complete all jobs in the minimum amount of time. These problems can be modelled by associating a variable with the start time of each task (9 variables in our example above), and searching for the best compatible start times. Alternatively, they can be modelled by associating a variable with each pair of tasks on each resource (6 variables), and searching for the best task-orders. This project addresses the assumption that underlies many scheduling algorithms: that deciding task-orders is better. When, if ever, is this assumption false?
Ref: An Algorithm for Solving the Job Shop Problem, Carlier and Pinson, Management Science Feb. 1989
 
Automatic analysis of "no holes" in constraint programs (12 or 24-pts)
Maria Garcia de la Banda, Kim Marriott
Project-id: GBM-constraint
Constraint satisfaction problems involve a set of variables, their domains (i.e., their set of possible values), and a set of constraints determining the allowed combination of values. For example, the N-queens problem tries to place N queens on an NxN chessboard so that they cannot take each other. It can be modelled with Q1,...,QN variables (the N queens), each Qi with domain {1,...,N} (the N columns in which Qi can appear), and constraints ensuring no two queens are in the same column or diagonal.
Solvers collect the constraints and determine their effect on the variables. Finite domain (FD) solvers handle variables with finite domains (e.g., each Qi above has a finite domain of N elements). FD solvers often handle constraints by using either domain or bounds propagation. While the latter is usually less powerful, it is also faster. [1] has shown that domain propagation can be replaced by the more efficient bounds propagation without loss of power if no constraint creates a "hole" in the domain of any variable. While [1] proposes a program analysis to automatically infer the "no hole" property from the problem, the analysis was not implemented. Thus, it is not known how effective it is in practice.
The aim of this project is to implement the analysis, systematically test it on a wide range of problems, determine whether it is accurate enough to effectively guide a domain-to-bounds transformation, and if not, study possible extensions.
[1] Schulte and Stuckey. When do bounds and domain propagation lead to the same search space. ACM. PPDP 2001.

Implementation of a Password Capability Unix Filesystem (12 or 24-pts)
Carlo Kopp
Project-id: Kopp-walnut
The Walnut password capability operating system has demonstrated the utility and unique access control and security properties inherent in this class of operating system. The aim of this project is to exploit earlier research effort in this area to implement and test an alternative filesystem for Unix (Linux) platforms. This project is suitable for a student with prior Linux kernel experience.
Needs a good Linux-internals background. BCS/BDS student, or a strong BSE student.
 
Suburban Ad Hoc Networks (24-pts)  
Carlo Kopp & Ron Pose
Project-id: Kopp-sahn
The Suburban Ad Hoc Networking group focusses its research activities on techniques for implementing Suburban Ad Hoc Networks. These are self organising, quasi-static ad hoc (typically wireless) networks which provide an alternative technology for providing high speed digital connectivity to households, small businesses and distributed campuses. Specific areas of research interest include security, low level routing protocols, access controls and propagation behaviour. Given the broad scope of the research performed in this area, there is considerable choice in project topics. Students should consult Dr Ronald Pose or Dr Carlo Kopp for suitable project topics.
See [/research/san/].

Experimental Ethics (12 or 24-pts)
Kevin Korb & Ann Nicholson
Project-id: Korb-ethics
This project will use evolutionary Artificial Life techniques to examine the evolution of ethical and social behavior. Classically, related techniques have been applied to game-theoretic problems such as the iterated prisoner's dilemma (IPD). Here we will evolve some varieties of altruistic behavior (self-sacrifice, resource sharing, predator warnings, reciprocal sharing, communication, etc.) and of selfish behavior (theft, thuggery, deception, etc.). The ethical merits of different actions (assessed in terms of expected utility) and their fitness and evolutionary stability in various environments will be the main foci of experimental simulations. We may also look at the evolution of utility itself.
 
Causal Discovery (12 or 24-pts)
Kevin Korb
Project-id: Korb-cause
Causal discovery algorithms learn causal Bayesian networks from data. The oldest of them dates from 1991. At Monash we have developed CaMML (Causal discovery via Minimum Message Length), which "data mines" observational data to find the causal model most probable in light of the data. In this honours project you may choose any one of a number of possible specific problems to investigate, including:
- The proper evaluation of causal discovery algorithms. The common uses of predictive accuracy, unweighted edit distance and Kullback-Leibler divergence are all demonstrably inadequate. We have an alternative "Causal Kullback-Leibler" which is superior, but needs further work. This is also related to metrics of causal power that we are developing, which assess how much causal influence one variable has over another.
- Integrating latent (unobserved) variables into our causal discovery algorithms.
- The philosophy of token causality explicated in terms of causal models (Twardy & Korb "A Criterion of Probabilistic Causation" Phil of Sci, 2004; Korb, Twardy, Handfield and Oppy "Causal Reasoning with Causal Models" Synthese, submitted)
- The mathematics of causal intervention (Korb & Nyberg "The Power of Intervention" Minds and Machines, 2006; Korb, Hope, Nicholson and Axnick "Varieties of Causal Intervention" PRICAI, 2004)
 
Classification Tree Search (12 or 24-pts)
Kevin Korb
Project-id: Korb-tree
Classification trees are a well-established data mining technique, for supervised machine learning. At Monash we have programs which do greedy search for MML trees and graphs with lookahead; we have also used genetic algorithm search. This project will apply Markov Chain Monte Carlo search to estimating the posterior probability distribution over trees and graphs, which should be more effective. If there is opportunity we can also look at an ensemble search, which finds the best collection of trees and graphs for weighted predictions.
 
Also see [Allison and Korb] and [Nicholson and Korb]

Autonomous aerial vehicle (fixed wing): sensing and flight control (12 or 24-pts)
Bijan Shirinzadeh and Gordon Lowe
Project-id: Lowe-A1
This project focuses on research and development of autonomous flying drones - i.e. fixed wing unmanned aerial vehicles. The project aims to develop integrated sensing capabilities such as inertial mesaurement unit (IMU), global positioning system (GPS), vision, etc. for the autonomous aerial vehicles. One or more of the following aircrafts will also be used as experimental platforms: Apachee Trainer, F/A-18 Hornet, Mig29-Fulcrum, and Kangaroo.
Autonomous aerial vehicle (helicopter): experiments in sensing and control (12 or 24-pts)
Bijan Shirinzadeh and Gordon Lowe
Project-id: Lowe-A2
This project focuses on research and development of autonomous flying robots - rotary wing unmanned aerial vehicles. The project entails development of control techniques and mission planning methodologies. One or both of the following helicopters will be used: Shuttle-Z Trainer, JR Ergo (Payload: 5 kg).
3D part modelling for robotic assembly (12 or 24-pts)
Bijan Shirinzadeh and Gordon Lowe
Project-id: Lowe-parts
This project focuses on research and development of an image processing system to monitor the assembly process of parts. A key aspect of the project will be the integration of three fire wire cameras to produce 3D models of parts, along with feature extraction.
Biped robot simulation (12 or 24-pts)
Gordon Lowe
Project-id: Lowe-walk
This project focuses on developing a simulation of a biped robot which resides in the Robotics Laboratory. The simulation will assist in the study of biped walking actions and may result in direct control of the biped robot. No prior knowledge of robotics is necessary for this research project.

The Software Engineering Side of Business Process Reengineering (24-pts)
Christine Mingins
Project-id: Mingins
Standard Software Development methods employ programming languages that combine logic and data to form applications software. With this approach business rules and business processes are 'hard wired' into the code. New technologies such as WebSphere-MQ Workflow and Windows Workflow Foundation allow programs to be expressed as declarative, long-running processes called workflows.
How easy is it to retrofit workflow into existing applications? What is involved in separating out the business process layer and migrating existing software? What process reengineering and refactoring needs to be done?
This project has two goals: 1. evaluate the capabilities of workflow engines from the perspectives of platform and language independence, and accessibility for domain experts. 2. Propose, prototype and evaluate a method for retrofitting workflow capabilities into existing software applications.
The project is sponsored by an industry partner, [Echo-Services] who will provide the legacy software, and any other resources necessary.

Bayesian Networks for Epidemiology (12 or 24-pts)
Ann Nicholson and Kevin Korb
Project-id: NK-epidemiology
The project is to apply Bayesian Networks (BNs) to epidemiological data to learn factors that are predictors of good, or bad, health. Models will be implemented in CaMML [CaMML], a successful BN learner being developed at Monash, see [CaMML(click)]. The project will involve extending CaMML in various ways, applying it to epidemiology data, and incorporating expert knowledge into the networks.
 
Knowledge Engineering Dynamic Bayesian Networks for Ecological Risk Assessment (12 or 24 pts)
Ann Nicholson and Carmel Pollino (Monash Water Studies)
Project-id: Nich-Poll-ecology
Bayesian Networks (BNs) are graphical models for probabilistic reasoning, which are now widely accepted in the AI community as intuitively appealing and practical representations for reasoning under uncertainty. One use of BNs is for prediction, and within that general task, for the problem of risk assessment. We have two current projects in ecological risk assessment: (a) assessing the risk for native fish populations of water management interventions and (b) risk management of tropical seagrass. However, to date, the BN modeling in these projects has been done with so-called "static" BNs, where there is no explicit representation of time. Clearly, temporal modelling in such prediction tasks is very important, and it could (and should) be done using an existing sextension to BNs, called Dynamic Bayesian networks. The aim of this project is to develop knowledge engineering techniques and methodologies for Dynamic Bayesian networks, using the ecological risk assessment domains as case studies.
 
Also see [Korb & Nicholson].

Dependable Software Services (12 or 24-pts)
Sita Ramakrishnan
Project-id: SR-services
As Service-oriented architecture (SOA) is being considered increasingly in critical applications by distributed enterprises, the provision of dependable services is becoming an important area of research. In this project, we consider dependability attributes of a software service in a sustainable manufacturing domain. Sustainable manufacturing is the employment of eco-efficient technologies and industry standards for engineering manufacturing systems to minimise environmental burdens of green house emissions and energy use. Dependability attributes can be expressed in terms of timeliness, availability and reliability of the correct service, and trustworthiness attributes such as confidentiality, integrity and maintainability of deployed services.
Providing a predictable level of dependability for services which have been composed from services from various providers is an important requirement. Standards such as SOAP, UDDI and WSDL [1] are being adopted by major web service providers. These services need to establish and adhere to standards and hence, the dependability of service will become a differentiating point for services. Existing standards do not provide sufficient information to make decisions about how dependable and available the various services & components are, and how they fail. Some of the challenges that we will be dealing with respect to component interactions and composition of services in the sustainable manufacturing domain are: how to ensure trust in correctness when dealing with global partners and services composed using their services how to ensure reliability when composing services from these partners who are outside the control of the system ...[see SR for more information]

Feature Weighting in Content Based Image Retrieval (24 or 12 pts)
Sid Ray
Project-id: Ray-feature
The purpose of a Content-Based Image Retrieval (CBIR) system is to retrieve images from a database such that the visual contents of retrieved images are similar to that of a query image. Often the image similarities are computed from the representation of visual contents such as colour, texture and shape, in terms of a multi-dimensional feature vector. Understandably, the features cannot be assumed to have equal weights. More important features deserve more weights and less important features deserve less weights. These weights can be assigned fully automatically from feature variation statistics in the database images. Better still is to use the so called relevance feedback (RF) mechanism in which the weights are modified iteratively based on users' response regarding the relevance of the retrieved images. The aim of this project is to investigate the feature weighting problem for both cases, namely, without RF and with RF.
 
Small Sample Size Effects on CBIR Accuracy (24 or 12 pts)
Sid Ray
Project-id: Ray-sample
In recent years Content-Based Image Retrieval (CBIR) has become one of the most active areas of research in image data mining. In studies involving the design of a CBIR system often the question arises whether the data set sizes used are big enough to rely on the accuracy achieved. The aim of this second project in CBIR is to study the impact, on accuracy, of the number of samples in the database, the number of samples per semantic category, the number of retrieved images returned to user for RF, the number of features used, and their relative sizes.
 
Hierarchical Feature Selection for Pattern Recognition (24 or 12 pts)
Sid Ray
Project-id: Ray-hierarchy
In statistical pattern recognition, often patterns are represented by a large number of numerical features. Although there is no conceptual justification in reducing the number of features to a small number, in practical problem solving, this becomes a necessary step due to the wellknown phenomenon of the 'curse of dimensionality' of the feature vector on the complexity of the pattern classifier. The aim of the proposed project is to develop a hierarhical feature selection paradigm that is well-suited to multi-class pattern recognition problems.
The project will comprise (i) study of some existing feature selection criteria followed by an experimental investigation of them, (ii) analysis of the above results leading to, hopefully, a hierarchical feature selection paradigm, and (iii) development of an interactive software tool for the above paradigm.
The methodology developed will be such that depending on the classification accuracy arrived at a certain stage, the user will have the option of increasing or decreasing the dimensionality value. The software tool will include procedures for displaying the distribution of pattern samples in different feature spaces obtained by different feature selection methods.
 
Texture Analysis of Images (24 pts)
Sid Ray
Project-id: Ray-texture
Texture plays an important role in both human interpretation of visual scenes and computer analysis of images. Textural cues are of particular relevance in two different, but related, image analysis problems, namely, the problems of segmentation and classification of images. The proposed project will deal with both of these problems. It will involve (i) investigating existing texture analysis methods from the point of view of their theoretical soundness as textural measures and (ii) investigating their practical applicability.

HWS
Heinz Schmidt
Project-id: HWS-<something>
H. tells me that he has some projects, but the list is not available here. Please identify any such project very carefully if you have discussed it with H. and it is one of your preferences.

Approximating frequency distributions by mixture models (24-pts)
Peter Tischer
Project-id: Tischer-freqs
Often when we are trying to estimate the probability associated with a set of events we compute the relative frequency distribution. A question we may ask is whether the relative frequency distribution can be adequately described as a mixture of probability distributions where the components in the mixture are in some sense simple probability distributions. If we can can find a good approximating mixture model we may be able to show that the events can be thought of as originating from different sub-populations of the overall population. Separating things into sub-populations is important in data mining and machine learning.
This project aims to come up with good mixture models for a given relative frequency distribution by using MML techniques. In the first place, the problem will be approached by treating it as a lossless data compression problem. This can be seen as a simpler form of MML. Should time permit, the problem might be approached using MML in its general form.
It would be an advantage to have a maths background and to be taking the honours unit on 'MML'.
Document Similarity Measurement via Data Compression (12 or 24-pts)
Peter Tischer
Project-id: Tischer-docs
In document classification it is often desirable to measure how similar two documents are. This might be because we are interested in saying whether documents are spam or whether the source code of two programs is so similar that at least program is an example of plagiarism.
If two documents are similar, we would expect that we would require little information to know the contents of the second document given that we already know the contents of the first. One way to measure information is by using data compression techniques as the compressed size is an upper bound on the amount of information needed to store the original data.
The aim of the project is to adapt techniques from the area of text compression to compress the data which shows how one document can be created from another. Of particular interest is coming up with similarity measures that are also distance functions.
Scientific Visualisation using MCST Projections (24-pts)
Peter Tischer
Project-id: Tischer-vis
In Scientific Visualization we are often interested in representing higher dimensional data in some lower number of dimensions such as 2,3 or 4. Minimal Cost Spanning Trees (MCSTs) are simple to compute and are a way of preserving closeness information in representing higher dimensional data.
Current methods for carrying out projections for visualizing higher dimensional data tend to be computationally expensive with results that are sensitive to parameters within the algorithm that are difficult to tune. These methods are also sensitive to the dimension to which the projection is being made. MCST-based projection does have these drawbacks and can be engineered to preserve properties of the original higher dimensional data.
This project will involve implementing a package that uses MCSTs for visualizing higher dimensional data in lower dimensions such as 2, 3 or 4 and comparing the performance of this package against packages incorporating techniques like SOMs (Self Organising Maps).
Editor for Image Segmentations (12-pts)
Peter Tischer
Project-id: Tischer-seg
A fundamental activity in image processing is to segment the pixels of an image so that pixels which belong together are in the same segment. Human beings are extremely good and quick at segmenting images but it is tedious and time-coonsuming for humans to segment digital images manually because of the huge numbers of pixels involved. Computer programs can segment images quickly but the resulting segmentation may be one which has obvious flaws which a human may want to correct.
An obvious idea is to combine both human input and computer processing of digital images to produce a semi-automatic approach for image segmentation. Given an initial segmentation which may have been produced either by a program or by a human, the image editor allows a human to alter that segmentation interactively. The editor should allow for a wide range of image types such as greyscale and colour, 2D and 3D. Note, no prior knowledge of image processing is necessary
Mixture Modeling used in Analysis of DNA Microarray Data (24-pts)
Peter Tischer
Project-id: Tischer-micro
Mixture modeling, also known as clustering, involves taking a set of observations about things and deciding whether the things all belong to one population or whether the things come from a number of different sub-populations. DNA microarray data is a 2D array where the rows or columns may represent either a subject of an experiment or a partcular gene. The entries in the microarray represent the activity of that gene for that subject.
The MML (Minimal Message Length) criterion was proposed by Professor Chris Wallace foundation professor of Computer Science at Monash University. MML has been used to create a clustering program called 'snob'. 'snob' has proven to be very effective in identifying the correct number of clusters but it is not used widely in bioinformatics where people continue to use a variety of techniques that are generally not able to identify correctly the number of clusters.
The aim of this project is to apply snob to the mixture modelling problems that arise in the analysis of DNA microarray data. Most clustering methods, snob included, are iterative and rely on taking an intial clustering and improving it at each iteration. One aim is to allow people to generate a clustering of the data and then to submit this an initial clustering to snob.
snob itself randomly creates initial clusters. By making certain assumptions about the nature of clusters it is possible to use simple clustering techniques like k-means to create initial clusterings that are likely to reveal the true cluster structure, if the data obeys the assumptions used in deriving the clustering method. With this approach snob would first try to look for certain kinds of clusterings before trying intial clusterings generated at random.
The Effect of Manipulating Data on Support Vector Machine (SVM) Performance (24-pts)
Peter Tischer and David Albrecht
Project-id: Tischer-SVM
Support Vector Machines (SVM)s are used mainly in supervised classification problems. Suppose we are given a body of data about things. One of the pieces of data about a thing may be a label. This may be an integer saying to which class or cluster the thing belongs. With an SVM we try to develop a function that can be given all the attributes for a given thing, other than its label, and tries to predict its label. For example, we might have the results of a range of tests that were carried out on a patient and the label may be whether the patient had cancer or not.
SVMs clasify data into two classes by finding a separating hyperplane. In two dimensions, the hyperplane is a straight line and we hope that all members of one class will be on one side of the line and all members of the other class will be on the other side of the line. An attractive feature of SVMs is that they can allow you to transform the attributes of the data in a nonlinear way and to find a separting hyperplane on the transformed data. This is equivalent to finding a nonlinear separating surface on the original data. In the 2D case, using SVMs we may be able to find a nonlinear separating surface such as an ellipse so that all members of one class are inside the ellipse and all members of the other class are outside the ellipse.
There are a number of packages available which enable people to compute SVMs. By preprocessing the data it should be possible to reduce the amount of data and to compute an SVM which performs as well on the reduced data as on the original data. In addition, the SVM packages often make assumptions about the way distances between things are computed. It is often assumed that the attributes of things are not correlated. If the attributes actually are correlated then preprocessing may produced transformed data for which the transformed attributes are not correlated.

Learning Under Resource Constraints (24-pts).
Geoff Webb
Project-id: Webb
Two projects available. These projects, as part of a large research team investigating the wider problem, will each tackle one aspect of the question of how to optimise machine learning in a context where the computational resources available for learning are unpredictable and varying. This is typical of many online applications where the user load varies greatly from time to time. Such applications include information retrieval, recommender systems, user modeling, junk email filters and online fraud detection.
 
Reading Assistance Program (24-pts)
Linda McIver and Stephen Welsh
Project-id: McWelsh
Special Requirements: Degree in psychology. Develop (or simply quote) a taxonomy of reading difficulties with a view to producing a catalogue of current assistive technologys and methodologys to aid the reading impaired overcome their difficulty.
For each difficulty, design several possible ways software&eye tracking could assist. These could be adaptations of the techs. and meths. from above and/or proposals for novel techniques.
Design and develop an application to implement _all_ of those possibilities. The software is required to permit selection between options, and refining options, to support different types of experiments. For example, it should be possible to run the application with restricted sets of options for different subjects in the experiments. The detection of the subject actually encountering difficulty in reading the presented text will form a major part of the project. The application should interface with an eye tracking system if/when it becomes available.
.
Linux
OpenOffice
Gimp

FIT units & help-desk

11/4/2006: Google has bought the rights to an [algorithm] from the UNSW School of Computer Science -- ABC TV, The Age, etc..

2006 Handbooks

Bibliographies:  [CS-bibs]
 [LA]
 [Mon.lib.]

Statistical & Inductive Inference by MML,
the MML book.





























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..




























































Try the [Bibs] for more on: Bayesian, capability, causal, classification, compression, colouring, computer, constraint, decision, discovery, functional, game, graph, graphics, image, IPD, knowledge, language, layout, MML, microarray, mixture, model, net, network, Nqueens, palindrome, processing, programming, scheduling, segmentation, SVM, tree, etc..

B. Software Engineering. Course code: 2770. URL: http://www.csse.monash.edu.au/courses/bse/
Created with "vi (Linux + Solaris)",   charset=iso-8859-1