|
|
Honours projects for the
B. Computer Science (BCS),
B. Digital Systems (BDS), and
B. Software Engineering with Honours (BSE, CSE4402), for 2006.
Note that BSE projects are 12-pts, and that BCS and BDS projects are 24-pts.
Some projects come in 24- and 12-pt versions;
the latter involves less work.
Supervisors as of Friday 3 March 2006.
Fill in a paper copy of
the approriate preference form (fully and accurately!) for your degree,
[/hons/2006/],
by Friday of week-1 of semester.
Even if you think that you have secured a specific project,
still fill in the form completely.
Check
www.csse.monash.edu.au/courseware/cse4402/2006/offerings.shtml
from time to time for changes.
- Support Vector Machines (12 or 24-pts)
- David Albrecht and Lloyd Allison.
- Project-id: Alb-SVM
- Support Vector Machines (SVMs) are a recent machine-learning technique.
SVMs can be used as 'classifiers',
e.g. given medical data on a patient predict whether or not
(s)he has condition X. (Other kinds of SVM can also be used to fit functions.)
Unfortunately it is rarely possible to be 100% accurate in the real world.
All machine-learning methods, SVMs included, must also combat over-fitting --
is a complex SVM enough of an improvement over a simple SVM to be justified?
The project is to
(i) investigate the 'types' (as in data-type) of SVMs,
(ii) implement new cost/benefit criteria for learning
classifier-SVMs within a Monash library of machine-learning routines, and
(iii) compare the prediction performance of the SVMs learned
against those learned by other systems.
- Compression of DNA and Protein Sequences for Knowledge Discovery (24 or 12-pts)
- Lloyd Allison
- Project id: LA-bioinformatics
- Most of the well-known file-compression algorithms fail to beat
the trivial 2-bits/symbol {A->00, C->01, G->10, T->11}
compression level on DNA sequences.
Nor are they able to significantly compress protein sequences
(where |alphabet|=20).
Compression=modelling+coding, and arithmetic coding has solved the coding
part of the compression problem, so better models must be necessary.
A model by Allison, Edgoose and Dix
[Intell. Sys. in Mol. Biol. '98]
gives good compression but we want an algorithm which is much faster --
one that is suitable for interactive use on long sequences;
there has been recent progress.
The project is to further develop
improved models and algorithms for compression
of biological data to discover new features.
-
- Machine Learning + Haskell + templates or generics (24 or 12-pts)
- Lloyd Allison
- Project id: LA-haskell
- Haskell (www.haskell.org) is a lazy functional programming language.
I have been using it for Bayesian machine learning,
see J. Functional Programming, 15(1), pp.15-32, 2005,
doi:10.1017/S0956796804005301,
e.g. for classification (clustering), decision trees,
time-series, mixture models, segmentation, and for
[Bayesian networks]
which include discrete and/or continuous and/or
structured attributes (variables).
'Template-Haskell' (TH) is a Haskell library where a Haskell program
can generate part of the code of other Haskell programs, and do
this in a type-safe manner.
That is a program can write all or part of another program.
The project is to investigate the use of
TH, 'generic Haskell' and similar "meta-programming"
extensions to Haskell to make it easier to process multi-variate data-sets,
e.g. data held in Excel spread-sheets, 'csv' and
similar file formats.
Applications include data-mining Biomedical data,
and Search and Rescue data on Missing People
-- who, where, how found, when, in what condition, etc..
-
- Large, Low-Cost Digital Display (24 or 12-pts)
- Lloyd Allison
- Project-id: LA-display
- If you are inventive, like robotics, and can build things: The project
is to design and build a very large but cheap digital display.
The target size is about 2m×2m with a resolution of
at least 1cm/pixel, say 40,000+ pixels.
There is at least one readily available gadget from which such a large display
could be constructed.
Display speed is "not the highest priority",
e.g., a rewrite time of several seconds could be acceptable in
some applications.
Small prototype(s) would be built before scaling-up.
(You must discuss this project with LA before nominating it.)
-
- Algorithms to Find Approximate Palindromes (24 or 12-pts)
- Lloyd Allison
- Project id: LA-palindromes
- Palindromes (actually reverse-complementary ones)
are significant for DNA and for RNA because
they can fold into biologically active structures.
However the typical bio-palindrome is approximate, not exact!
Given a string, s, of symbols, a palindrome is a
substring p=s[i..j] that reads the same forwards and backwards,
i.e. either p=t;u or p=t;c;u where c is a symbol and u=rev(t)
is the reverse of t.
But in an approximate palindrome, p=t;u or p=t;c;u where
u can be approximately, not necessarily identically, equal to rev(t).
The project is to develop, test, and apply
fast algorithms to find approximate palindromes in very long strings.
See
[http://arxiv.org/pdf/cs.DS/0412004]
-
- Gnumeric + Plugins + Haskell (24 or 12-pts)
- Lloyd Allison
- Project id: LA-gnu
- Gnumeric (www.gnumeric.org)
is an open-source spread-sheet.
It allows a programmer to add new 'plug-ins';
these are often written in C or C++.
Recently Pang et al showed that "Haskell can be
comfortably used as a statically typed extension language,
and that it can support type-safe dynamic loading of plugins using dynamic
types...."; see
[http://www.cse.unsw.edu.au/~dons/hs-plugins/paper/].
The project is to investigate writing Haskell plugins
(particularly plugins for Bayesian machine learning)
for gnumeric, possibly using the work of Pang et al.
-
- Machine Learning + Haskell + cluster computing (24-pts)
- Lloyd Allison
- Project id: LA-cluster
- Haskell (www.haskell.org) is a lazy functional programming language
with parametric polymorphic types, and type-classes.
It has extensive libraries for many purposes.
I have been using it for Bayesian machine learning, see
http://dx.doi.org/10.1017/S0956796804005301.
The project is to investigate cluster computing and/or parallel Haskell
to speed up Haskell-based machine learning on very large data sets.
Causal-Model of Movie Data
Lloyd Allison and Kevin Korb
Project-id: LA-movies
The project is to learn a causal model to predict the
success of a movie before its release. Some attributes
such as genre, budget and star(s), are available before
opening night. A lot of people would dearly like to be able to
predict the initial and long-term 'success' of a movie -
which is only really known after release.
The project is to use CaMML
(Causal discovery via Minimum Message Length)
to build a model of post-release attributes and pre-release attributes
of movies.
- Modelling Sinews, Fillets & Lattices (12 or 24-pts)
- Alan Dorin
- Project-id: Dorin-sinews
- The components of many organisms and their secretions consist of what
may loosely be labelled "sinewy networks". The junctions in these
networks are often gently filleted, creating a web-like mesh of holes
and curves that gives the structure a biological feel. The purpose of
this project is to devise a technique for automating the process of
generating the geometry of these meshes. The software will be
parameterized to vary the extent to which the meshes are filletted,
allowing them to resemble architectural forms of different kinds,
whether they be made by animals (webs, shells, skeletons), erosion
(rocks hollows and caves), plants (patterns in bark, veins on leaves,
tree roots) or people (architectural lattices). The software will also
allow automatic generation of the network itself and should allow the
user to parameterize the form of the lattice in general terms from
completely irregular, through approximately regular, to highly
geometric. Meshes may be surfaces or solid structures.
Images and details are online:
~aland/honoursProjectIdeas.html.
- Rating and ranking players and teams using MML (12 or 24 pts)
- David Dowe
- Project-id: Dowe-ranking
- Ratings are used in chess, tennis, golf, other sports, etc. in order
to rank both teams and individual players. A variety of systems are
used to do the ratings and rankings - including Elo, Glicko and
systems more concerned with prize money (over the past 12 months).
We can re-visit and improve these systems using the Minimum Message
Length (MML) principle (Wallace and Boulton, 1968)(Wallace and Freeman,
1987)(Wallace and Dowe, 1999a, which is the Computer Journal's most
downloaded article)(Wallace, posthumous, 2005)(Comley and Dowe, 2005)
to arrive at a comparatively simple model with an improved and
near-optimal predictive accuracy on future games and contests.
This project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.
References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005). Minimum Message
Length, MDL and Generalised Bayesian Networks with Asymmetric
Languages, Chapter 11 (pp265-294) in P. Gru:nwald, I. J. Myung and
M. A. Pitt (eds.), Advances in Minimum Description Length: Theory
and Applications, M.I.T. Press, April 2005, ISBN 0-262-07262-9
[Final camera-ready copy submitted Oct. 2003.]
Wall2005;
WaBo1968;
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length
and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4,
pp270-283. [This is the Computer Journal's most downloaded article.]
WaFr1987.
-
- Econometric, statistical and financial time series modelling using MML (24 pts)
- David Dowe
- Project-id: Dowe-econ
- Time series are sequences of values of one or more variables. They
are much studied in finance, econometrics, statistics and various
branches of science (e.g., meteorology, etc.).
Minimum Message Length (MML) inference (Wallace and Boulton, 1968)
(Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace,
posthumous, 2005)(Comley and Dowe, 2005) has previously been applied
to autoregressive (AR) time series (Fitzgibbon et al., 2004), other
time series (Schmidt et al., 2005) and (at least in preliminary manner)
both AR and Moving Average (MA) time series (Sak et al., 2005).
In this project, we apply MML to the Autoregressive Conditional
Heteroskedasticity (ARCH) model, in which the (standard deviations
and) variances also vary with time. Depending upon progress, we can
move on to the GARCH (Generalised ARCH) model or Peiris's Generalised
Autoregressive (GAR) models, or to inference of systems of
differential equations.
This project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.
References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).
SaDR2005;
ScPL2005;
Wall2005;
WaBo1968;
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length
and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4,
pp270-283. [This is the Computer Journal's most downloaded article.]
WaFr1987
-
- Using MML to infer language history and evolution (and relation to DNA) (24 pts)
- David Dowe
- Project-id: Dowe-history
- The evolution of human languages raises several interesting issues, such
as how languages evolve, how the evolution of spoken language relates to
the evolution of written language, how it is that geographical regions
of related languages can surround one or more regions of languages not
related, how populations of language speakers migrated eons ago and
possibly also how spoken language relates to DNA. Study of this area
also helps with the inference of now-extinct ancestral languages and at
least indirectly with the preservation of dying languages. The project
will use the Minimum Message Length (MML) principle (Wallace and Boulton,
1968) (Wallace and Freeman, 1987)(Wallace and Dowe, 1999a)(Wallace,
posthumous, 2005) (Comley and Dowe, 2005), building upon earlier work in
(Ooi and Dowe, 2005).
The project will require strong mathematics - calculus (partial
derivatives, second-order partial derivatives, integration by parts,
determinants of matrices, etc.), etc.
References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
OoDo2005 Ooi, J.N. and D. L. Dowe, Inferring Phylogenetic Graphs of
Natural Languages using Minimum Message Length, Proc. CAEPIA 2005,
Vol. 1, pp I:143 - I:152, Nov. 2005.
Wall2005;
WaBo1968 Wallace C.S. & Boulton, D.M. (1968)
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length
and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4,
pp270-283. [This is the Computer Journal's most downloaded article.]
WaFr1987.
-
- Recognising, communicating with & quantifying varieties of intelligence (24 pts)
- David Dowe
- Projecy-id: Dowe-intell
- "Intelligence" is a term used variably to describe a variety of
human aptitudes (memory, mathematical ability, ability to make inductive
inferences), terrestrial non-human aptitudes and (www.SETI.org)
something sought in extra-terrestrials. Humans share much with other
humans and share at least a planet with terrestrial non-humans, but how
much must humans share with extra-terrestrials or one extra-terrestrial
with another?
Dowe and Hajek (1997, 1998) discuss the relevance of the
information-theoretic Minimum Message Length (MML) principle (Wallace
and Boulton, 1968)(Wallace and Freeman, 1987)(Wallace, 2005) to the
inductive inference part of intelligence, and Hernandez-Orallo and
Minaya-Collado (1998) discuss the relevance of Kolmogorov complexity to
intelligence - and MML is intimately related to Kolmogorov complexity
(Wallace and Dowe, 1999a).
This project concerns the use of information theory, MML and
Kolmogorov complexity to recognise signs of intelligence and ways of
communicating with it. This will inevitably entail measuring - or
attempting to measure - the intelligence. Another direction in which
the project might head is to study how intelligence of a system increases
(or possibly doesn't increase) when there is communication involving more
than one intelligent entity.
The project might also involve the study of communications between
terrestrial non-humans such as, e.g., marmosets and dolphins.
The project will require strong mathematics.
- Incorporating graph-layout techniques in interactive diagram editors (12 or 24-pts)
- Tim Dwyer & Kim Marriott
- Project_id: Dwyer-graph
- Graph layout is an active area of computer-science research exploring
techniques for automatically generating drawings of relational
networks. The typical problem considered by most graph drawing
algorithms is to produce a drawing from the data in a single pass.
However, in practice the underlying network may be evolving or the user
may want more control over the layout, so incremental methods are
required. We would like a student to explore the type of interactivity
required when incorporating graph drawing techniques into a popular
vector-graphics drawing tool. The project will involve implementing
some graph drawing algorithms, looking at ways to make them incremental
and an exploratory evaluation to study the effectiveness of different
types of interaction.
-
- Also see [G de la B] & M.
- Chromatic roots of Boolean functions (20 pts or 12 pts)
- Graham Farr
- Project-id: Farr
- A colouring of a graph G is an assignment of a colour to each
vertex so that adjacent vertices receive different colours. Sometimes
one wants to count the number of colourings of a graph with
a given number q of available colours. The function P(G;q)
of G and q that does this is called the chromatic
polynomial of G. This function plays an important role
in the theory of counting problems on graphs.
The chromatic polynomial of any graph has a number of zeros,
i.e., real or complex numbers z such that P(G;z)=0.
These zeros are often known as chromatic roots.
The patterns formed by chromatic roots in the complex plane can be very
intriguing, and many important unsolved problems in graph colouring
amount to questions about these roots.
Some of the theory of chromatic polynomials has been generalised
to arbitrary Boolean functions by Kung in 1980. However,
no-one has yet investigated the chromatic roots of Boolean functions.
Aim:
To investigate the chromatic roots of Boolean functions. This will
entail: learning about chromatic polynomials and their extension to
Boolean functions; writing programs to compute these generalised
chromatic polynomials of Boolean functions; using existing software
to find and plot the complex zeros of these functions; writing programs
to generate large numbers of Boolean functions of various kinds, in order
to investigate the behaviour of their chromatic roots; trying to form
hypotheses about the behaviour of the roots; and perhaps even trying to
prove some theorems.
- You must discuss this project with GF before you nominate it.
- The Best Way to Schedule (12 or 24-pts)
- Maria Garcia de la Banda and Mark Wallace
- Project-id: GBW-schedule
- Scheduling problems appear in all walks of life: from airlines to
hospitals, and from offices to sports leagues. They involve performing
a set of jobs, each requiring a sequence of tasks to be completed
using a set of resources under some time constraints. The specific
resource and duration for each task is fixed, and two tasks can't use
the same resource at the same time. As a simple example, consider a
computer game in which the components of a gravity gun, a radiation
suit and a bullet-time disruptor need to be built according to the
following schedule:
| Task_1 | Task_2 | Task_3
-------------------------------------------------------------
Grav. Gun | 6 hours at M1 | 50 hours at M4 | 17 hours at M3
Rad. Suit | 25 hours at M2 | 15 hours at M1 | 10 hours at M4
Disruptor | 16 hours at M1 | 5 hours at M2 | 20 hours at M3
where M1,M2,M3, and M4 are four machines. The aim is to schedule the
tasks to complete all jobs in the minimum amount of time.
These problems can be modelled by associating a variable with the
start time of each task (9 variables in our example above), and
searching for the best compatible start times. Alternatively, they can
be modelled by associating a variable with each pair of tasks on each
resource (6 variables), and searching for the best task-orders.
This project addresses the assumption that underlies many scheduling
algorithms: that deciding task-orders is better. When, if ever, is
this assumption false?
Ref: An Algorithm for Solving the Job Shop Problem, Carlier and
Pinson, Management Science Feb. 1989
-
- Automatic analysis of "no holes" in constraint programs (12 or 24-pts)
- Maria Garcia de la Banda, Kim Marriott
- Project-id: GBM-constraint
- Constraint satisfaction problems involve a set of variables, their domains
(i.e., their set of possible values), and a set of constraints determining
the allowed combination of values. For example, the N-queens problem tries
to place N queens on an NxN chessboard so that they cannot take each
other. It can be modelled with Q1,...,QN variables (the N queens), each Qi
with domain {1,...,N} (the N columns in which Qi can appear), and
constraints ensuring no two queens are in the same column or diagonal.
Solvers collect the constraints and determine their effect on the
variables. Finite domain (FD) solvers handle variables with finite domains
(e.g., each Qi above has a finite domain of N elements). FD solvers often
handle constraints by using either domain or bounds propagation. While the
latter is usually less powerful, it is also faster. [1] has shown that
domain propagation can be replaced by the more efficient bounds propagation
without loss of power if no constraint creates a "hole" in the domain of
any variable. While [1] proposes a program analysis to automatically infer
the "no hole" property from the problem, the analysis was not implemented.
Thus, it is not known how effective it is in practice.
The aim of this project is to implement the analysis, systematically test
it on a wide range of problems, determine whether it is accurate enough to
effectively guide a domain-to-bounds transformation, and if not, study
possible extensions.
[1] Schulte and Stuckey.
When do bounds and domain propagation lead to the same search space.
ACM. PPDP 2001.
- Implementation of a Password Capability Unix Filesystem (12 or 24-pts)
- Carlo Kopp
- Project-id: Kopp-walnut
- The Walnut password capability operating system has demonstrated the
utility and unique access control and security properties inherent in this
class of operating system. The aim of this project is to exploit earlier
research effort in this area to implement and test an alternative filesystem
for Unix (Linux) platforms. This project is suitable for a student with
prior Linux kernel experience.
Needs a good Linux-internals background.
BCS/BDS student, or a strong BSE student.
-
- Suburban Ad Hoc Networks (24-pts)
- Carlo Kopp & Ron Pose
- Project-id: Kopp-sahn
- The Suburban Ad Hoc Networking group focusses its research activities
on techniques for implementing Suburban Ad Hoc Networks. These are self
organising, quasi-static ad hoc (typically wireless) networks which provide
an alternative technology for providing high speed digital connectivity to
households, small businesses and distributed campuses. Specific areas of
research interest include security, low level routing protocols, access
controls and propagation behaviour. Given the broad scope of the
research performed in this area, there is considerable choice in
project topics. Students should consult Dr Ronald Pose or Dr Carlo Kopp for
suitable project topics.
See [/research/san/].
- Experimental Ethics (12 or 24-pts)
- Kevin Korb & Ann Nicholson
- Project-id: Korb-ethics
- This project will use evolutionary Artificial Life techniques to
examine the evolution of ethical and social behavior. Classically,
related techniques have been applied to game-theoretic problems such
as the iterated prisoner's dilemma (IPD). Here we will evolve some
varieties of altruistic behavior (self-sacrifice, resource sharing,
predator warnings, reciprocal sharing, communication, etc.) and of
selfish behavior (theft, thuggery, deception, etc.). The ethical
merits of different actions (assessed in terms of expected utility)
and their fitness and evolutionary stability in various environments
will be the main foci of experimental simulations. We may also look
at the evolution of utility itself.
-
- Causal Discovery (12 or 24-pts)
- Kevin Korb
- Project-id: Korb-cause
- Causal discovery algorithms learn causal Bayesian networks from data. The oldest of them dates from 1991. At Monash we have developed CaMML (Causal discovery via Minimum Message Length), which "data mines" observational data to find the causal model most probable in light of the data. In this honours project you may choose any one of a number of possible specific problems to investigate, including:
- The proper evaluation of causal discovery algorithms. The common uses of predictive accuracy, unweighted edit distance and Kullback-Leibler divergence are all demonstrably inadequate. We have an alternative "Causal Kullback-Leibler" which is superior, but needs further work. This is also related to metrics of causal power that we are developing, which assess how much causal influence one variable has over another. - Integrating latent (unobserved) variables into our causal discovery algorithms. - The philosophy of token causality explicated in terms of causal models (Twardy & Korb "A Criterion of Probabilistic Causation" Phil of Sci, 2004; Korb, Twardy, Handfield and Oppy "Causal Reasoning with Causal Models" Synthese, submitted) - The mathematics of causal intervention (Korb & Nyberg "The Power of Intervention" Minds and Machines, 2006; Korb, Hope, Nicholson and Axnick "Varieties of Causal Intervention" PRICAI, 2004)
-
- Classification Tree Search (12 or 24-pts)
- Kevin Korb
- Project-id: Korb-tree
- Classification trees are a well-established data mining technique, for
supervised machine learning. At Monash we have programs which do
greedy search for MML trees and graphs with lookahead; we have also
used genetic algorithm search. This project will apply Markov Chain
Monte Carlo search to estimating the posterior probability
distribution over trees and graphs, which should be more effective. If
there is opportunity we can also look at an ensemble search, which
finds the best collection of trees and graphs for weighted
predictions.
-
- Also see
[Allison and Korb] and
[Nicholson and Korb]
- Autonomous aerial vehicle (fixed wing): sensing and flight control (12 or 24-pts)
- Bijan Shirinzadeh and Gordon Lowe
- Project-id: Lowe-A1
- This project focuses on research and development of autonomous flying
drones - i.e. fixed wing unmanned aerial vehicles. The project aims to
develop integrated sensing capabilities such as inertial mesaurement
unit (IMU), global positioning system (GPS), vision, etc. for the
autonomous aerial vehicles. One or more of the following aircrafts will
also be used as experimental platforms: Apachee Trainer, F/A-18 Hornet,
Mig29-Fulcrum, and Kangaroo.
- Autonomous aerial vehicle (helicopter): experiments in sensing
and control (12 or 24-pts)
- Bijan Shirinzadeh and Gordon Lowe
- Project-id: Lowe-A2
- This project focuses on research and development of autonomous flying
robots - rotary wing unmanned aerial vehicles. The project entails
development of control techniques and mission planning methodologies.
One or both of the following helicopters will be used: Shuttle-Z
Trainer, JR Ergo (Payload: 5 kg).
- 3D part modelling for robotic assembly (12 or 24-pts)
- Bijan Shirinzadeh and Gordon Lowe
- Project-id: Lowe-parts
- This project focuses on research and development of an image processing
system to monitor the assembly process of parts. A key aspect of the
project will be the integration of three fire wire cameras to produce 3D
models of parts, along with feature extraction.
- Biped robot simulation (12 or 24-pts)
- Gordon Lowe
- Project-id: Lowe-walk
- This project focuses on developing a simulation of a biped robot which
resides in the Robotics Laboratory. The simulation will assist in the
study of biped walking actions and may result in direct control of the
biped robot. No prior knowledge of robotics is necessary for this
research project.
- The Software Engineering Side of Business Process Reengineering (24-pts)
- Christine Mingins
- Project-id: Mingins
- Standard Software Development methods employ programming languages that
combine logic and data to form applications software. With this approach
business rules and business processes are 'hard wired' into the code. New
technologies such as WebSphere-MQ Workflow and Windows Workflow Foundation
allow programs to be expressed as declarative, long-running processes called
workflows.
How easy is it to retrofit workflow into existing applications? What is
involved in separating out the business process layer and migrating existing
software? What process reengineering and refactoring needs to be done?
This project has two goals:
1. evaluate the capabilities of workflow engines from the
perspectives of platform and language independence, and accessibility for
domain experts.
2. Propose, prototype and evaluate a method for retrofitting
workflow capabilities into existing software applications.
The project is sponsored by an industry partner,
[Echo-Services]
who will provide the legacy software, and any
other resources necessary.
- Bayesian Networks for Epidemiology (12 or 24-pts)
- Ann Nicholson and Kevin Korb
- Project-id: NK-epidemiology
- The project is to apply Bayesian Networks (BNs)
to epidemiological data to learn factors that are predictors
of good, or bad, health.
Models will be implemented in CaMML [CaMML], a successful BN learner
being developed at Monash, see
[CaMML(click)].
The project will involve
extending CaMML in various ways,
applying it to epidemiology data, and
incorporating expert knowledge into the networks.
-
- Knowledge Engineering Dynamic Bayesian Networks for
Ecological Risk Assessment (12 or 24 pts)
- Ann Nicholson and Carmel Pollino (Monash Water Studies)
- Project-id: Nich-Poll-ecology
- Bayesian Networks (BNs) are graphical models for probabilistic
reasoning, which are now widely accepted in the AI community as
intuitively appealing and practical representations for reasoning
under uncertainty. One use of BNs is for prediction, and within that
general task, for the problem of risk assessment. We have two current
projects in ecological risk assessment:
(a) assessing the risk for
native fish populations of water management interventions and
(b) risk management
of tropical seagrass. However, to date, the BN modeling in
these projects has been done with so-called "static" BNs, where there
is no explicit representation of time. Clearly, temporal modelling in
such prediction tasks is very important, and it could (and should) be
done using an existing sextension to BNs, called Dynamic Bayesian
networks. The aim of this project is to develop knowledge engineering
techniques and methodologies for Dynamic Bayesian networks, using the
ecological risk assessment domains as case studies.
-
- Also see [Korb & Nicholson].
- Dependable Software Services (12 or 24-pts)
- Sita Ramakrishnan
- Project-id: SR-services
- As Service-oriented architecture (SOA) is being considered increasingly in
critical applications by distributed enterprises, the provision of dependable
services is becoming an important area of research. In this project, we
consider dependability attributes of a software service in a sustainable
manufacturing domain. Sustainable manufacturing is the employment of
eco-efficient technologies and industry standards for engineering manufacturing
systems to minimise environmental burdens of green house emissions and energy
use. Dependability attributes can be expressed in terms of timeliness,
availability and reliability of the correct service, and trustworthiness
attributes such as confidentiality, integrity and maintainability of deployed
services.
Providing a predictable level of dependability for services which have been
composed from services from various providers is an important requirement.
Standards such as SOAP, UDDI and WSDL [1] are being adopted by major web
service providers. These services need to establish and adhere to standards and
hence, the dependability of service will become a differentiating point for
services. Existing standards do not provide sufficient information to make
decisions about how dependable and available the various services & components
are, and how they fail. Some of the challenges that we will be dealing with
respect to component interactions and composition of services in the
sustainable manufacturing domain are:
how to ensure trust in correctness when dealing with global partners and
services composed using their services
how to ensure reliability when composing services from these partners who are outside the control of the system
...[see SR for more information]
- Feature Weighting in Content Based Image Retrieval (24 or 12 pts)
- Sid Ray
- Project-id: Ray-feature
- The purpose of a Content-Based Image Retrieval (CBIR) system
is to retrieve images from a database such that the visual contents
of retrieved images are similar to that of a query image. Often
the image similarities are computed from the representation of visual
contents such as colour, texture and shape, in terms of a multi-dimensional
feature vector. Understandably, the features cannot be assumed to have
equal weights. More important features deserve more weights and less
important features deserve less weights. These weights can be assigned
fully automatically from feature variation statistics in the database
images. Better still is to use the so called relevance feedback (RF)
mechanism in which the weights are modified iteratively based on users'
response regarding the relevance of the retrieved images. The aim of this
project is to investigate the feature weighting problem for both cases,
namely, without RF and with RF.
-
- Small Sample Size Effects on CBIR Accuracy (24 or 12 pts)
- Sid Ray
- Project-id: Ray-sample
- In recent years Content-Based Image Retrieval (CBIR) has become one of
the most active areas of research in image data mining. In studies involving
the design of a CBIR system often the question arises whether the
data set sizes used are big enough to rely on the accuracy achieved.
The aim of this second project in CBIR is to study the impact, on accuracy,
of the number of samples in the database, the number of samples per
semantic category, the number of retrieved images returned to user for RF,
the number of features used, and their relative sizes.
-
- Hierarchical Feature Selection for Pattern Recognition (24 or 12 pts)
- Sid Ray
- Project-id: Ray-hierarchy
- In statistical pattern recognition, often patterns are represented by
a large number of numerical features. Although there is no conceptual
justification in reducing the number of features to a small number,
in practical problem solving, this becomes a necessary step due to
the wellknown phenomenon of the 'curse of dimensionality' of the
feature vector on the complexity of the pattern classifier. The aim
of the proposed project is to develop a hierarhical feature selection
paradigm that is well-suited to multi-class pattern recognition problems.
The project will comprise (i) study of some existing feature selection
criteria followed by an experimental investigation of them, (ii) analysis
of the above results leading to, hopefully, a hierarchical feature
selection paradigm, and (iii) development of an interactive software
tool for the above paradigm.
The methodology developed will be such that depending on the classification
accuracy arrived at a certain stage, the user will have the option of
increasing or decreasing the dimensionality value. The software tool will
include procedures for displaying the distribution of pattern samples in
different feature spaces obtained by different feature selection methods.
-
- Texture Analysis of Images (24 pts)
- Sid Ray
- Project-id: Ray-texture
- Texture plays an important role in both human interpretation of visual
scenes and computer analysis of images. Textural cues are of particular
relevance in two different, but related, image analysis problems, namely,
the problems of segmentation and classification of images. The
proposed project will deal with both of these problems. It will involve
(i) investigating existing texture analysis methods from the point of view
of their theoretical soundness as textural measures and (ii) investigating
their practical applicability.
- HWS
- Heinz Schmidt
- Project-id: HWS-<something>
-
H. tells me that he has some projects,
but the list is not available here.
Please identify any such project very carefully
if you have discussed it with H. and
it is one of your preferences.
- Approximating frequency distributions by mixture models (24-pts)
- Peter Tischer
- Project-id: Tischer-freqs
- Often when we are trying to estimate the probability associated with a set
of events we compute the relative frequency distribution. A question we
may ask is whether the relative frequency distribution can be adequately
described as a mixture of probability distributions where the components
in the mixture are in some sense simple probability distributions.
If we can can find a good approximating mixture model we may be able to
show that the events can be thought of as originating from different
sub-populations of the overall population.
Separating things into sub-populations is important in data mining and
machine learning.
This project aims to come up with good mixture models for a given
relative frequency distribution by using MML techniques. In the first
place, the problem will be approached by treating it as a lossless data
compression problem. This can be seen as a simpler form of MML.
Should time permit, the problem might be approached using MML in its
general form.
It would be an advantage to have a maths background
and to be taking the honours unit on 'MML'.
- Document Similarity Measurement via Data Compression (12 or 24-pts)
- Peter Tischer
- Project-id: Tischer-docs
- In document classification it is often desirable to measure how similar
two documents are. This might be because we are interested in saying
whether documents are spam or whether the source code of two programs is
so similar that at least program is an example of plagiarism.
If two documents are similar, we would expect that we would require
little information to know the contents of the second document given
that we already know the contents of the first. One way to measure
information is by using data compression techniques as the compressed
size is an upper bound on the amount of information needed to store the
original data.
The aim of the project is to adapt techniques from the area of text
compression to compress the data which shows how one document can be
created from another. Of particular interest is coming up with
similarity measures that are also distance functions.
- Scientific Visualisation using MCST Projections (24-pts)
- Peter Tischer
- Project-id: Tischer-vis
- In Scientific Visualization we are often interested in representing higher
dimensional data in some lower number of dimensions such as 2,3 or 4.
Minimal Cost Spanning Trees (MCSTs) are simple to compute and are a way of
preserving closeness information in representing higher dimensional data.
Current methods for carrying out projections for visualizing higher
dimensional data tend to be computationally expensive with results that
are sensitive to parameters within the algorithm that are difficult to
tune. These methods are also sensitive to the dimension to which the
projection is being made. MCST-based projection does have these drawbacks
and can be engineered to preserve properties of the original higher
dimensional data.
This project will involve implementing a package that uses MCSTs for
visualizing higher dimensional data in lower dimensions such as 2, 3 or 4
and comparing the performance of this package against packages
incorporating techniques like SOMs (Self Organising Maps).
- Editor for Image Segmentations (12-pts)
- Peter Tischer
- Project-id: Tischer-seg
-
A fundamental activity in image processing is to segment the pixels of an
image so that pixels which belong together are in the same segment.
Human beings are extremely good and quick at segmenting images but it is
tedious and time-coonsuming for humans to segment digital images manually
because of the huge numbers of pixels involved. Computer programs can
segment images quickly but the resulting segmentation may be one which
has obvious flaws which a human may want to correct.
An obvious idea is to combine both human input and computer processing
of digital images to produce a semi-automatic approach for image
segmentation. Given an initial segmentation which may have been produced
either by a program or by a human, the image editor allows a human to
alter that segmentation interactively. The editor should allow for a wide
range of image types such as greyscale and colour, 2D and 3D.
Note, no prior knowledge of image processing is necessary
Mixture Modeling used in Analysis of DNA Microarray Data (24-pts)
Peter Tischer
Project-id: Tischer-micro
Mixture modeling, also known as clustering, involves taking a set of
observations about things and deciding whether the things all belong to one
population or whether the things come from a number of different
sub-populations. DNA microarray data is a 2D array where the rows or columns
may represent either a subject of an experiment or a partcular gene.
The entries in the microarray represent the activity of that gene for that
subject.
The MML (Minimal Message Length) criterion was proposed by Professor Chris
Wallace foundation professor of Computer Science at Monash University.
MML has been used to create a clustering program called 'snob'. 'snob'
has proven to be very effective in identifying the correct number of
clusters but it is not used widely in bioinformatics where people continue
to use a variety of techniques that are generally not able to identify
correctly the number of clusters.
The aim of this project is to apply snob to the mixture modelling problems
that arise in the analysis of DNA microarray data. Most clustering methods,
snob included, are iterative and rely on taking an intial clustering and
improving it at each iteration. One aim is to allow people to generate
a clustering of the data and then to submit this an initial clustering to
snob.
snob itself randomly creates initial clusters. By making certain
assumptions about the nature of clusters it is possible to use simple
clustering techniques like k-means to create initial clusterings that
are likely to reveal the true cluster structure, if the data obeys the
assumptions used in deriving the clustering method. With this approach snob
would first try to look for certain kinds of clusterings before trying
intial clusterings generated at random.
- The Effect of Manipulating Data on Support Vector Machine (SVM) Performance (24-pts)
- Peter Tischer and David Albrecht
- Project-id: Tischer-SVM
- Support Vector Machines (SVM)s are used mainly in supervised classification
problems. Suppose we are given a body of data about things. One of the
pieces of data about a thing may be a label. This may be an integer saying
to which class or cluster the thing belongs. With an SVM we try to develop
a function that can be given all the attributes for a given thing, other
than its label, and tries to predict its label. For example, we might have
the results of a range of tests that were carried out on a patient and the
label may be whether the patient had cancer or not.
SVMs clasify data into two classes by finding a separating hyperplane.
In two dimensions, the hyperplane is a straight line and we hope that all
members of one class will be on one side of the line and all members of the
other class will be on the other side of the line. An attractive feature of
SVMs is that they can allow you to transform the attributes of the data in a
nonlinear way and to find a separting hyperplane on the transformed data.
This is equivalent to finding a nonlinear separating surface on the
original data. In the 2D case, using SVMs we may be able to find a nonlinear
separating surface such as an ellipse so that all members of one class are
inside the ellipse and all members of the other class are outside the
ellipse.
There are a number of packages available which enable people to compute
SVMs. By preprocessing the data it should be possible to reduce the amount
of data and to compute an SVM which performs as well on the reduced data as
on the original data. In addition, the SVM packages often make assumptions
about the way distances between things are computed. It is often assumed
that the attributes of things are not correlated. If the attributes
actually are correlated then preprocessing may produced transformed data for
which the transformed attributes are not correlated.
- Learning Under Resource Constraints (24-pts).
- Geoff Webb
- Project-id: Webb
- Two projects available.
These projects, as part of a large research team investigating the wider
problem, will each tackle one aspect of the question of how to optimise
machine learning in a context where the computational resources available
for learning are unpredictable and varying. This is typical of many online
applications where the user load varies greatly from time to time. Such
applications include information retrieval, recommender systems, user
modeling, junk email filters and online fraud detection.
- Reading Assistance Program (24-pts)
- Linda McIver and Stephen Welsh
- Project-id: McWelsh
- Special Requirements: Degree in psychology.
Develop (or simply quote) a taxonomy of reading difficulties with
a view to producing a catalogue of current assistive technologys and
methodologys to aid the reading impaired overcome their difficulty.
For each difficulty, design several possible ways software&eye
tracking could assist. These could be adaptations of the techs. and meths.
from above and/or proposals for novel techniques.
Design and develop
an application to implement _all_ of those possibilities. The software
is required to permit selection between options, and refining options,
to support different types of experiments. For example, it should
be possible to run the application with restricted sets of options for
different subjects in the experiments. The detection of the subject actually
encountering difficulty in reading the presented text will form a
major part of the project. The application should interface with
an eye tracking system if/when it becomes available.
.
|
|
11/4/2006: Google
has bought the rights to an
[algorithm]
from the UNSW School of
Computer Science
-- ABC TV, The Age, etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
| Try the
[Bibs] for more on:
Bayesian,
capability,
causal,
classification,
compression,
colouring,
computer,
constraint,
decision,
discovery,
functional,
game,
graph,
graphics,
image,
IPD,
knowledge,
language,
layout,
MML,
microarray,
mixture,
model,
net,
network,
Nqueens,
palindrome,
processing,
programming,
scheduling,
segmentation,
SVM,
tree,
etc..
|
|
|