|
|
- This list contains Honours projects for
BCS (20-point),
BDS (20-point) and
BSE (12-point).
A project may be 12-point only, or 20-point only,
or available in both 12- or 20-point versions.
- BSE
cannot take a 20-point project unless there is a 12-point version of it.
- BCS and BDE
cannot take a 12-point project unless there is a 20-point version of it.
- You must discuss any project that you are interested in
with the supervisor(s).
If you do not you might not be considered for the project.
School of Computer Science and Software Engineering, Monash University.
Active Supervisors this year:
- Investigation of Support Vector Machines (12- or 20-pts)
- Project-id: AT-svm
- David Albrecht &
Peter Tischer (also see PT)
- Support Vector Machines (SVM) have been used in various machine learning
problems, including supervised classification, regression analysis and
novelty detection. They process interesting properties and have been
demonstrated to give excellent performance in a number of practical
applications.
There are many different aspects of Support Vector Machines which could be
investigated in this project. These include:
* sensitivity to anomalous data,
* effect of overlapping of
classes on the performance of SVM algorithms,
* methods of improving SVM algorithms,
* application to image processing, and
* application to novelty detection.
Students
are not expected to have any special knowledge, but a background
in mathematics will be an advantage. Also, students will not necessarily be
expected to implement SVM algorithms as there are several public domain
software packages available for computing SVMs.
- MML + Haskell + templates or generics (20 or 12-pts)
- Project id: LA-mml1
- Lloyd Allison
- Haskell (www.haskell.org) is a lazy functional programming language.
I have been using it for MML-based machine learning, see
http://dx.doi.org/10.1017/S0956796804005301
e.g. classification (clustering), decision trees,
time-series, mixtures, Bayesian networks
(including discrete and/or continuous variables),
etc..
The project is to investigate the use of
'template Haskell' or 'generic Haskell'
and similar extensions to Haskell
to make it easier to process new multivariate data-sets,
e.g. data held in spread-sheets or 'csv' files.
An application is to Search and Rescue data on Missing People
-- who, where, how found, when, in what condition, etc..
-
- MML + Haskell + cluster computing (20-pts)
- Project id: LA-mml2
- Lloyd Allison
- Haskell (www.haskell.org) is a lazy functional programming language.
I have been using it for MML-based machine learning, see
http://dx.doi.org/10.1017/S0956796804005301
The project is to investigate cluster computing and/or parallel Haskell
to speed up MML+Haskell machine learning on very large data sets.
-
- Compression of DNA (20 or 12-pts)
- Project id: LA-dna
- Lloyd Allison
- Most familiar file-compression algorithms fail to beat
the trivial 2-bits/symbol {A->00, C->01, G->10, T->11}
compression level on such data.
Compression=modelling+coding, and arithmetic coding has solved the
coding part of the compression problem, so we need better models; see
www.csse.monash.edu.au/~lloyd/tildeStrings/#Compression
The project is to develop improved models for use in
compression of biological data.
-
- Parallel Functional Language (20-pts)
- Project id: LA-pfp
- Lloyd Allison
- The project is to implement a parser and interpreter for a
small, typed, CCS-based language. The system will be written in Java.
The final system should not be much larger than the
(untyped, non-parallel) λ:
www.csse.monash.edu.au/~lloyd/tildeFP/Lambda/
-
- Large, Low-Cost Digital Display (20 or 12-pts)
- Project-id: LA-display
- Lloyd Allison
- If you are inventive, like robotics, and can make things:
The project is to design and build a very large but cheap digital display.
The target size is about 2m×2m with a resolution of
about 1cm/pixel (or higher), 40,000+ pixels, say.
The cost must be low, a small number of cents (1c is good) per pixel.
Display speed is "not the highest priority",
e.g., a rewrite time of 1-hour could be acceptable.
Small prototype(s) could be built before scaling-up.
(You must discuss this project with LA before nominating it.)
-
- Algorithms to find approximate palindromes (20 or 12-pts)
- Project id: LA-pal
- Lloyd Allison
- Given a string, s, of symbols, a palindrome is a
substring p=s[i..j] that reads the same forwards and backwards,
i.e. either p=t;rev(t) or p=t;c;rev(t) where c is a symbol and rev(t)
is the reverse of t.
But in an approximate palindrome, p=t;u or p=t;c;u where
u can be approximately, not necessarily identically, equal to rev(t).
The project is to develop, test, and apply
fast algorithms to find approximate palindromes in long strings.
See
http://arxiv.org/pdf/cs.DS/0412004
-
- Haskell + gnumeric + plugins (20 or 12-pts)
- Project id: LA-gnu
- Lloyd Allison
- Gnumeric (www.gnumeric.org)
is an open-source spread-sheet.
It allows a programmer to add new 'plug-ins';
these are usually written in C or C++.
Recently Pang et al showed that "Haskell can be
comfortably used as a statically typed extension language,
and that it can support type-safe dynamic loading of plugins using dynamic
types...."; see
www.cse.unsw.edu.au/~dons/hs-plugins/paper/
The project is to investigate writing Haskell plugins
(particularly plugins for MML-based machine learning)
for gnumeric, possibly using the work of Pang et al.
- A platform for DNA exploration and visualisation (12- or 20-pts)
- Project-id: DP-dna
- Trevor Dix and David Powell
- The
DNA of an organism contains the information about how that organism
is to develop and function. Tools exist to find patterns and
"interesting" regions of DNA based on information-theoretic and other
techniques. Biologists also look at how much information one
DNA sequence tells us about another DNA sequence, this may be within
one organism, or between different organisms.
This project is to develop a GUI tool to explore the patterns and
information content of one or more DNA sequences. This will allow a
user to explore and visualise the output of existing pattern discovery
tools, until a cumbersome and error-prone task.
The student completing this project will need to be proficient in Java.
Some background or interest in biology would be helpful, but is not essential.
[more]
-
- Image Processing for Protein Expression (20-pts)
- Project-id: DSB-image
- Trevor Dix, Torsten Seemann, and John Boyce
- Background:
Monash is part of the Aust. Research Council centre for structural and
functional microbial genomics that is interested in identifying vaccine
targets from the bacterial pathogens Pasteurella multocida and
Dichelobacter nodosus. The Centre has used bioinformatics
analyses to predict all secreted and surface proteins from these
organisms and are currently undertaking high-throughput cloning and
protein expression of these predicted antigens. The scale of the protein
cloning expression experiments (hundreds of proteins) means that data
management and expression analysis become very time consuming.
This
project is to develop an image analysis program which would allow rapid
identification of 1) quantity of protein expression and 2) molecular
mass of expressed protein from our protein gel images
(see image 1).
The package would integrate seamlessly with an in-house genomic information
database to carry out automated checks of the integrity of the expressed
products. The project could be extended to allow analysis of similar
images of DNA amplification (sample image 2).
Images are available at
www.csse.monash.edu.au/~trevor/honours/2005/John.html.
[more]
-
- Analysing whole genome shotgun assembly (20-pts)
- Project-id: DP-shotgun
- Trevor Dix and Darren Platt
- Background:
The genomic sequence of an organism is sequenced (read) using many small
individual reads of unknown location. These are typically around 600
letters (bases) long and must be correctly aligned, ordered and
oriented to recover the original underlying DNA sequence. This process
known as whole genome shot gun sequencing was traditionally used for
small genomes of the few million bases but is now widely used to
assemble giga base sized genomes such as that of human. While the
assembly programs are gradually improving in quality, they routinely
encounter problems with repetitive DNA (where one section of a genome
looks very similar to another) resulting in mis-assembly.
The goal of this project would be to build on some primitive existing
tools for visualising assemblies to devise automated methods for
detecting potential mis-assemblies and better tools for visual
inspection.
Real and simulated assemblies are available for testing. Techniques
that could be employed include visualisation of the assembly, basic
statistical analysis of the pairing distances, depth of coverage. A
slightly more advanced analysis could include comparison with known
genes that align to the genome. A technique that enabled a
"diff" of two different assemblies would be extremely valuable
for analysing how changes to assembly algorithms impact assemblies.
See www.csse.monash.edu.au/~trevor/honours/2005/Darren.html
for images and more explanation.
This is a difficult project and part of an international collaboration
in the USA with a Monash expatriate. Agreement must be reached with all
parties. The existing system will be made available and some supervision
will be done remotely.
[more]
- Background
- David Dowe
is interested in Minimum Message Length (MML) inductive inference.
MML is particularly useful in machine learning, statistics, econometrics,
"knowledge discovery", "data mining" and philosophy of science. Both
theoretical and applied projects are available. I list a sample below.
You should feel free to discuss any and all of these with David Dowe.
Some of my areas of interest include clustering, angular data, correlated data,
multi-dimensional data, supervised learning, decision trees/graphs, hybrid
models, market forecasting, biological data, ..., etc.
[more].
All of these would
be done by MML. There is no need to have done any previous subject on AI.
-
- Phylogenetic modelling of indigenous, endangered and other languages (20 pts)
- Project-id: DD-lang
- David Dowe
- This project concerns phylogenetic modelling of languages. In other words,
we wish to model how languages have descended from and evolved from one
another. There will be an emphasis on endangered indigenous languages,
largely as a way to help preserve them. The modelling should be carried
out using Minimum Message Length (MML), largely because of its theoretical
optimality properties and its wide-ranging achievements in all range of
inference problems.
The project could be taken in the direction of using (probablistic) finite
state automata/machines (PFSAs) to model grammars or syntax.
More info' on the project, including a list of references, is at
www.csse.monash.edu.au/~dld/Hons/2005/dld2005_projects
-
- Time-series autoregression using MML with econometric applications (20 pts)
- Project-id: DD-econ
- David Dowe
- We adapt the minimum message length (MML) principle to the problem of
efficient partitioning of economic units, such as firms or countries, into
groups (or regions) whose behavioural patterns are similar within each group
but distinct across groups. The approach to the partitioning will be MML
multi-way join decision graphs (Tan and Dowe, 2003). We will develop an
autoregression model in the leaves of the decision graph, possibly augmenting
it with a variety of lags (or time delays). We hope to consider at least two
specific applications. There could be many variations on this project.
More info' on the project, including a list of references, is at
www.csse.monash.edu.au/~dld/Hons/2005/dld2005_projects
- On Go.
- Project-id: GF-go
- Graham Farr
- The aim of this project is to estimate, as accurately as possible,
the number of legal positions in the game of Go when played on
large boards. In this game (known as Go or Igo in Japan, Weiqi
in China and Baduk in Korea), two players (Black and White) take turns
to place stones of their respective colours on the vertices of a large
square grid (usually 19×19) so as to enclose (in a particular
sense) as much territory as possible, and to capture the opponent's
stones by surrounding them. In the project, both the size of the
board and the number of players will be allowed to be arbitrary, and some
alternative (non-square-grid) board topologies will be considered.
The methods to be used include transfer matrices and Markov Chain
Monte Carlo, which you will learn about.
Programming will probably be in C, and user interaction will just
be simple and text-based. You should have a good mathematical
background and good results in cse2304, cse3305, cse2303.
- I will be at Clayton for one day a week, and can meet
students then. I am now mostly based at Caulfield, and can meet
there as well, if the need arises.
- Ref: G. E. Farr,
The Go polynomials of a graph,
Theoretical Computer Science, 306 (2003), pp.1-18.
- Scalable Video Compression (20pts)
- Project-id: TF-video
- Tim Ferguson
- The
traditional method for compressing video is to use the common
MC/DPCM/DCT hybrid block based coding technique. This hybrid technique
is popular due to its simplicity, efficiency and general performance.
However a relatively new and more scalable approach, showing significant
promise in image compression (JPEG2000), is the wavelet transform.
Research is currently being conducted as to how best apply the wavelet
transform to video compression. The primary problem relates to motion
compensation and how this can be combined with the wavelet transform.
The MPEG standards organisation is currently investigating this area
of research (www.chiariglione.org/mpeg/working_documents.htm).
-
- Camera Calibration for Making Real-world Measurements (12 or 20pts)
- Project-id: TF-camera
- Tim Ferguson
- The
basic problem here is we wish to take a two-dimensional image and
obtain three-dimensional information so that real-world measurements and
positions can be found. Obviously some limitations must be applied to
this process. Rather than locating points at any position in the
three-dimensional space, we can only consider points on a reference
planar surface. In this case, a road surface is considered. To solve
this problem, we need to be able to determine the internal parameters
of the camera (focal length, principal point of focus and radial lens
distortion) and the external parameters of the camera (rotation and
translation). Initial research uses the well known Tsai's coplanar
camera calibration method. The aim of the project is to improve
the current calibration methodology and possibly track changes to
the calibration using visual queues.
-
- Detection and Measurement of Pavement Cracking (20pts)
- Project-id: TF-pavement
- Tim Ferguson
- Pavement
cracking leads to the rapid degradation of road surfaces. Early
restoration of cracking can lead to the prevention of more serious
problems (like potholes). The present method for measuring pavement
cracking is visual inspection. This is time consuming, costly and can
be dangerous. This project investigates methods for automated, or
semi-automated detection of pavement cracking from video taken of the
pavement surface. Due to practical implementation reasons, the video
has resolution (currently a relatively low 0.5 - 1 megapixel) and lighting
limitations placed upon it. This presents a significant challenge
for the detection system. Early work on this problem was commenced
by a previous honours student. It is expected that further development
and testing of his method, or the proposal of an improved method, can
lead to improvement in these results.
-
- Content Based Identification (CBID) (20pts)
- Project-id: TF-cbid
- Tim Ferguson
- Content based
identification (CBID) is also known as digital media
fingerprinting. The idea is that multimedia content is assigned a
unique fingerprint that can be used for future identification.
Given a database of known fingerprints and an unknown piece of multimedia,
it should be possible to identify the multimedia by generating a fingerprint
and using that fingerprint as an index into our database. Ideally the
CBID implementation should be robust enough to still identify
content even after it has gone through noise additive processes
(compression, transmission, etc). This project will investigate CBID for
video. The primary goal will be to create unique fingerprints for video
sequences which will be robust under noisy conditions. The fingerprint
will need to be a continuous one that may be generated not only by analysing
the video track, but may consider the audio track (if present) as well.
- Smart Circuit Reconfiguration of Wireless Sensor Networks (20-pts).
- Project-id: CG-recon
- Charles Greif
- The use of Ad Hoc wireless networks to allow smart sensor circuits
to communicate has recently become an active area of research both
from the Data Communications Theoretic point of view as well as from
the practical circuit design point of view. This project extends the
work of Steven Koelmeyer BDS(Hons 2004). The main aim of this project
is to apply the Mentor Graphics "Seamless Hardware Software Co-Verification
tools to the task of designing the Hardware and Software for the next
generation of smart sensor notes. The nodes already carry a small
processor and will be extended with the addition of a FPGA or CPLD
to enhance their functionality.
- Reference:
I. Stojmenovic, J. Wu, "Ad Hoc Networks", IEEE Computer, 37, 23, 2004
- C. Zuver, D. Lim, J. Lockwood, C. Neely, "Automated tools to implement
and test internet systems in reconfigurable hardware", ACM SIGCOMM
Computer Communications Review, 2004
-
- Software development to support the programming of smart sensor
networks running the Berkeley TinyOS Operating System (12 or 20-pts)
- Project-id: CG-sensor
- Charles Greif
- Networks of small low-power sensor nodes are currently of significant
research interest to both scientist and engineers for the purposes
of remotely sensing environments and controlling machinery using that
sensory data. This aim of this project is to use specialised
compiler technology and a simulation environment to develop software
for small sensor networks, similar to those used in building-climate
control systems. The operating system used on the sensor nodes is
Berkeley's TinyOS and the compiler that is to be used is the NESC compiler.
- Reference:
R Katz, J. Kahn, K. Pister, "Emerging Challenges: Mobile Networking
for Smart Dust" Berkeley EECS Smart Dust website, 2000
- P. Buonadonna, R. Szewcxyk, D. Culler, J. Hill and A. Woo, "A network-
centric Approach to embedded software for tiny devices" Emsoft website
page 16, 2001
- Suburban Ad Hoc Networks (20- or 12-pts)
- Project-id: KP-net
- Carlo Kopp and Ron Pose
- The Suburban Ad Hoc Networking group focusses its research activities on techniques for implementing Suburban Ad Hoc Networks. These are self organising, quasi-static ad hoc (typically wireless) networks which provide an alternative technology for providing high speed digital connectivity to households, small businesses and distributed campuses. Specific areas of research interest include security, low level routing protocols, access controls and propagation behaviour. Given the broad scope of the research performed in this area, there is considerable choice in project topics. Students should consult Dr Ronald Pose or Dr Carlo Kopp for suitable project topics.
See:
www.csse.monash.edu.au/research/san/
-
- Supervised Learning of Email Features (20 points)
- Project-id: MA-mail
- Yuval Marom (res fellow) and David Albrecht (also see DA)
- Automatic
processing of emails is required in numerous practical
applications, such as summarizing a thread of emails (eg to extract
the gist of a discussion), and grouping similar emails/threads (eg to
provide a collection of topic-related threads in a newsgroup). The
challenge that such applications face is finding a suitable data
representation for coding and processing the emails. Success hinges on
the choice of features used to transform the raw text in an email to a
form suitable for processing by computer algorithms. Research has
shown that high-level features, such as sentence types, are often very
useful. However, they tend to be very domain-specific, and thus not
easily ported between domains. One way to overcome this limitation is
to employ a corpus-based approach, where high-level features are
learned from low-level ones.
This project
will investigate the supervised learning of sentence
types from low-level features, in the domain of help-desk
emails. Examples of sentence types are GREETING (eg "Hi again Mr
Blob") and REQUEST ("Please send me the most recent driver for my
printer"). Examples of low-level features are the words in a sentence,
and the number of verbs used in the sentence. The main outcome of
this project will be a classifier (such as an SVM) that automatically
classifies sentences using low-level features. Optional extensions of
the project, upon successful implementation of a classifier, are email
clustering and summarization, or other applications of the student's
choice.
- Prerequisites:
A strong mathematical and statistical background is
required for this project. Knowledge of markup languages, such as
xml, is useful but not essential.
- Organic Modelling with Generalised Cylinders (12 or 20pts)
- Project-id: JM-1
- Jon McCormack
- Generalised
cylinders are a geometric modelling method, originally
developed for use in computer vision. For this project we wish to apply
them to the modelling of organic structures for computer graphics. The
basic principle for creating a generalised cylinder is to define a
series of cross-sectional profiles, possibly of varying shape and
size, and distribute them over some continuous curve, known as the
carrier curve. The cross-sections are connected to form a continuous
surface. Sounds easy, but there are a number of important issues that
need to be addressed to ensure that the geometry defined by the
cylinder is legal (i.e. can be rendered). Constructing compound
surfaces is very useful for modelling organic structures such as
branches, leaves, tentacles, veins, shells, etc., as this image
(below) illustrates. The image is procedurally generated using
generalised cylinders.
- The challenge for this project will be to create a software system to
assist in the automated construction of such models using generalised
cylinders. The system will also have to deal with managing geometric
complexity and geometric output in a variety of formats (e.g.
real-time, offline rendering). The software for the project should be
written in C++ and OpenGL. You should have successfully completed
CSE3313 Computer Graphics (or equivalent) in order to work on this
project.
[more]
- Preliminary Reading:
McCormack, J. 2004, Generative Modelling with Timed L-Systems, in
Gero, J.S. (ed) Design Computing and Cognition '04, Kluwer Academic
Publishers, Dordrecht. pp. 157-175. (Hargrave Library reference: H
620.00420285 I61.4D 2004)
- Prusinkiewicz, P., L. Mündermann, R. Karwowski & B. Lane 2001, The Use
of Positional Information in the Modeling of Plants. Proceedings of
SIGGRAPH 2001 (Los Angeles, California, August 12-17). In Computer
Graphics (Proceedings) Annual Conference Series, ACM SIGGRAPH, pp.
289-300.
- Mech, R., P. Prusinkiewicz & J. Hanan 1997, 'Extensions to the
Graphical Interpretation of L-Systems Based on Turtle Geometry',
Technical Report, No. 1997-599-01, April 1, 1997. University of
Calgary, Calgary, Alberta Canada.
-
- Interactive Adaptive Learning Systems (12 or 20pts)
- Project-id: JM-2
- Jon McCormack
- This
project will investigate the use of adaptive learning systems for
a real-time interactive application. Collections of virtual creatures
are required to develop a symbiotic relationship with a human audience
– responding to movement and gesture. Students should be prepared to
investigate a number of adaptive learning techniques including
classifier systems and evolutionary ANN (Artificial Neural Network)
approaches, with the goal of creating a novel system that evolves and
adapts to its real and virtual environments. The key emphasis for this
system must be flexibility and real-time behaviour as the dynamic
environment is actually changing in real-time.
[more]
- References:
General Introductions to Adaptive and Evolutionary Systems (all
available from the Hargrave library):
- Flake, G.W. (1998), The Computational Beauty of Nature : Computer
Explorations of Fractals, Chaos, Complex Systems, and Adaptation, MIT
Press, Cambridge, Mass.
- Pfeifer, R. and C. Scheier (1999), Understanding Intelligence, MIT
Press, Cambridge, Mass.
- Holland, J.H. (1995), Hidden Order : How Adaptation Builds Complexity,
Helix Books, Addison-Wesley, Reading, Mass.
- More specific papers:
McCormack, J. (2002), Evolving for the Audience, International Journal
of Design Computing 4 (Special Issue on Designing Virtual Worlds). pdf
version.
- McCormack, J. (2003), Evolving Sonic Ecosystems, Kybernetes 32(1/2).
pdf version.
-
- The Illusion of Beauty (20pts)
- Project-id: JM-3
- Jon McCormack
- What
is beauty and what makes a thing beautiful? Is beauty a property
of things and can it be measured? This was famously answered by the
mathematician Birkhoff who in 1933 proposed an 'aesthetic measure',
equal to the ratio of order to complexity. Birkhoff was able to
measure the aesthetics of simple shapes and vases. However, the
measure was not successful in general. Many principles of what makes a
good picture are well known, i.e. in composition we talk of balance,
entrance and exit, symmetry, etc. as properties of what is considered
a good composition. For this project the challenge is to try and
encode or infer rules about visual composition to see if we can devise
a better aesthetic measure of an image. This could have all sorts of
applications, for example, automated selection of camera positions in
3D virtual environments, use as "fitness functions" to be used in
evolutionary programs and so on — read this paper as a good starting
point. Would we think a computer that can make beautiful images as
being creative?
[more]
- Preliminary Reading:
Saunders, R and Gero, JS: 2001, Artificial Creativity: A Synthetic
Approach to the Study of Creative Behaviour, in Gero, JS (ed),
Proceedings of the Fifth Conference on Computational and Cognitive
Models of Creative Design, Key Centre of Design Computing and
Cognition, Sydney.
- Humphrey, NK: 1973, The Illusion of Beauty, Perception 2: 429-439.
- Solomonoff, RJ: 1995, The discovery of algorithmic probability: A guide
for the programming of true creativity., in Vatanyi, P (ed),
Computational Learning Theory: EuroCOLT '95, Springer-Verlag, Berlin,
pp. 1-22.
- Boden, MA: 1994, What is Creativity?, in Boden, MA (ed), Dimensions of
Creativity, MIT Press, Cambridge, MA, pp. 75-117.
- Dartnall, T (ed.) 2002, Creativity, Cognition, and Knowledge: An
Interaction, Praeger, Westport, Connecticut; London.
- Partridge, D and Rowe, J: 1994, Computers and creativity, Intellect,
Oxford, England.
- Hybridizing Ant Colony Optimization and Constraint Propagation (24-pts)
- Project-id: BM-ants-1
- Bernd Meyer and Andreas Ernst
- Solving
combinatorial optimization problems, (COP) such as fleet routing,
rostering and timetabling, is a core task of practically all industrial
operations. As these problems are computationally hard, specialized
techniques and algorithms are required to solve them. This makes COP a very
interesting class of problems, both from the theoretical and from the
applied perspective.
The common characteristics of COP is that a vast space of potential
solutions has to be searched for a solution that satifies all problem
constraints and is in some sense optimal. Problem constraints restrict the
range of feasible solutions: in rostering, for example, a constraint may be
that no employee can work two consecutive shifts. Optimality must in some
sense be measurable, so that solutions can be compared. In rostering a
component of optimality often is the total salary cost.
There is large variety of algorithms and techniques to tackle COPs. Two very
different approaches are meta-heuristics and constraint programming. With a
bit of over-simplification, it could be said that meta-heuristics are a good
approach when finding feasible solutions is comparatively easy but
achieving optimality is difficult, whereas constraint programming is at an
advantage where it is difficult to find a feasible solution, but optimising
among the feasible solutions is comparatively easy.
The project will investigate hybrid techniques that combine the
complementary strengths of these methods. Particularly promising
meta-heuristics for such a combination are Ant Colony Optimization (ACO) and
the Bayesian Optimization Algorithm (BOA). ACO is a recent stochastic
meta-heuristic inspired by the foraging behaviour of real ants. For an
introduction to ACO see "A Review on the Ant Colony Optimization
Metaheuristic: Basis, Models and New Trends" by Oscar Cordo, Francisco
Herrera and Thomas Stützle in Mathware and Soft Computing, Vol. 9, No. 2-3,
2002, pp. 141-175. For an overview of BOA see "Linkage Problems,
Distribution Estimation and Bayesian Networks" by M. Pelikan, D. Goldberg
and E. Cantu-Paz (available at:
www.cs.umsl.edu/~pelikan/publications.html).
The project will be conducted in association with the Commonwealth
Scientific and Industrial Research Organisation CSIRO.
-
- Optimality of Ant Foraging (24-pts)
- Project-id: BM-ants-2
- Bernd Meyer and TBA
- The
emergence of co-operative behaviour, such as collective foraging, is one
of the most fascinating aspects of social biology. How can very simple
individuals, such as ants, collectively conduct highly organised activities,
such as nest building or foraging, without any need for a central
co-ordinator?
Probably the best studied forms of co-operative behaviour are foraging and
combat in social insects, such as ant, wasps, termites and bees. Foraging,
in particular, raises interesting questions. A wide-spread assumption in
modern biology is the so-called Optimal Foraging Theory (OFT), which
basically asserts that the foraging behaviour of a highly evolved species
must in some sense be optimal or at least close to optimum. OFT really is a
consequence of standard evolutionary theory, where foraging efficiency is
taken as a currency (as a proxy) for general fitness. It is, however, worth
questioning whether such a simple form of OFT is really justified.
While it is difficult or even impossible to rigorously test OFT in the field
or in the lab, mathematical and computational modelling gives us a handle to
do this. Mathematical modelling provides the opportunity to vary parameters
that we cannot (easily) isolate or control in the lab and to derive
predictions how the foraging behaviour will change with such parameter
changes. Thus we can design representative simple, schematic scenarios and
abstractly test OFT for these. A closer inspection of existing mathematical
models of ant foraging immediately invites some types of computational
experiments that allow us to analyze the optimality of ant foraging in
certain types of environments.
The purpose of the project is to use mathematical and computational
modelling to analyse whether ant foraging is optimally adapted to the
environment. Results of a successful project will in turn inform a
biological investigation and give guidance for the design of field or
laboratory experiments with real ants to validate the results of the
modelling.
These questions can be tackled with two complementary modelling techniques:
individual-based simulation models (IBMs, sometimes also called agent-based
micro simulation), and differential / difference equation models.
Differential equation models, the classical modelling technique for
collaborative foraging, unfortunately often fall short where complex
environments have to be taken into account, for example dynamically changing
food resources. The project is therefore more likely to rely upon IBM
simulation. Students wishing to undertake this project should feel confident
with programming these. Certain important aspects of the problem can,
however, well be studied in abstract mathematical models without simulation
and a student with the appropriate mathematical background may choose to
specialise the project in this direction.
Interest in biology and some enthusiasm for interdisciplinary work is a
must, but no particular biological knowledge is required as a prerequisite.
Depending on the specialisation, this project may be conducted with an
external supervisor and may give interested students the opportunity to
collaborate on experiments with real ants in a biological laboratory if
desired.
For an introduction to collaborative foraging and its modelling see
"Self-organization in biological systems" by S. Camazine, J.-L. Deneubourg,
N.R. Franks, J. Sneyd, G. Theraulaz and E. Bonnabeau. Princeton University
Press, 2001.
- Bayesian Networks for Modeling Cardiovascular Health (Background)
-
The ARC has funded a 3-year project to
apply Bayesian networks (BNs) to epidemiological data, specifically to
predict and prevent coronary heart disease (CHD). Bayesian networks
excel when researchers need to combine causal and diagnostic reasoning
in areas characterised by uncertainty. But they have one flaw which
hinders their use: they do not yet easily mix continuous and discrete
variables. We intend to extend them to handle such mixes, then
demonstrate how much they can improve on current methods for
predicting, among other things, coronary heart disease. One aim,
obviously, is to prevent deaths. Another is to reduce health-care
costs by showing where we can shift resources from a small and
expensive high-risk group to a large and inexpensive moderate-risk
group.
The ARC project commenced in 2004 with the appointment of Dr. Charles
Twardy as Research Fellow. We have constructed two BNs from
epidemiological models of coronary heart disease (CHD), specifically
(1) a regression model from the Australian Busselton study [Knuiman et
al., 1998] and a simplified ``points-based'' model from the German
PROCAM study [Assmann et al., 2002]. We then adapted both these BNs
for the Busselton data set and compared them in cross-validation,
along with machine-learned models. These BNs will serve as baseline
predictors for the ARC project.
-
- Project 1: Knowledge Engineering BNs for Epidemiology (12pt or 20pt)
- Project-id: NK-epi-1
- Ann Nicholson and Kevin Korb
- This project involves constructing BNs that combine knowledge from
experts (from the Department of Epidemiology, Monash Medical Ctr)
and the results of the baseline predictor BNs constructed thus
far. A number of stages in the knowledge engineering process (see
KorbNicholson2004] will be undertaken including:
- investigation of network structures using the software tool Matilda
- interleaving expert and automated parameters estimation, using a
previously developed methodology [WoodberryEtAl2004]
- applying prior causal knowledge (from experts or the literature)
to automated learners, including CaMML
(See
www.datamining.monash.edu.au/software/camml/
for an overview of CaMML).
- sensitivity analysis (including sensitivity to findings and sensitivity
to parameters), using the Netica software and software
developed at Monash [WoodberryEtAl2004]
- comparitive evaluation of the BNs using the Weka environment.
-
- Project 2: Learning hybrid BNs for Epidemiology (12pt or 20pt)
- Project-id: NK-epi-2
- Ann Nicholson and Kevin Korb
- This project involves extending existing BN methods to learn hybrid
networks (those with mixed continuous and discrete variables). These
methods will be implemented in CaMML [CaMML], a successful BN
learner that uses Minimum Message Length (MML) metrics tuned for
causal structure. (See
www.datamining.monash.edu.au/software/camml/
for an overview of CaMML). The methods developed will be used to build
hybrid BNs from the CHD epidemiological data. A comparive evaluation
of these hybrid BNs against the baseline predictor BNs will be
undertaken using the Weka environment.
-
- Requirements:
Student should have completed an undergraduate AI subject,
and have a reasonable maths background.
- References:
- [Assman et al. 2002]. G. Assmann, P. Cullen and H. Schulte., Simple
scoring scheme for calculating the risk of acute coronary events based
on the 10-year follow-up of the Prospective Cardiovascular Munster
(PROCAM) Study. Circulation. 105(3):310-315, 2002.
- References:
[Knuiman et al., 1998].M.W. Knuiman, H.T. Vu and H. C. Bartholomew.
Multivariate risk estimation for coronary heart disease: the Busselton
Health Study. Australian & New Zealand Journal of Public
Health. 22:747-753,
1998.
[Korb & Nicholson, 2004] K. B. Korb and A. E. Nicholson. Bayesian
Artificial Intelligence. Chapman & Hall / CRC,
2004.
[Woodberry et al., 2004] O. Woodberry, A. Nicholson, K. Korb, C. Pollino.
Parameterising Bayesian Networks: A Case Study in Ecological Risk Assessment,
Technical Report 2004/159, School of Computer Science and Software Engineering,
Monash University, 2004 (based on 2003 Honours project, see
www.csse.monash.edu.au/hons/projects/2003/Owen.Woodberry/.
- Computer models of autistic learning (allocated)
- Andrew P. Paplinski and Lennart Gustafsson
- I will be offering only one Honours project which has been already
pre-assigned to Chris Mills.
- Content-Based Image Retrieval (20 pt or 12 pt, 2 projects)
- Background: In
recent years Content-Based Image Retrieval (CBIR) has become one of
the most active areas of research in image processing. The purpose of
a CBIR system is to retrieve images from a database whose visual contents
are similar to that of a query image. Two key research aspects (see below) in
CBIR are 1. Representation of visual features such as colour, texture
and shape, followed by indexing of image in terms of multi-dimensional
features and 2. Use of relevance feedback in the retrieval
process.
- Project-id: SR-cbir1
- Sid Ray
- This project will focus on the second aspect mentioned above, namely, the
use of relevance feedback in the retrieval process. In this method,
the retrieved images are chosen in such a way that they reflect the user's
choice that is based on visual inspection of images. The project will involve
development of methodology for combining image features, such as colour,
texture and shape, by incorporating the relevance feedback information.
-
- Project-id: SR-cbir2
- Sid Ray
- This project will focus on the first aspect mentioned above, namely,
representation of image features and image indexing. The project will involve
development of an image indexing algorithm based on (a) the study of
usefulness of various computational features in describing the visual
contents of an image and (b) the study of combination of features leading
to successful retrieval results.
- For details on the projects please see the supervisor. In addition,
reference may be made to the three honours projects in the area of CBIR
that were carried out by Stewart Hore, Scott Clements and Hai Le during
the years 2000, 2003 and 2004, respectively.
-
- Texture Analysis of Images (20 pt)
- Project-id: SR-texture
- Sid Ray
- Texture plays an important role in both human interpretation of visual
scenes and computer analysis of images. Textural cues are of particular
relevance in two different, but related, image analysis problems, namely,
the problems of (i) segmentation and (ii) classification of images. The
proposed project will deal with both of these problems. It will involve
two phases. In the first phase some existing texture analysis methods
will be investigated from the point of view of their theoretical soundness
as textural measures as well as their practical applicability. In the
second phase attempts will be made to derive a new methodology with
particular attention to its computational efficiency.
-
- Selection of Features for Pattern Recognition (20 pt or 12 pt)
- Project-id: SR-features
- Sid Ray
- Experience
shows that in recognition of patterns by humans, the recognition
is made based on only a few important features (or attributes) characterizing
the pattern classes. Conversely, in statistical pattern recognition,
the patterns are often represented by a large number of numerical features.
Although there is no conceptual justification in reducing the number
of features to a small number, in practical problem solving, this
becomes a necessary step due to the wellknown phenomenon of the
'curse of dimensionality' of the feature vector on the complexity
of the pattern classifier. The aim of the proposed project is to
develop an interactive feature selection paradigm well-suited to
multiclass (rather than 2-class) pattern recognition problems.
- The project will comprise four components, namely, (i) study of some
existing feature selection criteria, (ii) development of software
tools for the existing criteria, followed by an experimental
investigation of them, (iii) analysis of the above results leading to,
hopefully, a feature selection paradigm, and (iv) development of
an interactive software tool for the above paradigm. The software developed
will be 'interactive' in the sense that depending on the classification
accuracy arrived at a certain stage, the user will have the option of
changing the dimensionality value. Options will also be available to
supply interactively a range of values of the dimensionality. The
software tools will include procedures for displaying the
distribution of pattern samples in different feature spaces
obtained by different feature selection
methods.
-
- Document Image Analysis (20 pt or 12 pt)
- Project-id: SR-docs
- Sid Ray
- The objective of document image analysis is to recognize text, graphics,
and pictures in printed documents and to extract the intended information
from them. There are two broad categories of document image analyses,
namely, textual processing and graphical processing. Textual processing
includes skew determination (any tilt at which the document may have been
scanned), finding columns, paragraphs, text lines and words, and
performing optical character recognition. Graphical processing deals with
lines and symbols.
- The scope of the proposed project will be the aspect of text processing
and it will concentrate on the development of a system for page layout
analysis.
-
- Image Thresholding by Probabilistic Distance Criteria (20 pt)
- Project-id: SR-threshold
- Sid Ray
- Thresholding
is one of the most popular approaches to image segmentation.
In this approach, all the pixels having a certain range of some image
property, say, intensity, are considered to belong to one group. Connected
regions in these groups lead to the desired segmentation. In the past,
the probabilistic entropy measure of Shannon, defined in the context of
Information Theory, has been successfully applied in determining
the gray level threshold between the "object" and the
"background" regions of an image, assuming separate probability
distributions for the object pixels and the background pixels. Image
segmentation methods also exist in which non-probabilistic entropic criteria,
devised in the fuzzy set-theoretic framework, have been
used.
Projects carried out in 1997 and 2002 included the study of
two probabilistic distance criteria, namely, the Bhattacharyya distance and
Kullback-Leibler Divergence function, as image thresholding criteria.
The purpose of the proposed project will be to make further investigations
in the area of probabilistic distance-based image segmentation methods and
their applications for both monochrome and colour images. Knowledge of
introductory Statistics and Probability Theory will be advantageous.
Attendance of CSE456 Pattern Recognition in Semester 1 will be helpful
but not essential.
-
- Image Segmentation by Clustering (20 pt)
- Project-id: SR-seg-1
- Sid Ray
- The aim of the project will be to make a critical study of a number
of existing non-hierarchical cluster analysis methods, such as, c-means,
fuzzy c-means, and isodata, with a view to their application in
image segmentation. Study of cluster validity will form an essential part
of the project. Applications will include both monochrome and colour
images.
-
- Segmentation of Textured Image (20 pt)
- Project-id: SR-seg-2
- Sid Ray
- Note: See the supervisor for details.
-
- Classification of Textured Image (20 pt)
- Project-id: SR-class
- Sid Ray
- Note: See the supervisor for details.
-
- Hybrid Image Segmentation (20 pt)
- Project-id: SR-hybrid
- Sid Ray
- Note: See the supervisor for details.
- Spatial models of evolutionary biology
- Project-id: SG-spatial
- Suzanne Sadedin (post doc) and David Green
- A project with the general theme
of spatial models of evolutionary biology. Some possible areas for
investigation include the
evolutionary dynamics of mating systems,
game theory for territorial animals, and
the paradox of the lek.
- You will
need to meet with me to negotiate a more specific topic that meets your
interests and is relevant to current work. You should only consider this
project if
a) you are genuinely interested in evolutionary biology and
willing to independently learn a lot of biological information and
jargon and
b) you are confident in your coding abilities, preferably in C++.
- Soft Models for Analysing Real-Time properties (Robyn SMART) (20pt or 12pt)
- Project-id: PS-rta
- Ian Peake and Heinz Schmidt
- Meet
Robyn, our Evolution ER1 robot (ER1 Robot Page 2004). In this project you will extend compositional performance models to predict "soft real-time" properties of Robyn, and design experiments to validate these. This will involve extending an existing package written in Java (Schmidt et al. 2003) and becoming more experienced with Markov Chain analysis.
Robyn and other real time systems have requirements on their ``extra-functional'' properties, such as resource (CPU and memory) requirements, performance over time, reliability, and quality of service. "Soft real-time" properties take into account probability, for instance "executes program in 12s with 0.6 probability assuming collision probability less than 0.1".
As industry moves to component-based software engineering, predicting extra-functional properties compositionally remains a challenging research problem (Schmidt 2003) for realistically sized systems.
The Evolution ER1 is a robotics kit with computer vision, hearing, speech, networking, remote control, email, autonomous mobility, gripping, and IR sensing capabilities, implemented under Windows(TM) and Linux, and controllable via simple state machines or even remotely via a character-based TCP/IP API.
The ER1 is suitable for use in a wide range of concept demonstration and prototyping scenarios, which have real-time aspects.
- References: ER1 robot page, 2005
(www.evolution.com/er1/)
- Heinz W. Schmidt:
Reliability prediction for component-based software architectures.
In Journal of Systems and Software, 66(3), pp. 241-252,
Elsevier Science Inc, 2003
- Heinz W. Schmidt, Ian D. Peake, Jue Xie, Ian Thomas, Bernd J. Kramer,
Alexander Fay, Peter Bort:
Modelling Predictable Component-Based Distributed Control Architectures
Proc. Ninth IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (WORDS' 03), IEEE, pp. 339-346, 2003
-
- Recovery-oriented Model checking (RoMoc) (20pt or 12pt)
- Project-id: ST-RoMoc
- Heinz Schmidt and Ian Thomas
- In
this project you will develop a new moderately-sized state machine model of
recovery-oriented interactions between independent autonomous software
components. Recovery oriented computing (Patterson 2000) branches
into exceptional behaviour before a failure occurs, i.e.,
proactively, for instance triggered by panicking components. Immediate
and efficient measures are taken to safeguard the computation and lower
the failure probability -
overall mean time to recovery shortens and availability
increases.
Contrast this with
reactive fault-tolerance where exceptions are thrown and handled
after a failure is detected. Think 'alert' instead of 'failure' or
'exercise' instead 'hospital bed'.
Your model demonstrator will be implemented by extending an existing
Java package for probabilistic finite state automata (PFSAs)
(Jayaputera 2003). PFSAs serve as abstract models of programs
(particularly in real-time and distributed, i.e., networked, systems).
These models can be checked against expected properties including
failure probabilities. Such properties are expressed in probabilistic
temporal logic (PRISM home page) and checked symbolically against your
abstract PFSA model or against actual executions of deployed and
running software which is claimed to be recovery-oriented as modeled
in the PFSA. The symbolic (model checking) executions could also form
the basis for automated testing. Automata-based model checking has
numerous practical applications in industry in advanced
telecommunication, military and technical domains such as automotive
software engineering.
- While there are many challenging open
problems with this novel recovery-oriented approach (Fox & Patterson
2003, Shapiro 2004) the scope of this Honours project is limited to
developing the one model, implementing and exploring it as a
model-checking case study.
- References:
- PRISM home page
- Jane Jayaputera, Iman H. Poernomo, Heinz W. Schmidt:
Timed Probabilistic Reasoning on UML Specialization for Fault Tolerant Component Based Architectures, Proceedings of the SAVCBS Workshop at European Software Engineering Conference, 2003
- Patterson, 2002: ROC: A new research agenda for a new century. Proc. 8th Intl Symp. on High-Performance Computer Architecture (HPCA'02), IEEE, 2002
Armando Fox and David Patterson: Self-Repairing Computers, Scientific American, June 2003
- Schmidt, 2004: Building systems from unreliable components, Dagstuhl Workshop on Architecting Systems with Trustworthy Components, 12/2004
- Shapiro: Self-healing in modern operating systems, ACM QUEUE vol 2 no 9, pp. 66-75, 12/2004
- Learning and Selection of ICA feature extractors for Image Retrieval (20pt)
- Project-id: DS-ica
- David Squire
- [NB Prefers to supervise one project only]
Independent Component Analysis (ICA) is a statistical
technique for discovering hidden factors underlying a set of random
variables. It goes beyond Principal Component Analysis in that it seeks
factors that are statistically independent, rather than just
decorrelated. ICA can be used to discover filters that can be used to
characterize visual textures in images. In this project you will
investigate the discovery and selection of a set of ICA filters for an
image collection, and test them in a content-based image retrieval
setting, using the GIFT, and open source image retrieval system. It is
hoped that such filters will give improved performance when applied to
specific types of images, such as dermatoscopic images of melanoma.
References:
www.cis.hut.fi/projects/compneuro/whatisica.html
www.gnu.org/software/gift/gift.html
-
- Segmentation for Content-Based Image Retrieval (12 or 20-pts)
- Project-id: DS-cb
- David Squire
- [NB Prefers to supervise one project only]
In recent years, the use of digital image collections has
become common, on the world wide web, in the preparation of both
electronic and paper publications, and via domestic digital cameras. The
need for tools to manage this rapidly-increasing quantity of visual data
is greater than ever. Specifically, there is great interest in systems
which allow users to query image databases. Consequently, there is great
interest in Content-Based Image Retrieval. CBIR uses the Query by
Example paradigm: to retrieve images, an example image is provided as
the query, and the system endeavours to retrieve similar images, based
on features such as colour, texture and shape. Many such systems have
been developed, such as IBM's QBIC, VisualSEEk, the GIFT. Most current
systems use whole images for query and for relevance feedback. In
reality, users are often only interested in a particular object in, or
region of, the query image. Some efforts have been made to support query
by region, such as in BlobWorld, but the open-source GNU Image Finding
Tool (the GIFT) does not yet do so. In this project you investigate
image segmentation algorithms for suitability for CBIR, with a focus on
Markov Random Field techniques. You will design a new feature extraction
module for the GIFT, which treats segments, rather than images, as the
basic elements.
www.glue.umd.edu/~vikas/Geman%26Geman.ppt
www.gnu.org/software/gift/gift.html
- Security in Out-Sourcing Databases (20 or 12pts)
- Project-id: BS-dbase
- Prof. B. Srinivasan
- Database
as a service model (DAS) is transferred to service provider for backup,
administration, restoration, space management, upgrades, etc.,
the advantage being that the application
service provider supports the S/W, H/W and provides their own human resources.
The issues and challenges are:
1 How to charge for service?
2. What kind of service guarantees can be offered?
3. Liabilities of the service provider.
4. Powerful interfaces to
support development environment are required.
5. Scalability in the web environment
and the associated overhaed costs due to network latency.
6.Privacy / security of out sourced data from
intruders and attacks, protecting clients from misuse by
service providers.
Secure and efficient RDBMS Storage model needs
to reduce the overhead with encryption. A new RDBMS storage
model needs to be developed to:
Reduce number of encryption calls, padding overheads and
avoid over encryption of queries on non-sensitive data.
- Improved image reconstruction for JPEG-LS near-losslessly encoded images (12pt)
- Project-id: PT-jpeg
- Peter Tischer
- JPEG-LS
is an international standard for the lossless compression of grey-scale
images. It achieves good compression at high throughput rates. JPEG-LS also
has a near-lossless mode. This achieves greater compression but introduces
some reconstruction error. By specifying a parameter, the maximum
reconstruction error in a pixel value can be controlled.
The near-lossless mode in JPEG-LS enables a picture to be encoded in a way
in which the maximum reconstruction error per pixel of the decoded picture
is guaranteed not to exceed a user-defined amount. The decoder has a
degree of flexibility in terms of how to assign a reconstructed value to a
pixel. This project will use local segmentation-based techniques to reduce
the amount of reconstruction error in a JPEG-LS near-losslessly encoded image.
Special Prerequisites - none
-
- Document Segmentation for Document Image Compression - (12 or 20pt)
- Project-id: PT-doc
- Peter Tischer
- The
JBIG-2 standard for compressing binary images, that is images whose pixel
values are either black or white, allows for images to be segmented and for
each segment to be treated differently in the compression process. However,
the standard does not specify how this segmentation should be carried out.
It specifies only how the encoder tells the decoder what the segmentation
is.
The aim of this project is to investigate ways of segmenting binary images
of documents, such as those that arise in fax and document processing systems,
so that the resulting segmentation may be used to encode the binary image
most efficiently. The segmentation will therefore be aimed at finding regions
that have similar properties with respect to how compactly the information
in those regions may be represented. Thus, an image of a page might be broken
into segments which represent text, headlines, tables, half-tone images
or white space at the margins of a page. The project might also investigate
how to use such a segmentation to get best compression of the binary image.
Special Prerequisites - none
-
- Image Deblocking using Local Segmentation (20 pt)
- Project-id: PT-seg
- Peter Tischer
- The baseline JPEG lossy compression mode has been hugely successful and
is widely used, particularly on the internet. The same approach is also used
in coding digital video such as videoCD, MPEG3, video-teleconferencing
and digital television. However, at low bit rates the block approach leads
to objectionable artifacts in the reconstructed image.
Local segmentation is a paradigm for constructing image processing algorithms
in which the pixels within a block are either classified as belong to the same
segment or as belong to a number of different segments. Operations on a pixel
will only use values from those neighbouring pixels which belong to the same
local segment. As one example, local segmentation has been shown
to be effective in creating denoising algorithms for removing Gaussian noise
while preserving underlying image structure.
The aim of the project is to use knowledge of how the image was compressed
in conjunction with local segmentation to produce a reconstructed image with
minimal block artifacts. This project is a continuation of a project
undertaken as a BSE honours 12 point project. There are a number of possible
extensions to the project. These include implementing and improving
an objective technique for the blockiness artifact of DCT-encoded images,
implementing a state-of-the-art deblocking technique to use in comparison
studies of deblocking techniques, an investigation of deblocking based
on a sub-band approach and a local segmentation-based approach based
on using the DCT coefficients directly.
Special Prerequisites - none
-
- Document Similarity Measurement via Data Compression (12 or 20 pt)
- Project-id: PT-sim
- Peter Tischer
- In
document classification it is often desirable to measure how similar two
documents are. This might be because we are interested in saying whether
documents are spam or whether the source code of two programs is so similar
that at least program is an example of plagiarism.
If two documents are similar, we would expect that we would require little
information to know the contents of the second document given that we already
know the contents of the first. One way to measure information is by using
data compression techniques as the compressed size is an upper bound
on the amount of information needed to store the original data.
The aim of the project is to adapt techniques from the area of text compression
to compress the data which shows how one document can be created from another.
Of particular interest is coming up with similarity measures that are also
distance functions.
Special Prerequisites - none
-
- Scientific Visualisation using MCST Projections. (12 or 20pt)
- Project-id: PT-vis
- Peter Tischer
- In Scientific Visualization we are often interested in representing higher
dimensional data in some lower number of dimensions such as 2,3 or 4.
Minimal Cost Spanning Trees (MCSTs) are simple to compute and are a way of
preserving closeness information in representing higher dimensional data.
Current methods for carrying out projections for visualizing higher
dimensional data tend to be computationally expensive with results that are
sensitive to parameters within the algorithm that are difficult to tune.
These methods are also sensitive to the dimension to which the projection
is being made. MCST-based projection does have these drawbacks and can be
engineered to preserve properties of the original higher dimensional data.
This project will involve implementing a package that uses MCSTs for
visualizing higher dimensional data in lower dimensions such as 2, 3 or 4
and comparing the performance of this package against packages incorporating
techniques like SOMs (Self Organising Maps).
Special Prerequisites - none.
-
- Automatic Cropping of Medical Images (20 pt)
- Project-id: PT-med
- Peter Tischer (+ Department of Medical Imaging and Radiation Sciences)
- In many forms of medical imaging there is a region of interest in which all
possible information needs to be retained but there may also be parts of the
image where substantial information loss may be acceptable.
For example, in an x-ray of an arm it might be acceptable that information
in the backround could be removed by turning the values of the background
pixels into some constant value.
This project will involve working with radiographers from the Department
of Medical Imaging and Radiation Sciences to automate the process of
deciding that a part of a medical image represents a background region
which does not need to be stored with a high degree of fidelity.
Special Prerequisites - none
- Exploring the Four-colour Theorem (20 or 12-pts)
- Project-id: MW-4col
- Mark Wallace
- That you only need four colours to colour a map, was proven using a
computer enumeration which
is brilliant but unsatisfactory. Given the years of effort on this
problem we cannot expect a simple
pencil-and-paper proof, but I believe that new visualisation techniques
might enable one to achieve
new insights. For example any region with four or more neighbours will
have at least two neighbours
linked by Kempe chains [Fritsch]. Repeatedly flipping the colours of
all regions bounded by a chain
must eventually produce a colourable map or return to the original
colouring. Can the latter ever
occur? If we transform arbitrary maps to a regular form based on
shortest paths, can we get a
better inkling why it is difficult to colouring some maps? This project
will generate some nice
animated graphics but it is unlikely to achieve anything more
substantial. But if it did, it would be
very exciting!
Rudolph Fritsch (1998)
The four color theorem: history, topological foundations and idea of proof.
-
- Rail Timetabling (20 or 12-pts)
- project-id: MW-rail
- Mark Wallace
- Invent a small suburban railway and design a regular timetable that
maximises the number
of trains per hour without any trains coming too close to each other.
This problem is quite
practical and also surprisingly hard [Peeters]. Constraint programming
was once used to
solve a version of this problem for the Melbourne suburban railway
[Gosselin]. The
challenge is to use a combination of constraint programming and
operational research
techniques to do it faster, better and with more scalability.
Peeters, L. W. P. (2003).
Cyclic Railway Timetable Optimization. Erasmus University Rotterdam.
Vincent Gosselin (1993)
Train Scheduling Using Constraint Programming Techniques
-
- Learning While Searching (20 or 12-pts)
- Project-id: MW-learn
- Mark Wallace
- Solving NP-hard problems, such as the most efficient way to wire a car,
or the lowest energy
structure of a protein, involves searching. Searching means trial and
error: and our challenge is to implement a search routine that learns
from its mistakes and never makes a second similar error. This requires
recording the choices that resulted either directly, or indirectly, in
the error and extracting minimum "inconsistent" combinations of these
choices, which can be added as constraints called "nogoods" on the
remaining search [Junker]. The challenge is to combine this learning
with forms of reasoning ahead "constraint propagation" so as to achieve
a highly efficient complete hybrid algorithm. The implementation
language will be constraint logic programming.
Ulrich Junker,
QuickXPlain: Conflict Detection for Arbitrary Constraint Propagation
Algorithms (2001)
- Learning under Resource Constraints (20pt).
- Project ids: GW1 and GW2
- Geoff Webb (two projects available).
- These projects, as part of a large research team investigating the wider
problem, will each tackle one aspect of the question of how to optimise
machine learning in a context where the computational resources available
for learning are unpredictable and varying. This is typical of many online
applications where the user load varies greatly from time to time. Such
applications include information retrieval, recommender systems, user
modeling, junk email filters and online fraud detection.

|
|