VBC Honours Projects 2004 Semester 1

The following are CSSE Honours Projects offered by the Victorian Bioinformatics Consortium (VBC). [What is Bioinformatics?] The VBC is mostly based at Monash University Clayton Campus and is headed by Trevor Dix (Computer Science & Software Engineering), Ross Coppel (Microbiology) and James Whisstock (Biochemistry).


Extending Artemis's annotation capabilities (12 or 20 point?)

Artemis is a GPL open source Java application for browsing DNA sequences and their associated annotated features. A typical feature is a gene, which would be annotated (commented on, labelled) with various tags/fields such as its name, biological function, or anything else of relevance.

  1. Artemis's annotation facility is primitive - it provides only a single editable textbox. The aim would be to re-implement this as a full-featured editor which understands all the GenBank/EMBL annotation tags and verifies all the fields.

  2. Artemis is designed to be used in a standalone manner. It only reads and writes simple flat files in standard formats. However, annotation is usually done by a group of people simultaneously. The aim would be to allow Artemis to read and write features from a remote SQL-compliant database.

All improvements would be fed back to the Artemis community.
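As a rough illustration of the second aim, the sketch below stores features and their tags in a relational schema. It is written in Python rather than Java for brevity, and it uses an in-process SQLite database as a stand-in for a remote server. The table names, column names and function names are all hypothetical; none of them come from Artemis.

```python
import sqlite3

# Hypothetical schema: one row per feature, one row per tag/value pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feature (
    id        INTEGER PRIMARY KEY,
    seq_name  TEXT    NOT NULL,
    kind      TEXT    NOT NULL,     -- feature key, e.g. 'gene' or 'CDS'
    start_pos INTEGER NOT NULL,     -- 1-based, inclusive
    end_pos   INTEGER NOT NULL,
    strand    INTEGER NOT NULL CHECK (strand IN (1, -1))
);
CREATE TABLE IF NOT EXISTS qualifier (
    feature_id INTEGER NOT NULL REFERENCES feature(id),
    tag        TEXT    NOT NULL,
    value      TEXT
);
"""


def open_store(path=":memory:"):
    """Open (and if necessary create) the feature store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_feature(conn, seq_name, kind, start, end, strand, qualifiers):
    """Insert a feature and its (tag, value) pairs in one transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        cur = conn.execute(
            "INSERT INTO feature (seq_name, kind, start_pos, end_pos, strand) "
            "VALUES (?, ?, ?, ?, ?)",
            (seq_name, kind, start, end, strand),
        )
        feature_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO qualifier (feature_id, tag, value) VALUES (?, ?, ?)",
            [(feature_id, tag, value) for tag, value in qualifiers],
        )
    return feature_id


def features_in_range(conn, seq_name, lo, hi):
    """Return features on seq_name that overlap the interval [lo, hi]."""
    return conn.execute(
        "SELECT id, kind, start_pos, end_pos, strand FROM feature "
        "WHERE seq_name = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (seq_name, hi, lo),
    ).fetchall()
```

Keeping tags in their own table means a new tag needs no schema change. Handling several annotators editing the same feature at once is left out of this sketch.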


Public web site for Leptospira borgpetersenii

The VBC has a team annotating (finding and labelling genes in) the genome of the bacterium Leptospira borgpetersenii. They aim for it to be the first genome fully assembled and annotated in Australia. This project would aim to produce the public web site for the project, allowing users to navigate the genome, explore the annotations, and view or run analyses. The student should be familiar with Perl CGI and SQL databases. The site would probably use the existing GBrowser framework and should be generic enough to work with any bacterial genome.

In silico identification of outer membrane proteins

In bacterial genomes, locating genes is relatively easy. The difficult part is assigning function and determining their "position" in the cell. Recent work in Bologna, Italy, has produced software to do this automatically for E. coli. The aim of this project would be to:

  1. Obtain the Hunter software
  2. Install it and get it working on a Linux server
  3. Modify it for use with L. borgpetersenii
  4. Add a Java or CGI/Perl GUI for biologists to use it easily

Improving Artemis

Artemis is a Java application for browsing and editing genetic sequence files, usually stored in a flat structured format. Unfortunately, its annotation facility is primitive, consisting of a single textbox. The aim of this project would be to make this a full-featured editor which understands all the GenBank/EMBL annotation tags and verifies all the fields. All improvements would be fed back to the Artemis community.
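To give a flavour of what verifying the fields might involve, here is a minimal sketch in Python (the editor itself would be written in Java). The rule table is a small illustrative subset of GenBank/EMBL qualifiers, hard-coded here as an assumption. A real editor would presumably load the full feature table definition instead. The names RULES and validate are hypothetical.

```python
import re

# Illustrative subset of qualifier rules (an assumption of this sketch).
# Each rule takes the qualifier's value and returns True if it is acceptable.
RULES = {
    "gene":        lambda v: bool(v.strip()),
    "product":     lambda v: bool(v.strip()),
    "note":        lambda v: bool(v.strip()),
    "codon_start": lambda v: v in {"1", "2", "3"},
    # db_xref values have the form database:identifier
    "db_xref":     lambda v: re.fullmatch(r"[^:\s]+:\S+", v) is not None,
    # /pseudo is a flag qualifier and carries no value
    "pseudo":      lambda v: v is None,
}


def validate(qualifiers):
    """Return a list of problems found in (tag, value) pairs.

    A value of None means the qualifier was given without a value.
    """
    problems = []
    for tag, value in qualifiers:
        rule = RULES.get(tag)
        if rule is None:
            problems.append(f"unknown qualifier /{tag}")
        elif tag != "pseudo" and value is None:
            problems.append(f"/{tag} requires a value")
        elif not rule(value):
            problems.append(f"invalid value for /{tag}: {value!r}")
    return problems
```

For example, validate([("codon_start", "4")]) reports one problem, while a feature carrying /gene, /codon_start=1 and /pseudo passes. An editor could run such checks as the user types and highlight the offending field.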

RPM packages for bioinformatics

Bioinformatics is a fusion of computer science, software engineering, and the biological and medical fields. Many bioinformatics software packages are poorly documented and difficult to install, especially for the typical biologist.

The aim of this project would be to create RPM (Red Hat Package Manager) packages for many bioinformatics packages. It is expected that high quality .spec files would be written to minimize the configuration required by the installer, and maximize interoperability with standard Red Hat distributions, in particular Fedora. The results could be fed back to the user community through the BioLinux project.

Mitochondrial Annotation Device

Mitochondria contain their own mini genome. This project would involve the creation of an annotation pipeline from existing sequence analysis and comparison programs. Given a partial or complete mitochondrial genome, the pipeline would prepare a report in which each protein and RNA gene is identified. In addition, the genomic organisation would be plotted and compared with that of its taxonomically closest neighbours. The protein sequence comparisons will be performed by profile matching, and some optimisation will be required to generate the best set of profiles for distinguishing the different taxa.

A clean virus database

GenBank collates genetic sequences. It has a large virus section, but unfortunately most of the files have headers inconsistent with the sequence data they contain, as well as sequences which should not be present. This pollutes search results with false positives and false negatives. This project would use Bio::Perl and other tools to create a clean version of the database.
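One simple kind of header inconsistency that can be checked mechanically is a mismatch between the length declared on the LOCUS line and the number of residues actually present in the ORIGIN block. The sketch below shows that single check in plain Python, purely for illustration; the project itself proposes Bio::Perl. The function names are hypothetical, and the check assumes nucleotide records whose LOCUS line gives the length in "bp".

```python
import re


def declared_and_actual_length(record_text):
    """Return (length declared on the LOCUS line, residues counted in ORIGIN).

    The declared length is None if no 'NNN bp' field is found.
    """
    declared = None
    actual = 0
    in_origin = False
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            match = re.search(r"(\d+)\s+bp", line)
            if match:
                declared = int(match.group(1))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):      # end-of-record marker
            in_origin = False
        elif in_origin:
            # Sequence lines look like '       1 acgtacgtac gtacgtacgt';
            # count letters only, ignoring the position number and spaces.
            actual += sum(ch.isalpha() for ch in line)
    return declared, actual


def is_consistent(record_text):
    """True if the record declares a length and it matches the sequence."""
    declared, actual = declared_and_actual_length(record_text)
    return declared is not None and declared == actual
```

Records that fail such a check could be set aside for manual review rather than silently dropped. Other inconsistencies, such as a header naming the wrong organism, would need comparison against reference sequences and are beyond this sketch.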

Web-spider to find forms-based bioinformatics analysis pages

In the field of bioinformatics, new algorithms are being published regularly. Typically these are made available through a web page at the author's University or Institute. The discovery of such pages is problematic. The purpose of this project is to create a web spider that would search the Internet, following links, to identify forms-based web pages that perform bioinformatics analysis, and to compile a searchable database of such pages.
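The core classification step might look something like the following sketch, which uses only the Python standard library. It parses one page, counts forms and input controls, collects outgoing links for the spider to follow, and flags the page as a candidate if it contains a form and mentions enough bioinformatics terms. The keyword list, the threshold of two hits, and the names FormFinder and score_page are all assumptions made for illustration. Fetching pages, respecting robots.txt and storing results are omitted.

```python
from html.parser import HTMLParser

# Assumed keyword list; a real spider would need a tuned vocabulary.
KEYWORDS = ("sequence", "fasta", "blast", "alignment", "protein", "genome")


class FormFinder(HTMLParser):
    """Collects form counts, input-control counts, links and page text."""

    def __init__(self):
        super().__init__()
        self.forms = 0
        self.inputs = 0
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms += 1
        elif tag in ("input", "textarea", "select"):
            self.inputs += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_data(self, data):
        self.text.append(data.lower())


def score_page(html):
    """Return (is_candidate, links) for one page of HTML."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    text = " ".join(finder.text)
    hits = sum(1 for keyword in KEYWORDS if keyword in text)
    is_candidate = finder.forms > 0 and finder.inputs > 0 and hits >= 2
    return is_candidate, finder.links
```

The returned links would be added to the crawl queue, and candidate pages written to the searchable database. A keyword count is a crude filter, so some manual or statistical refinement would likely be needed to keep false positives down.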

VBC Quarterly Reporting

The VBC has to report quarterly to the state government on its research activities. The project would involve developing a database and web front end for maintaining the research activity information. It should be able to generate reports based on that information.