Investigating the use of Software Engineering in Computer Science Research

The Case Studies

CaMML | Causal Minimum Message Length

The CaMML (Causal Minimum Message Length) software was originally developed by Wallace and Korb (1999). Since then there have been four, perhaps five, versions of CaMML. All but one of these versions were based on the original code by Prof Chris Wallace.

Until the current version, CaMML had been developed largely without any form of Software Engineering, and Dr Korb felt it had suffered for this. The latest implementation effort, headed by Dr Korb, plans to address some of the issues with past versions, including maintainability, readability and extensibility. These goals have the support of the development team: "readability and extensibility does make extra work, but in the long run I'll save myself work so I want to do it anyway" (O'Donnell, 2002).

During the observation of this case study, a gap appeared between the design documents and the code. It was discovered that the developers understood the design, but not the reasons for it. "We were being typical computer scientists and coding without any specification, ah, with a bit of specification" (Hope, 2002). This occurred after the design documents were examined alongside a different version of the code that had been completely rewritten by Julian Neil. Julian's version was easier to read than the original code and had a far simpler (but less general and extensible) structure. Without understanding the reasons behind the design the full CaMML group had agreed to, the developers decided to diverge from the design documents and make the code more like Julian's version. "We need to bring it back together again" (Korb, 2002).

The key changes were to the data structures. "In the design doc [the model structures] looked ugly and they were ugly to implement. So we just went no. The design document was clear enough to read, it's just that it was a totally unnecessary abstraction of the part of a graph" (Hope, 2002). The original diagram of the data structure, and a justification of it in terms of extensibility, can be seen in F.

Dr Korb described the development process used in the CaMML project as being as good an approximation of industry development as possible within the constraints of the university setting. The most noticeable constraint is a lack of time for all concerned. O'Donnell (2002) commented that "If Lucas and I were sharing an office it would make things easier. Ideally I'd be in a room with Lucas and Josh and Leigh so I'd have the CaMML and CDMS people there".

The largest issue affecting CaMML is currently being examined. "I have put in a faculty research grant to hire someone full time to develop CaMML...but I have little confidence in PhD students developing a useful framework for CaMML, because they have trouble finding time. I think the only way to make it successful is to get someone on it full time...I have some indication...that this use of faculty research funds might be appropriate. CaMML isn't just some sort of commercial product in the end, it's going to be the base for lots of research" (Korb, 2002).

CDMS | Core Data Mining Software

The inspiration for CDMS came from a suggestion by Chris Wallace many years before the current students became involved. "I think he was lamenting the fact that there was no common platform for people to carry-out data mining ...different programs spat out different forms of data and no one knew how to use any of the programs apart from the author" (Comley, 2002). In 1998 Lloyd Alison started building a number of prototype models. Some of these are now up to their seventh or eighth revision, but are still in the core CDMS code.

When the current developers started work on CDMS, they used class diagrams and found them useful for learning about the existing code, which on its own was hard to follow. The diagrams later became a burden, as the information in them no longer needed to be looked up and was changing rapidly (Fitzgibbon, 2002; Comley, 2002; Alison, 2002). Eventually the diagrams were abandoned. Mr Comley added that at this point the design documents didn't help them think about the system. "We were so involved in it that everything was in our heads. We were sort of living and breathing it" (Comley, 2002). He suggested that, had they taken a month's break, the documentation might have been useful to refresh their memory. At one point changes were occurring once or twice a week, and the design documentation was being updated accordingly. Updating the documents "just got ridiculous" (Comley, 2002).

Although several people are developing the system, they take a pair-programming approach rather than working in isolation. "We very rarely sat down and start coding without discussing the issues first and putting our ideas up on the whiteboard and mulling it over for a bit. This goes for big and small things. It's interesting and scary the impact that small little issues can have on the system as a whole. Even the smaller items we bounce off each other" (Comley, 2002). This sort of communication and teamwork was generally found to be much more effective than written documentation (Fitzgibbon, 2002). "I think we get heaps of benefit working side by side" (Comley, 2002).

Fitzgibbon (2002) said that "a written overview of how the thing works would be more useful than a diagram for people starting to get involved in CDMS. The diagrams are mostly for once you get into programming. They explain the logic of it, not the whys." Mr Fitzgibbon offered an analogy: producing detailed design on its own is like a mathematician who invents a mathematical system to solve some problem and then generalises it, removing it from the domain of that specific problem. The published result is written in such an abstract way that it is universally applicable, but no one can understand it. "It doesn't make sense to them. They're not motivated by that problem. I think the diagrams are useful, but first you need higher levels" (Fitzgibbon, 2002). 'How-to' documents were suggested as one solution, for example 'How to write a plug-in for CDMS'.

There is a plan to produce a manual or descriptive technical report for CDMS, to be accompanied by a public release of the CDMS code. "We're not sure how this will happen. We were sort of hoping it would happen by magic or delivered by a stork" (Alison, 2002). Work on this proceeds in sporadic bursts. CDMS has high publication potential, but so far it isn't stable enough to write about, and other, more pressing things require attention (Fitzgibbon, 2002).

CDMS uses CVS for storing source code, documents and design ideas.

GIFT | GNU Image-Finding Tool

All discussion in this section is drawn from the interview with Squire (2002) unless otherwise stated.

The GIFT (GNU Image-Finding Tool) is a framework for content-based image retrieval (CBIR) systems. The tool grew out of work on 'Viper' at the University of Geneva and 'Circus' at EPF Lausanne. The project was initiated when it was realised that a significant amount of work was being duplicated at the two institutions, located approximately 50 km apart.

Regular meetings began between the key researchers at each institution, and a successful grant application by each university named the other as a collaborator. The grant applications were "sold" to the funding body on the basis that a framework would be developed to support general research in this area. The framework itself, although clearly mentioned, was not the basis of the grant application. One possible explanation for the emphasis on frameworks is the high staff turnover in Geneva: after four years, Dr Squire was the second-longest-serving member of his research group. Because staff move on regularly, it is vital that each person makes a significant contribution while wasting as little time as possible.

The framework built in the GIFT project provides the following key features of a CBIR system:

  • a way of organising images on disk
  • a way of indexing those images
  • a feature extraction system, which creates the numerical features from which the index is built
  • a query system that accesses the index and retrieves images
  • a user interface

A markup language for image retrieval, MRML, is also included and forms the foundation of the system.

Most researchers are interested in one or two components of the system, but probably not in all of them. Within the GIFT framework, a plug-in that affects one of these components is approximately the right size for a postgraduate project. Not having to code up the other parts allows the student to concentrate on one aspect of the problem and produce better-quality code. A student coding parts of the system unrelated to their research area is likely to do so sub-optimally: code unrelated to the thesis only needs to work, and that is enough.

Dr Squire also mentioned a master's minor thesis in which one of his students extended MRML (Multimedia Retrieval Markup Language), the language that underpins GIFT. "There's no way in my opinion that someone in one semester could even have thought of tackling this kind of problem if they had to build the whole thing from the ground up" (Squire, 2002).

Source code for GIFT is maintained in CVS (Concurrent Versions System). There was a version-control problem early on, and a lot of manual work was required to recombine differing versions. The problem is thought to have been caused by human error and has led team members to familiarise themselves better with CVS. There are now designated people managing the MRML and GIFT CVS trees.

The GIFT project experienced some tension between contributors that required the supervisor's monitoring and intervention. One person had become the focal point for fixes, leaving the other with time to write papers. While both names appeared on the papers, there was some tension over who would be lead author. The problem was solved by dividing the work more evenly, which freed time for the contributor who had been handling fixes to write papers as well.

Although design documents were not produced during the initial development of GIFT, a high-level abstraction of MRML was provided in the manual "Configuring and hacking the GIFT" (Müller, 2000). This was explained more clearly, in diagram form, six months later in the resulting thesis (Müller, 2001). A project to go back and document GIFT in UML is currently under way (The GIFT Team, 2002).

In discussion with an independent user of GIFT (i.e. one not associated with the developers), it was found that the manual was helpful and was the first port of call. A few concepts, however, had to be looked up in the thesis or resolved via e-mail with the authors. The decision to use GIFT was based on an expected time saving and its platform independence (Lim, 2002a). Given the modular nature of GIFT, it seems that the high-level design and MRML specification are generally enough for most users. When asked about the value of the current UML documentation effort, the user replied "I don't know much about UML. So, it probably doesn't help me much" (Lim, 2002b).

References above can be seen in the Bibliography.