Investigating the use of Software Engineering in Computer Science Research

The problems in Computer Science

Academic software development suffers from many of the same problems as commercial development, such as running over time and over budget and being under-resourced. This was clearly pointed out in a workshop at Carnegie Mellon University, a leading academic institution in Software Engineering and home of the Software Engineering Institute. The 1993 workshop (Steier et al., 1993), held by the Engineering Design Research Center (EDRC) of Carnegie Mellon University, discussed the role of software in disseminating new engineering design methods. Those who attended included industrial affiliates as well as staff and students. The workshop looked at numerous case studies of software development and suggested that "the understanding and knowledge now exist, and the time is ripe, for adopting a more comprehensive approach to software development, even within a research setting, and for establishing a better infrastructure for software design, maintenance and reuse" (Steier et al., 1993).

While very little work has been done to investigate the benefit of applying software engineering principles in computer science research at academic institutions, much work has been done on the application of this knowledge in industry. A number of problem areas have been found in the process of software development (in industry) and, over time, the field of software engineering has produced solutions to many of these problems. Industry problems such as the difficulty of coming to grips with large systems, the difficulty of explaining how software works to experts outside the design team, the difficulty of staff knowledge becoming outdated, the difficulty of coding more efficiently, and others have been at least partially solved through software engineering. These problems and solutions are discussed below. While the development environments in a university research setting and a commercial software development company are vastly different, in many cases either the same problems exist, although possibly on a different scale, or different problems exist that may nevertheless be addressed using existing methods found in industry. This will also be discussed below, under the sub-headings "The SE solution".

The sections below each relate to a problem I have discovered through observation of past or on-going projects, or discussion with computer science researchers about their work. Following each problem statement and explanation is a suggested method for addressing it that has been used or recommended in the software engineering literature to solve industry-based problems of a similar nature. In some cases solutions from other fields of science, that have engaged in problem-solving with the aid of computers, will also be mentioned.

1 Difficulties in communication and understanding research

The difficulties in understanding other academics' research methods, concepts, software and how they all relate, may hinder or prevent research being understood, expanded upon and practically applied.

1.1 The CS (Computer Science) problem

Unlike most commercial software development, the aim of most university research is to increase the amount of public knowledge. While academic research programmes are typically small programs (by industry standards) they often include a high degree of complexity due to the nature of the research. A small research program can be as hard to understand (due to the concepts involved) as a large, but fairly generic, transaction processing system developed in industry.

Scientists in a university department (outside of computer science) generally use scientific languages such as mathematics, chemical reaction diagrams, Feynman diagrams etc. to help them express and explain their theories and ideas. In the computer science department, computer scientists working in support of such scientific research may use these common symbolic languages when discussing the scientific concepts behind their software with these other scientists. While this may be adequate for other sciences that are merely using the computer as a tool, for research in the field of computer science itself one needs to know more. From a scientific point of view, it must be explained why some code is actually an implementation that experimentally proves and validates a proposed theory. As shown by their use of other notations for ideas outside the field of computer science, computer scientists are adept at explaining concepts in precise notation if given the right symbolic languages. Concepts in the field of computer science itself, however, are often "proven" with software that remains a black box. Many papers are published in the field explaining concepts said to be embodied in software, yet the code itself (one of many possible forms of notation) is often poorly documented and no attempt is made to explain symbolically how and why it works.

1.2 The SE (Software Engineering) solution

Like academia, industry too has a need to understand how and why software does what it claims to. This need comes from a desire to meet the product specification that often forms part of contractual agreements. There is often also a need to be able to prove to the customer or an external expert that these requirements will be met or are being allowed for in the design. The US military's DOD-STD-2167A (Defence System Software Development) standard (later replaced by MIL-STD-498) was for many years a de facto standard in industry precisely because it made developers accountable in this way (Newberry, 1995; Nutt, 1995).

The solution provided by SE to industry's difficulties with large software projects was the creation of formalised system analysis and design languages and methods. While this has clearly not solved all the problems, it has enabled improvement and allowed professional developers to discuss design and development at a level above the code. It allowed outside experts to come in and rapidly obtain an overall picture of the system (Fowler and Scott, 1997). It improved communication of processes between experienced and more junior developers. While there is still a failure rate, it has dropped over time. Where good design documents exist, future developers (be they new members of the team or contractors) do find it much easier to continue development and to correctly understand and integrate with existing work (Nutt, 1995).

1.3 Application of a solution

Unlike a counterpart in industry, a new researcher may have access to software that implements the theory and access to scientific papers discussing the theory, but he will often not have access to design documents that show why and how the software is an implementation of the theory (and through the software how the theory is applied). The researcher may have to spend a long time reconciling the theory and application through experiment, or painstaking examination of the code. This is very similar to the situation that is faced by an outside expert, or new member of a design team, working on an industry project that lacks design documentation.

The need for design documentation has been recognised, and such documentation has been used, in some university projects. The COMIC Project, an international multi-disciplinary research project, highlighted the importance of appropriate symbolic language by making it one of the project's four major themes (Cooperation Technologies Laboratory, 2002). The CaMML (Causal Minimum Message Length) (Wallace and Korb, 1999), GIFT (GNU Image-Finding Tool) (The GIFT Team, 2002) and CDMS (Core Data Mining Software) (Fitzgibbon, Comley and Allison, 2002) projects have also used design languages such as the UML (Unified Modeling Language) (Object Management Group, 2001) at various stages of development.

As shown in industry, the availability of design documents that explain why the software does what it does, in a symbolic language understood by all working on the project, can make the task of understanding much simpler. It is, however, fair to suggest that when taken to a research environment not all design notation might be useful in this way. Further, unlike industry with one large project, research tends to be broken into smaller research projects, each using the ideas that came before. If the goal is only to communicate the finished ideas and how they were implemented, there is no reason the "design" should not be done after the theory has been applied and proven to work, i.e. after the product is completed.

Making design documents public will make it easier for others to replicate the experiments or incorporate the research into their own work. Unlike commercial work, which aims to protect its software, this is often the very idea behind university research. A desire to share knowledge is the reason why algorithms are published.

1.4 Results

2 Knowledge Transfer

This is the problem of getting leading edge concepts understood well enough that they are appreciated and eventually widely used.

2.1 The CS problem

Often methods that are provably superior are ignored because they are hard to understand from their technical publications, or suffer from the "not invented here" syndrome. MML is a case in point. It was first published in 1968 by Wallace and Boulton (Wallace and Boulton, 1968), and has repeatedly been shown to be better than a similar method called MDL on many problems (Oliver, Baxter and Wallace, 1996; Wallace and Dowe, 1999; Buntine, 1993). Nevertheless, MML is used almost exclusively at Monash University, where staff and students have had access to Professor Chris Wallace and others involved in MML research. The evidence suggests that, without access to these people, researchers have passed over a superior technique in favour of one that is easier to implement. This is despite the issue being of sufficient interest for the Computer Journal's special issue on Kolmogorov Complexity to be mostly devoted to the debate between MML and MDL (Gammerman and Vovk, 1999).

It is plausible that, given the design documents (if easy to follow design documents existed) for programs such as CaMML and Snob that use MML, others would be able to better understand not only how MML works, but also how it has been implemented in the past. This would lower the barriers to its future use. This increased reuse has been shown in the past in other areas (Steier et al., 1993).

The EDRC workshop explained this phenomenon by suggesting that the characteristics of the development process, and specifically the adherence to software engineering standards, have a large influence in enabling a knowledge transfer from the university setting to industry or other academic institutions (Steier et al., 1993). In other words, good design may not only make research clearer, but may also increase the likelihood that people in industry will take the time to look.

2.2 The SE solution

Software Engineering has developed many modeling languages that are used to explain new innovations. The most prominent of these at the moment is the UML (Unified Modeling Language), used for explaining systems designed in the OO (Object Oriented) paradigm. While not a panacea, OO is currently in widespread use in industry and a key component of most computer science undergraduate courses. Even if a company is not currently using it, it is viewed as important that developers understand it (Hamilton, 1999).

Other modelling tools such as DFDs (Data Flow Diagrams), ER (Entity Relationship) diagrams, Data Dictionaries and many others may be used when working in the structured analysis and design paradigm (Yourdon, 1989). The Structured and OO paradigms are "fundamentally different" and require a "different way of thinking about decomposition" (Booch, 1989). For communication to work, the same meaning must be attached to common terms. The choice of design language is a means of specifying the methodology that is being used and adopting a formalised set of definitions for certain programming, architectural and abstract concepts, clarifying communication of ideas much like the formal scientific symbolisms mentioned in Section 1 above.

There are many other methodologies and languages. Just within the OO paradigm there are the Demeter Method, SCRUM Development Process, MOSES (Methodology for Object-oriented Software Engineering of Systems), Shlaer-Mellor Method, Fusion Method, Open Modelling Language, and many others (Suzuki, 2002).

Extreme Programming (XP) is perhaps the newest methodology that some have begun to embrace (Wells, 2002). It explicitly allows for change and aims to reduce the level of management (and hence documentation) required for a project (Wells, 2002). Though this methodology may be of particular use in an academic environment, the approach is still very new, and further experience with it and research into it are needed.

2.3 Application of a solution

If a common language such as the UML is used, not only will other researchers be able to follow the concepts more easily, but those in industry who might make use of the research will be speaking the same design language.

While many may not understand the technical implications of the research, given a blueprint in a language they understand, they will still be able to implement the work from the documentation. It is not the first version of the software (the proof of concept) but the second that is likely to be used. As Fred Brooks suggested "plan to throw one away; you will anyhow" (Brooks, 1995). I will suggest later that an incremental development should not involve throwing everything away. Nevertheless, the development of a commercial version of software properly belongs in a commercial environment (be that private industry or a company attached to a university department). Allowing industry to build this second version will ensure that more ideas make the transition from proven theory to incorporation in usable products.

3 Loss of Knowledge

An inability to communicate ideas effectively due to differing language and culture can lead to a loss of detail and misunderstanding between student and supervisor or between members of a research team.

3.1 The CS problem

There is a generation gap in the knowledge of computer science researchers. The field is a broad one and, with the exception of those doing research or teaching in the area, most researchers do not spend time keeping up to date with the design practices used in industry and taught to students. The result is a difficulty in communicating ideas, for example when one party talks about "objects" and the other about "slots".

In order for a project team to work well together, either everyone must speak the same technical language, or the group must resort to avoiding technical language. In the case of a design language, avoiding the language is most easily achieved by avoiding design work altogether. In too many cases this simple approach is used.

Many researchers I have spoken with admit to having most of the details of their work in their head and very little, if any, on paper, the exception being a published paper once the work is complete. The problem with this approach is that the knowledge requires the person. If the researcher loses interest, moves to a new institution, or is hit by a bus, the result for their research students, institution, and potentially the computer science community as a whole, is a disaster. While many computer scientists believe strongly that information should be public, their own actions prevent a full discussion of their research.

3.2 The SE solution

In industry today many jobs require the applicant to have specific knowledge of design languages, for example UML. Companies often also provide ongoing education to their employees. One of the first companies to do this with software engineering knowledge was AT&T Bell Laboratories, Merrimack Valley. In 1987 the company ran a once-off two-year course (on company time) for non-computer science staff that gave staff a computer science and software engineering background. This ensured a consistent design knowledge across the group and, by allowing other employees to take certain units, enabled a standard level of knowledge to be achieved across the company. The motivation for the program was a vast increase in production of software systems and a desire to have staff who were both experienced with the old mechanical and electrical equipment paradigm and able to work reliably and professionally in a software environment (Cleaveland and MacDonald, 1987).

Like AT&T Bell in their shift to software, the IT industry (including the education sector of the industry) has also undergone a rapid change: from mainly procedural to mostly OO programming. Industry has been working to adapt to the "paradigm shift" (Fowler and Scott, 1997) and to retrain staff, while in the universities some have still not made the transition or taken the time to learn OO and add the new paradigm to their own knowledge.

3.3 Application of a solution

The development of decent documentation, both in code and externally, can lead to a situation where the loss of a key staff member involves a set-back, but the work can go on. Just as importantly, there can be some level of continuity between work done by different PhD candidates and other students. These people can thus collaborate more, do more, and have a larger impact with their contributions.

4 Inefficient and ineffective reinvention

The problem of re-inventing the wheel one wants to use.

4.1 The CS problem

The term software reuse, as mentioned before, was coined at the NATO Science Committee conference "Software Engineering" in 1968 (Naur and Randell, 1969). It was put forward as a means to reduce both the effort required in software production and the error rate. The Numerical Recipes books (Press and Teukolsky, 1997) are probably the earliest examples of this being put into practice in a scientific domain. The original book was created between 1981 and 1985 by a group of physicists then all working in a university environment. The inspiration for the book was to close the gap between best practice "as exemplified by the numerical analysis and mathematical software professional communities" and average practice "as exemplified by most scientists and graduate students" that the authors knew. That observation was made 25 years ago. While the Numerical Recipes book became very popular, this positive example of reuse is an exception rather than the rule. In 1992 Krueger reported that reuse was still not evident and he addressed the question of why the uptake was so slow (Krueger, 1992). In the university setting today there is still a tendency to do everything from scratch and reuse is seldom mentioned.

Casual observation will show research students learning how to code up basic components like linked lists, and then coding them again from scratch, year after year (even after they leave the university), each time they need to use one. This is inefficient when a doubly linked list is part of the Standard Template Library in C++ and a standard class in Java, and when some languages, such as Perl, simply do not need linked lists (the Perl array has all the functionality one might want a linked list for) (Stroustrup, 1997; Flanagan, 1999; Wall, 2000). This re-coding becomes ineffective and indeed dangerous when programmers forget subtleties of the data structures they are coding, or make logical or typing errors. Now not only have they wasted time re-coding, but they will have to spend a lot more time debugging code that wasn't needed in the first place. While the pedagogical motivation for re-coding fundamental data structures is acknowledged, there is, as with assembler, a time and place for this in a computer science course. For most students, postgraduate research is neither the time nor the place.
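As a brief illustration of the point, the sketch below uses a standard library container in place of a hand-written doubly linked list. Python is chosen here purely for illustration (the text above names C++, Java and Perl), and the task names are invented for the example. Python's collections.deque offers constant-time insertion and removal at both ends, which covers most of the uses a research program would have for such a list.

```python
from collections import deque

# A deque supports fast insertion and removal at both ends, which covers
# the usual reasons for hand-coding a doubly linked list.
work_queue = deque()
work_queue.append("parse input")        # add at the tail
work_queue.append("fit model")          # add at the tail
work_queue.appendleft("load config")    # add at the head

first_task = work_queue.popleft()       # remove from the head
last_task = work_queue.pop()            # remove from the tail
remaining = list(work_queue)            # whatever is left, in order
```

No pointer manipulation is written by the researcher, so the class of bugs described above (forgotten subtleties, logical or typing errors in the list code) does not arise in the first place.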

While scientists, and particularly computer scientists, have benefited from these advantages of modern high-level languages, at the macro level this inefficiency and ineffectiveness often still holds when whole projects need to be re-coded from scratch just to add what could otherwise be a small plug-in. The reason for re-coding is often an inability to read the last person's poorly documented code. Unfortunately there is a habit of passing the problem on when the new person also fails to document what they have done. This can be seen in early versions of CaMML, Snob and other projects.

Another common problem is the lack of a framework. Often many projects would be able to share components if someone had built the framework to allow it to happen. Unfortunately, as this is often not research in and of itself, but rather the coding of a tool, no one is willing to invest the time. The issue of frameworks has in the past been addressed. Brooks suggests that a dedicated tool-maker be attached to a programming team to provide a common framework (Brooks, 1995), and in a university environment this was in fact done in the past (including at Monash). For many years now computer science has been funded (at least in the UK) on a similar model to mathematics rather than that of other sciences (where technicians to set up experiments are regarded as a standard requirement). This is cited as one of the problems with research in the Computer Science field by Mike Holcombe, Dean of Faculty of Engineering and a past head of the Computer Science Department at the University of Sheffield (Holcombe, 2000).

4.2 The SE solution

Reuse can take a number of forms, ranging from the reuse of one's own toolkit across different projects through to the buying of commercial packaged software. Two implementations of reuse at the macro level are the development of standard but highly customisable applications and the use of modular operating systems. Today's ERP (Enterprise Resource Planning) systems (such as SAP, PeopleSoft and Baan) are an example of the first (Hamilton, 1999). They allow plug-in modules, purchased separately, to extend their capabilities. The GNU open source project is an example of the second. Open source works in a similar fashion with lots of separate modules. It allows additional modules of the system to be added or upgraded as they become available or needed.

At the implementation level in industry, programming languages such as Java are being pushed, and many argue succeeding, due (as mentioned before) to their vast array of well-maintained libraries containing common data structures (Neighbors, 1998). The Microsoft Foundation Class library was similarly created to decrease the amount of work needed to do common tasks and was similarly accepted by a large part of the industry. The idea of frameworks has also been explicitly used, successfully, in industry. An example of a successful framework implementation in industry is that of the GENVIS project, a report visualisation framework developed for a leading Swiss bank (Bredenfeld, Ihler and Vogel, 2000).

These are all specific (limited) methods of reuse. The larger and more general question has been shown to be much more elusive. Research literature in the area of reuse has mostly taken the form of developers in ongoing projects tracing the cause of their success back to reuse, rather than publication of new techniques for reuse developed from first principles (Krueger, 1992).

4.3 Application of a solution

Some academically developed software has been accepted into the GNU (GNU's Not Unix) open source system. The GIFT (GNU Image-Finding Tool), the subject of a case study in this project, is a GNU framework for Content Based Image Retrieval Systems (CBIRSs) (The GIFT Team, 2002).

CDMS (Core Data Mining Software) (Fitzgibbon et al., 2002) is a framework for a collection of programs related to modeling data. CDMS makes use of Java libraries and then extends them into its own language, while still allowing plug-ins written in Java to be integrated. A number of projects currently plan to make use of CDMS either as a platform to build on, or as a simple method for managing input and output with the user (i.e. reuse of the interface, parsing and sorting abilities).

5 Repeated framework rebuilding

There is a loss of time "re-inventing the clock in order to adjust the time".

5.1 The CS problem

A PhD student wanting to extend a piece of software invented as a proof of concept will often find that they have to rebuild everything that came before them. Often the code is undocumented, close to unreadable (due to lack of commenting and sometimes as it was written in an old language) and critical to the new research. Many students have wasted large amounts of time rebuilding a system so they can add their small bit to it. Unfortunately this rebuilding is often also done without any attempt at design, documentation, or improved clarity of the old code. The student re-implementing it will complain of all the wasted time...yet the next PhD student who comes along will now have even more undocumented and poorly designed code to work through.

5.2 The SE solution

In software engineering journals there is talk of plug-ins, filters, frameworks and converters (Weis and Geihs, 2000; Baudry, LeHanh and LeTraon, 2000; da Silveira and Meira, 2000; Rakotonirainy, Bond, Indulska and Leonard, 2000). Software is designed to be extensible, and it is documented so it can be modified. The importance and size of the maintenance phase is today recognised and developers try to plan for it accordingly. In some cases maintenance can take up over 60% of the total development effort on a project (Pressman, 2001). Today's software developed by industry is often designed to be adjustable to tomorrow's as yet unknown needs. At the same time many companies are not able to justify spending money maintaining their libraries (as opposed to their products) in order to keep them fully compatible and up to date with new operating systems, standard libraries and other changing environmental factors. Many have turned to Java as a means of reducing some of this maintenance (Neighbors, 1998).

Another concept in use in industry is that of design patterns. Although not mainstream, it has become a core approach of many companies and given them tangible benefits (Seen, Taylor and Dick, 2000). The concept of design patterns is used to give developers a basic solution to common problems that their software may be trying to solve (Larman, 1998). The effect is such that "one person's pattern is another person's primitive building block" (Gamma, Helm, Johnson and Vlissides, 1995). In short, once a solution is found and recognised as a pattern, the next time this class of problem comes up the same solution can be reapplied rather than reinvented. The result is improved communication and knowledge sharing resulting in increased reusability, flexibility and compatibility (Seen et al., 2000).
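To make the idea of reapplying a recognised solution concrete, the following is a minimal sketch of one well-known pattern, Strategy, from Gamma et al. It is an illustration only: the class names and the scoring task are hypothetical and are not taken from any of the projects discussed here.

```python
from abc import ABC, abstractmethod


class ScoringStrategy(ABC):
    """Hypothetical interface: one interchangeable way to score a set of errors."""

    @abstractmethod
    def score(self, errors):
        ...


class MeanAbsoluteError(ScoringStrategy):
    def score(self, errors):
        return sum(abs(e) for e in errors) / len(errors)


class MaxAbsoluteError(ScoringStrategy):
    def score(self, errors):
        return max(abs(e) for e in errors)


class Evaluator:
    """Context class: written once and reused with any scoring strategy."""

    def __init__(self, strategy):
        self.strategy = strategy

    def evaluate(self, errors):
        return self.strategy.score(errors)


errors = [1.0, -3.0, 2.0]
mean_result = Evaluator(MeanAbsoluteError()).evaluate(errors)  # average size of error
max_result = Evaluator(MaxAbsoluteError()).evaluate(errors)    # worst-case error
```

A developer who knows the pattern by name can read Evaluator at a glance and add a new scoring rule without touching existing code, which is the communication and reuse benefit described above.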

The explicit concept of frameworks is also used. Two such examples are the KOM and SCAF frameworks (Rakotonirainy et al., 2000; Weis and Geihs, 2000). KOM used a common framework to manage components in the K desktop environment. It allowed communication between these components, and also the use of one component by another. KOM was based on CORBA and as a result allowed communication and reuse of components regardless of the languages used (Weis and Geihs, 2000). The creation of KOM also introduced a new design pattern, the "signals and actions pattern" (Weis and Geihs, 2000). SCAF (Simple Component Architecture Framework) was also based on CORBA and aimed to provide a simple general purpose component based framework (Rakotonirainy et al., 2000).

5.3 Application of a solution

If research software can be built in a maintainable manner, it may be possible to use an existing framework and add to it rather than building from scratch. The CDMS project taking place at Monash, which aims to provide data capture and output as well as user interface and scripting, is an attempt to build such a framework. The CDMS framework is in this way similar to the KOM and SCAF frameworks. The concept of reuse is, in CDMS, also being incorporated through the use of standard reusable model components. These CDMS models will incorporate the basic behaviours, while allowing new researchers to add their own features, such as new search methods.
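The general shape of such a framework, in which a base component supplies shared behaviour and later researchers plug in their own search methods, might be sketched as follows. This is a hypothetical illustration in Python and does not reflect the actual CDMS design or API; every name in it (SEARCH_METHODS, register_search, Model and the two search functions) is invented for the example.

```python
# Hypothetical plug-in registry; none of these names come from CDMS.
SEARCH_METHODS = {}


def register_search(name):
    """Decorator that adds a search function to the framework's registry."""
    def decorator(func):
        SEARCH_METHODS[name] = func
        return func
    return decorator


class Model:
    """Base component holding the shared behaviour (a cost function and fit)."""

    def __init__(self, cost):
        self.cost = cost  # function mapping a candidate to a number

    def fit(self, candidates, search="exhaustive"):
        # Look the search method up by name, so new ones need no change here.
        return SEARCH_METHODS[search](self.cost, candidates)


@register_search("exhaustive")
def exhaustive(cost, candidates):
    # Supplied with the framework: evaluate every candidate, keep the cheapest.
    return min(candidates, key=cost)


@register_search("first_improvement")
def first_improvement(cost, candidates):
    # Added later by a researcher: return the first candidate that beats
    # the initial one, or the initial one if none does.
    best = candidates[0]
    for candidate in candidates[1:]:
        if cost(candidate) < cost(best):
            return candidate
    return best


model = Model(cost=lambda x: (x - 3) ** 2)
best_exhaustive = model.fit([0, 1, 2, 3, 4, 5])
best_quick = model.fit([0, 1, 2, 3, 4, 5], search="first_improvement")
```

The second search method is added without editing Model or the first method, so a new student extends the system rather than rebuilding it.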

The concept of patterns can also be adapted to academic research. Providing (perhaps publishing) patterns for such concepts as model finding using MML might make it easier for other researchers to implement the MML concept in their own work. Solving the framework problem could potentially also solve the knowledge transfer problem by effectively lowering the barriers against new or different techniques.

6 Guaranteeing Authenticity

The issue of the authenticity of research results.

6.1 The CS problem

Testing proves that the software does what it is supposed to: that it meets its specification. The key to research software is that it faithfully implements the algorithms and concepts being tested.

There have in the past been cases where software intended to prove a thesis has been knowingly written using a method other than that purported in the related paper (Korb, 2002). Possibly more common is the case of simple bugs. The algorithm may be implemented incorrectly and as a result give a better or worse result than a correct implementation would. The outcome could be an incorrect analysis, either arguing against an improved method or advocating a poor one.

By creating suitable documentation of how the software actually works, the author can more easily test their program for correctness and the scientific community can more easily check that the algorithms used are those stated in research papers.
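One way such a check might look in practice is sketched below: an optimised routine, of the kind found in research code, is compared against a direct transcription of the documented formula. The example is illustrative only. The choice of population variance as the "published" formula, the use of Welford's single-pass method as the optimised version, and all function names are assumptions made for the sketch.

```python
import math


def variance_reference(xs):
    """Direct transcription of the documented definition (population variance)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def variance_online(xs):
    """Optimised single-pass version (Welford's method), as research code might use."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / len(xs)


def agrees_with_reference(data_sets, tol=1e-9):
    """True if the optimised code matches the documented formula on every data set."""
    return all(
        math.isclose(variance_online(d), variance_reference(d),
                     rel_tol=tol, abs_tol=tol)
        for d in data_sets
    )


checks_pass = agrees_with_reference(
    [[1.0, 2.0, 3.0, 4.0], [5.0], [2.5, 2.5, 2.5], [-1.0, 0.0, 10.0]]
)
```

Because the reference function reads almost line for line like the formula in the paper, a reviewer can confirm it by inspection, and the comparison then gives some assurance that the faster code computes the same thing.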

7 Intermediate software

Dangers with the use of "intermediate software" that has not yet been stabilised.

7.1 The CS problem

As mentioned before, the proof of concept software is often not really ready to be used for real work. This aside, it is often the only software that will do the job. Further research leads to further work and slowly the system grows and becomes more and more unmanageable. As research funding isn't intended for re-implementing old systems, these systems are often only redesigned when new research can be incorporated. This reinforces what Brooks calls the "second system effect" (Brooks, 1995). The second system effect is the desire to include all the frills left off the first version in the second. The end result is a different, larger system that is really still a first generation proof of concept system.

The EDRC workshop noted that the first generation of the software had to be re-written, in some cases many times, before it was accepted by industry. They also noted that industry was aware of this and was as a result reluctant to use university produced software, even if it was free (Steier et al., 1993).

7.2 The SE solution

Brooks himself provided a list of things to do or avoid in order to prevent a second system effect. The list includes: avoiding the desire to now include all the "frills" left out in the first design, hiring a senior architect who has been involved in at least two large projects (and hence has already experienced the problems particular to second systems), keeping in mind the aims of the original project explicitly, and generally being aware of the "special temptations" of second projects. Most of these he learnt himself the hard way (Brooks, 1995). The software development life cycle was first modelled as a waterfall by Royce in 1970 (Royce, 1970). This form of development encourages the developer to build the system stage by stage and never to go back to an earlier stage (later adaptations such as the fountain model made allowances for feedback). This desire to finish the project before accepting change and to make sure all changes are included (so a third version is not needed) is a large contributor to the second system effect experienced by Brooks.

Requirements in industry change constantly, and often a developer is simply not able to put off changes until later or to leave out what the client considers useful features. One solution adopted in industry was Boehm's Spiral Model, which allowed for rapid improvements using an incremental model and prototyping while still retaining the systematic nature of development that had been gained through the waterfall model (Boehm, 1988; Pressman, 2001).

7.3 Application of a solution

The use of a life cycle model that allows, but controls, change may have significant advantages for research. One possible implementation would be a model with minor increments a number of times per project (i.e. student), and major ones over different projects. This has the advantage that feedback can be transferred between projects, and while the second system effect can be avoided (by moving some of the new ideas forward to the next major revision), the code being built is explicitly understood to be both a working version and a framework for further work (though not at the same time). As one does not know which future directions research may take, the assumption must always be that there is going to be another project.

8 Creative Problem

8.1 The CS problem

Computer science is about modelling problems in a format that a computer can work with in order to get a solution. The creative process of working out new ways of modelling, that is, new algorithms or abstract data types, is the real challenge of computer science.

8.2 The SE solution

Software engineering is about patterns: enabling designers to use best practice to complete a task in a repeatable fashion (Larman, 1998). The actual process of creative invention is not something that software engineering deals with. In his famous paper "No Silver Bullet" (Brooks, 1986), Fred Brooks says that "the complexity of software is an essential property, not an accidental one." He goes on to explain that software is pure "thought stuff", and that abstracting away its conceptual complexity is often the same as abstracting or wishing away the software itself (Brooks, 1986). Without its complexity, software is useless. Because software is already a model, every part of it must be understood in full detail by at least one person, or it cannot be coded. (Compare this to the natural sciences, where meaningful results can still be found if parts of the systems involved are unknown.)

The most valuable part is often not the part that takes the most time. "With any creative activity comes dreary hours of tedious, painstaking labor, and programming is no exception," Brooks acknowledges (Brooks, 1995). The claim that research is pure inspiration, and that the research process therefore cannot be improved, is one that we reject. The question is not whether the process can be improved, but rather how and at what cost.

8.3 Application of a solution

Facilitating discussion and mutual understanding, both within research groups and within the field at large, would increase the amount of input researchers have to work with. The advent of ARPANET, and later its successor the Internet, has already shown this. Further progress on the communication and knowledge transfer problems would again increase the knowledge people have to work with. While this would provide fertile ground for ideas to grow in, it would not itself create anything.

Another potential indirect gain to the creative process may come from reducing the hours of "painstaking labor" that Brooks referred to. Freeing up more time for leading researchers to develop their ideas should increase overall scientific output. It remains to be seen whether the initial investment in recording the knowledge would eventually be recouped. Even if the system is documented exceptionally well, a certain amount of clarification will always be unavoidable and invaluable.

While the side effects of an improved process may remove many of the impediments to creativity, the initial creative process itself is something that cannot be artificially improved at this time.

References above can be seen in the Bibliography.