Research proposal
School of Computer Science and Software Engineering
Monash University
Bachelor of Computer Science with Honours (CSE4300), Clayton Campus

Contents

1 Introduction
2 Research Context
2.1 The problem of understanding research software (as a developer)
2.2 The difficulty in understanding other research
2.3 The difficulty in getting leading edge concepts understood
2.4 The design documentation generation gap
2.5 Loss of time re-inventing the wheel one wants to use
2.6 Loss of time re-inventing the clock in order to adjust the time
2.7 The authenticity of research results issue
2.8 The use of “intermediate software”
3 Research Plan and Methods
3.1 Research Methodologies
3.1.1 Literature Review
3.1.2 Survey1 - Northern Hemisphere
3.1.3 Survey2 - Southern Hemisphere
3.1.4 Interviews
3.1.5 Case Studies
3.1.6 Metrics and Evaluation of Results
3.2 Proposed thesis chapter headings
3.3 Timetable
3.4 Special facilities required
3.5 Potential difficulties
4 Relevance of Proposal
5 Bibliography

1 Introduction

By 1968 the “Software Development Problem” was seen as so severe that the NATO Science Committee sponsored a conference in Garmisch, Germany, to discuss it [NR69]. The conference, titled “Software Engineering”, was the first in the field. The following year saw another conference, this time in Rome, which centered on the question of how to make the development of computer applications and programs more engineering-like. The idea of standardized parts that could be reused was raised [RB70]. Since these early days software engineering, first as a field of computer science and now, some contend, as a field in its own right, has tried to turn the art of programming into an engineering discipline. To a large degree it has succeeded.
While software engineering has been applied in industry to help solve the software crisis, and software engineering education in academia is producing ever greater numbers of software engineers, the academic research that forms the core of computer science and the basis of software engineering does not appear to use software engineering principles [RR98, Pre]. Some research has examined the difference between what students learn and what they apply [RR98, Hum98], but we are not aware of any research that has asked the same question of academic researchers in computer science. This question needs to be answered, both for the potential benefit to researchers and for its secondary effect on the students who emulate them. Watts Humphrey of the Software Engineering Institute asks “Why don’t they practice what we preach?” [Hum98]. In the USE CSR project we ask “Why don’t we practice what we teach?” The research will therefore look at the following questions.
Does the use of software engineering in research lead to improved research outcomes? If it does, why is it (in general) not being undertaken? Leading experts disagree on the effectiveness of software engineering in computer science research [Pre, Bro, SCS93], yet a silver bullet in the form of improved software engineering, and particularly reuse, may be passing the computer science research community by.

2 Research Context

Academic software development suffers from the same problems as commercial development. This was pointed out clearly at a 1993 workshop [SCS93] held by the Engineering Design Research Center (EDRC) of Carnegie Mellon University, a leading academic institution in software engineering and home of the Software Engineering Institute. The workshop discussed the role of software in disseminating new engineering design methods, and its attendees included industrial affiliates as well as staff and students. It looked at numerous case studies of software development and suggested that “the understanding and knowledge now exist, and the time is ripe, for adopting a more comprehensive approach to software development - even within a research setting - and for establishing a better infrastructure for software design, maintenance and reuse” [SCS93]. While very little work has investigated the benefit of applying software engineering principles in computer science research at academic institutions, much work has been done on applying this knowledge in industry. A number of problem areas have been identified in the process of software development in industry, and over time the field of software engineering has produced solutions to many of them.
Although the development environments of a university research group and a commercial software company are vastly different, in many cases either the same problems exist (albeit on a different scale), or different problems exist that can nonetheless be solved using existing methods from industry. The problems discussed in this proposal are those covered in Sections 2.1 to 2.8.
For each problem, the possible solutions from industry and the potential benefit of solving it are discussed below.

2.1 The problem of understanding research software (as a developer)

Industry has for many years struggled with the complexity of large software systems. While academic research projects are typically small programs by industry standards, they often involve considerable complexity because of the nature of the research. A small research program can be as hard to understand, because of the concepts involved, as a large but fairly generic transaction processing system developed in industry.

Industry’s response to the difficulties of large software projects was the creation of formalised system analysis and design languages and methods. These enabled professional developers to discuss design and development at a level above the code, and allowed outside experts to come in and rapidly obtain an overall picture of the system [FS97]. In a university computer science department, research theory is often discussed at an abstract theoretical level, which provides many of the benefits industry seeks through formal design languages. Often, however, this theory is complex, specialised and highly abstract. A new researcher may have access to software that implements the theory and to papers discussing it, but without design documents that show why the software is an implementation of the theory (and, through the software, how the theory is applied), the researcher may have to spend a long time reconciling theory and application through experiment.

2.2 The difficulty in understanding other research

Unlike most commercial software development, university research aims to increase public knowledge.
Just as a researcher new to a project can use the software, research papers and design documents to come up to speed faster, so too can an outside researcher gain a faster understanding of the work if they have access to the software’s design documents. Making design documents public will make it easier for others to reproduce the software or incorporate it into their own work. Whereas commercial developers aim to protect their software, this is often the very purpose of university research: the desire to share knowledge is why algorithms are published.

2.3 The difficulty in getting leading edge concepts understood

Not only will other researchers be able to follow the concepts more easily, but those in industry who might make use of the research will be speaking the same design language. Even if many of them do not understand the technical implications of the research, they will still be able to implement the work from the documentation. It is not the first version of the software (the proof of concept version) but the second that is likely to be used. As Fred Brooks suggested, “plan to throw one away; you will anyhow” [Bro95]. Allowing industry to build this second version will help more ideas make the transition from proven theory to usable products. The EDRC workshop takes this further, suggesting that the characteristics of the development process, and specifically adherence to software engineering standards, have a large influence on knowledge transfer from the university setting to industry or other academic institutions [SCS93]. In other words, industry is not only better able to understand the work but also more likely to take the time to look at it.

2.4 The design documentation generation gap

For a project team to work well together, either everyone must speak the same technical language or the group must resort to avoiding technical language.
In the case of a design language, avoiding the language is most easily achieved by avoiding design work altogether.

In industry today many jobs require applicants to have specific knowledge of design languages, for example UML, and companies often provide ongoing education to their employees. An early example of this for software engineering knowledge was AT&T Bell Laboratories, Merrimack Valley. The company ran a one-off two-year course (on company time) that gave staff without a computer science background a grounding in computer science and software engineering. This ensured consistent design knowledge across the group and, by allowing other employees to take selected units, established a common level of knowledge across the company. The program was motivated by a vast increase in the production of software systems and a desire for staff who were both experienced with the old mechanical and electrical equipment paradigm and able to work reliably and professionally in a software environment [CM87]. Just as AT&T Bell Laboratories shifted to software, the IT industry (including its education sector) has undergone a rapid change, in this case from function-oriented to object-oriented (OO) programming. Industry has been working to manage this “paradigm shift” [FS97] and to retrain staff, while in universities some have still not made the transition.

2.5 Loss of time re-inventing the wheel one wants to use

Some sciences have long practised reuse in their computing work. The Numerical Recipes books are a well-known example. The original book was written between 1981 and 1985 by a group of physicists then all working in universities.
The book was inspired by the gap the authors saw between best practice, “as exemplified by the numerical analysis and mathematical software professional communities” [PT97], and average practice, “as exemplified by most scientists and graduate students” [PT97], and aimed to close it.

In industry, programming languages such as Java are being promoted partly because of their extensive libraries of common functionality. The Microsoft Foundation Class (MFC) library was similarly created to reduce the work needed for common tasks. In universities there is still a tendency to do everything from scratch: casual observation shows students learning to code basic components such as linked lists, and then coding them again from scratch year after year.

2.6 Loss of time re-inventing the clock in order to adjust the time

The software engineering literature discusses plug-ins, filters and converters [WG00, BLHLT00, dSM00]. In that setting, software is designed to be extensible and is documented so that it can be modified.

In contrast, a PhD student wanting to extend software written as a proof of concept will often find that they have to rebuild everything that came before. Often the code is undocumented, close to unreadable (because it lacks comments, and sometimes because it was written in an old language) and critical to the new research. Many students have wasted large amounts of time rebuilding a system so that they can add their small contribution to it. Unfortunately, this rebuilding is often done without any attempt at design, documentation or making the old code easier to understand. The student re-implementing it will complain of all the wasted time, yet the next PhD student who comes along will have even more undocumented and poorly designed code to work through.

2.7 The authenticity of research results issue

Testing provides evidence that software does what it is supposed to, that is, that it meets its specification.
The key requirement for research software is that it faithfully implements the algorithms and concepts being tested. There have been cases in the past where software used to support a thesis knowingly implemented a method other than the one described in the thesis. Possibly more common are simple bugs: an algorithm may be implemented incorrectly and, as a result, give a better or worse result than a correct implementation would. The outcome could be an incorrect analysis that either argues against an improved method or advocates a poor one. By creating suitable documentation of how the software actually works, the author can more easily test the program for correctness, and the scientific community can more easily check that the algorithms used are those stated in the research papers.

2.8 The use of “intermediate software”

As mentioned earlier, proof of concept software is often not ready for real work. Nevertheless, it is often the only software that will do the job. Further research leads to further work, and the system slowly grows and becomes increasingly unmanageable. As research funding is not designed for re-implementing old systems, these systems are often redesigned only when new research can be incorporated. This reinforces what Brooks calls the “second system effect” [Bro95]: the desire to include in the second version all the frills left off the first. The end result is a different, larger system that is really still a first-generation proof of concept system.

The EDRC workshop noted that first-generation software had to be rewritten, in some cases many times, before it was accepted by industry. It also noted that industry was aware of this and, as a result, was reluctant to use university-produced software, even if it was free.
3 Research Plan and Methods

3.1 Research Methodologies

3.1.1 Literature Review

This will include a review of methods currently used outside university research settings that may be of use to researchers. It will also serve as a first step towards creating the surveys. The review will cover a broad range of IT and software engineering sources as well as journals from other scientific fields. It has been suggested that fields outside computer science have been coping with the same problems in their own research software, in some cases doing a better job (e.g. Numerical Recipes in physics).

3.1.2 Survey1 - Northern Hemisphere

This will be the first of two surveys. It focuses both on the researcher and on a project, nominated by the researcher, that they have worked on.

The survey aims to find out:
On the project side, the survey aims to discover:
3.1.3 Survey2 - Southern Hemisphere

This will be based on the first survey, but questions may be altered slightly or added based on what is learnt in the Northern Hemisphere survey. It is anticipated that the second survey will use a mostly Australian sample.

3.1.4 Interviews

A number of in-depth interviews will be carried out with supervisors, students and end users. These will aim to discover the problems that can arise when carrying out scientific research that relies on software as a proof of concept or as a means of analysing data. The interviews may include researchers outside computer science who rely on proof of concept software to complete their own work. Cases where software engineering has had a positive impact will be sought out and discussed.

Interviews will also be carried out with leading experts in software engineering, in academic or non-academic environments, to seek their opinions.

3.1.5 Case Studies

A small number of case studies will be used. These will look at similar issues to the survey, but will also analyse the methods used and the software produced using accepted software engineering metrics (e.g. program size, composability [BSPSS00], coupling [PD00], and compatibility of design documents with the programming language used [STD00]). The case studies will be used to show where software engineering methods have worked, which methods were used and what outcomes were gained as a result. They will also look at where software engineering could have helped achieve better outcomes or, in some cases, avoid disaster. The methods that could have helped will be drawn from the survey results as well as from the literature on methods applied in industry.

Software-related research projects that have been suggested as case studies include Gift, CaMML, NAG, CDMS and Snob. One or more of these will be used, along with other research projects that have run to completion.
It is expected that software engineering will increase the lifetime of projects through improved knowledge retention, higher reusability of software components, increased use of reusable components (and well-documented skeleton programs), and the resulting decrease in project complexity. It is also expected that improved software engineering practices will result in postgraduate students spending less time recoding existing software and more time on new research, which should allow them to make larger contributions. It is hoped that evidence for this, or for its complement (that a low level of software engineering can lead directly to poor outcomes), can be found and presented.

3.1.6 Metrics and Evaluation of Results

The final results from the survey will look at outcomes including the number of papers published on the research; the number of modifications and additional projects built on the main project (for example summer vacation work, honours projects and PhD projects); the number of students who go on to further research related to the project after their initial project is finished; the completion rate of higher degree by research students involved in the project; the number of publications by students involved in the project; and the perceived impact as noted in the survey and case studies.

The results will support the thesis that software engineering can improve computer science outcomes if:
3.2 Proposed thesis chapter headings
3.3 Timetable
3.4 Special facilities required

No special facilities are required.

3.5 Potential difficulties

A large sample size will be needed for the survey. As a precaution, the survey has been divided in two and will cover both the USA and Australia rather than just Australia as originally planned. This will increase the sample size and allow the survey to be corrected should any problems become apparent in the first round.

4 Relevance of Proposal

Existing research looks at how software engineering methods of various types can improve commercial software development. Very little work has been done to investigate how software engineering can assist in a research environment. This research examines the impact software engineering can have in a university computer science research environment, and how this can affect research outcomes.

5 Bibliography

References

[BLHLT00] B. Baudry, V. Le Hanh, and Y. Le Traon. Testing-for-trust: the genetic selection model applied to component qualification. Technology of Object-Oriented Languages (TOOLS), 33:108-119, 2000.
[Bro] Frederick P. Brooks, Jr. Re: Use of software engineering in computer science research. Email to Andre Oboler.
[Bro95] Frederick P. Brooks, Jr. The Mythical Man-Month. Addison-Wesley, 1995.
[BSPSS00] D. Beuche, W. Schroder-Preikschat, O. Spinczyk, and U. Spinczyk. Streamlining object-oriented software for deeply embedded applications. Technology of Object-Oriented Languages (TOOLS), 33:33-44, 2000.
[CM87] J.C. Cleaveland and R.W. MacDonald. The Computer Science Education Program at AT&T Bell Laboratories, Merrimack Valley. In R. Fairley and P. Freeman, editors, Issues in Software Engineering Education, pages 475-493. Springer-Verlag, 1987.
[dSM00] G.E. da Silveira and S.L. Meira. Codelivery: an environment for distribution of reusable components. Technology of Object-Oriented Languages (TOOLS), 33:371-382, 2000.
[FS97] Martin Fowler and Kendall Scott. UML Distilled. Addison-Wesley, 1997.
[Hum98] Watts S. Humphrey. Why don’t they practice what we preach? Annals of Software Engineering, 1(4):201-222, 1998.
[NR69] Peter Naur and Brian Randell, editors. Software Engineering: Report on a conference sponsored by the NATO Science Committee, Garmisch, Germany, 7th to 11th October 1968. Scientific Affairs Division, NATO, 1969. http://www.cs.ncl.ac.uk/people/brian.randell/home.formal/NATO/.
[PD00] G. Poels and G. Dedene. Measures for object-event interactions. Technology of Object-Oriented Languages (TOOLS), 33:70-81, 2000.
[Pre] Roger Pressman. Re: SE practice in comp. sci. research. Email to Andre Oboler.
[PT97] William H. Press and Saul A. Teukolsky. Numerical Recipes: does this paradigm have a future? Computers in Physics, 11(5):416-425, 1997.
[RB70] B. Randell and J.N. Buxton, editors. Software Engineering Techniques: Report of a conference sponsored by the NATO Science Committee, Rome, Italy, 27-31 Oct. 1969. Scientific Affairs Division, NATO, 1970. http://www.cs.ncl.ac.uk/people/brian.randell/home.formal/NATO/.
[RR98] Pierre N. Robillard and Martin P. Robillard. Improving academic software engineering projects: A comparative study of academic and industry projects. Annals of Software Engineering, 6:343-363, 1998.
[SCS93] D. Steier, R. Coyne, and E. Subrahmanian. Software doesn't transfer, people do - (and other observations from an EDRC workshop on the role of software in disseminating new engineering methods). Technical report, Carnegie Mellon University, 1993.
[STD00] M. Seen, P. Taylor, and M. Dick. Applying a crystal ball to design pattern adoption. Technology of Object-Oriented Languages (TOOLS), 33:443-454, 2000.
[WG00] T. Weis and K. Geihs. Components on the desktop. Technology of Object-Oriented Languages (TOOLS), 33:250-261, 2000.