From timm@cse.unsw.edu.au Tue Jul 23 09:51:32 1996 Date: Tue, 23 Jul 1996 00:24:05 +1000 (EST) From: Tim Menzies To: Vasilios Tourloupis Subject: Re: Aims and Significance Vasilios Tourloupis writes: > > On Mon, 22 Jul 1996, Tim Menzies wrote: > > > > > if you can make a case that abduction is sueful > > then your thesis is useful > > else its useless. > > > > get it? > > > What do you mean by useful? Do you mean 1) "useful" as in abduction (HT4) > can be applied to many KL activities? Or do you mean 2) "useful" as in > abduction (HT4) being practical for the theory sizes that are seen in > practice? stress 1). mention 2), but less than 1). 1) is what will catch your reader. > BTW - What follows are the Smythe 89 results chapter, and the > Optimisations chapter of my thesis. Please comment as soon as > possible. > > Regards, > > Bill Tourloupis > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > | Vasilios E. Tourloupis Tel: (03) 9903 1470 (F5.25) | > | vasilios@insect.sd.monash.edu.au Tel/Fax: (03) 9546 7643 (AH) | > | vasilios@hestia.cc.monash.edu.au | > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > | Disclaimer: There are some that call me Bill! | > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > > \section{The Smythe '89 study} > The previous chapter we gave a detailed description of the HT4 algorithm. > In this chapter, we discuss the base results which will be used as a benchmark > to optimise the HT4 algorithm. We will demonstrate that although developing > and executing HT4 on a faster platform has improved its performance, developing > it in C/C++ has not significantly affected its performance. > > The Smythe '89 study was an attempt to repeat the results obtained > by Menzies \cite{MENZIES95}. The experiment was based on the Smythe '89 > model of glucose regulation \cite{SMYTHE89}. The HT4 algorithm executed > 870 comparisons in 12343 seconds (approximately 206 minutes) on a Macintosh > Powerbook 170 under System 7.0.1. On an average, 45\% of the outputs > were found to be explicable. This was an improvement over the original > Smythe '89 study, where the QMOD/JUSTIN algorithm took two days to execute > the model on a Macintosh SE (a machine that is four times slower than the > Powerbook used to develop and execute the HT4 algorithm). > > A Prolog implementation of the HT4 algorithm, on an Intel Pentium PC running > at 133MHz, further reduced the execution time of the algorithm to 755 seconds. > A bug fix brought the average percent explicable to 40.5\%. This is a > significant increase in efficiency over the Smalltalk implementation of the > algorithm. This result suggest that the efficiency of the algorithm is, > indeed, dependent on the platform it is developed and executed. > > The C/C++ implementation of the HT4 algorithm, developed and executed on an > Intel Pentium PC, running at 133MHz, has further reduced the execution time > of the algorithm to 184 seconds. Although this is a significant reduction, > over the Smalltalk version of HT4, this result is not as significant, compared > to the Prolog version. This is disappointing, since we anticipated a greater > order of magnitude reduction in execution time over the Prolog version. > This result does demonstrate, however, that the overall efficiency of the > HT4 algorithm is language independent. We believe that developing the > algorithm, using a more efficient language (for example, assembler), on > the same platform used to develop the C/C++ implementation, will not > significantly improve the performance of the algorithm. what is required is a more basic analysis about the computational complexity of the program. > ------------------------------------------------------------------------ > > \chapter{Optimisations} > The previous chapter we demonstrated that developing and implementing HT4 > on a faster platform has significantly improved its performance, while > developing it in C/C++ had not significantly affected its performance. > In this chapter we demonstrate that we are tantalizing on the limits to ---------------------------------------------^^^^^^^^^^^ ------------------------ what does this word mean in this context? > testing, by using the Smythe 89 results obtained in the previous chapter > as a benchmark to optimizing the HT4 algorithm. > > This chapter is divided into four sections. The first section discusses > code optimisations and results achieved by using the GNU C++ profiler, > textbf{gprof}, to optimise the source code. The second section discusses > algorithmic optimisations and results achieved by applying ``clever'' > optimisations. The third section discusses a change to the specifications, > which significantly increased the efficiency of the algorithm. The final > section discusses the results obtained. > > Note that the results of each program optimisation were generated on an Intel > Pentium computer running at 133MHz. Some results were not producable on the > Pentium, due to a lack of virual memory space. In such cases, the results were > produced on the Sparc workstation. One may argue that the results of these > experiments are incomplete, or inconclusive, as a result. Our argument is > that, except for the total execution time, the results produced are similar, > whichever platform HT4 is executed on\footnote{In fact, the Intel Pentium PC, > running at 133MHz, produced better runtimes than the Sparc workstation, by a > factor of 2}. Furthermore, measurements such as the number of assumptions, > the number of controversial assumptions, the percentage of reachable vertices, > etc., remain constant, whichever platform HT4 is executed on. > > > \section{Program Optimizations} > \subsection{Code Optimization} > The initial version of the HT4 program terminated with a segmentation fault, don't mention this intial version, yeah, there were start up problems. but a mention of these does nothing for the bsaic argument > due to a lack of virtual memory. Even worse was the fact that it had taken > well over 3 hours to execute about 89 percent of the Smythe 89 model. As we > will discuss later, the lack of virtual memory was due to a memory leak. As > for the execution time, the GNU C++ profiler revealed that this was mainly > caused by a function call to the C++ template function pow(), of the bitstring > class. Since we were taking powers of 2, we were able to replace the pow() > function with the left shift operator, so that pow(2,5) would be equivalent > to 1 shifted 5 bits to the left. Table ~\ref{TABLE1} shows that after using > the GNU C++ profiler to optimise HT4, we were able to reduce the total run > time of the program to 773 seconds. Bug fixes and minor modifications, > however, increased the total run time to 811 seconds. This is still a > significant increase in efficiency, compared to the initial version. > Level 3 compiler optimisation reduced the total runtime of this version > even further to about 568 seconds. > > The next significant increase in efficiency occurred by replacing template > lists with intrusive lists \cite{STROUSTRUP95}. Intrusive lists are lists > that contain pointers to specific types. Table ~\ref{TABLE1} shows that > by replacing template lists with intrusive lists, we were able to reduce > the total run time to 362 seconds. > > \subsection{Memory Usage} > As mentioned above, the initial implementation of the HT4 algorithm, failed > to terminate due to a lack of virtual memory. Later experiments revealed that > the program required about 112 megabytes of virtual memory for the program to > produce any useful results and terminate. This was primarily caused by > recursive functions failing to de-allocate redundant structures in memory, > after having completed their task. Table ~\ref{TABLE1} shows that after > initial memory optimisations, we were able to reduce the memory usage to > 31 megabytes. > > This memory optimisation also reduced the execution time by 10 seconds. > Although not significant, it led us to believe that less virtual memory > (swap disk space) the program used, the more efficient it could run. > Further memory and code optimisations, brought the memory usage down > to 25 megabytes, however, it did not significantly reduce the execution > time, as table ~\ref{TABLE2} illustrates. The fact that the program > was still using 25 megabytes of swap disk space, suggested that there > still existed a memory leak. > > > \section{Algorithmic Optimisations} > \subsection{Transitive Closure} > The transitive closure study was motivated by the fact that more time was being > spent in the forward sweep of proof generation. The idea behind this study was > to reduce the forward sweep by calculating the transitive closure of all the > vertices in the dependency graph. The program could then mark vertices as > relevant, in linear time \cite{SEDGEWICK}. The disadvantage of this method is > that it ignores \text{and} vertices, thus increasing the number of reachable > vertices. > > Table ~\ref{TABLE2} illustrates that although the transitive closure > implementation has reduced the forward sweep, it has increased the backward > sweep. Furthermore, it has increased the total run time, considerably. > Tests have confirmed that this increase in execution time is due to the > increase of reachable vertices, in the forward sweep. Where other versions > of HT4 are culling 21\% of the search space in the forward sweep, on average, > this version is culling 61\% of the search space in the forward sweep, on > average, thereby slowing down proof generation, considerably, in the backward > sweep (see table ~\ref{TABLE2}). > > Note that due to memory leaks, this version is also using a lot more virtual > memory than the normal C/C++ version. This is directly related to the increase > in the relative time spent in the backward sweep. For this, and other reasons, > we suspect that most of the memory leaks occur in the backward sweep of the > HT4 algorithm. tthis is all on transistive closure? MORE!! a diagram showing how we find in* before and after the transistive closure trick. note that the current forward sweep is and-aware while the transistive closure trick is not: therefore more in*. more "a" more "ac" more work by abckwards sweep. > > \subsection{Single World Assumption} > The single world assumption study was motivated by the fact that most of > the proofs being generated were consistent with each other, thus resulting > in single worlds being generated. The idea behind the single world > assumption implementation was to eliminate the sorting of proofs into an, > otherwise, single world by forcing the generation of consistent proofs. > This was done by isolating the controversial assumptions, and environment > of vertices which have lead to successful proof generation. That way, > only proofs which use the same controversial assumptions, and environment > can be generated. this is all on single world? no no no. show the psueduo code changes (i.e. very small). expand the above paragraph with lottsa diagrams showing how contraversial assumptions are handled without/without SWA. > Unlike the transitive closure study, this study has revealed a limited > success. Table ~\ref{TABLE2} illustrates that on an Intel Pentium PC, > running at 133 MHz, we have been able to reduce the total run time by > approximately 30 seconds. This is not a significant reduction in > performance, however, what is interesting is that the program produced > similar results to that of the Smythe 89 study. This result, and the similiar % explicables, not just "similiar results". that is, we can decrease the runtimes without compromising the logical compeltenss of the program > very nature of the single world assumption implementation failing to > generate inconsistent proofs, prompted a further study, whereby the > parent edges of vertices were randomized. The idea behind this study much more here on the randomisation studies > was to force proofs, with different environments, to be generated, > and the differences in results compared. Table ~\ref{TABLE3} illustrates > that the standard deviation of results, produced by the single world > assumption implementation, is minimal. > > > \section{Change to the Specifications} > Upon reviewing the pseudo-code, we realized that the function which marks > the controversial assumptions, in the forward sweep, could be implemented > more intuitively, the details of which are discussed in chapter ~\ref{PSEUDO}. ^^^^^^^^^^^^^^^^^^^???????????????^^^^^^^^^^^^^^^^^^^^^^ more details here or what isgoing in Psoedo. did u tell me about this stuff? > It came as no surprise that we were able to reduce the total run time of the > algorithm, from 180 seconds to 114 seconds, for the Smythe 89 model, on an > Intel Pentium PC running at 133 MHz (see table ~\ref{TABLE2}). This is just > over a third of the total run time! What came as a surprise was the fact that > this modification had reduced the forward sweep to about 3 percent of the total > execution time. So, where the transitive closure modification failed, this > modification succeeded! > > Further tests confirmed that about 98 percent of the forward sweep was spent > on marking controversial assumptions, in the normal C/C++ implementation, while > this modification was able to reduce this relative time to about 8 percent > (see table ~\ref{TABLE4}). This illustrates that most of the forward sweep > was spent on marking controversial assumptions. We suspect that the overhead > of the set union and set difference operations were the major cause of this > slow down. > > \section{Discussion of Results} > Developing the algorithm using a more efficient language, has genarally > improved the efficiency of the algorithm by a factor of 4, compared to the > Prolog version, running on the same platform. This is disappointing, since > we were anticipating a much greater order of magnitude increase in efficiency. > This result does demonstrate, however, that there is not a significant > difference between the languages used to implement HT4. We believe that > developing the HT4 algorithm, on the same platform, using a more efficient > language, such as an assembler, will not significantly increase the efficiency > of the algorithm. > > The fact that most of the execution time was being spent in the forward sweep, > would explain the limited success of the single world assumption implementation, > which was an attempt to improve the efficiency of the algorithm by optimising > the backward sweep. This would also explain the greater success of changing > the specifications, which improved the general efficiency of the algorithm, > by optimising the forward sweep. > > We have demonstrated that optimising the HT4 algorithm for efficiency, has > generally improved its performance, with the exception of the transitive > closure trick. We believe, however, that it is unlikely that algorithmic > optimisations will ever significantly increase the efficiency of HT4. > This is because abductive inference, which HT4 uses to validate models, > is intractible in nature \cite{SELMAN}. And although the above results > do not stress a limit to testing, we believe that we have tantalizingly ---------------------------------------------------------------^^^^^^^^^^^^^^^ > increased the limits to testing, by increasing the efficiency of the > program. > > > -- Dr. Tim Menzies rm EE339 | timm@cse.unsw.edu.au | "But what... is it good for?" ,-_|\ www.cse.unsw.edu.au/~timm | -- Engineer at the Advanced / \ AI Dept, Com. Sci. & Eng. | Computing Systems Division \_,-._* Uni.NSW, PO Box 1, Kensington| of IBM, 1968, commenting v Sydney, NSW, Australia, 2033 | on the microchip. +61-2-385-4034(p)385-5995(f) |