Phase 3

 


This development phase was concerned with incorporating automated methods for learning and evaluating the model from the available data into the model development process.  To assess the suitability of the data for these tasks, a data quality analysis was conducted, identifying missing variables and assessing how completely the data covered the variable states.  Based on the results of this analysis, methods for incorporating the data into the parameters and causal structure were identified and implemented.  To test the validity of the model, predictive accuracy tests using the data were also performed.

Methods:

Automated learning of Model Components:

To improve parameter learning in the model, the support tools offered by Netica were investigated and tested for suitability.  The Netica environment supports several methods for learning parameters from case data files: the Lauritzen & Spiegelhalter, EM and gradient descent methods.  Both the EM and gradient descent methods allow data to be combined with the expert knowledge already in the CPTs by assigning an experience weight.  All of these Netica methods were investigated to determine how the data could best be incorporated into the existing model CPTs.

CaMML [5], a tool for automated learning of the causal model underlying a data set, was also applied.  To gain a better understanding of the data, and perhaps to generate useful feedback, a CaMML-generated model was created for comparison with the expert-generated model.

Predictive Accuracy:

To evaluate the model, its predictive accuracy on the case data was determined.  Predictive accuracy is measured by entering the information from each data case while withholding one or more values, and recording how often the model predicts the withheld values correctly.  When this evaluation is applied to models learnt from data, it is standard to divide the data into two sets, one for learning and one for testing, so that the same data is not used for both.  This measure does not assess how close to, or distant from, the correct value a prediction is; however, it is a good initial gauge of the model.
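The evaluation described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's own code: the function names, the 20% test fraction, and the majority-class predictor standing in for the BN are all assumptions made for the example.

```python
import random
from collections import Counter

def split_cases(cases, test_fraction=0.2, seed=0):
    """Shuffle the cases and split them into (training, testing) lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predictive_accuracy(predict, test_cases, target):
    """Fraction of test cases whose withheld target value is predicted correctly."""
    correct = 0
    for case in test_cases:
        # Withhold the target value; everything else is entered as evidence.
        evidence = {name: value for name, value in case.items() if name != target}
        if predict(evidence) == case[target]:
            correct += 1
    return correct / len(test_cases)

def majority_predictor(training_cases, target):
    """Hypothetical stand-in for the BN: always predicts the most common training value."""
    most_common = Counter(case[target] for case in training_cases).most_common(1)[0][0]
    return lambda evidence: most_common
```

In practice `predict` would wrap a query to the trained network, returning the most probable state of the target node given the evidence. Note that each prediction is scored only as right or wrong, which is why the measure says nothing about how far a wrong prediction is from the correct value.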

Results:

Bhattacharyya Distance Support tool:

To compare models before and after automated training, a support tool applying the Bhattacharyya distance measure was created.  The tool extends the Netica programming environment with an interface for comparing BNs with differing CPTs.
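For two discrete distributions p and q over the same states, the Bhattacharyya distance is -ln(sum over i of sqrt(p_i * q_i)); it is zero when the distributions are identical and grows as they diverge. The sketch below applies this to CPTs one row at a time. How the support tool itself aggregates or reports distances is not described here, so the row-by-row comparison and the dictionary layout of a CPT are assumptions for illustration only.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions over the same states."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of states")
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0:
        return math.inf  # no overlap between the distributions
    # Clamp to 1.0 so rounding error cannot produce a tiny negative distance.
    return -math.log(min(coefficient, 1.0))

def compare_cpts(cpt_before, cpt_after):
    """Distance for each CPT row, keyed by parent configuration (assumed layout)."""
    return {
        config: bhattacharyya_distance(cpt_before[config], cpt_after[config])
        for config in cpt_before
    }
```

Rows with the largest distances are the ones that training changed most, which makes them natural candidates for review with the domain experts.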

Changes to Quantitative Component:

Following the investigations into the effects of incorporating the case data files using the EM method, the variables with data case entries were given experience values of 1.  The variables with expert-elicited CPTs, if included in training, were given experience values between 5 and 10.
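The effect of these experience values can be illustrated by treating experience as an equivalent sample size: the existing CPT row counts as that many prior cases, and the observed cases are added on top. The sketch below is the simple complete-data counting update, offered as an assumption to show the intuition; it is not Netica's EM implementation, which also has to handle missing values, and the function name is hypothetical.

```python
def update_cpt_row(prior, experience, counts):
    """Combine a CPT row with observed state counts, weighting the row by its experience.

    prior      -- current probabilities for the row (sum to 1)
    experience -- equivalent number of cases the prior is worth
    counts     -- observed number of cases in each state
    Returns the updated probabilities and the updated experience.
    """
    total = sum(counts)
    new_experience = experience + total
    posterior = [
        (experience * p + n) / new_experience
        for p, n in zip(prior, counts)
    ]
    return posterior, new_experience
```

For example, with a prior row of [0.5, 0.5] and observed counts of [8, 2], an experience of 1 moves the first probability to about 0.77, whereas an experience of 10 moves it only to 0.65. A higher experience value therefore makes an expert-elicited CPT more resistant to change from the data.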

Predictive Accuracy:

The poor quality of the data, which contained only 8 cases with a high future abundance, meant that expectations of the value of predictive accuracy tests on this node were low.  The results reflected the poor knowledge of the fish communities in the catchment and suggested that the domain experts are biased toward giving pessimistic predictions.