David's Webpage : Honours



Experiment and Strength of Association

The algorithms use decision trees to find patterns in the database entries available to them, and use those patterns to make a best guess at what a particular missing value originally was. To test their effectiveness at determining the correct pattern and choosing replacements based upon that pattern, the entries in our test databases need to be produced such that a pattern, whether evident or not, does exist. To produce database entries with detectable patterns, a set of Bayesian belief networks was created. These networks were then used to generate the entries that populate the test databases.

The Bayesian belief networks were created so that a node in the belief network was equivalent to a field in the database. A complete set of the values of the nodes in the network was equivalent to a single entry in the database.
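The mapping from network to database can be illustrated with a minimal sketch. It assumes a hypothetical two-node Boolean network (fields "A" and "B", where B copies A with probability `strength`); this is not one of the networks used in the project, and it uses plain Python rather than Netica. The function names are also illustrative.

```python
import random

def sample_entry(strength, rng):
    """Sample one database entry from a hypothetical two-node Boolean network.

    Node "A" is a root with a uniform prior. Node "B" takes the same value
    as A with probability `strength`, and the opposite value otherwise.
    """
    a = rng.random() < 0.5
    follows_pattern = rng.random() < strength
    b = a if follows_pattern else (not a)
    # One complete assignment of node values is one database entry.
    return {"A": a, "B": b}

def build_database(n_entries, strength, seed=0):
    """Generate a test database as a list of entries (one dict per entry)."""
    rng = random.Random(seed)
    return [sample_entry(strength, rng) for _ in range(n_entries)]
```

Calling `build_database(1000, 0.7)` would give 1000 entries in which field B matches field A roughly 70% of the time.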

In order to test the effectiveness of the algorithms, the strength of the association between the pattern and the missing value was varied from test to test. As the strength of association between pattern and value increases, the correct pattern should be easier to determine and the number of correct guesses made by the algorithms should increase. The strength of association should also act as a ceiling on how well an algorithm can possibly do. If a pattern is valid 70% of the time, then assuming the algorithm has identified and is using that pattern, its guesses won't be correct much more than 70% of the time. From the results, the algorithms can be evaluated on how well they identified the pattern by comparing the number of correct guesses they make.
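The ceiling effect can be checked with a small simulation. This is a sketch under stated assumptions: a hypothetical two-field Boolean database in which field B matches field A with probability `strength`, and an idealised predictor that has already identified the pattern "B equals A". It does not reproduce the decision-tree algorithms or the project's networks.

```python
import random

def pattern_accuracy(strength, n_entries=20000, seed=1):
    """Fraction of correct guesses when the true pattern is always applied.

    Assumed setup (hypothetical): A is uniform, and B equals A with
    probability `strength`. The predictor guesses B = A for every entry.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_entries):
        a = rng.random() < 0.5
        b = a if rng.random() < strength else (not a)
        guess = a  # the predictor applies the pattern "B equals A"
        correct += (guess == b)
    return correct / n_entries
```

With a strength of 0.7, the returned accuracy should sit close to 0.7, differing only by sampling noise, which is the ceiling described above.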

Each of the Bayesian belief networks used to create the test entries was built in Netica, a program for working with Bayesian belief networks. To vary the strength of association between tests, multiple versions of each network were created, with varying probabilities (50%, 60%…100%) that the values produced by Netica would follow the pattern governing the network.

At a strength of association of 50%, there should be no detectable link between the pattern governing the network and the values present in the test database (because Boolean values are used, a 50% chance of following the pattern is the same as chance). However, as the strength of association increases to 60% and 70%, the pattern in the test entries should become more obvious and provide a more reliable means of prediction. Beyond that level, the patterns start to become obvious even to someone just reading through the database entries.
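One way to put a number on "no detectable link" is the mutual information between the pattern and the value. The sketch below assumes a hypothetical two-field Boolean case in which field A is uniform and field B matches A with probability `strength`; under that assumption the mutual information is 1 minus the binary entropy of `strength`, in bits. This measure is an illustration only and was not part of the experiment.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Boolean variable that is true with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bits(strength):
    """Bits of information field A carries about field B.

    Assumes (hypothetically) that A is uniform and that B equals A with
    probability `strength`.
    """
    return 1.0 - binary_entropy(strength)
```

Under these assumptions a strength of 0.5 gives zero bits, meaning A says nothing about B, and the value rises steadily to one full bit at a strength of 1.0.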

The Bayesian belief networks used to create the databases:

Network A:

Network B:

Network C:
