|
Honours Project
|
A somewhat different strategy than that of static decision trees was also looked at over the course of this project. It was of interest to see whether algorithms that attempted to learn using the decisions they had already made would perform to a standard comparable to those examined so far here.
Toward this end the algorithms that performed the decision tree construction and replacement based on the decision trees were designed so that they would have the option of writing the replacements they make back into the database they use as a reference to build their trees.
In order to make use of the replacements already made when considering the next replacement the algorithms need to rebuild the decision tree they wish to consult. Frequently a single replacement will have no overall impact on the decision trees created, however for each replacement chosen, the odds of a shift in the way the decision tree forms will increase, as will the potential change in the distribution of replacement values consulted at a path's end.
To be sure the trees are as accurate as possible, they can be reconstructed after every decision is made. Unfortunately this requires a large amount of computation per replacement and as mentioned earlier frequently to no benefit as the trees may well be virtually identical. This amount of computation per replacement slows down the algorithms significantly, and one naturally wants to avoid this.
One solution, that of making an arbitrary decision regarding number of replacements before a tree is declared stale and reconstructed, is simple enough but perhaps inappropriate. It will speed up the algorithm's progress, but at the cost of hiding that algorithm's decisions from it when we want it to be learning from those decisions.
Of the algorithms examined during the course of this project, only two of the four were thought to have any gains to be made by learning from their previous replacements. These two algorithms were Decision Tree algorithm A and Decision Tree algorithm B.
The Decision Tree algorithm C wasn't employed in this fashion because incorporating its replacements into the database its decision trees are generated from wasn't considered an asset. Replacing missing values in the database will steadily reduce the number of entries that lend weight to paths that include missing values, as the number of entries that follow these paths diminish the distribution at the end of the path becomes more suspect. This being the case towards the end of the algorithm's task of replacing values in the database, its decisions may become foolish indeed.
The Bayesian algorithm wasn't chosen for tests involving learning from its choices mainly because of its simplicity. In the overwhelming majority of cases the decisions will simply reinforce the pattern it has already decided to follow. The effort expended to learn from its decisions will be wasted.
This reinforcement of pattern can also be the case for Decision Tree algorithms A and B. When presented with a situation where an entry has a single missing value, the replacement is chosen according to the pattern detected. Writing the value in and reconstructing the decision tree reinforces that pattern. However that is not the only affect the replaced value has.
When presented with a situation where an entry has multiple missing values, Decision Tree algorithm A will be called to select popular values to temporarily stand in for some of the missing values as explained in the algorithm description. By writing in replacements these popular values may change as more replacements are made, instead of remaining static as they do when the algorithm does not attempt to learn. This in turn causes entries to be furnished with replacements that are perhaps different from the ones the static version of the algorithm would select.
A similar situation arises with the Decision Tree algorithm B. Entries with a single missing value will tend to reinforce the pattern that led the algorithm to a particular conclusion, but will also affect the smaller decision trees that are constructed. These different smaller decision trees can give rise to replacements that the static version of the algorithm would not have chosen.
Since it is the entries that have multiple missing values that stand the greatest chance of producing replacements that deviate from the norm, one must treat these entries with care. An entry with multiple missing values has then replaced one at a time, until it is an entry with one missing value, and finally none. Because the present values of an entry determine the path chosen through the decision tree, the replacement values for a particular entry strongly influence the replacement values for any other missing values in that entry.
This means that potentially, the order in which missing values in a single entry are replaced can change the replacement values that are chosen. Naturally we want to select the most likely replacement values, so we must somehow determine which value to replace first. For this one would need a measure by which they can rank the confidence they have in the replacement value chosen for a particular field in the entry. This measure of confidence can then be revised after each replacement if necessary to be sure that the most likely replacement candidates are selected.
Since the decision trees produce a probability distribution, we can use this as a measure of confidence in our replacement. The higher the probability that a field has a certain replacement value, the more confident we are in replacing that field before any others. Unfortunately between constantly reconstructing the decision trees and running confidence estimates on replacement values in situations with multiple missing values, the algorithms were very slow. This was especially noticeable for Decision Tree Algorithm B. It was this that led to only a few tests being made with "learning" versions of the algorithms and too few to say definitively whether they were any better or worse than their static counterparts.