






Results



Iterative Mixture Modelling
Mixture Modelling with Snob


Iterative Mixture Modelling



For the Gaussian, Poisson and right-skewed Rayleigh distributions, each image was segmented with one, two and three thresholds; for the oppositely-skewed Rayleigh distributions, each image was segmented with one threshold only. The results were analysed in four stages. First, the results for the synthetic images were assessed in order to establish an expected level of quality for the parameter estimation and threshold selection. Second, the success or failure of the segmentation results for the natural images was judged in terms of how clearly each mixture model distribution was able to isolate the regions of interest. Third, the goodness of fit of the mixture models for each distribution was assessed visually. Finally, the information measures were examined for any patterns emerging in the results, and so for any correlations between the information measures and the success or failure of the resulting images. The full discussion of the results is too long to include here; please see Chapter 7 of my thesis.
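As a minimal sketch of what segmenting with one, two or three thresholds means in practice, the following maps each grey level to a class index. The function name, the sample grey levels, the threshold values and the convention that a pixel equal to a threshold belongs to the upper class are all assumptions made for illustration; they are not taken from the thesis.

```python
from bisect import bisect_right

def segment(pixels, thresholds):
    """Map each grey level to a class index in 0..len(thresholds).

    Assumed convention: a pixel's class is the number of thresholds
    that are less than or equal to its value.
    """
    cuts = sorted(thresholds)
    return [bisect_right(cuts, v) for v in pixels]

# Hypothetical grey levels and thresholds, for illustration only.
pixels = [12, 80, 81, 150, 200, 255]
one_threshold = segment(pixels, [81])        # two regions
two_thresholds = segment(pixels, [81, 180])  # three regions
```

With t thresholds the image is divided into t + 1 grey-level regions, so three thresholds give four regions.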


Mixture Modelling with Snob



The images were preprocessed and Snob input files were created with each pixel value as a separate data point. The Gaussian distribution was used with a precision of 1 to cater for the discrete nature of the data. The aim was to see whether Snob could find a mixture model with the appropriate number of classes to best model the histogram, and so find the most appropriate segmentation of the image. When Snob was run on each of the input files, it appeared to be overfitting the data somewhat. While the two-component synthetic images syn1-g2.gif and syn2-g2.gif were classified correctly, Snob found four classes for the third synthetic image, syn3-g2.gif, which has the most overlap between components. For the natural images, Snob found a very high number of classes; however, the classes found were an exceptionally good approximation to the histogram's pdf. An example is the set of classes found for the image cloth.gif: inspecting the mixture model together with the histogram shows that the mixture model approximates the histogram's pdf exceptionally well (see Figure 7.8 in my thesis). Nonetheless, the number of classes found by Snob for each natural image was deemed too high to be suitable for the task of thresholding. Segmenting with too many components produces results that look very much like the original image, and so defeats the purpose of thresholding. It is interesting to note that for the first two synthetic images, the classification was a very good approximation to the original distribution (see Tables 7.8 in my thesis).
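The comparison between a mixture model and the histogram's pdf can be sketched as follows: evaluate the weighted sum of Gaussian densities at each grey level and set it beside the normalised histogram. This is only an illustration under stated assumptions; the class parameters below are hypothetical and are not values found by Snob, and the helper names are invented for this sketch.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, classes):
    """Weighted sum of Gaussian densities; classes holds (weight, mean, sd)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in classes)

def normalised_histogram(pixels, levels=256):
    """Relative frequency of each grey level in 0..levels-1."""
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    n = len(pixels)
    return [c / n for c in counts]

# Hypothetical two-class model (weight, mean, sd), for illustration only.
classes = [(0.4, 80.0, 10.0), (0.6, 170.0, 15.0)]
model = [mixture_pdf(g, classes) for g in range(256)]
```

Plotting `model` against the output of `normalised_histogram` for the same image gives the kind of visual check described above.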

In some instances where a very large number of components was found, the parameters for many of the classes seemed quite trivial; that is, the means were very close to one another and the standard deviations were very small. For these images it would seem that separate classes were being created for very small groups of data points. With the exception of two images, the number of classes found ranged from six to 52. For the remaining two images, Snob found 92 classes for house.gif and 249 classes for tungsten.gif. The data in these two images have clearly been overfitted.

At this point I draw the reader's attention to the message lengths for the images house.gif, pellets.gif and fingerprint.gif, which highlight an unusual result. The message lengths for these images fall below the entropy calculated for each image's histogram: only very slightly for fingerprint.gif, but by what appears to be a significant amount for house.gif and pellets.gif. Entropy is the lower bound for lossless compression, so these message lengths would indicate that information has been lost during the classification calculated by Snob. In a two-part MML message, the receiver must be able to decode the message exactly; if information has been lost, exact decoding would not be possible.
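The entropy bound referred to above can be sketched as below: the Shannon entropy of the grey-level histogram, multiplied by the number of pixels, gives the smallest total length of a lossless encoding that treats pixels independently. The function names and sample data are assumptions for illustration, and the result is in bits; it should be converted if the message length being compared is reported in another unit such as nits.

```python
import math
from collections import Counter

def histogram_entropy_bits(pixels):
    """Shannon entropy of the grey-level histogram, in bits per pixel."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_bound_bits(pixels):
    """Total bound in bits: entropy per pixel times the number of pixels."""
    return histogram_entropy_bits(pixels) * len(pixels)

# Hypothetical data: four grey levels, equally frequent.
pixels = [0, 0, 1, 1, 2, 2, 3, 3]
per_pixel = histogram_entropy_bits(pixels)
total = entropy_bound_bits(pixels)
```

A reported message length smaller than `total` (in matching units) is the kind of discrepancy described in the paragraph above.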

These discrepancies could be due to errors propagating through the calculations; however, the magnitude of the difference between the entropy and the message length for the images house.gif and pellets.gif does not suggest that this is the case. Both image input files were run again with Snob, and similar classifications were obtained. Without knowing the implementation details of Snob, no clear explanation can be given for the discrepancy between these message lengths and the entropy values at this late stage of the project. It should be noted, however, that the images for which the extreme overfitting occurred all have a similar property in their histograms. On inspection of the histograms for the images house.gif, pellets.gif and tungsten.gif, it can be seen that each has an extended interval where the frequency of the grey-level values seems almost constant, although the message length for tungsten.gif did not fall below the entropy. It would appear that, to approximate the pdf of the histogram for images with this property, Snob was trying to find a separate class for almost every grey-level value in this interval.

It was thought that Snob could possibly be overfitting the data due to its discrete nature. An image is actually a continuous function of two variables, $f(x,y)$, but is discretised for digital storage and manipulation. Following a suggestion by Torsten Seemann, the discrete pixel values were modified by adding a randomly generated number from the interval [-0.5, 0.5) before writing each value to the input file, in order to emulate the continuous nature of the image. This idea was tested to isolate whether or not the problem lay in classifying discrete values. An input file for the image matches.gif was generated using this approach, since its histogram displays properties that make it very well-suited to thresholding. Unfortunately, a very large number of classes was found, even when tested at different levels of precision. Table 7.11 in my thesis details the levels of precision used and the number of classes found.
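A minimal sketch of this modification is given below, assuming the noise is drawn uniformly from the interval. The function name, the seed and the sample grey levels are hypothetical, and the Snob input file format is not reproduced here.

```python
import random

def jitter(pixels, rng):
    """Add uniform noise from [-0.5, 0.5) to each discrete grey level."""
    # rng.random() returns a float in [0.0, 1.0), so subtracting 0.5
    # shifts the noise to [-0.5, 0.5).
    return [v + (rng.random() - 0.5) for v in pixels]

# Hypothetical grey levels; a fixed seed keeps the example repeatable.
rng = random.Random(2003)
pixels = [0, 17, 17, 128, 255]
jittered = jitter(pixels, rng)
```

Each modified value stays within half a grey level of the original, so rounding recovers the discrete value in all but boundary cases.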

It was then decided to randomly sample the data points at different rates to see what classifications Snob would produce. The results were very surprising: with samples as small as 100 pixels, Snob gave some very good classifications, with parameters very similar to those found by the iterative mixture modelling method.
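The sampling step can be sketched as follows. Whether the original sampling was done with or without replacement is not stated, so sampling without replacement is an assumption here, as are the function name, the seed and the synthetic pixel data.

```python
import random

def sample_pixels(pixels, k, rng):
    """Draw k pixel values uniformly at random, without replacement."""
    return rng.sample(pixels, k)

# Hypothetical image data: 10,000 grey levels in 0..255.
rng = random.Random(7)
pixels = [rng.randrange(256) for _ in range(10_000)]
subset = sample_pixels(pixels, 100, rng)
```

The sampled values would then be written to the input file in place of the full set of pixel values.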






LAST UPDATED: 4 March 2003