Data and Models

This section examines data values and models to learn lessons for some generalised software being developed in the CSSE.

CSE454 2003 : This document is online at http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/ and contains hyper-links to other resources - Lloyd Allison ©.

Some Types of Values:

Type--|--Scalar--|--Discrete--|--Ints & subranges
      |          |            |
      |          |            |--Symbolic
      |          |
      |          |--Continuous & subranges
      |
      |--Structured  i.e. multivariate
      |
      |--Vector      N.B. homogeneous
      |
      |--Union       i.e. either S1 or S2
      |
      |--Function    i.e. S1->S2
      |
      |--Model...

Some distributions / models:

Model--|--Discrete----|--Uniform
       |              |
       |              |--Multistate etc.
       |
       |--Continuous--|--Uniform
       |              |
       |              |--Normal(m,s) etc.
       |
       |--Structured--|--Independent
       |              |
       |              |--Factors  etc.
       |
       |--Vector------|--set (independent)
                      |
                      |--series--|--Markov
                                 |
                                 etc.

A model should be able to give the (-log) probability of a data value, generate (sample) data, ...
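As a minimal sketch of these two operations, assuming Python and hypothetical names (`MultiState`, `Normal`, `nl_pr`, `sample` are illustrative, not taken from the CSSE software), a discrete and a continuous model might look like this. For the continuous case the value returned is a negative log density rather than a probability.

```python
import math
import random


class MultiState:
    """Discrete distribution over a finite set of symbols (hypothetical name)."""

    def __init__(self, probs):
        # probs: dict mapping symbol -> probability, assumed to sum to 1
        self.probs = dict(probs)

    def nl_pr(self, x):
        """Negative log probability (in nats) of the data value x."""
        return -math.log(self.probs[x])

    def sample(self, rng=random):
        """Generate one data value."""
        symbols = list(self.probs)
        weights = [self.probs[s] for s in symbols]
        return rng.choices(symbols, weights=weights, k=1)[0]


class Normal:
    """Normal(m, s) with mean m and standard deviation s (hypothetical name)."""

    def __init__(self, m, s):
        self.m, self.s = m, s

    def nl_pr(self, x):
        """Negative log density (in nats) of x; a density, not a probability."""
        z = (x - self.m) / self.s
        return 0.5 * z * z + math.log(self.s) + 0.5 * math.log(2 * math.pi)

    def sample(self, rng=random):
        """Generate one data value."""
        return rng.gauss(self.m, self.s)
```

Both classes offer the same two methods, so code that only needs `nl_pr` and `sample` can treat them alike.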


                        parameters
                            |
                            |
                            v
input space     ----->   "Model"   ----->   (output) sample (data) space
exogenous                                   endogenous
variables                                   variables

Note, input space and/or parameter space may be trivial.

e.g. A classification- (decision-) tree T models blood-pressure as N(m,s) given age, gender and weight where m and s depend on age, gender and weight.
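A rough sketch of such a model, in Python: the input (age, gender, weight) selects a leaf of the tree, and the leaf supplies m and s for the normal distribution over blood-pressure. The splits and the (m, s) values below are invented purely for illustration, and the function names are hypothetical.

```python
import math


def normal_nl_pdf(x, m, s):
    """Negative log density (in nats) of x under Normal(m, s)."""
    z = (x - m) / s
    return 0.5 * z * z + math.log(s) + 0.5 * math.log(2 * math.pi)


def tree_params(age, gender, weight):
    """Hypothetical tree: tests on the input lead to a leaf holding (m, s).

    All thresholds and leaf values are made up for this example.
    """
    if age < 40:
        return (115.0, 10.0) if weight < 80 else (122.0, 11.0)
    if gender == "F":
        return (125.0, 12.0)
    return (130.0, 13.0)


def nl_pr_blood_pressure(bp, age, gender, weight):
    """Negative log density of the output bp, given the input variables."""
    m, s = tree_params(age, gender, weight)
    return normal_nl_pdf(bp, m, s)
```

Here the input space is (age, gender, weight), the data space is blood-pressure, and the tree itself (its tests and leaf values) plays the role of the parameters.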


Mixture

A mixture (weighted average) of models M_1, ..., M_n can be formed, given weights w_1, ..., w_n where SUM_i w_i = 1, provided that the types of the models are the same.

i.e. input spaces, parameter spaces, and data spaces are the same across the M_i.
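A mixture can be sketched in Python as follows, assuming each component offers hypothetical `nl_pr` and `sample` methods. The check that the weights sum to 1 is done at construction; the requirement that all components share one type is not enforced here and is left to the caller.

```python
import math
import random


class Normal:
    """Minimal component model for the example (hypothetical name)."""

    def __init__(self, m, s):
        self.m, self.s = m, s

    def nl_pr(self, x):
        # negative log density (in nats) of x under Normal(m, s)
        z = (x - self.m) / self.s
        return 0.5 * z * z + math.log(self.s) + 0.5 * math.log(2 * math.pi)

    def sample(self, rng=random):
        return rng.gauss(self.m, self.s)


class Mixture:
    """Weighted average of component models over the same data space."""

    def __init__(self, weights, models):
        if len(weights) != len(models):
            raise ValueError("need exactly one weight per model")
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
        self.weights = list(weights)
        self.models = list(models)

    def nl_pr(self, x):
        # pr(x) = SUM_i w_i * pr_i(x), then back to a negative log
        pr = sum(w * math.exp(-m.nl_pr(x))
                 for w, m in zip(self.weights, self.models))
        return -math.log(pr)

    def sample(self, rng=random):
        # pick a component according to the weights, then sample from it
        m = rng.choices(self.models, weights=self.weights, k=1)[0]
        return m.sample(rng)
```

For example, `Mixture([0.3, 0.7], [Normal(0, 1), Normal(5, 2)])` is itself a model over the same data space as its components, so mixtures can be nested or used wherever a single model is expected.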


(Time-) Series

A model M with data space S trivially induces a model on S^* if the elements of the series are modelled as being independent.

There are more interesting models on S^*: a 1st-order Markov model can be thought of as |S| 0-order Markov models, one for each "context".

(A 0-order Markov model is just a multi-state distribution.)
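The two kinds of series model can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, distributions are plain dicts from symbol to probability, the first element of a Markov series is given its own initial distribution, and the cost of stating the length of the series is ignored.

```python
import math


def nl_pr_independent(probs, series):
    """Series modelled as independent draws from one multi-state distribution.

    The negative log probabilities (in nats) of the elements simply add up.
    """
    return sum(-math.log(probs[x]) for x in series)


def nl_pr_markov1(initial, transitions, series):
    """1st-order Markov model: one multi-state distribution per context.

    initial:     dict symbol -> probability, used for the first element
    transitions: dict previous symbol (the context) -> dict symbol -> probability
    """
    if not series:
        return 0.0
    total = -math.log(initial[series[0]])
    for prev, cur in zip(series, series[1:]):
        # the previous symbol selects which 0-order model scores the current one
        total += -math.log(transitions[prev][cur])
    return total
```

The `transitions` table holds |S| rows, each of which is a 0-order model in its own right, which is the decomposition described above.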


Complex Models

People use the word "model" to cover anything from a simple probability distribution to "a model of the Australian economy" (MAE). At its most general the word is too broad to program with, although any instance, such as MAE, can be programmed from a collection of functions, data structures and simpler models.

Complex, commonly used models, e.g. (hidden) Markov models (HMM), probabilistic finite state automata (PFSA), mixture models, classification- (decision-) trees, phylogenetic (evolutionary) trees, causal networks, artificial neural networks (ANN),
can be built from a "library" of building blocks, e.g. conditional probability tables, multi-state distributions, normal distributions,
possibly with some "discrete structure": sequence, tree, graph (network).

See L. Allison, Types and Classes of Machine Learning and Data Mining, Twenty-Sixth Australasian Computer Science Conference (ACSC2003), pp. 207-215, 4-7 February 2003; Conferences in Research and Practice in Information Technology, Vol. 16.
