Data and Models

This section examines data values and models to learn lessons for some generalised software being developed in the CSSE.

CSE454 2002 : This document is online at http://www.csse.monash.edu.au/~lloyd/tilde/CSC4/CSE454/ and contains hyper-links to other resources - Lloyd Allison ©.

<< [02] >>
Some Types of Values:

Type--|--Scalar--|--Discrete--|--Ints & subranges
      |          |            |
      |          |            |--Symbolic
      |          |
      |          |--Continuous & subranges
      |
      |--Structured i.e. multivariate
      |
      |--Vector
      |
      |--Union     i.e. either S1 or S2
      |
      |--Function  i.e. S1-->S2
      |
      |--Model...

<< [03] >>
Some distributions / models:

Model--|--Discrete----|--Uniform
       |              |
       |              |--Multistate etc.
       |
       |--Continuous--|--Uniform
       |              |
       |              |--Normal(m,s) etc.
       |
       |--Structured--|--Independent
       |              |
       |              |--Factors  etc.
       |
       |--Vector------|--set (independent)
                      |
                      |--series--|--Markov
                                 |
                                 etc.

A model should be able to generate (sample) data, give the (-log) probability of a data value, ...
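
The two requirements above can be sketched as a pair of operations, `sample` and `neg_log_prob`. The class and method names here are illustrative assumptions, not the interface of the CSSE software. For the continuous case the sketch returns a negative log density, which turns into a probability only once a measurement accuracy is fixed.

```python
import math
import random


class MultiState:
    """Discrete distribution over a finite set of symbols (hypothetical name)."""

    def __init__(self, probs):
        # probs: dict mapping each symbol to its probability; should sum to 1
        self.probs = dict(probs)

    def sample(self, rng):
        # Draw one symbol according to its probability
        symbols = list(self.probs)
        weights = [self.probs[s] for s in symbols]
        return rng.choices(symbols, weights=weights)[0]

    def neg_log_prob(self, x):
        # -log probability in nits; infinite for an impossible value
        p = self.probs.get(x, 0.0)
        return math.inf if p == 0.0 else -math.log(p)


class Normal:
    """Normal(m, s) with mean m and standard deviation s."""

    def __init__(self, m, s):
        self.m, self.s = m, s

    def sample(self, rng):
        return rng.gauss(self.m, self.s)

    def neg_log_prob(self, x):
        # Negative log *density* in nits (assumption: no measurement accuracy)
        z = (x - self.m) / self.s
        return 0.5 * math.log(2 * math.pi) + math.log(self.s) + 0.5 * z * z


if __name__ == "__main__":
    rng = random.Random(0)
    coin = MultiState({"H": 0.5, "T": 0.5})
    print(coin.sample(rng), coin.neg_log_prob("H"))
    print(Normal(0.0, 1.0).neg_log_prob(0.0))
```

Working with -log probabilities rather than probabilities lets the contributions of independent data simply add.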

<< [04] >>

                           parameters
                                |
                                |
                                v
input space          ----->   Model   ----->   (output) sample (data) space
exogenous variables                            endogenous variables

Note, input space and/or parameter space may be trivial.

e.g. A decision tree T models blood-pressure as N(m,s) given age, sex and weight where m and s depend on age, sex and weight.
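
A minimal sketch of such a conditional model follows. The input (age, sex, weight) selects a leaf, and the leaf supplies the parameters m and s of the Normal over blood-pressure. The split points and the (m, s) values are invented placeholders purely for illustration, and the function names are hypothetical.

```python
import math


def leaf_params(age, sex, weight):
    """Hypothetical decision tree: maps the input to the leaf's (m, s).

    All thresholds and parameter values are made-up placeholders.
    """
    if age < 40:
        if weight < 80:
            return (115.0, 10.0)
        return (125.0, 12.0)
    if sex == "F":
        return (128.0, 13.0)
    return (132.0, 14.0)


def neg_log_density(bp, age, sex, weight):
    """-log density (nits) of blood-pressure bp under N(m, s) at the leaf."""
    m, s = leaf_params(age, sex, weight)
    z = (bp - m) / s
    return 0.5 * math.log(2 * math.pi) + math.log(s) + 0.5 * z * z


if __name__ == "__main__":
    # The same reading is scored differently for different inputs.
    print(neg_log_density(120.0, 30, "M", 70))
    print(neg_log_density(120.0, 60, "F", 70))
```

Here the input space is (age, sex, weight), the data space is blood-pressure, and the tree itself plays the role of the parameters.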

<< [05] >>

Mixture

We can form a mixture (weighted average) of models M₁, ..., M_n, given weights w₁, ..., w_n, where each w_i >= 0 and SUM_i w_i = 1, provided that the types of the models are the same.

i.e. input spaces, parameter spaces, and data spaces are the same across the M_i.
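
As a rough sketch, a mixture can itself offer the same two operations as its components. The probability of a value is the weighted average of the component probabilities, and sampling first picks a component according to the weights. The names `Mixture`, `Normal`, `sample` and `neg_log_prob` are assumptions made for this example only.

```python
import math
import random


class Normal:
    """Normal(m, s); neg_log_prob returns a negative log density in nits."""

    def __init__(self, m, s):
        self.m, self.s = m, s

    def sample(self, rng):
        return rng.gauss(self.m, self.s)

    def neg_log_prob(self, x):
        z = (x - self.m) / self.s
        return 0.5 * math.log(2 * math.pi) + math.log(self.s) + 0.5 * z * z


class Mixture:
    """Weighted average of component models that share one data space."""

    def __init__(self, weights, models):
        if len(weights) != len(models):
            raise ValueError("need one weight per model")
        if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must be non-negative and sum to 1")
        self.weights, self.models = list(weights), list(models)

    def sample(self, rng):
        # Choose a component by weight, then sample from it
        model = rng.choices(self.models, weights=self.weights)[0]
        return model.sample(rng)

    def neg_log_prob(self, x):
        # P(x) = SUM_i w_i * P_i(x)
        p = sum(w * math.exp(-m.neg_log_prob(x))
                for w, m in zip(self.weights, self.models))
        return math.inf if p == 0.0 else -math.log(p)


if __name__ == "__main__":
    mix = Mixture([0.3, 0.7], [Normal(0.0, 1.0), Normal(5.0, 2.0)])
    print(mix.sample(random.Random(0)), mix.neg_log_prob(1.0))
```

Because the mixture has the same type as its components, it can in turn be used as a component of another mixture.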

<< [06] >>

Series

A model M with data space S trivially induces a model on S^* if the elements of the series are modelled as being independent.

There are more interesting models on S^*: a 1st-order Markov model can be thought of as |S| 0-order Markov models, one for each "context", i.e. one for each possible previous element.

(A 0-order model is just a multi-state distribution.)
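
Both kinds of series model can be sketched briefly. The independent model sums the -log probabilities of the elements. The 1st-order Markov model keeps one multi-state distribution per context and uses the previous element to select which one scores the next element. The separate distribution for the first element is an assumption of this sketch, as are all the names.

```python
import math


class MultiState:
    """0-order model: a discrete distribution over a finite set S."""

    def __init__(self, probs):
        self.probs = dict(probs)

    def neg_log_prob(self, x):
        p = self.probs.get(x, 0.0)
        return math.inf if p == 0.0 else -math.log(p)


def independent_neg_log_prob(model, series):
    """Model on S^* induced by treating the elements as independent."""
    return sum(model.neg_log_prob(x) for x in series)


class Markov1:
    """1st-order Markov model: one 0-order model per context."""

    def __init__(self, initial, contexts):
        self.initial = initial    # assumed model for the first element
        self.contexts = contexts  # dict: previous element -> MultiState

    def neg_log_prob(self, series):
        total = 0.0
        for i, x in enumerate(series):
            # The previous element is the context for the current one
            model = self.initial if i == 0 else self.contexts[series[i - 1]]
            total += model.neg_log_prob(x)
        return total


if __name__ == "__main__":
    fair = MultiState({"a": 0.5, "b": 0.5})
    mm = Markov1(fair, {"a": MultiState({"a": 0.9, "b": 0.1}),
                        "b": MultiState({"a": 0.5, "b": 0.5})})
    print(independent_neg_log_prob(fair, "aab"), mm.neg_log_prob("aab"))
```

The length of the series is not modelled here; a complete model on S^* would also need to state the cost of the length.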

<< [07] >>

Complex Models

People use the word "model" to cover anything from a simple probability distribution to "a model of the Australian economy" (MAE). In its widest sense the word is too general to program with, although any instance, such as MAE, can be programmed from a collection of functions, data structures and simpler models.

Commonly used complex models,
e.g. (hidden) Markov models (HMM), probabilistic finite state automata (PFSA), mixture models, decision trees, phylogenetic (evolutionary) trees, causal networks, artificial neural networks,
can be built from a "library" of building blocks,
e.g. conditional probability tables, multi-state distributions, normal distributions,
possibly with some "discrete structure": sequence, tree, graph (network).
