
A type system probably cannot(?) do all of the following:
 Some variables have an origin, some do not, e.g.,
 position has an origin.
 length has a scale but does not have an origin.
 A length may be multiplied but a position may not(?) be multiplied.
 Two positions may not be added, but
 the difference between two positions is a length.
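The origin distinction alone can be sketched with two wrapper types. This is only a 1-D illustration under assumptions: the names Position, Length, difference, offset, scaleBy and addLength are hypothetical, and it does not attempt units or dimensions at the same time.

```haskell
-- Hypothetical names throughout; a 1-D sketch only.
newtype Position = Position Double deriving (Eq, Show)  -- has an origin
newtype Length   = Length   Double deriving (Eq, Show)  -- has a scale, no origin

-- The difference between two positions is a length.
difference :: Position -> Position -> Length
difference (Position a) (Position b) = Length (a - b)

-- A position may be moved by a length, giving another position.
offset :: Position -> Length -> Position
offset (Position p) (Length l) = Position (p + l)

-- A length may be multiplied by a plain number.
scaleBy :: Double -> Length -> Length
scaleBy k (Length l) = Length (k * l)

-- Lengths may be added to each other.
addLength :: Length -> Length -> Length
addLength (Length a) (Length b) = Length (a + b)

-- No function of type Position -> Position -> Position is defined here,
-- so "adding two positions" is not expressible with these operations.
```

If the constructors were hidden behind a module boundary, the only ways to combine values would be the operations listed, which is the point of the exercise.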
 Units, e.g.,
 centigrade v. Fahrenheit,
 feet and inches v. m, cm, and mm,
 and dimensions, e.g.,
 length, mass, time, electric current, temperature,
amount of substance (mole), luminous intensity (candela)
  the 7 base SI dimensions for physics.
 acceleration = length.time^{-2}
 mass x acceleration = mass.length.time^{-2} = force = momentum/time
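Dimensions can be tracked as vectors of exponents. A minimal run-time sketch, assuming only three of the seven base dimensions and a hypothetical record Dim; it checks the dimensional arithmetic for force but is not a static type-level scheme.

```haskell
-- Exponents of three of the seven base dimensions (hypothetical record).
data Dim = Dim { lengthExp, massExp, timeExp :: Int } deriving (Eq, Show)

lengthD, massD, timeD :: Dim
lengthD = Dim 1 0 0
massD   = Dim 0 1 0
timeD   = Dim 0 0 1

-- Multiplying quantities adds exponents; dividing subtracts them.
mulD, divD :: Dim -> Dim -> Dim
mulD (Dim a b c) (Dim x y z) = Dim (a + x) (b + y) (c + z)
divD (Dim a b c) (Dim x y z) = Dim (a - x) (b - y) (c - z)

velocityD, accelerationD, forceD, momentumD :: Dim
velocityD     = lengthD `divD` timeD
accelerationD = velocityD `divD` timeD
forceD        = massD `mulD` accelerationD
momentumD     = massD `mulD` velocityD
```

Here accelerationD has time exponent -2, and forceD equals momentumD divided by timeD.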
Some
types, and
classes, of data.
atomic 
discrete 
categorical 
data T = C1 | C2 | ... | Cn deriving (Bounded, Enum)
 e.g. data Boolean = True | False
 e.g. data Gender = Male | Female
 e.g. data DNA = A | C | G | T
 e.g. data Party = Liberal | Labor | Democrat | Green | Indep
 NB. Something changes qualitatively for
a "large" number of categories, maybe even for 7+ or 10+.
 NB. Bounded and Enum do not imply any
semantic (non-arbitrary) order on the values; see ordered below.
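What Bounded and Enum do provide is a way to enumerate and count the categories. A small sketch; allValues and arity are hypothetical helper names, and Eq and Show are derived only for convenience.

```haskell
data DNA = A | C | G | T deriving (Bounded, Enum, Eq, Show)

-- Every value of a Bounded, Enum type, in declaration order.
allValues :: (Bounded a, Enum a) => [a]
allValues = [minBound .. maxBound]

-- Number of categories, e.g. the arity of a multistate distribution.
arity :: Int
arity = length (allValues :: [DNA])
```

The declaration order is used only as an indexing convenience (fromEnum), not as a meaningful ranking of the values.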

ordered 
as above plus deriving (..., Ord)
 e.g. data Quality = Bad | Poor | Avg | Fair | Good
 e.g. data Topography = Mountains | Foothills | Plain | Coastal
 See [missing persons].
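With Ord derived, comparisons follow the declaration order, so the order in which the constructors are written now carries meaning. A brief sketch; atLeast is a hypothetical helper.

```haskell
data Quality = Bad | Poor | Avg | Fair | Good
  deriving (Bounded, Enum, Eq, Ord, Show)

-- Derived Ord follows declaration order, so Bad < Poor < ... < Good.
-- Keep only the values that are at least as good as the threshold q.
atLeast :: Quality -> [Quality] -> [Quality]
atLeast q = filter (>= q)
```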

hierarchic, partially ordered 
e.g. reptile | mammal( rodent | primate( chimp | gorilla))
 One method...
 data Animal = Reptile | Mammal (Maybe M)
 data M = Rodent | Primate (Maybe P)
 data P = Chimp | Gorilla
 Is a primate ~ Mammal (Just (Primate Nothing)),
=> is a mammal ~ Mammal Nothing :: Animal.

 Model by a suitable collection of multi-state models.
 Or set-based, Primate = {Chimp, Gorilla} etc., cf. DNA.
 Also see measurement accuracy, discrete.
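The partial order implied by the Maybe-based method can be made explicit. A sketch under assumptions: isA and isM are hypothetical functions, and Nothing is read as "not known more precisely at this level".

```haskell
data Animal = Reptile | Mammal (Maybe M)  deriving (Eq, Show)
data M      = Rodent  | Primate (Maybe P) deriving (Eq, Show)
data P      = Chimp   | Gorilla           deriving (Eq, Show)

primate, mammal, chimp :: Animal
primate = Mammal (Just (Primate Nothing))
mammal  = Mammal Nothing
chimp   = Mammal (Just (Primate (Just Chimp)))

-- x `isA` y: x is at least as specific as y (a partial order).
isA :: Animal -> Animal -> Bool
isA Reptile           Reptile           = True
isA (Mammal _)        (Mammal Nothing)  = True
isA (Mammal (Just x)) (Mammal (Just y)) = isM x y
isA _                 _                 = False

-- The same relation one level down the hierarchy.
isM :: M -> M -> Bool
isM Rodent             Rodent             = True
isM (Primate _)        (Primate Nothing)  = True
isM (Primate (Just p)) (Primate (Just q)) = p == q
isM _                  _                  = False
```

So chimp `isA` primate and primate `isA` mammal hold, but mammal `isA` primate does not, and a rodent and a primate are incomparable.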

integer 

posInt 
>0
similarly non-neg. >=0

periodic 
e.g. day of the week, month.
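A periodic discrete type differs from an ordered one in that the last value is adjacent to the first. A sketch for days of the week; next and cyclicDist are hypothetical names.

```haskell
data Day = Mon | Tue | Wed | Thu | Fri | Sat | Sun
  deriving (Bounded, Enum, Eq, Show)

-- Successor that wraps around, unlike the derived succ (an error on Sun).
next :: Day -> Day
next d | d == maxBound = minBound
       | otherwise     = succ d

-- Shortest distance around the cycle, so Sun and Mon are 1 apart.
cyclicDist :: Day -> Day -> Int
cyclicDist a b = min k (n - k)
  where n = fromEnum (maxBound :: Day) + 1
        k = abs (fromEnum a - fromEnum b)
```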

continuous 
Real 
Float, Double
 e.g. voltage, position (1D), velocity (1D)

(Complex) ?structured? 
(re,im) or (r,θ); see vector

positive 
e.g. mass, length, speed 
periodic 
e.g. angle 
composite 
multivariate 
tuple: (T1, T2, ..., Tm)

or constructor:
data Person = Person String Int
or: data Person = Person{name::String, age ::Int}

or array (homogeneous)

or list, [t], (homogeneous) 
vector 
array (homogeneous)
 e.g. m-dimensional position, velocity, force, etc.

sequence 
list: [t]
list of t
 e.g. DNA seq.,
annual weather data,
daily stock exchange data,
visits to doctor, etc.
 Element type can be multivariate.

set 
 list of members,
e.g. [set of mutations],
 or vector (bit map),
 (equivalent in principle but not necessarily in practice, esp. for sparse sets).

structured 
the sky is the limit, new datatypes 
inapplicable 
usually structured data, e.g.
data Person = Male | Female Int
#pregnancies!

optional 
Maybe t,
but different semantics from missing data (below)!
Also Either t1 t2, standard H98.
 Really a special kind of structured data.
 Model as discrete plus a suitable model for t.
 Whether an optional t was in fact present or not could be missing, and if
it was known to be present then the value itself could be missing or not!
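The layering of "optional" and "missing" can be spelled out with distinct types rather than nested Maybes. This is an illustrative sketch only; Optional, Recorded, Observation and describe are hypothetical names.

```haskell
-- Hypothetical wrapper types that keep the two notions apart.
data Optional t = Absent  | Present t deriving (Eq, Show)  -- no value exists, or one does
data Recorded t = Missing | Known t   deriving (Eq, Show)  -- it exists, but was it recorded?

-- Presence itself may be unrecorded, and a present value may be unrecorded.
type Observation t = Recorded (Optional (Recorded t))

describe :: Show t => Observation t -> String
describe Missing                     = "presence not recorded"
describe (Known Absent)              = "known to be absent"
describe (Known (Present Missing))   = "present, value not recorded"
describe (Known (Present (Known v))) = "present, value " ++ show v
```

The same structure could be written as Maybe (Maybe (Maybe t)), but the named constructors make it harder to confuse the levels.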

properties 
data measurement accuracy 
continuous 
 (a) fixed, ±δ, or
 (b) relative, ±x%, or
 (c) arbitrary, range (lo,hi), per datum.
 NB. Failing to account for accuracy when transforming data
can affect inferences; it is safer to inverse-transform the model.
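The three kinds of accuracy for continuous data can be carried alongside a datum. A sketch, assuming a hypothetical type Accuracy and function interval that returns the range of true values consistent with a recorded value.

```haskell
data Accuracy
  = Fixed Double          -- (a) plus or minus delta
  | Relative Double       -- (b) plus or minus x percent
  | Range Double Double   -- (c) explicit (lo, hi) for this datum
  deriving (Eq, Show)

-- Interval of true values consistent with a recorded value x.
interval :: Accuracy -> Double -> (Double, Double)
interval (Fixed d)     x = (x - d, x + d)
interval (Relative p)  x = (x - e, x + e) where e = abs x * p / 100
interval (Range lo hi) _ = (lo, hi)
```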

discrete (sometimes) 
 e.g. DNA
 H={A,C,T}, ..., R={A,G}, Y={C,T}, K={G,T},
N={A,C,G,T} ~missing?!
 (A 4-bit "set" rep. works nicely for many purposes.)
 Also see hierarchic, partially ordered, above.
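The 4-bit set representation can be sketched as follows, one bit per base. The names Base, bit1, code, allows, ambiguity and the codeX constants are hypothetical; the bit assignment (A=1, C=2, G=4, T=8) is an assumption of this sketch, not a fixed convention.

```haskell
import Data.Bits ((.&.), (.|.), popCount, shiftL)
import Data.Word (Word8)

data Base = A | C | G | T deriving (Bounded, Enum, Eq, Show)

-- One bit per base: A=1, C=2, G=4, T=8.
bit1 :: Base -> Word8
bit1 b = 1 `shiftL` fromEnum b

-- A possibly ambiguous observation is the set of bases it allows.
code :: [Base] -> Word8
code = foldr ((.|.) . bit1) 0

codeR, codeY, codeK, codeH, codeN :: Word8
codeR = code [A, G]
codeY = code [C, T]
codeK = code [G, T]
codeH = code [A, C, T]
codeN = code [A, C, G, T]

-- Is a definite base consistent with an observed code?
allows :: Word8 -> Base -> Bool
allows c b = c .&. bit1 b /= 0

-- How many bases a code leaves open (4 for N, i.e. no information).
ambiguity :: Word8 -> Int
ambiguity = popCount
```

Intersection of two observations is then a bitwise AND, e.g. R and K together leave only G.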

missing data 
data Maybe t = Nothing | Just t
H98 standard type
 There was a value but it was either not measured or not recorded.
 (a) Missingness is common knowledge; need not be coded at all.
 (b) Missingness is of known prob.; can code using a fixed given prob.
 (c) Missingness is to be estimated once, globally,
for use in all submodels.
 (d) Missingness is to be estimated per submodel, and so
may influence global model structure.
 See [modelMaybe].
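Cases (b) and (c) can be illustrated by costing a possibly-missing datum as a code length in bits. This is a sketch under assumptions and is not the [modelMaybe] code: msgLen, estimateMissing and the valueCost parameter are hypothetical, and the estimator is a plain frequency.

```haskell
-- Code length (in bits) of one possibly-missing datum, case (b):
-- pMissing is given; valueCost is a hypothetical cost function for t.
msgLen :: Double -> (t -> Double) -> Maybe t -> Double
msgLen pMissing _         Nothing  = negate (logBase 2 pMissing)
msgLen pMissing valueCost (Just x) =
  negate (logBase 2 (1 - pMissing)) + valueCost x

-- Case (c): estimate the missingness probability once from all the data.
-- A simple frequency estimate; undefined (NaN) for an empty list.
estimateMissing :: [Maybe t] -> Double
estimateMissing xs =
  fromIntegral (length [() | Nothing <- xs]) / fromIntegral (length xs)
```

In case (a) the first term is dropped altogether, since both parties already know which values are missing.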

censored data 
either: data Cnsrd t = Cnsrd | Normal t or
transform the model.
Related to missing, and optional, but with different semantics.
 E.g. A "sticky" voltmeter measures [0.0 .. 1.0]v as 0.0v.
 A reasonable, although not perfect, way to model censored data
is similar to what can be done for missing data,
cases (c) or (d), above.
(As in ecological segmentation '05.)
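The sticky voltmeter can be sketched with the Cnsrd type. The functions observe and censoredFraction are hypothetical, and the frequency estimate is just one simple way to treat censoring like missingness.

```haskell
data Cnsrd t = Cnsrd | Normal t deriving (Eq, Show)

-- Hypothetical "sticky" voltmeter: true voltages in [0.0 .. 1.0] read as
-- 0.0, so such a reading says only "somewhere in the censored range".
observe :: Double -> Cnsrd Double
observe v
  | v >= 0.0 && v <= 1.0 = Cnsrd
  | otherwise            = Normal v

-- Fraction of readings that were censored: a simple frequency estimate,
-- usable like a missingness probability. NaN for an empty list.
censoredFraction :: [Cnsrd t] -> Double
censoredFraction xs =
  fromIntegral (length [() | Cnsrd <- xs]) / fromIntegral (length xs)
```

Unlike a missing value, a censored one still carries information: the true value is known to lie in the censored range.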

weighted data 
 (i) integral: compacting repetitive values,
 (ii) fractional: part membership of a class in
a [mixture model].

25/5/2006, LA.
