|
A type system probably cannot(?) do all of the following:
- Some variables have an origin, some do not, e.g.,
- position has an origin.
- length has a scale but does not have an origin.
- A length may be multiplied but a position may not(?) be multiplied.
- Two positions may not be added, but
- the difference between two positions is a length.
- Units, e.g.,
- centigrade v. Fahrenheit,
- feet and inches v. m., cm, and mm.
- and dimensions, e.g.,
- length, mass, time, electric-current, temperature,
amount-of-substance (mole), luminous-intensity (candela):
the 7 base SI dimensions of physics.
- acceleration = length.time^-2
- mass x acceleration = mass.length.time^-2 = force = momentum/time
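A type system can still do some of this. The sketch below uses hypothetical types `Position` (has an origin) and `Length` (scale only), so that the difference of two positions is a length and there is simply no function to add two positions. It also tracks dimensions at run time as exponents of base dimensions (only length, mass and time here), so that mass x acceleration gets the dimension of force. All names are illustrative assumptions, not a standard library.

```haskell
-- Hypothetical types: a Position has an origin, a Length only a scale.
newtype Position = Position Double deriving (Show, Eq)
newtype Length   = Length Double   deriving (Show, Eq)

-- The difference between two positions is a length.
diff :: Position -> Position -> Length
diff (Position a) (Position b) = Length (a - b)

-- A position may be moved by a length ...
offset :: Position -> Length -> Position
offset (Position p) (Length d) = Position (p + d)

-- ... and a length may be scaled or added to another length.
scale :: Double -> Length -> Length
scale k (Length d) = Length (k * d)

addLen :: Length -> Length -> Length
addLen (Length a) (Length b) = Length (a + b)

-- Deliberately, there is no function that adds or scales Positions.

-- Dimensions as exponents of base dimensions (3 of the 7 SI ones here).
data Dim = Dim { dLen, dMass, dTime :: Int } deriving (Show, Eq)

data Quantity = Quantity Double Dim deriving (Show, Eq)

-- Multiplying quantities adds their dimension exponents.
qmul :: Quantity -> Quantity -> Quantity
qmul (Quantity x (Dim l1 m1 t1)) (Quantity y (Dim l2 m2 t2)) =
  Quantity (x * y) (Dim (l1 + l2) (m1 + m2) (t1 + t2))

-- Adding quantities is only defined when their dimensions agree.
qadd :: Quantity -> Quantity -> Maybe Quantity
qadd (Quantity x d1) (Quantity y d2)
  | d1 == d2  = Just (Quantity (x + y) d1)
  | otherwise = Nothing

mass, acceleration :: Double -> Quantity
mass m         = Quantity m (Dim 0 1 0)
acceleration a = Quantity a (Dim 1 0 (-2))   -- length.time^-2
```

Here `qmul (mass 2) (acceleration 9.8)` has dimension `Dim 1 1 (-2)`, i.e. force. Units with different origins and scales (centigrade v. Fahrenheit) could be handled in the same spirit with one newtype per unit and explicit conversions.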
Some types, and classes, of data.
atomic |
discrete |
categorical |
data T = C1 | C2 | ... | Cn deriving (Bounded, Enum)
- e.g. data Boolean = True | False
- e.g. data Gender = Male | Female
- e.g. data DNA = A | C | G | T
- e.g. data Party = Liberal | Labor | Democrat | Green | Indep
- NB. Something changes qualitatively for
a "large" number of categories, maybe even for 7+ or 10+.
- NB. Bounded and Enum do not imply any
semantic (non-arbitrary) order on the values; see ordered below.
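A minimal sketch of what deriving (Bounded, Enum) buys for a categorical type: every category can be enumerated, e.g. to tabulate counts. The helper names `allValues` and `counts` are hypothetical. The declaration order A, C, G, T is only an index here, not a meaningful order.

```haskell
data DNA = A | C | G | T deriving (Show, Eq, Bounded, Enum)

-- Every value of a Bounded, Enum type, in declaration order.
allValues :: (Bounded a, Enum a) => [a]
allValues = [minBound .. maxBound]

-- How often each category occurs in some data (zero counts included).
counts :: (Bounded a, Enum a, Eq a) => [a] -> [(a, Int)]
counts xs = [ (v, length (filter (== v) xs)) | v <- allValues ]
```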
|
ordered |
as above plus deriving (..., Ord)
- e.g. data Quality = Bad | Poor | Avg | Fair | Good
- e.g. data Topography = Mountains | Foothills | Plain | Coastal
- See [missing persons].
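With Ord derived, comparisons follow the declared order, so order-based summaries become meaningful. A minimal sketch; `median` is a hypothetical helper that returns the upper median and is undefined on an empty list.

```haskell
import Data.List (sort)

data Quality = Bad | Poor | Avg | Fair | Good
  deriving (Show, Eq, Ord, Bounded, Enum)

-- An (upper) median: meaningful for ordered data, not for merely categorical.
median :: Ord a => [a] -> a
median xs = sort xs !! (length xs `div` 2)
```

For example `Poor < Fair` is True and `median [Good, Bad, Avg]` is `Avg`.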
|
hierarchic, partially ordered |
e.g. reptile | mammal(rodent | primate(chimp | gorilla))
- One method...
- data Animal = Reptile | Mammal (Maybe M)
- data M = Rodent | Primate (Maybe P)
- data P = Chimp | Gorilla
- Is a primate ~ Mammal (Just (Primate Nothing)),
=> is a mammal ~ Mammal Nothing :: Animal.
- Model by a suitable collection of multistate models.
- Or set-based, Primate = {Chimp, Gorilla} etc., cf. DNA.
- Also see measurement accuracy, discrete.
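Using the "one method" types above, the partial order can be written as a function `isA x y`, true when x is at least as specific as y (so a chimp is a primate, which is a mammal). `isA` and `isAM` are hypothetical names; this is a sketch, not the only encoding.

```haskell
data Animal = Reptile | Mammal (Maybe M) deriving (Show, Eq)
data M = Rodent | Primate (Maybe P) deriving (Show, Eq)
data P = Chimp | Gorilla deriving (Show, Eq)

-- isA x y: x is the same as, or more specific than, y.
isA :: Animal -> Animal -> Bool
isA Reptile Reptile                 = True
isA (Mammal _) (Mammal Nothing)     = True   -- anything mammalian is a mammal
isA (Mammal (Just m)) (Mammal (Just m')) = isAM m m'
isA _ _                             = False

isAM :: M -> M -> Bool
isAM Rodent Rodent                        = True
isAM (Primate _) (Primate Nothing)        = True
isAM (Primate (Just p)) (Primate (Just p')) = p == p'
isAM _ _                                  = False
```

Note that some pairs, e.g. a rodent and a primate, are incomparable: neither `isA` holds, which is what makes the order partial.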
|
integer |
|
posInt |
> 0;
similarly non-negative, >= 0.
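One common way to enforce such a constraint is a newtype with a checking "smart constructor" (in a real module the data constructor would not be exported). `PosInt`, `NonNeg` and the `mk...` functions are hypothetical names. For non-negative integers, base also provides `Numeric.Natural`.

```haskell
-- Hypothetical constrained integer types.
newtype PosInt = PosInt Int deriving (Show, Eq, Ord)
newtype NonNeg = NonNeg Int deriving (Show, Eq, Ord)

-- Only these functions can build values, so the invariants always hold.
mkPosInt :: Int -> Maybe PosInt
mkPosInt n
  | n > 0     = Just (PosInt n)
  | otherwise = Nothing

mkNonNeg :: Int -> Maybe NonNeg
mkNonNeg n
  | n >= 0    = Just (NonNeg n)
  | otherwise = Nothing
```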
|
periodic |
e.g. day of the week, month.
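For periodic discrete data the successor of the last value is the first, and distance is measured around the cycle. A sketch with hypothetical helpers `next` and `cyclicDist`; the derived `succ` alone would fail on `Sun`.

```haskell
data Day = Mon | Tue | Wed | Thu | Fri | Sat | Sun
  deriving (Show, Eq, Ord, Bounded, Enum)

-- Successor that wraps around: after the last value comes the first.
next :: (Bounded a, Enum a, Eq a) => a -> a
next x
  | x == maxBound = minBound
  | otherwise     = succ x

-- Shortest number of steps between two values, going either way round.
cyclicDist :: (Bounded a, Enum a) => a -> a -> Int
cyclicDist x y = min d (n - d)
  where
    n = fromEnum (maxBound `asTypeOf` x) - fromEnum (minBound `asTypeOf` x) + 1
    d = abs (fromEnum x - fromEnum y)
```

So `cyclicDist Sat Mon` is 2, not 5.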
|
continuous |
Real |
Float, Double
- e.g. voltage, position (1D), velocity (1D)
|
(Complex) ?structured? |
(re, im) or (r, θ); see vector
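Both representations are available in the standard `Data.Complex` module: `:+` builds the rectangular form, `polar` gives (r, θ) and `mkPolar` converts back. A short illustration:

```haskell
import Data.Complex

-- Rectangular (re, im) form.
z :: Complex Double
z = 3 :+ 4

-- Polar (r, θ) form: (magnitude z, phase z).
zPolar :: (Double, Double)
zPolar = polar z

-- Back to rectangular form, equal to z up to rounding.
zBack :: Complex Double
zBack = uncurry mkPolar zPolar
```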
|
positive |
e.g. mass, length, speed |
periodic |
e.g. angle |
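For a periodic continuous value such as an angle, the difference between two values should be taken the short way round the circle. A minimal sketch in radians using `mod'` from the standard `Data.Fixed` module; `normAngle` and `angleDiff` are hypothetical names.

```haskell
import Data.Fixed (mod')

-- Map any angle (radians) into [0, 2π).
normAngle :: Double -> Double
normAngle a = a `mod'` (2 * pi)

-- Signed difference a - b, taken the short way round, in (-π, π].
angleDiff :: Double -> Double -> Double
angleDiff a b
  | d > pi    = d - 2 * pi
  | otherwise = d
  where
    d = normAngle (a - b)
```

E.g. the difference between 0.1 and 2π - 0.1 is about 0.2, not about -6.08.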
composite |
multivariate |
tuple: (T1, T2, ..., Tm)
|
or constructor:
data Person = Person String Int
or: data Person = Person{name::String, age ::Int}
|
or array (homogeneous)
|
or list, [t], (homogeneous) |
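The tuple, constructor and record forms above hold the same information; the record form adds field accessors and update syntax. A small sketch; `PersonT`, `PersonP`, `fromTuple` and `birthday` are hypothetical names.

```haskell
-- Three representations of the same bivariate datum (name, age).
type PersonT = (String, Int)                          -- tuple

data PersonP = PersonP String Int deriving (Show, Eq) -- constructor

data Person = Person { name :: String, age :: Int }   -- record
  deriving (Show, Eq)

fromTuple :: PersonT -> Person
fromTuple (n, a) = Person { name = n, age = a }

-- Record update syntax changes one field and keeps the others.
birthday :: Person -> Person
birthday p = p { age = age p + 1 }
```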
vector |
array (homogeneous)
- e.g. m-Dim. position, velocity, force, etc.
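A minimal sketch of an m-dimensional vector as a homogeneous list of components, with the operations physical vectors need. `Vec` and its functions are hypothetical; note that `zipWith` silently truncates if the two vectors have different lengths, so a real implementation might check dimensions.

```haskell
-- Hypothetical m-dimensional vector.
newtype Vec = Vec [Double] deriving (Show, Eq)

-- Component-wise addition, e.g. of two forces.
vadd :: Vec -> Vec -> Vec
vadd (Vec xs) (Vec ys) = Vec (zipWith (+) xs ys)

-- Multiplication by a scalar, e.g. velocity x time = displacement.
vscale :: Double -> Vec -> Vec
vscale k (Vec xs) = Vec (map (k *) xs)

dot :: Vec -> Vec -> Double
dot (Vec xs) (Vec ys) = sum (zipWith (*) xs ys)

-- Euclidean length, e.g. speed from a velocity vector.
norm :: Vec -> Double
norm v = sqrt (dot v v)
```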
|
sequence |
list: [t]
--list of t
- e.g. DNA seq.,
annual weather data,
daily stock exchange data,
visits to doctor, etc.
- Element type can be multivariate.
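A DNA sequence as a list, sketched below: parsing a string into bases, and listing adjacent pairs, which is where the order of a sequence (unlike a set) starts to matter. `parseDNA` and `adjacent` are hypothetical helpers.

```haskell
data DNA = A | C | G | T deriving (Show, Eq, Ord, Bounded, Enum)

-- Parse a string of bases; Nothing if any character is not A, C, G or T.
parseDNA :: String -> Maybe [DNA]
parseDNA = mapM base
  where
    base 'A' = Just A
    base 'C' = Just C
    base 'G' = Just G
    base 'T' = Just T
    base _   = Nothing

-- Adjacent pairs (x_i, x_i+1) of a sequence.
adjacent :: [a] -> [(a, a)]
adjacent xs = zip xs (drop 1 xs)
```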
|
set |
- list of members,
e.g. [set of mutations],
- or vector (bit map),
- (equivalent in principle but not necessarily in practice, especially for sparse sets).
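The two representations, sketched side by side over a universe of members numbered 0 to 63 (an assumption so that a bit map fits in one `Word64`). The list form is compact for sparse sets; the bit map gives union and intersection as single machine operations. Function names are hypothetical; the bit operations are from the standard `Data.Bits`.

```haskell
import Data.Bits (popCount, setBit, testBit, (.&.), (.|.))
import Data.List (nub, sort)
import Data.Word (Word64)

-- (1) List of members, kept sorted and duplicate-free.
fromMembers :: [Int] -> [Int]
fromMembers = sort . nub

listUnion, listInter :: [Int] -> [Int] -> [Int]
listUnion xs ys = fromMembers (xs ++ ys)
listInter xs ys = [ x | x <- xs, x `elem` ys ]

-- (2) Bit map: bit i is set iff member i is in the set (0 <= i <= 63).
toBits :: [Int] -> Word64
toBits = foldr (flip setBit) 0

fromBits :: Word64 -> [Int]
fromBits w = [ i | i <- [0 .. 63], testBit w i ]

bitUnion, bitInter :: Word64 -> Word64 -> Word64
bitUnion = (.|.)
bitInter = (.&.)

-- Number of members.
size :: Word64 -> Int
size = popCount
```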
|
structured |
the sky is the limit, new data-types |
inapplicable |
usually structured data, e.g.
data Person = Male | Female Int  -- #pregnancies!
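Because the count only exists inside `Female`, a summary of it naturally ranges over the people to whom it applies; a Male is not a "missing" count. A small sketch; `meanPregnancies` is a hypothetical helper.

```haskell
data Person = Male | Female Int deriving (Show, Eq)

-- Mean over those for whom the attribute applies; Nothing if there are none.
meanPregnancies :: [Person] -> Maybe Double
meanPregnancies ps = case [ n | Female n <- ps ] of
  [] -> Nothing
  ns -> Just (fromIntegral (sum ns) / fromIntegral (length ns))
```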
|
optional |
Maybe t,
but with different semantics from missing data (below)!
Also Either t1 t2 -- standard H98.
- Really a special kind of structured data.
- Model as discrete plus a suitable model for t.
- Whether an optional t was in fact present or not could be missing, and if
it was known to be present then the value itself could be missing or not!
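The last point can be made explicit in the types by keeping "optional" and "missing" as different type constructors. A sketch with hypothetical names: `Optional` for the data's own optionality, `Observed` (a synonym for `Maybe`) for whether something was recorded.

```haskell
-- Hypothetical names: Optional is part of the data itself,
-- Observed (Nothing = missing) records whether we know it.
data Optional t = Absent | Present t deriving (Show, Eq)
type Observed t = Maybe t

-- An optional t whose presence, and then whose value, may each be missing.
type Datum t = Observed (Optional (Observed t))

describe :: Show t => Datum t -> String
describe Nothing                   = "unknown whether present"
describe (Just Absent)             = "known to be absent"
describe (Just (Present Nothing))  = "present, value missing"
describe (Just (Present (Just v))) = "present, value " ++ show v
```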
|
properties |
data measurement accuracy |
continuous |
- (a) fixed, ±δ, or
- (b) relative, ±x%, or
- (c) arbitrary, range (lo,hi), per datum.
- NB. Omitting to deal with accuracy when transforming data
can affect inferences; it is safer to inverse-transform the model.
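Each kind of accuracy (a), (b), (c) determines an interval of true values consistent with a recorded value. The sketch below (hypothetical names throughout) also shows why a data transformation interacts with accuracy: after a log transform, a fixed ±0.5 no longer gives intervals of equal width.

```haskell
-- Hypothetical encoding of the three kinds of measurement accuracy.
data Accuracy
  = Fixed Double          -- (a) ± delta
  | Relative Double       -- (b) ± x percent
  | Range Double Double   -- (c) explicit (lo, hi) for this datum
  deriving (Show, Eq)

-- The interval of true values consistent with a recorded value v.
interval :: Accuracy -> Double -> (Double, Double)
interval (Fixed delta)  v = (v - delta, v + delta)
interval (Relative pct) v = let d = abs v * pct / 100 in (v - d, v + d)
interval (Range lo hi)  _ = (lo, hi)

-- Under an increasing transformation f (e.g. log), map the end points;
-- the original ± delta does not carry over unchanged.
transformInterval :: (Double -> Double) -> (Double, Double) -> (Double, Double)
transformInterval f (lo, hi) = (f lo, f hi)
```

E.g. `transformInterval log (interval (Fixed 0.5) 10)` is about ten times wider than the same for a recorded value of 100.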
|
discrete (sometimes) |
- e.g. DNA
- H={A,C,T}, ..., R={A,G}, Y={C,T}, K={G,T},
N={A,C,G,T} ~missing?!
- (A 4-bit "set" rep. works nicely for many purposes.)
- Also see hierarchic, partially ordered, above.
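A sketch of the 4-bit "set" representation for (possibly ambiguous) bases, assuming bit 0 = A, 1 = C, 2 = G, 3 = T. The names `bA`, `bR`, `compatible` etc. are hypothetical; only the ambiguity codes listed above are defined.

```haskell
import Data.Bits (popCount, (.&.), (.|.))
import Data.Word (Word8)

-- A (possibly ambiguous) base as a set of the bases it could be.
type Bases = Word8

bA, bC, bG, bT :: Bases
bA = 1
bC = 2
bG = 4
bT = 8

bR, bY, bK, bH, bN :: Bases
bR = bA .|. bG
bY = bC .|. bT
bK = bG .|. bT
bH = bA .|. bC .|. bT
bN = bA .|. bC .|. bG .|. bT   -- any base: effectively missing

-- Two observations are compatible if they could be the same base.
compatible :: Bases -> Bases -> Bool
compatible x y = x .&. y /= 0

-- Number of bases consistent with an observation (1 = exact, 4 = N).
size :: Bases -> Int
size = popCount
```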
|
missing data |
Maybe t = Nothing | Just t
--H98 standard type
- There was a value but it was either not measured or not recorded.
- (a) Missingness is common knowledge; need not be coded at all.
- (b) Missingness is of known prob.; can code using a fixed given prob.
- (c) Missingness is to be estimated once, globally,
for use in all sub-models.
- (d) Missingness is to be estimated per sub-model, and so
may influence global model structure.
- See [modelMaybe].
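A minimal sketch of cases (b) and (c), measuring code length in bits as -log2 of probability: whether a datum is missing is coded with probability `pMiss`, and a present value additionally with the model's probability for it (a discrete model is assumed). Case (c) estimates `pMiss` once from all the data; this sketch ignores the cost of stating that estimate. All names are hypothetical, not the [modelMaybe] code.

```haskell
import Data.Maybe (isNothing)

-- Code length (bits) of one possibly-missing datum, given P(missing) = pMiss
-- and a model giving the probability of each present value.
codeLen :: Double -> (t -> Double) -> Maybe t -> Double
codeLen pMiss _     Nothing  = negate (logBase 2 pMiss)
codeLen pMiss model (Just x) = negate (logBase 2 ((1 - pMiss) * model x))

-- Case (c): estimate the missing probability once, globally (non-empty data).
estimateMiss :: [Maybe t] -> Double
estimateMiss xs =
  fromIntegral (length (filter isNothing xs)) / fromIntegral (length xs)

-- Total code length of the data using the global estimate.
totalLen :: (t -> Double) -> [Maybe t] -> Double
totalLen model xs = sum (map (codeLen (estimateMiss xs) model) xs)
```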
|
censored data |
either: data Cnsrd t = Cnsrd | Normal t or
transform the model.
Related to missing, and optional, but with different semantics.
- E.g. a "sticky" voltmeter reads any voltage in [0.0, 1.0] V as 0.0 V.
- A reasonable, although not perfect, way to model censored data
is similar to what can be done for missing data,
cases (c) or (d), above.
(As in ecological segmentation '05.)
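For comparison, a sketch of the standard censored-data likelihood for the sticky voltmeter, which is a different technique from the missing-data approach above and not necessarily the author's: a censored reading contributes P(V <= 1.0) under the model and an uncensored reading its density. The exponential model with rate `lam` is purely an illustrative assumption; `fromReading` and `logLik` are hypothetical names.

```haskell
data Cnsrd t = Cnsrd | Normal t deriving (Show, Eq)

-- The sticky voltmeter: a reading of 0.0 means the true voltage was in [0, 1].
fromReading :: Double -> Cnsrd Double
fromReading v
  | v == 0.0  = Cnsrd
  | otherwise = Normal v

-- Log-likelihood under an ASSUMED exponential model of voltage, rate lam.
logLik :: Double -> [Cnsrd Double] -> Double
logLik lam = sum . map term
  where
    term Cnsrd      = log (1 - exp (negate lam))   -- log P(V <= 1.0)
    term (Normal v) = log lam - lam * v            -- log density at v
```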
|
weighted data |
- (i) integral: compacting repetitive values,
- (ii) fractional: part membership of a class in
a [mixture model].
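Both kinds of weight, sketched: (i) compacting repeated values into (value, count) pairs, and (ii) fractional weights such as each datum's membership of one class, used here in a weighted mean. `compact` and `weightedMean` are hypothetical helpers; `weightedMean` assumes the weights do not sum to zero.

```haskell
import Data.List (group, sort)

-- (i) Integral weights: each distinct value with its number of repeats.
compact :: Ord a => [a] -> [(a, Int)]
compact xs = [ (x, length g) | g@(x : _) <- group (sort xs) ]

-- (ii) Fractional weights: (weight, value) pairs, e.g. class memberships.
weightedMean :: [(Double, Double)] -> Double
weightedMean wxs = sum [ w * x | (w, x) <- wxs ] / sum (map fst wxs)
```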
|
25/5/2006, LA.
|
|