Multidimensional Data Visualisation
Joseph Iscaro, 2003
Bachelor of Software Engineering (with Honours)
- Parallel Coordinate Plot.
- Star Plot of car statistics.
- Experiment 1 - Screen 1
- Experiment 1 - Screen 2
- Experiment 1 - Screen 3
- Experiment 2 - Screen 1
- Experiment 2 - Screen 2
- Experiment 2 - Screen 3
- Experiment 2 - Screen 4
- Experiment 2 - Screen 5
When data sets contain extremely large quantities of data, analysing the data in its raw form to discover
relationships or trends can prove to be daunting if not impossible. Visualising the data as a model can help
present relationships within the data set and make them more apparent (Spence 2001).
Currently there are many useful and familiar techniques for visualising data with one (Histogram, Tukey Box Plot),
two (Scatter Plot, Box Plots) and three (2 Dimensional Representations of 3 Dimensional Scatter
Plots, Plots in 3 Dimensional Space) variables (univariate, bivariate and trivariate respectively) (Spence 2001).
When the data being analysed is of a higher order (multidimensional), limitations arise as the data
becomes more and more complex. Existing techniques for visualising this higher
order data do prove to be useful; however, opinion is divided on
whether a given technique is limited to the domain of the data being analysed.
Also, when large volumes of data are visualised, the visualisation tends to
become cluttered and relationships within the data become unclear. As with any form
of visual depiction, the message conveyed is generally open to interpretation.
Creating a visualisation technique is an experimental process, as it cannot be
proven that one technique is suitable for all types of data. In order for a
technique to produce substantial results, the following must be
considered:
- Human Visual Perception and Interaction
How do we see things, and what attributes are we readily able to differentiate?
- Existing Multidimensional Visualisation Techniques
How do existing techniques fare, and can they be integrated to produce better
techniques?
- Mapping Dimensions
Which data attributes are best mapped to which visual attributes?
For any visualisation technique, the data is passed through four basic stages
during visualisation (Ware 2000):
- The collection and storage of data itself.
This is the easiest of the phases as it is essentially performed
independently of any visualisation technique.
- The pre-processing intended to transform the data into a comprehensible form.
Pre-processing the data allows the display algorithms to work at a satisfactorily
rapid rate and reduces the need to re-search the data file again and again.
- The display hardware and software used to produce a visual representation.
Ideally, high performance display hardware and graphics libraries would be used
in any implementation of a visualisation technique. The increase in performance
and low cost of computing equipment today shows a promising future for computer
visualisations.
- The human interaction, querying and perception.
The algorithms used to visualise the data need to address the limits of human
perception and maintain these constraints while presenting as much information
as possible in an understandable format. Also, human beings are impatient at
best, meaning response time to a query is an important factor.
Each phase the data passes through influences the final perception and
understanding of the data, and each phase also influences the phases that
follow it. The major contributors to this influence are phases two
(2), three (3) and four (4) (outlined above).
The way in which we perceive or gain an understanding from a visualised data set
is determined by the visual attributes we are able to differentiate. Assuming
the user who is interacting with a visualisation isn't colour blind and has no
problems with depth perception, the visualisation can include the following.
Colour and texture can be used to represent limited range variables as they stand
out from one another and can draw attention quickly.
Texture
Texture has very high dimensionality as the only limit is how many different
textures one can conceive. Using texture on a visual information display can
increase dimensionality as textures are as rich and expressive as colours
(Spence 2001). In addition to the properties of colours, texture can make use
of orientation as an added dimension.
Colour
Colour is perhaps the most commonly used feature in most visualisation techniques
so it is important to choose colours that that are distinguishable from one another.
Healey ##HEA96, ##HEA96 developed a technique for selecting of colours and showed
that the selection can be determined by controlling:
- Colour Distance
The Euclidean distance between different colours (measured in a perceptually
balanced colour model).
- Linear Separation
The ability to linearly separate targets from non-targets.
- Colour Category
The named colour regions occupied by both the target and non-target elements.
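The colour distance criterion can be illustrated with a minimal C++ sketch. It assumes the colours have already been converted to a perceptually balanced model; the CIE L*u*v* component names and the identifiers `Luv` and `colourDistance` are assumptions made for this illustration, not code from the thesis or from Healey's paper.

```cpp
#include <cmath>

// A colour expressed in a perceptually balanced model. The field names follow
// CIE L*u*v*, which is an assumption made for this sketch.
struct Luv {
    double L;
    double u;
    double v;
};

// Euclidean distance between two colours in that model. A larger distance
// suggests the two colours should be easier to tell apart.
double colourDistance(const Luv& a, const Luv& b) {
    const double dL = a.L - b.L;
    const double du = a.u - b.u;
    const double dv = a.v - b.v;
    return std::sqrt(dL * dL + du * du + dv * dv);
}
```

A colour selection procedure could call `colourDistance` on every pair of candidate colours and reject sets in which any pair falls below a chosen threshold; the threshold itself would have to be determined experimentally.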
Size
Size is an easily distinguishable attribute, if used correctly. Differences in
size will not reflect well if an object is spatially distant compared to
a target, for example, a star may look small when compared to the moon from
earth but if we were comparing physical volume this is not true.
Shape
Shapes are useful in distinguishing between types of elements if there is a
limited range of values for the attribute it is representing, the same way colour
can. Say for example a dataset for weather information may have a location as a
common variable among records, a single shape can be used to represent this
common variable. It is, however not very useful when mapping a variable that
varies constantly over records.
Orientation
Orientation of an object has no visual polarity ##War2000, ##War2000 in that we cannot
distinguish between which of the orientations has a higher value.
Animating a transition between targets in a visualisation that uses spatial
representation can assist the user in producing a mental map of where in the
data they are currently located (Bederson and Boltman 1999), rather than just jumping from
target to target around the visualised data set.
We perceive depth and positioning naturally in our world, but when we represent
depth on a two dimensional display (such as a monitor), if it is not done
correctly the representation can be misinterpreted as a size difference between
two objects. We need motion or transparency to assist with this.
Blended or transparent graphical elements can be distinguished from opaque
ones. We tend to focus on the more active or standout body, so an object
blinking between transparent and opaque will stand out more.
Transparency can be used to assist with depth perception in the case that a
closer transparent target overlaps with a distant opaque target or vice versa.
However, Kersten and Bülthoff (1991) state that combining transparency and motion can lead to
a delay in depth perception, so one needs to decide which of the two
attributes takes precedence.
In the real world, when we interact with something, we change the outcome of
what would have occurred had we not. Computer visualisation is no different, so we need
to know what balance of interaction and automation is required.
Human interaction in any phase of the visualisation process will change the
output of that phase and therefore influence the corresponding visualisation.
Automatic mapping of attributes can reduce the interaction required from a user of a
visualisation (Matsushita and Kato 2001), but may not lead to the desired results.
Matsushita and Kato (2001) propose a visualisation method that can
extract parameters for phase input in the form of natural language,
essentially a user's profile or requirements for the visualisation, reducing the
amount of interaction required from a user.
Interaction by exploratory analysis of data sets allows users to focus on small
amounts of interesting data within the large space. This differs
drastically from a simple database query (Flinn n.d.) in that the
user's query may be refined from a vague request down to data of
specific interest to the user.
Implementing a querying language can aid the user in interaction and bring
large sets of data into an understandable light. This, along with filtering and
data selection techniques, is generally required when dealing with larger data
sets (Ferreira de Oliveira and Levkowitz 2003).
A visualisation technique can be categorised into one of the following (Keim 2002):
- Standard 2D or 3D Display
- Geometric Transformed Displays
- Iconic/Glyph Displays
- Dense Pixel Displays
- Stacked Displays
Figure 2.1: Parallel Coordinate Plot.
Parallel Coordinate Plots (created by Alfred Inselberg) take all the axes of
the multidimensional space of the data and arrange them in order and parallel
to each other. Each value of a record is mapped to a point on the corresponding axis, and the points are then joined
together to show the links between the variables of each record. The beauty of
Parallel Coordinate Plots is that they can be expanded to use any number of
variables and enhanced to use colour to distinguish between records; they handle
both quantitative and categorical data, and all variables are treated
equally (Spence 2001).
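The mapping of a record onto parallel axes can be sketched in C++ as follows. This is only an illustration under stated assumptions: the axes are spaced evenly across a unit square, each axis is scaled linearly between a supplied minimum and maximum, and the names `Point2D` and `parallelCoordinatePolyline` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Point2D {
    double x;
    double y;
};

// Map one record onto a polyline across evenly spaced vertical axes.
// axisMin[i] and axisMax[i] give the value range drawn on axis i. The plot
// occupies the unit square, which is a layout assumption of this sketch.
std::vector<Point2D> parallelCoordinatePolyline(const std::vector<double>& record,
                                                const std::vector<double>& axisMin,
                                                const std::vector<double>& axisMax) {
    std::vector<Point2D> polyline;
    const std::size_t n = record.size();
    polyline.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double range = axisMax[i] - axisMin[i];
        // Place a constant-valued axis at mid height rather than divide by zero.
        const double y = (range > 0.0) ? (record[i] - axisMin[i]) / range : 0.5;
        // Axis i sits at a horizontal position proportional to its index.
        const double x = (n > 1) ? static_cast<double>(i) / static_cast<double>(n - 1) : 0.5;
        polyline.push_back({x, y});
    }
    return polyline;
}
```

Drawing the returned points as a connected line strip, once per record, gives the plot; colour could be chosen per record to distinguish groups.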
Glyphs or iconic representation of data can be used to represent high
dimensional data sets effectively.
Star plots are capable of displaying an arbitrary number of variables
effectively in the form of star shaped figures, where each whisker or
point represents a dimension in the data set (Friendly 1991).
Figure 2.2: Star Plot of car statistics.
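The geometry of a star plot can be sketched in C++ as below. The sketch assumes each variable has already been normalised to the range 0 to 1 and that the whiskers are spaced at equal angles around the centre; the names `StarVertex` and `starPlotVertices` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct StarVertex {
    double x;
    double y;
};

// Compute the tip of each whisker of a star plot centred on the origin.
// normalised[i] is the value of variable i scaled to [0, 1] (an assumption
// of this sketch); maxRadius is the whisker length drawn for a value of 1.
std::vector<StarVertex> starPlotVertices(const std::vector<double>& normalised,
                                         double maxRadius) {
    const double twoPi = 2.0 * std::acos(-1.0);
    const std::size_t n = normalised.size();
    std::vector<StarVertex> vertices;
    vertices.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Whisker i points at an angle of i/n of a full turn.
        const double angle = twoPi * static_cast<double>(i) / static_cast<double>(n);
        const double radius = normalised[i] * maxRadius;
        vertices.push_back({radius * std::cos(angle), radius * std::sin(angle)});
    }
    return vertices;
}
```

Joining consecutive tips, and the last tip back to the first, produces the star-shaped outline for one record.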
Use of glyphs can increase the dimensionality of a scatterplot, for example
generating glyphs of star plots and spatially positioning these glyphs in the
scatterplot can achieve a higher dimensionality in the display.
Chernoff faces are a glyph technique that draws each variable in the data as a feature on
a human face. Representation is achieved by varying the size,
shape and separation of the features on the faces. This style was developed because
of human beings' ability to distinguish different facial expressions with ease
(Spears 1999).
The basic concept of dense pixel displays is to map each data value to a coloured pixel and to group
the pixels belonging to each variable into an individual window in the display (Keim 1996).
Stacked Displays, or Dimensional Stacking, involve a recursive algorithm where
two dimensions are used as the horizontal and vertical axes to
create a grid on the display. This process is applied again recursively within each
grid cell, using the next pair of dimensions, until all dimensions have been mapped
(Ward n.d.).
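The recursive embedding can be written iteratively, as in the C++ sketch below. It assumes each dimension has already been discretised into a small number of bins, that dimensions are listed outermost first, and that even-indexed dimensions are stacked horizontally and odd-indexed ones vertically. These conventions, and the names `GridCell` and `stackedCell`, are assumptions of the sketch rather than details taken from the cited work.

```cpp
#include <cstddef>
#include <vector>

struct GridCell {
    std::size_t column;
    std::size_t row;
};

// Locate the display cell of one record under dimensional stacking.
// bins[i] is the record's 0-based bin index along dimension i and
// binCounts[i] is the number of bins used for that dimension.
GridCell stackedCell(const std::vector<std::size_t>& bins,
                     const std::vector<std::size_t>& binCounts) {
    GridCell cell{0, 0};
    for (std::size_t i = 0; i < bins.size(); ++i) {
        if (i % 2 == 0) {
            // Each outer horizontal cell is subdivided into binCounts[i] columns.
            cell.column = cell.column * binCounts[i] + bins[i];
        } else {
            // Each outer vertical cell is subdivided into binCounts[i] rows.
            cell.row = cell.row * binCounts[i] + bins[i];
        }
    }
    return cell;
}
```

With four dimensions, for example, the first two select a coarse grid cell and the last two select a position inside that cell.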
Analysis of multidimensional data can be made more efficient by reducing the
dimensionality of the data with a minimal amount of information loss. Removing
redundant variables and applying constraints to analysed variables can transform
the data into a more compact set prior to visualisation (DeMers and Cottrell 1993).
Multiple views are an interesting approach to visualisation. Different
viewports can be displayed on the one display, allowing the user to gain
multiple views of the visualisation and perhaps assess aspects of the data they
may not have previously noticed. I believe this would also help when navigating a large
data space where a transition has occurred and the user has lost track of
their spatial position: one view could show a map
of where the user is in the data.
This technique could also allow multiple visualisation techniques to be displayed
simultaneously.
Two visualisation techniques are proposed in this thesis.
A 3 dimensional scatterplot is the best approach when mapping as many
visual attributes as possible to quantitative data, as 3 variables are taken care of by the
spatial placement of each record's point. Also, it is a familiar technique used
for statistical analysis. Some visual attributes are best suited to a limited
range of input data so as not to confuse the user conducting the analysis.
Falling into this category are shape, colour and texture. Mapping these
attributes to variables with a limited range in the data set makes for a quick
and easy distinction between records. Other quantitative attributes such as
size and rotational velocity can be mapped to any variable. Using the
blending or transparency attribute can help determine which objects are
in front of others, aiding depth perception.
By adding depth to the standard implementation of a star plot, the number of
variables that can be represented on a star plot can be increased greatly, as
more branches or arms can be added to a record's representation, thus allowing a
surface to be drawn around the star. By rotating the scene a user can compare
records from different angles and analyse all the variables on all
sides. Limited range variables can also be represented as a grouping by mapping
these values to a texture to be used as the surface.
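One possible way to place the arms of such a 3 dimensional star plot is sketched below in C++. The thesis does not state how the arms are oriented, so the golden-angle spiral used here to spread directions over a sphere is purely an assumption, as are the names `ArmTip` and `starPlot3DTips` and the requirement that values are normalised to the range 0 to 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ArmTip {
    double x;
    double y;
    double z;
};

// Compute the tip of each arm of a 3 dimensional star centred on the origin.
// Arm directions are spread over a sphere with a golden-angle spiral (an
// assumption of this sketch); arm i has length normalised[i] * maxLength.
std::vector<ArmTip> starPlot3DTips(const std::vector<double>& normalised,
                                   double maxLength) {
    const double pi = std::acos(-1.0);
    const double goldenAngle = pi * (3.0 - std::sqrt(5.0));
    const std::size_t n = normalised.size();
    std::vector<ArmTip> tips;
    tips.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Unit direction i of n: z steps evenly from near +1 to near -1
        // while the angle around the z axis advances by the golden angle.
        const double z = 1.0 - 2.0 * (static_cast<double>(i) + 0.5) / static_cast<double>(n);
        const double ring = std::sqrt(1.0 - z * z);
        const double phi = goldenAngle * static_cast<double>(i);
        const double length = normalised[i] * maxLength;
        tips.push_back({length * ring * std::cos(phi),
                        length * ring * std::sin(phi),
                        length * z});
    }
    return tips;
}
```

A surface could then be drawn by joining neighbouring tips into polygons; how neighbours are chosen is left open here.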
Using the OpenGL libraries and C++, scenes were rendered in accordance with the
methods described in the previous section using a data set obtained from
StatLib (StatLib n.d.). The scenes were built on OpenGL base code that uses
the GLUT library and includes navigational input via the mouse. The base code was obtained from
Neon Helium Productions (Barrera n.d.), an OpenGL developers' site that provides
OpenGL tutorials and base code for creating basic OpenGL applications.
The two techniques I experimented with were given the same data set (StatLib n.d.)
containing car information such as miles per gallon, cylinders, displacement, horsepower, weight and acceleration,
as well as two variables that can be used for grouping: year/make and
origin.
During both experiments, I obtained input from others to inform my
discussion of the results.
This experiment was an attempt to place as many visual
attributes as possible on the same screen without causing clashes. As discussed in
the literature review chapter of this thesis, there is a limited number of
visual attributes that can be successfully mapped in a visualisation in
conjunction with other visual attributes.
Figure 4.1: Experiment 1 - Screen 1
In this instance, I attempted to map seven of the eight dimensions to
different visual attributes (Figure 4.1).
- The weight of each car was mapped to the X axis in the display.
- The horsepower of each car was mapped to the Y axis.
- The acceleration power of each car was mapped to the Z axis.
- The year or make of each car was mapped to a colour.
- Miles per gallon of each car was mapped to transparency. In cases where mpg
was N/A these objects were completely opaque.
- The number of cylinders of each car was mapped to the size of the shape.
- The origin of each car (an id in the data) was mapped to the shape itself. In this data set there are only
three origins: 1 Square, 2 Sphere and 3 Torus.
The only variable not mapped is that of displacement.
Figure 4.2: Experiment 1 - Screen 2
The resulting output, although heavily clustered, still permits some form
of analysis. Having mapped colour to the year/make of the model, one can
determine that a majority of cars whose origin is 1 (square) and which were made in
1970 (red) have a smaller value for acceleration.
The dense clustering of this result hinders any real discovery from within the
data set. However, scaling the positioning variables (X, Y, Z) so that the objects were
spaced farther apart placed the records along
one long line, which gave no real basis for comparison between records.
Instead, I changed the scale of the size produced for each shape (origin). This
still resulted in some large clustering (Figure 4.2).
Figure 4.3: Experiment 1 - Screen 3
The ability to rotate the view of the screen allows analysis from each angle
and comparisons along all axes. Figure 4.3 shows a view looking down
(along the y axis) at the 406 records, giving a flat view of the data set, and shows
that a majority of the cars in this data set were manufactured in origin 1 (cube).
The attempt to assign as many visual attributes as possible fell short of
its intended purpose. I discovered, however, that a user requires an explicit
explanation of how to read a visualisation technique, as expected. A key must be
provided so that the user can determine which variable maps to which visual attribute.
Furthermore, if using 3 dimensional objects with transparency, a lighting source
assists the user in distinguishing depth from size difference, as does adding
a set of axes to guide the user along the planes of spatial positioning.
In this experiment, I made some changes to the usual method of representing a
multidimensional record as a star plot. The changes I made are as follows:
- Add depth to the representation to allow for more variables to be branched off the plot.
- By rotating the scene, a user can view the surface plotted over the 3 dimensional star plot.
The reason for this is that star plots become overwhelming when a data set
contains records with a large number of variables. The representations in this experiment were
hard coded, but the data values were taken from the same
data set used in experiment 1. Records 0 and 25 are shown in all figures.
Figure 4.4: Experiment 2 - Screen 1
Figure 4.5: Experiment 2 - Screen 2
Figure 4.6: Experiment 2 - Screen 3
Figures 4.4, 4.5 and 4.6 all show comparisons
between two records from the car data set (records 0 and 25). Different
colours are used because, with a single colour for the shading, there was
confusion as to which side belonged to which point. Changing colours resolved this
issue, as you can clearly see where each vertex of a polygon starts.
The result is still a little hard to read, however, due to the blending issue seen in
Figure 4.4, where polygons that should lie deeper in the
world are drawn on top of closer polygons.
Figure 4.7: Experiment 2 - Screen 4
Figure 4.8: Experiment 2 - Screen 5
With the depth testing feature of OpenGL disabled, as shown in Figures 4.7 and
4.8, there is less confusion over the polygons' positions in the
world. In Figure 4.8 I have removed a redundant variable,
year/model, as this isn't really an attribute worth comparing.
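When depth testing is disabled, the order in which blended polygons are submitted decides which appear on top. A common remedy, which is not what this implementation did, is to sort the polygons from farthest to nearest before drawing. The C++ sketch below illustrates that alternative; `Polygon3D`, its fields and `sortBackToFront` are hypothetical names, and the depth is assumed to be the distance from the camera to each polygon's centroid.

```cpp
#include <algorithm>
#include <vector>

struct Polygon3D {
    int id;            // hypothetical identifier for the polygon
    double viewDepth;  // assumed: distance from the camera to the centroid
};

// Order polygons so that the farthest is drawn first and the nearest last,
// letting nearer transparent polygons blend over those behind them.
void sortBackToFront(std::vector<Polygon3D>& polygons) {
    std::stable_sort(polygons.begin(), polygons.end(),
                     [](const Polygon3D& a, const Polygon3D& b) {
                         return a.viewDepth > b.viewDepth;
                     });
}
```

The sort would need to be repeated whenever the scene is rotated, since the depths change with the viewpoint.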
Figures 4.7 and 4.8 show by far the best results in this
experiment, as one can clearly see the surface surrounding the axes. The main
purpose of this experiment was
to see if a star plot could be expanded to allow for more dimensions, and these
results show it can, although some key features that allow the
user to read the plot are missing from this implementation. For example, there are no labels
indicating which axis is related to which variable in the record. Labels
are missing because I was not able to get text to display in those
regions. I intended to label each axis with an id and relate each id to a
variable name in a key of some sort.
The research presented in this thesis describes computer visualisation
techniques for quantitative data. Attempts were also made to improve on one of
these existing techniques and to create a technique that uses as many
visually distinguishable attributes as possible.
Much of the background research showed that the input domain of data
visualisation is extremely large and is constantly growing. Also, no one
technique suits all types of data. When the dimensionality of a
data set becomes increasingly large, some techniques handle it better than others;
star plots, for example, can become extremely cluttered as dimensionality
increases.
The extension of an existing technique proved to be the most interesting of the
experiments, as it showed that a star plot can be remodelled as a 3 Dimensional
star plot to handle data with more variables than an unmodified (2
Dimensional) star plot. The experiment of applying as many visual
attributes as possible to a visualisation fell short in that the data set used in the experiment
was not very dense; this limited the outcome, as not as many variables as I would
have liked could be mapped. My limited exposure to the OpenGL libraries
and to 3 Dimensional graphics programming in general limited the development
somewhat. There is, in fact, great room for improvement in both experiments.
Firstly, the mapping of visual attributes technique could be greatly improved by
adding a navigational feature using animation and motion. An animated rotational
motion could be added to each object, and linking its rotational velocity to a
variable would increase the dimensionality of the technique.
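The rotational velocity idea could look something like the following C++ sketch. It assumes the variable has been normalised to the range 0 to 1 and that velocities are expressed in degrees per second; the function names and the velocity limits passed in are hypothetical.

```cpp
#include <cmath>

// Map a value normalised to [0, 1] onto a rotational velocity in degrees per
// second. The velocity limits are supplied by the caller and are assumptions.
double rotationalVelocity(double normalisedValue,
                          double minDegPerSec,
                          double maxDegPerSec) {
    return minDegPerSec + normalisedValue * (maxDegPerSec - minDegPerSec);
}

// Advance an object's rotation angle by one frame and wrap it into [0, 360).
double advanceAngle(double angleDeg, double velocityDegPerSec, double elapsedSeconds) {
    double next = std::fmod(angleDeg + velocityDegPerSec * elapsedSeconds, 360.0);
    if (next < 0.0) {
        next += 360.0;
    }
    return next;
}
```

Each frame, the renderer would call `advanceAngle` per object and rotate the object by the returned angle before drawing it, so that records with larger values spin visibly faster.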
As it was shown that the star plot can be successfully extended to a third
dimension, giving it the ability to plot a surface over the star's axes, the only
necessary extension is to label the axes so that the user
can determine comparisons and relations with other records in a data set.
A texture could possibly be added to each surface for grouping records using data
over a limited range. For example, with the cars data set used in the experiments,
one could map the year/model variable to the surface's texture, allowing the
user to determine which cars displayed were created in the same year.
Rather than trying to develop new techniques to handle data, researchers could
look to existing techniques and attempt to increase the input domain an existing
technique can handle.
- Barrera, B. n.d. OpenGL basecode. Accessed on 5/9/2003. URL: http://nehe.gamedev.net/files/basecode/neheglglut.zip
- Bederson, B. B. and Boltman, A. 1999. Does animation help users build mental maps of spatial information?, Proceedings of the 1999 IEEE Symposium on Information Visualization (Info Vis '99), pp. 28-35.
- DeMers, D. and Cottrell, G. 1993. Non-linear dimensionality reduction, in S. J. Hanson, J. D. Cowan and C. L. Giles (eds), Advances in Neural Information Processing Systems, Vol. 5, Morgan Kaufmann, San Mateo, CA, pp. 580-587. Accessed on 30/7/2003. URL: ftp://ftp.cs.ust.hk/pub/dyyeung/course/comp327+527/f96/nldr.ps.gz
- Ferreira de Oliveira, M. C. and Levkowitz, H. 2003. From visual data exploration to visual data mining: a survey, IEEE Transactions on Visualization and Computer Graphics, Vol. 9, Iss. 3, July-Sept., pp. 378-394.
- Flinn, S. n.d. Interactive graphical displays for visual information browsing and exploratory search. Accessed on 21/7/2003. URL: http://www.cs.ubc.ca/spider/flinn/publications/wcgs95.ps.gz
- Healey, C. G. 1996. Choosing effective colours for data visualization, in R. Yagel and G. M. Nielson (eds), IEEE Visualization '96, pp. 263-270. Accessed on 14/7/2003. URL: http://www.csc.ncsu.edu/faculty/healey/download/viz.96.pdf
- Keim, D. 2002. Information visualization and visual data mining, IEEE Transactions on Visualization and Computer Graphics, Vol. 8, Iss. 1, Jan/Mar, pp. 1-8.
- Keim, D. A. 1996. Pixel-oriented visualization techniques for exploring very large databases, Journal of Computational and Graphical Statistics (March). Accessed on 30/7/2003. URL: http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/papers/StatisticsPaper.ps
- Kersten, D. and Bülthoff, H. 1991. Apparent opacity affects perception of structure from motion, Technical Report AIM-1285. Accessed on 29/7/2003. URL: ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1285.pdf
- Matsushita, M. and Kato, T. 2001. Interactive visualization method for exploratory data analysis, Proceedings of the Fifth International Conference on Information Visualisation, pp. 671-676.
- Friendly, M. 1991. Statistical graphics for multivariate data. Accessed on 29/7/2003. URL: http://www.math.yorku.ca/SCS/sugi/sugi16-paper.html
- York University, Department of Mathematics and Statistics n.d. Gallery of data visualisation. Accessed on 30/7/2003. URL: http://www.math.yorku.ca/SCS/Gallery/bright-ideas.html
- Peskin, P. R. L. n.d. Parallel coordinates and application to analysis and control. Accessed on 30/7/2003. URL: http://www.caip.rutgers.edu/~peskin/epriRpt/ParallelCoords.html
- Spears, W. M. 1999. An overview of multidimensional visualization techniques, in T. D. Collins (ed.), Evolutionary Computation Visualization, Orlando, Florida, USA, pp. 104-105. Accessed on 17/7/2003. URL: http://iet.open.ac.uk/pp/t.d.collins/workshops/gecco-99/abstracts/bill.ps.gz
- Spence, R. 2001. Information Visualization, Pearson Education Limited, Harlow, England.
- StatLib n.d. Cars data set. Accessed on 18/7/2003. URL: http://lib.stat.cmu.edu
- Ward, M. O. n.d. Creating and manipulating n-dimensional brushes. Accessed on 30/7/2003. URL: http://elvis.wpi.edu/~matt/docs/asa97.ps
- Ware, C. 2000. Information Visualization: Perception For Design, Academic Press; Morgan Kaufmann Publishers, San Diego, USA; San Francisco, USA.
MR JOSEPH LINDSAY ISCARO
2003-11-12