Multidimensional Data Visualisation

Joseph Iscaro, 2003

Bachelor of Software Engineering (with Honours)

List of Figures

  1. Parallel Coordinate Plot.
  2. Star Plot of car statistics.
  3. Experiment 1 - Screen 1
  4. Experiment 1 - Screen 2
  5. Experiment 1 - Screen 3
  6. Experiment 2 - Screen 1
  7. Experiment 2 - Screen 2
  8. Experiment 2 - Screen 3
  9. Experiment 2 - Screen 4
  10. Experiment 2 - Screen 5


Abstract

Scientific research and statistical analysis are commonly ...
... visually differential attributes to a data set's
variables.

Introduction

When data sets contain extremely large quantities of data, analysing the data in its raw form to discover relationships or trends can prove daunting, if not impossible. Visualising the data as a model can make the relationships within the data set more apparent (Spence 2001).

There are currently many useful and familiar techniques for visualising data with one variable (histogram, Tukey box plot), two variables (scatter plot, box plots) and three variables (2 dimensional representations of 3 dimensional scatter plots, plots in 3 dimensional space), known as univariate, bivariate and trivariate data respectively (Spence 2001).

When the data being analysed is of a higher order (multidimensional), limitations arise as the data becomes more complex. Existing techniques for visualising higher order data are useful; however, opinion is mixed on whether a technique is limited to the domain of the data being analysed. Also, when large volumes of data are visualised, the visualisation tends to become messy and relationships within the data become unclear. As with any form of visual depiction, the message conveyed is generally open to interpretation.

Creating a visualisation technique is an experimental process, as it cannot be proven that one technique is suitable for all types of data. For a technique to produce substantial results, the following must be taken into consideration:

For any visualisation technique, the data passes through four basic stages during visualisation (Ware 2000):
  1. The collection and storage of the data itself.
    This is the easiest of the phases, as it is essentially performed independently of any visualisation technique.
  2. The pre-processing intended to transform the data into a comprehensible form.
    Pre-processing the data allows the display algorithms to work at a satisfactorily rapid rate and reduces the need to search the data file repeatedly.
  3. The display hardware and software used to produce a visual representation.
    Ideally, high performance display hardware and graphics libraries would be used in any implementation of a visualisation technique. The increasing performance and low cost of computing equipment today point to a promising future for computer visualisations.
  4. The human interaction, querying and perception.
    The algorithms used to visualise the data need to respect the limits of human perception while presenting as much information as possible in an understandable format. Human beings are also impatient, so the response time to a query is an important factor.
Each phase the data passes through influences the final perception and understanding of the data, and each phase also influences the phases that follow it. The major contributors to this influence are phases two, three and four (outlined above).
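The pre-processing phase (phase two) can be illustrated with a minimal sketch. It assumes that every record holds the same number of numeric variables and that rescaling each variable to the range [0, 1] is an acceptable "comprehensible form" for the display code. The names `Record` and `normaliseColumns` are hypothetical and do not come from the thesis implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One record = one row of numeric variables (e.g. mpg, weight, acceleration).
// Assumption: every record has the same number of variables.
using Record = std::vector<double>;

// Rescale every variable (column) to [0, 1] once, up front, so that the
// display code never has to search the raw data again for minima and maxima.
// A column whose values are all equal is mapped to 0.
std::vector<Record> normaliseColumns(const std::vector<Record>& records) {
    if (records.empty()) return {};
    const std::size_t dims = records.front().size();

    // First pass: find the minimum and maximum of each column.
    std::vector<double> lo(records.front());
    std::vector<double> hi(records.front());
    for (const Record& r : records) {
        for (std::size_t d = 0; d < dims; ++d) {
            lo[d] = std::min(lo[d], r[d]);
            hi[d] = std::max(hi[d], r[d]);
        }
    }

    // Second pass: apply (x - min) / (max - min) to every value.
    std::vector<Record> scaled(records);
    for (Record& r : scaled) {
        for (std::size_t d = 0; d < dims; ++d) {
            const double range = hi[d] - lo[d];
            r[d] = range > 0.0 ? (r[d] - lo[d]) / range : 0.0;
        }
    }
    return scaled;
}
```

Calling `normaliseColumns` once after loading the data gives the later phases values that can be mapped directly to screen positions or sizes.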

Literature Review

Human Visual Perception and Interaction

Visual Perception

The way in which we perceive or gain an understanding from a visualised data set is determined by the visual attributes we are able to differentiate. Assuming the user who is interacting with a visualisation isn't colour blind and has no problems with depth perception, the visualisation can include the following.

Texture and Colour

Colour and texture can be used to represent limited range variables as they stand out from one another and can draw attention quickly.

Texture

Texture has very high dimensionality, as the only limit is how many different textures one can conceive. Using texture on a visual information display can increase dimensionality, as textures are as rich and expressive as colours (Spence 2001). In addition to the properties of colours, texture can make use of orientation as an added dimension.

Colour

Colour is perhaps the most commonly used feature in visualisation techniques, so it is important to choose colours that are distinguishable from one another. Healey (1996) developed a technique for selecting colours and showed that the selection can be determined by controlling:

Size, Shape and Orientation

Size

Size is an easily distinguishable attribute if used correctly. Differences in size are not conveyed well when one object is spatially more distant than the object it is compared with. For example, seen from Earth a star may look small compared to the Moon, but this does not hold if we compare their physical volumes.

Shape

Shapes are useful for distinguishing between types of elements when the attribute they represent has a limited range of values, in the same way that colour is. For example, a data set of weather information may have location as a common variable among records, and a single shape can be used to represent this common variable. Shape is, however, not very useful for mapping a variable that varies constantly across records.

Orientation

Orientation of an object has no visual polarity (Ware 2000), in that we cannot tell which of two orientations represents the higher value.

Animation

Animating a transition between targets in a visualisation that uses spatial representation can help the user to build a mental map of where in the data they are currently located (Bederson and Boltman 1999), rather than simply jumping from target to target around the visualised data set.

Spatial Perception (Depth)

We perceive depth and positioning naturally in our world, but when we represent depth on a two dimensional display (such as a monitor), if it is not done correctly the representation can be misinterpreted as a size difference between two objects. We need motion or transparency to assist with this.

Transparency and Opacity

Blended or transparent graphical elements can be distinguished from opaque ones. We tend to focus on the more active or prominent object, so an object blinking between transparent and opaque will stand out more. Transparency can also assist depth perception when a closer transparent target overlaps a distant opaque target, or vice versa. However, Kersten and Bülthoff (1991) state that combining transparency and motion can delay depth perception, so one needs to decide which of the two attributes takes precedence.

Interaction

In the real world, when we interact with something, we change the outcome of what would have occurred if we hadn't. Computer visualisation is no different, so we need to know how much interaction, as opposed to automation, is required.

Human interaction in any phase of the visualisation process changes the output of that phase and therefore influences the resulting visualisation. Automatic mapping of attributes can reduce the interaction a visualisation requires from a user (Matsushita and Kato 2001), but may not lead to the desired results. Matsushita and Kato (2001) propose a visualisation method that extracts the parameters for each phase from natural language input, which is essentially a user's profile or requirements for the visualisation, reducing the amount of interaction required from the user.

Interaction by exploratory analysis of data sets allows users to focus on small amounts of interesting data within the large space. This differs drastically from a simple database query (Flinn n.d.) in that the user's query may be refined from a vague request to data of specific interest to the user.

Implementing a querying language can aid the user's interaction and make large sets of data easier to understand. This, along with filtering and data selection techniques, is generally required when dealing with larger data sets (Ferreira de Oliveira and Levkowitz 2003).

Existing Multidimensional Visualisation Techniques

Visualisation techniques can be placed in the following categories (Keim 2002):

Geometrically Transformed Displays

Parallel Coordinate Plots

Figure 2.1: Parallel Coordinate Plot.

Parallel Coordinate Plots (created by Alfred Inselberg) take all the axes of the multidimensional data space and arrange them in order, parallel to each other. Each value of a record is mapped to a point on the corresponding axis, and the points are joined to show the links between the variables of that record. The beauty of Parallel Coordinate Plots is that they can be expanded to any number of variables and enhanced with colour to distinguish between records, they handle both quantitative and categorical data, and all variables are treated equally (Spence 2001).
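As a rough illustration of this mapping, the sketch below turns one record into the vertices of its polyline. It assumes the record's values have already been scaled to [0, 1] and that the axes are evenly spaced across the plot width. `Point2D` and `parallelCoordinatePolyline` are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <vector>

struct Point2D {
    double x;
    double y;
};

// Build the polyline for one record in a parallel coordinate plot.
// values[i] is the record's value on axis i, assumed to be scaled to [0, 1].
// Axis i is a vertical line at x = width * i / (n - 1).
std::vector<Point2D> parallelCoordinatePolyline(const std::vector<double>& values,
                                                double width, double height) {
    const std::size_t n = values.size();
    std::vector<Point2D> vertices;
    vertices.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // With a single axis there is nothing to spread out, so use x = 0.
        const double x = (n > 1)
            ? width * static_cast<double>(i) / static_cast<double>(n - 1)
            : 0.0;
        vertices.push_back({x, values[i] * height});
    }
    return vertices;
}
```

Drawing one such polyline per record, optionally in a colour chosen per record, gives the plot described above.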

Iconic/Glyph Displays

Glyphs or iconic representation of data can be used to represent high dimensional data sets effectively.

Star Plots

Star plots can display an arbitrary number of variables in the form of star shaped figures, where each whisker or point represents a dimension in the data set (Friendly 1991).

Figure 2.2: Star Plot of car statistics.
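The geometry of a star plot can be sketched as follows, assuming the arms are spaced at equal angles around the centre and each value has been scaled to [0, 1]. `StarVertex` and `starPlotOutline` are hypothetical names chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct StarVertex {
    double x;
    double y;
};

// Compute the tip of each arm of a 2 dimensional star plot for one record.
// values[k] is assumed to be scaled to [0, 1]; arm k points in the direction
// 2 * pi * k / n and has length values[k] * maxRadius.
std::vector<StarVertex> starPlotOutline(const std::vector<double>& values,
                                        double maxRadius) {
    const double pi = std::acos(-1.0);
    const std::size_t n = values.size();
    std::vector<StarVertex> outline;
    outline.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        const double angle =
            2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const double length = values[k] * maxRadius;
        outline.push_back({length * std::cos(angle), length * std::sin(angle)});
    }
    return outline;
}
```

Joining consecutive vertices, and the last back to the first, produces the star shaped figure for that record.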

Glyph Displays on Scatterplots

Use of glyphs can increase the dimensionality of a scatterplot, for example generating glyphs of star plots and spatially positioning these glyphs in the scatterplot can achieve a higher dimensionality in the display.

Chernoff Faces

Chernoff faces are a glyph technique that draws each variable in the data as a feature of a human face. Variables are represented by variation in the size, shape and separation of the facial features. The style was developed because human beings can distinguish different facial expressions with ease (Spears 1999).

Dense Pixel Displays

The basic concept of dense pixel displays is to map each variable of a record in the data to a coloured pixel, and to group the pixels belonging to each variable in an individual window in the display (Keim 1996).

Stacked Displays

Stacked displays, or dimensional stacking, use a recursive algorithm in which two dimensions are plotted on the horizontal and vertical axes to create a grid on the display. The process is then applied recursively inside each grid cell with the next pair of dimensions until all dimensions have been mapped (Ward n.d.).
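The recursive embedding can also be written as a simple loop. The sketch below assumes each dimension has already been divided into a small number of bins, and that dimensions are assigned alternately to the horizontal and vertical axes, outermost first. `GridCell` and `stackedCell` are hypothetical names, and this is an illustration of the idea rather than Ward's implementation.

```cpp
#include <cstddef>
#include <vector>

struct GridCell {
    std::size_t column;
    std::size_t row;
};

// Locate the display cell for one record under dimensional stacking.
// bins[d]      : the bin the record falls into for dimension d
//                (assumed to satisfy 0 <= bins[d] < binCounts[d])
// binCounts[d] : the number of bins used for dimension d
// Dimensions 0, 2, 4, ... subdivide the horizontal axis and dimensions
// 1, 3, 5, ... subdivide the vertical axis; each later dimension is nested
// inside the cells created by the earlier ones.
GridCell stackedCell(const std::vector<std::size_t>& bins,
                     const std::vector<std::size_t>& binCounts) {
    GridCell cell{0, 0};
    for (std::size_t d = 0; d < bins.size(); ++d) {
        if (d % 2 == 0) {
            cell.column = cell.column * binCounts[d] + bins[d];
        } else {
            cell.row = cell.row * binCounts[d] + bins[d];
        }
    }
    return cell;
}
```

For four dimensions with 2, 2, 3 and 2 bins, the display is a grid of 6 columns by 4 rows, and each record lands in exactly one cell.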

Data Reduction

The analysis of multidimensional data can be made more efficient by reducing the dimensionality of the data with minimal loss of information. Removing redundant variables and applying constraints to the analysed variables can transform the data into a more compact set prior to visualisation (DeMers and Cottrell 1993).
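One simple way to flag a redundant variable, offered here only as an illustrative assumption and not as the non-linear method of DeMers and Cottrell, is to measure how strongly two variables are linearly correlated. The function names and the idea of a fixed threshold are hypothetical choices for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of two equally long columns of data.
// Returns 0 when the inputs are empty, differ in length, or either column
// is constant (the coefficient is undefined in those cases).
double pearsonCorrelation(const std::vector<double>& a,
                          const std::vector<double>& b) {
    const std::size_t n = a.size();
    if (n == 0 || b.size() != n) return 0.0;

    double meanA = 0.0;
    double meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= static_cast<double>(n);
    meanB /= static_cast<double>(n);

    double covariance = 0.0;
    double varianceA = 0.0;
    double varianceB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        covariance += da * db;
        varianceA += da * da;
        varianceB += db * db;
    }
    if (varianceA == 0.0 || varianceB == 0.0) return 0.0;
    return covariance / std::sqrt(varianceA * varianceB);
}

// Treat variable b as redundant with respect to a when the two are almost
// perfectly correlated (positively or negatively).
bool isRedundant(const std::vector<double>& a, const std::vector<double>& b,
                 double threshold) {
    return std::fabs(pearsonCorrelation(a, b)) >= threshold;
}
```

A variable flagged this way could be dropped before visualisation, at the cost of hiding any non-linear differences between the two columns.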

Multiple Views

Multiple views are an interesting approach to visualisation. Different viewports can be shown on the one display, allowing the user to gain multiple views of the visualisation and perhaps assess aspects of the data they had not previously noticed. I believe this would also help when navigating a large data space: if a transition has occurred and the user has lost track of their spatial position, one view could hold a map showing the user where they are in the data. This technique could also allow multiple visualisation techniques to be displayed simultaneously.

Method

The Project

Two visualisation techniques are proposed in this thesis.

Mapping Visual Attributes

A 3 dimensional scatterplot is the best approach to attempt when mapping as many visual attributes as possible to quantitative data, as three variables are taken care of by the spatial placement of each record's point. It is also a familiar technique in statistical analysis. Some visual attributes are best suited to input data with a limited range, so as not to confuse the user conducting the analysis. Shape, colour and texture fall into this category. Mapping these attributes to variables with a limited range in the data set makes for a quick and easy distinction between records. Other quantitative attributes, such as size and rotational velocity, can be mapped to any variable. Using blending (transparency) can help to determine which objects are in front of others, aiding depth perception.
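A minimal sketch of such a mapping is given below. Only two details follow the experiments reported later in the thesis: origin 1 is drawn as a cube, and the year 1970 is drawn in red. Everything else is an assumption made for illustration, including which variables drive the X, Y and Z positions, the size formula, the other shapes, the red-to-blue colour ramp, the fixed alpha value, and the names `ScaledCar`, `Glyph` and `mapToGlyph`.

```cpp
#include <array>

enum class GlyphShape { Cube, Sphere, Cone };

// One car record with its continuous variables already scaled to [0, 1].
struct ScaledCar {
    double mpg;
    double horsepower;
    double weight;
    double acceleration;
    int year;    // full year, e.g. 1970 (assumed range 1970 to 1982)
    int origin;  // 1, 2 or 3
};

// The visual attributes of the object drawn for one record.
struct Glyph {
    double x, y, z;              // spatial placement
    double size;                 // edge length or radius
    GlyphShape shape;            // limited range variable: origin
    std::array<double, 4> rgba;  // limited range variable: year, plus alpha
};

Glyph mapToGlyph(const ScaledCar& car) {
    Glyph g{};

    // Three continuous variables are absorbed by position (hypothetical choice).
    g.x = car.mpg;
    g.y = car.horsepower;
    g.z = car.weight;

    // A fourth continuous variable drives size (hypothetical scale).
    g.size = 0.05 + 0.10 * car.acceleration;

    // Origin selects the shape; origin 1 is a cube as in the experiments.
    if (car.origin == 1) {
        g.shape = GlyphShape::Cube;
    } else if (car.origin == 2) {
        g.shape = GlyphShape::Sphere;
    } else {
        g.shape = GlyphShape::Cone;
    }

    // Year selects the colour on a red-to-blue ramp, so 1970 is pure red.
    double t = (car.year - 1970) / 12.0;
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    g.rgba = {1.0 - t, 0.0, t, 0.6};  // alpha 0.6 keeps objects translucent

    return g;
}
```

Keeping the mapping in one function makes it easy to swap which variable drives which attribute when comparing alternatives.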

3D Star Plot

By adding depth to the standard star plot, the number of variables that can be represented can be increased greatly, as more branches or arms can be added to a record's representation, which also allows a surface to be drawn around the star. By rotating the scene, a user can compare records from different angles and analyse all the variables from all sides. Limited range variables can also be represented as a grouping by mapping their values to a texture used as the surface.

Implementation

Using C++ and the OpenGL libraries, scenes were rendered in accordance with the methods described in the previous section, using a data set obtained from StatLib (StatLib n.d.). The scenes were built on OpenGL base code that uses the GLUT library and includes navigational input via the mouse. The base code was obtained from Neon Helium Productions (Barrera n.d.), an OpenGL developers' site that provides OpenGL tutorials and base code for creating basic OpenGL applications.

Results of Experiments

The two techniques I experimented with were given the same data set as input (StatLib n.d.). It contains car information such as miles per gallon, cylinders, displacement, horsepower, weight and acceleration, along with two variables that can be used for grouping: year/make and origin.

During both experiments, I obtained input from others to inform my discussion of the results.

Experiment 1 - Experimenting with visual attributes.

This experiment attempted to place as many visual attributes as possible on the same screen without causing clashes. As discussed in the literature review chapter of this thesis, only a limited number of visual attributes can be successfully combined in a single visualisation.

Results

Figure 4.1: Experiment 1 - Screen 1

In this instance, I attempted to map seven of the eight dimensions to different visual attributes (Figure 4.1). The only variable not mapped is displacement.

Figure 4.2: Experiment 1 - Screen 2

The resulting output, although heavily clustered, still allows some analysis. Having mapped colour to the year/make variable, one can determine that a majority of cars whose origin is 1 (square) and that were created in 1970 (red) have a smaller value for acceleration. The dense clustering nevertheless hinders any real discovery within the data set. Scaling the positioning variables (X, Y, Z) so that the objects were spaced farther apart placed the records in one long line, which gave no real basis for comparison between records. Instead, I changed the scale of the size of each shape (origin). This still resulted in some large clusters (Figure 4.2).

Figure 4.3: Experiment 1 - Screen 3

The ability to rotate the view allows the data to be analysed from any angle and compared along every axis. Figure 4.3 shows a view looking down (along the y axis) at the 406 records, giving a flat view of the data set, and shows that a majority of the cars in this data set were manufactured in origin 1 (cube).

Discussion

The attempt to assign as many visual attributes as possible fell short of its intended purpose. I did find, as expected, that a user requires an explicit explanation of how to read a visualisation. A key must be provided so that the user can determine which variable maps to what value. Furthermore, when using 3 dimensional objects with transparency, a light source helps the user to distinguish depth from size difference, as does adding a set of axes to guide the user along the planes of spatial positioning.

Experiment 2 - Experimenting with an Existing Technique

In this experiment, I made some changes to the way a multidimensional record is represented as a star plot. The changes I made are as follows: The reason for these changes is that star plots become overwhelming when a data set contains records with excessive variables. The representations in this experiment were hard coded, but the data values were taken from the same data set used in Experiment 1. Records 0 and 25 are shown in all figures.

Results

Figure 4.4: Experiment 2 - Screen 1

Figure 4.5: Experiment 2 - Screen 2

Figure 4.6: Experiment 2 - Screen 3

Figures 4.4, 4.5 and 4.6 all show comparisons between two records from the car data set (records 0 and 25). Different colours are used because, with a single colour for the shading, it was unclear which side belonged to which point. Changing colours resolved this, as one can clearly see where each vertex of a polygon starts. The result is still a little hard to read because of the blending issue seen in Figure 4.4, where polygons that should lie deeper in the scene are drawn on top of closer polygons.

Figure 4.7: Experiment 2 - Screen 4

Figure 4.8: Experiment 2 - Screen 5

With the depth testing feature of OpenGL disabled, as shown in Figures 4.7 and 4.8, there is less confusion over the polygons' positions in the world. In Figure 4.8 I have removed a redundant variable, year/model, as it isn't really an attribute worth comparing.
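A commonly used alternative to disabling depth testing, which was not tried in these experiments, is to draw blended polygons in back-to-front order so that nearer polygons are composited over farther ones. The sketch below orders triangles by the distance of their centroid from the viewer. The types `Vec3` and `Triangle` and the function `sortBackToFront` are hypothetical, and ordering by centroid is only an approximation that can fail for intersecting or very large polygons.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 {
    double x, y, z;
};

struct Triangle {
    Vec3 a, b, c;
};

// Centroid of a triangle, used as a single representative depth.
Vec3 centroid(const Triangle& t) {
    return {(t.a.x + t.b.x + t.c.x) / 3.0,
            (t.a.y + t.b.y + t.c.y) / 3.0,
            (t.a.z + t.b.z + t.c.z) / 3.0};
}

// Squared distance between two points (enough for ordering, no sqrt needed).
double squaredDistance(const Vec3& p, const Vec3& q) {
    const double dx = p.x - q.x;
    const double dy = p.y - q.y;
    const double dz = p.z - q.z;
    return dx * dx + dy * dy + dz * dz;
}

// Reorder the triangles so that the one farthest from the eye comes first.
// Drawing them in this order lets blending composite near over far.
void sortBackToFront(std::vector<Triangle>& triangles, const Vec3& eye) {
    std::sort(triangles.begin(), triangles.end(),
              [&eye](const Triangle& lhs, const Triangle& rhs) {
                  return squaredDistance(centroid(lhs), eye) >
                         squaredDistance(centroid(rhs), eye);
              });
}
```

The sort would need to be repeated whenever the user rotates the scene, since the ordering depends on the eye position.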

Discussion

Figures 4.7 and 4.8 show by far the best results in this experiment, as one can clearly see the surface surrounding the axes. The main purpose of this experiment was to see whether a star plot could be expanded to allow for more dimensions, and these results show it can. However, some key features that allow the user to read the plot are missing from this implementation; for example, there are no labels indicating which axis relates to which variable in the record. The labels are missing because I was not able to get text to display in those regions. I intended to label each axis with an id and relate each id to a variable name in a key of some sort.

Conclusion and Future Work

The research presented in this thesis describes computer visualisation techniques for quantitative data. Attempts were also made to improve on one of these existing techniques and to create a technique that uses as many visually differential attributes as possible.

Review of Work

Much of the background research showed that the input domain of data visualisation is extremely large and constantly growing. Also, no one technique suits all types of data. When the dimensionality of a data set becomes large, some techniques handle it better than others; star plots, for example, can become extremely cluttered as dimensionality increases.

The extension of an existing technique proved to be the most interesting of the experiments, as it showed that a star plot can be remodelled as a 3 dimensional star plot to handle data with more variables than an unmodified (2 dimensional) star plot. The experiment of applying as many visual attributes as possible to a visualisation was limited because the data set used was not very dense, so not as many variables as I would have liked could be mapped. My limited exposure to the OpenGL libraries, and to 3 dimensional graphics programming in general, also limited the development somewhat. There is, in fact, great room for improvement in both experiments.

Future Work

Firstly, the mapping of visual attributes technique could be greatly improved by adding a navigational feature using animation and motion. An animated rotational motion could also be added to each object; linking its rotational velocity to a variable would increase the dimensionality of the technique.

As the star plot was shown to extend successfully to a third dimension, giving it the ability to plot a surface over the star's axes, the only necessary extension is to label the axes so that the user can make comparisons and determine relations with other records in a data set. A texture could also be added to each surface to group records using data over a limited range. For example, with the cars data set used in the experiments, one could map the year/model variable to the surface's texture, allowing the user to determine which of the cars displayed were created in the same year.

Rather than trying to develop new techniques to handle data, researchers could look to existing techniques and attempt to increase the input domain an existing technique can handle.

Bibliography

-
Barrera, B. n.d.
OpenGL basecode.
Accessed on 5/9/2003.
URL:http://nehe.gamedev.net/files/basecode/neheglglut.zip
-
Bederson, B. B. and Boltman, A. 1999.
Does animation help users build mental maps of spatial information?, Proceedings of the 1999 IEEE Symposium on Information Visualization (Info Vis '99), pp. 28-35.
-
DeMers, D. and Cottrell, G. 1993.
Non-linear dimensionality reduction, in S. J. Hanson, J. D. Cowan and C. L. Giles (eds), Advances in Neural Information Processing Systems, Vol. 5, Morgan Kaufmann, San Mateo, CA, pp. 580-587.
Accessed on 30/7/2003.
URL:ftp://ftp.cs.ust.hk/pub/dyyeung/course/comp327+527/f96/nldr.ps.gz
-
Ferreira de Oliveira, M. C. and Levkowitz, H. 2003.
From visual data exploration to visual data mining: a survey, IEEE Transactions on Visualization and Computer Graphics, Vol. 9, Iss. 3, July-Sept., pp. 378-394.
-
Flinn, S. n.d.
Interactive graphical displays for visual information browsing and exploratory search.
Accessed on 21/7/2003.
URL:http://www.cs.ubc.ca/spider/flinn/publications/wcgs95.ps.gz
-
Healey, C. G. 1996.
Choosing effective colours for data visualization, in R. Yagel and G. M. Nielson (eds), IEEE Visualization '96, pp. 263-270.
Accessed on 14/7/2003.
URL:http://www.csc.ncsu.edu/faculty/healey/download/viz.96.pdf
-
Keim, D. 2002.
Information visualization and visual data mining, IEEE Transactions on Visualization and Computer Graphics, Vol. 8, Iss. 1, Jan/Mar, pp. 1-8.
-
Keim, D. A. 1996.
Pixel-oriented visualization techniques for exploring very large databases, Journal of Computational and Graphical Statistics (March).
Accessed on 30/7/2003.
URL:http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/papers/StatisticsPaper.ps
-
Kersten, D. and Bülthoff, H. 1991.
Apparent opacity affects perception of structure from motion, Technical Report AIM-1285.
Accessed on 29/7/2003.
URL:ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1285.pdf
-
Matsushita, M. and Kato, T. 2001.
Interactive visualization method for exploratory data analysis, Proceedings of the Fifth International Conference on Information Visualisation, 2001, pp. 671-676.
-
Friendly, M. 1991.
Statistical graphics for multivariate data.
Accessed on 29/7/2003.
URL:http://www.math.yorku.ca/SCS/sugi/sugi16-paper.html
-
York University, Department of Mathematics and Statistics. n.d.
Gallery of data visualisation.
Accessed on 30/7/2003.
URL:http://www.math.yorku.ca/SCS/Gallery/bright-ideas.html
-
Peskin, R. L. n.d.
Parallel coordinates and application to analysis and control.
Accessed on 30/7/2003.
URL:http://www.caip.rutgers.edu/~peskin/epriRpt/ParallelCoords.html
-
Spears, W. M. 1999.
An overview of multidimensional visualization techniques, in T. D. Collins (ed.), Evolutionary Computation Visualization, Orlando, Florida, USA, pp. 104-105.
Accessed on 17/7/2003.
URL:http://iet.open.ac.uk/pp/t.d.collins/workshops/gecco-99/abstracts/bill.ps.gz
-
Spence, R. 2001.
Information Visualization, Pearson Education Limited, Harlow, England.
-
StatLib. n.d.
Cars data set.
Accessed on 18/7/2003.
URL:http://lib.stat.cmu.edu
-
Ward, M. O. n.d.
Creating and manipulating n-dimensional brushes.
Accessed on 30/7/2003.
URL:http://elvis.wpi.edu/~matt/docs/asa97.ps
-
Ware, C. 2000.
Information Visualization: Perception for Design, Academic Press; Morgan Kaufmann Publishers, San Diego, USA; San Francisco, USA.


MR JOSEPH LINDSAY ISCARO 2003-11-12