Visualisation of Research Data

From Game Technology Lab

Publishing your research is about communicating what you have done to the rest of the world. Understanding the audience for your publication is critical, both for the words you use and for the images and figures you present. We will discuss some common errors in visual communication in research, and how to fix them.

Data to Ink Ratio

The data-ink ratio is a concept introduced by Edward Tufte. In 1983 he published "The Visual Display of Quantitative Information", where his primary message was: "Above all else show the data" (Tufte, 1983).
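Written as a formula, the ratio is the share of a graphic's ink that is used to present the data itself:

```latex
\[
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\]
```

A ratio close to 1 means that almost everything on the page is data. Gridlines, borders, and decoration that could be erased without losing information pull the ratio down.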

Visualisation tools were invented relatively recently. According to Wikipedia: "William Playfair invented four types of diagrams: in 1786 the line graph and bar chart of economic data, and in 1801 the pie chart and circle graph, used to show part-whole relations."

Animation and Modern Graphics

Hans Rosling has an excellent presentation on how to show information to people: "Stats that reshape your worldview".

You can use the same visualisation tool, Gapminder, to view other data.

The core idea here is to present information as a clear comparison in which each aspect of the image carries useful information. Looking at the circles in his graphs, each aspect encodes something:

  • Size = population
  • Colour = Area of the world
  • Position = related to the two axes under comparison
  • Time = Changes based on temporal data

This creates a data-rich display that is still easy to understand. Your objective when displaying your research data is to think about how cleanly you can show your data, and how each element can convey useful information.
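As a rough sketch of this idea, the mapping from one data record to the visual attributes of a circle could look like the following. The records, the colour table, and the function name `bubble` are all invented for illustration. Scaling the circle's area (rather than its radius) with population is a design choice assumed here, not something taken from the Gapminder tool itself.

```python
import math

# Hypothetical records: (name, region, population, x_value, y_value).
# The numbers are invented for illustration only.
RECORDS = [
    ("A", "Europe", 10_000_000, 2.0, 78.0),
    ("B", "Asia", 40_000_000, 5.0, 65.0),
]

# Assumed colour per region; any distinguishable palette would do.
REGION_COLOURS = {"Europe": "#1f77b4", "Asia": "#d62728"}


def bubble(record, max_population, max_radius=40.0):
    """Map one record to the visual attributes of a circle."""
    name, region, population, x_value, y_value = record
    # Scale the radius with the square root of the population so that the
    # circle's AREA, not its radius, is proportional to the population.
    radius = max_radius * math.sqrt(population / max_population)
    return {
        "label": name,
        "colour": REGION_COLOURS[region],  # colour = area of the world
        "radius": radius,                  # size = population
        "x": x_value,                      # position = the two axes
        "y": y_value,
    }


max_pop = max(r[2] for r in RECORDS)
bubbles = [bubble(r, max_pop) for r in RECORDS]
for b in bubbles:
    print(b)
```

Time would be the fourth channel: recomputing the same mapping for each year of data gives the frames of an animation.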

Tools

There are lots of graphing tools available to you. The simplest, and perhaps the ugliest, is Excel. The graphs generated by Excel tend to look a bit cartoony and simple. If you are using MS Word then they fit well with the general document style, but if you are using LaTeX then Excel graphs look out of place.

Each of the following tools takes a bit of effort to learn. However, if you put in some effort early in the process, the benefit at the end could be very large.

GNUplot

This is still one of the best open and free tools for working with data. The concept is that you create graphs using scripts. The script is run on the data and the graph is generated as the output of the script. The output can be raster graphics such as PNG, vector graphics to integrate with LaTeX, or even HTML5 or SVG for web display.

Gnuplot does take some time to learn. There are many good tutorials for gnuplot, but the best idea is to share the scripts that you generate with each other to build up knowledge of what to do. For example, the following script plots from a file called force.dat, using the first column of numbers as X, the second as the size, and the third as the heat:

 # Plot columns 2 and 3 of force.dat against column 1 (a trailing \ continues the command)
 plot "force.dat" using 1:2 title 'Size', \
      "force.dat" using 1:3 title 'Heat'


SVG and Inkscape

If you are interested in turning numbers into visual symbols, then the easiest way to program this is probably Scalable Vector Graphics (SVG). SVG uses XML to define graphic elements. These include lines, shapes, shading, and paths. A simple way to generate really nice visual displays of information is to draw the graphic in a program like Inkscape and then use an XML library such as TinyXML to load and edit the SVG file. This allows your program to create high quality graphics directly from the data. SVG can be shown in most browsers, and if you are really keen you can animate SVGs, respond to mouse events, and so on.
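A minimal sketch of this load-and-edit approach is shown below. It uses Python's standard `xml.etree.ElementTree` module in place of TinyXML, and a small inline SVG string standing in for a file drawn in Inkscape. The element id `bar`, the function name `set_bar_height`, and the sizes are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the SVG namespace as the default one when the document is written out.
ET.register_namespace("", SVG_NS)

# Stand-in for a file drawn in Inkscape: one rectangle with a known id.
TEMPLATE = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">'
    '<rect id="bar" x="40" y="10" width="40" height="100" fill="steelblue"/>'
    "</svg>"
)


def set_bar_height(svg_text, value, max_value, full_height=100.0, baseline=110.0):
    """Return SVG text with the 'bar' rectangle resized to value / max_value."""
    root = ET.fromstring(svg_text)
    bar = root.find(f".//{{{SVG_NS}}}rect[@id='bar']")
    height = full_height * value / max_value
    bar.set("height", f"{height:g}")
    # SVG y grows downwards, so move the top edge to keep the bar on its baseline.
    bar.set("y", f"{baseline - height:g}")
    return ET.tostring(root, encoding="unicode")


result = set_bar_height(TEMPLATE, value=30, max_value=60)
print(result)
```

The returned text can be written to a `.svg` file and opened in a browser or in Inkscape. The drawing style stays whatever was set in the template; only the data-dependent attributes are touched by the program.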

Google Charts

If you are working on the web, or are interested in regenerating graphs, then online tools such as Google Charts can be very useful for displaying information from accessible data sources.

D3

This is a graphing tool for making data-driven designs for the web or SVG. The concept is that you use web standards and generate an SVG or Canvas document directly from your data.

Examples

Having marked a large number of theses and projects, I have seen both common mistakes and critical errors made by students.

Some recent examples.

Euler Diagrams

If you use an Euler diagram to show the relationships between data within a group, you need to understand the meaning of the areas you are creating. In the diagram to the right you can see the logical areas of the diagram. By drawing the diagram you are making logical statements about the relationships between entities. In this case you claim that all X's are Y's, that no Z is a Y, and that there exist Y's that are not X's.

This diagram also implies, but does not state, that the number of Y's is larger than the number of Z's, and that the numbers of things in X and Z are probably similar. It is important to understand that people will take more than just the logical implications from a diagram.
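The logical statements (as opposed to the visual impressions) can be written down as set relations. The sketch below uses Python sets with invented members, purely to make the three claims explicit.

```python
# Hypothetical members, chosen only to match the diagram's claims.
X = {"x1", "x2"}
Y = {"x1", "x2", "y1"}
Z = {"z1", "z2"}

all_x_are_y = X <= Y          # X is a subset of Y
no_z_is_y = Z.isdisjoint(Y)   # Z and Y share no members
some_y_not_x = bool(Y - X)    # Y has members outside X

print(all_x_are_y, no_z_is_y, some_y_not_x)  # prints: True True True
```

Note that nothing in these three statements says anything about how large the sets are. The sizes of the drawn areas are extra information that the reader will pick up whether you intended it or not.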

Line Graphs

I am sure you all think you know line graphs, but there are some things to think about. If you draw a line graph then you are stating that there is a valid value at every point on the line, including the points between your measurements.

If the X axis is time, it must be linear. If you have data from 1998, 2004, 2006, 2007, 2008, 2010, and 2011, you do not just put these as X labels and treat them as a uniform sequence. The gaps between the points must be proportional to the time between them.
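To make the difference concrete, here is a small sketch that computes where each of those years lands on an axis, first in proportion to time and then with the uniform spacing you should avoid. The axis width of 260 units and the function names are arbitrary choices for this example.

```python
YEARS = [1998, 2004, 2006, 2007, 2008, 2010, 2011]


def linear_positions(years, width=260.0):
    """Place each year along the axis in proportion to elapsed time."""
    first, last = min(years), max(years)
    return [width * (y - first) / (last - first) for y in years]


def uniform_positions(years, width=260.0):
    """The mistake: treat the years as evenly spaced labels."""
    step = width / (len(years) - 1)
    return [step * i for i in range(len(years))]


linear = linear_positions(YEARS)
uniform = uniform_positions(YEARS)
for year, good, bad in zip(YEARS, linear, uniform):
    print(year, round(good, 1), round(bad, 1))
```

With linear placement, 2004 sits almost halfway along the axis (six of the thirteen years have passed). With uniform placement it sits only one sixth of the way along, which makes the early change look far steeper than it was.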

If you put two graphs next to each other to compare data, they must have the same Y axis, and they should have the same X axis.
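One simple way to get this right is to compute a single axis range from all of the data before drawing either graph. The sketch below assumes invented measurements, and the helper name `shared_range` and the 5% margin are choices made for this example.

```python
def shared_range(*series, padding=0.05):
    """Return one (low, high) range covering every series, with a small margin."""
    low = min(min(s) for s in series)
    high = max(max(s) for s in series)
    margin = (high - low) * padding
    return low - margin, high + margin


# Hypothetical measurements for two graphs shown side by side.
left = [2.0, 3.5, 5.0]
right = [10.0, 14.0, 22.0]

y_range = shared_range(left, right)
print(y_range)
```

Both graphs are then drawn with `y_range` as their Y axis limits, so a bar or line of the same height means the same value in each.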

Pie Chart

Pie charts are for comparing data by showing both the size of each part and its ratio to a meaningful whole. The advantage over a bar chart is this additional information about the share of the whole. Never use 3D exploded pie charts. They are just a way of hiding your data.
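The "meaningful whole" requirement can be checked in a few lines: each slice's angle is just its share of the total, so the parts must add up to something that makes sense as 100%. The counts and the function name `pie_slices` below are invented for illustration.

```python
def pie_slices(values):
    """Return {label: (share_of_whole, angle_in_degrees)} for each part."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive whole")
    return {k: (v / total, 360.0 * v / total) for k, v in values.items()}


# Hypothetical counts that together make up one whole.
parts = {"A": 50, "B": 30, "C": 20}
slices = pie_slices(parts)
for label, (share, angle) in slices.items():
    print(label, f"{share:.0%}", f"{angle:.0f} degrees")
```

If the values do not sum to a whole that you could name in a sentence, a bar chart is usually the safer choice.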

Reading

  • Tufte 2001 "The Visual Display of Quantitative Information"
  • Klass 2012 "Just Plain Data Analysis"