Last week, data visualisation specialist Andy Kirk presented the final talk of the year in Parliament’s series of seminars on Exploring Digital. The session explored the challenges of visualising data effectively and the approaches practitioners could take to meet them.
If you missed it, then you can catch up on the talk here.
Here's a summary of some of the points made by Andy that we might learn from in our own work and in organising teams in Parliament.
What is data visualisation?
Data visualisation is the art and science of presenting numerical information visually, so that patterns and relationships in the data are revealed and better understood. As an example, the charts below show four datasets that are obviously very different; and yet these datasets all have the same mean and variance in each variable, the same correlation coefficient, and the same linear regression equation, to about two decimal places. Their summary statistics are effectively identical. It is only when you visualise the data that their differences become apparent. This is why visualising data is an essential part of any data analysis. These datasets are known as Anscombe's Quartet.
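To make the point about matching summary statistics concrete, here is a minimal sketch in Python that computes them for the four datasets. It was not part of Andy's talk; the values are the published Anscombe's Quartet data, and the names `ANSCOMBE` and `summarise` are simply chosen for this illustration.

```python
from statistics import mean, variance

# The four datasets of Anscombe's Quartet as (x values, y values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return the summary statistics for one dataset."""
    mx, my = mean(x), mean(y)
    # Sums of squares and cross-products about the means.
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # least-squares regression slope
    intercept = my - slope * mx       # least-squares regression intercept
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),         # sample variance
        "var_y": variance(y),
        "correlation": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": intercept,
    }


summaries = {name: summarise(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed rows should be almost indistinguishable from one another, which is exactly why the numbers alone cannot tell you that the four datasets have completely different shapes.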
Data visualisation is about much more than infographics: it is about the accurate and effective presentation of statistics in order to help the reader better understand them.
Data visualisation is interdisciplinary
Data visualisation combines skills from a range of disciplines: project management, data analysis, science, journalism, psychology, communication, visual design, and computer programming. It is unrealistic to expect any single individual to master all of these skills, but you can aim to incorporate these skills across a team.
Context matters
A common mistake in data visualisation is to assume too much knowledge on the part of the reader. When producing a visualisation it is easy to take your own knowledge of the subject for granted and forget what it contributes to the experience of reading the data.
Andy illustrated this point with a bar chart that showed the number of goals scored by the footballer Lionel Messi and the number of games he played in each of the last few seasons. If you don’t know that scoring more than one goal per game on average is highly unusual in football, the chart loses its meaning.
Visualisations should aim to provide the context that makes their data meaningful to the reader. As an example, Andy showed a scatterplot where each point represented the median male and female salary in a particular industry. By explicitly drawing the 45-degree diagonal line on the chart (which is where any data point would lie if median male and female salaries were equal), you could instantly see that in every industry the median female salary was lower than the median male salary.
You can’t avoid making editorial decisions
Visualisations tell a story, so editorial decisions are unavoidable. Editorial balance comes from choosing a representative and accurate set of stories to tell, rather than trying to achieve balance within a visualisation. It is more meaningful to ask whether a visualisation is misleading in the way it presents the data than to ask whether it is balanced. Make accurate and honest visualisations that put the data in context.
The message is more important than the messenger
Organisations are often keen to incorporate their corporate identity into data visualisations by rendering elements of the visualisation in corporate colours. Sometimes it is possible to do this without harm to the visualisation, but colours play an important role in the way a visualisation communicates its meaning to the reader.
Colours have cultural associations and known psychological effects. Furthermore, human perceptions of colour are biased; for example, perceptions of the lightness and darkness of a particular colour vary with the hue in question. Effective visualisations must take account of these aspects of colour. Incorporating a corporate identity into a data visualisation should be secondary to accurately communicating the nature of the data.
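As a rough illustration of how lightness depends on hue, the sketch below takes four fully saturated colours that all have the same nominal lightness (0.5 in the HLS colour model) and estimates how bright each one appears. It was not part of Andy's talk. It assumes the standard sRGB relative luminance formula is a reasonable stand-in for perceived lightness, and the names `HUES` and `relative_luminance` are chosen for this example only.

```python
import colorsys


def srgb_to_linear(channel):
    """Undo the sRGB gamma encoding for one channel in the range 0..1."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Approximate perceived brightness of an sRGB colour (0 = black, 1 = white)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    # The weights reflect the eye's differing sensitivity to red, green and blue.
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Hue as a fraction of the colour wheel; lightness and saturation are held fixed.
HUES = {"red": 0.0, "yellow": 1 / 6, "green": 1 / 3, "blue": 2 / 3}

luminance = {
    name: relative_luminance(colorsys.hls_to_rgb(hue, 0.5, 1.0))
    for name, hue in HUES.items()
}

# List the colours from darkest-looking to lightest-looking.
for name, value in sorted(luminance.items(), key=lambda item: item[1]):
    print(f"{name:<7}{value:.3f}")
```

Although every colour here has the same nominal lightness, yellow comes out far brighter than blue. A palette built from corporate colours can therefore make some categories look more prominent than others without anyone intending it.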
The number of possible visualisations is larger than you think
Andy has identified more than 60 different widely used types of data visualisation. Excel can handle the most common and vital cases (bar charts, line charts, scatterplots etc.) but does not straightforwardly support some types of visualisation that are becoming more common: Sankey diagrams, tree maps, heat maps, bubble charts, slope diagrams, small multiples, network diagrams etc. Excel also does not contain any mapping tools.
The ecosystem of visualisation software is large and growing
Andy maintains a page on his website listing all of the software resources available for data visualisation. He lists more than 300 pieces of software in the field, much of it free.
Interactivity is expensive (in time, but not necessarily in cost)
Adding interactivity to a visualisation takes time, but potentially provides much more information to the user. Before developing an interactive visualisation, first calculate the costs and benefits and ask whether it is worth it.
You should only add interactivity where it significantly enhances a visualisation’s ability to communicate information to the user. Interactivity should add something, either giving the data new meaning, or allowing the user to explore the data in depth in order to answer their own questions.
Interactivity is particularly useful where you can provide a common interface to a large dataset, so that people can see and compare the same data for different geographical areas, or at different points in time. Don’t use interactivity to add pointless visual gimmickry.
The most important technologies
Andy identified Excel, Tableau, R, and Illustrator as the technologies he relies on the most. He said that the D3 JavaScript library was the most exciting new technology he had seen, as it lets you visualise things in literally any way you can imagine. D3 has a steeper learning curve than some other software, because you need to be able to program in JavaScript to use it effectively. But he considered it worth learning if you have the inclination, as it is potentially the most powerful tool currently available for developing visualisations online.
Data visualisation work in Parliament
In the last two years there have been significant efforts to explore and develop Parliament’s capabilities in this area. The Innovation, Development and Feasibility (ID&F) Board has sponsored projects aimed at establishing the visualisation needs of teams in both Houses and demonstrating the potential benefits of new technologies. The statistical sections of the House of Commons Library have actively contributed to these projects. Andy’s talk supported the direction we have taken so far and suggested some interesting possibilities for further developing our data visualisation capability in future.