SPEED DATAING

Welcome to the 'afternoon delight' of training programs. By Ruth Spencer & Lena Groeger.

Many of you have probably seen this, but take a look at Anscombe’s Quartet. Here are four data sets with nearly identical summary statistics: they all have pretty much the same means, variances, correlations and regression lines. But graph them, and they look vastly different. Visualizing data can give you an entirely new perspective on its structure. This also illustrates another important principle: distributions are more interesting than averages (that one’s also from Amanda Cox).
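
If you want to check the numbers yourself, here is a minimal sketch in plain Python. The data values are the ones from Anscombe's 1973 paper; the names QUARTET, summarize and summaries are just made up for this example. It computes the mean, variance, correlation and least-squares line for each of the four sets.

```python
# Anscombe's Quartet: four (x, y) data sets from Anscombe (1973).
# QUARTET, summarize and summaries are hypothetical names for this sketch.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # sets I-III share the same x values
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return summary statistics and the least-squares line for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "correlation": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four sets should come out with a mean x of 9, a mean y of about 7.5, a correlation of about 0.82 and a fitted line of roughly y = 3 + 0.5x. Nothing in those numbers tells you that one set is a curve and another is a straight line with a single outlier; only a plot does.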

It’s important to note that even though we’re getting to the visualization step last, it doesn’t have to come last. Part of all that wrangling you’ve just done might include visualizing the data: to quickly see what it looks like, to experiment with different forms, to uncover interesting new facts or to prompt new questions. Take a look at how this map of voting patterns prompted a whole discussion that ranged from the history of land use in the South to soil types to ancient marine life.

The form you choose for your visualization is crucial. Form determines how the data will be understood, whether the person looking at it is just you or the general public. Choose the right form, and the data reveals a story. Choose the wrong form, and you could confuse or mislead the user.

The best way to choose a form is to pick one that exploits the data you have, one that lets the structure of the data reveal itself. For example, choosing a box chart to display the budget lets you easily compare amounts of money, which a bar chart probably wouldn’t capture as well. Other times that means mapping the data, or just plotting it through time. Here’s a good chart of chart options. Often it’s a matter of iterating through different forms over and over until you find one that works.

Part of what’s going to help you decide on a form is to consider people’s own deeply engrained psychological intuitions. A researcher at Columbia years ago did a study where she presented a simple data set about height and gender in different formats. When she presented it as a stacked bar chart, people said things like “more males are taller” or “males are usually taller than females.” But when she presented it as a line graph, she had people saying stuff like “the more male you are, the taller you are.” Clearly, that makes no sense, but the intuition that LINES = TRENDS is so engrained that it’s hard to get out of thinking that way. Keep this in mind as you think about a form, and try to work with people’s intuitions instead of against them. Often, you can use these visual intuitions to your advantage.

Jeff Larson, a news apps developer at ProPublica, gave a talk a few months ago on maps. He wondered why, with all the gorgeous maps out there, we’re still using Google Maps for most of our map visualizations online. Google Maps is horrendous for this: it’s got tons of specific details of roads and street names, a bunch of distracting colors, etc. But in maps, as in many other kinds of visualizations, less is more. Here’s a map he did for an investigation on redistricting, which is an example of how a minimalistic map can often work better than a cluttered one. Adding context doesn’t have to mean including every single data point you can find.

To get you started visualizing your own data, we’ve listed a bunch of examples in the resources section. Take a look at different ways of presenting similar data, think about the questions that might have led someone to choose that particular way of doing it, and compare that data and those questions to the ones you are interested in.
