Truncating the Axis and a Concept Redesign

Last week Mona Chalabi posted a chart titled "First Time Home Buyers Average Age." Below is Mona's chart.

This chart spread quickly through the data visualization community. Within 15 minutes of her posting this graph, someone messaged me, "Mona truncates the y-axis" and other comments started to quickly pop up on Twitter. Pekka Taipale wrote "I'm most impressed by the 30 underground stories that all these houses have. #scale #barchart".

Chris Ganowski wrote, "Nice concept, but that y axis needs to start at zero - otherwise you're exaggerating the differences." He later wrote about experts agreeing that the axis on bar charts should start at zero, to which Edward Tufte responded.

In the Big Book of Dashboards, we addressed this type of scenario a number of times. One example is this callout box on page 44 in Chapter 2 - the Course Metrics Dashboard. Our advice is that when using length, height, or area to encode the data for comparisons, always start the axis at zero.

So what is the issue here? The primary issue is that the data is being encoded with height. In this case, the height of the house represents the average age of the first time home buyer, but the y-axis starts at 30 years old. As a result, the house in 2008 appears to be double the height of the house in 2007, and the house in 2017 is more than three times as tall. It doesn't take someone two or three times longer to buy a house, so there is a disconnect between the actual data and the data that is being plotted. The graph is actually plotting "How much older than 30 is the average age of a first time home buyer."
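To put numbers on that disconnect, here is a minimal Python sketch. It is not how Mona's chart was built; it only compares the ratio of the drawn heights with the ratio of the underlying values. The ages 31 and 33 are the rounded figures quoted for 2007 and 2017, not the exact data behind the chart, and the function names are my own for illustration.

```python
def drawn_ratio(first, second, baseline):
    """Ratio of the two drawn heights when the axis starts at `baseline`."""
    return (second - baseline) / (first - baseline)


def actual_ratio(first, second):
    """Ratio of the two underlying values, ignoring how they are drawn."""
    return second / first


# Assumption: rounded average ages ("~31 to 33"), not the exact chart data.
age_2007 = 31.0
age_2017 = 33.0
baseline = 30.0  # where the truncated y-axis starts

drawn = drawn_ratio(age_2007, age_2017, baseline)
real = actual_ratio(age_2007, age_2017)

print(f"Drawn height ratio (axis starts at 30): {drawn:.2f}x")
print(f"Actual age ratio:                       {real:.2f}x")
print(f"Drawn height ratio (axis starts at 0):  {drawn_ratio(age_2007, age_2017, 0.0):.2f}x")
```

With these rounded values, the 2017 house is drawn three times as tall as the 2007 house, while the average age itself is only about 6% higher. Starting the axis at zero makes the two ratios match.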

A number of people pointed out that no one will buy a house at the age of zero and that in many places there may be a legal age required to enter into a contract to own real estate (for example, 18 years old). While this is true, it does not change how the data is being encoded. The heights of the houses are still encoding "the number of years over 30".

Truncating an Axis

Should we ever truncate the axis? Yes, there are many instances where truncating an axis would be a good idea. If the difference in the data is small and that difference is important, and you aren't encoding the data using height, length or area, then truncating the axis could be very useful. For example, the difference in the world record times for the 100 meter backstroke short course over the last 10 years is about one second. In November 2009, Nick Thoman (from Cincinnati, OH) set the 100 meter backstroke short course world record at 48.94 seconds. In the past 10 years, it has been broken three times, each time by 2/100ths of a second.

If we show this as a bar chart or a line chart with the y-axis starting at zero then there is no visible difference.

Alternatively, we can change it to a stepped line and truncate the y-axis to something that references the data in a meaningful way.
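A rough way to see why the zero-based version looks flat is to convert each new record into pixels. The sketch below is illustrative only: the 400 pixel plot height and both axis ranges (0 to 50 and 48.8 to 49.0) are assumptions I picked for the example, and the record times are derived from the 48.94 starting point dropping by 0.02 seconds three times.

```python
def step_in_pixels(old, new, axis_min, axis_max, plot_height_px):
    """Vertical distance, in pixels, between two values on a linear axis."""
    return abs(old - new) / (axis_max - axis_min) * plot_height_px


# Assumption: a plot area 400 pixels tall.
PLOT_HEIGHT_PX = 400

# 48.94 seconds, then broken three times by 0.02 seconds each.
records = [48.94, 48.92, 48.90, 48.88]

# Assumption: a zero-based axis running from 0 to 50 seconds.
zero_based_step = step_in_pixels(records[0], records[1], 0.0, 50.0, PLOT_HEIGHT_PX)

# Assumption: a truncated axis running from 48.8 to 49.0 seconds.
truncated_step = step_in_pixels(records[0], records[1], 48.8, 49.0, PLOT_HEIGHT_PX)

print(f"axis 0 to 50:      {zero_based_step:.2f} px per new record")
print(f"axis 48.8 to 49.0: {truncated_step:.2f} px per new record")
```

On the zero-based axis each new record moves the line by a fraction of a pixel (about 0.16), which is why nothing appears to change. On the truncated axis the same 0.02 seconds moves it by about 40 pixels.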

Let's go back to Mona's chart. Could we plot that as a line chart and truncate the y-axis? Well, in fact, that is exactly what the BBC did when they published this data. Here is the original chart from the BBC published on January 31, 2019.

Notice this is a line chart starting at 30, using the same .5 intervals, and over the same period of time. This chart works very well. They also included the summary in a sentence right above the chart, "The average age of first-time buyers has risen from 31 to 33 over the same 10 years." This makes the point, simple and clear.

Concept Redesign

One option for a redesign would be to change the chart type, using a line chart, dots or another encoding that does not rely on height or length. This would be a major redesign and would completely change the design and style of Mona's visualization. This would probably be a good time to add that I am a big fan of Mona Chalabi's work. Be sure to follow her on Twitter if you are not already. Her designs are impressive and inspiring, so the last thing I want to do is take away from that. One of my old professors used to say, "Don't throw the baby out with the bath water." In an effort to keep as much of the design as possible, another option might be to change the story slightly to match how the data is encoded.

Below is a concept redesign. I changed four things.

1. Since the data is encoding "how much older than 30 is the average age", I changed the title of the visualization to match the encoding: "How long after 30?"
2. I added an annotation to the visualization to make the message clear "The average age of a first time home buyer increased 2 years from 2007-2017, ~ 31 to 33 years old." This will help the reader understand the magnitude of the difference quickly.
3. I extended the y-axis down to 29.5 to help the reader see that the y-axis starts below 30, but the houses are plotted starting at 30.
4. I added a line to the chart in an attempt to bring out the trend without encoding height.

Would a line chart be better? Probably, but then we lose all of the design elements in this visualization. However, by making a few minor changes, we can keep the design elements and help the reader understand what is going on in the data. The encoding matches the story, there are clues to help the reader see that the y-axis doesn't start at zero, and the main message is brought out explicitly so that there is no confusion about the magnitude of the change over time.

Here is a variation of the redesign, muting the colors of the house a bit and bringing out the line. Overall, my goal was to keep as much of Mona's design elements as possible, so I didn't want to mute the colors much more than this.

Note - This is a concept redesign. I did this in Adobe in a few minutes just to show the concepts that I have discussed in this blog post. I am sure others could do a much more finished and polished redesign.

I hope you find this information useful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com.

Jeffrey A. Shaffer

Follow on Twitter @HighVizAbility