How to Create a Wheat Plot in Tableau
Yesterday, Stephen Few published his quarterly newsletter and discussed issues around jittering dot plots. He proposed a new chart type, or a new version of jitter (whichever you prefer), which he called a Wheat Plot or stripogram. Steve Wexler and I traded several emails with Stephen Few about this chart before the newsletter came out, and Steve Wexler created several variations with different data sets. In this post I will outline how I built the Wheat Plot in Tableau.
Note: I will be using the World Indicators data set in Tableau, but this is for demonstration purposes only. The data covers 2000-2012, so each country appears once per year and is repeated in the dot plot. As a result, this plot would not be all that useful for analysis.
Building a Wheat Plot
Step 1: Build the Dot Plot
Move Life Expectancy Female into the view
Select Circle on the Marks card
Change the Size of the dots to make them smaller
Step 2: Create a calculated field and bin
Calculated Field Name: index, Formula: INDEX()
Right-click on Life Expectancy Female and select Create > Bins
Set the Bin size to 5
Move Life Expectancy Female
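The Create Bin step groups each value into a fixed-width range. Here is a minimal Python sketch of that grouping, assuming bins start at 0 and each bin includes its lower edge but not its upper edge (my understanding of Tableau's numeric bins, but treat it as an assumption). The function name `bin_lower_edge` and the sample values are hypothetical and not taken from the World Indicators data.

```python
import math

def bin_lower_edge(value: float, bin_size: float) -> float:
    """Return the lower edge of the fixed-width bin containing value.

    Assumption: bins are anchored at 0 and left-closed, [k*size, (k+1)*size).
    """
    return math.floor(value / bin_size) * bin_size

# Hypothetical life expectancy values in years (not from the data set).
values = [71.2, 73.9, 74.0, 78.6, 79.99, 80.0]
bins = [bin_lower_edge(v, 5) for v in values]
print(bins)  # [70, 70, 70, 75, 75, 80]
```

Note that 80.0 falls into the 80 bin rather than the 75 bin, because each bin includes only its lower edge under this assumption.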
Step 3: Set calculation and sort order of index
Right-click on index on Columns and select Edit Table Calculation
Choose Specific Dimensions
Move Life Exp Female (bin) up to the top of the list
up to the second on the list
Check the box for Life Exp Female (bin) and uncheck Region
Set Restarting every to Life Exp Female (bin)
Set Sort Order to Custom Sort and select Life Expectancy Female
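The table calculation settings in Step 3 can be sketched outside Tableau as follows: within each bin, sort the values and number them 1, 2, 3, ..., starting over at 1 in the next bin. This is a minimal Python sketch, assuming an ascending sort and the same 0-anchored bins as before; `wheat_positions` and the sample values are hypothetical, and ties simply keep their input order because Python's sort is stable.

```python
import math
from collections import defaultdict

def wheat_positions(values, bin_size):
    """Return (bin, index, value) triples, mimicking an index that is
    computed within each bin, restarts every bin, and is ordered by value.

    Assumption: ascending sort; ties keep input order (stable sort).
    """
    by_bin = defaultdict(list)
    for v in values:
        by_bin[math.floor(v / bin_size) * bin_size].append(v)
    positions = []
    for b in sorted(by_bin):
        # The index restarts at 1 in every bin.
        for i, v in enumerate(sorted(by_bin[b]), start=1):
            positions.append((b, i, v))
    return positions

# Hypothetical values; plot index on the horizontal axis, value on the vertical.
values = [74.0, 71.2, 78.6, 73.9, 80.0, 79.99]
positions = wheat_positions(values, 5)
print([(i, v) for _, i, v in positions])
```

Each bin produces a short rising run of dots, which is where the slopes in a Wheat Plot come from.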
You now have a Wheat Plot. Change the Size of the dots as needed. Remember, the bin size is set to 5, so the index, and with it the run of dots, restarts for every 5-year bin of life expectancy.
You can adjust the bin size up or down. Changing the bin size to 2 brings the dots much closer, similar to a unit histogram.
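One way to see why bin size matters is to count the dots in the fullest bin, since that count is how far the index runs before it restarts. This Python sketch uses made-up values and a hypothetical `max_index` helper, with the same 0-anchored bins assumed earlier.

```python
import math
from collections import Counter

def max_index(values, bin_size):
    """Largest index reached in any bin, i.e. the longest run of dots
    before the index restarts, for a given bin size."""
    counts = Counter(math.floor(v / bin_size) * bin_size for v in values)
    return max(counts.values())

# Hypothetical values, clustered the way real data often is.
values = [70.5, 71.0, 71.8, 72.4, 73.1, 73.3, 74.9, 76.2]
for size in (5, 2):
    print(size, max_index(values, size))
```

With these values, a bin size of 5 puts seven dots in one run, while a bin size of 2 caps every run at three, so the plot becomes more compact and closer to a unit histogram.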
If you don't want a fixed-width column, another option is to set index to Discrete (right-click on index on Columns and select Discrete). This sizes the column width based on the number of dots.
Is a Wheat Plot useful?
Now onto the usefulness of a Wheat Plot. Here are my general thoughts.
I found the steep slopes difficult to interpret at first. Based on the Twitter response to Stephen's newsletter, I'm guessing most people will have the same reaction. However, once the chart settled in for me and I interacted with the data, I did find it useful. For example, with random jitter, once you hover over or select a dot, you can't easily find its neighboring dots: which dot is immediately above or below the value you selected? The Wheat Plot lets you move through the data in order, up and down, seeing all of the neighboring values. That said, I worry that people will struggle with the look of these charts and with how to interpret the data.
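Because the Wheat Plot places dots in value order, the neighbors of a selected dot are simply the records just before and after it in sorted order. This Python sketch illustrates that idea with a hypothetical `neighbors` helper and made-up labels and values, assuming labels are unique.

```python
def neighbors(records, label):
    """Return the records immediately below and above `label` when all
    records are sorted by value. Either side is None at the ends.

    records: list of (label, value) pairs; labels assumed unique.
    """
    ordered = sorted(records, key=lambda r: r[1])
    pos = [r[0] for r in ordered].index(label)
    below = ordered[pos - 1] if pos > 0 else None
    above = ordered[pos + 1] if pos + 1 < len(ordered) else None
    return below, above

# Hypothetical labels and values.
records = [("A", 74.0), ("B", 71.2), ("C", 78.6), ("D", 73.9)]
print(neighbors(records, "D"))  # (('B', 71.2), ('A', 74.0))
```

With random jitter, the horizontal offset carries no ordering information, so this kind of lookup has no visual counterpart on the chart.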
When playing with the bin size on this data set, I preferred a bin size of 2, so I think the choice of bin size will make a big difference in these plots. The best choice depends entirely on the data set, so it may take some iteration through bin sizes to find it.
That brings me to the data set. The data set that Stephen Few used in his example is very specific: its values differ by tiny amounts and are recorded precisely. This isn't always the case. For example, if I plot the exam grades of all of the students in my data visualization class, many have exactly the same value, and their dots would plot directly on top of each other. Many students will have a 92%, and none of them will have 91.8% or 92.3%.
Even when the data points don't share exactly the same values, there will be times when two-decimal-place precision is not meaningful. As an example, we visualized session ratings at a conference in The Big Book of Dashboards (Chapter 3, page 59). The conference sessions were rated on a scale of 1 to 5. When these ratings are averaged for each session, there is no meaningful difference between a session rating of 4.23 and one of 4.21. These ratings can be rounded and binned to one decimal place (example below). In both of these cases, I find that random jitter works well. However, my preferred view of this type of data is often the unit histogram. In the case of the speaker ratings, we can also encode the size of each dot by the number of people attending the session. This adds another level of detail, for example distinguishing a session rated 4.2 with a small number of attendees from one with the same rating and a very large number of attendees. Speaker 317 not only had great ratings but also had one of the largest sessions that was rated.
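The rounding-and-stacking idea behind a unit histogram can be sketched in Python as follows: round each session's average rating to one decimal, group sessions by that rounded value, and stack them within each group while keeping attendance available for dot size. The `unit_histogram` helper and the session data are hypothetical, not the conference data from the book. Python's `round` works on binary floats and rounds exact halves to even, so values right at a .x5 boundary may round differently than in other tools.

```python
from collections import defaultdict

def unit_histogram(sessions, decimals=1):
    """Group sessions into unit-histogram columns by average rating
    rounded to `decimals` places. Each column keeps (session_id, attendees)
    so attendance can drive dot size; a session's stack height is its
    1-based position within its column."""
    columns = defaultdict(list)
    for session_id, avg_rating, attendees in sessions:
        columns[round(avg_rating, decimals)].append((session_id, attendees))
    return dict(sorted(columns.items()))

# Hypothetical sessions: (id, average rating on a 1-5 scale, attendees).
sessions = [("s1", 4.23, 40), ("s2", 4.21, 310), ("s3", 3.87, 55), ("s4", 4.18, 120)]
columns = unit_histogram(sessions)
for rating, dots in columns.items():
    for height, (session_id, attendees) in enumerate(dots, start=1):
        print(rating, height, session_id, attendees)
```

Here 4.23, 4.21, and 4.18 all land in the 4.2 column, which reflects the point that differences in the second decimal place are not meaningful for these ratings.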
I hope you find this information helpful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com
Jeffrey A. Shaffer
Follow on Twitter @HighVizAbility