If you are not one of the 100 followers of Katarzyna "Kasia" Gasiewska then you will be missing out on some great visualizations that are sure to pop up on her Tableau Public profile. Her very first post on Twitter was her recent Iron Viz entry Water, Water Everywhere and when she posted a new viz last week, it caught my eye immediately. The viz was DID YOU SRSLY NAME ME THAT? and it was the Tableau Viz of the Day last Friday. When I saw the image of the viz I was immediately drawn too it, but before I even clicked the link, I noticed her Tweet about the "way-too-long initial loading time". As soon as I clicked on her viz I felt her pain.
Exploring the Viz
In addition to the long loading time (the viz took more than a full minute to load for me on Tableau Public), clicking between boy/girl names also took a very long time to filter. When I explored the viz, I didn't see anything too complex (a bar chart, a radial bar chart and a dot plot). So I decided to download the workbook and have a look under the hood. At that point the pain increased significantly. It was like going to a medical practitioner that asked to rate the pain on a scale of 1 to 10, and this viz was rating high on that scale. It was slow enough loading and interacting on Tableau Public, but when downloading and actually trying to move pills around to make changes to the viz, it quickly reached the top of my pain scale.
At that point, I felt bad for Kasia, because I know the hours that can go into designing a viz like this and I could just imagine her trying to build this viz, one step at a time, and having to wait, minute by minute, for the viz to update when she adjusts the shapes, color, or the size slider, or make any other change to her viz. We traded a few messages and she has updated her viz, but I decided to write this post to describe some things that I did to help improve the speed of her visualization and offer a few links and resources for others that might wander into this poor performance zone.
There are lots of resources out there on "how to optimize your Tableau workbook". I would encourage you to read a little on the subject, even if you've never encountered these types of performance issues. Understanding just a few tips and tricks can save a huge amount of time in the long run.
Below are a few things that I did to help improve this workbook. Note - this workbook can be optimized further. Kasia made these improvements plus a few others and her viz is now much faster than her originally version
1. There was an extra data source in the file that wasn't being used. This was an easy fix. Right-click on the data source and select close. If you aren't sure if it's being used, no worries, Tableau will give you a warning before it closes a data source that is being used. In this case, the size of the TWBX file dropped almost in half.
2. There were 1.8 million rows of data, but most of this data was not being used in the visualization. The most granular level of detail in this viz is the dot plot. This dot plot shows the top 10 names from 1990 to 2014 for both boys and girls. That means that we have 1.8 million rows of data to show 115 years (1900-2014) * 10 (for the top 10) * 2 (boy/girl). That's a ton of extra rows to show 2,300 data points. One solution for this is to trim the data down to what is really needed.
3. There were a ton of calculations that were being done along the way. First to sum the count of each name, then another calculation to rank that sum of the count. Then one set of calculations for boy/girl size and another set for shape. Those four calculations have a complex if-then-elseif-then-elseif-then-else-end structure, and often with an OR statement included. In addition, the output of these calculations was a string, for example, "Size1" and "Size2".
Resolving these Issues
The best thing I've read on Tableau Performance is this whitepaper written by Alan Eldridge. This whitepaper is basically a mini-book on the subject and it's an incredible resource covering the broad range of topics necessary to improve efficiency in Tableau workbooks. Yes, it's 88 pages, but it's a must read if you are trying to make your workbooks more efficient.
Here are the summary points from Alan's whitepaper:
There is no silver bullet for inefficient workbooks. Start by looking at the performance recorder to understand where the time is going. Long-running queries? Lots of queries? Slow calculations? Complex rendering? Use this insight to focus your efforts in the right direction.
The recommendations in this document are just that – recommendations. While they represent a level of best practice, you need to test if they will improve performance in your specific case. Many of them can be dependent on structure of your data, and the data source you are using (e.g. flat file vs. RDBMS vs. data extract).
Extracts are a quick and easy way to make most workbooks run faster.
The cleaner your data is and the better it matches the structure of your questions (i.e. the less preparation and manipulation required), the faster your workbooks will run.
The majority of slow dashboards are caused by poor design – in particular, too many charts on a single dashboard, or trying to show too much data at once. Keep it simple. Allow your users to incrementally drill down to details, rather than trying to show everything then filter.
Work with the data you need and no more – both in terms of the fields you reference as well as the granularity of the records you return. It allows Tableau to generate fewer, better, faster queries and reduces the amount of data that needs to be moved from the data source to Tableau’s engine. It also reduces the size of your workbooks so they are easier to share
and open faster.
While reducing the data, make sure you use filters efficiently.
Strings and dates are slow, numbers and booleans are fast.
Let's apply a few of these techniques to Kasia's workbook and see if we can improve the speed of this visualization.
Work with the data you need and no more
Before proceeding, let's reference Tableau's Order of Operation.
Here are Kasia's measures on the Columns and Rows and the filters she applied.
The Rank is ranking the sum of Sales after the Year and Gender filters are applied. Unfortunately, those filters only reduce the number of records from 1.8 million down to 1,052,480. Then a sum of count is conducted, because that is needed for the rank of that sum. Only then can a filter be applied to that rank. In other words, the calculation for sum and rank had to be done on one million records that remain after the dimension filters. This isn't necessary, because we only need a very small number of these rows to create the view.
Alan wrote, "Work with the data you need and no more." That is great advice. This workbook has 1.8 million rows of data, but in the most granular view it only needs 2,300 rows of data. In an ideal world, we would trim it down to just the rows that are needed and use that data instead of the full data set. Instead, I will apply a quick and easy solution to trim this data down; Data Source Filters.
Notice in Tableau's Order of Operation that a data source filter (and Extract filters) are applied before other filter types and long before other calculations occur in Tableau. Leveraging this can really speed up performance in this workbook. There were two quick and easy data source filters that I applied to Kasia's viz.
1. Year - the data set starts in 1880, but Kasia is only using data from 1900 to 2014. Filter the records that are "at least 1900" removes 56,000 records that are not used in the analysis.
2. Count - this is the important one. Every name, boy and girl, has a count within each year. This count is used to determine the top 10 names in each year using the rank function. The lowest count that is being used was 1,906. By adding a data source filter at 1,906, we can remove 1.8 million rows of data that are not being used in the viz.
Applying these two data source filters takes the data set from 1,825,433 rows down to 24,130. This is still 20,000 records more than we need, but it's a very quick and easy way to filter data that is not needed, which will speed up all of the underlying calculation. This single step increases the speed of the workbook significantly. In fact, this alone makes the viz much more usable on Tableau Public.
Strings/Dates vs. Numbers/Boolean
The next really helpful tip from Alan that could help improve this workbook's performance is this one; "Strings and dates are slow, numbers and booleans are fast."
Here's an unrelated example. Instead of using an IF statement to assign a highlight color as a string, we can use a boolean output.
Highlight a Color as a String:
IF [State] = [State Parameter] then "Blue"
Highlight a Color as a Boolean:
[State] = [State Parameter]
Notice the boolean is a much more elegant solution in this case and it will perform better on larger data sets.
Kasia's workbook had a few calculations with complex IF statements that output to a string (and calculating them on one million records). We can't use a boolean solution for Kasia's calculation, however, we can make these calculations faster by converting them to numbers.
Kasia's original calculation for Size as a String:
if [Circles - Boys]=0 then "Size1"
ELSEIF [Circles - Boys]=1 OR [Circles - Boys]=2 OR [Circles - Boys]=4 then "Size2"
ELSEIF [Circles - Boys]=3 then "Size3"
else "Size4" end
Here's a revised calculation to get the same results, but instead of a string it outputs a number:
case [Circles - Boys]
when 0 then 1
when 1 then 2
when 2 then 2
when 4 then 2
when 3 then 3
Note - Another option in this case would be to group them, for example 1, 2 and 4 grouped as 2, and use the group on size, without using a calculation.
In the end, the changes were pretty straight forward. Kasia was able to make a few minor changes, reducing the data to what she needs in the viz and updating a few calculations, and the performance of the workbook increased significantly. Using Tableau's Performance Recorder in Tableau Desktop (Help menu -> Settings and Performance -> Start Performance Recording) we can see a huge difference in the performance of this workbook with these changes.
Original Viz: 39.57 seconds to open the workbook and 23.15 seconds computing table calculations
Updated Viz: 2.075 seconds to open the workbook