Home | About | Services | Contact | Insights


10/9/2013
Geocoding Addresses Directly in Tableau 8.1 Using Integration with R


It's now day 2 playing with Tableau's integration with R and the questions for tonight: Do you have a list of addresses in Tableau and need to geocode them? Ever wish you had an easy way to geocode them directly from Tableau without using a geocoding service or website? Well now with Tableau 8.1 integration with R this is possible, at least in some instances.

Here is quick map that I plotted by geocoding data directly from Tableau using integration with R. Latitude and longitude for these addresses was not in the dataset. The address was concatenated into a single string, sent to R to be geocoded, passed back to Tableau, parsed and then plotted on a map and all this was done with a few calculated fields.



One of the best things about R is all of the wonderful free packages that are available. There are literally thousands of packages available and each of those packages have various functions available. For batch geocoding I am using a package called ggmap and the function is called geocode(). This technique also requires using Rserve as described in Tableau 8.1 and R by Bora Beran here.

Step 1: Install and load the packages Rserve and ggmap in RStudio

   install.packages("Rserve")
   install.packages("ggmap")
   library("Rserve")
   Rserve()
   # The ggmap package will load from Tableau when necessary

Make sure the R connection is setup in Tableau.

Step 2: Load a data file in Tableau that has fields address, city and state

I created a CSV file of ice cream shops in Cincinnati and included a name field.



Step 3: Create some calculated fields

   Concatenate the address into a single string
   Calculated Field Name: Full Address
   Formula: [Address]+", "+[City]+", "+[State]

   Create Geocode function to R
   Calculated Field Name: Geocode
   Formula:
      SCRIPT_STR("
      library('ggmap');
      geo <- geocode(.arg1, output = 'latlon');
      geo$latlon <- do.call(paste, c(geo[c('lat', 'lon')], sep = ','));
      geo$latlon
      ", ATTR([Full Address]))

     Notice the simple function in R to geocode an address: geocode('123 Main Street, Cincinnati, Ohio', output ='latlon')

   Parse latitude from Geocode field
   Calculated Field Name: lat
   Formula: left([Geocode],find([Geocode], ",")-1)

   Parse longitude from Geocode field
   Calculated Field Name: lon
   Formula: right([Geocode], len([Geocode])-find([Geocode], ","))

   Convert lat from string to float (the final field to plot)
   Calculated Field Name: latitude
   Formula: float([lat])

   Convert lon from string to float (the final field to plot)
   Calculated Field Name: longitude
   Formula: float([lon])

   When complete, assign "Geographic Roles" to latitude and longitude

It's now possible to plot latitude and longitude in Tableau as if they were in the original data set. Here is a screen print of a quick map of the ice cream shops.



What's nice about this functionality?

1. Tableau integration with R is very exciting. R is such a powerful tool and by combining the power of Tableau the possibilities are endless. As mentioned earlier, there are thousands of packages available that allow for nearly any kind of analysis to be done.

2. This approach, while not practical for large data sets, would be very handy for small batches of addresses. This method would allow for real-time geocoding of customer data as a one-stop shop. Addresses could be added regularly to a database or CSV file and then geocoded on the fly as needed on a worksheet or dashboard. Recently a number of the free batch geocoding services have shut down or moved to a pricing model, so this tool (through Google) could be helpful as another resource.

3. The process is very simple. A few calculated fields in Tableau and basically one line of code in R to geocode.

Now for the disclaimers:

1. Slow - Geocoding is a slow process. The typical speed for geocoding engines is around 1 to 3 records per second. I find that using this method it is approximately 1 second per record, so this tool is not going to be practical for large files of addresses.

2. Limits - This method utilizes Google to geocode and there is a limit of 2,500 per day from a single IP address. They monitor this very closely so once the limit is hit it will not geocode anymore.

3. Continual Refresh - It seems that on any actions or adjustments in Tableau that the data refreshes and the process to R begins again. This means clicking on new sheets, changing the size of the shapes on the maps, adding or adjusting fields on the worksheets or dashboards, etc. Having the tool constantly refresh isn't practical. This auto refresh can be stopped by unchecking "Auto Updates" in the Worksheet menu, however, this doesn't resolve the overall problem because if it's unchecked then it won't plot the data at all. This would only be useful when building the report without constant geocoding. This may be something Tableau will have to work out, for example when adding the worksheet map to the dashboard it restarted the process and geocoded everything again. When I adjusted the shapes of the circles it geocoded again. When I added a new sheet and switched back and forth it geocoded again and when I annotated a point it geocoded again. This problem may not exist with other R integrations, but I can see this being an issue when calling other functions on a regular basis. Maybe Tableau can suggest some alternative approaches here or maybe they can provide more controls on when functions to R are called. After all, it's still only beta 2 at the moment.

4. Beta's Crash - This was done with Tableau 8.1 beta 2. I have found beta 1 and beta 2 to be VERY stable overall. In fact, until working with this file today it had only crashed on me once. This particular process seems to crash Tableau frequently, especially if there are hundreds of addresses and multiple sheets. I was able to geocode and map these 39 addresses fairly quickly and without too much trouble, but working with several hundred records seemed problematic.

5. Not a universal solution - Many companies geocode all of their addresses when they are loaded in the system (we do), in those cases those organizations may not have a need for a tool like this.

Overall I am thrilled with the Tableau integration with R, but there is a long way to go with this. Tableau did not open the door here, they simply cracked a window. I hope this example illustrated some of the power of Tableau and R working together and gives a glimpse of the wonderful things to come as they continue to build on this integration.

Download the Tableau 8.1 workbook
here.

Please share any Tableau integration with R that you've done or will do in the future. All this talk about ice cream makes me want to go eat some ice cream, so it's a good thing I have some Graeter's ice cream here at home. Happy vizzing!

Jeffrey A. Shaffer