10/23/2014 (Updated with 2015 data)
Halloween Data Set Published for Data Visualization
I have lived in my house for 12 years, and every Halloween I get inundated with trick-or-treaters. Year after year it's like Willy Wonka's Chocolate Factory at my house, with candy flying out the door. Now, being a data guy, what do I do? Well, that's simple: I count them. A friend and I keep a meticulous count of the trick-or-treaters in 30-minute intervals, and I've been tracking this data year by year. After a number of years of recording these numbers, I decided to use them as a data set for data visualization training. I've used this data set in the data visualization course at the University of Cincinnati, in training at KPMG in their Advisory University and Global Analytics training, and in other data visualization workshops and corporate training. At the time of writing this post, I would estimate that over 1,000 people have created a data visualization using this data, and based on the hourly rate of some of those folks, it is well over a six-figure investment overall.
There are a number of reasons why I really like this data set, and I've had great success using it as a teaching tool. Large data sets are very common nowadays, but this very small data set forces people to be creative with the data and the story they tell. It's a fun topic, something almost everyone can relate to (at least in the U.S.), and it's always interesting to see the wide range of data visualizations that are created during this exercise. I decided to make the data set public and to keep it updated.
Yes, in 2011 I had 869 trick-or-treaters come to my house.
Instructions (typically 45-75 minutes in a workshop format, in small groups or individually, or as one assignment in a class):
1.) Determine a story or goal for the visualization.
Homeowner dashboard summarizing Halloween
Forecast future trick-or-treaters or estimate future candy need
Explore variation of the number of trick-or-treaters year by year
2.) This is a very simple data set. There are only a few years of data broken down into 4 half-hour time blocks with cumulative totals. Think broadly about the data.
The data is time series data – any additional choices?
What comparisons can you make?
What table calculations can be made?
What additional data can be appended from other sources to help tell the story or complete an analysis?
NOTE - be very careful because there are many pitfalls at this step.
Numbers in the data file for Excel are cumulative.
Numbers in the data file for Tableau have been unpivoted and are not cumulative.
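The cumulative versus non-cumulative pitfall above is easy to check in code. Here is a minimal Python sketch of moving between the two forms. The function names and the sample numbers are my own assumptions for illustration; the numbers are placeholders and are not taken from the published data files.

```python
from itertools import accumulate


def cumulative_to_interval(cumulative):
    """Turn running totals into per-interval counts by differencing."""
    counts = []
    previous = 0
    for total in cumulative:
        if total < previous:
            # A running total can never go down; flag bad input early.
            raise ValueError("cumulative totals must not decrease")
        counts.append(total - previous)
        previous = total
    return counts


def interval_to_cumulative(counts):
    """Turn per-interval counts back into running totals."""
    return list(accumulate(counts))


# Hypothetical placeholder values for four half-hour blocks
# (NOT real numbers from the Halloween data set).
example_cumulative = [10, 30, 60, 100]

example_counts = cumulative_to_interval(example_cumulative)
print(example_counts)                          # [10, 20, 30, 40]
print(interval_to_cumulative(example_counts))  # [10, 30, 60, 100]
```

Summing the cumulative numbers across time blocks would overcount, so it helps to confirm which form a file is in before building any totals or charts.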
Zip Code: 45207
Neighborhood: East Walnut Hills/Evanston (being on the border likely increases the number of trick-or-treaters)
Trick-or-treating has always been held on 10/31 (some neighborhoods change the night for trick-or-treating).
The type of candy did not vary year by year. It is always a general mix of candy.
Official trick-or-treat hours are from 6pm-8pm, but at 8pm I do not kick the children off my porch and say "no candy for you." These "stragglers" taper off, and by 8:15pm there are no more.
In 2013, a number of trick-or-treaters showed up early.
Instructor Notes Available:
Instructors who would like to use this data set for their class or workshop can contact me at Jeff@DataPlusScience.com for detailed instructor notes on this assignment/exercise.