10/23/2014 (Updated with 2020 data)
Halloween Data Set Published for Data Visualization
I have lived in my house for 12 years, and every Halloween I get inundated with trick-or-treaters. Year after year it's like Willy Wonka's Chocolate Factory at my house, with candy flying out the door. Now, being a data guy, what do I do? Well, that's simple: I count them. A friend and I keep a meticulous count of the trick-or-treaters in 30-minute intervals, and I've been tracking this data year by year. After a number of years of recording these numbers, I decided to use them as a data set for data visualization training. I've used this data set in the data visualization course at the University of Cincinnati, in training at KPMG in their Advisory University and Global Analytics programs, and in other data visualization workshops and corporate training. At the time of writing this post, I would estimate that over 1,000 people have created a data visualization using this data, and based on the hourly rate of some of those folks, it is well over a six-figure investment overall.
There are a number of reasons why I really like this data set, and I've had great success using it as a teaching tool. Large data sets are very common nowadays; this very small data set, by contrast, forces people to be creative with the data and the story they tell. It's a fun topic, something most everyone can relate to (at least in the U.S.), and it's always interesting to see the wide range of data visualizations that are created during this exercise. I decided to make the data set public and to keep it updated.
Yes, in 2011 I had 869 trick-or-treaters come to my house, and in 2020, during a global pandemic, I had 219.
Instructions (typically 45-75 minutes in a workshop format, in small groups or individually, or as a single class assignment):
1.) Determine a story or goal for the visualization.
Homeowner dashboard summarizing Halloween
Forecast future trick-or-treaters or estimate future candy need
Explore variation of the number of trick-or-treaters year by year
2.) This is a very simple data set. There are only a few years of data broken down into 4 half-hour time blocks with cumulative totals. Think broadly about the data.
The data is time series data – any additional choices?
What comparisons can you make?
What table calculations can be made?
What additional data can be appended from other sources to help tell the story or complete an analysis?
NOTE - be very careful because there are many pitfalls at this step.
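As one illustration of the table calculations mentioned in step 2, here is a minimal Python sketch of two common ones: percent difference between two years and percent of total within one evening. The yearly totals (869 in 2011, 219 in 2020) come from this post; the function names and the per-interval counts are hypothetical and are not taken from the data set.

```python
# Yearly totals stated in this post; the other years are omitted here.
totals = {2011: 869, 2020: 219}


def percent_change(old, new):
    """Percent difference going from `old` to `new`."""
    return (new - old) / old * 100


def percent_of_total(counts):
    """Each value's share of the overall total, in percent."""
    total = sum(counts)
    return [c / total * 100 for c in counts]


change = percent_change(totals[2011], totals[2020])
print(f"2011 to 2020: {change:.1f}%")

# Hypothetical per-interval counts for one evening (NOT from the data set).
example_counts = [50, 100, 150, 100]
print(percent_of_total(example_counts))
```

The same two calculations map directly onto "percent difference" and "percent of total" style table calculations in most visualization tools.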
3.) Build a data visualization
Download Data sets (xlsx format):
Halloween Data Set in Excel (cross tab) (2008-2020): Download Here.
Halloween Data Set in Excel for Tableau (2008-2020): Download Here.
Numbers in data file for Excel are cumulative.
Numbers in data file for Tableau have been unpivoted and are not cumulative.
Zip Code: 45207
Neighborhood: East Walnut Hills/Evanston (being on the border likely increases the number of trick-or-treaters)
The date for trick-or-treating has always been 10/31 (some neighborhoods change the night for trick-or-treating).
The type of candy did not vary year by year. It is always a general mix of candy.
Official trick-or-treat hours are from 6pm-8pm, but at 8pm I do not kick the children off my porch and say "no candy for you". These "stragglers" taper off, and by 8:15pm there are no more.
In most of the recent years there were a number of trick-or-treaters who showed up before 6pm. The earliest was in 2020 at 4:45pm (neighbors who wanted to come before anyone else because of COVID-19).
For 2020, because of COVID-19, I set up two long candy chutes so that trick-or-treaters could get candy from a safe distance.
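The difference between the two download files noted above (cumulative counts in the cross tab, unpivoted non-cumulative counts in the Tableau version) can be sketched in a few lines of Python. This is only an illustration under assumptions: the years' values, the time-block labels, and the function names below are hypothetical and do not reflect the actual file contents or column names.

```python
from itertools import accumulate

# Hypothetical cross-tab rows (NOT real values from the data set):
# year -> cumulative count at the end of each half-hour block.
labels = ["6:00pm", "6:30pm", "7:00pm", "7:30pm"]
cumulative = {
    2018: [40, 130, 260, 330],
    2019: [55, 160, 300, 380],
}


def to_interval_counts(cum):
    """Convert running totals into per-interval counts by differencing."""
    return [b - a for a, b in zip([0] + cum, cum)]


def unpivot(crosstab, labels):
    """Turn {year: [cumulative, ...]} into long rows of (year, label, count)."""
    rows = []
    for year, cum in crosstab.items():
        for label, count in zip(labels, to_interval_counts(cum)):
            rows.append((year, label, count))
    return rows


long_rows = unpivot(cumulative, labels)
for row in long_rows:
    print(row)

# Re-accumulating the interval counts should recover the cumulative row.
assert list(accumulate(to_interval_counts(cumulative[2018]))) == cumulative[2018]
```

The long, one-row-per-observation layout is generally easier to work with in Tableau, while the cross tab is convenient for reading totals directly in Excel.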
Instructor Notes Available:
Instructors who would like to use this data set for their class or workshop can contact me at Jeff@DataPlusScience.com for detailed instructor notes on this assignment/exercise.