One of the pieces of data we use regularly is dates. Every data repository needs the ability to view the data by day, month, year, etc. There is often also a need to view data based on holidays.
Funnily enough, you would think that a list of data containing the NZ public holidays would be readily available in one of the many and varied NZ Govt Open Data portals, but you would be wrong. The best we have is this which as you can see makes it a little hard to use as a reusable set of data.
So as we always do when we need to reinvent the wheel, we asked around and found a talented person who had, of course, solved this problem. So I got sent an excellent Excel spreadsheet with what I needed. (Note: It’s not nested, so it’s usable unlike the data provided by the NZ statistics website)
|1/01/80||Tuesday||New Year’s Day||Y||1/01/80||Tuesday|
|2/01/80||Wednesday||Day after New Year’s Day||Y||2/01/80||Wednesday|
I have had problems before where Open Data seems to disappear with no warning, so a while ago I started storing any open data I used in AWS S3. I then decided to make it accessible just in case it was useful to somebody else, so I created a simple catalog to access it via a WordPress site.
I decided I would publish the Public Holiday data. But as I was about to do it, I thought, ‘Hey, there is probably a little augmentation I could do to make the data more useful’.
So, enter my favourite (free) data wrangling desktop tool, Trifacta.
So, the first step was to import the excel spreadsheet into Trifacta, which was, of course, easy peasy.
Then I clicked on the date column to see what suggestions the Trifacta wisdom-of-the-crowd based machine learning engine would provide.
Oh looky, it suggested that I assign a day of the week to each date and shows me, in yellow, the result. Yup, that’s a useful piece of data to have so into the recipe pot it goes.
Now, let’s just check if that transform worked, so we will have a quick look at the profile data.
Well, we can see that Monday is our most popular day for public holidays by a long mile, but we can also see that we have one missing value, that we will need to investigate to see why it is not resolving to a day number.
Oh while, as I’m here I might as well have a look and see what holidays are generated.
Well, it looks like we have 41 years of holidays, and all the ones I expect to see are there.
Ok, the last step I think I will be to split the date out into its component parts, to make it easy for somebody to find all the holidays for a certain year if they are using a legacy data visualisation tool.
So, a simple wrangle formula to split out the year.
Followed by a few more to split out the other data parts and wallah.
Last, step is run the recipe, output the results to CSV and JSON and then up to opendata.nz it goes.
Change, learn or fade away, it’s your choice – Shane.
Shane blogs about all of the things data and business intelligence.
You can read all of Shane’s blogs here.
Don’t forget, we can train your team in the art of agile business intelligence at any time!