Anybody that has worked anywhere near me has heard me say many times: “I can’t code”.
Well that’s not really true; I can code, just badly. When I do code, it’s more like I hack. So, it’s either a case of hacking somebody else’s code to do what I want or using my friend Mr. Google to find bits that I can cobble together to achieve my desired goal.
As part of delivering faster and safer data solutions for our customers, we have become adept at delivering automated Data Warehouse capability via ODE, and we have also been working on safer, automated ways to acquire semi-structured data (CSV, Excel, XML etc.) into our cloud data platform. As we do this development, I am always on the lookout for solutions that we can leverage, rather than being forced to go to the effort of baking our own.
So, the other month I came across a product called Trifacta. It was showcased on the awesome BBBT video series. BBBT videos are a great way of seeing what is happening in BI (Business Intelligence) vendor land and are completely free; it’s not often you get value like that.
The enterprise version of Trifacta is based on a Hadoop platform, Cloudera or Hortonworks. So, while it looked cool, the OptimalBI team has been too busy on other things to spend time standing the platform up to play with it properly.
Then in October Trifacta announced a free desktop version of their product. It runs on both Windows and Mac, whoot!
So in my spare time I have been playing with a set of open data to see what I can do, and the answer is (apologies to the customer I am currently helping on their AgileBI journey for the “I can code” dance) I CAN CODE!
The key to my newfound capability is this:
Trifacta combines a point-and-click visual interface with a code generator (using their own cool wrangler language). But unlike most of the ETL-like analyst tools I have used, which are built around a flow-based UI, Trifacta works on the idea of visual interaction and a recipe.
So, in this screenshot I have clicked on one of the ” characters in the data and Trifacta gave me a bunch of options for what I might want to do with that bit of data (”). I chose the Replace option and it wrote the code to remove every ” it could find in the data. This is added as the next step in the recipe (on the right of the window).
The previous step in the recipe was to take the first line of the CSV file I imported and flag it as the header row, using the simple header command, funnily enough.
You can click on each step in the recipe and it will visually show you what it is going to do to the data with that command. This means you can pick up somebody else’s analysis and replay the recipe to see step-by-step what they did to get the result!
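For the coders in the room, here is a rough Python sketch of the recipe idea described above: an ordered list of steps (flag the header row, then strip the quote characters) that can be replayed one step at a time. This is not Trifacta's wrangler language or how Trifacta works internally; the sample data and the names `load`, `header`, `remove_quotes`, `RECIPE` and `replay` are all made up for illustration.

```python
# Hypothetical sample data standing in for an imported CSV file
# that still has stray quote characters in it.
RAW = 'name,city\n"Alice",Wellington\nBob,"Auckland"\n'


def load(text):
    """Split the text naively on commas so the quote characters survive."""
    rows = [line.split(",") for line in text.splitlines()]
    columns = [f"column{i + 1}" for i in range(len(rows[0]))]
    return {"columns": columns, "rows": rows}


def header(table):
    """Step 1: treat the first row as the column names."""
    first, *rest = table["rows"]
    return {"columns": first, "rows": rest}


def remove_quotes(table):
    """Step 2: remove every double-quote character from the data."""
    def strip(cell):
        return cell.replace('"', "")

    return {
        "columns": [strip(c) for c in table["columns"]],
        "rows": [[strip(c) for c in row] for row in table["rows"]],
    }


# The recipe is just an ordered list of named steps.
RECIPE = [("header", header), ("replace quotes", remove_quotes)]


def replay(table, recipe):
    """Apply the steps in order, yielding the table after each one."""
    for name, step in recipe:
        table = step(table)
        yield name, table


if __name__ == "__main__":
    for name, table in replay(load(RAW), RECIPE):
        print(name, table["columns"], table["rows"])
```

Because each step takes a table and returns a new one, anybody picking up the recipe can stop after any step and look at exactly what the data looked like at that point.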
There are lots more cool features, of course. Check out the auto profiling of the data at the top of each column as just one, not to mention the cool crowdsourcing approach to recommendations.
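To give a feel for what column profiling means, here is a minimal Python sketch that summarises one column: how many values, how many missing, how many distinct, and the most common ones. I don't know how Trifacta actually computes its profiles, so treat this as a loose analogue only; the `profile` function and the convention that an empty string means "missing" are my own assumptions.

```python
from collections import Counter


def profile(values, top_n=3):
    """Summarise one column; empty strings are assumed to mean 'missing'."""
    present = [v for v in values if v != ""]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(top_n),
    }


if __name__ == "__main__":
    # Hypothetical column of city names with one missing cell.
    print(profile(["Wellington", "Auckland", "Wellington", ""]))
```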
For me, Trifacta looks like a good free desktop option for analysing data for those of us who aren’t R gurus. But be warned: there is a limit of 100MB of data per recipe in the free version.
Wrangle on, Cowboys!