Machine, Process and Human Generated Data
BI Modal Courses and Conferences
When I can make the time, I find it useful to attend conferences and courses. I try to mix up the things I attend, combining events inside the BI domain with events outside it, as I always seem to find something interesting and relevant at both. My all-time fav conference is still Webstock; it’s one you should definitely add to your conference bucket list.
Sometimes attending a course reinforces how much I love the courses we run over at OptimalBI, thanks to the high level of hands-on content we include (nothing like sitting and listening to somebody speak to PowerPoint for a few days to reinforce this). Sometimes it gives me time to zone out and think through some of the things I have been working on, on and off, for a while, and an “ah ha” moment arrives.
But there are always a few key new things I take away from the course itself.
My Latest Takeaway
One of the things I took away from this course, and have used consistently since, is the concept of Machine, Process, and Human Generated data.
The concept is explained in Barry Devlin’s book, Business unIntelligence, and there is also a great summary in a white paper Barry wrote for Teradata in 2013, Unlocking Machine-Generated Data. An excerpt from this paper:
1. Human-sourced information: All information ultimately originates from people, an artifact of the human mind. This information is the highly subjective record of human experiences, previously recorded in books and works of art, and later in photographs, audio and video. Human-sourced information is now almost entirely digitized and electronically stored everywhere from tweets to movies. Loosely structured and often ungoverned, this information may not reliably represent for the business what has happened in the real world. Structuring and standardization (e.g. modeling, validation in operational systems and cleansing as data moves to BI) allows the business to convert human-sourced information to more reliable process-mediated data.
2. Process-mediated data: Business processes are at the heart of running and managing every business. These processes record and monitor business events of interest, such as registering a customer, manufacturing a product, taking an order, etc. The process-mediated data thus collected is highly structured and includes transactions, reference tables and relationships, as well as the metadata that sets its context. Process-mediated data has long been the vast majority of what IT managed and processed, in both operational and BI systems. Its highly structured and regulated form is well suited to promoting information management and data quality, as well as for storage and manipulation in relational database systems.
3. Machine-generated data: All the name-value pair data in the previous examples falls into this category. Its source is the sensors, machines and computers used to measure and record the events and situations in the physical world. From simple sensor records to complex web logs, machine-generated data is well-structured and usually considered to be highly reliable, provided that the occurrence of faulty sensors and missing data is accounted for. As sensors proliferate and eCommerce becomes pervasive, driving ever larger data volumes, machine-generated data is becoming an increasingly important component of the information stored and processed by many businesses. Its well-structured nature is amenable to computer processing. It is sometimes claimed that its size and speed is beyond the abilities of traditional RDBMS, mandating NoSQL data stores. However, high-performance relational databases are regularly used for such data.
A Great Concept
The reason I love this concept is I can easily map these three styles of generated data to technologies that are designed to manage them.
Machine Generated Data, such as sensor readings, can be acquired, loaded and stored with “Big Data” technologies such as Spark and Hadoop. Heavens to Betsy, we can even call it a “Data Lake.”
Process Generated Data, such as CRM data stored in a relational database, can be acquired using Change Data Capture (CDC) technologies and stored in an MPP-style relational database.
Human Generated Data, such as Twitter and Facebook posts, can use similar patterns to the Machine Generated Data. But we might use a different pattern, say one that uses MongoDB rather than Hadoop.
Differentiating the three different data generation patterns allows me to easily articulate the reason for potentially needing three different technology patterns.
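To make the mapping concrete, here is a rough Python sketch of how I think about it. The names Origin, Pattern and pattern_for are made up for illustration, and the technology strings are just the examples mentioned above, not a recommendation for any particular organisation.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """How the data was generated (hypothetical names for this sketch)."""
    MACHINE = "machine"
    PROCESS = "process"
    HUMAN = "human"


@dataclass(frozen=True)
class Pattern:
    """One illustrative technology pattern for a data origin."""
    example_source: str
    technology: str


# Illustrative lookup only: the entries restate the examples in this post.
PATTERNS = {
    Origin.MACHINE: Pattern("sensor readings", "Spark/Hadoop data lake"),
    Origin.PROCESS: Pattern("CRM tables", "CDC into an MPP relational database"),
    Origin.HUMAN: Pattern("Twitter and Facebook posts", "document store such as MongoDB"),
}


def pattern_for(origin: Origin) -> Pattern:
    """Return the example technology pattern for a given data origin."""
    return PATTERNS[origin]


if __name__ == "__main__":
    for origin in Origin:
        p = pattern_for(origin)
        print(f"{origin.value}: {p.example_source} -> {p.technology}")
```

The point of keeping it as a simple lookup is that the origin of the data, not the fashion of the day, picks the pattern; a real toolbox would have more than one option per origin.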
There seems to have been a movement in the past to use these newer data technologies such as Hadoop as the only option, effectively trying to use them for patterns they were not designed for.
There is no longer one BI tool to rule them all, with most organisations I work with now moving to a BI Toolbox concept. The same should be the case for data acquisition, using the right tool for the task. The world of BI is once again becoming a world of hybrid solutions where we need to focus on picking the right technology pattern for the data and then spend the rest of our time determining how to integrate and govern these solutions.
As I write this, I think I might have coined a new acronym: Acquire, Load and Store (ALS). I wonder if it will become the new ETL/ELT 😉
So thank you, Barry, for the Machine, Process, and Human data concept; it made the course worthwhile.
Oh and One More Thing
One of the other things I got out of the course was an “ah ha” moment around moving from the layer cake style data architecture diagrams I have been doing for years to one I have called Service Orientated Data Architecture (SODA), but that’s going to have to wait for another post.
Change, learn or fade away, it’s your choice – Shane
Shane blogs about all of the things data and business intelligence.