Global access to data is exploding. At the same time, our ability to categorise, classify and analyse this data is also expanding. To make the most of this opportunity for data analytics in the enterprise, we need to ‘train’ the models we use and carefully supervise the machine learning process as we move towards unsupervised learning capabilities.
It’s time to get smarter at getting smarter.
We’re gathering a lot of data these days. Your grandmother (if you’re lucky enough to still have one) could probably have told you that. But taking in all the data that’s out there is, as the increasingly popular expression goes, rather like drinking from a firehose.
Sorting the wheat from the chaff
Intelligent first steps on the road to machine learning are essentially strategic before they are tactical. We need to know where we are going as a business and create data models and supporting data analytics functions that can help channel us in the right direction for growth, profit and expansion.
There are plenty of on-the-ground tactics that need to follow, starting with process frameworks to verify and validate our data whilst we also work to deduplicate, classify, secure and (in some cases) anonymise it.
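To make those tactics concrete, here is a minimal sketch of two of them in Python: deduplication and anonymisation. The record fields and the hash-based pseudonymisation approach are illustrative assumptions, not a prescribed method.

```python
import hashlib

# Hypothetical customer records; field names are illustrative only.
records = [
    {"email": "ana@example.com", "name": "Ana", "age": 34},
    {"email": "ana@example.com", "name": "Ana", "age": 34},  # duplicate
    {"email": "bo@example.com", "name": "Bo", "age": 51},
]

def deduplicate(rows, key):
    """Keep only the first occurrence of each value of `key`."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

def anonymise(rows, field):
    """Replace a direct identifier with a one-way hash (pseudonymisation)."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the original record is untouched
        row[field] = hashlib.sha256(row[field].encode()).hexdigest()[:12]
        out.append(row)
    return out

clean = anonymise(deduplicate(records, "email"), "email")
```

In practice the same steps would run inside whatever data pipeline the business already operates; the point is that they are ordinary, auditable transformations, not magic.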
Needle in a haystack
But let’s go back to strategic planning for a moment. We know that we have all this data swirling around us in various volumes and at different data quality levels, so we need to be able to stand back and define what our business goals and objectives are before proceeding with our data analysis. It’s hard to find the needle in the haystack until you know how long, sharp or thick the needle actually is.
In real business terms, this is obviously not a case of needles. It is a case of knowing what our defined outcomes should be for the business operations we seek to execute in the short, medium and long term. Once we know what those outcome targets are, we can start to look for the data that will support the decisions we need to make in day-to-day operations.
What all of this means is that before we can start to move towards unsupervised learning capabilities, we need to think about the health, provenance and accuracy of the data we are going to use to inform our machines within core knowledge frameworks.
We need to be able to tell our machines what they need to do and define their role in the universe, but this is never easy because we’re probably dealing with shoddy, unsubstantiated datasets at the outset.
It’s foolish to think that more data is always better. In fact, masses of unproductive data lead to failed projects and to systems riddled with bugs and other issues. It’s not hard to fall into a vicious circle of uncorroborated data, brittle business objectives and flaky workflow policies. Put simply, data quality shouldn’t be taken lightly.
Stepping out of the vicious circle isn’t easy, but it is possible if we ask ourselves what we really mean by data quality.
C-cubed: Completeness, Compliance, Currency
It is imperative that we question our data’s completeness, that is – does a particular dataset have all the component pieces of information in it that we need for the specific data analytics and processing we wish to carry out?
It is crucial that we examine the level of compliance of the data in hand. By this I mean – is our data presented in the required format and standard for the defined regulatory frameworks that we seek to operate within, connect to and integrate with?
Lastly, we need to question our data’s currency (as in time, not value), which means that we need to examine its age, relevance and ability to still be ‘current’.
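The three Cs can be expressed as simple programmatic checks. The sketch below assumes an illustrative record layout, a toy email-format rule standing in for real regulatory compliance, and a one-year currency threshold chosen purely for the example.

```python
import re
from datetime import date, timedelta

# Assumed schema and rules -- illustrative only, not a standard.
REQUIRED_FIELDS = {"customer_id", "email", "updated"}
MAX_AGE = timedelta(days=365)  # example currency threshold

def is_complete(row):
    """Completeness: every required field is present."""
    return REQUIRED_FIELDS <= row.keys()

def is_compliant(row):
    """Compliance (toy rule): email is in a valid format."""
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", row.get("email", "")))

def is_current(row, today=None):
    """Currency: the record was updated recently enough."""
    today = today or date.today()
    return (today - row["updated"]) <= MAX_AGE

def passes_three_cs(row):
    return is_complete(row) and is_compliant(row) and is_current(row)

good = {"customer_id": 1, "email": "ana@example.com", "updated": date.today()}
stale = {"customer_id": 2, "email": "bo@example.com",
         "updated": date.today() - timedelta(days=800)}
```

Real compliance checks would of course encode the specific regulatory frameworks the business operates within; the shape of the gate, a row-level pass/fail before analysis, is what matters here.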
If we can check our data using these parameters, we can move forward and start applying machine learning efficiencies to the business.
So when do we start putting these machine learning systems into deployment?
The answer is: when we know our data is a) trusted and b) able to provide quantifiable results for our data analytics. When we know that we can get the same (and ultimately better) results by allowing a machine to start making decisions, we can get really smart.
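One hedged way to quantify “the same results” is to run the machine in shadow mode and measure how often its decisions agree with trusted human ones before letting it act alone. The decision labels and the readiness threshold below are assumptions for illustration, not an industry standard.

```python
# Hypothetical shadow-mode comparison: model decisions vs trusted human
# decisions on the same cases, before any automation goes live.
human = ["approve", "reject", "approve", "approve", "reject"]
model = ["approve", "reject", "approve", "reject", "reject"]

# Fraction of cases where the machine matched the human baseline.
agreement = sum(h == m for h, m in zip(human, model)) / len(human)

READY_THRESHOLD = 0.95  # illustrative bar for going live
ready = agreement >= READY_THRESHOLD
```

Here the model agrees on four of five cases (80%), so it stays in shadow mode; only once agreement clears the chosen bar would it start making decisions unsupervised.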
Start small, think big
We should obviously start small, in defined, compartmentalised areas, but then look to scale up into strategically interrelated parts of the business.
In summary, define your business goals, look for the right data as you insist upon the three Cs, start the data analysis process methodically and move forward by adhering to architectural precision throughout. This is the pathway that leads to smarter machine learning.