Global access to data is exploding. At the same time, our ability
to categorise, classify and analyse this data is also expanding. To
apply this new opportunity for data analytics in the enterprise, we
need to ‘train’ the models we use and carefully supervise the process
of machine learning as we move towards unsupervised learning
capabilities.
It’s time to
get smarter at getting smarter.
We’re gathering a lot of data
these days. Your grandmother (if you’re lucky enough to still have
one) could probably have told you that. But taking on all the data
that’s out there is (and we are increasingly using the expression)
rather like drinking from a firehose.
Sorting the wheat from the chaff
Intelligent first steps on the road to
machine learning are essentially strategic before they are tactical.
We need to know where we are going as a business and create data
models and supporting data analytics functions that can help channel
us in the right direction for growth, profit and expansion.
There are a lot of on-the-ground tactics that need to follow,
starting with laying down process frameworks to verify and validate
our data whilst also working to deduplicate, classify, secure and
(in some cases) anonymise our data.
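As a rough illustration of the deduplicate and anonymise steps, the sketch below uses only the Python standard library. The `email` field, the salt value and the function names are all hypothetical, chosen for the example rather than taken from any particular system. Note too that a salted hash is strictly pseudonymisation, which may or may not satisfy a given definition of anonymised data.

```python
import hashlib


def anonymise(value: str, salt: str = "example-salt") -> str:
    """Swap an identifier for a short salted SHA-256 digest.

    Assumption: a truncated digest is acceptable as a stand-in token.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def deduplicate(records, key_fields):
    """Keep the first record seen for each normalised key."""
    seen = set()
    unique = []
    for record in records:
        # Normalise so that 'A@x.com' and ' a@x.com ' count as the same key.
        key = tuple(str(record.get(f, "")).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def prepare(records):
    """Deduplicate on email, then replace the email with a token."""
    cleaned = deduplicate(records, key_fields=("email",))
    return [
        {**r, "email": anonymise(r["email"].strip().lower())}
        for r in cleaned
    ]


if __name__ == "__main__":
    sample = [
        {"name": "Ann", "email": "Ann@example.com"},
        {"name": "Ann", "email": " ann@example.com "},
        {"name": "Bob", "email": "bob@example.com"},
    ]
    for row in prepare(sample):
        print(row)
```

Deduplicating before anonymising is a deliberate ordering here: matching is easier on the normalised original value than on a token.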
Needle in a haystack
But let’s go back to strategic planning for a moment.
We know that we have all this data swirling around us in various
volumes and at different data quality levels, so we need to be able
to stand back and define what our business goals and objectives are
before proceeding with our data analysis. It’s hard to find the
needle in the haystack until you know how long, sharp or thick the
needle actually is.
In real business terms, obviously this is
not a case of needles. This is a case of knowing what our defined
outcomes should be for the business operations we seek to execute in
the short, medium and long term. Once we know what those outcome
targets are, we can then start to look for the data that will
support the decisions we need to make in day-to-day operations.
What all of this means is that before we can start to move
towards unsupervised learning capabilities, we need to think about
the health, provenance and accuracy of the data we are going to use
to inform our machines within core knowledge frameworks.
We need to be able to tell our machines what they need to do and define
their role in the universe, but this is never easy because we’re
probably dealing with shoddy unsubstantiated datasets at the
It’s foolish to think that more data is always better.
In fact, lots of unproductive data leads to failed projects and
systems running into bugs and other issues. It’s not hard to fall
into a vicious circle of uncorroborated data, brittle business
objectives and flaky workflow policies. Put simply, data quality
shouldn’t be taken lightly.
Stepping out of the vicious
circle isn’t easy, but it is possible if we ask ourselves what we
really mean by data quality.
It is imperative that we question our
data’s completeness, that is – does a particular dataset have all
the component pieces of information in it that we need for the
specific data analytics and processing we wish to carry out?
It is crucial that we examine the level of compliance of the data
in hand. By this I mean – is our data presented in the required
format and standard for the defined regulatory frameworks that we
seek to operate within, connect to and integrate with?
Lastly, we need to question our data’s currency (as in time, not
value), which means that we need to examine its age, relevance and
ability to still be ‘current’.
If we can check our data using
these parameters, we can move forward and start applying machine
learning efficiencies to the business.
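To make the three checks (completeness, compliance and currency) a little more concrete, here is a minimal Python sketch. The field names, the two-letter country-code rule and the one-year freshness window are assumptions made up for the example. Real compliance rules would come from whichever regulatory framework applies.

```python
import re
from datetime import date, timedelta

# Hypothetical rules, for illustration only.
REQUIRED_FIELDS = ("customer_id", "country", "updated")
COUNTRY_FORMAT = re.compile(r"[A-Z]{2}")  # assumed format: two capital letters
MAX_AGE = timedelta(days=365)             # assumed freshness window


def check_record(record, today):
    """Return a pass/fail flag for each of the three Cs."""
    # Completeness: every required field is present and non-empty.
    complete = all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)

    # Compliance: the country value matches the required format.
    country = str(record.get("country", ""))
    compliant = bool(COUNTRY_FORMAT.fullmatch(country))

    # Currency: the record was updated within the allowed window.
    updated = record.get("updated")
    current = isinstance(updated, date) and (today - updated) <= MAX_AGE

    return {"complete": complete, "compliant": compliant, "current": current}


if __name__ == "__main__":
    today = date(2024, 6, 1)
    record = {"customer_id": 1, "country": "GB", "updated": date(2024, 1, 1)}
    print(check_record(record, today))
```

Returning one flag per check, rather than a single pass or fail, keeps it clear which of the three Cs a dataset is falling down on.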
So when do we start
putting these machine learning systems into deployment?
The answer is: when we know our data a) is trusted and b) can provide
quantifiable results for our data analytics. When we know that we
can get the same (and ultimately better) results by allowing a
machine to start making decisions, we can get really smart.
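One simple way to express that test is to compare the machine's decisions with our existing decisions against known outcomes. The sketch below is an assumption about how such a gate might look, using plain accuracy as the measure. The function names and the sample figures are invented for illustration.

```python
def accuracy(decisions, outcomes):
    """Share of decisions that matched the known outcome."""
    if not outcomes or len(decisions) != len(outcomes):
        raise ValueError("decisions and outcomes must be non-empty and equal in length")
    matches = sum(d == o for d, o in zip(decisions, outcomes))
    return matches / len(outcomes)


def ready_to_deploy(model_decisions, existing_decisions, outcomes):
    """True when the model does at least as well as the existing process."""
    return accuracy(model_decisions, outcomes) >= accuracy(existing_decisions, outcomes)


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1]   # what actually happened
    existing = [1, 0, 0, 0]   # decisions made by the current process
    model = [1, 0, 1, 0]      # decisions proposed by the model
    print(ready_to_deploy(model, existing, outcomes))
```

In practice the measure would be whatever the business has defined as its outcome target; accuracy is only the simplest stand-in.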
Start small, think big
We should obviously start small,
in defined compartmentalised areas, but then look upwards to scale
into strategically interrelated parts of the business.
In summary, define your business goals, look for the right data as you
insist upon the three Cs (completeness, compliance and currency),
start the data analysis process
methodically and move forward by adhering to architectural precision
throughout. This is the pathway that leads to smarter machine learning.