Merriam-Webster defines big data as “an accumulation of data that is too large and complex for processing by traditional database management tools.”
Big data has been giving us headaches for years. I would even go as far as to say that long before we started fretting about ‘BIG’ data, we didn’t understand our data anyway. We suffer from disparate database and software environments, with some business units still using manual processes to work around what they believe to be technical limitations.
Today, the issue is no longer how or where we store our data, but how we access it and gain insight from it. Are we able to access multiple systems in a secure and independent fashion without creating new risks or bringing our source systems to their knees?
So let’s look at two scenarios:
Scenario 1: I, the auditor, request data from a highly skilled and motivated IT contact who, over the course of a few days or as long as a month, returns with a ‘dump’ of data that he describes with, “I wasn’t sure what you wanted, so I exported as much as I could.” Although this seems like a valuable deed, the large data dump comes with no confirmation of amendments or exclusions, and is not even in a format from which I can start testing. Nevertheless, I dive headfirst into the data and begin the long and tedious task of normalising, formatting and sorting it.
Now, a few days if not weeks have passed, and I finally have a fair amount of valuable data to start the testing process. Having touched on data analytics at varsity and during my time at one of the big four, I confidently start testing the data to find possible red flags and exceptions. Unfortunately, because my data analytics skills have been sitting comfortably on the bench gathering dust over the past few years, I quickly realise that complex scripting is well out of my reach and settle for duplicates and fuzzy duplicates.
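Even that fallback is a real test. A fuzzy-duplicate check can be sketched in a few lines with Python’s standard library; the vendor names and the similarity threshold below are hypothetical, and a production script would be far more sophisticated, but the idea is the same:

```python
from difflib import SequenceMatcher

def find_fuzzy_duplicates(records, threshold=0.85):
    """Return pairs of records whose similarity ratio meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            # Case-insensitive comparison; ratio of 1.0 means an exact duplicate.
            ratio = SequenceMatcher(None, records[i].lower(),
                                    records[j].lower()).ratio()
            if ratio >= threshold:
                pairs.append((records[i], records[j], round(ratio, 2)))
    return pairs

# Hypothetical vendor master entries -- a classic audit test.
vendors = ["Acme Trading Ltd", "ACME Trading Ltd", "Acme Tradng Ltd", "Globex Corp"]
for a, b, score in find_fuzzy_duplicates(vendors):
    print(f"Possible duplicate: {a!r} vs {b!r} (similarity {score})")
```

Here the first two entries differ only in capitalisation and the third has a one-letter typo, so all three pair up as likely duplicates while the unrelated vendor does not.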
In the end, it has consumed much of my valuable time, energy and ultimately patience to get an unclear picture of what I should be auditing. From a business perspective, I lack independence and assurance, working with yet another sample of data and thereby limiting my overall coverage of the business.
Scenario 2: I, the auditor, don’t need to request anything from IT. I have a secure and direct connection to my server. I am able to log into a web portal, access an entire library of predefined scripts and run them whenever I need them. Some have even been scheduled to run on my behalf and send the exceptions directly to me or my team for investigation.
Because I have a server, my analytics are able to run against every transaction in the business 365 days a year. Regardless of my disparate database and software environment or the availability of technical resources, I am testing 100% of my data, all of the time. Rest assured, my server is grinding through billions of lines of data while I sleep, communicating only when it finds an exception or red flag.
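The pattern described here is simple at its core: run every record through a set of rules and surface only the exceptions. A minimal sketch follows; the transaction fields, the three rules and their thresholds are all hypothetical stand-ins for whatever the script library actually contains:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Transaction:
    id: str
    amount: float
    approver: str
    requester: str
    posted: datetime

def rules(tx):
    """Hypothetical rule set; each rule yields a flag description."""
    if tx.amount > 100_000:
        yield "amount exceeds approval limit"
    if tx.approver == tx.requester:
        yield "self-approved transaction"
    if tx.posted.weekday() >= 5:  # Saturday or Sunday
        yield "posted on a weekend"

def run_analytics(transactions):
    """Test 100% of transactions and return only the exceptions."""
    exceptions = []
    for tx in transactions:
        for flag in rules(tx):
            exceptions.append((tx.id, flag))
    return exceptions
```

A scheduler (cron, or the analytics server’s own job engine) would run `run_analytics` overnight against the full transaction population and route the exception list to the audit team, which is exactly the ‘only hear about it when something is wrong’ experience described above.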
This automated process allows me to begin my workday in a peaceful way with a good cup of coffee (who doesn’t like to start their workday with a good cup of coffee?). I have not lost a wink of sleep worrying about what could have gone wrong during the numerous automated processes that our organisation runs overnight. I am fully rested and start the day with the peace of mind that I have 100% coverage, with 100% independence and assurance that what I am looking at is a true reflection of the business.
And there you were thinking BIG DATA was a problem.