I was recently working with a client who asked me to help them tackle their data quality problems and come up with a "very tactical plan" to address them. Like all consultants, I proposed starting with an assessment of what we were dealing with. Every problem has its particularities, but each one can be classified into a group of problems with similar root causes.

To start, we used Smartsheet, an unusually flexible online spreadsheet system with many collaboration features for crowdsourcing and manipulating lists. We set up a Smartsheet form to collect the data quality problems plaguing the organization.

Next, we classified all the issues by root cause, using Laura Sebastian-Coleman's book A Data Quality Assessment Framework:

In my client's case, most of the data quality issues were spread evenly across the top four root causes in the list.

Data Entry Problems

Interestingly, the remedies for each of these root causes are fairly standard. Take Data Entry Problems as an example.

The first opportunity for remediation is to analyze the source system's processes and fix the problem where it originates. In other cases, the problem can be fixed by implementing controls that prevent erroneous data from being captured in the first place.
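As an illustration of a preventive control, here is a minimal sketch of validation that runs before a record is saved. The customer form, its field names, and the rules (required fields, email shape, a 5-digit postal code, no future birth dates) are hypothetical assumptions chosen for the example, not rules from any particular system.

```python
import re
from datetime import date

# Hypothetical rules for a customer-entry form; field names and formats
# are illustrative assumptions, not taken from any real system.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
POSTAL_RE = re.compile(r"^\d{5}$")  # assumes US-style 5-digit ZIP codes
REQUIRED = ("name", "email", "birth_date", "postal_code")


def validate_customer(record: dict) -> list[str]:
    """Return error messages; an empty list means the record may be saved."""
    errors = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED:
        if not record.get(field):
            errors.append(f"{field} is required")
    # Format checks only run on fields that were actually filled in.
    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        errors.append("email is not a valid address")
    postal = record.get("postal_code")
    if postal and not POSTAL_RE.match(postal):
        errors.append("postal_code must be 5 digits")
    birth = record.get("birth_date")
    if birth:
        try:
            parsed = date.fromisoformat(birth)  # expects YYYY-MM-DD
        except ValueError:
            errors.append("birth_date must be YYYY-MM-DD")
        else:
            # Plausibility: a birth date cannot lie in the future.
            if parsed > date.today():
                errors.append("birth_date cannot be in the future")
    return errors


good = {"name": "Ana", "email": "ana@example.com",
        "birth_date": "1990-04-12", "postal_code": "02139"}
print(validate_customer(good))  # []
```

The point of rejecting the record at entry time, rather than cleansing it later, is that the person who knows the correct value is still at the keyboard.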

The next step is awareness: making users understand the downstream impact of these errors and the cost and effort of correcting them after the fact. It helps to show how the data is used downstream and to ask users for their input on improving processes.
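One way to show users how their data is used downstream is to trace which tables and reports depend on a field they enter. The sketch below assumes a hypothetical lineage map (in practice this might come from a data catalog); all asset names are made up for illustration. It walks the map breadth-first and lists every dependent asset, nearest first.

```python
from collections import deque

# Hypothetical lineage map: each asset lists the assets that consume it.
# Names are illustrative only.
LINEAGE = {
    "crm.customer.email": ["dw.dim_customer"],
    "dw.dim_customer": ["report.monthly_churn", "mart.marketing_campaigns"],
    "mart.marketing_campaigns": ["report.campaign_roi"],
}


def downstream(asset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every asset that depends on `asset`."""
    seen = set()
    order = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # skip assets reachable by more than one path
            continue
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


print(downstream("crm.customer.email", LINEAGE))
# ['dw.dim_customer', 'report.monthly_churn', 'mart.marketing_campaigns', 'report.campaign_roi']
```

A list like this turns "your errors cause problems downstream" into a concrete set of reports that the users themselves may rely on.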

In upcoming posts, I will write about the improvement opportunities for the other root causes.


©2020 by Modern Data Analytics