This post outlines the key principles of data quality: what data quality is and why you should care about it. In my experience, data quality gets much less attention than it deserves. Many of the challenges we face in monitoring and evaluation (M&E) are the result of poor-quality data, and many of these problems could have been addressed or prevented at the outset of a program or activity.
Rather than focusing on the amount of data we collect, I'd like to argue that we should focus more on obtaining the highest-quality data, even if that means fewer (but arguably better) data points.
Data quality issues can either prevent us from using data the way we intended or leave our analysis incomplete. As a result, we end up without a clear sense of how the program is doing and which areas need more of our attention.
In my experience, this is what our roles in M&E often feel like: we collect lots of data, and when things go awry (we find an error in data that's already been recorded, or a field hasn't been filled in consistently by those collecting data), we feel overwhelmed and unsure where to start, both in fixing these issues and in preventing them from happening again.
This post introduces the concept of data quality and offers a few tips for incorporating data quality principles into your work, which will hopefully ease some of that overwhelm.
So, what is data quality?
Data quality refers to the accuracy or worth of information collected. It is the ability of data to serve the purposes for which it was gathered.
There are a few key dimensions of data quality that I want to highlight:
Completeness: meaning there are no gaps between what was supposed to be collected and what was actually collected. In practice, this means everyone on your team is reporting a full set of data.
Consistency: everyone is collecting data in the same way. The goal here is for everyone who collects data to have the same understanding of what to fill in on a data collection form (e.g. for a survey, interview, etc.) and to fill out the form in the same way.
Accuracy: meaning the data recorded is correct and free of errors.
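To make completeness a bit more concrete, here's a minimal Python sketch that measures, for each field on a form, the share of records where it was actually filled in. The field names and records are hypothetical, and treating `None` or a blank string as "missing" is an assumption; your own forms may use other codes for missing values.

```python
# Minimal sketch: field-level completeness for a batch of form records.
# Field names and values are hypothetical; None or "" is treated as missing.

def is_missing(value):
    """Treat None and blank/whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(records, fields):
    """Return {field: fraction of records where the field is filled in}."""
    if not records:
        return {field: 0.0 for field in fields}
    rates = {}
    for field in fields:
        filled = sum(1 for record in records if not is_missing(record.get(field)))
        rates[field] = filled / len(records)
    return rates


records = [
    {"client_id": "C001", "age": 34, "village": "North"},
    {"client_id": "C002", "age": None, "village": "South"},
    {"client_id": "C003", "age": 27, "village": ""},
    {"client_id": "C004", "age": 41, "village": "South"},
]

print(completeness(records, ["client_id", "age", "village"]))
# {'client_id': 1.0, 'age': 0.75, 'village': 0.75}
```

Looking at completeness per field (rather than one overall number) points you straight to the questions that data collectors are skipping.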
What causes poor data quality?
Missing data: When fields are left blank on our data collection forms, we have a clear data quality problem. This can happen when the data collector does not have a clear understanding of what they are expected to fill in on the form. Most data collection involves complex forms that require the data collector to know when information should and should not be recorded (e.g. forms that use skip logic, where some questions only require a response depending on the interviewee's earlier answers). To prevent this, data collectors should always be trained before data collection begins.
Data entry errors: These occur when what is entered into the data system (whether an Excel spreadsheet or a more advanced online database) does not match what was on the form. In general, the more people who handle the data, the greater the likelihood of errors. For example, if the person filling in the form is different from the person who enters it into the data system, the chances of an error are greater.
Data collection errors: These can be due to a lack of understanding of the concepts included on the form. Another cause of data collection errors is the format of the form itself: if the form is difficult for the user to follow, there are likely to be errors in what they record.
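With skip logic, a blank field is only a problem if the question should have been asked. Once forms are entered, that distinction can be checked automatically. The sketch below is a minimal Python illustration; the question names and the two rules (ask `num_children` only if `has_children` is "yes", ask `reason_not_enrolled` only if `enrolled` is "no") are hypothetical examples, not from any particular form.

```python
# Minimal sketch: separating true missing data from questions that were
# legitimately skipped under the form's skip logic.
# Question names and rules are hypothetical examples.

# Each rule: (question, condition under which the question must be answered)
SKIP_RULES = [
    ("num_children", lambda r: r.get("has_children") == "yes"),
    ("reason_not_enrolled", lambda r: r.get("enrolled") == "no"),
]


def skip_logic_problems(record):
    """Return a list of (question, problem) pairs for one record.

    - 'missing': the question applied but was left blank.
    - 'unexpected': the question should have been skipped but has an answer.
    """
    problems = []
    for question, applies in SKIP_RULES:
        answered = record.get(question) not in (None, "")
        if applies(record) and not answered:
            problems.append((question, "missing"))
        elif not applies(record) and answered:
            problems.append((question, "unexpected"))
    return problems


record = {"has_children": "yes", "num_children": None,
          "enrolled": "yes", "reason_not_enrolled": "too far"}
print(skip_logic_problems(record))
# [('num_children', 'missing'), ('reason_not_enrolled', 'unexpected')]
```

Both kinds of problem are useful for training: "missing" answers suggest collectors skipped a required question, while "unexpected" answers suggest they didn't follow the skip pattern.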
When should we be concerned with data quality?
All the time! If that's not a sufficient answer for you, here's a helpful diagram for thinking through the steps in a data collection process where we need to monitor data quality and respond proactively to issues when they arise.
Management of data quality takes place throughout this entire cycle, which comprises five key stages: the data source, data collection, data collation, data analysis, and reporting and data use.
Let's walk through each stage of the cycle and look at examples of what data quality means at each step.
Step 1: Data Source – these are the people or communities who are the sources of the information we collect on our data forms. Often these are clients and community members. To promote high-quality data at this stage, we want to ensure that clients feel comfortable and at ease when sharing information with us. How we approach clients and how we ask questions in surveys, interviews, and focus group discussions both affect the quality of the data obtained at this stage.
Step 2: Data Collection – You can face data quality issues at the data collection stage if data collectors do not have a shared understanding of the concepts on the form or data collection tool. This can result in inconsistent reporting and errors. One thing to consider at this stage is whether additional documentation (e.g. a data collection guidance document) would help promote a shared understanding of what is to be collected.
Step 3: Data Collation – this is a fancy term for receiving and entering data into our data systems. Some considerations at this stage are:
What processes are in place for reviewing the accuracy and completeness of data collection forms before they are entered into your database?
What systems are in place within the database to alert you to any issues around data completeness and accuracy?
Finally, what do you do at this stage if data is found to be inaccurate or incomplete?
The majority of data quality issues should be identified and proactively resolved at this stage. You want to avoid entering inaccurate or incomplete data into your data systems, because this ultimately affects your ability to use the data effectively in later stages.
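The collation questions above (what reviews happen before forms reach the database, and what alerts flag completeness or accuracy problems) can be partly automated. Here's a minimal Python sketch that runs a few checks on each incoming form and returns a list of issues to follow up on, instead of silently entering the record. The required fields, the 0 to 110 age range, and the duplicate-ID rule are all assumptions for illustration; adapt them to your own forms.

```python
# Minimal sketch: checks to run on each form before it is entered into the database.
# REQUIRED_FIELDS and AGE_RANGE are hypothetical; adapt them to your own forms.

REQUIRED_FIELDS = ["client_id", "visit_date", "age"]
AGE_RANGE = (0, 110)


def validate_form(form, existing_ids):
    """Return a list of human-readable issues; an empty list means the form passed."""
    issues = []

    # Completeness: every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if form.get(field) in (None, ""):
            issues.append(f"{field} is blank")

    # Accuracy: age must parse as an integer within a plausible range.
    age = form.get("age")
    if age not in (None, ""):
        try:
            age_value = int(age)
        except (TypeError, ValueError):
            issues.append(f"age {age!r} is not a valid number")
        else:
            low, high = AGE_RANGE
            if not low <= age_value <= high:
                issues.append(f"age {age_value} is outside {low}-{high}")

    # Duplicates: the same client ID should not be entered twice.
    if form.get("client_id") in existing_ids:
        issues.append(f"client_id {form['client_id']} already exists")

    return issues


existing = {"C001", "C002"}
print(validate_form({"client_id": "C002", "visit_date": "", "age": "230"}, existing))
# ['visit_date is blank', 'age 230 is outside 0-110', 'client_id C002 already exists']
```

Returning a list of issues (rather than stopping at the first problem) means the person doing data entry can send one complete query back to the data collector.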
Step 4: Data Analysis – here we come back to the concept of building a shared understanding. To do analysis well, your team needs a shared understanding of how the analysis is done. Ask yourself and your team:
Do all staff who are analyzing data have a shared understanding of what you are analyzing?
What processes are in place for double checking the accuracy of data included in analysis?
Although most errors should be caught at the previous stage, it is always a good idea for the person analyzing the data to double-check that what they are compiling and calculating does not contain major gaps or incorrect information.
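One quick way to check for major gaps before analysis is to confirm that every site reported for every period you are analyzing. The Python sketch below assumes monthly reporting and uses hypothetical site names and months; it simply lists the expected site/month combinations that have no report.

```python
# Minimal sketch: find reporting gaps before analysis.
# Assumes each report is a (site, month) pair; site names and months are hypothetical.

from itertools import product


def reporting_gaps(reports, sites, months):
    """Return the (site, month) pairs that were expected but have no report, in order."""
    received = set(reports)
    return [pair for pair in product(sites, months) if pair not in received]


sites = ["Site A", "Site B", "Site C"]
months = ["2023-01", "2023-02", "2023-03"]
reports = [
    ("Site A", "2023-01"), ("Site A", "2023-02"), ("Site A", "2023-03"),
    ("Site B", "2023-01"), ("Site B", "2023-03"),
    ("Site C", "2023-01"), ("Site C", "2023-02"), ("Site C", "2023-03"),
]

print(reporting_gaps(reports, sites, months))
# [('Site B', '2023-02')]
```

A gap like this matters for analysis: a total for February that silently omits Site B will look like a drop in activity rather than a missing report.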
Step 5: Reporting and Use – In terms of data quality, we want to consider how data quality issues are communicated to others when data is shared. Sometimes errors are identified that you can't correct before reporting takes place. When this happens, we have a responsibility to communicate those issues to the people we share the data with.
Finally, ask yourself at this stage:
When compiling a report, do we check for accuracy and logical consistency of the figures generated?
And, will the person who receives this report be able to make sense of it?
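Checking the logical consistency of report figures often comes down to a few simple rules: a subset can't be larger than the group it comes from, disaggregated figures should add up to the total, and counts can't be negative. Here's a minimal Python sketch of those rules; the indicator names (`tested`, `tested_positive`, and so on) are hypothetical examples, not a standard set.

```python
# Minimal sketch: logical-consistency checks on the figures in a report.
# Indicator names and rules are hypothetical examples.

def consistency_issues(figures):
    """Return a list of rules the report's figures break."""
    issues = []

    # A subset can never be larger than the group it comes from.
    if figures["tested_positive"] > figures["tested"]:
        issues.append("more people tested positive than were tested")

    # Disaggregated figures should add up to the reported total.
    by_sex = figures["tested_female"] + figures["tested_male"]
    if by_sex != figures["tested"]:
        issues.append(f"female + male ({by_sex}) does not equal total tested ({figures['tested']})")

    # Counts should never be negative.
    for name, value in figures.items():
        if value < 0:
            issues.append(f"{name} is negative ({value})")

    return issues


report = {"tested": 120, "tested_female": 70, "tested_male": 45, "tested_positive": 8}
print(consistency_issues(report))
# ['female + male (115) does not equal total tested (120)']
```

If a check like this flags something you can't fix before the report goes out, that's exactly the kind of issue worth noting for the people who receive the report.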
Hopefully, you can now see how critical data quality is at each step of the data management cycle.
Where can I find additional resources on data quality?
Introduction to Data Quality Management, Pact Guide
Free online course in Data Quality from the Global Health Learning Center
Data Quality Resource page from MEASURE Evaluation