Understand what data quality means and how data quality management can solve your bad data woes
How is your data?
If that very question makes you cringe, then you’ve come to the right place.
No one wakes up in the morning thinking “Yay, I get to work with bad quality data today!” Sadly, that’s the way things are in many organizations.
But it doesn’t have to be that way, not if you want to make truly data-driven decisions.
- How can you understand your data better?
- How can you make it accurate, complete, reliable?
- Are there any measures or checks for ensuring data quality?
- Despite setting up quality checks, you still have bad data in your systems. Is there a way to fix it?
- Lastly, is there a way to improve data quality and ensure that the quality doesn’t go downhill again? (hint: it’s called data quality management)
If you’ve been asking yourself questions like these, then good job! It shows that you’re aware of your data problems and are actively looking for help. That’s the first step. #psychology101
After awareness, the next step is to find a solution. So read on as we answer all those questions (and more) on data quality, starting with what data quality actually means.
What is data quality?
Data quality is the ability of your data to serve its intended purpose based on seven distinct characteristics. (If you’re already familiar with data quality, feel free to jump ahead to its characteristics.)
Before exploring these characteristics, let’s understand the concept of data quality better.
Defining data quality
A quick online search will give you countless definitions. After giving it some thought, here’s how we define data quality and high quality data:
Data quality is the answer to the question “How is my data?” If your data helps you with business operations and decisions, then you can say that your data is of good quality.
BTW, of all the sources online, the definition we like best comes from Thomas C. Redman, “Data Doc” and author of the book …:
Data is generally considered high quality if it is “fit for [its] intended uses in operations, decision making and planning”.
Defining data quality management (DQM)
And the process that you adopt to improve and ensure data quality at all times is called data quality management (DQM).
Now you might wonder, a process? That’s because data quality can’t be a one-time activity—purging a few rows of bad data or adding a glossary with a few key terms. Data quality needs consistent care and attention. DQM is simply the practice of focusing on and consistently improving the quality of your data.
One of the most important parts of DQM is to understand what quality data looks like. So let’s look at the characteristics of data quality.
What are the characteristics of data quality?
There are seven factors that play a huge role in determining data quality.
- Accuracy: Is your data correct, precise, error-free? Without accuracy, your data is misleading and useless.
- Availability: Is the right data available to the right people within your organization? Data has to be available and accessible for the humans of data to do their jobs.
- Completeness: Is your data incomplete? Is some information missing? Incomplete data leads to gaps in information, making it harder to put data to use.
- Granularity: What’s the level of detail that your data can provide? The right degree of granularity in data is necessary for accurate and effective decision-making.
- Relevance: Do you know whether you really need the information that you’ve collected? What’s the purpose of the data you’re storing? Irrelevant information just ends up wasting your time, effort and money.
- Reliability: Is your data ambiguous or vague, or does it contain contradictory information? In all such cases, the information you have is unreliable and you cannot trust your data.
- Timeliness: Is your data outdated or obsolete? Data collected at the right time is an important measure of data quality. Relying on data that isn’t timely is misleading and can lead to inaccurate decisions.
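Some of these characteristics can be turned into simple, measurable scores. The sketch below is a minimal illustration, assuming records arrive as Python dictionaries with hypothetical `email`, `country` and `updated_at` fields. It scores completeness (share of required values present), timeliness (share of records updated within an assumed 365-day window) and a crude format-validity rule, used here only as a rough stand-in for accuracy. The fields, window and rule are assumptions for illustration, not a standard.

```python
from datetime import datetime, timedelta

# Hypothetical sample records; field names are illustrative assumptions.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "country": "US", "updated_at": "2024-05-01"},
    {"id": 2, "email": None, "country": "DE", "updated_at": "2021-01-15"},
    {"id": 3, "email": "li@example", "country": "", "updated_at": "2024-04-20"},
]

# Assumed set of fields that every record must fill in.
REQUIRED_FIELDS = ("email", "country")


def completeness(records, fields=REQUIRED_FIELDS):
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total if total else 1.0


def timeliness(records, as_of, max_age_days=365):
    """Share of records updated within max_age_days of the reference date."""
    cutoff = as_of - timedelta(days=max_age_days)
    fresh = sum(1 for r in records if datetime.fromisoformat(r["updated_at"]) >= cutoff)
    return fresh / len(records) if records else 1.0


def validity(records, field="email"):
    """Crude proxy: share of present values that pass a simple format rule."""
    values = [r[field] for r in records if r.get(field)]
    ok = sum(1 for v in values if "@" in v and "." in v.split("@")[-1])
    return ok / len(values) if values else 1.0


if __name__ == "__main__":
    as_of = datetime(2024, 6, 1)
    print(f"completeness={completeness(RECORDS):.2f}")  # completeness=0.67
    print(f"timeliness={timeliness(RECORDS, as_of):.2f}")  # timeliness=0.67
    print(f"validity={validity(RECORDS):.2f}")  # validity=0.50
```

In practice you would choose the fields, windows and rules per dataset; the point is that each characteristic becomes a number you can track over time.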
Now let’s revisit the definition of data quality to make it sound more complete:
If your data is accurate, available and accessible, complete, relevant, reliable, timely, provides the right degree of granularity and helps you with business decision-making, then your data is of good quality.
I know, that’s a lot to ask from your data. But then that’s how important it is to have high-quality data.
Still skeptical about its importance? Then let’s slay your doubts once and for all.
Why is data quality so important?
When your data is poor, incorrect, incomplete and unreliable, the consequences can be quite damaging for your business.
Think back to when you spent two weeks working on a report for sales showing business deals won and lost.
On day 1, everything was hunky-dory… birds chirping, sun shining down on you and your Excel.
But by day 5, the weather had changed to cloudy with a chance of data errors.
Come day 13, you realized that the data was not even reliable, something you had no way of knowing, since you could see neither the source nor the changes made to it before it reached you.
After all, data that comes to you as an isolated Excel file will never give you the complete context you need to understand the quality of data.
The result? The sales funnel never got tweaked and the numbers didn’t improve, at least not within the time frame that you’d initially planned.
The problem with bad data
And you’re not alone in this. See what others have to say about the toll that bad data exacts from businesses.
The cost of bad data is 15% to 25% of revenue for most companies. (HBR)
Knowledge workers waste up to 50% of their time dealing with mundane data quality issues. For data scientists, this number may go as high as 80%. (Sloan Management Review)
Still unconvinced about the impact of bad data? Here’s a $3.1 trillion reason for you.
In 2016, the yearly cost of poor-quality data in the US alone was $3.1 trillion. (IBM)
Bad data + deadlines = chaos & mismanagement
Dealing with erroneous data and misleading information when you’re facing tight deadlines can be exhausting and hardly solves the root problem.
In such cases, you’re most likely to make corrections by yourself using your best guesses so that you meet your deadlines. You’re less likely to look for the person responsible for creating/collecting the wrong data and report the issue.
So instead of fixing the problem once and for all, you’ll just keep applying temporary fixes, which saves neither time nor effort in the long run.
Redman summarizes the problem with bad data and its impact on the humans of data in the best possible manner in his HBR article:
Salespeople waste time dealing with erred prospect data; service delivery people waste time correcting flawed customer orders received from sales. Data scientists spend an inordinate amount of time cleaning data; IT expends enormous effort lining up systems that “don’t talk.” Senior executives hedge their plans because they don’t trust the numbers from finance. (Thomas C. Redman)
That’s why data quality matters, and it brings us to the next question: how do you solve the bad data problem?
How to ensure data quality and maintain high quality data?
Resolving data quality issues requires a multifaceted approach that involves people, governance, processes and technologies as key factors. Data and analytics leaders should build a comprehensive data quality operating model including these factors to foster data quality assurance. (Gartner)
But first things first, before fixing a problem, you need to know why it exists in the first place. What’s causing it?
1. Start at the source
When you come down with something, does your doctor only treat your symptoms, or does she look for what’s causing them?
Well, if she’s a good doctor, she would go with the latter, maybe even ask you to run some tests to get to the bottom of the issue.
Ensuring data quality works the same way. Whenever you realize that the data you have is of poor quality, you should spend some time finding out:
- How was the data brought into your organization’s data repositories?
- What was the purpose?
- Who was the creator/owner of that data?
- Who has access? Why?
- Who has made changes/revisions to said data?
- Where has the data been used? (so that once you fix your problem, everyone who has used it for reporting or decision-making gets notified)
While this might sound like going down a deep, dark rabbit hole, it really isn’t. 🐰
To fix your bad data problems, you must start at the source (aka root cause analysis). It may take you longer, but the end result is worth it.
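Answering “Where did this data come from?” and “Where has it been used?” boils down to walking a lineage graph. Here’s a minimal sketch, assuming lineage is available as a simple mapping from each asset to the assets built directly from it (the asset names are hypothetical, not from any real catalog). `upstream` walks the edges in reverse to find the root sources worth investigating, and `downstream` lists every asset whose owners should be notified once the fix lands.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE = {
    "crm.raw_deals": ["staging.deals"],
    "staging.deals": ["marts.sales_funnel", "marts.win_loss"],
    "marts.sales_funnel": ["dash.weekly_sales"],
    "marts.win_loss": [],
    "dash.weekly_sales": [],
}


def downstream(graph, start):
    """Breadth-first search for every asset that depends on `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph, target):
    """Reverse the edges and return the root sources that feed `target`."""
    parents = {}
    for node, children in graph.items():
        for child in children:
            parents.setdefault(child, []).append(node)
    roots, stack, seen = set(), [target], {target}
    while stack:
        node = stack.pop()
        node_parents = parents.get(node, [])
        # A node with no parents (other than the target itself) is a root source.
        if not node_parents and node != target:
            roots.add(node)
        for parent in node_parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return roots


if __name__ == "__main__":
    print(upstream(LINEAGE, "dash.weekly_sales"))  # {'crm.raw_deals'}
    print(sorted(downstream(LINEAGE, "staging.deals")))
```

Starting from a broken dashboard, you trace back to the root source, fix it there, and then use the downstream set as your notification list.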
See how Atlan can help you fix data quality problems at the source with end-to-end lineage and cataloging. Take Atlan for a test drive here and let us know what you think!
2. Use data quality tools
Once you’ve figured out the reason behind your bad data problem, the next step is to fix it. You can do it manually, but that sounds tedious, time-consuming and complex, doesn’t it?
Good news is there are plenty of data quality tools available that can come to your aid. 🚀
Data quality tools are the processes and technologies for identifying, understanding and correcting flaws in data that support effective information governance across operational business processes and decision making. The packaged tools available include a range of critical functions, such as profiling, parsing, standardization, cleansing, matching, enrichment and monitoring. (Gartner)
In other words, data quality tools help you implement DQM within your organization. Since these tools play such an important role, you must ensure that they solve all your problems, from finding the problem to fixing it.
Let’s see how by looking at some of these functions.
Data cleaning (also known as cleansing) is the process of removing incorrect or duplicate entries while fixing any dubious entries and missing data. The data quality tool should help you detect and fix such entries.
A platform like Atlan helps you do just that at a glance with auto-generated quality checks (frequency distribution graphs, mean and median calculations or maximum and minimum values).
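To make the cleaning step concrete, here is a minimal sketch in plain Python. It is not how Atlan or any other tool implements it. It assumes hypothetical deal rows, keeps the first row per `deal_id`, skips rows with a missing `amount` (just one possible policy; flagging them for review may suit you better) and normalizes the casing of `stage`. It then computes the kinds of checks mentioned above: a frequency distribution plus the mean, median, minimum and maximum.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical raw rows: one duplicate, one missing amount, inconsistent casing.
RAW_DEALS = [
    {"deal_id": "D1", "stage": "won", "amount": 1200.0},
    {"deal_id": "D1", "stage": "won", "amount": 1200.0},
    {"deal_id": "D2", "stage": " Lost", "amount": 800.0},
    {"deal_id": "D3", "stage": "won", "amount": None},
    {"deal_id": "D4", "stage": "LOST", "amount": 450.0},
]


def clean(rows):
    """Keep the first row per deal_id, skip rows missing an amount, normalize stage."""
    seen, cleaned = set(), []
    for row in rows:
        if row["deal_id"] in seen or row["amount"] is None:
            continue
        seen.add(row["deal_id"])
        cleaned.append({**row, "stage": row["stage"].strip().lower()})
    return cleaned


def quality_summary(rows):
    """Frequency distribution of stage plus basic statistics on amount.

    Assumes at least one row; statistics.mean raises on an empty list.
    """
    amounts = [row["amount"] for row in rows]
    return {
        "stage_counts": dict(Counter(row["stage"] for row in rows)),
        "mean": mean(amounts),
        "median": median(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }


if __name__ == "__main__":
    deals = clean(RAW_DEALS)
    print([d["deal_id"] for d in deals])  # ['D1', 'D2', 'D4']
    print(quality_summary(deals))
```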
Another function key to ensuring data quality is data standardization, which keeps your data consistent: values of the same type follow the same content rules and format.
Data standardization gets a lot easier with an auto-generated data dictionary. For more on this, check out our article on
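As an illustration of standardization, the sketch below coerces dates to ISO 8601 and free-text country values to two-letter codes. The accepted input formats and the country mapping are assumptions made up for this example; note in particular that it reads `05/03/2024` as day/month, which your sources may not. Anything it cannot map raises an error instead of being silently guessed.

```python
from datetime import datetime

# Assumed input date formats seen across source systems (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")

# Hypothetical mapping from free-text country values to ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {
    "united states": "US",
    "usa": "US",
    "us": "US",
    "germany": "DE",
    "de": "DE",
}


def standardize_date(value):
    """Return the date as YYYY-MM-DD, trying each known input format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")


def standardize_country(value):
    """Map a free-text country name to its two-letter code."""
    key = value.strip().lower()
    if key not in COUNTRY_CODES:
        raise ValueError(f"Unknown country value: {value!r}")
    return COUNTRY_CODES[key]


if __name__ == "__main__":
    print(standardize_date("05/03/2024"))  # 2024-03-05
    print(standardize_date("Mar 05, 2024"))  # 2024-03-05
    print(standardize_country(" USA "))  # US
```

Raising on unknown values is a deliberate choice: it pushes the fix back to the source or the mapping table rather than letting inconsistent values slip through.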
Yet another function is data profiling, which provides you with information about your data (metadata + business context). A data catalog—complete with tags, descriptions, READMEs and a business glossary—easily takes care of this function.
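Much of profiling is computing metadata that you can then document in the catalog. A minimal sketch, assuming rows are dictionaries with hashable scalar values and hypothetical column names: for each column it reports the Python types observed, how many values are missing, how many are distinct and the most common value. A column with mixed types (like `revenue` below) is a typical red flag that a profile surfaces.

```python
from collections import Counter

# Hypothetical rows; column names are illustrative assumptions.
ROWS = [
    {"id": 1, "region": "EMEA", "revenue": 10.5},
    {"id": 2, "region": "", "revenue": 7},
    {"id": 3, "region": "EMEA"},
]


def profile(rows):
    """Summarize each column: observed types, missing count, distinct count, mode."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return report


if __name__ == "__main__":
    for column, stats in profile(ROWS).items():
        print(column, stats)
```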
For more on data catalogs, read our super-handy article here.
And those are just a few functions. Your data quality tool should be an end-to-end offering that takes care of all those functions, and more—everything you need to ensure the quality of your data.
See how Atlan is an end-to-end offering for complete data quality management. Sign up for a demo here and bid your bad data problems adieu!
3. Follow best practices for data quality
While data quality tools can help you fix your bad data problem, they’re not enough. Without proper processes and quality checks in place, you’ll undoubtedly run into more data quality issues soon enough.
But before that frown on your brow deepens, we have good news! 🎉
We’ve put together six best practices for data quality that will solve your problems once and for all. They also serve as a quick recap of everything we’ve discussed so far. So without further ado, here you go:
- Educate everyone within your organization on data quality. Everyone has a role to play when it comes to better data quality. Get buy-in from management.
- Make data quality a part of your data governance framework, define Quality Assurance (QA) metrics and perform regular QA audits.
- Appoint roles such as data owners, data stewards and data custodians within your organization and establish proper processes to ensure high data quality.
- Investigate quality problems at the source, just like we’ve mentioned above.
- Establish a single source of truth (SSOT) for all your data.
- Automate workflows, especially the ones for data entry and ETL/ELT as they’re responsible for ingesting, transforming and organizing data for further use.
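Defining QA metrics and automating ETL/ELT workflows work well together: checks can run as a gate before a batch is loaded. The sketch below assumes hypothetical `id` and `email` columns, thresholds you would set yourself and a plain list standing in for the target table; it is not tied to any particular orchestration or data quality tool. A batch that misses any threshold is rejected with the names of the failed checks, so the problem surfaces at ingestion instead of in a report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class QualityCheck:
    """A named QA metric and the minimum score (0 to 1) a batch must reach."""
    name: str
    metric: Callable[[List[Row]], float]
    threshold: float


def non_null_ratio(column: str) -> Callable[[List[Row]], float]:
    """Build a metric: share of rows where `column` is not None."""
    def metric(rows: List[Row]) -> float:
        if not rows:
            return 0.0
        return sum(1 for r in rows if r.get(column) is not None) / len(rows)
    return metric


def unique_ratio(column: str) -> Callable[[List[Row]], float]:
    """Build a metric: distinct values of `column` divided by row count."""
    def metric(rows: List[Row]) -> float:
        values = [r.get(column) for r in rows]
        return len(set(values)) / len(values) if values else 0.0
    return metric


class QualityGateError(Exception):
    """Raised when a batch fails one or more checks and must not be loaded."""


def quality_gate(rows: List[Row], checks: List[QualityCheck]) -> Dict[str, float]:
    """Score every check; raise if any score falls below its threshold."""
    scores = {c.name: c.metric(rows) for c in checks}
    failed = [c.name for c in checks if scores[c.name] < c.threshold]
    if failed:
        raise QualityGateError(f"Batch rejected, failed checks: {failed}")
    return scores


def load(rows: List[Row], checks: List[QualityCheck], sink: List[Row]) -> None:
    """Hypothetical load step: append rows to the sink only if the gate passes."""
    quality_gate(rows, checks)
    sink.extend(rows)


CHECKS = [
    QualityCheck("email_not_null", non_null_ratio("email"), 0.95),
    QualityCheck("id_unique", unique_ratio("id"), 1.0),
]
```

For example, `load([{"id": 1, "email": None}, {"id": 1, "email": "c@x.io"}], CHECKS, table)` raises `QualityGateError` and leaves `table` untouched, because both checks fail.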
4. Get started with data quality management today
By now you already know that you need to manage data quality at every step—from the moment you ingest data into your systems to building reports for business operations.
With Atlan, you can also:
1. Create a single source of truth
Say bye-bye to inconsistencies in data by creating a single source of truth across all applications, complete with end-to-end lineage and stewardship workflows.
Oh, and if you’ve been wishing for a way to share data as easily as a Google Doc file, then your wish has been granted. Atlan allows you to share all the data you need with the right people using a single URL.
2. Set up granular data governance
Manage data usage, adoption
Get behavior-based insights, data stewardship, reporting, dynamic policy management and more—all under one roof.
Guess what that means? Fines avoided, data quality and integrity ensured and security crisis averted!
3. Receive automatic updates and alerts
If you’re one of those who check their emails every second to keep an eye on everything, we’ve got you covered.
We’ve built a data news feed just for folks like you. Get alerts and notifications on everything, anytime you want.
Sounds too good to be true?
We get it: working with data has never been easy. But it’s possible to bring order to the data chaos with Atlan. So why don’t you take us for a spin?
Schedule a demo with us here and embark on your data quality management journey.