Discover how Atlan, a modern data catalog, helps in ensuring data quality

Does this sound familiar?

The Head of Sales asks for a report of gross sales in Q2 2019.

Three months later, he asks for the same report and gets completely different numbers. 

Why is this happening? Is the new report incorrect? Or was it the original report? Or … brace yourselves … is the source itself corrupt?

In any case, now it’s impossible to trust the data because of all the inconsistencies (aka data integrity issues) and uncertainties.

Because it’s poor quality data.

And also because, from the looks of it, there’s no effective process (aka data quality management 101) in place for managing and ensuring data quality. 

That’s why it’s so important to care about and ensure data quality. 

Ensuring data quality—is it possible?

A tough question to answer. And an even tougher challenge to solve.

A Reddit thread on data quality
A Reddit thread on data governance and quality issues

Pro tip: For more on bad data, check out our article on the cost of bad data here.

Now, it may not be possible to have “superb data quality”, certainly not overnight or by wishing it all away. But you can improve the current state of your data and make life easier for everyone in your organization. With Atlan.

Sales pitch alert!

Hold your horses, we’re going to show you how a platform like Atlan helps in ensuring data quality.

You can certainly go about it all manually and we definitely admire you for it. After all, you’re hardcore!

But it doesn’t have to be that complex. Ensuring and improving data quality can be simpler and less chaotic. See how Atlan can help.

The curious case of inconsistent sales reports (ft. Atlan)

Let’s go back to our case of inconsistent reporting on gross sales for Q2 2019.

The sales head could go back to the data team and demand an explanation. The data team would then scramble to pinpoint the exact reason why. 

Meanwhile the sales head just spends more days not knowing how to proceed with business acquisition for the upcoming quarter.

Or, he could investigate the root cause of his data quality problems and find answers to his questions within minutes. With Atlan. Here’s how.

1. Basic quality checks with Atlan’s data catalog

First things first.

Is the data set verified? Or is it a data set that hasn’t been completely cleaned and checked for errors?

With the Status tags on Atlan’s data catalog, this is one of the easiest checks to do. 

How status tags work for data sets inside Atlan

At a glance, you also know who owns the data set and when the last update was made. All of this is handy information when you’re investigating the root cause of inconsistencies in data.

Oh, while you’re at it, you can also check if the metadata is proper and whether the business glossary (or data glossary) is complete and makes sense.

If everything checks out, then the problem is with the data itself. Wouldn’t it be nice to be able to spot if there’s an error in your data instantly?

2. Automatic data checks with Atlan’s auto-generated data dictionary

As soon as your data is in Atlan’s catalog, you can auto-generate a data dictionary for your data. The dictionary lets you detect anomalies at a glance with first-level data checks such as frequency distribution graphs, minimum, maximum, unique and missing values.

Pro tip: Curious to learn more about data dictionaries and understand the nitty-gritty of business glossaries? We’ve got you covered. Check out our article on data dictionary covering the concept, examples, data dictionary best practices and more.

The auto-generated data dictionary inside Atlan's data catalog

How would this help? 

Let’s say the total sales from a region is $500,000. Then the sales from any individual city in that region cannot exceed $500,000.

One look at the first-level checks on Atlan’s data dictionary tells you that this condition is violated if the maximum value in the total sales column is, let’s say, $700,000.

What’s more? From the dictionary, you can also see how many entries within your data set have that value. In our case, let’s say the sales head finds 30 such entries.
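To make the idea concrete, here is a minimal Python sketch of the kind of first-level checks described above: minimum, maximum, unique and missing values, plus a frequency count. It is only an illustration, not how Atlan computes its data dictionary. The function name `profile_column` and the sample figures are assumptions made up for this example; only the $500,000 regional total and the $700,000 outlier come from the scenario.

```python
from collections import Counter

def profile_column(values):
    """Compute first-level checks for one column of values."""
    # None stands in for a missing value in this sketch.
    present = [v for v in values if v is not None]
    frequencies = Counter(present)
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "unique": len(frequencies),
        "missing": len(values) - len(present),
        "frequencies": frequencies,
    }

# Hypothetical city-level sales figures for one region.
city_sales = [120_000, 700_000, 95_000, None, 700_000, 310_000]
REGION_TOTAL = 500_000  # regional total from the scenario

profile = profile_column(city_sales)

# A city cannot sell more than its whole region, so this flags a problem.
max_exceeds_region = profile["max"] > REGION_TOTAL

# How many entries carry the suspicious maximum value?
rows_at_max = profile["frequencies"][profile["max"]]

print(profile["min"], profile["max"], profile["unique"], profile["missing"])
print(max_exceeds_region, rows_at_max)
```

The maximum alone reveals that something is off, and the frequency count says how many entries share that value.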

<Spidey senses going on high alert>

Now you know there’s something wrong with the total sales value. But how do we find out which rows are affected?

3. No-code querying for insights with Atlan’s visual SQL editor

All roads lead to SQL.

You can write a few lines of SQL to find out which are the 30 entries with a total sales value of $700,000.
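For the curious, such a query could look roughly like the sketch below, run here against an in-memory SQLite database from Python so it is easy to try out. The table name `city_sales`, its columns and the sample rows are all hypothetical, and the example uses a handful of rows instead of the 30 from the scenario.

```python
import sqlite3

# Hypothetical table, columns and figures, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_sales (city TEXT, region TEXT, total_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO city_sales VALUES (?, ?, ?)",
    [
        ("City A", "North", 120_000),
        ("City B", "North", 700_000),
        ("City C", "North", 95_000),
        ("City D", "North", 700_000),
    ],
)

# Select every row whose total sales value equals the suspicious maximum.
suspect_rows = conn.execute(
    "SELECT city, region, total_sales FROM city_sales "
    "WHERE total_sales = ? ORDER BY city",
    (700_000,),
).fetchall()

for row in suspect_rows:
    print(row)

conn.close()
```

Passing the value as a parameter (the `?` placeholder) rather than pasting it into the SQL string keeps the query reusable for other thresholds.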

<But I don’t know SQL!>

That’s alright, you don’t need to. Use Atlan’s visual SQL editor—as easy as setting up filters in Excel—and sit back while the magic happens.

The visual SQL editor inside Atlan's data catalog

You can run queries within a matter of seconds on your browser and instantly get all the information you need. Without depending on IT. 🎉

After running the query, our sales head would have all the 30 inconsistent rows. What happens next?

4. Investigate further with Atlan’s data lineage explorer

Why do you even need data lineage?

To find the root cause. It’s the most obvious and yet the most complex task of all, but it has to be done. Otherwise, you’ll only end up putting out temporary fires.

Now there are two things to investigate:

  1. Is this a one-time thing or a recurring error?
  2. Where is the data coming from? (to check whether the error is in one of the workflows or the source itself)

To know if there are more such instances, compare the data you have with other data sets that also report gross sales.

If you’re picturing yourself digging through trillions of reports, stop right there.

It’s way easier than whatever nightmarish scenario you’re picturing. And it also helps you figure out the origins of the data—feeding two birds with one scone!

What’s this miraculous solution, you ask? Two words: data lineage. And Atlan has a data lineage explorer built in precisely for situations such as these.

Inside Atlan's data lineage explorer

Data lineage with Atlan

Atlan’s data lineage explorer lets you see:

  1. All the reports that get data from the same source as yours
  2. Workflows (ETL/ELT) that helped populate the data in the gross sales report
  3. The sources (Amazon S3, Azure Data Lake, Salesforce or Excel, among others) that the ETL/ELT workflows use

Tracing the origins and usage of the data you have will help you pinpoint where it all started going wrong—be it the workflows transforming source data or the source data itself. Root cause analysis simplified!
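Conceptually, lineage is a graph you can walk in both directions: upstream to find where a report’s data comes from, and downstream to find which other reports share a source. The Python sketch below illustrates that idea only. The asset names and the `UPSTREAM` mapping are hypothetical, and this is not Atlan’s lineage API.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it reads from.
UPSTREAM = {
    "gross_sales_report": ["sales_etl"],
    "regional_dashboard": ["sales_etl"],
    "sales_etl": ["salesforce_export", "s3_orders"],
    "salesforce_export": [],
    "s3_orders": [],
}

def trace_upstream(asset, upstream=UPSTREAM):
    """Breadth-first walk from an asset back to everything it depends on."""
    seen = {asset}
    queue = deque([asset])
    order = []
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order

def downstream_of(source, upstream=UPSTREAM):
    """List assets that depend, directly or indirectly, on the given source."""
    return sorted(
        asset
        for asset in upstream
        if asset != source and source in trace_upstream(asset, upstream)
    )

# Where does the gross sales report get its data from?
ancestors = trace_upstream("gross_sales_report")

# Which reports would be affected if the ETL workflow is the culprit?
affected = downstream_of("sales_etl")

print(ancestors)
print(affected)
```

If the same error shows up in every asset downstream of one workflow, that workflow (or its source) is the likely place to look.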

If it’s one of the workflows, then you can look into the transformations and see what’s causing the error and rectify it. 

On the other hand, if it’s the source itself, then you can create a new workflow with a more accurate, updated and error-free source.

Also, since Atlan’s data catalog helps you set up a single source of truth for all your data, fixing the root cause fixes the problem once and for all.

Pro tip: Curious about data catalogs? Then read our article on the topic here.

Now what? Like we mention in our article on data quality management, managing data quality isn’t a one-time thing. It’s a continuous process. Which leads us to the next step.

5. Set up quality checks with Atlan

Since we know the sales from an individual city cannot exceed $500,000, adding that as a quality check makes sure you never have to deal with outliers (like $700,000 sales value from 30 cities).

This brings you one step closer to working with high-quality data.

The quality checks inside Atlan's data governance platform

You can set up these checks using good old SQL. Or go a step further and create custom data quality checks using R or Python scripts. 
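As a rough idea of what a custom check in Python might look like, the sketch below applies the $500,000 rule from our example to a few rows and separates the ones that pass from the ones that fail. The function name `check_city_sales`, the row layout and the decision to treat missing or negative values as failures are all assumptions made for this illustration. How such a script plugs into Atlan is not shown here.

```python
MAX_CITY_SALES = 500_000  # rule taken from the running example

def check_city_sales(rows, limit=MAX_CITY_SALES):
    """Split rows into those that pass the sales limit and those that fail."""
    passed, failed = [], []
    for row in rows:
        value = row.get("total_sales")
        # Assumption: missing or negative values count as failures too.
        if value is None or value < 0 or value > limit:
            failed.append(row)
        else:
            passed.append(row)
    return passed, failed

# Hypothetical rows, made up for illustration.
rows = [
    {"city": "City A", "total_sales": 120_000},
    {"city": "City B", "total_sales": 700_000},
    {"city": "City C", "total_sales": None},
]

passed, failed = check_city_sales(rows)

print([row["city"] for row in passed])
print([row["city"] for row in failed])
```

Returning the failed rows, instead of just a yes or no, makes it easier to report exactly which entries need attention.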

Once you’ve put the quality checks in place, automate those workflows and let Atlan weed out bad data right at the source. No more hiccups!

6. Collaborate and communicate better

While setting up processes helps in ensuring data quality, it doesn’t guarantee it. Not unless you involve the right people. 

Now before you picture yourself looking up several people and sending them emails, see how Atlan makes it simpler.

Discussions inside Atlan's data catalog

With Discussions, add details of everything you did, from the moment you spotted the error to the checks you’ve put in place for all future reports.

Once you’re done, tag the owner of the data set and also others who use this data set for their reports, and everyone gets a notification.

You can also set up a workflow using Atlan Projects to send emails to all these people, with specifics of the changes you made, error logs and the URL of the new, error-free data set.

A simple workflow to send emails

And… scene!

Final word

Now comes the sales pitch you’ve been waiting for! 

With Atlan, any business user (case in point, our dear sales head with the inconsistent sales report) can check, improve and ensure data quality. From detecting anomalies and writing queries to tracing data lineage and implementing quality checks, the possibilities are endless.

Self-service analytics? Check!

Democratize data? Check!

Empower business users? Also check!

So why don’t you sign up for a demo and see Atlan in action for yourself? 

Author

Editor and Content Lead, Humans of Data
