Atlan | Humans of Data

The Future of Data Catalogs

By Prukalpa Sankar, May 10, 2022
Let’s visit a website just to “browse the metadata,” said no one ever.

Last Friday, Data Twitter was buzzing with Josh Wills’ tweet about metadata and business intelligence.

To my many friends/followers doing metadata/catalog startups, I have a request: please integrate the metadata info with my BI tool so that I can see it *while I am doing queries.*

I have no desire to *ever* visit a third website to just "browse the metadata."

— Josh Wills (@josh_wills) April 29, 2022

At Atlan, we started as a data team, and we failed three times at implementing a data catalog. As a data leader who saw these projects fail, I found that the biggest reason data catalogs fail is the user experience. This isn’t just about a beautiful user interface though. It’s about truly understanding how people work and giving them the best possible experience.

People like Josh want context where they are, when they need it.

For example, when you’re in a BI tool like Looker, you inevitably think, “Do I trust this dashboard?” or “What does this metric mean?” And the last thing anyone wants to do is open up another tool (aka the traditional data catalog), search for the dashboard, and browse through metadata to answer that question.

Imagine a world where data catalogs don’t live in their own “third website”. Instead, a user can get all the context where they need it — either in the BI tool of their choice or whatever tool they’re already in, whether that’s Slack, Jira, the query editor, or the data warehouse.

Active metadata in Looker

I believe this is the future of data catalogs — activating metadata and bringing metadata back into the daily workflows of data teams.

In Josh’s words, ‘It’s like reverse ETL but for metadata’.

Why don’t data catalogs work like this today?

Traditionally, data catalogs were built to be passive. They brought metadata from a bunch of different tools into another tool called the “data catalog” or the “data governance tool”.

The problem with this approach is that it tries to solve a “too many silos” problem by adding one more siloed tool. That doesn’t solve the problem that users like Josh face every day, and eventually user adoption suffers.

A senior data leader at a large company called these data catalogs “expensive shelfware”, or software that sits on the shelf and never gets used.

Active metadata vs passive metadata (the old way of data cataloging)

How can we save data catalogs from becoming shelfware?

Think about the modern tools we use and love today — GitHub, Figma, Slack, Notion, Superhuman, etc.

One common thing across all these tools is the concept of flow. In the words of Rahul Vohra (founder of Superhuman):

Flow is a magical feeling.

Time melts away. Your fingers dance across the keyboard. You’re driven by boundless energy and a wellspring of creativity — you are completely absorbed by your task.

Flow turns work into play.

Rahul Vohra, Superhuman

The secret to magical data experiences lies in flow. These great user experiences aren’t about the macro-flows. They’re about micro-flows, like not having to switch to a separate data catalog to get context for the dashboards in your BI tool. There are dozens of micro-flows like this that can power magical experiences and completely change the way that data users feel about their work.

Therein lies the promise of active metadata.

What is active metadata?

Instead of just collecting metadata from the rest of the stack and bringing it back into a passive data catalog, active metadata platforms make a two-way movement of metadata possible, sending enriched metadata back into every tool in the data stack.

My favorite explanation of “active metadata” and how it is different from traditional, passive approaches actually goes back to… the dictionary.

“If you describe someone as passive, you mean that they do not take action but instead let things happen to them.”

Collins Dictionary

Being “active” is about always being engaged and moving forward, rather than sitting back and letting things happen around you.

Take a moment to think about what this means in the context of metadata, and it paints a picture of what active metadata can be: metadata that transforms into “action” to make our data experiences better.

Achieving flow through active metadata

The only reality in data teams is diversity: a diversity of people, tools, and technology. That diversity leads to chaos and sub-optimal experiences for everyone involved.

The key to wrangling this diversity and achieving flow lies in metadata. It’s the common thread across all of our tools that gives the context we’re desperately lacking every time we bounce between tools to figure out what’s going on with a data project.

  • When you’re browsing through the lineage of a data asset and find an issue, you can create a Jira ticket right then and there.
  • When you ask a question about a data asset in Slack, a bot brings context about that asset directly to you in Slack.
  • When you are pushing to production in GitHub, a bot runs through the lineage and dependencies and gives you a “green” status that you’re not going to break anything — right in GitHub.
Activating experiences with active metadata
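
To make the GitHub example a little more concrete, here is a minimal sketch of what such a lineage check could look like. It assumes lineage is available as a simple mapping from each asset to the assets that read from it. The asset names, the `LINEAGE` dictionary, and both functions are hypothetical illustrations, not the API of Atlan or GitHub.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.customer_ltv": [],
    "dashboard.weekly_revenue": [],
}


def downstream_assets(lineage, changed):
    """Return every asset reachable downstream of the changed assets (breadth-first)."""
    seen = set()
    queue = deque(changed)
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def check_status(lineage, changed, protected):
    """Return ("green", []) if no protected asset is downstream, else ("red", affected)."""
    affected = sorted(downstream_assets(lineage, changed) & set(protected))
    return ("green", []) if not affected else ("red", affected)


if __name__ == "__main__":
    # A change to raw.orders touches the protected dashboard; a change to customer_ltv does not.
    print(check_status(LINEAGE, ["raw.orders"], ["dashboard.weekly_revenue"]))
    print(check_status(LINEAGE, ["marts.customer_ltv"], ["dashboard.weekly_revenue"]))
```

In practice the bot would pull the lineage from the metadata platform and post the result as a status on the pull request. The traversal itself is the simple part.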

Going beyond the data catalog

The “data catalog” is just a single use case of metadata — helping users understand their data assets. But that barely scratches the surface of what metadata can do.

Activating metadata holds the key to dozens of use cases like observability, cost management, remediation, quality, security, programmatic governance, auto-tuned pipelines, and more.

The more I think about this, the more I have begun to believe that active metadata can make the intelligent data dream a reality.

Here’s an example of how it could work:

  1. With active metadata, you could use past usage metadata from BI tools to understand which dashboards are used the most and when people use them.
  2. End-to-end lineage connects these dashboards to the tables that power them in the data warehouse.
  3. Operational metadata shows connected compute workloads, associated data pipelines, and run times.

Couldn’t we use all of this information to auto-tune our pipelines and compute, optimizing for a great user experience (updated data in the dashboard when people need it, and best performance at the time of max usage) while minimizing costs?
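
As a rough sketch of that idea, the snippet below combines the three kinds of metadata from the list above to work out the latest time each pipeline could start and still deliver fresh data before peak usage. Everything in it is an assumption made for illustration: the `Pipeline` class, the sample dashboards and tables, the 15-minute buffer, and the simplification that all times fall within a single day.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    name: str
    target_table: str
    avg_runtime_min: int  # operational metadata: typical run time in minutes


# Hypothetical usage metadata: minutes after midnight when each dashboard's usage peaks.
PEAK_USAGE = {"weekly_revenue": 9 * 60, "ops_overview": 7 * 60 + 30}

# Hypothetical lineage: which warehouse tables power each dashboard.
DASHBOARD_TABLES = {
    "weekly_revenue": ["marts.revenue"],
    "ops_overview": ["marts.revenue", "marts.tickets"],
}

PIPELINES = [
    Pipeline("build_revenue", "marts.revenue", 45),
    Pipeline("build_tickets", "marts.tickets", 20),
]


def latest_start_times(peak_usage, dashboard_tables, pipelines, buffer_min=15):
    """Latest start (minutes after midnight) so each pipeline finishes
    buffer_min before the earliest usage peak of any dashboard it feeds."""
    deadlines = {}
    for dashboard, tables in dashboard_tables.items():
        peak = peak_usage[dashboard]
        for table in tables:
            deadlines[table] = min(deadlines.get(table, peak), peak)
    schedule = {}
    for pipeline in pipelines:
        if pipeline.target_table in deadlines:
            deadline = deadlines[pipeline.target_table]
            schedule[pipeline.name] = deadline - buffer_min - pipeline.avg_runtime_min
    return schedule


if __name__ == "__main__":
    for name, start in latest_start_times(PEAK_USAGE, DASHBOARD_TABLES, PIPELINES).items():
        print(f"{name}: start by {start // 60:02d}:{start % 60:02d}")
```

A real auto-tuner would also weigh compute cost and handle schedules that cross midnight, but even this toy version shows how usage, lineage, and operational metadata fit together.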

Active metadata platform

Beyond that, it feels like the use cases of active metadata are limitless. It has the potential to bring intelligence and flow to every part of the data stack and truly act as the gateway to the data stack of our dreams — a truly intelligent data system.

  • Automatically deduce the owners and experts for data tables or dashboards based on SQL query logs
  • Automatically stop downstream pipelines when a data quality issue is detected, and use past records to predict what went wrong and fix it without human intervention
  • Automatically purge low-quality or outdated data products
  • and much more
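
The first of these ideas is simple enough to sketch. The snippet below assumes the SQL query logs have already been parsed into (user, table, statement kind) entries, and it uses a deliberately naive heuristic: whoever writes to a table most often is suggested as its owner, and whoever reads it most often is suggested as an expert. The log entries, names, and function are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, pre-parsed query log entries: (user, table, kind of statement).
QUERY_LOG = [
    ("asha", "marts.revenue", "write"),
    ("asha", "marts.revenue", "write"),
    ("ben", "marts.revenue", "read"),
    ("ben", "marts.revenue", "read"),
    ("ben", "marts.revenue", "read"),
    ("chen", "marts.revenue", "read"),
    ("chen", "marts.tickets", "write"),
]


def suggest_owners_and_experts(log):
    """Suggest an owner (most frequent writer) and an expert (most frequent reader) per table."""
    writes = defaultdict(Counter)
    reads = defaultdict(Counter)
    for user, table, kind in log:
        (writes if kind == "write" else reads)[table][user] += 1

    suggestions = {}
    for table in set(writes) | set(reads):
        top_writer = writes[table].most_common(1)
        top_reader = reads[table].most_common(1)
        suggestions[table] = {
            "owner": top_writer[0][0] if top_writer else None,
            "expert": top_reader[0][0] if top_reader else None,
        }
    return suggestions


if __name__ == "__main__":
    for table, people in sorted(suggest_owners_and_experts(QUERY_LOG).items()):
        print(table, people)
```

These would only be suggestions for a human to confirm. Query frequency is a proxy for ownership, not proof of it.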

In the past few years, it has been heartening to see active metadata become the de facto standard for next-generation metadata, with even Gartner releasing its inaugural Market Guide for Active Metadata a few months ago. This may sound a little crazy, but in a world with self-driving cars, smart houses, and rovers that navigate themselves across Mars, why can’t we imagine a smarter data experience powered by our wealth of metadata?


Want to learn more about third-generation data catalogs and the rise of active metadata? Check out our ebook!

This article was originally published on Towards Data Science.

