Atlan | Humans of Data

The Future of Data Catalogs

By Prukalpa Sankar May 10, 2022
Let’s visit a website just to “browse the metadata,” said no one ever.

Last Friday, Data Twitter was buzzing with Josh Wills’ tweet about metadata and business intelligence.

To my many friends/followers doing metadata/catalog startups, I have a request: please integrate the metadata info with my BI tool so that I can see it *while I am doing queries.*

I have no desire to *ever* visit a third website to just "browse the metadata."

— Josh Wills (@josh_wills) April 29, 2022

At Atlan, we started as a data team, and we failed three times at implementing a data catalog. As a data leader who saw these projects fail, I found that the biggest reason data catalogs fail is the user experience. This isn’t just about a beautiful user interface though. It’s about truly understanding how people work and giving them the best possible experience.

People like Josh want context where they are, when they need it.

For example, when you’re in a BI tool like Looker, you inevitably think, “Do I trust this dashboard?” or “What does this metric mean?” And the last thing anyone wants to do is open up another tool (aka the traditional data catalog), search for the dashboard, and browse through metadata to answer that question.

Imagine a world where data catalogs don’t live in their own “third website”. Instead, a user can get all the context where they need it — either in the BI tool of their choice or whatever tool they’re already in, whether that’s Slack, Jira, the query editor, or the data warehouse.

Active metadata in Looker (Atlan + Looker)

I believe this is the future of data catalogs — activating metadata and bringing metadata back into the daily workflows of data teams.

In Josh’s words, ‘It’s like reverse ETL but for metadata’.

Why don’t data catalogs work like this today?

Traditionally, data catalogs were built to be passive. They brought metadata from a bunch of different tools into another tool called the “data catalog” or the “data governance tool”.

The problem with this approach is that it tries to solve a “too many silos” problem by adding one more siloed tool. That doesn’t solve the problem that users like Josh face every day. Eventually, user adoption suffers!

A senior data leader at a large company called these data catalogs “expensive shelfware”, or software that sits on the shelf and never gets used.

Active metadata vs passive metadata (the old way of data cataloging)

How can we save data catalogs from becoming shelfware?

Think about the modern tools we use and love today — GitHub, Figma, Slack, Notion, Superhuman, etc.

One common thing across all these tools is the concept of flow. In the words of Rahul Vohra (Founder of Superhuman):

Flow is a magical feeling.

Time melts away. Your fingers dance across the keyboard. You’re driven by boundless energy and a wellspring of creativity — you are completely absorbed by your task.

Flow turns work into play.

Rahul Vohra, Superhuman

The secret to magical data experiences lies in flow. These great user experiences aren’t about the macro-flows. They’re about micro-flows, like not having to switch to a separate data catalog to get context for the dashboards in your BI tool. There are dozens of micro-flows like this that can power magical experiences and completely change the way that data users feel about their work.

Therein lies the promise of active metadata.

What is active metadata?

Instead of just collecting metadata from the rest of the stack and bringing it back into a passive data catalog, active metadata platforms make a two-way movement of metadata possible, sending enriched metadata back into every tool in the data stack.

My favorite explanation of “active metadata” and how it is different from traditional, passive approaches actually goes back to… the dictionary.

“If you describe someone as passive, you mean that they do not take action but instead let things happen to them.”

Collins Dictionary

Being “active” is about always being engaged and moving forward, rather than sitting back and letting things happen around you.

Take a moment to think about what this means in the context of metadata, and it paints a picture of what active metadata can be: metadata that transforms into “action” to make our data experiences better.

Achieving flow through active metadata

The only reality in data teams is diversity — a diversity of people, tools, and technology. Diversity that leads to chaos and sub-optimal experiences for everyone involved.

The key to wrangling this diversity and achieving flow lies in metadata. It’s the common thread across all of our tools that gives the context we’re desperately lacking every time we bounce between tools to figure out what’s going on with a data project.

  • When you’re browsing through the lineage of a data asset and find an issue, you can create a Jira ticket right then and there.
  • When you ask a question about a data asset in Slack, a bot brings context about that asset directly to you in Slack.
  • When you are pushing to production in GitHub, a bot runs through the lineage and dependencies and gives you a “green” status that you’re not going to break anything — right in GitHub.
Activating experiences with active metadata
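
To make the GitHub example above more concrete, here is a minimal sketch of how a lineage-based check might decide between a “green” and a “red” status. Everything in it is an assumption for illustration: the `LINEAGE` graph, the asset names, and the rule that any protected downstream asset turns the check red are hypothetical, not a description of how Atlan or GitHub implement this.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customers"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.customers": [],
    "dashboard.weekly_revenue": [],
}


def downstream_assets(lineage, changed):
    """Return every asset reachable downstream of `changed` (breadth-first walk)."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def check_status(lineage, changed, protected):
    """Return ("red", impacted) if a protected asset is downstream, else ("green", [])."""
    impacted = downstream_assets(lineage, changed) & set(protected)
    return ("red" if impacted else "green"), sorted(impacted)


if __name__ == "__main__":
    print(check_status(LINEAGE, "staging.orders", ["dashboard.weekly_revenue"]))
    print(check_status(LINEAGE, "marts.customers", ["dashboard.weekly_revenue"]))
```

A real check would likely look at what changed (for example, a dropped column) rather than flagging every downstream asset, but the traversal over lineage is the core idea.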

Going beyond the data catalog

The “data catalog” is just a single use case of metadata — helping users understand their data assets. But that barely scratches the surface of what metadata can do.

Activating metadata holds the key to dozens of use cases like observability, cost management, remediation, quality, security, programmatic governance, auto-tuned pipelines, and more.

The more I think about this, the more I believe that active metadata can make the dream of an intelligent data system a reality.

Here’s an example of how it could work:

  1. With active metadata, you could use past usage metadata from BI tools to understand which dashboards are used the most and when people use them.
  2. End-to-end lineage connects these dashboards to the tables that power them in the data warehouse.
  3. Operational metadata shows connected compute workloads, associated data pipelines, and run times.

Couldn’t we use all of this information to auto-tune our pipelines and compute, optimizing for a great user experience (updated data in the dashboard when people need it, and best performance at the time of max usage) while minimizing costs?
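
As a rough illustration of the three steps above, the sketch below combines usage metadata, lineage, and run times to pick a refresh start hour for each table. All data and names (`VIEWS`, `DASHBOARD_TABLES`, `RUN_HOURS`, `refresh_start_times`) are hypothetical, and the rule of finishing each refresh just before the earliest peak hour is only one simple assumption about what “auto-tuning” could mean.

```python
from collections import Counter

# Hypothetical usage metadata: (dashboard, hour of day it was viewed).
VIEWS = [
    ("weekly_revenue", 9), ("weekly_revenue", 9), ("weekly_revenue", 14),
    ("churn", 16), ("churn", 16), ("churn", 16),
]

# Hypothetical lineage: dashboard -> warehouse tables that power it.
DASHBOARD_TABLES = {
    "weekly_revenue": ["marts.revenue"],
    "churn": ["marts.customers", "marts.revenue"],
}

# Hypothetical operational metadata: table -> pipeline run time in hours.
RUN_HOURS = {"marts.revenue": 2, "marts.customers": 1}


def peak_hour(views, dashboard):
    """Return the hour with the most views (assumes at least one view exists)."""
    hours = Counter(hour for name, hour in views if name == dashboard)
    return hours.most_common(1)[0][0]


def refresh_start_times(views, dashboard_tables, run_hours):
    """Start each table's refresh so it finishes by the earliest peak hour it serves."""
    deadlines = {}
    for dashboard, tables in dashboard_tables.items():
        peak = peak_hour(views, dashboard)
        for table in tables:
            deadlines[table] = min(deadlines.get(table, peak), peak)
    return {
        table: (deadline - run_hours[table]) % 24
        for table, deadline in deadlines.items()
    }


if __name__ == "__main__":
    print(refresh_start_times(VIEWS, DASHBOARD_TABLES, RUN_HOURS))
```

With this sample data, `marts.revenue` feeds a dashboard that peaks at 9:00 and takes two hours to build, so the sketch schedules it for 7:00, while `marts.customers` only needs to be ready by 16:00.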

Active metadata platform

Beyond that, it feels like the use cases of active metadata are limitless. It has the potential to bring intelligence and flow to every part of the data stack and truly act as the gateway to the data stack of our dreams — a truly intelligent data system.

  • Automatically deduce the owners and experts for data tables or dashboards based on SQL query logs
  • Automatically stop downstream pipelines when a data quality issue is detected, and use past records to predict what went wrong and fix it without human intervention
  • Automatically purge low-quality or outdated data products
  • and much more
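
The first idea in this list, deducing experts from SQL query logs, could be sketched as follows. The log format, the user names, and the naive regular expression are all assumptions made for illustration; a production system would use a proper SQL parser and the warehouse's actual query history.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query log entries: (user, SQL text).
QUERY_LOG = [
    ("ana", "SELECT * FROM marts.revenue WHERE week = 12"),
    ("ana", "INSERT INTO marts.revenue SELECT * FROM staging.orders"),
    ("cy", "SELECT count(*) FROM marts.revenue"),
    ("ben", "SELECT * FROM marts.customers JOIN marts.revenue USING (id)"),
    ("ben", "select id from marts.customers"),
]

# Naive pattern: the identifier that follows FROM, JOIN, or INTO.
TABLE_PATTERN = re.compile(r"\b(?:from|join|into)\s+([\w.]+)", re.IGNORECASE)


def likely_experts(query_log):
    """Map each table to the user who referenced it in the most queries."""
    counts = defaultdict(Counter)
    for user, sql in query_log:
        # Count each table at most once per query.
        for table in {name.lower() for name in TABLE_PATTERN.findall(sql)}:
            counts[table][user] += 1
    return {table: users.most_common(1)[0][0] for table, users in counts.items()}


if __name__ == "__main__":
    print(likely_experts(QUERY_LOG))
```

Query counts are only a proxy for expertise, so a suggestion like this would probably be surfaced for a human to confirm rather than applied automatically.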

In the past few years, it has been heartening to see active metadata become the de facto standard for next-generation metadata, with even Gartner releasing its inaugural Market Guide for Active Metadata a few months ago. This may sound a little crazy, but in a world with self-driving cars, smart houses, and rovers that navigate themselves across Mars, why can’t we imagine a smarter data experience powered by our wealth of metadata?


Want to learn more about third-generation data catalogs and the rise of active metadata? Check out our ebook!

This article was originally published on Towards Data Science.

active metadata, Data Management, metadata management
Author Prukalpa Sankar


    © 2023 Atlan. All registered.
