Atlan | Humans of Data Atlan | Humans of Data
  • Articles
    How to Kickstart a Data Governance Program
    January 25, 2023
    Product Roundup: Our 15 Favorite Features from 2022
    January 24, 2023
    Gordon Wong: Using a data contract to build trust in modern data teams
    January 17, 2023
    The Future of the Modern Data Stack in 2023
    January 10, 2023
    December Product Roundup: 11 Reasons for Holiday Cheer
    January 5, 2023
    colorful high-rise buildings What I got wrong: Looking back at my 2022 predictions for the modern data stack
    December 21, 2022
    2022 Reading List: The Top 5 Must-Read Blogs on the Modern Data Stack
    December 21, 2022
    Introducing Supercharged Automation for your Data Estate
    December 19, 2022
    Erica Louie, Head of Data at dbt: Leadership lessons and fresh perspectives on managing a modern data team
    December 7, 2022
    November Product Roundup: 13 Features to Feed Your Data Appetite
    December 7, 2022
    Previous Next
  • Community
    2022 Reading List: The Top 5 Must-Read Blogs on the Modern Data Stack
    December 21, 2022
    Gartner Data & Analytics Summit 2022 Key Takeaways from Gartner Data & Analytics Summit 2022: Augmented Analysis, Synthetic Data, Adaptive Governance, and More
    August 24, 2022
    Gartner Data & Analytics Summit from Aug 22-24, 2022, in Orlando, FL Gartner Data & Analytics Summit: Don’t Miss These Sessions on Active Metadata, DataOps, and More
    August 11, 2022
    Databricks Data and AI Summit 2022 Data + AI Summit 2022: Recapping 11 Major Announcements across 4 Keynotes
    July 3, 2022
    Great Data Debate - Future of Metrics Layer and Metadata with Prukalpa Sankar (Atlan), Drew Banin (dbt), and Nick Handel (Transform) Future of the Metrics Layer with Drew Banin (dbt) and Nick Handel (Transform)
    May 19, 2022
    Why Snapcommerce Chose to Start Data Cataloging
    March 11, 2022
    Subsurface LIVE Winter 2022 Conference Subsurface 2022: Leveraging DataOps to Build India’s National Data Platform
    March 3, 2022
    Postman-blog-missing-layer-data-stack A Behind-the-Scenes Look at How Postman’s Data Team Works
    October 7, 2021
    MDSCON 2021 - Modern Data Stack Conference MDSCON 2021: Learnings and Insights from Our Favorite Sessions
    September 23, 2021
    Postman-blog-missing-layer-data-stack How Postman Fixed a Missing Layer in their Data Stack
    September 13, 2021
    Previous Next
  • Guides
    • The Future of the Modern Data Stack in 2023
    • The Third-Generation Data Catalog Primer
    • The Secrets of a Modern Data Leader
    • The Ultimate Guide to Evaluating Data Lineage
    • How Active Metadata Helps Modern Organizations Embrace the DataOps Way
  • Inside Atlan
    Product Roundup: Our 15 Favorite Features from 2022
    January 24, 2023
    December Product Roundup: 11 Reasons for Holiday Cheer
    January 5, 2023
    Introducing Supercharged Automation for your Data Estate
    December 19, 2022
    November Product Roundup: 13 Features to Feed Your Data Appetite
    December 7, 2022
    Atlan + dbt Labs partnership and integration Atlan Becomes dbt Semantic Layer Launch Partner and Announces Integration
    October 18, 2022
    Atlan + Fivetran Metadata API Column-Level Lineage Atlan Partners with Fivetran and Launches Integration with Metadata API
    September 22, 2022
    Introducing the DataOps Leaders Program and its Inaugural Cohort of Inspiring Data Leaders
    July 28, 2022
    Atlan Pioneering Active Metadata with a Brand New Look and Features
    July 18, 2022
    Auto-generated, column-level lineage in Atlan for an Amazon Redshift asset How to Collaborate Across Your AWS Data Stack with Atlan
    May 13, 2022
    atlan snowflake ready technology partner Atlan Is the First Data Catalog Validated as a Snowflake Ready Technology Partner
    April 26, 2022
    Previous Next
  • Facebook
  • Twitter
  • YouTube
  • LinkedIn

5 Trends Driving the New World of Metadata in 2022

By Prukalpa Sankar February 2, 2022
These trends have converged to create a storm around a new, modern idea of metadata.

Last year, we hit some major landmarks in the world of metadata. Gartner scrapped its Magic Quadrant for Metadata Management, companies started asking for third-generation data catalogs, and modern metadata companies (like mine!) launched and raised some serious VC money. 

All of this actually prompted me to add metadata as one of my six key data ideas for this year.

But why is metadata such a hot topic in the data world now? What’s behind all of this hype?

In this article, I’ll break down the five trends behind this new world of metadata. Some kicked off over half a decade ago, while some are just months old — and today they’ve converged to create a storm around a new, modern idea of metadata.

TL;DR: The five trends

  • The modern data stack went mainstream, featuring a full range of unprecedented fast, flexible, cloud-native tools. The problem — metadata has been left out.
  • Data teams are more diverse than ever, leading to chaos and collaboration overhead. Context is key, and metadata is the solution.
  • Data governance is being reimagined from top-down, centralized rules to bottom-up, decentralized initiatives — which requires a similar reimagining for metadata platforms.
  • As metadata is becoming big data, the metadata lake has infinite use cases for today and tomorrow.
  • Passive metadata systems are being scrapped in favor of active metadata platforms.

1. The creation of the modern data stack

Starting around 2016, the modern data stack went mainstream. This refers to a flexible collection of tools and capabilities that help businesses today store, manage, and use their data.

These tools are unified by three key ideas:

  • Self-service for a diverse range of users
  • “Agile” data management
  • Cloud-first and cloud-native

Today’s modern data stack is easy to set up, pay as you go, and plug and play — people won’t put up with anything else these days! Tools like Fivetran and Snowflake let users set up a data warehouse in less than 30 minutes.

In an ecosystem of increasingly easy, fast, interconnected data tools, the old idea of metadata — passive, siloed data inventories, powered by an army of data stewards — just doesn’t cut it anymore. Many of the earlier second-generation data catalogs still need significant engineering time for setup, not to mention at least five calls with a sales representative to get a demo. So is anyone surprised that the data world is eagerly searching for a better way to handle metadata?

Read more about the modern data stack.

The latest version of the modern data stack

2. The diverse humans of data

A few years ago, only the “IT team” would get their hands dirty with data.

However, today’s data teams are more diverse than ever before. They include data engineers, analysts, analytics engineers, data scientists, product managers, business analysts, citizen data scientists, and more. Each of these people has their own favorite, equally diverse data tools — everything from SQL, Looker, and Jupyter to Python, Tableau, dbt, and R.

This diversity is both a strength and a struggle.

The new diverse data team

All of these people have different tools, skill sets, tech stacks, work styles, and ways of approaching a problem… Essentially, they each have a unique “data DNA”. More diverse perspectives mean more opportunities for creative solutions and out-of-the-box thinking. However, it also usually means more chaos within collaboration.

This diversity also means that self-service is no longer optional. Modern data tools need to be intuitive for a wide range of users with a wide range of skill sets. If someone wants to bring data into their work, they should be able to easily find the data they need without having to ask an analyst or file a request.

Metadata is emerging as the solution to these challenges. As Benn Stancil wrote, “Today’s data stack is quickly fracturing into smaller and more specialized pieces, and we need something that binds it all together.” His answer to this is metadata. Metadata is evolving to provide critical context as we continue to bring an increasingly diverse set of people and tools into our data ecosystem.

Read more about the humans of data.

3. The new vision for data governance

Data governance is seen as a bureaucratic, restrictive process — a set of rules dropped down from on high to slow down your work. And the reality is, that’s often how it actually works.

Companies surround their data with complex security processes and restrictions, all dictated by a distant data governance team.

However, as the modern data stack has made it easier to ingest and transform data, this idea of data governance has become one of the biggest barriers in daily data work.

For the first time, the need for governance is being felt bottom-up by practitioners, instead of being enforced top-down due to regulation. That’s why data governance is currently in the middle of a paradigm shift.

Quoted from Tristan Handy’s “The Modern Data Stack: Past, Present, and Future”

Today, governance is becoming something that the humans of data embrace rather than fear. At its heart, it’s now less about control, and more about helping data teams work better together.

As a result, data governance is being reimagined as a set of collaborative best practices by and for amazing data teams — ones that are about empowering and creating better data teams, not controlling them.

Today’s data governance evolution

Modern, community-led data governance needs a whole new kind of metadata management platform. For example, the old way of top-down, steward-based data management processes won’t work anymore. Tools need to adapt to allow data users to crowdsource context as a part of their daily workflows in Slack or Microsoft Teams. Another key aspect involves using metadata to automate data classification, such as auto-classifying and restricting access to assets with PII data.

Read more about modern data governance.

4. The rise of the metadata lake

In 2005, more data was being collected than ever before, with more ways to use it than a single project or team could dream of. Data had limitless potential, but how can you set up a data system for limitless use cases? That led to the birth of the data lake.

Today, metadata is at the same place. Metadata is itself becoming big data, and technical advances (i.e. elasticity) in compute engines like Snowflake and Redshift make it possible to derive intelligence from metadata in a way that was unimaginable even a few years ago.

As metadata increases, and the intelligence we can derive from it increases, so too does the number of use cases that metadata can power.

Today, even the most data-driven organizations have only scratched the surface of what is possible with metadata. However, metadata is on the cusp of fundamentally changing how our data systems operate. The metadata lake is what makes this possible.

A metadata lake is a unified repository that can store all kinds of metadata, in both raw and further processed forms, in a way that can be shared with other tools in the data stack to drive both the use cases we know of today and those of tomorrow.

Just like data became far easier to use with data lakes, the metadata lake allows us to finally understand how we’ll be able to use today’s deluge of metadata.

Read more about the metadata lake.

Architecture of the metadata lake

5. The birth of active metadata

In August 2021, Gartner scrapped its Magic Quadrant for Metadata Management and replaced it with the Market Guide for Active Metadata Management. This marked the end of the traditional approach to metadata management and kicked off a new way of thinking about metadata.

Quoted from Gartner’s Market Guide for Active Metadata Management

Traditional data catalogs are passive. They are fundamentally static systems that don’t drive any action and rely on human effort to curate and document data.

However, an active metadata platform is an always-on, intelligence-driven, action-oriented system.

  • Always-on: Rather than waiting for humans to manually enter metadata, it continuously collects metadata from logs, query history, usage stats, etc.
  • Intelligence-driven: It constantly processes metadata to connect the dots and create intelligence, such as automatically creating lineage by parsing through query logs.
  • Action-oriented: Instead of being passive observers, these systems drive recommendations, generate alerts, and operationalize intelligence in real time.

Active metadata platforms act as two-way platforms — they not only bring metadata together into a single store like a metadata lake, but also leverage “reverse metadata” to make metadata available in daily workflows.

Read more about active metadata.

Our vision for active metadata

Looking forward

It’s easy to complain about the state of metadata. But when I look back on where it was even five years ago, it’s amazing how far we’ve come.

Thanks to the convergence of these five big trends, we’re at an inflection point in metadata management — a shift from old-school, passive tools to modern, active metadata that powers our entire data stack.

No longer static documentation, metadata holds the key to unlocking our dream of a truly intelligent data management system. We have a ways to go, but I personally can’t wait to see what the next year holds for metadata.


This article was originally published on Towards Data Science.

Header image: Pietro Jeng on Unsplash

metadatametadata management
Author Prukalpa Sankar

    Related Posts

    Forrester changed the way they think about data catalogs, and here’s what you need to know

    October 4, 2022
    What is active metadata? Man staring at a wall of books

    What Is Active Metadata, and Why Does It Matter?

    May 24, 2022
    Great Data Debate - Future of Metrics Layer and Metadata with Prukalpa Sankar (Atlan), Drew Banin (dbt), and Nick Handel (Transform)

    Future of the Metrics Layer with Drew Banin (dbt) and Nick Handel (Transform)

    May 19, 2022

    Write A Comment Cancel Reply

    This site uses Akismet to reduce spam. Learn how your comment data is processed.

    © 2023 Atlan. All registered.

      Atlan | Humans of Data
      • Articles
      • Community
      • Guides
        • The Future of the Modern Data Stack in 2023
        • The Third-Generation Data Catalog Primer
        • The Secrets of a Modern Data Leader
        • The Ultimate Guide to Evaluating Data Lineage
        • How Active Metadata Helps Modern Organizations Embrace the DataOps Way
      • Inside Atlan

      Type above and press Enter to search. Press Esc to cancel.