In January of this year, I wrote that metadata management is on the cusp of a transformational leap forward.
This week, Gartner took a huge step toward this by scrapping its Magic Quadrant for Metadata Management Solutions and replacing it with a Market Guide for Active Metadata. This change heralds a new way of approaching metadata in today’s modern data stack.
Whether you deal with kilobytes or terabytes of data every day, you’re probably wondering what this actually means.
- Why did traditional metadata management fail?
- How is traditional metadata different from active metadata?
- Why does this report mark a paradigm shift in metadata management?
- What does this mean for data leaders today?
In this article, I try to unpack these questions (without any of the marketing jargon) and share my predictions on where metadata management is headed.
The past and present of metadata management
Metadata, as a way of organizing information, has been around since ancient times, but the modern idea of metadata dates back to the late 1990s.
Metadata management started out as an IT discipline. As we embraced the internet, and as data types and formats exploded, IT teams were put in charge of creating an “inventory of data.”
Then, as data spread beyond the IT team and became more mainstream, the idea of data governance took root. This was the discipline of managing the people and processes around data to ensure its availability, integrity, and security for an enterprise.
As the idea of data governance started catching on, many companies drank the Kool-Aid and went all-in on data. They created entire departments for data governance, built new roles for people called “Data Stewards”, invested in data governance committees, and more.
These teams started realizing they needed software to manage all this metadata. That kickstarted a golden era for metadata management.
As with any new technology, things blew up quickly.
New companies were formed, and existing companies created new metadata products. People needed a way to sort through all these new metadata software options, so Gartner started publishing its Magic Quadrant for Metadata Management report. Companies like Informatica, Collibra, and Alation, all leaders in Gartner’s report, leveraged this market hype to grow rapidly.
Billion-dollar companies were created, and hundreds of millions of dollars were spent on metadata management software.
So, after all that, why did Gartner scrap its report? And why does the market guide that replaced it start with the ominous note, “Traditional metadata practices are insufficient…”?
The one-word answer for why traditional metadata management failed: passive
“If you describe someone as passive, you mean that they do not take action but instead let things happen to them.” (Collins Dictionary)
If you google the word “passive”, this is the first result. And honestly, there’s no better way to explain the fundamental failure of earlier passive metadata systems.
- Traditional metadata management tools did not take action. By simply cataloging or storing metadata, traditional metadata systems couldn’t drive any “action” from metadata signals. This reduced the impact that metadata could have within a data platform and for data consumers.
- Traditional metadata management systems let things happen to them. Traditional metadata systems were fundamentally static tools that relied on human effort to curate and document data. This meant that the success of a metadata program depended on the people implementing it.
These fundamental flaws led to the ultimate downfall of traditional metadata management tools.
As a result, despite significant investments in metadata management software, most companies have struggled to make their metadata programs successful. A few weeks ago, a senior data leader at a large company remarked about these tools, “Everyone knows that the tools that we have bought are expensive shelfware.”
A consultant at a prestigious professional services firm that implements metadata management solutions confirmed this sentiment: “About 50% of our engagements are when someone in a company has spent millions of dollars buying an expensive tool, and 2–3 years later realizes that it isn’t working or being used, and brings us in to try and desperately fix the situation.”
The paradigm shift: Going from passive to active metadata
Today, we’re at an inflection point in metadata management — the start of a new era marked by a wholly new way to think about metadata and the role that it plays in the data stack. This is where active metadata, the subject of Gartner’s new market guide, comes in.
A quick search for the word “active” throws up phrases that are the polar opposite of passive:
- “engaged in action; characterized by energetic work, participation, etc.”
- “being in a state of existence, progress, or motion”
- “having the power of quick motion; nimble”
Take a moment to think about these phrases in the context of metadata, and they paint a picture of what active metadata can be.
Active metadata: an always-on, intelligence-driven, action-oriented system that is the antithesis of its passive, static predecessor.
The 4 key characteristics of an active metadata platform
According to Gartner, active metadata is “a set of capabilities that enable continuous access and processing of metadata that support ongoing analysis…”
What does this actually mean, and how do active metadata platforms differ from traditional metadata management platforms? Here are the four fundamental characteristics you should look out for.
Active metadata platforms are always on.
Active metadata platforms don’t wait for humans to manually enter metadata through committees. Instead, they continually collect metadata at every stage of the modern data stack: logs, query history, usage statistics… Just about any kind of metadata, from anywhere, every second.
Active metadata platforms don’t just collect metadata. They create intelligence from metadata.
Unlike traditional metadata platforms, active metadata platforms are constantly processing metadata to connect the dots and create intelligence.
For example, by parsing the SQL code from query logs, an active metadata platform can automatically create column-level lineage, assign a popularity score to every data asset, and even deduce the potential owners and experts for each asset.
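To make this concrete, here is a minimal sketch of the idea in Python. It is not how any particular platform works: the query log, table names, and users are made up, it works at the table level rather than the column level, and it uses naive regular expressions where a real system would use a proper SQL parser.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query-log entries: (user, SQL text). All names are illustrative.
QUERY_LOG = [
    ("ana", "INSERT INTO analytics.daily_orders SELECT * FROM raw.orders o "
            "JOIN raw.customers c ON o.cid = c.id"),
    ("ana", "SELECT count(*) FROM analytics.daily_orders"),
    ("raj", "SELECT * FROM analytics.daily_orders WHERE day = '2021-08-01'"),
    ("raj", "CREATE TABLE analytics.top_customers AS "
            "SELECT cid FROM analytics.daily_orders"),
]

# Naive patterns: good enough for the sample log, not for arbitrary SQL.
TARGET_RE = re.compile(r"\b(?:INSERT\s+INTO|CREATE\s+TABLE)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def analyze(log):
    """Derive table-level lineage, popularity, and likely experts from a query log."""
    lineage = defaultdict(set)       # target table -> upstream tables
    popularity = Counter()           # table -> number of queries reading it
    readers = defaultdict(Counter)   # table -> user -> number of reads
    for user, sql in log:
        sources = set(SOURCE_RE.findall(sql))
        target = TARGET_RE.search(sql)
        if target:
            lineage[target.group(1)] |= sources
        for table in sources:
            popularity[table] += 1
            readers[table][user] += 1
    # Treat the most frequent reader of each table as its likely expert.
    experts = {table: counts.most_common(1)[0][0] for table, counts in readers.items()}
    return lineage, popularity, experts


lineage, popularity, experts = analyze(QUERY_LOG)
```

In this toy log, `analytics.daily_orders` ends up with two upstream tables, the highest popularity score, and `raj` as its most frequent reader. The interesting part is that nobody had to document any of it by hand.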
What really stands out about active metadata platforms is that they are true learning systems, which means that the intelligence of the platform will only grow over time. As people use the platform more and the platform observes more metadata in the data stack, the end-user experience will get better.
Active metadata platforms don’t just stop at intelligence. They drive action.
This is probably the most important leap that active metadata platforms have taken from their predecessors. Instead of just being passive observers, they drive recommendations, generate alerts, and operationalize intelligence in real-time data systems.
For example, an active metadata platform can leverage past usage logs to understand which datasets are used most, and accordingly recommend an optimized schedule for data pipeline runs. However, a true active metadata platform wouldn’t just stop there. It would send this recommendation to the data pipeline system and actually tune it through native integration. All this without any human intervention, furthering the principles of a truly DataOps-driven system.
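Here is a rough sketch of that loop, under some loud assumptions: usage is reduced to a list of query hours, the "optimized schedule" is simply "run one hour before peak usage", and `SchedulerClient` is a hypothetical stand-in for a pipeline tool's integration, not any real orchestrator's API.

```python
from collections import Counter


def recommend_run_hour(access_hours, lead_hours=1):
    """Recommend a daily run hour that lands `lead_hours` before peak usage."""
    if not access_hours:
        return None  # no usage signal, so no recommendation
    peak_hour, _ = Counter(access_hours).most_common(1)[0]
    return (peak_hour - lead_hours) % 24


class SchedulerClient:
    """Hypothetical stand-in for a pipeline scheduler's integration API."""

    def __init__(self):
        self.run_hours = {}

    def set_daily_run_hour(self, pipeline, hour):
        self.run_hours[pipeline] = hour


# Hours of day (0-23) at which a dataset was queried, taken from usage logs.
usage_hours = [9, 9, 10, 9, 14, 10]

scheduler = SchedulerClient()
hour = recommend_run_hour(usage_hours)
if hour is not None:
    # The "active" step: push the recommendation into the scheduler itself.
    scheduler.set_daily_run_hour("daily_orders_pipeline", hour)
```

The recommendation logic is deliberately trivial. The point is the last step: the result is written back into the system that runs the pipeline instead of sitting in a report.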
Another example is using active metadata to improve data quality. When a data quality issue is detected in a source table, the system can automatically stop the downstream pipelines to ensure that incorrect data doesn’t make its way to the dashboard. Or better yet, the system can use past records about data quality failures to accurately predict what went wrong and fix it without any human intervention.
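The first half of that idea, stopping downstream pipelines when a check fails, can be sketched in a few lines. Everything here is an assumption for illustration: the lineage map, the asset names, the null-rate check, and the 10% threshold. A real platform would pull lineage from its own metadata store and call the orchestrator to pause jobs.

```python
from collections import deque

# Hypothetical lineage map: asset -> assets that are built directly from it.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["analytics.daily_orders", "analytics.refunds"],
    "analytics.daily_orders": ["dashboard.revenue"],
}


def null_rate(rows, column):
    """Share of rows in which `column` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def assets_to_pause(failed_asset, downstream=DOWNSTREAM):
    """Walk the lineage graph breadth-first and list everything downstream."""
    paused, queue, seen = [], deque([failed_asset]), {failed_asset}
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                paused.append(child)
                queue.append(child)
    return paused


def check_and_pause(asset, rows, column, max_null_rate=0.1):
    """Return the downstream assets to pause if the quality check fails."""
    if null_rate(rows, column) > max_null_rate:
        return assets_to_pause(asset)
    return []


sample_rows = [{"amount": 10}, {"amount": None}, {"amount": None}, {"amount": 5}]
to_pause = check_and_pause("raw.orders", sample_rows, "amount")
```

Half of the sample rows are missing `amount`, so the check fails and every asset between `raw.orders` and the revenue dashboard is flagged for pausing.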
Active metadata platforms are API-driven, enabling embedded collaboration.
Embedded collaboration is about work happening where you are, with the least amount of friction. The action layer of an active metadata platform is what finally makes embedded collaboration possible.
What if you could request access to a data asset when you get a link, just like with Google Docs, and the owner could get the request on Slack and approve or reject it right there?
While this workflow sounds pretty simple, it is phenomenally difficult to implement seamlessly (which is why it probably doesn’t exist yet). It would require that the final end-user’s tool (where the user requests access to the data, like a data catalog) interface with an access and entitlements policy engine, which would send a request to the data owner on a communication tool like Slack.
We will never be able to achieve an embedded collaboration workflow like this without an active metadata platform orchestrating actions across the entire data stack.
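As a thought experiment, the orchestration in the middle of that workflow might look something like the sketch below. All of it is hypothetical: the class names, the in-memory `outbox` standing in for a messaging tool like Slack, and the `grants` set standing in for an entitlements policy engine. No real Slack or policy-engine API is used.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    requester: str
    asset: str
    status: str = "pending"


@dataclass
class AccessOrchestrator:
    owners: dict                                 # asset -> owner
    grants: set = field(default_factory=set)     # (user, asset) pairs; policy-engine stand-in
    outbox: list = field(default_factory=list)   # (recipient, message); messaging stand-in

    def request_access(self, requester, asset):
        """Look up the owner from metadata and notify them where they already work."""
        request = AccessRequest(requester, asset)
        owner = self.owners[asset]
        self.outbox.append((owner, f"{requester} requests access to {asset}"))
        return request

    def decide(self, request, approved):
        """Record the owner's decision and update entitlements if approved."""
        request.status = "approved" if approved else "rejected"
        if approved:
            self.grants.add((request.requester, request.asset))
        return request


orchestrator = AccessOrchestrator(owners={"analytics.daily_orders": "ana"})
request = orchestrator.request_access("raj", "analytics.daily_orders")
orchestrator.decide(request, approved=True)
```

The code itself is easy. The hard part, and the reason an active metadata platform is needed, is that the owner lookup, the notification, and the entitlement change each live in a different tool.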
What’s next for metadata?
For years, metadata management has lagged far behind the rest of the modern data stack. But in 2021, it seems like metadata is finally starting to catch up.
Innovation is in overdrive. I’m pretty sure that more startups have launched in this space in the last 12 months than in the past decade. (My colleague Rohan, who tracks the space closely, even decided to create a catalog of data catalogs!)
The monumental decision by Gartner to scrap its Magic Quadrant for Metadata Management and introduce active metadata as a new category is a huge step forward.
This finally sets aside the traditional, passive approach to metadata management and paves the way for a new era of metadata.
As with any major Gartner announcement, this one will likely introduce some short-term confusion in the market. Traditional metadata products will scramble to rebrand themselves as “active metadata platforms”. Some will actually start to add active metadata capabilities to their products, further adding to the confusion. And, of course, more startups will be founded.
But eventually, in the next 12–18 months, one or more active metadata platforms that have been truly built from the ground up on the right design principles will emerge as the ultimate winners in the category.
It is an incredible moment for metadata in the modern data stack. Hopefully, this time around we’ll finally get it right.
This article was originally published on Towards Data Science.