How an HR Tech Pioneer Enabled Self-service with Hundreds of Data Asset Definitions
At a Glance
- A global provider of employment services sought to improve the creation and distribution of metrics definitions atop their new, modern data stack
- By adopting Atlan as their data catalog, their data team collaborated with subject matter experts to define critical KPIs and make them available to data consumers in both Atlan and Looker
- In less than two years, the organization has defined more than 400 terms for over 500 users, and has automatically mapped more than 75,000 data assets in Atlan
It’s 1993 and you’ve just graduated from college. You’re going from job fair to job fair, looking through alumni directories, and constantly carrying a stack of printed resumes on high-weight, cream-colored stock. This was the reality of starting a career before the advent of web-based job boards, an innovation that changed how professionals across the globe built their careers.
With millions of unique visitors per month to their website and hundreds of thousands of job postings, supported by thousands of employees, the organization sits atop a truly staggering amount of data. And among the key people responsible for stewarding and activating this data is their data leader.
“I feel that I’ve had a magnetic pull to data and analytics since the beginning of my career,” their leader shared. “I’ve always loved the way data can tell a story, but I also recognize how important it is that the person writing the story has the right knowledge, tools, and information to drive it.”
This leader helped to build the foundations of their modern data stack and function, and was responsible for business intelligence and data engineering, as well as for overseeing data strategy and governance.
Summing up how important these functions are to the organization, their data leader explains, “It’s really to provide that foundational data architecture which stores the insights, the source of truth, the clean, governed, trusted data so our business consumers, our customers, and the folks planning the future of our product can access data to make decisions.”
An Event-driven Modern Data Stack
Underpinning their data team is a modern, re-architected tech stack, transformed from a fragmented ecosystem into a single data lake and warehouse. Their team ensured these new systems were structured from the ground up for the purpose of analytics, driving quality in a performant, cost-effective manner.
Much of their modern data stack is built on Google Cloud Platform. “We were excited about the managed technology within BigQuery,” their leader shared. “That would allow our teams to really focus on modeling and structuring the data, as opposed to some of the things prior to BigQuery that had to be managed more manually.”
In 2019, the team adopted Looker in record time. “It was a really great partnership between the business and my team. I call it a heroic effort,” their leader shared. Nearly 500 people at the organization use Looker, across a number of business functions.
“We have a huge volume of data and it’s widely recognized that the data in Looker is trusted. That’s a really different position than we were in a handful of years ago, where there may have been three or four different reports to talk about the same data, and not really an understanding of which system was meant to be the right one to go to for a certain purpose.”
The organization’s data stack benefits from an event-driven architecture. While some data pipelines operate in batch, a majority of their enterprise data flows into their data lake in real time, enabling their data team to decide whether or not data should drive just-in-time use cases. “This makes us much more nimble as a business, and it allows us to support much higher volumes of data than what we were previously handling,” their leader explained.
Finally, so that the organization’s significant investment in data technology yields as much value as possible, their data leader and her team formulated a data governance program to ensure data is fit-for-purpose and supports the broader business strategy.
“Fit-for-purpose means, for me, that the data is high-quality and trusted, that people know what it means when they look at the data,” their data leader explained. “From a more technical standpoint, that means we need to think about the ways we manage our metadata. We need to have a data catalog and business glossary, which we do in Atlan. We need to think about the right use cases for the right data sources and really empower folks to play a role in owning the data through data stewardship.”
The Need for a Data Catalog Emerges
In the midst of the organization’s re-architecture, it became clear to the data leader and her team that a data catalog was a necessary piece of the puzzle. “We were transforming our technology and our business model at the same time,” she shared. “There were a plethora of new terms to be referred to based on different systems and new ways we were doing business. To be able to socialize that knowledge and to agree on what something should be was so important.”
“For instance, there was no absolute, one way to define how many people viewed a job. It may sound silly, but do you count a view if someone scrolls past this tiny job on a webpage? Do you count a view if they only open it up and look at its fullest form? So simple debates like that were necessary to define the common KPIs that we would use to manage our business.”
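The debate above can be made concrete with a small sketch. Everything in it is hypothetical and for illustration only: the event kinds ("impression" for a job scrolled past in a list, "detail_open" for a job opened in full), the field names, and the sample data are assumptions, not the organization's actual schema or definitions. It simply shows that the same event log yields different "job view" counts depending on which definition is agreed upon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEvent:
    """A hypothetical tracking event; field names are illustrative assumptions."""
    user_id: str
    job_id: str
    kind: str  # assumed values: "impression" or "detail_open"


def count_views(events, counted_kinds):
    """Count events whose kind is included in the agreed definition of a 'view'."""
    return sum(1 for event in events if event.kind in counted_kinds)


# Made-up sample log for one job posting.
EVENTS = [
    JobEvent("u1", "j1", "impression"),
    JobEvent("u1", "j1", "detail_open"),
    JobEvent("u2", "j1", "impression"),
    JobEvent("u3", "j1", "impression"),
]

# Definition A: scrolling past the job in a list already counts as a view.
loose_views = count_views(EVENTS, {"impression", "detail_open"})

# Definition B: only opening the job in its fullest form counts as a view.
strict_views = count_views(EVENTS, {"detail_open"})

print(f"Definition A: {loose_views} views, Definition B: {strict_views} views")
```

With this sample data, the two definitions disagree by a factor of four for the same job, which is why a single documented definition in a shared glossary matters before the KPI is reported.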
In the absence of a data catalog, the organization’s data team drove definitions through consensus with business colleagues and subject matter experts, which resulted in 20 definitions stored in a separate glossary. Recognizing that definitions would need to be created and agreed upon at a far higher velocity, the data leader and her team began to actively search for a catalog to accelerate the process. This solution would host a business glossary, and serve as a mechanism for data experts to document what terms meant, break down knowledge silos, and share that knowledge with downstream consumers.
“We needed to get to a data empowerment perspective. Some call it data ownership, where you have data stewards who take responsibility for saying, ‘This is what this data term means. I can explain it. I can share and socialize this,’” their data leader shared. “I needed a tool to do that at scale. We were talking about hundreds of, and eventually thousands of defined terms, and we were just not able to do that in the Confluence tool that we were using earlier.”
Supercharging Documentation with Atlan
After a thorough evaluation of the Data Catalog and Active Metadata Management markets, the organization chose Atlan to serve as the interaction and contextualization layer on top of their Google Cloud Platform-powered data estate.
Key to their data leader and her team’s choice was Atlan’s time to value, with their data estate mapped and visible in mere days. “It was very fast to get started. That was a significant part of why we chose Atlan. The integration with our toolset was very easy to do with a quick connection through a service account,” their data leader shared. “Atlan scans our BigQuery and scans our Looker data. So for us, there’s no entering data assets into Atlan at all. We now have 75,000-plus data assets, and that was all done through a very easy setup.”
With their data estate mapped and visible, and automated lineage making it clear how data traveled through it, the data leader and her team moved to defining the roles that they and their counterparts would play as they scaled their definitions. They quickly worked to define who would serve as data stewards, how data would be grouped and categorized in domains and structured in Atlan, and how their glossary would be structured.
Then, the data leader and a colleague took the 20 terms that were previously defined in slow-moving, large meetings, and populated them into Atlan to familiarize themselves with the tool and prepare for broader adoption. The first cohort of users would be product managers, uniquely familiar with their data, and capable of being the final arbiter of a definition.
To ensure these product managers were both capable and motivated to write these definitions, their data team partnered with Atlan’s customer success team. “There was that burden of a blank page. A tabula rasa challenge of how to get started. So we partnered with Atlan and did an adoption program for about a quarter. Atlan gave us so many great ideas and supported us with trainings, even with prizes. That was an amazing partnership,” their data leader shared.
Between clearly defined roles, the right enablement and support, and the ease of Atlan’s user interface, the pace of creating, finalizing, and communicating definitions skyrocketed. “It took us about as much time to get the original 20 definitions as it did the next 100,” their data leader shared. And after inviting more product managers to participate, the number of defined terms increased to 250.
But the data team and their colleagues weren’t finished. With product owners realizing the benefits of defining terms, and end users benefitting from fast, transparent self-service, the pace of definitions continued to accelerate.
“But the beautiful thing is that actually, we hit our stride. So with no more adoption effort, we reached around 300 terms towards the end of last year (2022), and that was just through our product owners realizing that they needed to document these things for their own benefit, for the business, my team and the data team being another force to remind them, ‘Hey. People are asking about this. Please make sure it’s there as we launch reports trying to get everything documented.’ So that got us up to around 300 terms.”
Now at 400 terms defined, the organization’s data team has built a sustainable foundation for growth, with data stewards continuously updating their assets with new context. “We went from a scenario where it was taking four to five hours to define a term we were documenting in Confluence, and it was impossible to socialize that definition properly, to having a clear, clean definition in Atlan that would be a living, breathing source of knowledge,” their data leader explained. “People can add additional information when it’s found. Now, they can add links to other pages, Jira tickets, and we can link those definitions to where they’re used.”
Extending the Reach of Definitions
To reach as many of the users who could benefit from definitions as possible, their data team then focused on making context available where data consumers spend the most time. Hundreds of reports are routinely created on Looker, both by the data team and data-savvy counterparts in the business. Each requires context, including the purpose of the report, who’s responsible for its creation and maintenance, its popularity, and direction on when and when not to use it.
“Atlan has been perfect for us to start storing that knowledge and making it available. When Atlan then went ahead with the Looker plugin, that has just eased the ramp-up for new users tremendously because they’re able to work and live in Looker, which is what we really want to train them on primarily,” their data leader explained.
By having terms and definitions available immediately through Atlan’s Chrome plug-in on Looker, the data team drove significant value for a broad spectrum of users. New employees, and even tenured colleagues launching new products and services, have near-instantaneous access to context, and can make decisions even faster, without the need to train on, then adopt, Atlan’s core user interface.
“That’s the kind of usability that we want to bring to our data platform if you think about our data as a product. My team is developing the data and thinking about our internal customers as our customers. And to make their lives easier, save them time, help them accomplish their full potential when they’re looking at the data is what makes us passionate about our jobs each day.”
Supporting a Full Spectrum of Users
Crucial to the organization’s successful rollout of Atlan was their careful consideration of who would use the platform, both data consumers and data stewards.
For data consumers, their data leader’s goal was to provide a simple but powerful self-service experience in which her colleagues would be confident they had found the right information, and would know exactly who to speak to if they needed more, leading to a virtuous cycle of adoption and time savings.
“They may be able to be curious and explore more on their own, and then go to the expert, rather than feel like they’re taking the time from somebody who’s busy, which we don’t want them to feel. This is the reality of moving to the self-serve dynamic. It’s so positive,” their data leader explained.
For data stewards, success on Atlan means that the data team’s colleagues who own definitions and tribal knowledge see their documentation used frequently, and are either fielding fewer, or better-informed, questions.
“Instead of the person coming for that Basic 101, they’re coming in maybe at the college level, saying, ‘Okay. I know this, but what about if I want to use it this way?’ So that’s where I talk about helping people reach their potential instead of struggling with some of the basics,” their data leader shared. “I’m often in meetings where we’ll hear something coming up in the discussion, and the Atlan link can be published, and people can benefit from the work that the stewards have done.”
Simplifying Data Discovery
A key data consumer at the organization works in Digital Marketing, and has been an early beneficiary of the data team’s good work. Among her responsibilities are press relations, using the organization’s unique position as a broker between employers and candidates to provide novel insights to secure press coverage.
“Sometimes I find data about (candidate) profiles, or job offers, and I’ll send that to create content,” the digital marketer shared. “For example, this month we created a newsletter about jobs in sports, so I found data about people who have that background.”
Prior to adopting Atlan, the process for locating and understanding the organization’s existing data and reports was difficult for the digital marketer and her colleagues. Requests for data from the press meant navigating from Looker report to Looker report to learn if the data existed at all, then escalating to a manager or Data Engineering to find an answer. “If I didn’t find it, or I didn’t know if the data existed or not, that was my first frustration. And after I asked, I’d need to wait for the answer from my coworker and I lost time,” their digital marketer explained.
But when the organization’s data team implemented Atlan, providing hundreds of definitions, mapping their data estate, and making context available where the digital marketer lives, directly in Looker, the process became far simpler.
“After two or three times (using Atlan), I found all my data and all my answers,” the digital marketer shared. “I love to use Atlan because I know where I can find this data, and in which dashboard. I don’t have time to find the category of a dashboard.”
Sharing Deep Context in Looker
Among the biggest beneficiaries of the data stewardship experience on Atlan is one of the organization’s product managers, who is responsible for the organization’s job ingestion and job processing technology, as well as external partner APIs and integrations.
“One of the things that I realized early in my career is what you call something is important and how you define it. And so, I’m a big stickler for agreeing on the terminology and what it actually means,” the product manager shared. “This is a very important aspect, especially when you’re building systems greenfield, to make sure that you align on terminology and that we’re consistent about it.”
To store and distribute these definitions, the product manager historically used Confluence, and created what she calls “bite-sized” videos to provide additional context about design, business process, and the decisions that were made for software and data. And while Confluence still has a place in her toolkit, the introduction of Atlan has extended the benefit of her thorough documentation.
The organization’s data leader introduced the product manager to Atlan after hearing she had already been creating definitions in Confluence, communicating that it was the organization’s new system of record, and the two agreed that the product manager would own all terminology for job ingestion.
The product manager got to work creating succinct but useful definitions in Atlan, and providing a link to Confluence on each data asset. Now, data consumers can use Atlan’s Chrome plug-in to find more information about the data assets the product manager is responsible for, and can find even deeper context if desired, all in Looker, natively.
“It gives you almost an index of things that are related to that term if you want to explore different aspects of it,” the product manager shared. “The ability to connect the different tools and to be able to share the information across tools is really powerful.”
“I think we’re still in the early to mid-days of getting the terminology into Atlan, but I find that now when I go into meetings and I say, ‘Okay, we’re going to define this and we’re going to put it into Atlan,’ it’s more of a status quo thing. This is part of our process and this is going to be part of our process, so let’s align on terminology.”
A Bright Future, at Enterprise Scale
Looking back on what the organization’s data team has been able to accomplish, their leader remains impressed with the scale and criticality of their implementation. “One thing that excites me about what we’re doing is it’s really at the enterprise scale. We’re not just using Atlan for one team as a pilot or for the candidate side or for the employer side. It’s everything that we do.”
Their data team continues to grow their use of the platform, serving as a catalyst that ensures that each new term and report introduced is properly defined and accessible by its subject matter expert. And as their data ecosystem continues to evolve, as systems change, and as new business lines continue to launch, the foundation their data team built using Atlan will continue to pay dividends. “We are now in a much better position to continue to help people understand what they’re really looking at when they consume the data,” their leader shared.
“I think it’s remarkable for me personally, the speed we’ve been able to work (considering) the fact we don’t have a big team dedicated to this. For everybody who’s been involved, whether they’re a steward, or a data producer, or a data consumer, or a business person just inquiring about the data that they see in reports, whatever role you’re playing. I think people can be proud that we made it this far. Not all companies can pull it off. So I’m proud of what we’ve done. We’re only scratching the surface of the value we can get out of the catalog.”
Photo by Dylan Gillis on Unsplash