How a Small, Distributed Data Team Supports Data Consumers and Deprecates Snowflake Assets with a Modern Data Catalog
Trusted by thousands of companies in 40 global markets, this organization is a leader in travel and expense software and services, making business travel simple for their customers’ employees, and lending visibility and control to their financial stakeholders.
Serving a diverse set of personas, from occasional travelers, to road warriors, to financial controllers and CFOs, the organization counts on understandable, trustworthy data to create market-leading customer experiences that ensure every user of the platform is satisfied. And responsible for the technology and teams that power that data is their Director of Data Platform.
“My educational background was actually in energy and engineering, and I’ve found there are a lot of parallels between old-school engineering and the data world. I worked for a few solar companies where I learned a lot of exciting things around IoT data at scale,” he shared. “I fell in love with data right away, and ultimately, I made it here, into the travel and expense space.”
A Lean, Distributed Team Chooses Atlan
While the organization’s data team is small in number, their impact is significant, thanks to a staffing model that prioritizes close engagement with the business, and a modern data stack that ensures their attention is focused on valuable work, rather than administrative upkeep.
“We have a small data team and run very lean. Analytics here are decentralized, with many Analysts, Data Scientists, and data consumers spread across the organization, which sometimes makes it harder to get everyone aligned,” he shared. “Luckily, we have very good tools that have enabled us to do better, get insights to the end user faster, enable self-service, and let people answer their own questions.”
Enabling this small, distributed team are Fivetran and Airflow, Snowflake, dbt, and Monte Carlo for observability and alerting, now supported by Atlan for data lineage, data cataloging, and metadata storage.
With nearly 3,000 employees in the organization, many of whom regularly use data in their daily work, embedding data practitioners in business teams helped to increase engagement with their data, but resulted in significant back-and-forth questions as data consumers needed to learn more about available assets.
“We needed to make sure they had access to something more advanced than a Google Sheet. There were questions about where data comes from, what transformations have been done on a data set, what the data means, and if it’s under review, development, or deprecated,” he shared.
Their team originally considered dbt’s catalog offering, but believed it was too complex for non-technical data consumers. They also felt that, given the complexity of their Snowflake implementation, with its high number of data models and objects, they still had more work to do before enabling lineage using the tool.
Searching for a SaaS solution that integrated well with their data stack, could better service their data consumers, and could be implemented quickly, their data leader chose Atlan.
“It was an obvious choice when I saw Atlan, if only because of how well it integrated with the tools we have. It was Fivetran, it was dbt, you connected to MySQL databases and to Salesforce, and there were exciting things coming with the Monte Carlo partnership. It gave that end-to-end experience to the user. We didn’t have to manage any clusters or compute resources. It was easy to sign up users, and really easy to onboard them.”
Director of Data Platform
Contextualizing Data and Deprecating Assets
After quickly integrating Atlan with their data stack, their data team turned their attention to enriching assets in Snowflake, adding definitions for commonly used KPIs, adding certifications, and enabling automated lineage so their data consumers could understand where assets were derived from and what transformations occurred to deliver them. Next, they enabled Atlan’s Google Chrome Plugin, ensuring that each time a data consumer viewed a dashboard in Tableau, they understood the context of each data asset contained within it, natively.
Then, using the newfound visibility into their data estate afforded by Atlan, their data team used automated lineage and popularity metrics to begin identifying and deprecating unnecessary data assets in Snowflake, saving storage costs, and improving navigability.
“It saves compute costs when you no longer replicate data that nobody uses. You can pause your Fivetran connectors. You can deduce it all the way down to Tableau using lineage, and I think the value of that is tremendous,” their leader shared. “For me, as Director of Data Platform in a large organization, the savings were there right away. We’re talking about hundreds of tables, and probably 150 dbt models that were deprecated over the course of six months.”
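As an illustration only, the idea of combining lineage with popularity to find deprecation candidates can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the team's process or Atlan's API: the `Asset` class, the `query_count_90d` metric, and every asset name are hypothetical, and the lineage graph is assumed to be acyclic. An asset is treated as a candidate only if nobody queries it and nothing downstream of it is queried either.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    query_count_90d: int  # hypothetical popularity metric
    downstream: list[str] = field(default_factory=list)  # assets that read from this one


def deprecation_candidates(assets: dict[str, Asset]) -> set[str]:
    """Return assets that are unqueried and feed nothing that is queried.

    Assumes the lineage graph is a DAG (no cycles).
    """
    memo: dict[str, bool] = {}

    def is_used(name: str) -> bool:
        if name not in memo:
            asset = assets[name]
            # Used if queried directly, or if any downstream asset is used.
            memo[name] = asset.query_count_90d > 0 or any(
                is_used(child) for child in asset.downstream
            )
        return memo[name]

    return {name for name in assets if not is_used(name)}


# Hypothetical data: a source table, a staging model, and what consumes them.
assets = {
    a.name: a
    for a in [
        Asset("raw_trips", 0, ["stg_trips"]),
        Asset("stg_trips", 0, ["trips_dashboard"]),
        Asset("trips_dashboard", 120),
        Asset("raw_legacy_fx", 0, ["stg_legacy_fx"]),
        Asset("stg_legacy_fx", 0),
        Asset("raw_expenses", 5),
    ]
}

candidates = deprecation_candidates(assets)
print(sorted(candidates))
```

In this made-up example, `raw_trips` is never queried directly but is kept because a popular dashboard depends on it, while the two legacy assets are flagged because nothing in their lineage is in use.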
By improving the navigability of their data estate through asset deprecation, and by enriching their data assets and making them available through self-service, the organization’s data team is yielding even more value from their commitment to Snowflake. With hundreds of data sources and models, direct access to Snowflake was an intimidating process for even seasoned data professionals, but with Atlan, these assets are navigable and contextualized for a broad spectrum of users.
“Unless you’re familiar with your data and have been in an organization for a very long time, even onboarding a new Data or Analytics Engineer is much easier with Atlan. I’m very excited about the relationship Atlan and Snowflake have built,” he shared.
Secure, Self-service Data
With increasing adoption of Atlan, the organization’s data team intends to double down on enabling self-service, ensuring that any onboarding team member can quickly learn about their data estate, and that a spectrum of users, from engineers to product and project managers, can find answers in a process as simple as clicking a link to an Atlan asset profile.
And now that their data estate is mapped, with sources identified and critical assets enriched and accessible to users, the data team is beginning an exercise of better securing sensitive information. Using Atlan, the data team will tag assets that contain sensitive data, such as personally identifiable information, then institute masking and access policies that ensure an asset is properly secured no matter its source.
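The source does not describe how these policies will be implemented, so the following is only a hedged, in-memory sketch of the general pattern of tag-driven masking. The tag names, roles, column names, and the `apply_masking` helper are all hypothetical. A value is shown only if the viewer's role is allowed for every tag on its column; untagged columns are left visible.

```python
from __future__ import annotations

MASK = "***MASKED***"

# Hypothetical tag assignments, as they might be recorded in a catalog.
COLUMN_TAGS: dict[str, set[str]] = {
    "travelers.email": {"PII"},
    "travelers.home_airport": set(),
    "expenses.card_number": {"PII", "FINANCIAL"},
}

# Hypothetical access policy: roles allowed to see values carrying each tag.
TAG_ALLOWED_ROLES: dict[str, set[str]] = {
    "PII": {"privacy_officer"},
    "FINANCIAL": {"privacy_officer", "finance_analyst"},
}


def apply_masking(row: dict[str, object], role: str) -> dict[str, object]:
    """Mask every value whose column carries a tag the role may not see."""
    masked: dict[str, object] = {}
    for column, value in row.items():
        tags = COLUMN_TAGS.get(column, set())
        # The role must be allowed for every tag attached to the column.
        allowed = all(role in TAG_ALLOWED_ROLES.get(tag, set()) for tag in tags)
        masked[column] = value if allowed else MASK
    return masked


row = {
    "travelers.email": "a@example.com",
    "travelers.home_airport": "SFO",
    "expenses.card_number": "4111111111111111",
}

analyst_view = apply_masking(row, "finance_analyst")
officer_view = apply_masking(row, "privacy_officer")
print(analyst_view)
print(officer_view)
```

Because the policy is keyed on tags rather than on individual tables, the same rule applies wherever a tagged column appears, which is the property the paragraph above describes.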
Looking back on what the organization’s data team has been able to accomplish in a short time with Atlan, their data leader reflects on the seismic shift forward that a modern data catalog represents to him and to his team.
“What was the impact of Google Maps for you? What was the impact of going from a compass to a map? It’s huge, but can you quantify it? Can you put a monetary value on that? That’s hard to do.
Trying to chase pieces of code, or figuring out lineage in your head, it’s really inconvenient. It saves you so much time as an engineer, as a data professional, to be able to deliver really quickly. You can find or offer answers right off the bat, without wasting half an hour chasing some problem and trying to understand the lineage, and the sources of your code.”
Director of Data Platform
Photo by Anete Lūsiņa on Unsplash