Improving Discoverability and Accelerating Migrations with Atlan
The Active Metadata Pioneers series features Atlan customers who have recently completed a thorough evaluation of the Active Metadata Management market. Paying forward what you’ve learned to the next data leader is the true spirit of the Atlan community! So they’re here to share their hard-earned perspective on an evolving market, what makes up their modern data stack, innovative use cases for metadata, and more.
In this installment of the series, we meet Jorge Vasquez, Director of Analytics at DataCamp, who shares how a leader in data education is modernizing its own data function and technology, the role Active Metadata Management can play in improving data discoverability, and why lineage is so important to DataCamp as they continue to introduce new tools and capabilities.
This interview has been edited for brevity and clarity.
Could you tell us a bit about yourself, your background, and what drew you to Data & Analytics?
I’ve had an interesting journey with both Tech and Analytics. I was able to do internships at a bank, which was really fun. I also spent almost a year as an intern at BlackBerry, one of the biggest Canadian tech companies.
When I graduated, I wanted to continue working in tech, so the first thing that I did was get a job at a startup in Vancouver, which was tremendous fun.
After that, for me, it was all about the fact that there were a lot of skills that I’d learned, and that was probably the first time that I started doing A/B testing and a lot of data stuff. I said, “Well, I really like this.” So, I got a job at Best Buy Canada in the e-commerce technology team, and it was the best next step in my career.
There was no formal data and analytics team at Best Buy, so they hired a manager to start that team. At the time, I was doing a lot of data-related stuff with web analytics, and I knew how to program in R, so he decided to give me my first chance in analytics as the first official analyst on the team.
From then on, I had the opportunity to do a lot of really cool things implementing analytics projects. So I built the first BI dashboard and then helped implement it across Best Buy, and then helped implement the web analytics system. Implementations of clickstream tools require quite a bit of work, and I helped with all those things.
Then, with my manager, the two of us started growing the team, doing the first data science projects like text analytics and forecasting. We started getting into all the cool stuff that existed in data and analytics at the time. With the support of Best Buy’s leadership, we were able to build one of the best data teams in Canada and grew it to support teams across the whole organization.
And then, at that point, it had been almost eight years at Best Buy. Retail is really fast-paced; it was a lot of fun, and I learned a lot working with amazing people. But it was time. I wanted to go back to technology and give it another try. I like building things from scratch, which opened the door for DataCamp.
I was preparing for an interview using DataCamp, and I clicked on their hiring button. They called me the next day, and I started the process. Now, here I am, traveling the world, loving my life, working for DataCamp, and it’s been an amazing experience.
My focus has been just really building that foundation for data. We have really, really phenomenal people that have been doing amazing things.
Would you mind describing DataCamp and how your data team supports the organization?
At DataCamp, we have a mission of democratizing Data and AI education across the world. I joined because of that mission. I truly believe in that.
DataCamp serves both individuals and organizations in their upskilling journeys, but a big part of our learner base also comes from our Donates & Classrooms programs, where we support underserved communities with data education worldwide: in the United States, in Africa, and in many, many different places. I love that. That’s our mission. That’s why we exist as an organization, to give people opportunities so they grow and can leverage Data and AI in really valuable ways.
Now, when we look internally at DataCamp and how the data team supports the organization, we have a very simple mandate of enabling decision making with data. For the Analytics organization that I represent and also for DataCamp’s Data Engineers and Data Scientists, that’s why we exist. We’re all here to ensure that if you’re in Sales, if you’re in Finance, if you’re in Engineering, you can easily make a decision using data. Of course, we understand that not all decisions need to be made with data, and not all decisions can be made with data, so it’s about being a data-informed culture.
Another important thing in terms of how we support the rest of the organization is that one of our values as a company is transparency, and we take it seriously. So, it’s all about making sure that people have access to the right data as quickly and easily as possible while maintaining a strong governance framework.
As much as we’re permitted to, based on our governance strategy, we want people to look at the right data to make decisions, and that means that we need to have the right tooling that enables us to follow through on this principle.
What does your data stack look like?
Part of our original data stack was built internally, which drove tremendous value for our stakeholders and drove DataCamp’s growth. I give full credit to those original team members who did incredible work and have prepared us to start the next stage of our journey. As DataCamp continued to grow, we reached a new phase of our technical journey. As our needs changed, we realized that it would be better to invest in tools that are easier to scale and maintain and that have a special focus on governance as well.
We’ve recently completed two big migrations, moving to a new data warehouse and choosing a new clickstream system. And from the dashboarding side, we have a mix of open-source and enterprise SaaS solutions but are moving to new tooling to better align with the architectural and warehousing decisions we’ve made this year. From a data notebook perspective, to do more ad-hoc analysis, we’re heavily investing in our own tool, which is called Workspace, an AI-powered data notebook that’s easy to use.
Why search for an Active Metadata Management solution? What was missing?
One of the biggest challenges we had as an organization was the discoverability of our data ecosystem. The data team did a great job documenting the metadata for most of our warehouse and BI tools. However, this documentation was scattered across multiple tools and formats and was not consistently available for all of our assets. As a result, it was difficult for non-technical users to navigate the entire data ecosystem, especially if they also needed institutional knowledge to use it properly.
So, for us, finding a way to make it easy for people to understand a single version of the truth was key. For example, if you’re in Engineering and you want to search for active users last week, you should understand the definition of active users from the data catalog because there are many ways to define it, and you should be able to easily write a query or use the correct dashboard.
I do want to clarify that a data catalog is great, but it takes effort to fill it out with the proper definitions and agreements. All of that work is happening, and it will be a lot easier when everything exists in one place. If I want to discover the dashboard that I need to use for weekly reporting, I can just go into my data catalog and just search for “Weekly Reporting Dashboard” and it’s verified, it’s been reviewed, and it has all the commentary from the data team.
Then the other reason that became important to us is having the ability to manage the lifecycle of data assets. Let’s say, for example, we want to deprecate assets that are not being used, like specific tables or parts of our warehouse. We wouldn’t have that visibility without a catalog. There are ways we could have inferred that lineage, but we didn’t have a proper lineage tool, and these other methods were too expensive for us.
To give you an example, consider when we were deprecating our web analytics clickstream tool. The way that tool works is that you embed it in the code of your site, and it collects clickstream data (clicks and other user behavior) and sends it into your data warehouse in real time.
The problem is that as we wanted to move towards another tool, we needed to understand where all that data from our previous tool was being sent, and it took a lot of time for one analyst to figure out where all that data was going and how it was being consumed without a proper lineage tool.
The idea is that lineage allows us to see what is being used, what is not being used, and opportunities to reduce the cost of the migrations we still have to do. Having lineage allows us to lessen the costs of deprecating and migrating tooling by a lot, and it would have saved us a lot of time to have it a year ago when we were deprecating our clickstream tooling. We had to spend a lot of time just looking into what the dependencies were.
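As a rough illustration of the dependency question described above ("where is this data going, and who consumes it?"), the following is a minimal sketch of a downstream traversal over a lineage graph. It is not how Atlan or DataCamp implements lineage; the asset names and the `LINEAGE` mapping are hypothetical and exist only to show the idea.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
LINEAGE = {
    "clickstream.raw_events": ["warehouse.sessions", "warehouse.page_views"],
    "warehouse.sessions": ["bi.weekly_reporting_dashboard"],
    "warehouse.page_views": [],
    "bi.weekly_reporting_dashboard": [],
}


def downstream(graph, source):
    """Return every asset reachable from `source`, in breadth-first order."""
    found = []
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


# Everything that would be affected by retiring the old clickstream feed.
affected = downstream(LINEAGE, "clickstream.raw_events")
print(affected)
```

An asset whose traversal comes back empty has no recorded consumers, which makes it a candidate for deprecation, subject to checking usage that the lineage graph does not capture.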
Why was Atlan a good fit? Did anything stand out during your evaluation process?
There’s a bunch of reasons. We started the search by looking at all the tools that exist in the market, starting with the Gartner reports. That’s how we heard about Atlan for the first time.
The first factor was ensuring that there was price flexibility to adjust to our data journey stage because Atlan is an enterprise tool, but we needed to make sure that it was within the right price range. Atlan adapted to the type of pricing that we needed for our organization and our current stage in our data maturity. So, it was very flexible in that regard.
We did multiple proofs of concept, and it ended up being a decision around a number of features.
There was the quality of the business glossary in terms of how easy it is to use it, update it, and how easy it is to leverage it. Then, figuring out how easy it is to collaborate was a big one, as well. There are a lot of catalogs, and with some, it’s hard to really collaborate with multiple people to add things to it.
The fact that Atlan had column-level data lineage for our warehouse and BI tools was a big, big factor for us. Not all tools have column-level data lineage. Some tools have lineage, but it’s just, for example, table-level, which is not as useful compared to column-level.
The data connectors were a major factor because, as part of this investment, we expect to save engineering hours in the long run. We hope that not having to build and maintain those pipelines will allow our team to focus on other high-ROI tasks.
Finally, data discoverability, as I mentioned, was one of the biggest pain points that we were trying to solve. When we compare data discoverability with other tools, Atlan’s UI makes it a lot easier. The fact that it has a plug-in for Google Chrome that allows us to look at data against our warehouse and BI Tools makes it a lot easier for our users because there are two audiences for the product.
We have the data team that leverages the functionality of data lineage, but we also have our stakeholders who want to use the product. It’s not only for the data team, and if we ask people to go into a data catalog all the time, which would be an extra tool to do things, it will make it a bit harder to drive that adoption and that discoverability. But if we can be where they already are with the Chrome Plug-in, I think that is a big incentive. That UI/UX factor is important for us to drive the adoption of the tool. As a world-class data team, we need to have world-class tools.
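To make the column-level versus table-level distinction from this answer concrete, here is a small sketch, assuming a hypothetical `COLUMN_LINEAGE` mapping with made-up table and column names. It shows what is lost when column edges are collapsed to table edges; it does not reflect any particular catalog's data model.

```python
# Hypothetical column-level edges:
# (table, column) -> list of downstream (table, column) pairs.
COLUMN_LINEAGE = {
    ("users", "id"): [("active_users", "user_id")],
    ("users", "last_seen_at"): [("active_users", "is_active")],
    ("users", "legacy_flag"): [],
}


def table_level(column_edges):
    """Collapse column edges into table -> set of downstream tables."""
    tables = {}
    for (src_table, _column), targets in column_edges.items():
        tables.setdefault(src_table, set()).update(t for t, _ in targets)
    return tables


def unused_columns(column_edges):
    """Return the columns that have no downstream consumer."""
    return sorted(col for col, targets in column_edges.items() if not targets)


print(table_level(COLUMN_LINEAGE))
print(unused_columns(COLUMN_LINEAGE))
```

The table-level view only says that `active_users` depends on `users`, so every column looks in use. The column-level view shows that `legacy_flag` feeds nothing downstream, which is the kind of detail that matters when deciding what can be safely deprecated.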
What do you intend to create with Atlan? Do you have an idea of what use cases you’ll build and the value you’ll drive?
There’s a lot that we want to drive. The first one, in the short run, is being able to solve discoverability and lineage. Those are the two that we’re hoping to solve as best as we can. Not perfectly, but at least everyone should be able to say, “Where can I find this data? What is the definition of this metric?” For that question, you can go into Atlan, use the Chrome Plug-in, or use the Slack integration to get an immediate answer. Through that discoverability, we expect a lot more usage for the rest of our data stack. We’re making all these big investments, and Atlan, ideally, is going to help increase the ROI of those investments.
The second one will be using lineage to help us identify what is being used and what is not being used and reduce the cost of our future migrations. The idea is that we solve those two problems in the short run, and that’s where we expect to put most of our energy in this first iteration.
The second iteration of Atlan involves leveraging it in more creative ways. There are probably two areas where there are going to be some opportunities.
One is being able to integrate it more deeply with data observability tools to see the quality of our data. Being able to pass more of that information into a tool like Atlan allows us to better prioritize with our stakeholders. I’ve seen some demos from Atlan, and you can see, “Okay, this table has nine columns, and eight are verified. One is not verified.” Having that visibility on the overall quality of our data is going to be important.
The other part is going to be around what I mentioned about Workspace (DataCamp’s data notebook). We want to connect new assets that are not traditionally thought of as assets. The problem for us is that we’re creating a lot of insights that are generated in SQL, R, and Python, and we want to make sure that this information is properly connected and properly discoverable as well. So it’s also for us to innovate, using Atlan not only as a general data asset repository but also as an insights repository.
So taking it a bit to that next level to not only tell me about, “Hey, what about this table?” But to be able to search for an actual analysis. “Hey, what about the A/B test on the homepage?” We should be able to really answer that question, and we’re hoping that it’s possible.
We’re excited to try and test Atlan in new, different ways and take it in new directions to see what is possible.