Driving a Shared Language for Data with Atlan
The Active Metadata Pioneers series features Atlan customers who have recently completed a thorough evaluation of the Active Metadata Management market. Paying forward what you’ve learned to the next data leader is the true spirit of the Atlan community! So they’re here to share their hard-earned perspective on an evolving market, what makes up their modern data stack, innovative use cases for metadata, and more.
In this installment of the series, we meet Richard Goerwitz, Data Engineering Architect, Brian Kim, Data Analyst II, and Amit Kini, Data Engineer III at Foundry, an organization dedicated to empowering decentralized infrastructure for institutions seeking tools for mining and staking digital assets. They share how Foundry has quickly matured its data technology and team, and how a modern data catalog will now drive discoverability, establish a shared language for their data assets, and accelerate their Data Governance journey.
This interview has been edited for brevity and clarity.
Could you tell us a bit about yourselves, your backgrounds, and what drew you to Data & Analytics?
I’m currently working as a Data Analyst at Foundry, and I was first exposed to the data field while studying business analytics in grad school. I’ve always compared a data analyst to a chef. Raw data is the main ingredient, and analysts slice, dice, and cook it into something that’s consumable for the end users. The fact that my insights feed data-driven decisions has always made data & analytics interesting to me.
At Foundry especially, all key business decisions are heavily data-driven. That’s why I spend most of my time proactively analyzing and visualizing data for teams like the Business Development and Finance teams, so that they can navigate opportunities with less risk.
I’ve been in the data engineering field for about 15 years, and I’ve spent most of those years working in the sports-related data & analytics industry. As a Data Engineer at Foundry, I am currently focused on implementing the enterprise data warehouse model, onboarding new data technologies, and working closely with Atlan.
I have been exposed to databases and systems administration for almost 30 years now. My role at Foundry is to bring knowledge of data pipelining and overall tooling, and to bridge data analysts, data scientists, and data engineers. The goal is to get everybody firing on all cylinders and utilizing the technology at hand. I’ve spent a lot of time introducing concepts to the team such as the standard data lake model, the advantages of columnar databases in analytics, the importance of data catalogs & federation, and many more.
Would you mind describing Foundry and your data team?
Foundry is best known for the Foundry USA Pool, currently the world’s largest Bitcoin mining pool, but we also provide comprehensive, best-in-class services for institutional miners, staking customers, and blockchain entrepreneurs, giving them the tools they need, including services around mining equipment such as logistics and machine deployment.
We always refer to the data team as the data family because, despite a number of different roles and titles, we all work very closely together to achieve our goals. The data analysts work to understand the business teams’ needs & wants and determine the data requirements. Then, the data engineers provide their feedback on best practices and pipe data into the data lake.
What does your data stack look like?
Foundry was a young organization when I first joined, and our data engineering has matured significantly in a short amount of time. We’ve built out substantial infrastructure, such as an enterprise data warehouse and data lake, that will better surface data and support data-driven decisions.
Why search for an Active Metadata Management solution? What was missing?
The people with their hands on the data have a good understanding of it, but the data team still runs into questions around where certain data elements can be found, what they look like, and who to contact to get access. There is a lot of verbal exchange of information to get the answers to these questions, and it takes a lot of time.
What we are working towards is implementing more tools and processes to help our teams self-serve wherever possible, especially as we move into a domain data mesh, where teams are supposed to be able to operate somewhat independently by design. You really can’t do that without a good cataloging and governance tool.
As our organization matures, we are collecting and storing data faster than ever, and our sources have become more diversified. One of the most prominent issues that I personally face is data exposure. We have rich data; however, people outside of the data team are uncertain about where it lives and what data we can provide. As a matter of fact, the most common questions that I run into are “Can you look for this data?”, “Is this data available?”, and “Can you provide this data?”. Oftentimes, these questions could simply be self-served if we had the right tool in place. In addition, I have seen cases where people refer to the same terminology with different meanings. Getting everyone aligned and communicating more efficiently is another big goal that I am looking forward to accomplishing.
Why was Atlan a good fit? Did anything stand out during your evaluation process?
Our team had a very good experience with Atlan in terms of its usability and how it’s focused on our use cases. Personally, I found Atlan’s documentation very solid whenever I searched for what it’s capable of and what we can achieve with it.
We looked at three products before choosing Atlan. Atlan builds on open source, has made some strategic moves into AI, and doesn’t carry a lot of legacy infrastructure since it started in a cloud environment. Those things, in addition to a nice interface, made us decide in favor of Atlan.
What do you intend on creating with Atlan? Do you have an idea of what use cases you’ll build, and the value you’ll drive?
The business glossary and data discoverability are primarily what we are looking to implement in the near future, but there are other use cases that we are interested in learning more about. I am personally interested in data lineage, root cause analysis to see how data flows through various systems, and data tagging.
Atlan will be great for exposing data through cataloging and getting people aligned through the glossary. One additional use case of Atlan for us will be enhancing data governance. This is a topic that multiple teams across the organization are interested in, such as the Legal & Compliance, Security, and DevOps teams.