Guest blog by Katarzyna Bodzioch-Marczewska, Solutions Architect at Brainly
Data governance is a critical aspect of any organization, and it becomes even more important in a distributed model (read about The Brainly Model here), where teams are independent and have their own data. In such a scenario, the discoverability of data becomes a significant challenge, as teams need a way to share information about the data with other teams. With our teams growing rapidly in a remote setting, and each team owning its own data silo, it can be difficult for other teams to find and access the data they need.
To address this challenge, Brainly decided to implement a data catalog.
The requirements we gathered included the following:
- Metadata of all of our data assets in one place (S3, Tableau, Redshift, Snowflake, BigQuery)
- Discoverable data assets (simple and broad search capabilities, so that relevant data can be found quickly across all of our assets, along with its context)
- Collaboration and trust (the tribal knowledge of various teams gathered in one place)
- Fewer dependencies between business, analysts, and engineers (easy access to documentation for everyone, and the ability to find data-related answers on their own)
- Ability to show where the data comes from (visual lineage of dependencies between the data objects and how the data flows throughout the organization)
After evaluating various vendors and going through several Proofs of Concept, we chose Atlan as our data catalog. The main reasons behind that choice include:
- The functionality we wanted worked as we expected
- The tool was very intuitive and simple to use
- All of our data tech could be integrated
- Very good support from the vendor
- Reasonable cost
But as we know, tools themselves do not solve any problems… We integrated all of our assets into Atlan… And that was where the interesting part began…
Once we had the technical metadata in, we needed to focus on the context. And to find and collect it, we needed (and still need) a change in the company culture among the Data People: an appreciation of the data asset’s documentation as part of the data product itself.
To make that shift, we implemented a gamification plan to engage teams and raise awareness of the importance of documenting data assets. Through this initiative, we were able to get over 200 tables documented and shared across teams. The plan involved setting up a leaderboard, where teams could earn points for documenting their data assets and sharing knowledge about the data. This created a friendly competition and helped to raise awareness of the importance of data governance. We awarded nice prizes to the winners of the competitions, including t-shirts that, by the way, became legendary after a few months.
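For readers who want to try something similar, here is a minimal sketch of how such a leaderboard could be computed. It is not the scoring we actually used: the point values, the `Contribution` type, and the team names are all hypothetical and only illustrate the idea of awarding points per documented asset or shared piece of knowledge.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical point values; the real scoring rules are not described here.
POINTS = {"table_documented": 10, "knowledge_shared": 5}


@dataclass(frozen=True)
class Contribution:
    team: str
    kind: str  # must be one of the keys in POINTS


def leaderboard(contributions):
    """Return (team, points) pairs, highest score first."""
    totals = Counter()
    for c in contributions:
        totals[c.team] += POINTS[c.kind]
    # Sort by points descending, then by team name so ties have a fixed order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Contribution("team_a", "table_documented"),
        Contribution("team_b", "knowledge_shared"),
        Contribution("team_a", "knowledge_shared"),
    ]
    for team, points in leaderboard(sample):
        print(f"{team}: {points}")
```

Keeping the point values in a single dictionary makes it easy to rebalance the competition, for example by rewarding documentation more heavily than other contributions.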
But that was not enough. We learned that the key to successful data governance is clear ownership. Wherever the ownership of data was clear, teams were more engaged and willing to document and share their data assets. However, in areas where ownership was unclear or blurry, the documentation remained poor. This highlights the importance of establishing clear roles and responsibilities for data ownership and access within an organization.
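One way to make ownership gaps visible is to audit an export of the catalog metadata. The sketch below assumes a hypothetical `Asset` record with `owner` and `description` fields; it does not use any real catalog API, and the field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Asset:
    # Hypothetical shape of one exported catalog entry.
    name: str
    owner: Optional[str]
    description: Optional[str]


def unowned(assets):
    """Names of assets that have no owner assigned."""
    return [a.name for a in assets if not a.owner]


def documentation_coverage(assets):
    """Fraction of assets with a non-blank description (0.0 for an empty list)."""
    if not assets:
        return 0.0
    documented = sum(1 for a in assets if a.description and a.description.strip())
    return documented / len(assets)
```

Running `documentation_coverage` separately on owned and unowned assets is a simple way to check whether unclear ownership and poor documentation go together in your own catalog.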
As we are on our way to adopting Data Mesh (read about our journey here), we plan to address these issues during our migration to Snowflake. Data Mesh is a cultural and technical concept that aims to decentralize data management and enable teams to own and operate their own data services. By adopting a more distributed approach to data ownership and access, we hope to improve data discoverability and governance across Brainly.
In conclusion, implementing a data catalog and a gamification plan helped our company improve data discoverability and governance. Clear ownership, with clear roles and responsibilities for data management, is crucial. As we migrate to Snowflake, we will continue to improve our data governance and make sure that teams can easily access and share data across the organization.
Stay tuned for updates on our progress.
Thanks to Brainly for writing this amazing article! 💙
This article was originally published by Katarzyna Bodzioch-Marczewska on the Brainly Technology Blog.