Revolutionizing Data Governance and Literacy, and Increasing Governance Team Efficiency by 40% with Atlan
At a Glance
- Porto, a Brazil-based insurance and banking leader with 13 million clients, sought to replace a legacy data catalog to maximize the ROI of new data technology, and improve data literacy across their organization
- Choosing Atlan as their new modern data catalog, Porto’s data governance team released the initial version of Datapedia, a self-service source of truth about their data assets, in less than six weeks
- Porto’s team has since automated the governance of over 1 million data assets, saving their 5-person team 40% of their time, and is nearly one-third of the way to its goal of 1,000 users of the tool by 2025
Founded in 1945, Porto is an insurance and banking giant based in Brazil. Starting in the insurance sector, the organization has since grown to 13,000 employees and 17,000 contractors, serving more than 32,000 brokers and 13 million clients across a diverse set of insurance and banking products and services.
With such a diverse set of business lines, and millions upon millions of customers, Porto sits atop oceans of data, and among the key people responsible for stewarding and activating this data is Danrlei Alves, Senior Data Governance Analyst.
“Porto has a goal of becoming a kind of lifetime company, present in every moment of a person’s life, be it through insurance, through banking services, through health insurance, or through life insurance,” Danrlei explained.
Crucial to Porto’s continued growth is this multi-product strategy, ensuring that their customers’ every financial need is met, at each stage of their life. And guiding that strategy is a modern data stack that surfaces Porto’s vast enterprise data, and a Data Governance team responsible for making it discoverable, understandable, and actionable across dozens of business domains.
With the help of best-of-breed technologies and business colleagues eager to make the most of new data capabilities, Porto’s Data Governance team has driven data literacy to new heights at Porto, building Datapedia, their Atlan-powered catalog designed to democratize data in every corner of their business.
Trading Legacy for Modernity
Over the course of Danrlei’s career at Porto, their ecosystem has rapidly modernized, deprecating legacy tools that no longer fit their needs, and adopting modern technologies like BigQuery, Tableau, and Atlan.
“I joined Porto to work on Data Quality. Back then, my job was to develop services and products for data quality and data validation,” Danrlei shared. “Eventually, due to my background in Data Governance, I was invited to help the Data Governance team, and our team today covers both governance and quality.”
Danrlei’s Data Governance team serves as somewhat of a central platform team, responsible for the myriad data ecosystems at Porto, and ensuring their data lakes and warehouses are well organized, well documented, compliant, and secure.
Porto’s data platform begins by consuming data from an array of on-premises systems, databases, and warehouses across technologies like Oracle, MySQL, SaaS tools, and MongoDB databases. Further downstream, their data stack is primarily composed of Google Cloud Platform products, using Airflow to ETL data from source systems into BigQuery, then consumed in Tableau, PowerBI, and Google Data Studio.
More recently, Porto has migrated away from Informatica tools like Data Engineering and Enterprise Data Catalog in favor of more modern cloud solutions that better fit in their ecosystem and match their lofty expectations for their data.
“These are solutions we don’t use anymore, and have replaced them with more modern solutions like Google Cloud Platform, Azure, and Atlan, as well,” Danrlei explained.
An Urgent Search for a New Catalog
In the midst of their modernization, it became clear to Danrlei and to the team that their existing data catalog solution was making it difficult-to-impossible to yield the maximum possible ROI from their data stack, and to achieve their data literacy goals. The Data Governance team had done great work ensuring their data lakes and warehouses were well-organized, but with a disorganized catalog, Porto’s data consumers would be hard-pressed to benefit from that work. Searching for a new solution became an urgent priority.
“We used to have a data catalog. We had Informatica EDC. The motivation to look for a different solution came from some difficulties in implementing certain features like lineage, a business glossary. It was kind of frustrating, and it came to a point where we felt like the technology was kind of blocking us from reaching where we wanted to go. And that was the spark to look into the market for new technologies and platforms.”
Danrlei Alves, Senior Data Governance Analyst
Danrlei and his team began by evaluating the largest SaaS vendors in the market, as well as open-source solutions. As their research progressed, a member of Porto’s data team met Atlan at the 2022 Snowflake Summit, informing Danrlei and his team about Atlan’s potential fit when he returned home.
“In terms of values, we were very much aligned. We felt that a core objective of Atlan was to be user-friendly and to form collaboration. And those were the aspects we were also looking at,” Danrlei shared.
And while open-source solutions were intriguing, Danrlei and the team were eager to avoid the support issues that had plagued their previous data catalog projects.
“When we talked about going with an open-source solution, the main aspect was having the confidence that Atlan would be a partner and would help us with any problems we might have had in terms of providing support or new capabilities,” Danrlei shared.
If Porto had adopted an open-source data catalog, the responsibility for resolving technical issues would fall to their own data teams. Furthermore, implementing new features could require custom coding, or costly implementation as they were slowly released into the framework. But foremost on Danrlei’s mind was time to value.
“We wouldn’t have been able to build Datapedia so quickly in an open-source environment,” he explained. “We would have gone through many, many more tasks and InfoSec and architectural requirements. We would need infrastructure, to have support people available, and implement it ourselves. It would be much more effort, and we wouldn’t be able to reach our goal in time.”
Furthering their support requirements was an emerging federated data governance model at Porto, serving as a precursor to Data Mesh. In the future, Danrlei and his team aimed to enable data domains to autonomously develop data pipelines, and be responsible for data literacy around these products. While traditional data catalogs and open-source solutions may have been capable of a metrics catalog, the deeper level of cross-functional collaboration needed to execute Data Mesh meant that Atlan became Porto’s partner of choice going forward.
“Atlan kind of fit like a glove, because we were very much aligned on all these aspects. The user experience was also very important to us, and we felt it was the most user-friendly interface we had seen from any other provider. From a cost perspective, it also had the best cost-benefit ratio for us.”
Danrlei Alves, Senior Data Governance Analyst
From Login to Rollout in Less than Six Weeks
Danrlei’s team began by integrating key data sources with Atlan, setting up permissions, and inviting users to test these new connections. With Atlan effectively crawling Porto’s data estate, the team then moved to build content parity with their previous data catalog on Informatica.
“We had a catalog before, and we had a lot of documentation there,” Danrlei explained. “We needed to export all the metadata, adjust that metadata, and input it via API.”
Using Atlan’s open API, Danrlei’s team pushed their existing business terms and glossaries from Informatica into Atlan, where they created a clean migration target in the form of custom metadata fields, matching the way relationships between tables, columns, schemas, and databases were defined in their legacy catalog. Finally, the team defined user personas and created a tag structure in Atlan, mapping permissions for each persona to the data they were authorized to access.
“That was the bare minimum. And we said, ‘If we want to roll out the platform, we can.’ We connected our sources, we have the metamodel implemented, and we have the documentation,” Danrlei explained. “We had a catalog. Users weren’t going to feel any negative aspects from the change, and they weren’t going to miss anything from our past catalog. We wanted that feeling.”
In less than six weeks, thanks to their team’s hard work and Atlan’s pre-built connectors, open API, and personas, Porto’s team reached content parity with their previous data catalog, and were ready to release it to data consumers.
“We wanted to have everything from the previous catalog in Atlan before considering rolling it out. And we have done that. That journey was maybe a month, a month and a half.”
Danrlei Alves, Senior Data Governance Analyst
Game-changing Access to Data
While the Data Governance team was able to release Datapedia at unprecedented speed, the most grateful beneficiaries of this work are Porto’s data consumers, including Pedro Ribiero, Chief Data Scientist for Porto’s R&D Division.
“I always wanted to be a scientist, ever since I was little. I blame Jurassic Park,” Pedro shared. “I was in the middle of my Master’s when I was asked to work in Data Science, and I found I liked it a lot. Then here I am, six years later.”
Data Scientists at Porto are embedded within teams across the company, and are responsible for locating and sourcing data for advanced modeling. Before the introduction of Atlan, Pedro’s team was familiar with a small sliver of Porto’s data that they worked with every day, but found it difficult to understand the data available to them, and if that data was appropriate for modeling.
Siloed Knowledge, Missing Metadata
Locating and understanding data was complicated by the sheer size and complexity of their organization. Porto has roughly 14,000 employees, and has been in operation since 1945, leading to siloed infrastructure and knowledge. In order to obtain context prior to modeling, Data Scientists would scour the organization, attempting to find the subject matter expert who could answer their questions.
“You needed to determine who knew in which database your data was located, assuming it existed,” Pedro explained. “So you were trying to find someone that you didn’t know existed, and you didn’t know if that person wanted to talk to you or had the time. If you found that person, you wouldn’t find anything that resembled metadata. You wouldn’t know what a column meant.”
Do-it-yourself Data Discovery
In Datapedia, Pedro and his team were offered an unexpected leap forward in literacy that would change the way they worked: A single pane of glass to discover, understand, and apply Porto’s data, in less time than ever before.
“I didn’t know the Governance team was working with Atlan on Datapedia. I got an email asking me to check it out, and they explained to me the entire context. I was very skeptical. I said ‘I don’t believe you. That could never be done.’ But I spent two or three hours checking it out, and it was actually very useful.
“It had a database that I used a lot. And before Datapedia, I had talked with 20 or 25 people to try to establish the metadata for just some of those columns. And when I saw that it was systematized (in Datapedia) with a nice interface that made sense, that I could check for things, and then look at the structural integrity and see a sample, I almost wept. I said, ‘That is not possible. I can’t believe it.’”
Pedro Ribiero, Chief Data Scientist
While Porto’s data scientists once practically searched through an org chart to find answers to their questions about data, Datapedia has enabled them to locate all available data, and the context around it, in a simple search. And rather than spending time locating data and understanding its relevance, a quick glance at the definition of a table means data scientists can continue searching, or begin designing a better data model, sooner.
“As it grew, almost all of our data was inserted into it, and it really changed how we work. The one sense that is prevailing in every interaction I have with the tool is ‘Wow, I can do this by myself.’ That’s a freeing sensation,” Pedro explained. “I just type what I would like to find out, and if it’s within our reach, then it will just pop up. I try not to advocate for any kind of tool, but I will advocate for Datapedia and Atlan because it really is tremendously useful.”
Driving a 40% Efficiency Gain by Automating Data Governance
Porto’s Data Governance team includes just five members, and while they’ve achieved remarkable velocity releasing Datapedia, they are ultimately responsible for a vast data estate, containing upwards of 1 million assets.
To extend their reach across this large amount of data, Danrlei’s team relies on Atlan Playbooks, rules-based bulk automations, to significantly reduce the manual effort they once spent defining asset owners, enriching data assets, and securing sensitive data. As a result of these automations, Porto’s five-member Data Governance team has yielded a 40% reduction in time once spent on manual governance tasks.
“If we consider everything we’re doing now with Atlan compared to before we had Atlan, we are saving 40% in efficiency, in terms of time and expensive operational tasks for everything related to governance. This is a 40% reduction of five people’s time. We’re using the time savings to focus on optimizing our processes and upleveling the type of work we are doing.”
Danrlei Alves, Senior Data Governance Analyst
Despite already driving a significant leap forward in efficiency, Danrlei and his team are planning to place even more automations for manual work into practice, such as regular updates to the ownership metadata of their dashboards.
“It has been amazing. With the time we saved, we are increasing the scope of work we have, and projects we do,” Danrlei shared. “For the manual activities we still have, we’re finding a way to automate them, too.”
Driving this 40% gain in productivity are a number of automation workstreams that ensure Porto’s data assets have assigned owners, documentation is provided, and that sensitive data is secure.
Automated Ownership
The foremost use of Playbooks relates to Porto’s customer data, where indicating ownership is crucial. By understanding the source of customer data, such as a data lake belonging to Porto’s banking division, or a team responsible for a table, Atlan Playbooks automatically assigns the business domain or team responsible as the owner of each data asset in Datapedia.
Automated Documentation
Significantly reducing the effort spent on governance metadata across their roughly 1 million assets, Porto uses Playbooks to ensure that only the data assets that demand their personal attention are processed by the governance team, saving significant manual effort.
“We have tons of assets, and we don’t look at all of them the same way, or spend the same amount of effort on all of them. We have a classification for them that we call either complete governance, or simplified governance,” Danrlei shared.
Complete Governance refers to assets that require documentation, classification, data quality checks, and defined ownership. Simplified Governance refers to what are usually temporary and intermediate datasets that are not consumed by end users. These assets require mapped lineage, cataloging, and ownership, but do not require complete attention from the governance team, and are not enriched with classifications or descriptions.
Prior to adopting Atlan, Danrlei’s team would use a spreadsheet to manually update these assets in bulk, into Informatica Enterprise Data Catalog.
“Back in the day, we had to manually update new fields within EDC. So let’s say we’ve got a new table, or a new data set. We would need to classify that as either Complete or Simplified governance, then we would have to make a bulk upload into EDC so we could have that classification, but we ran into errors all the time,” Danrlei explained.
To significantly reduce this manual process, Danrlei’s team utilizes Atlan Playbooks. A list of preset rules factors in metadata such as usage from upstream systems, then automatically classifies data assets as either Complete Governance, placing the asset on a priority list for further enrichment and attention, or Simplified Governance, automatically enriching the asset, but indicating no further action is needed.
“The Playbook runs across our whole environment, over 1 million assets, and it classifies as Complete or Simplified based on the rule that’s passed. It’s easier with Atlan, because once we set the Playbook, it’s just going to run every day, or in the priority we’ve scheduled it,” Danrlei shared.
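The kind of rule Danrlei describes can be sketched in a few lines of Python. This is only an illustration of rule-based classification, not Atlan Playbooks or its API: the `Asset` fields, the `classify_governance` function, and the rule itself (temporary or unused by end users means Simplified) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # Hypothetical asset record; the field names are assumptions for illustration.
    name: str
    is_temporary: bool          # e.g. a staging or intermediate dataset
    end_user_query_count: int   # usage metadata gathered from source systems


def classify_governance(asset: Asset) -> str:
    """Return 'Complete' or 'Simplified' using one illustrative rule."""
    if asset.is_temporary or asset.end_user_query_count == 0:
        return "Simplified"
    return "Complete"


assets = [
    Asset("customer_policies", is_temporary=False, end_user_query_count=120),
    Asset("tmp_join_stage", is_temporary=True, end_user_query_count=0),
]

# Assets classified as Complete form the priority list for further enrichment.
priority_list = [a.name for a in assets if classify_governance(a) == "Complete"]
print(priority_list)
```

Because the rule is a pure function of an asset's metadata, it can be rerun on a schedule across the whole estate, and assets move between the two classes as their usage changes.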
Automated Compliance
Finally, the Data Governance team is using Atlan Playbooks to ensure that sensitive information such as names, email addresses, and account numbers has strict access controls in Atlan, and is properly masked in upstream systems.
As a Banking and Insurance organization with 13 million customers, fastidious protection of sensitive information is a top priority for Porto. And enforcing those requirements is LGPD, Brazil’s General Data Protection Law, a 65-article regulation defining the rights of subjects of personal data, and the conditions under which that personal data can be collected, processed, stored, and shared.
“We need to ensure compliance with Brazilian LGPD laws. We can face penalties for non-compliance, and could be subject to lawsuits and consumer backlash if we have a data breach,” Danrlei explained.
Porto conducts stringent internal audits to ensure compliance with LGPD, and government regulators are permitted to audit organizations at their discretion, making it all the more important that their data is identified, secured, and masked.
Here, too, Atlan Playbooks is driving better compliance and is reducing effort by automatically tagging PII data.
“We’ve been using playbooks to help us classify PII data. We have preset rules that look for patterns in the names and descriptions of the fields, and if those patterns are matched, we classify that as a potential PII field. This has saved us tons of hours,” Danrlei shared.
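A pattern rule of this kind might look like the following Python sketch. It is not Porto's rule set or Atlan's implementation: the patterns and the `is_potential_pii` function are hypothetical, chosen only to cover the examples named above (names, email addresses, and account numbers).

```python
import re

# Hypothetical patterns; a real rule set would be tuned to local naming conventions.
PII_PATTERNS = [
    # "name" as a standalone token, so "customer_name" matches but "filename" does not
    re.compile(r"(?<![a-z])name(?![a-z])", re.IGNORECASE),
    # "email", "e-mail", or "e_mail"
    re.compile(r"e[-_]?mail", re.IGNORECASE),
    # "account_number", "account no", "accountnum", and similar
    re.compile(r"account[-_ ]?(number|num|no)(?![a-z])", re.IGNORECASE),
]


def is_potential_pii(field_name: str, description: str = "") -> bool:
    """Flag a field when its name or its description matches any pattern."""
    return any(
        pattern.search(text)
        for pattern in PII_PATTERNS
        for text in (field_name, description)
    )


print(is_potential_pii("customer_name"))
print(is_potential_pii("col_17", "Customer e-mail address"))
print(is_potential_pii("premium_amount", "Monthly premium in BRL"))
```

A match only marks a field as potential PII. Name-based patterns produce both false positives and misses, which is why a follow-up check is still needed.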
Following assets being marked as PII, Danrlei’s team can now match the classification with logs from GCP Dataplex to confirm if the data is masked or not. If discrepancies are found, the assets will be manually reviewed in collaboration with their owners, ensuring that potentially costly compliance risks are identified and resolved earlier.
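The reconciliation step can be pictured as a simple comparison between two lists. The sketch below assumes the masking status has already been extracted from platform logs into a dictionary. That shape, the field names, and the `find_masking_discrepancies` function are hypothetical, and no Dataplex or Atlan API is called.

```python
def find_masking_discrepancies(pii_fields, masking_status):
    """Return PII-tagged fields that are unmasked or absent from the logs.

    pii_fields: iterable of fully qualified field names tagged as potential PII.
    masking_status: dict of field name -> True if masked (hypothetical shape,
    assumed to have been derived from platform logs beforehand).
    """
    return sorted(f for f in pii_fields if not masking_status.get(f, False))


pii_fields = {
    "bank.customers.email",
    "bank.customers.full_name",
    "ins.claims.account_number",
}
masking_status = {
    "bank.customers.email": True,
    "bank.customers.full_name": False,
}

# Each discrepancy becomes a manual review item for the asset's owner.
for field in find_masking_discrepancies(pii_fields, masking_status):
    print(f"Review with owner: {field}")
```

Treating a field that is missing from the logs as unmasked is a deliberately conservative choice in this sketch: it sends unknown cases to manual review rather than assuming they are safe.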
Building a Complete Lineage Graph with Atlan
Having already driven remarkable visibility across their data warehouses and lakes, the Data Governance team’s next priority will be enhancing Datapedia, and building their lineage graph both downstream and upstream.
With downstream assets in PowerBI and Tableau properly cataloged, Danrlei’s team can improve data literacy for an even larger number of Porto’s data consumers. With upstream source systems better connected to their data assets, quality, ownership, and literacy can be improved. And with an end-to-end view of their data estate, promising opportunities for storage and compute savings will manifest.
Achieving True End-to-end Lineage
The first priority for extending lineage is downstream toward their Business Intelligence tooling, where the bulk of data consumers interact with Porto’s data, and will benefit from understanding the context and ownership of dashboards.
“One of the directions that’s most exciting for me is the BI world. Most people are going to consume data from a dashboard, so we have already started cataloging our Tableau and PowerBI assets and that’s nearly finished,” Danrlei shared.
With the first step of cataloging their PowerBI and Tableau assets nearly complete, the Data Governance team will then identify the owner of each asset, and adjust their metamodel accordingly.
When Danrlei’s team has completed the process of extending lineage downstream to dashboards, their next priority will be pushing their lineage graph upstream. By mapping the systemic origins of this larger number of assets, Porto will better understand how their assets come into existence, better assess and improve quality, better define ownership, and improve data literacy even further.
“When we get those owners involved, get literacy for these assets, and eventually make these assets available to everyone, we hope to have true end-to-end lineage. We’ll capture lineage from the systemic data source, to the metric, to the dashboard that’s being used to report to the executive board a certain number. Of course, it’s aiming to make Porto more data-driven, making better-informed decisions, making less mistakes, and having more quality.”
Danrlei Alves, Senior Data Governance Analyst
Extending the Value of Lineage
Making the most of this expanding lineage graph is a priority for Danrlei’s team, who are considering how to improve their Root Cause Analysis and Impact Analysis processes using Atlan. But emerging sooner is an opportunity to optimize their data estate, using Atlan’s automated lineage to identify warehouse assets and pipelines that aren’t consumed downstream, saving compute costs and simplifying navigation.
“We’ve been using lineage to identify assets that aren’t being used. And that’s pointing us to the FinOps world where we want to find efficiencies, finding databases that aren’t being used so we can cut costs by dropping the table or the pipeline. Those candidates for deletion came from Atlan. They came from having the lineage, which is super helpful, and usage metrics,” Danrlei explained.
With candidates for deletion in-hand, the Data Governance team is collaborating with their colleagues in Porto’s broader data team to determine whether these assets and pipelines are deprecated, or if they should be maintained for future use.
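Finding such candidates amounts to a reachability check over the lineage graph: any asset with no downstream path to something people actually consume is a candidate for review. The Python sketch below illustrates the idea on a toy graph. The graph shape, the asset names, and the `find_unconsumed_assets` function are assumptions for the example, and it ignores the usage metrics Danrlei also mentions.

```python
from collections import deque


def find_unconsumed_assets(lineage, consumer_assets):
    """Return assets with no downstream path to any consumed asset.

    lineage: dict of asset -> list of its direct downstream assets.
    consumer_assets: set of assets consumed directly, e.g. dashboards.
    """
    # Build reverse edges so the graph can be walked upstream from consumers.
    upstream = {}
    all_assets = set(lineage)
    for source, targets in lineage.items():
        for target in targets:
            all_assets.add(target)
            upstream.setdefault(target, []).append(source)

    # Breadth-first search upstream: everything reached feeds a consumer.
    reachable = set(consumer_assets)
    queue = deque(consumer_assets)
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in reachable:
                reachable.add(parent)
                queue.append(parent)

    return sorted(all_assets - reachable)


lineage = {
    "raw.policies": ["dw.policies"],
    "dw.policies": ["dash.policy_overview"],
    "raw.legacy_quotes": ["dw.legacy_quotes"],
    "dw.legacy_quotes": [],
}

candidates = find_unconsumed_assets(lineage, {"dash.policy_overview"})
print(candidates)
```

The output is only a list of candidates. As described above, the decision to drop a table or pipeline still rests with the teams that own it.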
More than Just a Data Catalog
Day by day, the use of Datapedia grows within Porto, now approaching 300 users in just four months. And with a goal of 1,000 users by 2025, Danrlei’s team is building a foundation for not just a data catalog, but a vessel for democratizing knowledge across Porto’s entire business.
“Our idea was to build more than a data catalog. We don’t only want to have the tables, the columns, and the documentation, we want to expand our domains to go beyond data literacy,” Danrlei shared.
In Datapedia, Danrlei and his team envision not just a data catalog, but a hub of business knowledge where any employee can ask any question they might have about Porto’s business. And in this hub of knowledge, every detail an employee needs about concepts from Churn Rate to Lifetime Customer Value would be explorable and explained plainly, without the need to ask a single question of a colleague.
“I think users’ lives will be much easier, and they will be much more productive because they’ll be able to find any information they need, from anywhere in Porto, in one place, and they’re going to be able to get the context around that information that they need. They’ll know where data is, and who’s responsible for it. They’re going to be able to request access to information if needed, and they’re going to know who to talk to in order to prioritize that request. And at the end of the day, they’re going to be able to get access to that data and extract insight from it at a very low time-to-insight.”
Danrlei Alves, Senior Data Governance Analyst
And beyond tenured employees, as a new generation of interns, apprentices, and new hires join Porto, Datapedia will serve as a conduit for better understanding their business. Unlocking data from insurance domains to banking domains, Danrlei’s team has used Atlan to build a roadmap toward a definitive source of knowledge about Porto, capable of breaking down silos, enabling new ways of thinking and analysis, and driving the next big cross-functional opportunities as the organization continues to grow in its eighth decade in business.
An Indispensable Partner
Reflecting on what Porto’s data team has been able to accomplish in such a short time, Danrlei expressed his appreciation for the partnership they’ve built with Atlan, and how critical it has been.
“We’ve used Atlan for everything. We use it to discover data, to crawl metadata, to do lineage, to extract queries, to establish the relationships between tables and between databases. The way that we organize, the way that we transcribe our metamodel, the way that we use connectors, and the way we built this conceptual vision of what is a term, and what is a data product? These are all features of Atlan.
“Atlan is the catalyst to make all this happen. What we want to build in the concept of Datapedia would take a lot more work elsewhere, and in a nutshell, that’s the main role of Atlan. It’s helping us get there in a much faster, effortless manner.”
Danrlei Alves, Senior Data Governance Analyst