How a fast-growing fintech improved GDPR compliance with Atlan in hours, not months
At a Glance
- Tide, a UK-based digital bank with nearly 500,000 small business customers, sought to improve their compliance with GDPR’s Right to Erasure, commonly known as the “Right to be forgotten”.
- After adopting Atlan as their metadata platform, Tide’s data and legal teams collaborated to define personally identifiable information (PII) and then propagate those definitions and tags across their data estate.
- Tide used Atlan Playbooks (rule-based bulk automations) to automatically identify, tag, and secure personal data, turning a 50-day manual process into mere hours of work.
Tide, a mobile-first financial platform based in the UK, offers fast, intuitive service to small business customers. Data is crucial to Tide and has supported its incredible growth to nearly 500,000 customers in just eight years. But in financial services, data also presents acute risk, and sensitive financial information demands careful, fastidious protection. These risks only grow as GDPR enforcement intensifies, with nine-figure fines levied against offending firms in just the last few years.
Recognizing the immense opportunities presented by data, Tide’s CEO, Oliver Prill, recruited Hendrik Brackmann to build a data science team. “The ambition at that point wasn’t so much to build a data organization. It was about where we could use machine learning at Tide”, Hendrik shared, “but it quickly became clear that you can’t realize that if you don’t have a data platform.”
The journey toward data maturity was a daunting one. Originally reporting into the Finance team at Tide, the data platform team consisted of just two employees. It became Hendrik’s responsibility not just to grow an advanced data science team, but also to choose the right data platform technology and to propose, build, and scale data and reporting teams.
“We looked very deeply into how our organization should look,” said Hendrik. “We made a number of changes, from splitting roles between analytics engineers and analysts, to starting a data governance team.” Alongside this personnel growth and a more mature support model for Tide’s expansion, Hendrik ensured that his team was aligned to business needs, delivering transformational solutions such as a transaction monitoring system, support for revenue identification, and machine learning–powered risk scoring.
In just four years, Hendrik grew the function to a team of 67 across data engineering, analytics, data science, and governance. It was during this time of extreme growth that Hendrik recognized room for improvement: “We grew very quickly, and we saw we weren’t as efficient as we thought.”
While Tide’s data team had matured by leaps and bounds, compliance remained a high priority for Tide as a regulated entity, demanding huge effort and attention. “The legal team rarely spoke with the engineering functions. It was a bit isolated,” Hendrik said.
Early Days of Data Governance
Recognizing that collaboration between legal and technical teams had to improve, Hendrik began searching for a data governance expert. He met Michal Szymanski, who would become Tide’s Data Governance Manager. “The initial idea was to hire Michal as a bridge to the privacy function,” Hendrik remarked.
Michal joined Tide as a one-man team. “My scope of responsibilities increased a lot,” said Michal. “I had to deal with a vast array of challenges, starting from understanding where data governance could help in such an organization.” He began by attempting to understand his stakeholders’ needs. “I had to start by interviewing many people across different business areas to understand what they needed.”
Founded in 2016, Tide had little of the technical debt or legacy technology that typically burdens traditional financial services organizations. Their data stack consisted of dbt, Airflow, and Snowflake, with Looker downstream as their Business Intelligence (BI) layer. While Tide had invested in the right technology, Michal learned that his colleagues found it difficult to understand how data traveled across their stack.
Hendrik saw this challenge as an opportunity for growth.
We wanted to embed data protection and privacy into our running processes, rather than discussing it at the end of projects.
Hendrik Brackmann
By combining Michal’s new governance function, an understanding of data lineage, and common definitions of data, they could achieve the collaboration they had been missing.
Hendrik and Michal began searching for a solution. Summarizing the path forward, Michal explained, “We needed to have a platform where we could put all such interesting information to help users navigate the data that we have. So my first task was to identify a data catalog.”
Adding a Context Layer
After a thorough evaluation of the market, Hendrik and Michal chose Atlan as their data catalog.
[Atlan] integrated seamlessly with all of our tools, and we felt it was very easy to use.
Hendrik Brackmann
Starting with a few key problem statements, Tide implemented Atlan to improve data discovery, visibility, and governance in the short term, and to democratize data access and understanding in the long run. To start, Hendrik ensured that Atlan was properly integrated with their data stack and was capturing all relevant metadata.
With Atlan, technical and non-technical users could find the right data asset for their needs, quickly and intuitively, reducing the time it once took to find, explore, and use data across tools like Snowflake, Looker, and dbt. Using Atlan’s data glossary and metrics, Tide began to enjoy better context surrounding their data domains, which set the stage for standardizing classifications of sensitive data like personally identifiable information. And lastly, Atlan’s automated lineage added transparency so Hendrik’s team could understand where data came from, how it transformed throughout the data pipeline, and where it was ultimately consumed — something they couldn’t do before.
Tide grew to use Atlan to support a wide array of users and business units, from Legal and Privacy, to Data Science, Engineering, Governance, and BI colleagues. With improved context, higher trust in data, and democratized access to Tide’s data, Hendrik began to consider new use cases: “We were looking to identify how we could drive process efficiencies in our analytics and engineering teams.”
With a 360-degree view of their data estate, the stage was set for Hendrik’s team to build broader, more mission-critical solutions.
The GDPR Challenge
After using Atlan to better understand their data estate, Hendrik’s team was ready to support a crucial use case.
“Like every company, we need to be compliant with GDPR,” said Michal. And a key component of GDPR compliance is the right to erasure, more commonly known as the “Right to be forgotten”, which gives Tide’s customers across the European Union and the United Kingdom the right to ask for their personal data to be deleted.
Tide’s data team understood these obligations well, but the process of compliance was difficult.
Our production support team had a script, and whenever someone wanted to delete data, they would go through our back-end databases and delete personal data fields.
Hendrik Brackmann
And while the support team’s script handled a significant amount of data deletion, manual effort was still needed to find and delete personal data that persisted in secondary systems holding local copies (projections) of those fields. Michal explained, “The process was not capturing data from all the new sources that kept appearing in the organization, just the key data source.”
Complicating this challenge was a lack of shared definitions of personal data, with differing opinions across teams, from Legal to IT, on what constituted personally identifiable information. As a result, completing the “Right to be forgotten” process frequently meant re-litigating definitions.
Tide was doing its best to comply with GDPR, but as its technology stack and architecture grew more complicated, new products and services launched, and its customer base expanded, the compliance process demanded ever more time and effort.
Automating this process became a priority. In an ideal world, when a customer exercised their right to be forgotten, a single click of a button would automatically identify and delete or archive all data about the customer in accordance with GDPR. Immense manual effort, and the risk of delays or human error, would be eliminated.
That’s exactly what Hendrik set his team to do.
Driving Common Understanding
Before pouring resources into solving the problem, Hendrik and Michal needed to justify the effort to their colleagues. “It required detail to be presented to senior leaders in order to decide that we would invest time and money in solving such a problem,” said Michal. “That was crucial, because no one really wants to invest unless it means some increase of revenue or cost savings. We said we can avoid fines and we can make sure the company is handling personal data at a high level.”
The case was so strong that solving the problem became a team OKR. With their goal in hand, Hendrik asked his team to understand the problem in greater detail: “The very first step was to figure out where we had this kind of data, then identifying ownership.”
In his role as a bridge between the data team and its business counterparts, Michal worked with the Legal team to establish what did or did not constitute personal data. And to ensure the teams were collaborating smoothly, Hendrik established a cross-functional working group. “It’s just getting the right people in a room and then getting them to talk,” said Hendrik. “Our biggest contribution was bringing people together and keeping them focused.”
By bringing technical teams and domain experts together, Hendrik ensured every voice was heard and that his team remained focused on collaboratively delivering value rather than on arcane technical concepts. Recalling an example of how strongly the team collaborated, Hendrik shared, “We had our privacy lawyer on the call when we discussed architecture. He could answer any questions that might come up directly.”
With these definitions in hand, Hendrik and Michal began comparing them against existing documentation and processes. “There were a couple of places where different people were trying to list personal data. So the front end team did this, and the back end team did that. Some product managers did the same, and they were not consistent,” Michal explained.
Further, while his colleagues had a good command of their data, they often had trouble communicating the data’s definitions — a key part of good data governance. Oftentimes, column names would serve as definitions. “In many cases, it was not precise enough,” said Michal.
Given this misalignment, Tide needed more precise documentation and processes. Atlan offered a straightforward way to solve this challenge. Hendrik’s team would take what they learned from their research (including new definitions of personal data, opportunities for improvement, and data owners) and document it once and for all in their catalog.
We said: Okay, our source of truth for personal data is Atlan. We were blessed by Legal. Everyone, from now on, could start to understand personal data.
Michal Szymanski
From 50 Days to 5 Hours
With their data estate integrated into Atlan and now navigable, Tide used automated lineage to quickly determine where personally identifiable data lived and how it moved through their architecture. After identifying the columns and tables where personal data persisted, the team used Atlan to track that data downstream.
Michal explained just how valuable lineage was to the team: “This was very useful. It showed us how much data we have in our data warehouse, and then we could also extrapolate this to the upstream sources of Snowflake. We knew we had it in Snowflake because it’s coming from this and this database. So we informed the teams that they had a lot of personal data and we needed to come up with a design.”
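To make the idea of tracing personal data through lineage concrete, here is a minimal Python sketch. It assumes lineage is available as a simple in-memory mapping from each asset to the assets derived directly from it; the asset names and the `LINEAGE`, `downstream`, and `invert` names are hypothetical and do not reflect Atlan’s data model or API.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets built directly from it.
# Names are illustrative only; this is not Atlan's data model or API.
LINEAGE: dict[str, list[str]] = {
    "app_db.customers.email": ["snowflake.raw.customers.email"],
    "snowflake.raw.customers.email": ["snowflake.analytics.dim_customer.email"],
    "snowflake.analytics.dim_customer.email": ["looker.customer_overview"],
    "snowflake.raw.transactions.amount": ["snowflake.analytics.fct_payments.amount"],
}


def downstream(sources: list[str], lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable from the sources, in breadth-first order."""
    seen: set[str] = set(sources)
    order: list[str] = []
    queue = deque(sources)
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child not in seen:  # skip assets already reached via another path
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def invert(lineage: dict[str, list[str]]) -> dict[str, list[str]]:
    """Reverse every edge so the same traversal walks upstream instead."""
    reverse: dict[str, list[str]] = {}
    for parent, children in lineage.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return reverse


if __name__ == "__main__":
    for asset in downstream(["app_db.customers.email"], LINEAGE):
        print("downstream:", asset)
    for asset in downstream(["snowflake.analytics.dim_customer.email"], invert(LINEAGE)):
        print("upstream:", asset)
```

Inverting the edges mirrors what Michal describes: starting from personal data found in the warehouse, the same breadth-first walk recovers the upstream sources that feed Snowflake.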
Next, Hendrik’s team set out to tag personally identifiable data properly and attach their newly agreed definitions. Personal data stored in Snowflake, such as account numbers, email addresses, and phone numbers, would remain searchable but be properly secured and masked in the Atlan UI.
While worthwhile, the manual effort involved was daunting. Michal explained, “People would have to go into the databases and try to translate my list of personal data elements. There were 31 elements to find in our databases, and we have more than 100 schemas, each with between 10 to 20 tables. So it would be a lot of work to identify it.”
Making assumptions about which schemas might contain personally identifiable information could save time, but this wasn’t an option. The risk involved meant Michal and his team had to be precise, searching and tagging location-by-location, or it would prove costly.
If we were very diligent and did it for every schema, then it would probably be half a day for each schema. So half a day, 100 times.
Michal Szymanski
After discussing this scope with the Atlan professional services team, Michal learned about Playbooks, a feature unique to Atlan. Instead of spending 50 days manually identifying and then tagging personally identifiable information, Tide could use Playbooks to identify, tag, and then classify the data in a single, automated workflow.
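The article describes Playbooks as rule-based bulk automations. As an illustration of that general idea only (not of how Atlan implements Playbooks, and not of Tide’s actual 31 personal data elements), here is a minimal Python sketch. It assumes a hypothetical list of column records and a few hypothetical column-name patterns; `PII_RULES`, `Column`, `apply_pii_rules`, and the tag names are all invented for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules: personal-data element -> column-name pattern.
# A real rule set would come from the definitions agreed with Legal.
PII_RULES: dict[str, re.Pattern] = {
    "email": re.compile(r"(^|_)e?mail(_address)?$", re.IGNORECASE),
    "phone_number": re.compile(r"(^|_)(phone|mobile)(_number)?$", re.IGNORECASE),
    "account_number": re.compile(r"(^|_)account_(number|no)$", re.IGNORECASE),
}


@dataclass
class Column:
    """A hypothetical catalog entry for one column."""
    schema: str
    table: str
    name: str
    tags: set[str] = field(default_factory=set)


def apply_pii_rules(columns: list[Column], rules: dict[str, re.Pattern],
                    tag: str = "PII") -> list[Column]:
    """Tag every column whose name matches a rule; return the matched columns."""
    matched: list[Column] = []
    for col in columns:
        for element, pattern in rules.items():
            if pattern.search(col.name):
                # Add a generic tag plus one naming the matched element.
                col.tags.update({tag, f"{tag}:{element}"})
                matched.append(col)
                break  # one element per column is enough for this sketch
    return matched


if __name__ == "__main__":
    catalog = [
        Column("customers", "profile", "email_address"),
        Column("customers", "profile", "mobile_number"),
        Column("payments", "transfers", "account_number"),
        Column("payments", "transfers", "amount"),
    ]
    for col in apply_pii_rules(catalog, PII_RULES):
        print(f"{col.schema}.{col.table}.{col.name}", sorted(col.tags))
```

The point of the design is that the rules are written once and applied to every schema in a single pass, instead of someone spending half a day per schema checking columns by hand. Name matching alone can miss or over-match columns, so a real workflow would likely still need review.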
Hendrik’s team was ready to spend 50 days of effort on a task that would make clear improvements to Tide’s risk profile. But after integrating their data estate with Atlan and driving consensus on definitions, they used Playbooks’ automation to accomplish their goal in mere hours. Michal explained, “It was basically a few hours to discuss what we needed.”
After saving nearly 50 days of work, Tide can now make further improvements to their process, far sooner than expected.
In the months to come, the team is building a microservices-based orchestrator to handle requests from customers about their personal data. It will then be enhanced to anonymize data in accordance with GDPR standards for de-identification and Tide’s data retention obligations as a regulated business. Here, too, Atlan has helped. Tide’s engineers can build these solutions more quickly by referencing the information and lineage made possible by Hendrik’s team and Atlan.
I would say I got great assistance from the Atlan team, who were with me on the whole journey. I would have never thought about Playbooks. It was suggested in the right way for the right use case.
Michal Szymanski
As for Hendrik, his team’s accomplishments mean the realization of his vision from the very beginning of his time at Tide. “Over the last year, we’ve managed to move closer to the business. Being able to create this kind of organizational change is something that I feel very proud of.”
With a significant win for his team in hand, enabled by the right technology and guided by the right strategy, Hendrik shared his advice for fellow data leaders. “Focus on business value, and the actual value you’re generating for your organization rather than finding a process everyone in the industry follows and adopting the same thing. Don’t try to do governance everywhere. Figure out what data sets are relevant to you, and focus on these ends.”
Learn more about Atlan’s Playbooks and other supercharged automation features from 2022.