Accelerating Root Cause Analysis by 50% and Saving Thousands in BigQuery Costs with Atlan
At a Glance
- Takealot, a South African eCommerce and Retail leader, sought a solution to improve technical understanding of their data estate and drive business self-service.
- By adopting Atlan, Takealot’s data teams benefit from an end-to-end view of their data estate, and data consumers enjoy a simple, self-service view into the data assets available for their consumption.
- Atlan’s automated lineage and popularity metrics drove significant time savings across root cause analysis and impact analysis processes, and significant cost savings as unused BigQuery assets were deprecated.
Takealot is a South African eCommerce and Retail leader encompassing three core businesses: Takealot.com, an eCommerce business serving 1.8 million shoppers; Mr. D, an on-demand food delivery service with 1.6 million monthly deliveries; and Superbalist, an online fashion retailer.
Takealot has a five-year goal to become the number one eCommerce provider in Africa, and data is crucial to getting there. Leading its Business Intelligence function is its Group BI Manager.
“What the data team is trying to do to support that vision, is to have well-implemented data systems that allow our business to derive insights, and get the data they need, when they need it. It’s the core of what we’re trying to provide today, without being a bottleneck,” he shared.
Maturing and Centralizing Team and Technology
“I think we’re in the most mature state that we’ve been in for the last five years, and roughly three years ago, we did a re-org that changed how the data teams work,” he explained.
Whereas Takealot’s data team was once loosely coupled to the broader technology team, with Business Intelligence reporting into the business lines, the team has since been centralized, with a Data Engineering Director building a five-year technology and organizational roadmap. Four teams now report into this structure: Data Engineering, Data Ops, and, more recently, Analytical Engineering. The fourth is Business Intelligence, a 16-member team with BI managers and analysts specializing in each of Takealot’s business domains, as well as shared services. Rounding out the data function is Data Science, which sits separately from the core data team and focuses on novel ways of reaching new customers.
With its team better organized, Takealot plans to further centralize its Business Intelligence functions, yielding even more value from shared processes and systems.
“Something that we’re currently working on, hence where all this change is coming from, is that we’re trying to standardize and centralize the Data Analyst function. At the moment, they all form part of each one of the business units. But we want to standardize all the tools that they make use of, how they’re measured and monitored, and what best practices they should be following.”
Group BI Manager, Takealot
Emerging to support this centralization and maturation is a modern data stack of predominantly Google Cloud Platform tooling, including Dataform, BigQuery, and Looker, supported by real-time streaming using Kafka.
Active Metadata for Lineage and Self-service
Takealot’s search for an Active Metadata Management solution was inspired by a time-consuming migration to BigQuery and Looker, complicated by unclear lineage during development and, once released, by numerous questions from data consumers.
“There’s a few key components that raised the question that we might need something. We were driving a migration to get off of QlikView and QlikSense, and it was taking a really long time to identify data lineage, where data was coming from, and where it might be breaking. We were essentially rebuilding everything from scratch,” he explained. “When we put things live and we were getting the business to test it, they would ask us questions about where data comes from, why it came from those places, and how we were calculating things.”
Without a solution in place, Takealot’s data team would have to keep crawling manually, system by system, each time a breakage occurred, while being distracted by a growing volume of questions from data consumers.
“So the first question was ‘How do we find a way to help speed up our own development work?’ And number two, ‘How do we speed up time-to-answers for our users? Instead of coming to us and slowing us down, how do we enable them to help themselves?’ Another reason why we looked is that self-service has always been a big element of the strategy defined by our Engineering Director.”
Group BI Manager, Takealot
Rather than a simple lineage tool, or a catalog of assets, Takealot would need an all-in-one platform that could serve the needs of deeply technical data experts, and eager data consumers, alike.
Drawing on several of their leaders’ experiences with data lineage tools, the Takealot team began a formal evaluation of the market. BigQuery’s own data catalog was quickly ruled out, as it demanded too high a level of technical aptitude from Takealot’s data consumers. Starting with 15 vendors, ranging from large legacy solutions to new startups, the team gradually narrowed the list to five, and then to two finalists, including Atlan.
Finally, after a multi-criteria analysis involving business users, data analysts, and technical engineers, Takealot selected Atlan as its all-in-one platform to improve technical understanding of its data estate and to drive business self-service.
Implementing Atlan
First on the implementation list was connecting critical parts of Takealot’s data stack, including BigQuery and Looker, to enable automated lineage and begin surfacing their data assets.
“Atlan provided a Customer Success Manager and a technical contact, as well, and they guided us through the process of connecting our various sources. As we came up with problems or roadblocks, they would jump right in and help us out,” he shared.
With lineage working and assets accessible, the enrichment process began: filling in the description and ownership metadata needed to enable self-service. Building on the work they had already done defining assets and ownership during the market evaluation, BI Analysts responsible for each of Takealot’s business units updated the descriptions of their Looker assets. Then, in collaboration with domain experts in the business, the BI team began defining ownership of their data assets in Atlan.
“We had a phase where on certain assets you would find five people across the business that said ‘I’m fully responsible for this,’” he explained. “The process of ironing out who owns what, or who should or shouldn’t have a say when we change a certain data asset is an ongoing piece of work.”
While some departments, such as Marketing, have bespoke data sets from tools like Google Ads or social media platforms with clearly defined ownership, data shared across Takealot’s functions meant multiple domains could claim ownership.
Data sets like “Orders” or “Customers”, for instance, are used across domains such as Finance, Supply Chain and Logistics, eCommerce, and more, each with a critical stake in how these assets are managed and in the reports that flow from them downstream. Asset by asset, the BI team worked with these teams to determine primary owners and to ensure all stakeholders agreed on the final definition of each asset, before making it available in Atlan for consumption.
“In the process of updating this metadata, we’ve partially opened up a can of worms in some spaces, but we’ve got businesses talking to one another, so it’s a positive thing,” he explained. “It’s helping us clean up how we measure KPIs across the business, because something like GMV (Gross Merchandise Value) might not be the same across five or six different business units.”
Takealot’s enrichment process is still ongoing but progressing well, with some business units reaching 90% enrichment across their critical assets. This paves the way for a more productive data team and for business colleagues who can confidently serve themselves.
Saving Time and Cost with Automation and Lineage
As Atlan becomes a more critical part of technical team workflows at Takealot, the combination of automated lineage and popularity metrics is driving significant time savings across Root Cause Analysis and Impact Analysis processes, and significant cost savings as unused BigQuery assets are deprecated.
Root Cause Analysis Driving 50% Reduction in Time-to-resolution
The most significant value for Takealot’s data team comes from using Atlan’s automated lineage to conduct root cause analysis.
When Takealot’s data team is informed of a bug, they assign story points to its investigation and resolution. Before Atlan, determining what was breaking a pipeline, and where, accounted for 50% of their time to resolution.
“Instead of trawling through all the code, you can quickly follow lineage backwards and check it at every point to see what’s happening. Before, it could take a week or two weeks depending on how difficult a bug was to manage, with 50% of that time being investigating what the problem was and where it’s broken before actually applying the fix and getting it into production. I’d say we’ve probably halved that time. For a two-week breakage, we would spend a week investigating before spending the next week fixing, and we’re now only spending two days, max, on investigating what the problem is because we’re able to dive through it so much quicker, and follow the chain.”
Group BI Manager, Takealot
Avoiding Risk with Impact Analysis
Automated lineage is also driving improvements across all of Takealot’s engineering functions, not just Data Engineering. When making changes to applications and upstream systems that involve a change to core databases, engineers now explore their lineage to understand downstream effects, flagging potential breakages to critical reports and driving better decisions.
“That’s been really helpful, it’s reducing risk for them quite a lot,” he shared.
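To make “following lineage” concrete, the two directions described above can be illustrated with a minimal sketch. It assumes a lineage graph stored as a plain adjacency map; the asset names (`app_db.orders`, `bq.fct_gmv`, `looker.gmv_report`, and so on) and the helper functions are hypothetical and are not Atlan’s data model or API. Walking upstream from a broken report mirrors root cause analysis; walking downstream from a source that is about to change mirrors impact analysis.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it feeds directly.
# Asset names are illustrative only; this is not Atlan's data model or API.
DOWNSTREAM = {
    "app_db.orders": ["bq.raw_orders"],
    "bq.raw_orders": ["bq.stg_orders"],
    "bq.stg_orders": ["bq.fct_orders", "bq.fct_gmv"],
    "bq.fct_orders": ["looker.orders_dashboard"],
    "bq.fct_gmv": ["looker.gmv_report"],
}


def invert(graph):
    """Build the upstream view of the graph (asset -> its direct sources)."""
    upstream = {}
    for src, targets in graph.items():
        for tgt in targets:
            upstream.setdefault(tgt, []).append(src)
    return upstream


def traverse(graph, start):
    """Breadth-first walk returning every asset reachable from start, nearest hops first."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


UPSTREAM = invert(DOWNSTREAM)

# Root cause analysis: walk backwards from a broken report, checking each source in turn.
print(traverse(UPSTREAM, "looker.gmv_report"))
# Impact analysis: walk forwards from a source table that is about to change.
print(traverse(DOWNSTREAM, "app_db.orders"))
```

Breadth-first order lists the nearest dependencies first, which roughly matches checking each hop in turn when following the chain backwards from a failing report.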
Saving Thousands by Deprecating BigQuery Assets
Finally, the combination of automated lineage and popularity metrics is beginning to uncover opportunities to optimize Takealot’s data estate. Using Atlan, Takealot’s Analytical Engineering and Business Intelligence teams uncovered tables and models in BigQuery with little to no usage, and analyzed what deprecating them might save in storage and compute costs.
With an estimated cost saving in mind, the team created a checklist of tables to be either deprecated or merged into existing tables, and began optimizing BigQuery. While asset deprecation is still ongoing, Takealot’s BI team has already driven nearly $6,000 in annual savings.
“At the moment, we’re saving close to $500 per month based on some of the initial work that we’ve done. And we’ll obviously continue to build that out and come up with an overall savings this has provided us,” he shared.
After working through a promising backlog of cost-savings opportunities, Takealot’s data team plans to analyze their BigQuery usage proactively, creating tickets for Business Intelligence teams to conduct cleanup activities on a monthly basis.
Increasing Data Consumer Confidence and Efficiency
Also benefiting from Takealot’s new catalog, built on Atlan, are Data Analysts and Product Owners across Takealot’s business units. Understanding what data was available to them, and then getting access to it, once drove a high volume of questions to the data team. Data discovery was further complicated by access policies in BigQuery: for many projects, only Data Ops personnel and Database Administrators were permitted to grant access to this data on request.
“What Atlan has helped them do is give them a shopping window into what’s available in BigQuery, because they can see everything that’s available in every project, without actually having access to it,” he explained.
With these assets now visible in Atlan, the BI team has removed this roadblock, cutting down the time-consuming back-and-forth of asking for access, deliberating on the level of access, and obtaining manager approval.
Enhancing the value of this new accessibility is Atlan Insights, a metadata-based query builder that lets data consumers run simple queries to understand data more deeply before requesting access.
“They’re able to run very simple queries on the metadata that’s been ingested. So if they quickly want to see if they can join up three or four tables and what they would look like, they can do that within Atlan, then go back to the Data Ops team with their manager who approves it. We’ve taken away that frustration, because they can now see exactly what’s there and identify what they need. So we’ve improved the process, and the time-to-insight for them. The noise has gone away, and there aren’t any fights between managers and analysts, and DataOps or BI.”
Group BI Manager, Takealot
Finally, ownership metadata and lineage are driving a better understanding of Takealot’s data among consumers. For each report or model that’s provided to these consumers, Atlan enables them to follow a “breadcrumb trail” to the source of data, leading to more informed questions and requests to asset owners.
“It’s helped them with a sense of confidence that what we’ve built is working. If it isn’t they can point it out quite early and say ‘It’s not using the right table. Can you make it use this table or column?’ So it’s speeding up their analysis time, as well,” he shared.
A Foundation for Governance, Quality, and Ownership
With significant value achieved across technical and non-technical teams, Takealot’s Group BI Manager envisions Atlan as the foundation of a Data Governance Program. Today, the bulk of requests to change the way data is managed still flow through Takealot’s data team, but in the future, asset ownership will mean not just subject matter expertise, but true authority and responsibility over the way data is classified, managed, and consumed.
“If people want to make changes or suggest different ways of doing things, they need to be talking to each other and not using us as the middleman. One of the things we’ll need to get right is governance within Atlan, setting up relevant groups, and making business owners the managers of these groups,” he explained.
Complementing Atlan’s value as Takealot’s governance platform of choice is emerging work on data quality: the BI team plans to bring data quality information into Atlan, improving future asset owners’ understanding of their domains and driving better collaboration.
“Right now, we’re just touching the surface with descriptions, assets in one place, and ‘Here’s your shopping window.’ But the next step is how do we become better at managing those data sets on a day-to-day basis. So we want to try and use it as our whole governance platform. We really want to be the enablers and not the owners of data sets. They need to be the owners, make decisions on what needs to happen with their data.”
Group BI Manager, Takealot