A tried-and-tested approach to democratizing tribal knowledge in TechStyle’s 50-member analytics team with Snowflake, Atlan, and Tableau
Rolling out a new data platform is no small feat. Doing it despite legacy backends, an almost brand-new team, and a sudden shift to working from home… that’s a huge challenge. But that was exactly what TechStyle set out to do at the beginning of 2020.
Founded in 2010, TechStyle Fashion Group is a fashion retailer with a portfolio of five brands — Fabletics, Savage X Fenty, JustFab, FabKids, and ShoeDazzle. By integrating data science and personalization with a membership model, the company has grown to become one of the world’s largest membership-based fashion companies (with 5.5 million members and over $750 million in annual revenue).
TechStyle has built its business model around embedding data across its operations. Think personalized customer experiences on its website, digital supply chains rooted in predictive analytics, and warehouses run on IoT devices.
To handle this data, TechStyle uses a “hub-and-spoke” analytics model. Each brand within TechStyle has its own embedded Analytics Team. Since it doesn’t make sense for each of these teams to build its own technology, the Data Platforms Team enables every functional analytics team by creating and managing common data systems. It runs the company’s core data platform and is responsible for data engineering, integrations, architecture, and governance.
In March 2020, TechStyle embarked on an initiative to overhaul these common systems and roll out a new data warehouse, spearheaded by Danielle Boeglin (VP of Data & Analytics) and Rachana Mukherjee (Director of Data Analytics on the Data Platforms Team). Along the way, Danielle and Rachana needed to tackle a problem that had plagued the company for years — making data discoverable and understandable to everyone, not just long-time team members.
As the co-founder of Atlan, the unified data workspace that TechStyle used as part of this initiative, I saw the evolution of their new data warehouse up close. Setting up documentation and metadata from scratch can be difficult, so I thought it would be valuable to share their process for other modern data teams looking to set up a better data system.
Through a series of conversations with Danielle and Rachana, I’ve written this article to go behind the scenes of how TechStyle structured its modern data warehouse and data analytics team. It also explains how they documented years of tribal knowledge through Agile sprints, and why democratizing that knowledge and championing data management matter.
Setting a foundation for accessible data across the company
In early 2020, TechStyle decided to move its systems to a Snowflake data warehouse. Even though it was a major undertaking, this project gave the company a chance to reset and plan for the future.
TechStyle’s existing systems leaned toward traditional enterprise data warehouse (EDW) design. But to Rachana, there was no question that the new Snowflake system should break from the traditional data warehousing paradigm.
Traditional data warehousing can be quite expensive and time-consuming. It does extend the time to insight. So by the time you’re delivering the insight, it’s already potentially too late for the business. We need to adapt.
Rachana Mukherjee, TechStyle
Choosing a better path forward, though, was a tougher question. “Things are moving so fast now…” she said. “I think it’s difficult for a lot of people in our position, that are in a data leadership position, to say, ‘Okay, this is the way that we should do things’ because there’s no guidance in the market right now. We’ve been left to our own devices.”
TechStyle settled on an ELT (extract, load, transform) style of data engineering, in which data is loaded as-is from the source systems and transformed later inside the warehouse. Once the raw data is loaded, TechStyle takes a hybrid approach: it models whatever needs to be modeled and happily leaves the rest untouched.
“You don’t always have to model the data,” said Rachana. For example, objects that will be reused in the future (such as history tables) are modeled, processed, and then stored for use on an ongoing basis. But for areas with a lot of unknowns, there’s no need to go through the trouble of premature modeling.
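To make the hybrid ELT idea concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for Snowflake, and every table and column name in it is hypothetical. Source rows are loaded untouched into raw tables, and only the object expected to be reused (a history table) is transformed inside the database. An exploratory raw table is deliberately left unmodeled.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    # In-memory SQLite stands in for the warehouse (assumption; TechStyle uses Snowflake)
    conn = sqlite3.connect(":memory:")

    # Extract + load: copy source rows into raw tables exactly as they arrive
    conn.execute(
        "CREATE TABLE raw_customer_status (customer_id INTEGER, status TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_customer_status VALUES (?, ?, ?)",
        [
            (1, "active", "2020-03-01"),
            (1, "paused", "2020-04-15"),
            (2, "active", "2020-03-10"),
        ],
    )
    conn.execute("CREATE TABLE raw_web_event (payload TEXT)")
    conn.execute(
        "INSERT INTO raw_web_event VALUES (?)",
        ('{"event": "page_view", "path": "/home"}',),
    )

    # Transform inside the database, but only for an object that will be reused:
    # a history table giving each status a validity range (valid_to is NULL while current)
    conn.execute(
        """
        CREATE TABLE customer_status_history AS
        SELECT r.customer_id,
               r.status,
               r.updated_at AS valid_from,
               (SELECT MIN(n.updated_at)
                  FROM raw_customer_status AS n
                 WHERE n.customer_id = r.customer_id
                   AND n.updated_at > r.updated_at) AS valid_to
          FROM raw_customer_status AS r
        """
    )
    # raw_web_event stays unmodeled until its use cases are clearer
    return conn
```

The design choice mirrors the quote above: the raw layer is always available, and modeling effort is spent only where a table will be queried repeatedly.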
The data-to-knowledge gap
From the very beginning of this project, TechStyle knew that data discoverability was a big priority.
“There is, unfortunately, a huge gap in knowledge,” Rachana said. “We’re onboarding analysts, but they’re not as effective because they don’t understand the data.” Danielle added, “With our software and code, our workflows in Git make this really easy. We really needed an equivalent for our data.”
The Data Platforms Team gets its data from upstream systems within the company, such as their homegrown warehousing and e-commerce systems. These systems are over a decade old, so they have a long legacy of being managed and used a certain way.
However, documentation for these systems was usually limited or non-existent. “Previously our data documentation did not extend beyond documenting columns in our tables,” said Danielle. “The documentation was disconnected from our day-to-day workflow.”
This was compounded by the growth of data sources that weren’t owned by TechStyle’s central data team. As the company scaled and embedded technical resources directly into each brand team, teams started to create their own data sources. Though these were managed by individual brand teams, they were often used across the organization. “This introduced confusion and complexity over who owns and certifies the data,” said Danielle.
This documentation problem came to a head in early 2020. At that time, two people on the Data Platforms Team had been with TechStyle for more than seven years each, but the rest of the team, including Rachana, was fairly new. Normally, new team members would have been onboarded onto these legacy systems gradually, in the office. But in March 2020, everyone started working remotely because of COVID-19.
Suddenly the informal information flow that worked naturally in the office came to a halt, and the two long-tenured team members became a bottleneck for transferring knowledge to their newer colleagues. “It’s a bit frustrating,” Rachana said. “We want to move faster, but we can’t, because we haven’t been here very long… Short answer, it’s not ideal.”
As a result, data documentation was a priority for TechStyle’s new data warehouse from the beginning. Documenting the entire data system from scratch would be a massive undertaking, but it was well worth the time and effort.
It was basically driven by the need for agility. Agility was getting broken. It was just taking a lot longer for people to become effective in the company, because of all of this tribal knowledge that just didn’t exist anywhere.
Rachana Mukherjee, TechStyle
Implementing Atlan for better data documentation and visibility
As the VP of Data & Analytics, Danielle was personally involved in selecting the right tool for TechStyle.
“We were looking for a product that made it easier to democratize our data and was less dependent on someone central answering each individual analyst’s questions on a one-off basis,” she said. “Many data catalogues we explored in the market are tailored to more legacy systems and approaches where IT or a single data stewardship team owns the data. We chose Atlan because it integrates with our modern analytics stack such as Snowflake and Tableau and creates a collaborative experience for sourcing documentation.”
After deciding to move forward with Atlan, the Data Platforms Team quickly integrated it with TechStyle’s new modern data stack — Snowflake, Tableau, Apache Airflow, and Git.
We had all of our data sources flowing within the first day. The user interface made getting up to speed quick and easy and allowed us to source data documentation collaboratively across our organization.
Danielle Boeglin, TechStyle
Then came the hard part — organizing and documenting their data assets. It was time for TechStyle to start building a knowledge management process as part of the rollout of the new data warehouse.
Instead of creating this new documentation from the top down, Rachana rolled it out organically and collaboratively with the rest of the Data Platforms Team. “It was a mix between being prescriptive and also just letting the team drive it,” said Rachana.
Creating a replicable documentation process through agile, iterative experimentation
The biggest challenge in building great documentation is creating standards that the entire team can easily adopt.
The TechStyle team took a pragmatic approach. Instead of starting with a long, top-down strategic process, they picked one use case as their initial prototype on Atlan. They then used their learnings to quickly create and validate new company-wide documentation standards.
Here’s the process TechStyle used to build their documentation from scratch:
- Choose an MVP: Roll out a data governance and documentation tool (Atlan) for one use case, the EDW.
- Test a few cases: Choose a few easy tables and ask team members to document them.
- Learn from the results: Compare and learn from the initial results to create common documentation standards.
- Sprint and refine: Carry out a series of sprints to continue building new documentation for other tables and columns in the EDW, all while refining the documentation standards.
Starting with an MVP to find patterns and create guidelines
The Data Platforms Team knew that there were 718 columns in TechStyle’s main EDW database, and each one needed a clear explanation of its meaning and/or source. Rachana started by aligning the team on how to document these columns.
Rather than tackle all the columns at once, the team broke the challenge into smaller parts, starting with a few easy examples. They made a list of all the tables, identified the low-hanging fruit, and then divided and conquered, assigning each table’s columns to the analysts with the most context on it.
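The divide-and-conquer step can be sketched as a small planning function. This is a hypothetical illustration, not TechStyle’s tooling: it assumes “low-hanging fruit” means tables with the fewest columns, and that each analyst’s context on a table can be expressed as a made-up familiarity score.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableToDocument:
    name: str
    column_count: int
    # Hypothetical familiarity score per analyst (higher = more context on this table)
    context: Dict[str, int]


def plan_first_round(tables: List[TableToDocument], n_easy: int) -> Dict[str, str]:
    """Pick the n_easy smallest tables and assign each to its best-informed analyst."""
    # Assumption: fewer columns means lower-hanging fruit
    easiest = sorted(tables, key=lambda t: t.column_count)[:n_easy]
    return {t.name: max(t.context, key=t.context.get) for t in easiest}


tables = [
    TableToDocument("dim_address", 6, {"ana": 3, "ben": 5}),
    TableToDocument("fact_order", 40, {"ana": 5, "ben": 2}),
    TableToDocument("dim_store", 8, {"ana": 4, "ben": 1}),
]
print(plan_first_round(tables, 2))  # {'dim_address': 'ben', 'dim_store': 'ana'}
```

Starting with small tables keeps the first round quick, so the team has results to compare before committing to standards.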
Next, they came back together as a team to compare their results and create guidelines for future documentation. “Some of this was easy… I did see initially that there are a lot of natural patterns in the data,” said Rachana.
Creating data documentation standards at TechStyle
The team settled on basic hygiene rules for its data documentation, such as:
- All documentation should be grammatically correct.
- The first letter of a column description should always be capitalized.
- There shouldn’t be a period at the end.
The team also set rules for specific column types. For example, they decided that Boolean columns need an “is” prefix (e.g., “is_live”) and a standard description (“1 = True / 0 = False”).
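Rules like these are mechanical enough to check automatically before a human review. The sketch below is a hypothetical linter, not a feature of Atlan or Snowflake: it covers capitalization, the trailing period, and the Boolean prefix and wording. “Grammatically correct” is left to people, since it cannot be checked reliably this way. It assumes the standard Boolean text must appear somewhere in the description.

```python
from typing import List

# Standard Boolean wording quoted in the team's rules
BOOLEAN_STANDARD = "1 = True / 0 = False"


def check_description(column_name: str, description: str, is_boolean: bool = False) -> List[str]:
    """Return a list of rule violations; an empty list means the description passes."""
    if not description.strip():
        return ["Description is empty"]
    problems = []
    # Rule: the first letter is capitalized (leading digits or symbols are skipped)
    first_letter = next((c for c in description if c.isalpha()), "")
    if not first_letter.isupper():
        problems.append("First letter must be capitalized")
    # Rule: no period at the end
    if description.rstrip().endswith("."):
        problems.append("Description must not end with a period")
    if is_boolean:
        # Rule: Boolean columns carry an "is_" prefix, e.g. is_live
        if not column_name.lower().startswith("is_"):
            problems.append("Boolean column name must start with 'is_'")
        # Rule: Boolean descriptions include the standard 1/0 wording
        if BOOLEAN_STANDARD not in description:
            problems.append(f"Boolean description must include '{BOOLEAN_STANDARD}'")
    return problems
```

For example, `check_description("is_live", "1 = True / 0 = False", is_boolean=True)` returns an empty list, while `check_description("live", "whether live.", is_boolean=True)` reports all four violations.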
They were also able to create rules for columns whose meaning was unambiguous. For example, everyone on the team knew what the “Address_Key” and “Address_ID” columns meant. (A key is an artificial key that the team assigns or creates in the data warehouse, while an ID is a field that naturally exists in the upstream source systems.) So they agreed on identical wording for every address key or ID column, no matter the data source or the person writing the description.
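One way to guarantee identical wording is to generate it from the column name. This is a hypothetical sketch: the template text is invented for illustration (the article does not quote TechStyle’s actual descriptions), and it assumes keys and IDs are always marked by a `_key` or `_id` suffix.

```python
from typing import Optional

# Hypothetical wording; capitalized and without a trailing period, per the hygiene rules
KEY_TEMPLATE = "Surrogate key for {entity}, assigned in the data warehouse"
ID_TEMPLATE = "Identifier for {entity} from the upstream source system"


def standard_description(column_name: str) -> Optional[str]:
    """Return the shared description for *_key / *_id columns, else None."""
    name = column_name.lower()
    for suffix, template in (("_key", KEY_TEMPLATE), ("_id", ID_TEMPLATE)):
        if name.endswith(suffix) and len(name) > len(suffix):
            entity = name[: -len(suffix)].replace("_", " ")
            return template.format(entity=entity)
    return None  # Not a key or ID column: someone writes this description by hand
```

Because the text comes from one template, `Address_Key` gets the same description in every table, regardless of who documents it.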
As they tackled more tables and more patterns emerged, the team was able to confidently create guidelines and make decisions upfront.
We took the best things that came about, and then we made decisions. We didn’t make decisions in a vacuum.
Rachana Mukherjee, TechStyle
Using Agile sprints to build a culture around data documentation
One thing that made this iterative process possible was using Agile sprints to quickly experiment, document, and carry learnings forward.
As soon as Rachana joined TechStyle, she reorganized the Data Platforms Team around Agile practices. “All of the work that we’re doing is planned upfront with these two-week sprints,” she said. “We have a Scrum process where you have all of this work that’s being done by the team right now.”
To make documentation part of everyday work, the team added documentation tasks into each sprint. For example, each sprint would include a list of new tables or columns that needed to be documented.
These sprints also included quality checks for new documentation. To make sure the team followed the rules it had created, new documentation went through an approval process: a designated approver reviewed new comments and pushed them into Atlan.
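The approval step can be modeled as a simple review queue. This is a hypothetical sketch: the `published` dictionary only stands in for what the approver would push into Atlan, and no Atlan API is called or implied. The automated check is a simplified version of the team’s capitalization and trailing-period rules, and a human approver would still review everything that passes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Submission:
    column: str       # e.g. "dim_address.address_key" (hypothetical naming)
    description: str
    author: str


def passes_hygiene(description: str) -> bool:
    """Simplified capitalization and trailing-period rules."""
    first_letter = next((c for c in description if c.isalpha()), "")
    return first_letter.isupper() and not description.rstrip().endswith(".")


def review_sprint(submissions: List[Submission]) -> Tuple[Dict[str, str], List[Submission]]:
    """Approver step: publish passing descriptions, return the rest to their authors."""
    published: Dict[str, str] = {}  # Stand-in for what the approver would push into the catalog
    returned: List[Submission] = []
    for sub in submissions:
        if passes_hygiene(sub.description):
            published[sub.column] = sub.description
        else:
            returned.append(sub)
    return published, returned
```

Keeping a single approval gate is what lets the standards stay consistent even as each sprint adds new tables and columns.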
What’s on the horizon for TechStyle’s data platform?
TechStyle is already looking past the rollout of this new data warehouse to the future of data within the company.
“Data science is a huge buzzword, and people talk about data science quite a bit. But there’s hardly anyone sort of championing the data management aspect,” Rachana said. “I would see data platforms evolve to be more strategic in nature in the future. I would like to become more strategic and work with the business and add value, rather than just being the people that maintain the data and administer the data.”
Within TechStyle, the Data Platforms Team occupies a pivotal role — the behind-the-scenes experts who help everyone work together better with data.
As the team knows all too well, collaborating on data is a major pain point. But with the right platform and great documentation, this rollout is a valuable chance to create a shared workspace where data and business people alike, across teams and brands, can work together seamlessly to tackle any business problem.
Big thanks to TechStyle, Danielle Boeglin, and Rachana Mukherjee for giving their time and support to this article! ❤️
This article was originally published in Towards Data Science.