Residential construction technology leader demystifies how to evaluate the Active Metadata Management market and implement a solution successfully
“My goal has always been for every single person in the company to have immediate access to data pertinent to their role upon their first day. They should onboard and get access to email, and then a modern data catalog that gives them data relevant to their role. Whether they’re in Sales, or Support, or whatever, they should have data that’s pertinent to them, even if that’s only one or two tables. I don’t know that we’ll be Netflix and have everybody have access to unlimited servers and every data set, but I do at least want to reach that level of democracy. Having clean, curated, documented, nice data relevant to your role at your fingertips, right away.”
Having supported more than one million construction professionals since its founding in 2006, Buildertrend offers market-leading construction management technology, providing project and materials management, financial tools, and sales and service support for more than two million construction projects across the globe. “We’re here to help construction businesses do their jobs more effectively,” shared Preston Badeer, Director of Data Engineering.
Over five years at Buildertrend, Preston has been a “jack of all trades.” He joined as a Product Strategist, working closely with a two-person data science team to ensure strategy decisions were data-driven. Moving into a blended Data and Product Strategy role, he then worked to commercialize new data products for Buildertrend before joining a burgeoning data team as a Data Architect and, later, Director of Data Engineering.
“I like to attach myself to the biggest problem I can find and that I feel like I can have an impact on,” Preston shared. “And as I moved into the data team, it became clear that the biggest thing I could have an impact on was enabling our data scientists to do more, faster, with better data engineering. We didn’t have any tools, and didn’t have any sort of documentation. It was just, kind of, the wild west.”
With just two Data Engineers under the data science team umbrella, Preston was tasked with building a team to support more than 20 data scientists and 10+ customer researchers, and to help Buildertrend live up to the high expectations it had for its enterprise data.
With a company-wide initiative underway to make every team’s work at Buildertrend customer-centric and data-driven, continuing to rely on the data science team to handle not only its own work but everything from data engineering to fielding data requests was untenable.
“The goal for the team that I’m on is to democratize our data. We’ve gotten to a point where the data science team can’t keep up, nor can they scale fast enough to serve the data needs of everyone in the company. We’re trying to split the load, and make what we do with data more scalable. But we really want to get more data into the hands of the business. If they want an answer to a question, they won’t have to submit a ticket and wait. They can find answers really quickly on their own, and then use Data Science for what they’re great at, which is more complex analysis and modeling.”
A Quickly Evolving Data Stack
Buildertrend’s data technology has grown by leaps and bounds. Only a few years ago, its data scientists worked in notebooks on their local machines, writing basic Python scripts or SQL Server queries. To better support their analysis, the team adopted Tableau, but its members were still writing queries against a replica of the production databases and then publishing reports.
“The first major change we did in tooling was an enterprise data science environment. We ended up buying Dataiku, and that made a huge difference. We stopped throwing spreadsheets around and were storing tables for intermediate transformations,” Preston shared.
Adopting cloud-based, collaborative tooling meant that Buildertrend’s data team was now using shared resources, could back up its work, and could collaborate on analysis. The next leap forward would be a dedicated data engineering function and technology stack.
“Our philosophy is to avoid tribal knowledge and specialization as much as possible,” Preston explained. “Everyone on the team should be able to pick up any project that anyone has worked on without any kind of ‘Joe knows about that thing and he’s on vacation,’ or ‘I know you’re on vacation, but only you know this so I’m going to bug you,’ anymore.”
Thanks to a consistent work environment and toolset, any Buildertrend data engineer can pick up a ticket and complete the task at hand: everyone is versed in the team’s best practices and coding frameworks, and everyone is provisioned with the same IDE plugins and standards. Supporting this approach is a growing workbench of modern, flexible data technology.
“The sort of new stack we’re implementing is dbt for basically everything. Our database engine is in BigQuery, so we’ve used that as our warehouse because it’s easy, requires no management, and is scalable. Then we run Python scripts and dbt jobs in GitHub Actions, which we migrated to in days and was more than 12 times cheaper for us to run. Then lastly, we chose Fivetran and have been super happy with it, as it’s the best tool for us because of a lot of the dbt-specific things they do.”
Rounding out Buildertrend’s modern data stack is Hightouch. While most of the data engineering team’s work is in SQL, the team had been maintaining a significant amount of non-SQL custom code for Reverse ETL (syncing warehouse data back into operational tools). Adopting Hightouch lets the team stay focused on enabling their colleagues rather than writing and maintaining bespoke code.
“The short story of all of this is that we’re trying to keep our team small and efficient. I prefer to throw tools at problems before people,” Preston shared.
Searching for a Data Catalog
With a growing team, a sharp increase in data requests, mounting confusion about what their data actually meant, and an array of market-leading data tools, Preston and his team began searching for a single place where the data they provided could be trusted and understood.
“Something that was always a high priority for me was how we identify a source of truth. How do we say that a data set is trustworthy or not, and where does that live?” Preston explained.
Prior to COVID lockdowns and remote work, resolving questions about data rested on in-person interactions with or within Buildertrend’s data science team. While this collaborative way of working had some positive effects, a combination of remote work and a tripling in team size meant that a question-and-answer approach to data was unsustainable.
“We needed to scale data at Buildertrend, period. So, we started our search by looking at all the products we already had that offered data catalogs,” Preston shared. “Unsurprisingly, most of them have no way of ingesting metadata from anywhere else, which was ridiculous to me. I can’t give people 16 catalogs with different navigation systems.”
Buildertrend’s search for a modern data catalog continued with a thorough evaluation of the market. Preston found that many of the available solutions were either mature but short of the team’s user experience standards, or too immature to support its complex use cases. In Atlan, Preston and his team found a platform that met their standards for both user experience and product maturity, along with a purchasing and evaluation process that suited them.
“Atlan immediately stuck out. As a product guy, I’m a big hands-on person, and I don’t want to sit through a demo. I want a trial,” Preston explained. “Having somewhat of an interactive tour was powerful for me because I learned more from that tour than I did about some other products during their demos.”
Preston and his team quickly built a weighted requirements matrix, placing particular emphasis on search experience, product experience, API maturity, and pace of product development.
“Atlan became the bar that I was feature comparing everybody else with,” Preston shared. “One of my test criteria was what happens when somebody enters something other than a table or column name in a search box, and every other product I looked at returned zero results. If I’m a data scientist looking up a specific table, that’s great, but that’s not search, that’s autocomplete. The product experience also really set it apart, and an example of that was the API having good coverage and public documentation, which is a real sign of maturity for me.”
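The gap between autocomplete and search can be made concrete with a minimal Python sketch. The catalog entries, table names, and descriptions below are hypothetical, and real catalog search engines rank results far more sophisticatedly; the sketch only contrasts prefix-matching table names with matching a query against names and descriptions together.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """A hypothetical catalog entry: a table name plus a human-written description."""
    name: str
    description: str


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    Asset("fct_project_invoices", "Invoices issued to homeowners per construction project"),
    Asset("dim_builder_accounts", "One row per builder account, including plan tier"),
    Asset("fct_support_tickets", "Support tickets opened by builders, with resolution time"),
]


def autocomplete(query: str) -> list[str]:
    """Prefix match on table names: only useful if you already know the name."""
    q = query.lower()
    return [a.name for a in CATALOG if a.name.lower().startswith(q)]


def keyword_search(query: str) -> list[str]:
    """Rank assets by how many query words occur (as substrings) in the name or description."""
    words = query.lower().split()
    scored = []
    for asset in CATALOG:
        # Treat underscores in table names as word separators.
        text = f"{asset.name.replace('_', ' ')} {asset.description}".lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, asset.name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]
```

A query like `"support tickets"` returns nothing from `autocomplete`, because no table name starts with those words, while `keyword_search` finds `fct_support_tickets` through its name and description. That is the zero-results behavior Preston used as a test criterion.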
Implementing Atlan with Meticulous Documentation
Preston’s team began its Atlan implementation by connecting BigQuery, the main warehouse, which houses the bulk of their metadata. Then, using automated lineage, the team prioritized its next integrations by identifying where the most important data flowed from.
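Lineage-driven prioritization of this kind can be pictured as a graph walk. The following is a minimal Python sketch under stated assumptions: lineage has been exported as simple (upstream, downstream) edges, and the asset names, the `critical` list, the `system_name` helper, and the `connected` set are all hypothetical. It does not use Atlan’s API. It walks upstream from the most important assets and counts how many of them each not-yet-connected source system ultimately feeds.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("postgres.jobs", "bq.stg_jobs"),
    ("postgres.invoices", "bq.stg_invoices"),
    ("salesforce.accounts", "bq.stg_accounts"),
    ("bq.stg_jobs", "bq.fct_projects"),
    ("bq.stg_invoices", "bq.fct_projects"),
    ("bq.stg_accounts", "bq.dim_customers"),
    ("bq.fct_projects", "tableau.exec_dashboard"),
    ("bq.dim_customers", "tableau.exec_dashboard"),
]


def system_name(asset: str) -> str:
    """Hypothetical naming convention: the system is the prefix before the first dot."""
    return asset.split(".", 1)[0]


def upstream_of(asset: str, parents: dict) -> set:
    """Return every asset reachable by walking lineage edges upstream (breadth-first)."""
    seen, queue = set(), deque([asset])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def rank_sources(edges, critical, system_of, connected=frozenset()):
    """Count how many critical assets each unconnected source system feeds, most first."""
    parents = defaultdict(list)
    for up, down in edges:
        parents[down].append(up)
    counts = defaultdict(int)
    for asset in critical:
        systems = {system_of(a) for a in upstream_of(asset, parents)}
        for system in systems - set(connected):
            counts[system] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

With `critical = ["tableau.exec_dashboard", "bq.fct_projects"]` and the warehouse already connected (`connected={"bq"}`), this ranks `postgres` ahead of `salesforce`, since it feeds both critical assets. The count is only one possible notion of “importance”; weighting by usage or business criticality would be equally reasonable.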
With their most critical systems and data assets crawled, Preston and his team began writing standards and documentation covering how they would structure their glossary, how they would enrich their data assets, and which personas and user groups they would onboard and enable.
“We developed documentation for subject matter experts. We have a process for approving our terms in the glossary, reviewing and verifying them. Now, these people know what a ‘term’ is, where they find it in Atlan, and what Data Engineering expects them to do. We also created a doc for the data engineering team to say, ‘Here’s the level of documentation you are expected to produce. Here’s where to put it in Atlan and how to set it up.’ We already had documentation at our warehouse level, but we had to tie that documentation to classifications and certifications and define what ‘verified’ means for Buildertrend.”
With the initial onboarding complete and documentation standards carefully recorded, Preston began the rollout by recording a walkthrough of Atlan with a member of Buildertrend’s engineering leadership team. He also engaged both data practitioners and consumers on how they would work with Data Engineering on issue resolution, new data requests, and access permissions. Although the implementation is still in its early stages, the thoroughness of the planning and rollout leaves Buildertrend well-positioned to iterate quickly and improve the Atlan experience.
“We’re still very much in the thick of it. We’re still building stuff out, and are now at the phase where I’ve onboarded a number of folks who aren’t data people, and I’m using them as my trusted testers,” Preston explained.
Going forward, the Buildertrend data engineering team’s rollout strategy is to be data-driven and iterate. “What’s next for us is really the focus on onboarding, getting feedback, and getting into the rhythm of talking to people,” Preston shared. “We’ll ask if people found what they were looking for, and how easy it was. Or how we could have made things easier.”
Atlan’s reporting features show Preston and his team whether new functionality is actually being used, guiding whether to double down on what’s working or fix what’s not. Ultimately, by putting the right technology in place and keeping a sharp focus on whether they’re delivering value to every corner of Buildertrend’s business, Preston and his team are paving the way for data democratization.
Advice for Fellow Data Leaders
Reflecting on what led his team to the right data catalog for their needs, Preston boils his advice for fellow leaders on a similar search down to a simple concept: defined requirements.
“My advice to people evaluating any product always starts with figuring out your requirements first. It takes way more time to figure out what your requirements are than it takes to find a product. Sit down and work through your requirements. If you’re not the subject matter expert, find out who knows what those are, and get those people in the room.”
“The second piece of advice I would give them is to weigh those requirements. This is the critical mistake that I see most researchers make. They will create this giant spreadsheet of features, and then they will buy whoever has the most features. If you do that, you’re always going to end up with enterprise products with a terrible experience that are just built to win the checklist war. There are companies who literally build to win that argument. You don’t want that. You don’t want the most features.”

“It doesn’t need to be fancy, but if you gather requirements and work with subject matter experts to do that, and then you weight those requirements, you will find the right product for you. Not just the sexiest product, or the product with the most features or the cheapest product. That’s what we did. Defined, weighted requirements made the process much faster and easier. I think otherwise, who knows? We would have gotten something that somebody’s best friend thought was the best data catalog, rather than the one that’s the best fit for us.”
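A small worked example shows how a feature count and a weighted score can pick different winners. The Python sketch below is purely hypothetical: the weights, the vendor names, the feature counts, and the 0 to 5 scores are invented for illustration and are not Buildertrend’s actual matrix, though the requirement names echo the criteria Preston’s team emphasized.

```python
# Hypothetical weights: how much each requirement matters (they sum to 1.0).
WEIGHTS = {
    "search experience": 0.35,
    "product experience": 0.30,
    "API maturity": 0.20,
    "pace of development": 0.15,
}

# Hypothetical vendors: a raw feature count plus a 0-5 score per weighted requirement.
VENDORS = {
    "Checklist Suite": {
        "features": 140,
        "scores": {"search experience": 1, "product experience": 2,
                   "API maturity": 4, "pace of development": 2},
    },
    "Focused Catalog": {
        "features": 60,
        "scores": {"search experience": 5, "product experience": 5,
                   "API maturity": 4, "pace of development": 4},
    },
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Sum of weight x score over all requirements; an unscored requirement counts as 0."""
    return sum(w * scores.get(req, 0) for req, w in weights.items())


def pick_by_feature_count(vendors: dict) -> str:
    """The 'checklist war' approach: whoever lists the most features wins."""
    return max(vendors, key=lambda v: vendors[v]["features"])


def pick_by_weighted_score(vendors: dict, weights: dict) -> str:
    """The weighted-requirements approach: whoever best meets what matters most wins."""
    return max(vendors, key=lambda v: weighted_score(vendors[v]["scores"], weights))
```

Here the feature count favors “Checklist Suite”, while the weighted scores (roughly 2.05 versus 4.65) favor “Focused Catalog”. The spreadsheet does not need to be any more elaborate than this; the value comes from agreeing on the weights with subject matter experts before looking at vendors.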