Unlocking Fast, Confident, Data-driven Decisions with Atlan
The Active Metadata Pioneers series features Atlan customers who have completed a thorough evaluation of the Active Metadata Management market. Paying forward what they’ve learned to the next data leader is the true spirit of the Atlan community! So they’re here to share their hard-earned perspective on an evolving market, what makes up their modern data stack, innovative use cases for metadata, and more.
In this installment of the series, we meet Prudhvi Vasa, Analytics Leader at Postman, who shares the history of Data & Analytics at Postman, how Atlan demystifies their modern data stack, and best practices for measuring and communicating the impact of data teams.
This interview has been edited for brevity and clarity.
Would you mind introducing yourself, and telling us how you came to work in Data & Analytics?
My analytics journey started right out of college. My first job was at Mu Sigma. At the time, it was the world’s largest pure-play Business Analytics Services company. I worked there for two years supporting a leading US retailer, where projects ranged from general reporting to prediction models. Then I went for my higher studies here in India, graduated from IIM Calcutta with my MBA, and then worked for a year with one of the largest companies in India.
As soon as I finished one year, I got an opportunity with an e-commerce company. I was interviewing for a product role with them and they said, “Hey, I think you have a data background. Why don’t you come and lead Analytics?” My heart was always in data, so for the next five years I was handling Data & Analytics for a company called MySmartPrice, a price comparison website.
Five years is a long time, and that’s when my time with Postman began. I knew the founder from college and he reached out to say, “We’re growing, and we want to build our data team.” It sounded like a very exciting opportunity, as I had never worked in a core technology company until then. I thought this would be a great challenge, and that’s how I joined Postman.
COVID hit before I joined, and we were all discovering remote work and how to adjust to the new normal, but it worked out well in the end. It’s been three and a half years now, and since then we’ve grown from a team of four or five to almost 25 people.
Back in the beginning, we were running somewhat of a service model. Now we are properly embedded across the organization, and we have a very good data engineering team that owns the end-to-end movement of data, from ingestion and transformations to reverse ETL. Most of it is done in-house; we don’t adopt a lot of tooling just for the sake of it. Then, once the engineers provide the data support and the tooling, the analysts take over.
The mission for our team is to enable every function with the power of data and insights, quickly and with confidence. Wherever somebody needs data, we are there, and whatever we build, we try to make it last forever. We don’t want to run the same query again. We don’t want to answer the same question again. That’s our biggest motto, and that’s why, even though the company is scaling much faster than our team, we’re able to support it without scaling linearly alongside it.
It’s been almost 12 years for me in this industry, and I’m still excited to make things better every day.
Could you describe Postman, and how your team supports the organization and mission?
Postman is a B2B SaaS company. We are the complete API Development Platform. Software Developers and their teams use us to build their APIs, collaborate on building them, test them, and mock them. People can also discover and share APIs. For anything related to APIs, we want people to come to Postman. We’ve been around since 2012, starting as a side project, and there was no looking back after that.
As for the data team, from the start, our founders had a neat idea of how they wanted to use data. At every point in the company’s journey, I’m proud to say data played a very pivotal role, answering crucial questions about our target market, the size of our target market, and how many people we could reach. Data helped us value the company, and when we launched new products, we used data to understand the right usage limits for each of the products. There isn’t a single place I could think of where data hasn’t made an impact.
As an example, we used to have paid plans where, if someone didn’t pay, we would wait 365 days before writing it off. But when we looked at the data, we learned that after six months, nobody returned to the product. We were waiting an extra six months for nothing, so we decided to set the write-off period to six months.
Or, let’s say we have a pricing update. We use data to answer questions about how many people will be happy or unhappy about it, and what the total impact might be.
The most impactful thing for our product is that we have analytics built around GitHub, so we can understand what people are asking us to build and where they’re facing problems. Every day, Product Managers get a report showing where people are facing problems, which tells them what to build, what to solve, and what to respond to.
When it comes to how data has been used in Postman, I would say that if you can think about a way to use it, we’ve implemented it.
The important thing behind all this is that we always ask about the purpose of a request. If you come to us and say “Hey, can I get this data?” then nobody is going to respond to you. We first need to understand the intended impact of a request, and what people are going to do with the data once we’ve given it to them. That helps us actually answer the question, and helps them get a better answer, too. They might even realize they’re not asking the right question.
So, we want people to think before they come to us, and we encourage that a lot. If we just build a model and give it to someone, without knowing what’s going to happen with it, a lot of analysts will be disheartened to see their work go nowhere. Impact-driven Analytics is at the heart of everything we do.
What does your stack look like?
Our data stack starts with ingestion, where we have an in-house tool called Fulcrum built on top of AWS. We also have a tool called Hevo for third-party data. If we want data from LinkedIn, Twitter, or Facebook, or from Salesforce or Google, we use Hevo, because we can’t keep up with updating our APIs to read from 50 separate tools.
We follow ELT, so we ingest all raw data into Redshift, which is our data warehouse, and once data is there, we use dbt as a transformation layer. So analysts come and write their transformation logic inside dbt.
After transformations, we have Looker, which is our BI tool where people can build dashboards and query. In parallel to Looker, we also have Redash as another querying tool, so if engineers or people outside of the team want to do some ad-hoc analysis, we support that, too.
We also have reverse ETL, which is again home-grown on top of Fulcrum. We send data back into places like Salesforce or email marketing campaign tools. We also send a lot of data back into the product, powering things like the recommendation engines and the search engine within the product.
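To make the pattern concrete, here is a minimal sketch of what a reverse ETL step can look like: read a transformed table from the warehouse and push batches of rows to a downstream tool’s API. The connection string, table, endpoint, and field names are all placeholders for illustration; Postman’s actual pipelines run on the in-house Fulcrum tooling.

```python
import os

import psycopg2  # Redshift speaks the Postgres wire protocol
import requests

# Hypothetical names: the DSN, table, endpoint, and fields are placeholders,
# not Postman's actual Fulcrum configuration.
REDSHIFT_DSN = os.environ["REDSHIFT_DSN"]
DESTINATION_URL = "https://api.example-destination.com/v1/records"


def sync_account_health(batch_size: int = 500) -> None:
    """Read a transformed table and push rows to a downstream tool in batches."""
    with psycopg2.connect(REDSHIFT_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT account_id, health_score FROM analytics.account_health")
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            payload = [{"account_id": a, "health_score": s} for a, s in rows]
            resp = requests.post(DESTINATION_URL, json=payload, timeout=30)
            resp.raise_for_status()  # fail loudly so the pipeline run is marked red


if __name__ == "__main__":
    sync_account_health()
```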
On top of all that, we have Atlan for data cataloging and data lineage.
Could you describe Postman’s journey with Atlan, and who’s getting value from using it?
As Postman was growing, the most frequent questions we received were “Where is this data?” or “What does this data mean?” and it was taking a lot of our analysts’ time to answer them. That’s why we brought in Atlan. Starting with onboarding, we began by putting all of our definitions in Atlan. It became a one-stop solution where we could go to understand what our data means.
Later on, we started using data lineage, so if we realized something was broken in our ingestion or transformation pipelines, we could use Atlan to figure out what assets were impacted. We’re also using lineage to locate all the personally identifiable information in our warehouse and determine whether we’re masking it correctly or not.
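That impact-analysis idea is easy to picture as a graph walk: given a source asset that is broken or carries PII, find everything downstream of it. Below is a toy sketch with a hand-written lineage graph; the asset names and PII tags are made up, and in practice both would come from the catalog’s lineage metadata rather than a hard-coded dictionary.

```python
from collections import deque

# Toy lineage graph: each key is an upstream asset, each value lists the
# assets built from it. The asset names and PII tags are made up; in
# practice this information comes from the catalog's lineage metadata.
LINEAGE = {
    "raw.users": ["staging.users", "staging.signups"],
    "staging.users": ["marts.user_activity", "marts.pii_audit"],
    "staging.signups": ["marts.funnel"],
}
PII_SOURCES = {"raw.users"}


def downstream_of(asset: str) -> set:
    """Return every asset reachable downstream of `asset`."""
    seen, queue = set(), deque([asset])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Which assets could expose PII (or break) if a tagged source changes?
for source in PII_SOURCES:
    print(source, "->", sorted(downstream_of(source)))
```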
As far as personas, there are two that use Atlan heavily: Data Analysts, who use it to discover assets and keep definitions up to date, and Data Engineers, who use it for lineage and for taking care of PII. The third persona we could see benefiting is all the Software Engineers who query with Redash, and we’re working on moving them from Redash over to Atlan for that.
What’s next for you and the team? Anything you’re excited about building in the coming year?
I was at dbt Coalesce a couple of months back and I was thinking about this. We have an important pillar of our team called DataOps, and we get daily reports on how our ingestions are going.
We can spot anomalies like our volume of data increasing, ingestion taking longer, or our transformation models taking longer than expected. We can also see if we have any broken content in our dashboards. All of this is built in-house, and I saw a lot of new tools coming up to address it. So on one hand, I was proud we had done that, and on the other, I was excited to try some new tools.
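As an illustration of the kind of check such a daily report can run, here is a small sketch that flags a day’s ingestion volume when it drifts far from the recent average. The row counts and the z-score threshold are invented for the example; the in-house DataOps tooling described above is not public.

```python
from statistics import mean, stdev

# Illustrative numbers only: daily row counts for one ingestion pipeline.
# A real DataOps report would pull these from pipeline run metadata.
daily_row_counts = [1_020_000, 998_000, 1_050_000, 1_010_000,
                    1_030_000, 1_025_000, 2_400_000]  # last value is today


def is_anomalous(history, today, z_threshold=3.0):
    """Flag today's volume if it sits more than z_threshold standard
    deviations away from the recent average."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold


*history, today = daily_row_counts
if is_anomalous(history, today):
    print(f"Volume anomaly: {today:,} rows vs. recent average of {mean(history):,.0f}")
```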
We’ve also introduced a caching layer, because we were finding Looker’s UI to be a little non-performant and we wanted to improve dashboard loading times. This caching layer pre-loads a lot of dashboards, so whenever a consumer opens one, it’s already available to them. I’m really excited to keep bringing down dashboard load times every week, every month.
There are also a lot of LLMs that have arrived. To me, the biggest problem in data is still discovery. A lot of us are trying to solve it, not just at the asset level, but at the answer or insight level. In the future, what I hope for is a bot that can answer questions across the organization, like “Why is my number going down?” We’re trying out two new tools for this, but we’re also building something internally.
It’s still very nascent, and we don’t know whether it will be successful or not, but we want to improve consumers’ experience with the data team by introducing something automated. A human may not always be there to answer, but if I can train something to answer when I’m not there, that would be great.
Your team seems to understand their impact very well. What advice would you give your peer teams to do the same?
That’s a very tough question. I’ll divide this into two pieces: Data Engineering and Analytics.
The success of Data Engineering is more easily measurable. I have quality, availability, process performance, and performance metrics.
Quality metrics measure the “correctness” of your data, and how you measure it depends on whether you follow processes. If you have Jira, you have bugs and incidents, and you can track how fast you’re closing bugs or solving incidents. Over time, it’s important to define a quality metric and see whether your score improves or not.
Availability is similar. Whenever people ask for a dashboard or a query, are your resources available to them? If they’re not, measure and track that, and see whether you’re improving over time.
Process Performance addresses the time to resolution when somebody asks you a question. That’s the most important one, because it’s direct feedback. If you’re late, people will say the data team isn’t doing a good job, and this is always fresh in their minds if you’re not answering.
Last is Performance. Your dashboard could be amazing, but it doesn’t matter if it can’t help someone when they need it. If someone opens a dashboard and it doesn’t load, they walk away, and it doesn’t matter how good your work was. So for me, performance means how quickly a dashboard loads. I would measure the time a dashboard takes to load against a target, say 10 seconds, and see whether everything loads in that time and which parts of it are loading.
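As a rough illustration of that kind of measurement, the sketch below times a full fetch of a few dashboard endpoints against a 10-second budget. The URLs are placeholders, and real BI tools usually expose their own usage and performance metadata that would give a more faithful render time than a raw HTTP fetch.

```python
import time

import requests

# Placeholder dashboard URLs; the 10-second budget mirrors the example above.
DASHBOARDS = {
    "revenue_overview": "https://bi.example.com/dashboards/revenue_overview",
    "product_usage": "https://bi.example.com/dashboards/product_usage",
}
TARGET_SECONDS = 10.0


def measure_load_time(url: str) -> float:
    """Time a full fetch of the dashboard endpoint as a rough proxy for
    how long a consumer waits before the content is available."""
    start = time.monotonic()
    requests.get(url, timeout=60)
    return time.monotonic() - start


for name, url in DASHBOARDS.items():
    elapsed = measure_load_time(url)
    status = "OK" if elapsed <= TARGET_SECONDS else "SLOW"
    print(f"{name}: {elapsed:.1f}s [{status}]")
```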
On the Analytics side, an easy way to measure is to send out an NPS form and see if people are happy with your work or not. The other way requires you to be very process-oriented and to use tickets.
Once every quarter, we go back to all the analytics tickets we’ve solved, and determine the impact they’ve created. I like to see how many product changes happened because of our analysis, and how many business decisions were made based on our data.
For insight generation, we could then say we were part of the decision-making process for two sales decisions, two business operations decisions, and three product decisions. How you’ll measure this is up to you, but it’s important that you measure it.
If you’re working in an organization that’s new, or hasn’t had data teams for a long time, what happens more often than not is that you do 10 analyses, but only one of them is going to impact the business. Most of your hypotheses will be proven wrong. You can’t just say “I did this one thing last quarter,” so documenting and having a process helps. You need to be able to say “I tried 10 hypotheses, and one worked,” versus saying “I think we just had one hypothesis that worked.”
Try to measure your work, and document it well. At the very least, you and your team can be satisfied with what you’ve done, and you can also communicate everything you tried and contributed to.
Photo by Caspar Camille Rubin on Unsplash