A curated list of the year’s best articles from the data world
Just like that, we’re at the end of 2022! And what a rollercoaster ride it has been with major changes and uncertainty across every industry. (Especially for the bird app users…)
A lot happened in the world of the modern data stack this year. We talked about job titles, thought about saying goodbye to data science, debated centralized vs. embedded data teams and bundling vs. unbundling, kickstarted important discussions like the technical pay gap, and so much more.
Whether you’re deep in this community or just started with data, it can be hard to keep up with everything. So, continuing our tradition from last year, we’re sharing the top blogs from 2022 along with some follow-up reading to keep you thinking. Happy reading!
P.S. Special shoutout to everyone who shared their data experiences, learnings, views, and observations this year! Now’s the time to have more open conversations about what we want for the future of data, and we’re so thankful for all the data practitioners who give their time to share insights, spark debate, and keep our industry moving forward.
On data as a product
Data product in changing environments: rethinking and updating investments by Eric Weber
The last few years have been full of ‘here’s what we need to do next’ or ‘once we have this team, we can do this’. We plan how we’d support more personas and areas of the business with more investment, but we don’t think about what we’d do if we had to cut support. I get it. That doesn’t feel very comfortable. But just like succession planning for people, we need to have a plan for what we’d do in hard situations. In some cases, you might drop support for particular personas on a product. In others, you might drop support for a product altogether. It isn’t easy to say what the ‘right answer’ is. But spending time thinking about your answer is important.
More follow-up reading:
- Making data actionable: the immense challenge of good data products by Eric Weber
- What’s the big deal about data products? by Willem Koenders
- Building more effective data teams using the JTBD framework by Emilie Schario
- Types of data products by Luke Lin
On working with data
Should we be grateful for the modern data stack? by Benn Stancil
That’s the paradox we need to solve. Why has data technology advanced so much further than the value a data team provides? Does all of this new tooling actually hurt, by causing us to lose focus on the most important problems (e.g., the data in Salesforce) in favor of the shiny new things that don’t actually matter (e.g., the data in our twenty-fifth SaaS app)? Has the industry’s talent not caught up with the capacity of its tools, and we just need to be patient? Is the problem more fundamental? I’m not sure. But if our 2032 selves want to be as grateful for the 2020s as we should be for the 2010s, those are the next questions we need to answer.
More follow-up reading:
- How to design your data stack for curiosity by Amit Prakash
- Data management is context management by Randy Au
- Build or buy: how we developed a platform for A/B tests by Olga Berezovsky
- Data systems tend towards production by Ian Macomber
- Not all data requests are urgent, so start by asking these 5 questions by Marie Lefevre
On data contracts
The rise of data contracts by Chad Sanderson
Data Contracts are API-like agreements between Software Engineers who own services and Data Consumers that understand how the business works in order to generate well-modeled, high-quality, trusted, real-time data.
Instead of data teams passively accepting dumps of data from production systems that were never designed for the purpose of analytics or Machine Learning, Data Consumers can design contracts that reflect the semantic nature of the world composed of Entities, events, attributes, and the relationships between each object.
This abstraction allows Software Engineers to decouple their databases/services from analytical and ML-based requirements. Engineers no longer have to worry about causing production-breaking incidents when modifying their databases, and data teams can focus on describing the data they need instead of attempting to stitch the world together retroactively through SQL.
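To make the "API-like agreement" idea concrete, here is a minimal sketch in Python. It is not taken from Chad's article: the `order_placed` event, its field names, and the `violations` helper are all hypothetical, and a real contract would usually live in a schema registry or a similar tool rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractField:
    """One field that the producing service promises to emit."""
    name: str
    dtype: type
    required: bool = True


# Hypothetical contract for an "order_placed" event, agreed on by the
# service owners and the data consumers. Field names are illustrative only.
ORDER_PLACED_CONTRACT = (
    ContractField("order_id", str),
    ContractField("customer_id", str),
    ContractField("amount_cents", int),
    ContractField("coupon_code", str, required=False),
)


def violations(event: dict, contract=ORDER_PLACED_CONTRACT) -> list[str]:
    """Return human-readable contract violations (empty list if the event conforms)."""
    problems = []
    for field in contract:
        value = event.get(field.name)
        if value is None:
            # Absent optional fields are fine; absent required fields are not.
            if field.required:
                problems.append(f"missing required field: {field.name}")
            continue
        if not isinstance(value, field.dtype):
            problems.append(
                f"{field.name}: expected {field.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems
```

A producer could run a check like this before publishing an event, so a schema change shows up as an explicit violation instead of a broken dashboard downstream.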
More follow-up reading:
- An engineer’s guide to data contracts – pt. 1 by Chad Sanderson and Adrian Kreuziger
- An engineer’s guide to data contracts – pt. 2 by Chad Sanderson and Adrian Kreuziger
- Why data contracts are obviously a good idea by Yali Sassoon
On building and leading a data team
Growing data teams from reactive to influential by Emily Thompson
Data teams tend to be a fairly scrappy bunch, and often default to rolling up their sleeves and building what they need in order to get unblocked. But there is an opportunity here to start influencing roadmaps on other teams. Rather than filling in the technology gaps themselves with messy workarounds, my team’s charter also prescribed that they make technical recommendations to the teams we depended on.
Because the data team was now required to proactively drive the conversation, they made the time to work with partners and propose cross-functional solutions. Foundational work was considered part of the backlog of ‘impact-driving’ work, which led to specific quarterly goals, and progress was tracked just as every other initiative owned by the data team.
More follow-up reading:
- Good data citizenship doesn’t work by Benn Stancil
- Managing the first year by Alex K Gold
- How I learned to stop worrying and love being a manager by Brittany Bennett
- Executing a data strategy with OKRs by Chris Brown
- Dealing with difficult stakeholders by Oscar Baruffa
- Leaders show their work by Ben Balter
BONUS: We talked with four amazing data leaders about what it takes to succeed in your first 365 days as a data leader: Stephen Bailey (Data Engineer at Whatnot), Erica Louie (Head of Data at dbt Labs), Taylor Murphy (Head of Data at Meltano), and Gordon Wong (Founder of Wong Decision Intelligence; formerly Senior Leader of Business Intelligence at HubSpot). Download the Secrets of a Modern Data Leader ebook here.
On metrics, data catalogs, active metadata, and more
People-first data stacks by Ilan Man
The problem is your stakeholders, while giving you the thumbs up the whole time and claiming they’d love an easier way to discover data, are no longer using the tools you’ve painstakingly researched and implemented. They fall into their old habits and inevitably you see an incorrectly defined metric on a PowerPoint slide somewhere.
We need to ensure stakeholders adopt data tools in the ways they should. Reading documentation and taking a training is not enough. We need to reinforce good data-tooling hygiene. I’ve seen many instances of folks starting out in a BI tool, and a few months later they’re back in Excel, pivoting a CSV and pasting it into a presentation. There should always be room for creative solutions and serendipity, but the Data team needs to keep an eye on how stakeholders use the tools they implement. Data models and BI tools need to adapt to business changes.
More follow-up reading:
- Data’s trillion dollar question mark by Benn Stancil
- How to measure data quality by Mikkel Dengsøe
- The many layers of data lineage by Borja Vazquez
- The future of data catalogs by Prukalpa Sankar (aka me!)
Bonus picks
Still want more? Here are a few more articles to keep you reading and thinking through the new year:
- The important purple people outside the data team by Mikkel Dengsøe
- A framework for embedding decision intelligence into your organization by Erik Balodis
- AI is not coming for analyst jobs anytime soon by Amit Prakash
- Manifesto for the data-informed by Julie Zhuo
- Why are we still struggling to answer how many active customers we have? by Seattle Data Guy
- Data teams: break out of your bubble by Mary MacCarthy
- The future history of data engineering by Matt Arderne
- Why it matters where you randomize users in A/B experiments by Adam Stone
This article was also published on Medium.
Header image: Aaron Burden on Unsplash