First, if you’re a human of data – aka someone who works with data for a living – here’s your virtual medal. 🏅🏅🏅 Why? Because we know how frustrating it can be to manage data and all its appendages, from metadata to data lineage and beyond! Now we know we can’t wave a magic wand and wish it all away…
Here’s why data catalogs could be just the thing you need to meet the challenges of data and metadata management and collaboration. Two data scientists walk into a library at the end of a long day… Data scientist #1 to the librarian: “Can I get a copy of this book on statistical methods?” Goes on to share the name of…
DataOps can help you bring together your data, team, tools and processes to become a truly data-driven organization. Did you know that DataOps is one of the three innovation triggers listed in data management by Gartner in their 2018 innovation insight report? That means there’s a lot of interest around the concept and several people are actively taking part in…
If you’re a CDO (Chief Data Officer), you already know this: it’s an exciting time to be a CDO! The role is soaring in popularity and importance. The share of Fortune 1000 companies with a CDO shot up from 12% in 2012 to 67.9% in 2018. And companies are all too aware of how CDOs are core to their growth and transformation. Today’s…
Data is most valuable when you have something to compare it to, but these comparisons aren’t helpful if the data is bad or irrelevant. For example, it’s nice to know that your program helped 150 people this year, but that doesn’t tell you what you should do next…
In our last post on Apache Airflow, we mentioned how it has taken the data engineering ecosystem by storm. We also talked about how we’ve been using it to move data across our internal systems and explained the steps we took to create an internal workflow. The ETL workflow (e)xtracted PDFs from a website, (t)ransformed them into CSVs and (l)oaded…
Even if you haven’t worked on Kubernetes, chances are you’ve at least heard or read about it. It is already one of the most popular open source projects ever, and it’s still being developed. We need to understand where we’ve come from to appreciate where we are today. The same rings true for Kubernetes. I was recently told, ‘Boring infrastructure is…
What is the first thing that comes to your mind upon hearing the word ‘Airflow’? Data engineering, right? For good reason, I suppose. You are likely to find Airflow mentioned in every other blog post that talks about data engineering. Apache Airflow is a workflow management platform. To oversimplify, you can think of it as cron, but on steroids! It…
When discussing data collection, outliers inevitably come up. What is an outlier exactly? It’s a data point that is significantly different from other data points in a data set. While this definition might seem straightforward, determining what is or isn’t an outlier is actually pretty subjective, depending on the study and the breadth of information being collected. So what’s the…
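To make the outlier definition above a little more concrete, here is a minimal sketch of one common rule of thumb, the interquartile range (IQR) test. It is only one of many possible conventions, not necessarily the one the post goes on to use; the function name `iqr_outliers`, the cutoff `k=1.5` and the sample `readings` are all assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k = 1.5 is a widely used convention, not a law: changing it changes
    what counts as an outlier, which is the subjective part.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample data, invented for this example.
readings = [10, 12, 11, 13, 12, 11, 95]

print(iqr_outliers(readings))        # → [95]
print(iqr_outliers(readings, k=50))  # → []
```

The same data point is flagged with the default cutoff and ignored with a much looser one, which shows how the choice of threshold decides what counts as an outlier.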