We use GitHub issues to keep track of all issues. Please do not report bugs or issues in this blog's comments; post them as an issue on GitHub instead. Before opening a new issue, please use GitHub search to look for existing issues (both open and closed) that may be similar.
The PDF (Portable Document Format) was born out of The…
In our last post on Apache Airflow, we mentioned how it has taken the data engineering ecosystem by storm. We also talked about how we’ve been using it to move data across our internal systems and explained the steps we took to create an internal workflow. The ETL workflow (e)xtracted PDFs from a website, (t)ransformed them into CSVs and (l)oaded…
What is the first thing that comes to your mind upon hearing the word ‘Airflow’? Data engineering, right? For good reason, I suppose. You are likely to find Airflow mentioned in every other blog post that talks about data engineering. Apache Airflow is a workflow management platform. To oversimplify, you can think of it as cron, but on steroids! It…