ETL Pipelines With Airflow
Introduction

In this blog post, I want to go over the core operations of data engineering, known as Extract, Transform, Load (ETL), and show how they can be automated and scheduled using Apache Airflow.

Extracting data can be done in a multitude of ways, but one of the most common is to query a web API. If the query succeeds, the API's server sends data back, and oftentimes that data comes in the form of JSON. JSON can be thought of as semi-structured data: once parsed, it behaves like a dictionary whose keys are strings and whose values are strings, numbers, or nested structures. Because the raw response is just text, we must transform it into the right shape, types, and units before loading it into a database.

Airflow is a platform for scheduling and monitoring workflows, and in this post I will show you how to use it to extract the daily weather in Ha Noi from the OpenWeatherMap API, convert the temperature to Celsius, and load the data into a simple PostgreSQL database. Let's first get started with how to query an ...
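Before wiring anything into Airflow, the three ETL steps themselves can be sketched in plain Python. The snippet below is a minimal sketch, not the post's actual pipeline: the JSON payload is an abbreviated, hypothetical sample of the kind of response OpenWeatherMap returns (in a real run it would come from a GET request with your API key), and an in-memory SQLite database stands in for PostgreSQL so the example is self-contained.

```python
import json
import sqlite3

def kelvin_to_celsius(kelvin: float) -> float:
    # OpenWeatherMap reports temperatures in Kelvin by default.
    return round(kelvin - 273.15, 2)

# Extract: in the real pipeline this JSON would come from the API's server;
# here we use an abbreviated sample payload of the same general shape.
sample_payload = '{"name": "Hanoi", "main": {"temp": 300.15, "humidity": 80}}'
record = json.loads(sample_payload)  # parse semi-structured text into a dict

# Transform: pick out the fields we care about and convert the units.
row = (record["name"], kelvin_to_celsius(record["main"]["temp"]))

# Load: an in-memory SQLite database stands in for PostgreSQL here; with a
# PostgreSQL driver the connect() call changes but the SQL stays the same shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp_c REAL)")
conn.execute("INSERT INTO weather VALUES (?, ?)", row)
print(conn.execute("SELECT * FROM weather").fetchall())  # [('Hanoi', 27.0)]
```

Each of these three steps will later become its own Airflow task, so failures in one stage can be retried without rerunning the others.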