Posts

Showing posts from September, 2022

ETL Process Using Airflow and Docker

Introduction

In this article, I will create an ETL process that extracts Forex data and wrap the whole thing into a data pipeline using Airflow and Docker. The ETL process will extract data from the fixer.io API, transform it, and load it into a PostgreSQL database. The goal of this project is an automated process that continuously feeds the PostgreSQL database: every 2 minutes, the ETL process loads an updated batch of Forex data. Note that this article assumes some knowledge of Airflow, Docker, Python, and SQL; I won't go into too much detail, to keep this article short.

Project Steps

    1. Setting up the Airflow architecture
    2. 1st DAG - Check if the API is available
    3. 2nd DAG - Create a table
    4. 3rd DAG - Extract
    5. 4th DAG - Transform
    6. 5th DAG - Load
    7. Query the data in the pgAdmin UI

Step #1 - Setting up the Airflow Architecture

The first thing we should do is set up the basic ...
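To make the Transform step concrete, here is a minimal sketch of the kind of function the 4th DAG could run. It assumes a fixer.io-style `/latest` JSON payload (`base`, `timestamp`, and a `rates` mapping) and flattens it into rows ready for a PostgreSQL `INSERT`; the function name and row layout are illustrative, not taken from the article.

```python
from datetime import datetime, timezone

def transform_rates(payload):
    """Flatten a fixer.io-style JSON payload into one
    (base, currency, rate, fetched_at) tuple per currency,
    ready for a parameterized PostgreSQL INSERT."""
    fetched_at = datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc)
    base = payload["base"]
    return [
        (base, currency, rate, fetched_at)
        for currency, rate in sorted(payload["rates"].items())
    ]

# Example payload in the shape fixer.io's /latest endpoint returns
sample = {
    "success": True,
    "timestamp": 1662940800,
    "base": "EUR",
    "rates": {"USD": 0.9985, "GBP": 0.8646},
}
rows = transform_rates(sample)
```

Keeping the transform a plain function like this makes it easy to unit-test outside Airflow before wiring it into a PythonOperator.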

How to start with Apache Airflow in Docker (Windows)

In general terms, Apache Airflow is an open-source tool that allows us to manage, monitor, plan, and schedule workflows; it is normally used as a workflow (services) orchestrator.

Prerequisites:

    1. Install Docker Desktop on your computer.

Get Started

Follow these steps to set up your test Airflow environment in Docker:

    1. The first step is to download Docker Desktop from the official website. For this article, I installed version 4.4.4.
    2. After installing Docker Desktop, we need to download a docker-compose.yaml file, which you can also find here.
    3. Now that we have both files, we need to create our airflow directory. Go to the following path: C:/Users/<your_user>/. Inside that directory, create a folder called docker, and inside docker create another folder called airflow.
    4. Now that we have our airflow folder, we must do the following: a) Create three folders call...