Posts

Showing posts from October, 2022

Sending Emails using Apache Airflow Email Operator

Automation reduces manual work and plays a key role in improving productivity across industries. It is one of the fastest and least time-consuming ways for businesses to raise production rates and improve work efficiency. Yet many teams, unsure how to get started, fail to automate and end up performing tasks by hand. Every IT expert has a different job or workflow to run, from collecting data from various sources to processing it, uploading it, and creating reports, and many of these tasks are still performed manually each day. To trigger workflows automatically and reduce the time and effort involved, we recommend Apache Airflow. Apache Airflow is an open-source tool for managing complex workflows: a powerful workflow management platform that helps resolve issues and lets you programmatically author, schedule, and monitor daily tasks. Data Scientists and Data Engineers often find it helpful for their...
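As a rough sketch of what the post's title describes, the snippet below shows an EmailOperator task inside an Airflow 2.x DAG. It assumes SMTP settings are already configured in airflow.cfg or via AIRFLOW__SMTP__* environment variables; the DAG id, schedule, and recipient address are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.email import EmailOperator

# Hypothetical DAG that sends a daily status email; assumes the SMTP
# connection is configured in airflow.cfg or via AIRFLOW__SMTP__* variables.
with DAG(
    dag_id="daily_status_email",          # hypothetical DAG id
    start_date=datetime(2022, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    send_report = EmailOperator(
        task_id="send_report",
        to="team@example.com",            # hypothetical recipient
        subject="Daily report {{ ds }}",  # templated with the execution date
        html_content="<p>The daily pipeline finished successfully.</p>",
    )
```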

Apache Spark on Windows: A Docker approach

How to set up an Apache Spark development environment on Windows with minimum effort using Docker. Recently I was allocated to a project where the entire customer database lives in Apache Spark / Hadoop. As is standard in all my projects, I first set out to prepare the development environment on the corporate laptop, which ships with Windows as the standard OS. As many already know, preparing a development environment on a Windows laptop can be painful, and on a corporate laptop it can be even more painful (due to restrictions imposed by the system administrator, corporate VPN, etc.). Creating a development environment for Apache Spark / Hadoop is no different. Installing Spark on Windows is notoriously complicated: several dependencies need to be installed (Java SDK, Python, Winutils, Log4j), services need to be configured, and environment variables need to be set properly. Given that, I decided to use Docker as the first option for all my development environments. Why Docker? ...
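As a rough illustration of the approach, the sketch below assumes a containerized Spark environment such as the jupyter/pyspark-notebook image (started, for example, with docker run -p 8888:8888 jupyter/pyspark-notebook). The tiny PySpark script is only a smoke test to confirm that the container's Spark installation works; the image name and app name are assumptions, not the post's exact setup.

```python
from pyspark.sql import SparkSession

# Smoke test to run inside the Docker container (e.g. in the bundled Jupyter
# notebook) to confirm Spark is usable on Windows via Docker.
spark = (
    SparkSession.builder
    .appName("docker-smoke-test")   # hypothetical app name
    .master("local[*]")             # use all cores available to the container
    .getOrCreate()
)

df = spark.createDataFrame(
    [("spark", 1), ("hadoop", 2), ("docker", 3)],
    ["tool", "rank"],
)
df.show()               # prints the small test DataFrame
print(spark.version)    # confirms which Spark version the image ships

spark.stop()
```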

Create a Simple Crypto Dashboard Using Python

A tutorial on web scraping, time series analysis, and web app deployment using Python and Streamlit. A dashboard is an information management tool (or a Data Scientist's play tool) that lets you keep track of the important metrics, performance indicators, and other figures relating to a business or a project. Here, the aim is to build a tool that visualizes the time series price fluctuations of various cryptocurrencies, from multi-year trends down to daily price movements, along with all the essential information. A dashboard can be created in a number of different ways depending on the use case. Since I want this to be a simple dashboard, I included the following aspects:
     1. A subset of cryptocurrencies - as there are thousands of them listed on the trading sites, I picked the top 25 cryptos.
     2. Financial charts like candlestick charts - to display price features like open, close, high, and low.
     3. A line chart to show intraday pr...
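A minimal sketch of what such a Streamlit page might look like: it pulls daily prices with the yfinance package and draws a Plotly candlestick chart. Those two libraries, the ticker list, and the date range are assumptions on my part rather than the exact choices made in the post.

```python
import streamlit as st
import yfinance as yf
import plotly.graph_objects as go

# Hypothetical subset of tickers; the post uses the top 25 cryptocurrencies.
TICKERS = ["BTC-USD", "ETH-USD", "BNB-USD", "XRP-USD", "ADA-USD"]

st.title("Simple Crypto Dashboard")
ticker = st.selectbox("Choose a cryptocurrency", TICKERS)

# Download one year of daily OHLC data for the selected coin.
data = yf.Ticker(ticker).history(period="1y", interval="1d")

# Candlestick chart showing open, high, low, and close prices.
fig = go.Figure(
    data=[
        go.Candlestick(
            x=data.index,
            open=data["Open"],
            high=data["High"],
            low=data["Low"],
            close=data["Close"],
        )
    ]
)
fig.update_layout(title=f"{ticker} - daily prices", xaxis_rangeslider_visible=False)

st.plotly_chart(fig, use_container_width=True)
```

Running `streamlit run app.py` (with the script saved as app.py) serves the dashboard locally, and the selectbox lets the reader switch between the listed coins.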