Experience installing Apache Airflow on Windows 10

Preamble: by a twist of fate, I moved from academic science (medicine) into information technology, where I still draw on my knowledge of experimental design and strategies for analyzing experimental data, but have to apply a technology stack that is new to me. While mastering these technologies I run into a number of difficulties, which so far, fortunately, I have managed to overcome. Perhaps this post will be useful to others who are also just starting to work with Apache projects.

So, to the point. Inspired by Yuri Emelyanov's article about the capabilities of Apache Airflow for automating analytical procedures, I wanted to start using this set of libraries in my own work. Those who are not familiar with Apache Airflow at all may be interested in a short overview article on the website of the N.E. Bauman National Library.

Since the usual instructions for launching Airflow apparently do not apply in a Windows environment, and using Docker for this task seemed excessive, I started looking for other solutions. Fortunately, I was not the first on this path, so I managed to find a wonderful video tutorial on installing Apache Airflow on Windows 10 without Docker. But, as often happens, difficulties arise when following the recommended steps, and I believe not only for me. So I would like to share my experience of installing Apache Airflow; maybe it will save someone some time.

Let's go through the steps of the tutorial (spoiler: everything went fine until step 5):

1. Installing the Windows subsystem for Linux for subsequent installation of Linux distributions

This is the least of the problems:

Control Panel → Programs → Programs and Features → Turn Windows features on or off → Windows Subsystem for Linux

2. Installing a Linux distribution of your choice

I used the Ubuntu application.

3. Installing and updating pip

    sudo apt-get install software-properties-common
    sudo apt-add-repository universe
    sudo apt-get update
    sudo apt-get install python-pip

4. Installing Apache Airflow

    export SLUGIFY_USES_TEXT_UNIDECODE=yes
    pip install apache-airflow

5. Database initialization

This is where my small difficulties began. The tutorial tells you to run the airflow initdb command and move on to the next step. However, I kept getting the response airflow: command not found. It is logical to assume that something went wrong while installing Apache Airflow and the necessary files are simply missing. After making sure that everything was where it should be, I tried specifying the full path to the airflow file (it should look like this: ////airflow initdb ). But no miracle happened, and the answer was the same: airflow: command not found. I then tried the relative path to the file ( ./.local/bin/airflow initdb ), which produced a new error, ModuleNotFoundError: No module named json' , which can be resolved by updating the werkzeug library (in my case, to version 0.15.4):

    pip install werkzeug==0.15.4

Read more about werkzeug here.

After this simple fix, the ./.local/bin/airflow initdb command completed successfully.
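The two situations met in this step (the files are missing, or the files are present but the shell cannot find them by name) can be told apart with a small check. The sketch below is only an illustration: find_user_cli is a hypothetical helper name, and it assumes that pip placed the airflow script in ~/.local/bin, as the relative path used above suggests.

```shell
# find_user_cli is a hypothetical helper for this post, not part of Airflow.
# It reports whether a command is reachable through PATH, or is only present
# in the per-user bin directory (assumed here to be ~/.local/bin).
find_user_cli() {
    name="$1"
    user_bin="${2:-$HOME/.local/bin}"
    if command -v "$name" >/dev/null 2>&1; then
        # The shell can resolve the bare name.
        echo "on PATH: $(command -v "$name")"
    elif [ -x "$user_bin/$name" ]; then
        # The file exists and is executable, but its directory is not searched.
        echo "not on PATH, but installed at: $user_bin/$name"
    else
        echo "not found: $name"
        return 1
    fi
}
```

Calling find_user_cli airflow right after the installation would, under these assumptions, show whether the problem is the installation itself or only the search path.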

6. Starting the Airflow Server

The difficulties with accessing airflow were not over yet. Running the ./.local/bin/airflow webserver -p 8080 command resulted in a No such file or directory error. An experienced Ubuntu user would probably try to overcome such file access problems right away with the export PATH=$PATH:~/.local/bin/ command (that is, by adding the ~/.local/bin/ directory to the existing search path for executables defined by the PATH variable), but this post is intended for those who work primarily with Windows and may not find this solution obvious.

After adding the directory to PATH, the ./.local/bin/airflow webserver -p 8080 command ran successfully.
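For those who come from Windows, it may help to know that an export PATH=... command only affects the current terminal session. The sketch below shows one possible way to add the directory without duplicating it and to keep the setting for new sessions. Both helper names (add_to_path and append_line_once) are hypothetical and were made up for this example.

```shell
# Both helper names below are hypothetical, chosen for this sketch.

# Append a directory to PATH for the current session, unless already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Append a line to a startup file unless an identical line is already there.
append_line_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Make the per-user bin directory searchable in this session.
add_to_path "$HOME/.local/bin"
```

To keep the setting after the terminal is closed, one could additionally run append_line_once 'export PATH="$PATH:$HOME/.local/bin"' "$HOME/.bashrc", since bash reads ~/.bashrc when a new interactive session starts.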

7. URL: localhost:8080/

If everything went well in the previous stages, then you are ready to conquer the analytical peaks.

I hope the experience of installing Apache Airflow on Windows 10 described above will be useful for beginners and will speed up their entry into the universe of modern analytics tools.

Next time I would like to continue the topic and talk about my experience of using Apache Airflow to analyze the behavior of mobile application users.

Source: https://habr.com/ru/post/462161/
