[Original] Ops Basics: Docker (5): Deploying Airflow with Docker


Deployment setup: docker + airflow + mysql + LocalExecutor

Use the Airflow Docker image:

https://hub.docker.com/r/puckel/docker-airflow

Start it with the defaults, sqlite + SequentialExecutor:

$ docker run -d -p 8080:8080 puckel/docker-airflow webserver

Copy airflow.cfg out of the container so it can be modified:

$ docker cp $container_id:/usr/local/airflow/airflow.cfg .

Try running with the customized airflow.cfg by mounting it into the container:

-v /usr/local/airflow/airflow.cfg:/usr/local/airflow/airflow.cfg

In that file, sql_alchemy_conn is changed to point at mysql and executor is set to LocalExecutor (executor = LocalExecutor).
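The two edits to airflow.cfg can also be scripted. The following is only a minimal sketch: set_cfg_value is a hypothetical helper (not part of Airflow or the image), and the mysql host, user, password and database in the usage comment are placeholder values. It assumes each key appears once at the start of a line, as "key = value".

```shell
# set_cfg_value FILE KEY VALUE
# Replace the line "KEY = ..." in FILE with "KEY = VALUE".
# A backup is kept as FILE.bak. VALUE must not contain "|" or "&",
# because both are special in the sed replacement used here.
set_cfg_value() {
  local file="$1" key="$2" value="$3"
  sed -i.bak "s|^${key} *=.*|${key} = ${value}|" "$file"
}

# Usage (placeholder connection string, adjust to your own mysql):
#   set_cfg_value ./airflow.cfg executor LocalExecutor
#   set_cfg_value ./airflow.cfg sql_alchemy_conn 'mysql://airflow:airflow@mysql-host:3306/airflow'
```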

After restarting, the log shows that SequentialExecutor is still in use:

[2019-02-28 19:37:16,170] {{__init__.py:51}} INFO - Using executor SequentialExecutor
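To check which executor a container actually picked, the startup log can be filtered for the line above. This is a rough sketch: executor_from_log is a hypothetical helper name, and it assumes the log keeps the "Using executor <name>" wording shown here.

```shell
# executor_from_log
# Read log text on stdin and print the executor named in the last
# "Using executor <Name>" line; print nothing if no such line exists.
executor_from_log() {
  sed -n 's/.*Using executor \([A-Za-z]*\).*/\1/p' | tail -n 1
}

# Usage:
#   docker logs "$container_id" 2>&1 | executor_from_log
```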

Check the Dockerfile: docker-airflow/Dockerfile

ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"] # set default arg for entrypoint

So the script that ultimately runs at startup is entrypoint.sh.

Check entrypoint.sh: docker-airflow/script/entrypoint.sh

: "${AIRFLOW__CORE__EXECUTOR:=${EXECUTOR:-Sequential}Executor}"

...

if [ "$AIRFLOW__CORE__EXECUTOR" != "SequentialExecutor" ]; then
  AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
  AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
  wait_for_port "Postgres" "$POSTGRES_HOST" "$POSTGRES_PORT"
fi

...

case "$1" in
  webserver)
    airflow initdb
    if [ "$AIRFLOW__CORE__EXECUTOR" = "LocalExecutor" ]; then
      # With the "Local" executor it should all run in one container.
      airflow scheduler &
    fi
    exec airflow webserver
    ;;

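1) The EXECUTOR environment variable (Sequential, Local, etc.) is used to build AIRFLOW__CORE__EXECUTOR, unless AIRFLOW__CORE__EXECUTOR is already set;
2) If AIRFLOW__CORE__EXECUTOR is not SequentialExecutor, the script points AIRFLOW__CORE__SQL_ALCHEMY_CONN at postgres and waits for postgres to come up (a hard dependency on postgres);
3) If the startup argument is webserver and AIRFLOW__CORE__EXECUTOR=LocalExecutor, the scheduler is started automatically in the same container;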
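Step 1) relies on nested shell parameter expansion, which is easy to misread. The sketch below wraps the same expansion from entrypoint.sh in a function so its three cases can be tried out in a terminal. resolve_executor is a hypothetical name used only for illustration; the function body runs in a subshell so the caller's variables are not modified.

```shell
# resolve_executor
# Print the executor that the entrypoint.sh expansion would pick:
#   - AIRFLOW__CORE__EXECUTOR if it is already set and non-empty,
#   - otherwise "${EXECUTOR}Executor",
#   - otherwise "SequentialExecutor".
resolve_executor() (
  : "${AIRFLOW__CORE__EXECUTOR:=${EXECUTOR:-Sequential}Executor}"
  echo "$AIRFLOW__CORE__EXECUTOR"
)

# Usage (with neither variable set in the calling shell):
#   resolve_executor                  # -> SequentialExecutor
#   EXECUTOR=Local resolve_executor   # -> LocalExecutor
```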

Due to Airflow’s automatic environment variable expansion, you can also set the env var AIRFLOW__CORE__* to temporarily overwrite airflow.cfg.

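Environment variables take precedence over airflow.cfg, and entrypoint.sh defaults AIRFLOW__CORE__EXECUTOR to SequentialExecutor when EXECUTOR is not given. So even with executor=LocalExecutor in airflow.cfg, SequentialExecutor is what actually gets used. Copy entrypoint.sh out of the container so it can be modified: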
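The override variables follow the pattern AIRFLOW__{SECTION}__{KEY}, with double underscores and upper case. As a small illustration, the sketch below builds the variable name from an airflow.cfg section and key; airflow_env_name is a hypothetical helper, not something shipped with Airflow.

```shell
# airflow_env_name SECTION KEY
# Print the environment variable name that overrides KEY in [SECTION]
# of airflow.cfg, following the AIRFLOW__{SECTION}__{KEY} pattern.
airflow_env_name() {
  printf 'AIRFLOW__%s__%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

# Usage:
#   airflow_env_name core executor           # -> AIRFLOW__CORE__EXECUTOR
#   airflow_env_name core sql_alchemy_conn   # -> AIRFLOW__CORE__SQL_ALCHEMY_CONN
```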

$ docker cp $container_id:/entrypoint.sh .

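Comment out the following lines, so that the script no longer overrides sql_alchemy_conn with postgres or waits for postgres: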

#if [ "$AIRFLOW__CORE__EXECUTOR" != "SequentialExecutor" ]; then
#  AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
#  AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
#  wait_for_port "Postgres" "$POSTGRES_HOST" "$POSTGRES_PORT"
#fi
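If you prefer not to edit the file by hand, the same change can be sketched with sed. This assumes the if line and its closing fi both start in column 0 of the copied entrypoint.sh, exactly as in the excerpt above; comment_out_postgres_block is a hypothetical helper name.

```shell
# comment_out_postgres_block FILE
# Prefix every line from the 'if [ "$AIRFLOW__CORE__EXECUTOR" != ...' test
# through its closing 'fi' with '#'. A backup is kept as FILE.bak.
comment_out_postgres_block() {
  local file="$1"
  sed -i.bak \
    '/^if \[ "\$AIRFLOW__CORE__EXECUTOR" != "SequentialExecutor" ]; then/,/^fi$/ s/^/#/' \
    "$file"
}

# Usage:
#   comment_out_postgres_block ./entrypoint.sh
```

It is worth diffing entrypoint.sh against entrypoint.sh.bak afterwards to confirm that only this block changed.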

Startup command:

$ docker run -d -p 8080:8080 \
    -e EXECUTOR=Local \
    -v /usr/local/airflow/airflow.cfg:/usr/local/airflow/airflow.cfg \
    -v /usr/local/airflow/entrypoint.sh:/entrypoint.sh \
    -v /usr/local/airflow/dags:/usr/local/airflow/dags \
    -v /usr/local/airflow/logs:/usr/local/airflow/logs \
    puckel/docker-airflow webserver
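Before running this command it may help to verify the host side of the mounts. When a host path given to -v does not exist, Docker creates it as a directory, which breaks file mounts like airflow.cfg, and the mounted entrypoint.sh has to be executable because the image runs it as its ENTRYPOINT. A minimal pre-flight sketch, assuming the host layout under /usr/local/airflow used in the command (check_mounts is a hypothetical helper):

```shell
# check_mounts BASE_DIR
# Return 0 only if BASE_DIR contains everything the run command mounts:
# airflow.cfg (file), entrypoint.sh (executable), dags/ and logs/ (directories).
check_mounts() {
  local base="$1" status=0
  [ -f "$base/airflow.cfg" ]   || { echo "missing file: $base/airflow.cfg" >&2; status=1; }
  [ -x "$base/entrypoint.sh" ] || { echo "missing or not executable: $base/entrypoint.sh" >&2; status=1; }
  [ -d "$base/dags" ]          || { echo "missing directory: $base/dags" >&2; status=1; }
  [ -d "$base/logs" ]          || { echo "missing directory: $base/logs" >&2; status=1; }
  return "$status"
}

# Usage:
#   check_mounts /usr/local/airflow && echo "mounts look fine"
```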

 

This is a single-node deployment, but combined with mesos plus hdfs or nfs it can be made highly available for production use.


Reference:
https://github.com/puckel/docker-airflow

 

