Airflow 2.1.1 Detailed Installation Guide


MySQL

Installation

For installing MySQL itself, see my earlier blog post: "Installing MySQL 5.7 on Linux and a summary of the problems encountered".

After MySQL is installed, create the airflow database and user, and grant the required privileges:

CREATE DATABASE airflow CHARACTER SET utf8;
CREATE USER 'airflow'@'%' IDENTIFIED BY 'yourpassword';
GRANT ALL PRIVILEGES ON *.* TO 'airflow'@'%' IDENTIFIED BY 'yourpassword' WITH GRANT OPTION;
set global explicit_defaults_for_timestamp =1;
FLUSH PRIVILEGES;
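
To confirm that the new account works before continuing, you can run a quick check from the shell. This is only a sketch; <mysql-host> is a placeholder for your MySQL server:

# Hypothetical host name; you will be prompted for yourpassword
mysql -h <mysql-host> -u airflow -p -e "USE airflow; SELECT 1;"
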
Installing Python 3.7.5 (Important)

This part must be performed on every node where Airflow will be installed.

The official Airflow documentation assumes Python 3, but CentOS 7 ships with Python 2 by default, which causes all kinds of problems during the Airflow installation.

Install build tools and libraries

yum -y groupinstall "Development tools"
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
yum install libffi-devel -y

Download and compile Python 3.7

wget https://www.python.org/ftp/python/3.7.5/Python-3.7.5.tar.xz
tar -xvJf Python-3.7.5.tar.xz
mkdir /usr/python3.7
cd Python-3.7.5
./configure --prefix=/usr/python3.7
make && make install

創建軟鏈接

ln -s /usr/python3.7/bin/python3 /usr/bin/python3.7
ln -s /usr/python3.7/bin/pip3 /usr/bin/pip3.7

Verify the installation

python3.7 -V
pip3.7 -V
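
The output should look roughly like the following (the exact pip version and paths may differ on your machine):

Python 3.7.5
pip 19.2.3 from /usr/python3.7/lib/python3.7/site-packages/pip (python 3.7)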

If both commands print version numbers like these, the installation succeeded.

Because yum requires Python 2, the yum configuration also needs to be adjusted:

vim /usr/bin/yum
# change the shebang "#! /usr/bin/python" to "#! /usr/bin/python2"

vim /usr/libexec/urlgrabber-ext-down
# change the shebang "#! /usr/bin/python" to "#! /usr/bin/python2"
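
If you prefer a non-interactive edit, the same change can be made with sed; this is a sketch that assumes the shebang is the first line of each file:

# Rewrite the first-line shebang to point at python2
sed -i '1s|^#!\s*/usr/bin/python$|#! /usr/bin/python2|' /usr/bin/yum /usr/libexec/urlgrabber-ext-down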

Make sure the required packages are installed (Important)

# A pip version that is too old will cause the Airflow installation to fail, so upgrade pip first
pip3.7 install --upgrade pip
sudo pip3.7 install pymysql
sudo pip3.7 install celery
sudo pip3.7 install flower
sudo pip3.7 install psycopg2-binary

2. Installing Airflow (Important)

Note: steps 2.1, 2.2 and 2.3 must be performed on every installation node.

2.1 Configure sudo privileges for the airflow user

Airflow is run as a dedicated airflow user here. Create the user and grant it passwordless sudo:

# Run the following commands as root
useradd airflow
vi /etc/sudoers

## Allow root to run any commands anywhere
root       ALL=(ALL)       ALL
airflow    ALL=(ALL)       NOPASSWD: ALL    # add this line

2.2 Set Airflow environment variables

After installation, the default Airflow install path is /home/airflow/.local/bin.

# Run as root
vi /etc/profile
# add the following line, then reload the profile
export PATH=$PATH:/usr/python3.7/bin:/home/airflow/.local/bin
source /etc/profile

Here /home/airflow/.local/bin is the airflow user's ~/.local/bin; adjust the PATH entry (PATH=$PATH:~/.local/bin) to match your actual setup.

# As the airflow user, optionally set AIRFLOW_HOME (the default is ~/airflow)
export AIRFLOW_HOME=~/airflow
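
If you change AIRFLOW_HOME from the default, it helps to persist it so that every new shell and the later service starts see the same value; a minimal sketch, run as the airflow user:

# Append the variable to the airflow user's shell profile (default value shown)
echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc
source ~/.bashrc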

2.3 Install Airflow

su airflow    # run as root to switch to the airflow user
# Run the following commands as the airflow user
AIRFLOW_VERSION=2.1.1
PYTHON_VERSION="$(python3.7 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-no-providers-${PYTHON_VERSION}.txt"
# sudo is required here; without it some components end up silently missing with no error reported. Also note the mysql, celery and cncf.kubernetes extras, otherwise Airflow will fail to start later.
sudo pip3.7 install "apache-airflow[mysql,celery,cncf.kubernetes]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}" -i https://pypi.rasa.com/simple --use-deprecated=legacy-resolver

If Airflow installed correctly, the airflow command is now available and the Airflow home directory contains the following files:

airflow.cfg
webserver_config.py
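
A quick way to verify this is to run an airflow command and list the home directory; note that the configuration files are generated in AIRFLOW_HOME the first time an airflow command runs:

# Run as the airflow user
airflow version                   # should print 2.1.1
ls ${AIRFLOW_HOME:-~/airflow}     # should list airflow.cfg and webserver_config.py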

2.4 Configure Airflow

[Figure: Airflow high-availability architecture]

Edit the {AIRFLOW_HOME}/airflow.cfg file:

# Add or modify the following settings in {AIRFLOW_HOME}/airflow.cfg

# 1. Change the executor
# executor = LocalExecutor
executor = CeleryExecutor
# 2. Change the metadata database (metastore) connection
#sql_alchemy_conn = sqlite:home/apps/airflow/airflow.db
sql_alchemy_conn = mysql+pymysql://airflow:yourpassword@hostname:3306/airflow

# 3. Set the message queue broker; RabbitMQ is used here
# broker_url = redis://redis:6379/0
broker_url = amqp://admin:yourpassword@hostname:5672/
# 4. Set the result backend
# result_backend = db+postgresql://postgres:airflow@postgres/airflow
result_backend = db+mysql://airflow:yourpassword@hostname:3306/airflow

# 5. Change the time zone
# default_timezone = utc
default_timezone = Asia/Shanghai
default_ui_timezone = Asia/Shanghai

# 6. Change the web port (default 8080; changed to 8081 here because 8080 is taken by Ambari)
endpoint_url = http://localhost:8081
base_url = http://localhost:8081
web_server_port = 8081

The modified {AIRFLOW_HOME}/airflow.cfg must be synced to every server where Airflow is installed.

Also create the directories referenced by the dags_folder and base_log_folder settings, otherwise DAG runs will fail later.
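
A minimal sketch of both steps, where worker1 and worker2 are placeholder hostnames and the default folder locations are assumed:

# Copy the updated config to the other Airflow nodes (hypothetical hostnames)
for host in worker1 worker2; do
    scp ${AIRFLOW_HOME:-~/airflow}/airflow.cfg airflow@${host}:~/airflow/
done
# Create the directories referenced by dags_folder and base_log_folder (defaults shown)
mkdir -p ~/airflow/dags ~/airflow/logs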

2.5 Start the Airflow cluster

Initialize the database:

airflow db init

If the Airflow metadata tables (dag, dag_run, task_instance, and so on) appear in the airflow database in MySQL, the initialization succeeded.
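
One way to check from the command line, assuming the same connection details as in the configuration above (hostname is a placeholder):

mysql -h hostname -u airflow -p -e "USE airflow; SHOW TABLES;"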

Create an admin user:

airflow users create \
    --username admin \
    --firstname Lixiaolong \
    --lastname Bigdata \
    --role Admin \
    --email spiderman@superhero.org

Set the password when prompted on the console; here it is set to yourpassword.

Start the webserver:

airflow webserver -D

Start the scheduler:

nohup airflow scheduler &

Start the workers:

# Start Flower first; run this on the servers where workers will be started
airflow celery flower -D
# Start the worker; run this on every server that should run a worker
airflow celery worker -D
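
To confirm that the daemons are actually up, a rough check of the processes and listening ports (8081 for the webserver as configured above, 5555 for Flower) is:

ps -ef | grep -E 'airflow (webserver|scheduler|celery)' | grep -v grep
ss -lntp | grep -E ':8081|:5555'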

2.6 Log in to the web UI

Web UI: http://master1:8081/
Username: admin
Password: the password set in step 2.5

After logging in, the Airflow web UI is displayed. Worker information can be viewed through Flower at http://hostip:5555.

2.7 Configure jobs in Airflow

Airflow ships with 32 example DAGs out of the box. In the web UI, click a DAG's toggle to switch it to the Active state.

Next, the commonly used Hive operator is taken as an example of how to write and run a custom DAG.

Installing dependencies

To use the Hive operator, the Hive-related dependencies must be installed first.

If you run into an error like the following:

ModuleNotFoundError: No module named 'airflow.providers.apache'

then the Hive provider package needs to be installed manually:

su airflow
# For Airflow 2.x, Hive support is shipped as a separate provider package
pip3.7 install apache-airflow-providers-apache-hive

Writing the DAG

DAG directory: see the dags_folder setting in airflow.cfg.

Place the finished Python file in that directory. For example, the following DAG, named test_hive2, queries a Hive table every minute:

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from datetime import timedelta
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': days_ago(1),
    'retries': 10,
    'retry_delay': timedelta(seconds=5),
}

# Run every minute; catchup=False prevents backfilling of missed intervals
dag = DAG('test_hive2', default_args=default_args, schedule_interval='*/1 * * * *', catchup=False)

t1 = HiveOperator(
    task_id='hive_task',
    hql='select * from test.data_demo',
    dag=dag)

If the DAG file is well formed, the new DAG will appear in the web UI after a refresh.
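
The DAG can also be checked from the command line before relying on the scheduler; a rough sketch, where the execution date is an arbitrary example (the task itself also needs the Hive connection described below):

airflow dags list | grep test_hive2
# Run the single task once, outside the scheduler
airflow tasks test test_hive2 hive_task 2021-07-01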

Configuring the Connection

In the web UI, go to Admin -> Connections to configure the connection.

By default, the Hive operator uses the connection hive_cli_default.

Pay attention to the following fields:
Conn Type: select Hive Client Wrapper (if the Hive dependencies are installed, this is already the default)
Host: set to the node where Hive is installed
Login: set to a user that has permission to run Hive jobs

When the configuration is complete, save it.
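
The same connection can also be created from the command line instead of the UI; a minimal sketch, where hive-node and hiveuser are placeholder values:

# Delete the pre-created default first if it already exists
airflow connections delete hive_cli_default
airflow connections add hive_cli_default \
    --conn-type hive_cli \
    --conn-host hive-node \
    --conn-login hiveuser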

Task scheduling

In the web UI you can enable scheduling for the DAG and trigger a run manually.

After a run has been triggered, click a task instance in the DAG view to see its run details, including the run information and the run logs.

3. Problems Encountered

3.1 Python module download error

Collecting flask-appbuilder<2.0.0,>=1.12.2; python_version < "3.6"
  Using cached Flask-AppBuilder-1.13.1.tar.gz (1.5 MB)
    ERROR: Command errored out with exit status 1:
     command: /bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-EFxJZq/flask-appbuilder/setup.py'"'"'; __file__='"'"'/tmp/pip-install-EFxJZq/flask-appbuilder/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-StYjJL
         cwd: /tmp/pip-install-EFxJZq/flask-appbuilder/
    Complete output (3 lines):
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'long_description_content_type'
      warnings.warn(msg)
    error in Flask-AppBuilder setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers

Solution: upgrade setuptools to the latest version:

pip install setuptools -U

3.2 Airflow commands fail with: error: sqlite C library version too old (< {min_sqlite_version})

The full error:

Traceback (most recent call last):
  File "/usr/python3.7/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/__init__.py", line 34, in <module>
    from airflow import settings
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/settings.py", line 35, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 1114, in <module>
    conf.validate()
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 202, in validate
    self._validate_config_dependencies()
  File "/usr/python3.7/lib/python3.7/site-packages/airflow/configuration.py", line 243, in _validate_config_dependencies
    f"error: sqlite C library version too old (< {min_sqlite_version}). "
airflow.exceptions.AirflowConfigException: error: sqlite C library version too old (< 3.15.0). See https://airflow.apache.org/docs/apache-airflow/2.1.1/howto/set-up-database.rst#setting-up-a-sqlite-database

Cause: Airflow uses SQLite as the metastore by default, but since MySQL is used here, SQLite is not actually needed.

Solution: edit {AIRFLOW_HOME}/airflow.cfg and change the metadata database setting sql_alchemy_conn to:

sql_alchemy_conn = mysql+pymysql://airflow:yourpassword@hostname:3306/airflow

3.3 airflow db init fails with: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql

  File "/usr/python3.7/lib/python3.7/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py", line 44, in upgrade
    raise Exception("Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql")
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql

Solution: connect to MySQL (the airflow database) and enable the global explicit_defaults_for_timestamp variable:

SHOW GLOBAL VARIABLES LIKE '%timestamp%';
SET GLOBAL explicit_defaults_for_timestamp =1;
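
Note that SET GLOBAL does not survive a MySQL server restart; to make the setting permanent, you can also add it to the MySQL configuration file (the path may differ on your system), for example under the [mysqld] section of /etc/my.cnf:

[mysqld]
explicit_defaults_for_timestamp = 1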

Before the change, SHOW GLOBAL VARIABLES reports explicit_defaults_for_timestamp as OFF; after the change it reports ON.

