Environment Setup
1. Create a virtual environment with conda
conda create -n <env_name> python=<version>
conda create -n python3.6 python=3.6
2. List virtual environments
conda info -e
3. Activate an environment
Linux: source activate your_env_name
Windows: activate your_env_name
source activate python3.6
4. Deactivate the environment
Linux: source deactivate
Windows: deactivate
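Once an environment is activated, a quick way to confirm that the right interpreter is in use is to ask Python itself. A minimal sketch (the env name python3.6 is just the example used in this note):

```python
import sys

# Path of the interpreter currently running; inside an activated conda env
# this points into that env's directory (e.g. .../envs/python3.6/bin/python)
print(sys.executable)

# Interpreter version, to confirm the python=<version> pin took effect
print("%d.%d" % (sys.version_info.major, sys.version_info.minor))
```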
Install Airflow
1. Upgrade pip
pip install --upgrade pip
2. Install gcc (skip if it is already installed)
yum -y install gcc gcc-c++ kernel-devel
3. Install dependencies
pip3 install paramiko
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
4. Install Airflow
pip3 install apache-airflow
5. Install pymysql
pip3 install pymysql
6. Configure environment variables
vi /etc/profile
# add the airflow home directory:
export AIRFLOW_HOME=/opt/airflow
# then reload the profile:
source /etc/profile
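To confirm the variable is visible to new processes, a minimal Python check (the /opt/airflow path is the one used throughout this note):

```python
import os

# Airflow reads AIRFLOW_HOME to locate its config, logs and metadata DB.
# Fall back to the path exported in /etc/profile above if it is unset here.
airflow_home = os.environ.get("AIRFLOW_HOME", "/opt/airflow")
print(airflow_home)
```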
7. Initialize the database tables (a local SQLite database is used by default)
airflow initdb
This generates the following files under the configured AIRFLOW_HOME:
ls /opt/airflow
airflow.cfg  airflow.db  logs  unittests.cfg
8. Configure the MySQL database
Create an airflow database, then create a user and grant it privileges so Airflow can access the database.
If MySQL is not installed yet, see the earlier notes on installing MySQL on Linux.

mysql> CREATE DATABASE airflow;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON airflow.* TO 'root'@'localhost' IDENTIFIED BY 'root';
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements

# This error comes from validate_password_policy. Its default value is 1 (MEDIUM),
# so passwords must meet a minimum length and contain digits, upper- and lowercase
# letters, and special characters. For local testing you may just want a simple
# password such as root, which requires changing two global parameters.
# 1) First, change validate_password_policy:
mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)

# Password validation is now based only on length, which is controlled by
# validate_password_length:
mysql> select @@validate_password_length;
+----------------------------+
| @@validate_password_length |
+----------------------------+
|                          8 |
+----------------------------+
1 row in set (0.00 sec)

# 2) Lower validate_password_length so that only a short length is required
# (MySQL clamps it to a minimum of 4):
mysql> set global validate_password_length=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@validate_password_length;
+----------------------------+
| @@validate_password_length |
+----------------------------+
|                          4 |
+----------------------------+
1 row in set (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON airflow.* TO 'root'@'localhost' IDENTIFIED BY 'root';
Query OK, 0 rows affected, 1 warning (0.35 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.01 sec)
9. Change the database configuration
mysql> set @@global.explicit_defaults_for_timestamp=on;
10. Configure Airflow
vim airflow/airflow.cfg

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
#executor = SequentialExecutor
executor = LocalExecutor

# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engines; more information
# on their website
#sql_alchemy_conn = sqlite:////data/airflow/airflow.db
sql_alchemy_conn = mysql+pymysql://root:root@localhost:3306/airflow
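The sql_alchemy_conn value follows the standard SQLAlchemy URL format, dialect+driver://user:password@host:port/database. A small stdlib-only sketch that assembles such a URL from its parts and sanity-checks it before it goes into airflow.cfg (the credentials match the MySQL user and database created above):

```python
from urllib.parse import urlsplit

# Assemble the connection string from its parts (values match the
# MySQL user/database created earlier in this note)
user, password = "root", "root"
host, port, database = "localhost", 3306, "airflow"
conn = "mysql+pymysql://%s:%s@%s:%d/%s" % (user, password, host, port, database)

# urlsplit understands the "dialect+driver" scheme, so each component
# can be checked before pasting the string into airflow.cfg
parts = urlsplit(conn)
assert parts.scheme == "mysql+pymysql"
assert parts.username == "root" and parts.password == "root"
assert parts.hostname == "localhost" and parts.port == 3306
assert parts.path.lstrip("/") == "airflow"
print(conn)
```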
The available choices for executor are:
SequentialExecutor: runs tasks one at a time in a single process; the default executor, normally used only for testing
LocalExecutor: runs tasks in parallel using local processes
CeleryExecutor: distributed scheduling, commonly used in production
DaskExecutor: dynamic task scheduling, mainly used for data analysis
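The practical difference between SequentialExecutor and LocalExecutor is single-process versus multi-process execution. The following is plain Python, not Airflow code, just an illustration of why independent tasks finish sooner when fanned out across local worker processes:

```python
import time
from multiprocessing import Pool

def task(n):
    """Stand-in for an independent Airflow task."""
    time.sleep(0.2)
    return n * n

if __name__ == "__main__":
    # One task at a time, like SequentialExecutor
    start = time.time()
    seq = [task(n) for n in range(4)]
    t_seq = time.time() - start

    # Parallel local worker processes, like LocalExecutor
    start = time.time()
    with Pool(4) as pool:
        par = pool.map(task, range(4))
    t_par = time.time() - start

    # Same results either way; only the wall-clock time differs
    assert seq == par == [0, 1, 4, 9]
    print("sequential %.2fs, parallel %.2fs" % (t_seq, t_par))
```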
11. Initialize the database tables again
airflow initdb
12. Inspect the tables Airflow created
mysql> use airflow;
mysql> show tables;
13. Start the services
airflow webserver
airflow scheduler
Running in the background
# Start the airflow webserver UI; nohup keeps it running in the background
nohup airflow webserver -p 8080 > /opt/airflow/webLog.log 2>&1 &
# Start the airflow scheduler so scheduled tasks begin executing
nohup airflow scheduler > /opt/airflow/schedulerLog.log 2>&1 &
Killing the processes
ps -ef|grep "airflow "|grep -v grep|cut -c 9-15|xargs kill -9
Notes
"grep -v grep" removes the grep process itself from the listed processes.
"cut -c 9-15" extracts characters 9 through 15 of each line, which is exactly where the PID appears in ps -ef output.
"xargs kill -9" takes the PIDs output by the previous command, passes them as arguments to "kill -9", and runs that command.
If starting the scheduler fails with the error
failed to log action with (sqlite3.operationalerror) no such table log
then AIRFLOW_HOME was not picked up and Airflow fell back to an uninitialized default SQLite database; set the variable and initialize again:
export AIRFLOW_HOME=/opt/airflow
airflow initdb
14. View in the browser (the webserver started above listens on port 8080)

