DolphinScheduler Hands-On


DS Installation

Reference (official docs): https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/cluster-deployment.html

  1. Grant the deployment user sudo privileges;

  2. Configure hosts, set up passwordless SSH, and change ownership of the deployment directory to the deployment user (see the sketch below);
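
A minimal sketch of steps 1 and 2, run on every node; the user name bigdata, the host names, and the install directory are assumptions taken from later steps, so adjust them to your environment:

# as root: create the deployment user and give it password-less sudo (assumed user: bigdata)
useradd bigdata
echo 'bigdata ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

# as the deployment user: generate a key and push it to every node, including this one
su - bigdata
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id bigdata02    # repeat for bigdata03 / bigdata04 / bigdata05

# make the deployment user own the deployment directory
sudo chown -R bigdata:bigdata /opt/apps/dolphinscheduler-1.3.4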

  3. Initialize the database: edit conf/datasource.properties; run script/create-dolphinscheduler.sh; (note: after it finishes, manually apply the SQL under sql/xdf);

  4. Edit the environment variables: dolphinscheduler_env.sh;

  5. Edit the configuration: conf/config/install_config.conf;

  6. Store resources on HDFS: put core-site.xml and hdfs-site.xml into the conf directory;

  7. Automated deployment: sh install.sh;

  8. Verify by logging in: http://xx.xx.xx.xx:12345/dolphinscheduler with admin/dolphinscheduler123
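
Besides the browser, you can check the API server from the shell; the login endpoint below is what the 1.3.x web UI calls, but treat the exact path and parameters as an assumption for your version:

# a successful login returns a JSON body with code 0 and a session id
curl -X POST 'http://xx.xx.xx.xx:12345/dolphinscheduler/login' \
     -d 'userName=admin&userPassword=dolphinscheduler123'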

  9. Download the source code first

  10. Change the Hadoop and Hive versions (pom.xml properties):

<!--<hadoop.version>2.7.3</hadoop.version>-->
<hadoop.version>2.7.7</hadoop.version>
<!--<hive.jdbc.version>2.1.0</hive.jdbc.version>-->
<hive.jdbc.version>2.3.6</hive.jdbc.version>
  11. Compile
    mvn -U clean package -Prelease -Dmaven.test.skip=true

  12. The generated binary package
    is located in the $DOLPHIN_SCHEDULER_HOME/dolphinscheduler-dist/target directory
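
For example, the packaged tarball can then be copied to the target machines and unpacked; the exact file name below is an assumption based on version 1.3.4:

cd $DOLPHIN_SCHEDULER_HOME/dolphinscheduler-dist/target
ls *.tar.gz        # file name depends on the version, e.g. apache-dolphinscheduler-*-bin.tar.gz
tar -zxvf apache-dolphinscheduler-*-bin.tar.gz -C /opt/apps/
# the extracted directory can then be renamed to match the paths used below,
# e.g. /opt/apps/dolphinscheduler-1.3.4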

  13. Initialize the database

drop database if exists dolphinscheduler;
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
create user 'ds' IDENTIFIED BY 'QWer_1234';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'ds'@'%' IDENTIFIED BY 'QWer_1234';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'ds'@'localhost' IDENTIFIED BY 'QWer_1234';
flush privileges;
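
One way to run the statements above from the shell, assuming they are saved to a file such as ds_init.sql (the file name is just an example) and that you have a MySQL root account on bigdata02:

# runs the database/user creation statements above against the MySQL server
mysql -h bigdata02 -P 3306 -u root -p < ds_init.sql
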
  14. Prepare the MySQL metadata database
    vim conf/datasource.properties

MySQL

spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://bigdata02:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
spring.datasource.username=ds
spring.datasource.password=QWer_1234

  15. Put the MySQL driver jar into the lib directory
    cp ~/soft/mysql-connector-java-5.1.47.jar /opt/apps/dolphinscheduler-1.3.4/lib/

  16. Initialize the database
    sh script/create-dolphinscheduler.sh
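
If the script finishes without errors, the DolphinScheduler metadata tables should now exist; a quick check using the credentials configured above (in 1.3.x the table names start with the t_ds_ prefix):

mysql -h bigdata02 -P 3306 -u ds -p'QWer_1234' -e 'use dolphinscheduler; show tables;' | head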

  17. Edit dolphinscheduler_env.sh
    vim conf/env/dolphinscheduler_env.sh

export HADOOP_HOME=/neworiental/tools/hadoop-3.2.1
export HADOOP_CONF_DIR=/neworiental/tools/hadoop-3.2.1/etc/hadoop
export SPARK_HOME1=/neworiental/tools/spark-3.0.2-bin-hadoop3.2
export SPARK_HOME2=/neworiental/tools/spark-3.0.2-bin-hadoop3.2
export PYTHON_HOME=/usr/bin
export JAVA_HOME=/neworiental/tools/jdk1.8.0_161
export HIVE_HOME=/neworiental/tools/apache-hive-3.1.2-bin
export FLINK_HOME=/neworiental/tools/flink-1.11.1
export DATAX_HOME=/opt/soft/datax
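
The paths above are environment-specific; a small sketch to confirm that every configured home directory actually exists on each worker before running jobs:

# run on each worker node after editing conf/env/dolphinscheduler_env.sh
source conf/env/dolphinscheduler_env.sh
for d in "$HADOOP_HOME" "$SPARK_HOME1" "$JAVA_HOME" "$HIVE_HOME" "$FLINK_HOME" "$DATAX_HOME"; do
    [ -d "$d" ] && echo "OK      $d" || echo "MISSING $d"
done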
  18. Copy the Hadoop cluster configuration files
    cp ~/apps/hadoop-2.7.7/etc/hadoop/core-site.xml /opt/apps/dolphinscheduler-1.3.4/conf/
    cp ~/apps/hadoop-2.7.7/etc/hadoop/hdfs-site.xml /opt/apps/dolphinscheduler-1.3.4/conf/

  19. Edit the configuration in install_config.conf
    vim conf/config/install_config.conf

# postgresql or mysql
dbtype="mysql"
# db config
# db address and port
dbhost="bigdata02:3306"
# db username
username="ds"

# database name
dbname="dolphinscheduler"
# db password
# NOTICE: if there are special characters, please use the \ to escape, for example, `[` escape to `\[`
password="xxxxxx"
# zk cluster
zkQuorum="bigdata04:2181,bigdata03:2181,bigdata02:2181"
# Note: the target installation path for dolphinscheduler, please not config as the same as the current path (pwd)
installPath="/opt/apps/dolphinscheduler"
# deployment user
# Note: the deployment user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled, the root directory needs to be created by itself
deployUser="bigdata"

# alert config
# mail server host
mailServerHost="smtp.exmail.qq.com"
# mail server port
# note: Different protocols and encryption methods correspond to different ports, when SSL/TLS is enabled, make sure the port is correct.
mailServerPort="25"
# sender
mailSender="xxx@qq.com"
# user
mailUser="aaa@qq.com"
# sender password
# note: The mail.passwd is email service authorization code, not the email login password.
mailPassword="xxxxxxxxxx"
# TLS mail protocol support
starttlsEnable="true"
# SSL mail protocol support
# only one of TLS and SSL can be in the true state.
sslEnable="false"
#note: sslTrust is the same as mailServerHost
sslTrust="smtp.exmail.qq.com"

# resource storage type:HDFS,S3,NONE
resourceStorageType="HDFS"

defaultFS="hdfs://hadoop277ha:8020/"
# if resourceStorageType is S3, the following three configuration is required, otherwise please ignore
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# if resourcemanager HA enable, please type the HA ips ; if resourcemanager is single, make this value empty
yarnHaIps="bigdata04,bigdata05"
# if resourcemanager HA enable or not use resourcemanager, please skip this value setting; If resourcemanager is single, you only need to replace yarnIp1 to actual resourcemanager hostname.
singleYarnIp="yarnIp1"
# resource store on HDFS/S3 path, resource file will store to this hadoop hdfs path, self configuration, please make sure the directory exists on hdfs and have read write permissions. /dolphinscheduler is recommended
resourceUploadPath="/dolphinscheduler"
# who have permissions to create directory under HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
hdfsRootUser="bigdata"

# kerberos config
# whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"
# api server port
apiServerPort="12345"
# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
ips="localhost"
# ssh port, default 22
# Note: if ssh port is not default, modify here
sshPort="22"
# run master machine
# Note: list of hosts hostname for deploying master
masters="localhost"

# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
workers="localhost:default"
# run alert machine
# note: list of machine hostnames for deploying alert server
alertServer="localhost"
# run api machine
# note: list of machine hostnames for deploying api server
apiServers="localhost"
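
Since resourceStorageType is HDFS, the resourceUploadPath must exist on HDFS and be writable by the deployment user before workflows can upload resources; a minimal sketch, run as a user with HDFS admin rights:

hdfs dfs -mkdir -p /dolphinscheduler
hdfs dfs -chown -R bigdata:bigdata /dolphinscheduler
hdfs dfs -chmod -R 775 /dolphinscheduler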

For cluster mode, configure as follows:

# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
ips="bigdata02,bigdata03,bigdata04,bigdata05"
# ssh port, default 22
# Note: if ssh port is not default, modify here
sshPort="22"
# run master machine
# Note: list of hosts hostname for deploying master
masters="bigdata04,bigdata05"
# run worker machine
# note: need to write the worker group name of each worker, the default value is "default"
workers="bigdata02:default,bigdata03:default,bigdata04:default,bigdata05:default"
# run alert machine
# note: list of machine hostnames for deploying alert server
alertServer="bigdata03"
# run api machine
# note: list of machine hostnames for deploying api server
apiServers="bigdata04"

  20. Start the base services
    Start ZooKeeper
    Start Hadoop
    Stop all cluster services with one command:
    sh /opt/apps/dolphinscheduler-1.3.4/bin/stop-all.sh
    Start all cluster services with one command:
    sh /opt/apps/dolphinscheduler-1.3.4/bin/start-all.sh
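
After start-all.sh you can confirm the daemons with jps on each node; the process names below are the ones DolphinScheduler 1.3.x uses, and which of them appear on a given host depends on the masters/workers/alertServer/apiServers settings above:

for h in bigdata02 bigdata03 bigdata04 bigdata05; do
    echo "== $h =="
    ssh $h "jps | egrep 'MasterServer|WorkerServer|LoggerServer|AlertServer|ApiApplicationServer'"
done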

URL: http://172.24.28.65:12345/dolphinscheduler/ui/#/security/tenant
The default username and password are admin/dolphinscheduler123

  1. Create a queue
  2. Create a tenant
    A tenant corresponds to a Linux user; it is the user the worker uses to submit jobs. If the Linux user does not exist, the worker creates it when executing the script.
  3. Create an ordinary user
    This is just a regular developer account.
  4. Create an alert group
    The alert group is set as a parameter when a workflow is started; after the workflow finishes, its status and other information are emailed to the alert group.
  5. Create a worker group
    Specifies which machines a worker runs on.
  6. Create a data source
    The data processed by workflows in DS comes from various data sources, so they need to be defined in advance.

Create a project first,
then define workflows inside the project.

Resource management

Resource management holds runnable programs such as Python files and jars, as well as scripts and configuration files.
UDF management is for uploading UDF functions.

