I. Installing DataHub
1. Install Docker and docker-compose
yum -y install docker
systemctl start docker
systemctl enable docker
curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
Verify the installation:
docker --version
docker-compose --version
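The version checks above can also be scripted so that every missing tool is reported at once (jq, installed in the next step, can be included as well). This is a minimal sketch; require_cmds is a hypothetical helper name, not part of Docker or DataHub.

```shell
# Report every command from the argument list that is not on PATH.
# Returns 0 when all are present, 1 if at least one is missing.
require_cmds() {
  local missing=0 c
  for c in "$@"; do
    if command -v "$c" >/dev/null 2>&1; then
      echo "found:   $c -> $(command -v "$c")"
    else
      echo "missing: $c" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_cmds docker docker-compose jq prints one line per tool and fails if any of them is absent.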
2. Install jq (it is packaged in the EPEL repository, so enable EPEL first)
yum install epel-release
yum -y install jq
3. Install Python 3 (built from source)
Install the build dependencies:
yum install python-pip gcc gcc-c++ python-virtualenv cyrus-sasl-devel
yum -y groupinstall "Development tools"
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel
wget https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz
tar -zxvf Python-3.7.3.tgz
mkdir /usr/local/python3
cd Python-3.7.3
./configure --prefix=/usr/local/python3
make && make install
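Before repointing the system python command, it is worth confirming that the freshly built interpreter reports the expected version. A minimal sketch follows; py_version and version_ge are hypothetical helpers, and the 3.7 threshold only mirrors the version built above. Newer acryl-datahub releases may require a newer Python, so check the package's stated requirements before settling on 3.7.3.

```shell
# Print the version of a Python interpreter, e.g. "3.7.3".
# Python 2 writes "Python 2.7.5" to stderr, hence 2>&1.
py_version() {
  "$1" --version 2>&1 | awk '{print $2}'
}

# Succeed if dotted version $1 >= $2. GNU sort -V orders versions
# numerically per component, so 3.10 sorts after 3.9.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: version_ge "$(py_version /usr/local/python3/bin/python3)" 3.7 && echo "Python is new enough". A plain string comparison would get 3.10 vs 3.9 wrong, which is why the sketch relies on sort -V.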
Point the system python command at Python 3:
rm -rf /usr/bin/python
ln -s /usr/local/python3/bin/python3 /usr/bin/python
Do the same for pip:
rm -rf /usr/bin/pip
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip
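The rm -rf / ln -s pair above throws away the distribution's original link. A minimal sketch of a safer switch, assuming a hypothetical relink helper, keeps the first thing found at the link path as <link>.orig so the change can be undone with mv.

```shell
# Point LINK at TARGET, keeping the first thing found at LINK as LINK.orig
# so the change can be rolled back with: mv LINK.orig LINK
relink() {
  local target=$1 link=$2
  if [ -e "$link" ] || [ -L "$link" ]; then
    if [ -e "$link.orig" ] || [ -L "$link.orig" ]; then
      rm -f "$link"               # a backup already exists; drop the current link
    else
      mv "$link" "$link.orig"     # first run: keep the original for rollback
    fi
  fi
  ln -s "$target" "$link"
}
```

Usage: relink /usr/local/python3/bin/python3 /usr/bin/python and relink /usr/local/python3/bin/pip3 /usr/bin/pip. Leaving /usr/bin/python untouched and calling /usr/local/python3/bin/python3 directly would avoid the yum edits that follow altogether.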
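yum is written for Python 2 and runs whatever /usr/bin/python points to, so it breaks once python means Python 3. Python 2 is still available as /usr/bin/python2, so change the interpreter line of yum's scripts to use it:
vi /usr/bin/yum => change #! /usr/bin/python to #! /usr/bin/python2
vi /usr/libexec/urlgrabber-ext-down => change #! /usr/bin/python to #! /usr/bin/python2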
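The same two edits can be made without opening an editor. This is a sketch with a hypothetical fix_shebang helper; it assumes GNU sed (for -i) and only touches the first line when it is exactly the plain python shebang, so running it twice does no harm.

```shell
# Change a first line of "#!/usr/bin/python" or "#! /usr/bin/python" to python2.
# Anything else (including an already-fixed python2 line) is left unchanged.
fix_shebang() {
  # \1 re-emits the matched prefix, keeping its spacing; the literal 2 follows it
  sed -i '1s|^\(#! */usr/bin/python\)$|\12|' "$1"
}
```

Usage: fix_shebang /usr/bin/yum and fix_shebang /usr/libexec/urlgrabber-ext-down, then check with head -n 1 on each file.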
Upgrade pip:
python -m pip install --upgrade pip wheel setuptools
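4. Install and start DataHub
Remove any older copies of the CLI, install acryl-datahub, check its version, then start the quickstart deployment (a set of Docker containers):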
python -m pip uninstall datahub acryl-datahub || true
python -m pip install --upgrade acryl-datahub
python -m datahub version
python -m datahub docker quickstart
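When the steps after quickstart are scripted, it helps to wait until DataHub actually answers before running an ingestion. A minimal retry sketch follows; wait_for is a hypothetical helper, and the /health path on the GMS service (port 8080, the server the recipes below send to) is an assumption based on the quickstart's container health checks.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}
```

Usage: wait_for 60 5 curl -fsS http://localhost:8080/health, which polls for up to about five minutes. Passing the command as separate arguments (rather than one string) keeps its quoting intact.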
II. Ingesting metadata
1. Import MySQL metadata (here a fresh MySQL container is created with Docker, published on host port 13306 with root password 123456)
docker run -p 13306:3306 --name ownmysql -v /opt/docker_data/mysql/conf:/etc/mysql/conf.d -v /opt/docker_data/mysql/logs:/logs -v /opt/docker_data/mysql/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql
Install the MySQL ingestion plugin:
pip install 'acryl-datahub[mysql]'
List the installed plugins:
python -m datahub check plugins
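2. Write a YAML recipe, mysql_to_datahub_rest.yml, that reads the metadata of the aucc database from MySQL and sends it to DataHub through the REST sink (node is the hostname of the machine running both MySQL and DataHub; 8080 is the port of the DataHub GMS service):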
source:
  type: mysql
  config:
    host_port: node:13306
    username: root
    password: 123456
    database: aucc
sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"
3. Run the ingestion
python -m datahub ingest -c mysql_to_datahub_rest.yml
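Once there are several recipes (for example this one and the Hive recipe in the next step), a small wrapper can run them in turn and summarise which ones failed. This is a sketch: ingest_all and the DATAHUB_CMD override are hypothetical names, and the default command is simply the python -m datahub invocation used above.

```shell
# Run "datahub ingest -c FILE" for each recipe and print one result line per file.
# DATAHUB_CMD (hypothetical) overrides the CLI; it is split on spaces.
ingest_all() {
  local cmd="${DATAHUB_CMD:-python -m datahub}"
  local failed=0 f
  for f in "$@"; do
    if $cmd ingest -c "$f"; then
      echo "OK   $f"
    else
      echo "FAIL $f" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]   # non-zero status if any recipe failed
}
```

Usage: ingest_all mysql_to_datahub_rest.yml hive_to_datahub_rest.yml. Recent CLI versions also document a --dry-run option for datahub ingest that runs the source without writing to the sink; python -m datahub ingest --help shows what the installed version supports.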
4. Ingest Hive metadata
Install the prerequisites (SASL libraries and the Hive ingestion plugin):
yum install cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi
pip install 'acryl-datahub[hive]'
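Recipe (hive_to_datahub_rest.yml), reading the default database from HiveServer2 on port 10000:
source:
  type: hive
  config:
    host_port: node:10000
    username:
    password:
    database: default
sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"
Run the ingestion: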
python -m datahub ingest -c hive_to_datahub_rest.yml
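The MySQL and Hive recipes differ only in source type, address, database and credentials, so both can be generated from one template. This is a sketch with a hypothetical write_recipe helper; it writes <type>_to_datahub_rest.yml in the current directory and, like the recipes above, leaves values unquoted, so a password containing YAML-special characters such as # or : would need quoting by hand.

```shell
# Write <type>_to_datahub_rest.yml with the same layout as the recipes above.
# usage: write_recipe TYPE HOST_PORT DATABASE [USERNAME] [PASSWORD] [GMS_URL]
# Omitted username/password are left blank, as in the Hive recipe.
write_recipe() {
  local type="$1" host_port="$2" database="$3"
  local username="${4:-}" password="${5:-}" server="${6:-http://node:8080}"
  local out="${type}_to_datahub_rest.yml"
  cat > "$out" <<EOF
source:
  type: ${type}
  config:
    host_port: ${host_port}
    username: ${username}
    password: ${password}
    database: ${database}
sink:
  type: "datahub-rest"
  config:
    server: "${server}"
EOF
  echo "$out"   # print the file name so callers can pass it on
}
```

Usage: write_recipe mysql node:13306 aucc root 123456 reproduces the MySQL recipe, and write_recipe hive node:10000 default reproduces the Hive one.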
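5. Web UI
The ingested MySQL and Hive datasets can then be browsed in the DataHub web UI, which the quickstart serves at http://node:9002 (default login datahub / datahub).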