Setting Up and Using DataHub, an Open-Source Metadata Management Tool


I. Installing DataHub

  1. Install docker and docker-compose

    yum -y install docker

    curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

    chmod +x /usr/local/bin/docker-compose

    Verify that both were installed:

    docker --version

    docker-compose --version
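
    The version checks above can be folded into one small helper that reports which prerequisites are missing. This is only a sketch: the function name check_cmds is made up for illustration, and the systemctl line assumes a systemd-based host such as CentOS 7, which the yum commands suggest but the original does not state.

```shell
# Report whether each named command is on PATH.
# Returns non-zero if at least one command is missing.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the host being prepared:
#   check_cmds docker docker-compose
#   systemctl enable --now docker   # assumption: systemd host; starts the daemon
```

    Installing the docker package does not necessarily start the daemon, so it is worth confirming that `docker info` succeeds before moving on to the quickstart step.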

  2. Install jq

    yum -y install epel-release

    yum -y install jq

  3. Install python3

    yum -y install python-pip gcc gcc-c++ python-virtualenv cyrus-sasl-devel

    yum -y groupinstall "Development tools"

    yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel

    wget https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz

    tar -zxvf Python-3.7.3.tgz

    mkdir /usr/local/python3

    cd Python-3.7.3

    ./configure --prefix=/usr/local/python3

    make && make install

    Point the system python command at python3:

    rm -rf /usr/bin/python

    ln -s /usr/local/python3/bin/python3 /usr/bin/python

    Point pip at pip3:

    rm -rf /usr/bin/pip

    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip

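    Once python points at python3, yum's scripts must be edited, because yum depends on python2 and its scripts call /usr/bin/python by default:

    vi /usr/bin/yum =>  change #! /usr/bin/python to #! /usr/bin/python2

    vi /usr/libexec/urlgrabber-ext-down  => change #! /usr/bin/python to #! /usr/bin/python2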
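
    The two manual shebang edits can also be scripted. The following is a sketch that assumes GNU sed (for -i) and that the first line of each file is exactly a /usr/bin/python shebang with no extra arguments; the helper name pin_python2 is hypothetical.

```shell
# Rewrite a script's first-line shebang from python to python2, in place.
# Only line 1 is touched, and only if it points at /usr/bin/python exactly,
# so running it twice leaves the file unchanged.
pin_python2() {
    sed -i '1s|^#![[:space:]]*/usr/bin/python[[:space:]]*$|#!/usr/bin/python2|' "$1"
}

# Usage (as root):
#   pin_python2 /usr/bin/yum
#   pin_python2 /usr/libexec/urlgrabber-ext-down
```

    If a file's shebang carries flags or a different path, the pattern will not match and the file is left alone, in which case the manual edit is still needed.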

    Upgrade pip:

    python -m pip install --upgrade pip wheel setuptools

  4. Install and start DataHub

    python -m pip uninstall datahub acryl-datahub || true

    python -m pip install --upgrade acryl-datahub

    python -m datahub version

    python -m datahub docker quickstart
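
    The quickstart containers can take a few minutes to become ready. A small retry helper is one way to wait for the metadata service before ingesting anything. This is a sketch: retry is a hypothetical helper, and the /health path on port 8080 is an assumption about the GMS container, not something the steps above confirm.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (assumed health endpoint of the metadata service):
#   retry 30 10 curl -sf http://localhost:8080/health && echo "GMS is up"
```

    With the default quickstart settings the web UI is normally served on port 9002, so http://localhost:9002 is the place to look once the check passes.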

    

 

 

 

II. Practice

  1. Import MySQL metadata (a fresh MySQL container is created with docker for this)

  docker run -p 13306:3306 --name ownmysql -v /opt/docker_data/mysql/conf:/etc/mysql/conf.d -v /opt/docker_data/mysql/logs:/logs -v /opt/docker_data/mysql/data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql

  Install the mysql plugin:

  pip install 'acryl-datahub[mysql]'

  Check which plugins are installed:

  python -m datahub check plugins

  

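  2. Write a YAML recipe (mysql_to_datahub_rest.yml) that reads the MySQL metadata and sends it to DataHub through the REST interface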

source:
  type: mysql
  config:
    host_port: node:13306
    username: root
    password: "123456"
    database: aucc

sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"
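
  To keep the password out of the recipe file, the same recipe can be written with an environment-variable placeholder, since DataHub recipes support ${VAR} substitution at ingestion time. The sketch below generates the file from the shell; the variable name MYSQL_PASSWORD is an assumption chosen for illustration, while the host, port, and database are the ones used above.

```shell
# Write the recipe with a placeholder instead of a literal password.
# The quoted 'EOF' stops the shell from expanding ${MYSQL_PASSWORD} here,
# so the placeholder reaches the file unchanged and is resolved by datahub.
cat > mysql_to_datahub_rest.yml <<'EOF'
source:
  type: mysql
  config:
    host_port: node:13306
    username: root
    password: "${MYSQL_PASSWORD}"
    database: aucc

sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"
EOF

# Usage:
#   export MYSQL_PASSWORD=123456
#   python -m datahub ingest -c mysql_to_datahub_rest.yml --dry-run
```

  The --dry-run flag runs the source side without writing to the sink, which is a cheap way to confirm the connection settings before a real ingestion.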

 

  3. Run the ingestion

  python -m datahub ingest -c mysql_to_datahub_rest.yml

 

  4. Ingest Hive metadata

  Install the prerequisites:

  yum -y install cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi

  pip install 'acryl-datahub[hive]'

source:
  type: hive
  config:
    host_port: node:10000
    username:
    password:
    database: default

sink:
  type: "datahub-rest"
  config:
    server: "http://node:8080"

  python -m datahub ingest -c hive_to_datahub_rest.yml
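
  If the Hive ingestion fails to connect, it helps to first confirm that the HiveServer2 port is reachable from the machine running datahub. The following is a sketch that relies on bash's /dev/tcp feature and the coreutils timeout command; port_open is a hypothetical helper, and node:10000 is the address from the recipe above.

```shell
# Return 0 if a TCP connection to host $1, port $2 can be opened
# within $3 seconds (default 3). Requires bash and coreutils timeout.
port_open() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open node 10000 && echo "HiveServer2 port is reachable"
#   port_open node 8080  && echo "DataHub GMS port is reachable"
```

  A reachable port only shows that something is listening; authentication or SASL problems would still surface in the ingestion log.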

  

  5. Web UI

  

 

   

