DataX in Practice -- Importing HDFS Data into a MySQL Table with DataX


  • Requirement: import the data file user.txt from HDFS into the user table of a MySQL database.

  • 1. Create the job configuration file (JSON format)

    • To view the configuration template, run the following command:
      [hadoop@hadoop03 ~]$ cd /bigdata/install/datax
      [hadoop@hadoop03 datax]$ python bin/datax.py -r hdfsreader -w mysqlwriter
      
      DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
      Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
      
      Please refer to the hdfsreader document:
           https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md 
      
      Please refer to the mysqlwriter document:
           https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 
       
      Please save the following configuration as a json file and  use
           python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
      to run the job.
      
      {
          "job": {
              "content": [
                  {
                      "reader": {
                          "name": "hdfsreader", 
                          "parameter": {
                              "column": [], 
                              "defaultFS": "", 
                              "encoding": "UTF-8", 
                              "fieldDelimiter": ",", 
                              "fileType": "orc", 
                              "path": ""
                          }
                      }, 
                      "writer": {
                          "name": "mysqlwriter", 
                          "parameter": {
                              "column": [], 
                              "connection": [
                                  {
                                      "jdbcUrl": "", 
                                      "table": []
                                  }
                              ], 
                              "password": "", 
                              "preSql": [], 
                              "session": [], 
                              "username": "", 
                              "writeMode": ""
                          }
                      }
                  }
              ], 
              "setting": {
                  "speed": {
                      "channel": ""
                  }
              }
          }
      }
      
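    • For the hdfsreader plugin parameters, see the hdfsreader document linked in the output above.
  • 2. Write the configuration file based on the template

    • Go to the /bigdata/install/datax/job directory and create the configuration file hdfs2mysql.json with the following content: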
      {
          "job": {
              "setting": {
                  "speed": {
                      "channel": 1
                  }
              },
              "content": [
                  {
                      "reader": {
                          "name": "hdfsreader",
                          "parameter": {
                              "defaultFS": "hdfs://hadoop01:8020",
                              "path": "/user.txt",
                              "fileType": "text",
                              "encoding": "UTF-8",
                              "fieldDelimiter": "\t",
                              "column": [
                                  {
                                      "index": 0,
                                      "type": "long"
                                  },
                                  {
                                      "index": 1,
                                      "type": "string"
                                  },
                                  {
                                      "index": 2,
                                      "type": "long"
                                  }
                              ]
                          }
                      },
                      "writer": {
                          "name": "mysqlwriter",
                          "parameter": {
                              "writeMode": "insert",
                              "username": "root",
                              "password": "123456",
                              "column": [
                                  "id",
                                  "name",
                                  "age"
                              ],
                              "preSql": [
                                  "delete from user"
                              ],
                              "connection": [
                                  {
                                      "jdbcUrl": "jdbc:mysql://hadoop02:3306/datax?useUnicode=true&characterEncoding=utf-8",
                                      "table": [
                                          "user"
                                      ]
                                  }
                              ]
                          }
                      }
                  }
              ]
          }
      }
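
Before submitting the job, it can help to sanity-check the file. The sketch below is a minimal example, not part of DataX: the function name check_job is hypothetical, and it assumes the reader columns are objects with a type plus an index or value (as in the configuration above) and that the reader and writer column lists are meant to line up one to one. It does not handle a "*" column shorthand.

```python
import json


def check_job(job_text):
    """Return a list of problems found in a DataX job definition.

    An empty list means none of the checks below failed. This is a
    hypothetical helper, not a DataX API.
    """
    job = json.loads(job_text)  # raises json.JSONDecodeError on malformed JSON
    problems = []
    for i, item in enumerate(job["job"]["content"]):
        reader_cols = item["reader"]["parameter"]["column"]
        writer_cols = item["writer"]["parameter"]["column"]

        # Assumption: reader and writer columns are matched by position,
        # so both lists should have the same length.
        if len(reader_cols) != len(writer_cols):
            problems.append(
                "content[%d]: reader has %d columns, writer has %d"
                % (i, len(reader_cols), len(writer_cols))
            )

        # Each reader column is expected to carry a type and an index or value.
        for j, col in enumerate(reader_cols):
            if "type" not in col or ("index" not in col and "value" not in col):
                problems.append(
                    "content[%d]: reader column %d needs a type and an index or value"
                    % (i, j)
                )
    return problems
```

For example, check_job(open("job/hdfs2mysql.json").read()) returns an empty list for the configuration shown above, since three reader columns map onto id, name, and age.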
      
      
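  • 3. Prepare the test data file user.txt on HDFS

    • The content of user.txt is as follows:
      1	zhangsan	20
      2	lisi	29
      3	wangwu	25
      4	zhaoliu	35
      5	kobe	40
      
    • The fields on each line are separated by a tab character (\t). Upload the file to HDFS: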
      [hadoop@hadoop03 ~]$ hdfs dfs -put user.txt /
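
Because the job sets fieldDelimiter to "\t", the file must contain real tab characters, which are easy to lose when copying text. A minimal sketch for generating the file instead of typing it; the names ROWS, render_rows, and write_user_file are hypothetical, and the rows are the five sample records listed above.

```python
# Sample records from this article: (id, name, age)
ROWS = [
    (1, "zhangsan", 20),
    (2, "lisi", 29),
    (3, "wangwu", 25),
    (4, "zhaoliu", 35),
    (5, "kobe", 40),
]


def render_rows(rows, delimiter="\t"):
    """Join each record with the delimiter, one record per line."""
    return "".join(
        delimiter.join(str(value) for value in row) + "\n" for row in rows
    )


def write_user_file(path="user.txt"):
    """Write the sample records to a local file as UTF-8 text."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(render_rows(ROWS))
```

Calling write_user_file() creates user.txt in the current directory; run it before the hdfs dfs -put command above.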
      
  • 4. Create the target table

    mysql> create table datax.user(id int,name varchar(20),age int);
    
  • 5. Start DataX

    [hadoop@hadoop03 ~]$ cd /bigdata/install/datax
    [hadoop@hadoop03 datax]$ python bin/datax.py job/hdfs2mysql.json 
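
If the job needs to be started from a script rather than by hand, the same command can be wrapped with the standard subprocess module. This is a rough sketch under assumptions: build_command and run_job are hypothetical names, DATAX_HOME is the install path used in this article, and the launcher is assumed to exit with a non-zero code when the job fails.

```python
import os
import subprocess

DATAX_HOME = "/bigdata/install/datax"  # install path used in this article


def build_command(job_file, datax_home=DATAX_HOME, python="python"):
    """Build the argument list equivalent to: python bin/datax.py <job_file>."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python, launcher, job_file]


def run_job(job_file, datax_home=DATAX_HOME):
    """Run the job and return the launcher's exit code.

    cwd is set to the DataX home so a relative path such as
    job/hdfs2mysql.json resolves the same way as in the shell session.
    """
    result = subprocess.run(build_command(job_file, datax_home), cwd=datax_home)
    return result.returncode


# Example (not executed here): run_job("job/hdfs2mysql.json")
```

Passing the arguments as a list avoids shell quoting issues if the job file path contains spaces.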
    
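  • 6. Check the console output

    When the synchronization finishes, DataX prints a summary like the following (field labels translated from the Chinese output):
    
    Job start time                  : 2021-06-18 12:02:47
    Job end time                    : 2021-06-18 12:02:58
    Total elapsed time              :                 11s
    Average throughput              :                4B/s
    Record write speed              :              0rec/s
    Total records read              :                   5
    Total read/write failures       :                   0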
    
  • 7. View the data in the user table
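
One way to check the result is to compare the table against the rows the job should have written. The sketch below is a minimal example: the function name expected_rows is hypothetical, and it applies the same mapping as the reader configuration (field 0 as long, field 1 as string, field 2 as long) to the text of user.txt. The resulting tuples can be compared with the output of a query such as select id, name, age from datax.user.

```python
def expected_rows(text, delimiter="\t"):
    """Parse tab-separated lines into (id, name, age) tuples.

    Mirrors the reader's column mapping: index 0 -> long,
    index 1 -> string, index 2 -> long.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(delimiter)
        rows.append((int(fields[0]), fields[1], int(fields[2])))
    return rows
```

A line that is not tab-separated raises an IndexError or ValueError here, which is a quick signal that the source file does not match the configured fieldDelimiter.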

