DataX hands-on example -- importing MySQL data into HDFS with DataX


  • Requirement: import the data in the MySQL table student into the HDFS path /datax/mysql2hdfs/.

  • 1. Create the MySQL database and the table used in this example, then insert the sample data

      [hadoop@hadoop02 ~]$ mysql -uroot -p123456
      mysql> create database datax;
      mysql> use datax;
      mysql> create table student(id int,name varchar(20),age int,createtime timestamp );
      mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values('1','zhangsan','18','2021-05-10 18:10:00');
      mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values('2','lisi','28','2021-05-10 19:10:00');
      mysql> insert into `student` (`id`, `name`, `age`, `createtime`) values('3','wangwu','38','2021-05-10 20:10:00');
    
  • 2. Create the job configuration file (JSON format)

    • View the configuration template by running the following command
      [hadoop@hadoop03 datax]$ cd /bigdata/install/datax
      [hadoop@hadoop03 datax]$ python bin/datax.py -r mysqlreader -w hdfswriter
      
      DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
      Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
      
      Please refer to the mysqlreader document:
           https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 
      
      Please refer to the hdfswriter document:
           https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md 
       
      Please save the following configuration as a json file and  use
           python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
      to run the job.
      
      {
          "job": {
              "content": [
                  {
                      "reader": {
                          "name": "mysqlreader", 
                          "parameter": {
                              "column": [], 
                              "connection": [
                                  {
                                      "jdbcUrl": [], 
                                      "table": []
                                  }
                              ], 
                              "password": "", 
                              "username": "", 
                              "where": ""
                          }
                      }, 
                      "writer": {
                          "name": "hdfswriter", 
                          "parameter": {
                              "column": [], 
                              "compress": "", 
                              "defaultFS": "", 
                              "fieldDelimiter": "", 
                              "fileName": "", 
                              "fileType": "", 
                              "path": "", 
                              "writeMode": ""
                          }
                      }
                  }
              ], 
              "setting": {
                  "speed": {
                      "channel": ""
                  }
              }
          }
      }
      
    • For the meaning of the hdfswriter parameters, see the hdfswriter plugin documentation linked in the output above
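
The template leaves every value empty, so it helps to list what still has to be filled in. The following is a minimal sketch, not part of DataX: `empty_fields` is a hypothetical helper that walks a copy of the template and reports each value that is still `""` or `[]`.

```python
import json

# Copy of the template printed by `datax.py -r mysqlreader -w hdfswriter`.
TEMPLATE = json.loads("""
{
  "job": {
    "content": [{
      "reader": {"name": "mysqlreader",
                 "parameter": {"column": [],
                               "connection": [{"jdbcUrl": [], "table": []}],
                               "password": "", "username": "", "where": ""}},
      "writer": {"name": "hdfswriter",
                 "parameter": {"column": [], "compress": "", "defaultFS": "",
                               "fieldDelimiter": "", "fileName": "", "fileType": "",
                               "path": "", "writeMode": ""}}
    }],
    "setting": {"speed": {"channel": ""}}
  }
}
""")


def empty_fields(node, path=""):
    """Yield the dotted path of every value that is still "" or []."""
    if node == "" or node == []:
        yield path
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from empty_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from empty_fields(value, f"{path}[{index}]")


if __name__ == "__main__":
    for field in empty_fields(TEMPLATE):
        print(field)
```

Not every reported field has to be set. The job in step 3, for example, drops `column`, `table` and `where` from the reader and supplies a `querySql` instead.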
  • 3. Write the configuration file based on the template

    • Go to the /bigdata/install/datax/job directory and create the configuration file mysql2hdfs.json with the following content:
      {
          "job": {
              "setting": {
                  "speed": {
                      "channel": 1
                  }
              },
              "content": [
                  {
                      "reader": {
                          "name": "mysqlreader",
                          "parameter": {
                              "username": "root",
                              "password": "123456",
                              "connection": [
                                  {
                                      "querySql": [
                                          "select id,name,age,createtime from student where age < 30;"
                                      ],
                                      "jdbcUrl": [
                                          "jdbc:mysql://hadoop02:3306/datax"
                                      ]
                                  }
                              ]
                          }
                      },
                      "writer": {
                          "name": "hdfswriter",
                          "parameter": {
                              "defaultFS": "hdfs://hadoop01:8020",
                              "fileType": "text",
                              "path": "/datax/mysql2hdfs/",
                              "fileName": "student.txt",
                              "column": [
                                  {
                                      "name": "id",
                                      "type": "INT"
                                  },
                                  {
                                      "name": "name",
                                      "type": "STRING"
                                  },
                                  {
                                      "name": "age",
                                      "type": "INT"
                                  },
                                  {
                                      "name": "createtime",
                                      "type": "TIMESTAMP"
                                  }
                              ],
                              "writeMode": "append",
                              "fieldDelimiter": "\t",
                              "compress":"gzip"
                          }
                      }
                  }
              ]
          }
      }
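
Because the reader uses `querySql` while the writer declares its own `column` list, the two can drift apart when the query is edited. The sketch below is one way to sanity-check that before running the job. It assumes the select list is a plain comma-separated list of column names, and it compares names, which is stricter than a positional match. `selected_columns` and `check_columns` are hypothetical helpers, not DataX functions.

```python
import json
import re

# Reader and writer sections of mysql2hdfs.json, reduced to the fields this check needs.
JOB = json.loads("""
{
  "job": {"content": [{
    "reader": {"parameter": {"connection": [{
      "querySql": ["select id,name,age,createtime from student where age < 30;"]}]}},
    "writer": {"parameter": {"column": [
      {"name": "id", "type": "INT"},
      {"name": "name", "type": "STRING"},
      {"name": "age", "type": "INT"},
      {"name": "createtime", "type": "TIMESTAMP"}]}}
  }]}
}
""")


def selected_columns(sql):
    """Return the names between SELECT and FROM (plain comma-separated lists only)."""
    match = re.match(r"\s*select\s+(.*?)\s+from\s", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError(f"cannot parse select list: {sql!r}")
    return [col.strip().strip("`") for col in match.group(1).split(",")]


def check_columns(job):
    """Return (reader_columns, writer_columns) pairs that do not line up."""
    problems = []
    for item in job["job"]["content"]:
        writer_cols = [col["name"] for col in item["writer"]["parameter"]["column"]]
        for conn in item["reader"]["parameter"]["connection"]:
            for sql in conn.get("querySql", []):
                reader_cols = selected_columns(sql)
                if reader_cols != writer_cols:
                    problems.append((reader_cols, writer_cols))
    return problems


if __name__ == "__main__":
    print(check_columns(JOB) or "reader and writer columns line up")
```

To check the real file instead of the embedded copy, load it with `json.load(open("job/mysql2hdfs.json"))` and pass the result to `check_columns`.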
      
  • 4. Start HDFS and create the target path

    [hadoop@hadoop01 ~]$ start-dfs.sh 
    [hadoop@hadoop01 ~]$ hdfs dfs -mkdir -p /datax/mysql2hdfs
    
  • 5. Run DataX

    [hadoop@hadoop03 bin]$ cd /bigdata/install/datax
    [hadoop@hadoop03 datax]$ python bin/datax.py job/mysql2hdfs.json 
    
  • 6. Check the console output

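    When the synchronization finishes, the log ends with the summary below. The labels are printed in Chinese and mean, in order: job start time, job end time, total elapsed time, average throughput, record write speed, total records read, and total read/write failures. Here 2 records were read (the two rows with age < 30) and none failed: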
    
    2021-06-18 01:41:26.452 [job-0] INFO  JobContainer - 
    任務啟動時刻                    : 2021-06-18 01:41:14
    任務結束時刻                    : 2021-06-18 01:41:26
    任務總計耗時                    :                 11s
    任務平均流量                    :                3B/s
    記錄寫入速度                    :              0rec/s
    讀出記錄總數                    :                   2
    讀寫失敗總數                    :                   0
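
When the job runs from a script, the summary can be checked programmatically rather than by eye. The sketch below parses the block shown above into a dictionary. It matches the labels exactly as they appear in this log; a real DataX log may spell them differently (for example in Simplified Chinese), so the two keys used in `job_succeeded` are assumptions to adjust. `parse_summary` and `job_succeeded` are hypothetical helpers.

```python
import re

# Summary block copied from the log above.
SUMMARY = """\
2021-06-18 01:41:26.452 [job-0] INFO  JobContainer -
任務啟動時刻                    : 2021-06-18 01:41:14
任務結束時刻                    : 2021-06-18 01:41:26
任務總計耗時                    :                 11s
任務平均流量                    :                3B/s
記錄寫入速度                    :              0rec/s
讀出記錄總數                    :                   2
讀寫失敗總數                    :                   0
"""

# A summary row is "<label> : <value>", where the label contains no spaces.
ROW = re.compile(r"^\s*(\S+)\s+:\s+(.+?)\s*$")

# Labels as printed in this log (assumption: adjust them to match your own output).
READ_LABEL = "讀出記錄總數"
FAILED_LABEL = "讀寫失敗總數"


def parse_summary(text):
    """Map each summary label to its value, both as strings."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            stats[match.group(1)] = match.group(2)
    return stats


def job_succeeded(stats):
    """True when at least one record was read and none failed."""
    return int(stats[READ_LABEL]) > 0 and int(stats[FAILED_LABEL]) == 0


if __name__ == "__main__":
    stats = parse_summary(SUMMARY)
    print(stats[READ_LABEL], stats[FAILED_LABEL], job_succeeded(stats))
```

The timestamped `JobContainer` line is skipped because it does not fit the `<label> : <value>` shape.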
    
  • 7. Check the file generated on HDFS and verify the result

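    Download the generated file from HDFS, decompress it (the job writes it gzip-compressed), and open it. Its contents can then be compared with the data in MySQL: only the rows with age < 30 (zhangsan and lisi) should appear.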
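
The comparison can also be scripted once the file has been downloaded. The sketch below decompresses gzip bytes and splits each line on tabs, matching the `compress` and `fieldDelimiter` settings of the job. The sample bytes are a stand-in built in memory, and the exact text form of the timestamp column in the real output is an assumption. `read_rows` is a hypothetical helper.

```python
import csv
import gzip
import io

# Rows expected from "where age < 30", written as strings (timestamp format assumed).
EXPECTED = [
    ["1", "zhangsan", "18", "2021-05-10 18:10:00"],
    ["2", "lisi", "28", "2021-05-10 19:10:00"],
]


def read_rows(gz_bytes):
    """Decompress gzip bytes and split every line on tabs."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    # Stand-in for the downloaded file; replace with open(path, "rb").read().
    sample = gzip.compress(
        "1\tzhangsan\t18\t2021-05-10 18:10:00\n"
        "2\tlisi\t28\t2021-05-10 19:10:00\n".encode("utf-8")
    )
    rows = read_rows(sample)
    # Sort before comparing, since row order in the file is not guaranteed here.
    print(sorted(rows) == sorted(EXPECTED))
```

The real file name under /datax/mysql2hdfs/ is whatever the writer generated, so list the directory with `hdfs dfs -ls` first and fetch the file with `hdfs dfs -get`.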

