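1. Introduction
TxtFileReader reads data stored on the local file system. Under the hood, it reads the local file, converts the data into the DataX transfer protocol, and passes it to the Writer.
2. Configuration template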
{
  "setting": {},
  "job": {
    "setting": {
      "speed": {
        "channel": 2
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["/home/haiwei.luo/case00/data"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "long"
              },
              {
                "index": 1,
                "type": "boolean"
              },
              {
                "index": 2,
                "type": "double"
              },
              {
                "index": 3,
                "type": "string"
              },
              {
                "index": 4,
                "type": "date",
                "format": "yyyy.MM.dd"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "txtfilewriter",
          "parameter": {
            "path": "/home/haiwei.luo/case00/result",
            "fileName": "luohw",
            "writeMode": "truncate",
            "format": "yyyy-MM-dd"
          }
        }
      }
    ]
  }
}
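3. Usage notes
- Only TXT files are supported, and the schema of the TXT file must be a single two-dimensional table.
- CSV-like files are supported, with a custom field delimiter.
- Multiple data types can be read (represented as String); column pruning and constant columns are supported.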
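To make column pruning and constant columns concrete, here is a minimal Python sketch of how a `column` list could be applied to one delimited line. It only illustrates the semantics and is not DataX's implementation: `project_line` is a hypothetical helper, the sample line and the constant `case00` are made up, and the use of a `value` key for constant columns is my reading of the TxtFileReader documentation.

```python
def project_line(line, columns, delimiter=","):
    """Apply a TxtFileReader-style column list to one line of text.

    Each column entry carries either "index" (take that field from the line)
    or "value" (emit a constant). Values are kept as strings in this sketch.
    """
    fields = line.split(delimiter)
    out = []
    for col in columns:
        if "value" in col:
            out.append(col["value"])          # constant column
        else:
            out.append(fields[col["index"]])  # selected column
    return out


columns = [
    {"index": 0, "type": "long"},
    {"index": 3, "type": "string"},          # fields 1 and 2 are pruned
    {"type": "string", "value": "case00"},   # hypothetical constant column
]
row = project_line("1,true,3.14,hello,2020.01.01", columns)
```

Only the fields named by `index` reach the output, in the order the columns are listed, and the constant is appended to every row.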
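4. In practice
I recently needed to migrate a table. Its data lived in Hive, where a Python script processed the data and inserted it directly. The table now has to be loaded into Greenplum, and it holds roughly 72 million rows.
Approach: export the Hive data to a CSV file, then use DataX to load it into Greenplum.

Let's get started.
Configure the JSON file: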
{
  "content": [
    {
      "reader": {
        "name": "txtfilereader",
        "parameter": {
          "column": [
            { "format": "yyyy-MM-dd", "index": 0, "type": "date" },
            { "index": 1, "type": "string" },
            { "index": 2, "type": "string" },
            { "index": 3, "type": "string" },
            { "index": 4, "type": "string" },
            { "index": 5, "type": "long" },
            { "index": 6, "type": "long" },
            { "index": 7, "type": "long" },
            { "index": 8, "type": "long" }
          ],
          "encoding": "utf-8",
          "fieldDelimiter": ",",
          "path": [ "/home/tianyafu/flux_timecount_action.csv" ]
        }
      },
      "writer": {
        "name": "gpdbwriter",
        "parameter": {
          "column": [
            "record_date", "outid", "tm_type", "serv", "app",
            "down_flux", "up_flux", "seconds", "count"
          ],
          "connection": [
            {
              "jdbcUrl": "jdbc:postgresql://192.168.100.21:5432/ods",
              "table": [ "ods_flux_timecount_action" ]
            }
          ],
          "password": "******",
          "segment_reject_limit": 0,
          "username": "admin"
        }
      }
    }
  ],
  "setting": {
    "errorLimit": { "percentage": 0.02, "record": 0 },
    "speed": { "channel": "1" }
  }
}
And then it failed.

The cause turned out to be null values in the data, which could not be converted to the Long type.
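Before changing the job, it can help to find out exactly which strings sit in the long columns. The following is a minimal Python sketch of such a pre-check, not part of DataX: `find_bad_longs` is a hypothetical helper, the sample rows are invented, and Python's `int()` is only an approximation of how DataX parses a Long.

```python
from collections import Counter


def find_bad_longs(lines, long_indexes, delimiter=","):
    """Count the distinct field values that cannot be parsed as integers."""
    bad = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        for i in long_indexes:
            # A row that is too short is reported under a marker value.
            value = fields[i] if i < len(fields) else "<missing>"
            try:
                int(value)
            except ValueError:
                bad[value] += 1
    return bad


# Invented sample rows shaped like the nine-column file in this job.
sample = [
    "2020-01-01,a,b,c,d,10,20,30,40",
    "2020-01-02,a,b,c,d,NULL,20,30,40",
    "2020-01-03,a,b,c,d,10,NULL,,40",
]
bad_values = find_bad_longs(sample, long_indexes=[5, 6, 7, 8])
```

For the real export, the same function can be fed an open file object (for example the CSV at /home/tianyafu/flux_timecount_action.csv opened with `encoding="utf-8"`), since it only iterates over lines. The keys of the returned counter show the exact spelling of the null marker.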
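The fix I found was to add the nullFormat option.
nullFormat
Description: a text file has no standard string for null (a null pointer), so DataX provides nullFormat to define which string should be treated as null. For example, if the user configures nullFormat:"\N", then a source value of "\N" is treated by DataX as a null field.
Required: no
Default: \N
So I added it: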
{
  "content": [
    {
      "reader": {
        "name": "txtfilereader",
        "parameter": {
          "column": [
            { "format": "yyyy-MM-dd", "index": 0, "type": "date" },
            { "index": 1, "type": "string" },
            { "index": 2, "type": "string" },
            { "index": 3, "type": "string" },
            { "index": 4, "type": "string" },
            { "index": 5, "type": "long" },
            { "index": 6, "type": "long" },
            { "index": 7, "type": "long" },
            { "index": 8, "type": "long" }
          ],
          "csvReaderConfig": {
            "safetySwitch": false,
            "skipEmptyRecords": false,
            "useTextQualifier": false
          },
          "encoding": "utf-8",
          "fieldDelimiter": ",",
          "nullFormat": "null",
          "path": [ "/home/tianyafu/flux_timecount_action.csv" ]
        }
      },
      "writer": {
        "name": "gpdbwriter",
        "parameter": {
          "column": [
            "record_date", "outid", "tm_type", "serv", "app",
            "down_flux", "up_flux", "seconds", "count"
          ],
          "connection": [
            {
              "jdbcUrl": "jdbc:postgresql://192.168.100.21:5432/ods",
              "table": [ "ods_flux_timecount_action" ]
            }
          ],
          "password": "******",
          "segment_reject_limit": 0,
          "username": "admin"
        }
      }
    }
  ],
  "setting": {
    "errorLimit": { "percentage": 0.02, "record": 0 },
    "speed": { "channel": "1" }
  }
}
It failed again.

The match appears to be case-sensitive, so I changed the value once more:
{
  "content": [
    {
      "reader": {
        "name": "txtfilereader",
        "parameter": {
          "column": [
            { "format": "yyyy-MM-dd", "index": 0, "type": "date" },
            { "index": 1, "type": "string" },
            { "index": 2, "type": "string" },
            { "index": 3, "type": "string" },
            { "index": 4, "type": "string" },
            { "index": 5, "type": "long" },
            { "index": 6, "type": "long" },
            { "index": 7, "type": "long" },
            { "index": 8, "type": "long" }
          ],
          "csvReaderConfig": {
            "safetySwitch": false,
            "skipEmptyRecords": false,
            "useTextQualifier": false
          },
          "encoding": "utf-8",
          "fieldDelimiter": ",",
          "nullFormat": "NULL",
          "path": [ "/home/tianyafu/flux_timecount_action.csv" ]
        }
      },
      "writer": {
        "name": "gpdbwriter",
        "parameter": {
          "column": [
            "record_date", "outid", "tm_type", "serv", "app",
            "down_flux", "up_flux", "seconds", "count"
          ],
          "connection": [
            {
              "jdbcUrl": "jdbc:postgresql://192.168.100.21:5432/ods",
              "table": [ "ods_flux_timecount_action" ]
            }
          ],
          "password": "******",
          "segment_reject_limit": 0,
          "username": "admin"
        }
      }
    }
  ],
  "setting": {
    "errorLimit": { "percentage": 0.02, "record": 0 },
    "speed": { "channel": "1" }
  }
}

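This time it finally succeeded.
So the nullFormat value does appear to be case-sensitive: with "NULL" the job ran, while "null" did not help.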
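The behaviour seen in these three runs is consistent with an exact string comparison against nullFormat before the type conversion. Here is a small Python sketch of that idea. It is an assumption about the logic and not DataX source code; `to_long` is a hypothetical helper, and its default of `\N` is taken from the parameter description quoted earlier.

```python
def to_long(raw, null_format="\\N"):
    """Return None when raw equals null_format exactly, else parse an integer."""
    if raw == null_format:   # exact, case-sensitive comparison
        return None
    return int(raw)          # raises ValueError for non-numeric text


# Matching case: the field is recognised as null.
matched = to_long("NULL", null_format="NULL")

# Different case: the field falls through to integer parsing and fails.
try:
    to_long("NULL", null_format="null")
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
```

Under this reading, the first run failed because the default `\N` did not match the data, the second because "null" did not match "NULL", and the third succeeded because the configured string matched the data exactly.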
