Fixing Hue/HiveServer2 showing Hive date columns as NULL


 

A user reported that running this SQL in Hue: select admission_date, discharge_date,birth_date from hm_004_20170309141149.inpatient_visit limit 20; returned NULL for every date-typed column, even though the same query displayed correctly in the Hive CLI.

Verify it through HiveServer2 directly: beeline -u jdbc:hive2://0.0.0.0:10000 -e "select admission_date, discharge_date,birth_date from hm_004_20170309141149.inpatient_visit limit 20;"

The first suspicion was HiveServer2 itself, but querying another table that also has a date column returned correct results: select part_dt from default.kylin_sales limit 50;

So suspicion shifted to the SerDe: hm_004_20170309141149.inpatient_visit uses org.openx.data.jsonserde.JsonSerDe, while default.kylin_sales is a plain text table read with TextInputFormat.
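A quick way to confirm which SerDe a table is using (a routine check, not spelled out in the original notes) is to look at the "SerDe Library" and "InputFormat" rows in the table metadata:

-- check the "SerDe Library" / "InputFormat" rows in the output
DESCRIBE FORMATTED hm_004_20170309141149.inpatient_visit;
DESCRIBE FORMATTED default.kylin_sales;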

That JsonSerDe looked a bit unusual, and it turned out to be a third-party SerDe that a colleague had introduced earlier. The problem has been reported to its developer: https://github.com/rcongiu/Hive-JSON-Serde/issues/187
The JsonSerDe that ships with Hive is org.apache.hive.hcatalog.data.JsonSerDe (https://cwiki.apache.org/confluence/display/Hive/SerDe), packaged in $HIVE_HOME/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar. Test with this jar:
CREATE EXTERNAL TABLE `default.inpatient_visit`(
  `age_m` int COMMENT 'from deserializer', 
  `discharge_date` date COMMENT 'from deserializer', 
  `address_code` string COMMENT 'from deserializer', 
  `admission_date` date COMMENT 'from deserializer', 
  `visit_dept_name` string COMMENT 'from deserializer', 
  `birth_date` date COMMENT 'from deserializer', 
  `outcome` string COMMENT 'from deserializer', 
  `age` int COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hive.hcatalog.data.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://xxxx/user/hive/warehouse/xx.db/inpatient_visit';

Test locally through beeline: beeline -u jdbc:hive2://0.0.0.0:10000 -e "add jar /home/work/hive/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;select admission_date, discharge_date,birth_date from default.inpatient_visit limit 20;"
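Note that ADD JAR only registers the jar for that one session. For Hue (which goes through HiveServer2) the jar also has to be visible to HiveServer2 itself; one common way, sketched here as an assumption about this deployment rather than a step from the original notes, is to place the jar under $HIVE_HOME/auxlib (or list it in hive.aux.jars.path) and restart HiveServer2:

# sketch only: make the hcatalog SerDe jar permanently visible to HiveServer2
mkdir -p $HIVE_HOME/auxlib
cp $HIVE_HOME/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar $HIVE_HOME/auxlib/
# then restart HiveServer2 so it picks up the aux jar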

 

Test in Hue:

[Check whether Hive's built-in JsonSerDe offers the same functionality]

CREATE TABLE json_nested_test (
    count string,
    usage string,
    pkg map<string,string>,
    languages array<string>,
    store map<string,array<map<string,string>>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
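The row being parsed can be read verbatim in the error log below; the data was then loaded and queried, presumably along these lines (the local file name here is hypothetical):

load data local inpath '/home/work/json_nested_test.txt' overwrite into table json_nested_test;
select * from json_nested_test;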

This hit an error:

2017-04-25 15:46:38,655 WARN  [main]: data.JsonSerDe (JsonSerDe.java:deserialize(181)) - Error [java.io.IOException: Start of Array expected] parsing json text [{"count":2,"usage":91273,"pkg":{"weight":8,"type":"apple"},"languages":["German","French","Italian"],"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}]}}].
2017-04-25 15:46:38,656 ERROR [main]: CliDriver (SessionState.java:printError(960)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array expected
java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array expected
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array expected
        at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:488)
        ... 15 more
Caused by: java.io.IOException: Start of Array expected
        at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:332)
        at org.apache.hive.hcatalog.data.JsonSerDe.extractCurrentField(JsonSerDe.java:356)
        at org.apache.hive.hcatalog.data.JsonSerDe.populateRecord(JsonSerDe.java:218)
        at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:174)
        ... 16 more

After several rounds of testing (full details: http://www.cnblogs.com/aprilrain/p/6916359.html), it turned out that this SerDe throws this error on somewhat more complex nested types, for example map<string,array<string>>:

CREATE TABLE s6 (
    store map<string,array<string>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
load data local inpath '/home/work/s6.txt' overwrite into table s6;
select * from s6;
Contents of s6.txt:
{"store":{"fruit":["weight","8","type","apple"]}}
{"store":{"fruit":["weight","9","type","orange"]}}

An issue has been filed with the community: https://issues.apache.org/jira/browse/HIVE-16526
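Until that issue is resolved, one possible workaround (a sketch, not something covered in the original troubleshooting) is to skip the JsonSerDe for such nested fields and parse the raw JSON line with get_json_object instead:

-- sketch: keep each JSON line as a plain string and extract fields on read
CREATE TABLE s6_raw (line string) STORED AS TEXTFILE;
load data local inpath '/home/work/s6.txt' overwrite into table s6_raw;
-- for the first sample row this returns "8"
select get_json_object(line, '$.store.fruit[1]') from s6_raw;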

There is another problem as well: blank lines in the data file are not supported: https://issues.apache.org/jira/browse/HIVE-15475. See the example below.

Example of org.openx.data.jsonserde.JsonSerDe failing on a blank line:
CREATE TABLE json_nested_test_openx (
    count string,
    usage string,
    pkg map<string,string>,
    languages array<string>,
    store map<string,array<map<string,string>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
hive> select pkg['weight'],languages[0],store['fruit'][0]['type'] from json_nested_test_openx;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating store['fruit'][0]['type']
Fix: the error above was caused by an extra blank line at the end of the data file; removing the blank line resolves it.
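If editing the file by hand is inconvenient, the blank lines can also be stripped before loading, for example with GNU sed (the file name here is hypothetical):

# delete all empty lines from the data file before loading it
sed -i '/^$/d' /home/work/json_nested_test_openx.txt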

 

