When joining two tables in Hive, selecting certain columns throws an exception. The underlying files are stored as Parquet.
Error details
Caused by: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:52)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:420)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:279)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:239)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:201)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:563)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:395)
... 13 more
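Cause
The column types declared in the Hive CREATE TABLE statement do not match the types stored in the Parquet files. In the stack trace, ParquetStringInspector (Hive expects a string column) is handed an IntWritable (the file holds an integer), which points to a column declared as string in Hive but written as an integer type in Parquet.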
Inspecting the Parquet file schema
- Download parquet-tools-1.9.0.jar
- Run: hadoop jar ./parquet-tools-1.9.0.jar schema hdfs_file_path
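Once the schema has been printed, the mismatched column still has to be found by comparing it with the Hive table definition. The following is a minimal Python sketch of that comparison, not a definitive tool. It assumes the schema output uses the usual `optional <type> <name>;` layout, it handles only flat primitive columns, and its type mapping is a simplified assumption that compares exact Hive type names only. The helper names (`parse_parquet_schema`, `find_mismatches`) and the sample columns (`user_id`, `user_name`) are hypothetical.

```python
import re

# Simplified, assumed mapping from Parquet primitive types to Hive types.
# Nested types, decimals, and timestamps are deliberately left out.
PARQUET_TO_HIVE = {
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "binary": "string",
}

# Matches lines such as "  optional int32 user_id;"
FIELD_RE = re.compile(r"^\s*(?:optional|required)\s+(\w+)\s+(\w+)")


def parse_parquet_schema(schema_text):
    """Return {column_name: parquet_type} for the fields in the schema text."""
    fields = {}
    for line in schema_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            parquet_type, name = match.groups()
            fields[name.lower()] = parquet_type
    return fields


def find_mismatches(hive_columns, schema_text):
    """List (column, hive_type, parquet_type) tuples where the types disagree."""
    parquet_fields = parse_parquet_schema(schema_text)
    mismatches = []
    for name, hive_type in hive_columns.items():
        parquet_type = parquet_fields.get(name.lower())
        if parquet_type not in PARQUET_TO_HIVE:
            continue  # column missing from the file, or a type this sketch does not cover
        if PARQUET_TO_HIVE[parquet_type] != hive_type.lower():
            mismatches.append((name, hive_type, parquet_type))
    return mismatches


# Hypothetical schema text, pasted from the schema command output.
sample_schema = """
message hive_schema {
  optional int32 user_id;
  optional binary user_name (UTF8);
}
"""

# Hypothetical column types, copied from the Hive table definition.
hive_columns = {"user_id": "string", "user_name": "string"}

print(find_mismatches(hive_columns, sample_schema))
# prints [('user_id', 'string', 'int32')]
```

In this sample, `user_id` is declared as string in Hive but stored as int32 in the file, which is the same kind of mismatch that produces the `Cannot inspect org.apache.hadoop.io.IntWritable` error above.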
