hive中導入json格式的數據(hive分區表)


hive中建立外部分區表,外部數據格式是json的如何導入呢?

json格式的數據表不必含有分區字段,只需要在hdfs目錄結構中體現出分區就可以了

This is all according to this guide: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

1 hive> ADD JAR /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar;
2 Added /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar to class path
3 Added resource: /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar

In /tmp/new I have a file abc.json 

The CREATE EXTERNAL TABLE command runs properly, but it doesn't take in any data:

 1 hive> 
 2     > CREATE EXTERNAL TABLE tweets (
 3     >   id BIGINT,
 4     >   created_at STRING,
 5     >   source STRING,
 6     >   favorited BOOLEAN,
 7     >   retweeted_status STRUCT<
 8     >     text:STRING,
 9     >     user:STRUCT<screen_name:STRING,name:STRING>,
10     >     retweet_count:INT>,
11     >   entities STRUCT<
12     >     urls:ARRAY<STRUCT<expanded_url:STRING>>,
13     >     user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
14     >     hashtags:ARRAY<STRUCT<text:STRING>>>,
15     >   text STRING,
16     >   user STRUCT<
17     >     screen_name:STRING,
18     >     name:STRING,
19     >     friends_count:INT,
20     >     followers_count:INT,
21     >     statuses_count:INT,
22     >     verified:BOOLEAN,
23     >     utc_offset:INT,
24     >     time_zone:STRING>,
25     >   in_reply_to_screen_name STRING
26     > ) 
27     > PARTITIONED BY (datehour INT)
28     > ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
29     > LOCATION '/tmp/new';
30 OK
31 Time taken: 0.142 seconds

注意外部分區表需要手工添加分區

具體步驟

1.) Run the create table statement.
2.) In the directory /tmp/new/ create a sub directory datehour=<some int value>, and then put your .json file inside this.這里就是說只需要在hdfs上建立目錄,目錄體現分區信息,將數據放到對應目錄,然后直接add partiton就好了 3.) Run alter table statement adding this partition to metadata:
    alter table tweets add partition(datehour=<some int value>);
4.) Now run the select statement.

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM