Hive Data Extraction

Reposted article, 2019-05-10 16:35, Hadoop

Hive is an ETL tool and data warehouse built on top of Hadoop.

Structured data

Structured data resembles a table in an RDBMS: every row has the same fixed set of typed columns.

hive> create table structured_table(id int, name string)
    > row format delimited
    > fields terminated by ','
    > location '/yandufeng/structured_table';
OK
Time taken: 0.209 seconds
hive> load data local inpath '/home/hive/test2.txt' into table structured_table;
Loading data to table default.structured_table
Table default.structured_table stats: [numFiles=1, totalSize=23]
OK
Time taken: 0.831 seconds
hive> select * from structured_table;
OK
1    hello
2    name
3    world
Time taken: 0.106 seconds, Fetched: 3 row(s)
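
To make the effect of `row format delimited fields terminated by ','` more concrete, here is a minimal Python sketch of how each text line maps onto the `(id int, name string)` columns. The contents of `test2.txt` are not shown in the article, so the sample below is an assumption (its three lines add up to 23 bytes, which is consistent with the `totalSize=23` reported above). `Row` and `parse_line` are hypothetical names for illustration, not part of Hive.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    id: Optional[int]    # mirrors the `id int` column
    name: Optional[str]  # mirrors the `name string` column


def parse_line(line: str, sep: str = ",") -> Row:
    """Split one line on the field delimiter and cast to the column types."""
    fields = line.split(sep)
    try:
        row_id: Optional[int] = int(fields[0])
    except ValueError:
        row_id = None  # Hive yields NULL when a field cannot be read as int
    # A missing field becomes NULL; fields beyond the declared columns are ignored.
    name = fields[1] if len(fields) > 1 else None
    return Row(row_id, name)


# Assumed contents of test2.txt; the real file is not shown in the article.
sample = "1,hello\n2,name\n3,world\n"
rows: List[Row] = [parse_line(line) for line in sample.splitlines()]
for row in rows:
    print(row.id, row.name, sep="\t")
```

This only imitates the read path; Hive itself applies the schema at query time and leaves the file in the table's location unchanged.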

Semi-structured data, for example JSON and XML

hive> 
    > create table json_table(str string);
OK
Time taken: 0.229 seconds
hive> load data local inpath '/home/hive/json_table.json' into table json_table;
Loading data to table default.json_table
Table default.json_table stats: [numFiles=1, totalSize=26]
OK
Time taken: 1.523 seconds
hive> select get_json_object(str, '$.a') from json_table;
OK
2
Time taken: 0.168 seconds, Fetched: 1 row(s)
hive> select get_json_object(str, '$.a'), get_json_object(str, '$.b') from json_table;
OK
2    blah
Time taken: 0.084 seconds, Fetched: 1 row(s)
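
The lookups above can be imitated in a few lines of Python to show what a path such as `'$.a'` selects. This is a simplified sketch, not Hive's implementation: it handles only dotted key paths (no array indexes or wildcards), the function name `get_json_object_sketch` is hypothetical, and the sample line is an assumed stand-in for `json_table.json`, whose contents the article does not show.

```python
import json
from typing import Optional


def get_json_object_sketch(text: str, path: str) -> Optional[str]:
    """Imitate get_json_object for simple paths such as '$.a' or '$.a.b'."""
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(text)
    except ValueError:
        return None  # invalid JSON gives NULL in Hive
    # Walk the dotted keys after the leading '$'.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None  # a missing key gives NULL
        node = node[key]
    if node is None:
        return None
    # The result is always a string: strings as-is, other values as JSON text.
    return node if isinstance(node, str) else json.dumps(node)


# Assumed single line stored in the `str` column of json_table.
line = '{"a": 2, "b": "blah"}'
print(get_json_object_sketch(line, "$.a"), get_json_object_sketch(line, "$.b"), sep="\t")
```

Keeping the whole JSON document in one string column, as the article does, means the file can be loaded without declaring a schema up front; the cost is that every query re-parses the text.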

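When to use Hive

When you need powerful statistical methods
When you need to process structured or semi-structured data
When you need a data warehouse built on Hadoop
When you want to combine it with HBase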

Where Hive is used

As an ETL tool and data warehouse
Providing HQL for querying data
Running custom map and reduce scripts for specific requirements
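
As an illustration of the last point, a custom script is usually attached to a query through Hive's `TRANSFORM ... USING` clause: Hive writes each input row to the script's standard input as tab-separated columns and reads tab-separated rows back from its standard output. The sketch below is a hypothetical mapper that upper-cases the `name` column of `structured_table`. The file name `upper_name.py` and the function names are assumptions made for this example.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical usage from the Hive CLI, assuming this file is saved as upper_name.py:
#   ADD FILE upper_name.py;
#   SELECT TRANSFORM(id, name) USING 'python upper_name.py' AS (id, name)
#   FROM structured_table;


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Upper-case the name column of tab-separated (id, name) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip rows that do not have exactly two columns
        row_id, name = fields
        yield f"{row_id}\t{name.upper()}"


def main() -> None:
    """Entry point for Hive: stream rows from stdin to stdout."""
    for out in transform(sys.stdin):
        print(out)


# When used with Hive, call main() at the bottom of the script.
# Here the mapper is exercised on in-memory rows instead of stdin.
demo = list(transform(["1\thello\n", "2\tname\n", "3\tworld\n"]))
print("\n".join(demo))
```

Keeping the row logic in `transform` and the I/O in `main` lets the script be checked locally before it is shipped to the cluster.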
