Hive_Hive Data Model_Summary

Reposted article, 2017-01-20 16:00, Hive

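Architecture: metadata / HQL execution
Installation: embedded / remote / local
Management: CLI / web interface / remote service
Data types: primitive / complex / time
Data model: data storage / internal table / partitioned table / external table / bucket table / view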



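=============================================================================================
Hive Data Model_Data Storage

Browse the HDFS file system with the web management UI: http://<IP>:50070/ (the Hadoop 2.x NameNode UI; Hadoop 3.x uses port 9870)
- Built on HDFS.
- Hive has no dedicated data storage format of its own. For text tables the default field delimiter is \001 (Ctrl-A).
- The storage structures mainly include databases, files, tables, and views.
- Text files can be loaded directly.
- When creating a table, you can specify the column delimiter and the row delimiter of the data.

Hive data model
Tables:
- Table: internal table
- Partition: partitioned table
- External Table: external table
- Bucket Table: bucket table
Views

=============================================================================================
Hive Data Model_Internal Table
- Conceptually similar to a table in a relational database.
- Every Table has a corresponding directory in Hive that stores its data.
- All Table data (except for External Tables) is stored in that directory.

Create a table in the default warehouse directory:
create table t1 (tid int, tname string, age int);

Create a table whose data lives in a specified directory:
create table t2 (tid int, tname string, age int) location '/mytable/hive/t2';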

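Create a table that uses ',' as the column delimiter:
create table t3 (tid int, tname string, age int) row format delimited fields terminated by ',';

Create a table from the result of a query:
create table t4 as select * from t1;

View a table's data file on HDFS:
hdfs dfs -cat /user/hive/warehouse/tablename/000000_0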
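The internal tables above are stored as plain text files under their table directory, and Hive applies the column definitions only when it reads a file. The sketch below is a simplified Python model of that schema-on-read step, not Hive code. It assumes it roughly mimics Hive's default text SerDe (LazySimpleSerDe): a missing field or a value that is not a valid int comes back as NULL (None here), and extra fields are ignored. The function `parse_row` and the constant names are hypothetical.

```python
# Simplified model of schema-on-read for a Hive text table.
# Assumption: it roughly mimics LazySimpleSerDe (missing field -> NULL,
# invalid INT -> NULL, extra fields ignored); parse_row is hypothetical.
from typing import List, Optional, Tuple, Union

Value = Optional[Union[int, str]]

# Hive's default field delimiter when no ROW FORMAT clause is given: \001 (Ctrl-A)
HIVE_DEFAULT_FIELD_DELIM = "\x01"


def parse_row(line: str, schema: List[Tuple[str, str]],
              field_delim: str = HIVE_DEFAULT_FIELD_DELIM) -> List[Value]:
    """Split one line of a data file into typed column values."""
    fields = line.rstrip("\n").split(field_delim)
    row: List[Value] = []
    for i, (_name, col_type) in enumerate(schema):
        if i >= len(fields):
            row.append(None)            # fewer fields than columns -> NULL
        elif col_type == "int":
            try:
                row.append(int(fields[i]))
            except ValueError:
                row.append(None)        # text that is not a valid INT -> NULL
        else:
            row.append(fields[i])       # string columns keep the raw text
    return row                          # fields beyond the schema are ignored


# Column list shared by t1 and t3: (tid int, tname string, age int)
SCHEMA = [("tid", "int"), ("tname", "string"), ("age", "int")]

# t3 declares FIELDS TERMINATED BY ','
print(parse_row("1,Tom,20\n", SCHEMA, field_delim=","))  # [1, 'Tom', 20]
# t1 has no ROW FORMAT clause, so fields are separated by Ctrl-A
print(parse_row("2\x01Mary\x01abc", SCHEMA))             # [2, 'Mary', None]
# Comma-separated text read with the default delimiter: one unparsable field
print(parse_row("3,Jerry,30", SCHEMA))                   # [None, None, None]
```

The third call shows why the delimiter in the table definition has to match the file: comma-separated text read with the default Ctrl-A delimiter arrives as a single field that does not parse as an int.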

Add a column, inspect the table structure, and drop the table:
alter table t1 add columns(english int);
desc t1;
drop table t1;

If the HDFS trash (recycle bin) is enabled (fs.trash.interval set to a value greater than 0), dropping the table does not delete its data files right away. They are moved into the trash directory and can be restored from there.

=============================================================================================
Hive Data Model_Partitioned Table

Prepare a data table:
create table sampledata (sid int, sname string, gender string, language int, math int, english int) row format delimited fields terminated by ',' stored as textfile;

Prepare the text data, sampledata.txt:
1,Tom,M,60,80,96
2,Mary,F,11,22,33
3,Jerry,M,90,11,23
4,Rose,M,78,77,76
5,Mike,F,99,98,98

Load the text data into the table:
hive> load data local inpath '/root/pl62716/hive/sampledata.txt' into table sampledata;

- A Partition corresponds to a dense index on the partition column in a relational database.
- In Hive, each Partition of a table corresponds to one directory under the table's directory, and all of that Partition's data is stored in its directory.

Create the partitioned table:
create table partition_table (sid int, sname string) partitioned by (gender string) row format delimited fields terminated by ',';

Insert data into the partitioned table:
hive> insert into table partition_table partition(gender='M') select sid, sname from sampledata where gender='M';
hive> insert into table partition_table partition(gender='F') select sid, sname from sampledata where gender='F';

Querying the internal table is less efficient than querying the partitioned table, as the two execution plans show.

Internal table:
hive> explain select * from sampledata where gender='M';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: sampledata
          Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (gender = 'M') (type: boolean)
            Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: sid (type: int), sname (type: string), 'M' (type: string), language (type: int), math (type: int), english (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
              ListSink

Time taken: 0.046 seconds, Fetched: 20 row(s)

Partitioned table:
hive> explain select * from partition_table where gender='M';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
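    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: partition_table
          Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: sid (type: int), sname (type: string), 'M' (type: string)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
            ListSink

Time taken: 0.187 seconds, Fetched: 17 row(s)

The plan for sampledata needs a Filter Operator that checks gender on every row it scans. The plan for partition_table has no filter at all, because only the gender=M directory is read.

=============================================================================================
Hive Data Model_External Table

External Table
- Points to data that already exists in HDFS. Partitions can be created on it.
- Its metadata is organized the same way as an internal table's, but the actual data storage differs considerably.
- Creating an external table is a single step: loading the data and creating the table happen together. The data is not moved into the warehouse directory; the table only establishes a link to the external data. When an external table is dropped, only that link is deleted.

1. Prepare several txt data files with the same structure and put them in the HDFS directory /input.
2. In Hive, create an external table external_student with the same structure and set its location to the HDFS directory /input. external_student then automatically picks up the files under /input.
3. Query the external table.
4. Delete some of the files under /input.
5. Query the external table. The data from the deleted files is gone.
6. Put the deleted files back into /input.
7. Query the external table. The data from the restored files reappears.

(1) Prepare the data:
student1.txt:
1,Tom,M,60,80,96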
2,Mary,F,11,22,33
student2.txt:
3,Jerry,M,90,11,23
student3.txt:
4,Rose,M,78,77,76
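5,Mike,F,99,98,98

# hdfs dfs -ls /
# hdfs dfs -mkdir /input

Put the files into the HDFS file system (syntax: hdfs dfs -put localFileName hdfsFileDir):
# hdfs dfs -put student1.txt /input
# hdfs dfs -put student2.txt /input
# hdfs dfs -put student3.txt /input

(2) Create the external table. The external keyword is what makes Hive leave the files in /input when the table is dropped:
create external table external_student (sid int, sname string, gender string, language int, math int, english int) row format delimited fields terminated by ',' location '/input';

(3) Query the external table:
select * from external_student;

(4) Delete student1.txt from HDFS:
# hdfs dfs -rm /input/student1.txt

(5) Query the external table again:
select * from external_student;

(6) Put student1.txt back into the HDFS /input directory:
# hdfs dfs -put student1.txt /input

(7) Query the external table again:
select * from external_student;

=============================================================================================
Hive Data Model_Bucket Table
Applies a hash function to the data and places the rows in different files, which reduces hot spots and speeds up queries.
Example: hash on sname and store the rows in 5 buckets:
create table bucket_table (sid int, sname string, age int) clustered by (sname) into 5 buckets;

=============================================================================================
Hive Data Model_View
- A view is a virtual table, a logical concept; it can span multiple tables.
- A view is built on top of existing tables; the tables a view is built on are called base tables.
- Views can simplify complex queries.

Create a view:
create view viewName as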
select data from table where condition;

View the view's structure:
desc viewName;

Query the view:
select * from viewName;
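To recap how the partitioned and bucket tables described above map onto storage, here is a hypothetical Python model, not Hive code. It assumes the default warehouse root /user/hive/warehouse, places each row of partition_table in a `gender=<value>` subdirectory, and picks a bucket for bucket_table as `(hash & 0x7FFFFFFF) mod 5`. The stand-in hash is Java's `String.hashCode()`; the hash Hive actually applies depends on the Hive version and the column type, so the bucket numbers are illustrative only. All function names are hypothetical.

```python
# Hypothetical model of where rows of partition_table and bucket_table are stored.
# Assumptions: default warehouse root /user/hive/warehouse, and java_string_hash()
# as a stand-in for Hive's bucketing hash (the real one varies by version and type).
from collections import defaultdict
from typing import Dict, List, Tuple

WAREHOUSE = "/user/hive/warehouse"

# (sid, sname, gender) taken from sampledata.txt
ROWS: List[Tuple[int, str, str]] = [
    (1, "Tom", "M"), (2, "Mary", "F"), (3, "Jerry", "M"),
    (4, "Rose", "M"), (5, "Mike", "F"),
]


def partition_dir(table: str, column: str, value: str) -> str:
    """Each partition is a subdirectory named column=value under the table directory."""
    return f"{WAREHOUSE}/{table}/{column}={value}"


def partition_rows(rows: List[Tuple[int, str, str]]) -> Dict[str, List[Tuple[int, str]]]:
    """Group (sid, sname) rows by the gender partition directory they belong to."""
    layout: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for sid, sname, gender in rows:
        layout[partition_dir("partition_table", "gender", gender)].append((sid, sname))
    return dict(layout)


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + UTF-16 code unit, wrapped to signed 32 bits."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        h = (31 * h + int.from_bytes(data[i:i + 2], "big")) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder: an index in [0, num_buckets)."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


layout = partition_rows(ROWS)
# WHERE gender='M' only needs the gender=M directory; gender=F is never read.
pruned = [d for d in layout if d.endswith("/gender=M")]
print(pruned)              # ['/user/hive/warehouse/partition_table/gender=M']
print(layout[pruned[0]])   # [(1, 'Tom'), (3, 'Jerry'), (4, 'Rose')]
# Bucket index of each sname for CLUSTERED BY (sname) INTO 5 BUCKETS
print({sname: bucket_for(sname, 5) for _, sname, _ in ROWS})
```

With this stand-in hash, four of the five sample names fall into bucket 4. Bucketing evens out data only across many distinct key values.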
