Hive Managed (Internal) Tables and External Tables


https://blog.csdn.net/qq_36743482/article/details/78393678  

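Managed tables vs. external tables
A table created without the EXTERNAL keyword is a managed (internal) table; a table created with EXTERNAL is an external table.
Differences:
Hive itself manages the data of a managed table. The data of an external table is managed outside Hive: the files sit on HDFS (or other storage) and can be read and written by other processes.
Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the storage location of an external table is specified by the user, normally with a LOCATION clause.
Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are left in place.
Changes made to a managed table are synchronized to the metadata directly. When the structure or partitions of an external table are changed, the metadata has to be repaired (MSCK REPAIR TABLE table_name;).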
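
The differences above can be sketched in HiveQL. This is a minimal illustration and not part of the original post: the table names `logs_managed` and `logs_external`, the columns, and the path `/data/logs` are all hypothetical.

```sql
-- Managed (internal) table: no EXTERNAL keyword.
-- Data lives under hive.metastore.warehouse.dir unless LOCATION is given.
CREATE TABLE logs_managed (
  id  BIGINT,
  msg STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- External table: Hive records only the metadata; the files stay where they are.
CREATE EXTERNAL TABLE logs_external (
  id  BIGINT,
  msg STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';   -- hypothetical HDFS path

-- Suppose a directory such as /data/logs/dt=2024-01-01 was created directly
-- on HDFS. The metastore does not know about it until partitions are repaired:
MSCK REPAIR TABLE logs_external;

-- Dropping the managed table removes both metadata and data.
DROP TABLE logs_managed;

-- Dropping the external table removes only the metadata; /data/logs is kept.
DROP TABLE logs_external;
```

After the last statement the files under `/data/logs` should still be present, so the external table could be recreated over the same location.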

Official documentation
The following is the description of managed and external tables from the official Hive documentation:

A table created without the EXTERNAL clause is called a managed table because Hive manages its data.
Managed and External Tables
By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
Managed or external tables can be identified using the DESCRIBE FORMATTED table_name command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on table type.
Statistics can be managed on internal and external tables and partitions for query optimization.
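
The commands mentioned in the excerpt can be tried roughly as follows. This is a sketch under stated assumptions: `my_table` is a hypothetical, non-partitioned table that already exists, and `old_managed_table` is a hypothetical managed table.

```sql
-- Show detailed table information; the "Table Type" row reads
-- MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED my_table;

-- Gather table-level statistics that the optimizer can use.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Drop a managed table and skip the trash folder, so the data is
-- deleted immediately instead of being kept for a defined duration.
DROP TABLE IF EXISTS old_managed_table PURGE;
```

Without `PURGE`, the data of a dropped managed table is moved to the trash folder first, as the excerpt describes.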


 

Hive official documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/Column

 

 

