https://blog.csdn.net/qq_36743482/article/details/78393678
Managed (internal) tables vs. external tables
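A table created without the EXTERNAL keyword is a managed table (also called an internal table); a table created with EXTERNAL is an external table.
Differences:
The data of a managed table is managed by Hive itself; the data of an external table stays on HDFS (or other storage) and is managed outside Hive.
Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the storage location of external table data is specified by the user, typically with the LOCATION clause.
Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are not deleted.
Changes made to a managed table through Hive are synchronized to the metadata directly; when the structure or partitions of an external table are changed, for example when partition directories are added directly on HDFS, the metadata has to be repaired (MSCK REPAIR TABLE table_name;).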
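The differences listed above can be illustrated with a minimal HiveQL sketch. The table names (demo_managed, demo_external), the columns, and the HDFS path /data/demo_external are hypothetical and chosen only for illustration; actual behavior may also depend on the Hive version and on the cluster's trash settings.

```sql
-- Managed table: no EXTERNAL keyword, so Hive stores the data under
-- hive.metastore.warehouse.dir (for example /user/hive/warehouse/demo_managed).
CREATE TABLE demo_managed (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: the storage location is chosen by the user.
-- '/data/demo_external' is a hypothetical HDFS directory.
CREATE EXTERNAL TABLE demo_external (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/demo_external';

-- Dropping the managed table removes its metadata and its data files.
DROP TABLE demo_managed;

-- Dropping the external table removes only the metadata;
-- the files under /data/demo_external stay on HDFS.
DROP TABLE demo_external;
```

After the second DROP, re-running the same CREATE EXTERNAL TABLE statement would make the existing files queryable again, which is why external tables suit data shared with other tools.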
Official explanation
The following is the description of managed and external tables from the official Hive documentation:
A table created without the EXTERNAL clause is called a managed table because Hive manages its data.
Managed and External Tables
By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /apps/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.
Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.
Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
Managed or external tables can be identified using the DESCRIBE FORMATTED table_name command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on table type.
Statistics can be managed on internal and external tables and partitions for query optimization.
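The statements mentioned in the quoted documentation can be sketched together as follows. This is an illustrative example under assumptions: the table logs_ext, its columns, the path /data/logs_ext, and the partition value are hypothetical, and the sketch assumes that some process outside Hive wrote a partition directory directly to HDFS.

```sql
-- Hypothetical partitioned external table over files written by another process.
CREATE EXTERNAL TABLE logs_ext (
  msg STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/data/logs_ext';

-- Assume an outside process has created /data/logs_ext/dt=2020-01-01/ on HDFS.
-- The metastore does not know about that partition until it is repaired:
MSCK REPAIR TABLE logs_ext;

-- The newly registered partition should now be listed.
SHOW PARTITIONS logs_ext;

-- The "Table Type" field of the output shows EXTERNAL_TABLE for this table
-- (it would show MANAGED_TABLE for a managed table).
DESCRIBE FORMATTED logs_ext;

-- Statistics can be gathered for external tables and their partitions as well.
ANALYZE TABLE logs_ext PARTITION (dt) COMPUTE STATISTICS;
```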
Official Hive documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/Column