Key New Features in Each Hive Release



https://my.oschina.net/leejun2005/blog/272188

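Code in the open-source world, driven by the community and shaped by geek culture, has always changed quickly. This is especially evident in the Hadoop ecosystem, though it is also inseparable from Hadoop's wide industry adoption and the demands that come with it (big data and cloud computing have been hyped to death in recent years, haha). Bugs, improvements, and new features fly around every component of the ecosystem. After seeing a Hadoop version-history diagram that someone had put together, I felt it was worth compiling the evolution of Hive's new features as well, for future reference.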

1. Hive 0.8.0

Adds Bitmap Indexes, the TIMESTAMP datatype, a Plugin Developer Kit, JDBC Driver Improvements, and other new features.

This release is quite old, so I won't go into detail.

For details, see: http://blog.cloudera.com/blog/2011/11/coming-attractions-apache-hive-0-8-0/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12316178

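2. Hive 0.9.0

1. Support for CREATE OR REPLACE VIEW
2. Additional error messages
3. Support for NOT IN and NOT LIKE
4. Ctrl+C now issues a kill command that kills the currently running query job without exiting the Hive CLI
5. The number of map and reduce tasks is printed
6. Better performance for "select xx,xx from xxx LIMIT xxx"
7. Support for the BETWEEN operator
8. PRINTF() function
9. More lenient data type handling in COALESCE/UNION ALL operations
10. Added the TIMESTAMP data type
11. Added "INSERT OVERWRITE TABLE X PARTITION (a=b, c=d) IF NOT EXISTS ..."; if the partition already exists, it is left untouched.
12. Better compilation and startup performance after a Hive job is submitted.
For details, see: What's new in Apache Hive 0.9.0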

https://cwiki.apache.org/confluence/download/attachments/27362054/WhatsNewInHive090HadoopSummit2012BoF.pdf?version=1&modificationDate=1339872131000

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742

3. Hive 0.10.0

Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!

List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!

Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows. There is no more cygwin dependency. Thanks to Kanna!

'Explain' Adds More Info: Now you can do an explain dependency and the explain plan will contain all the tables and partitions touched by the query. Thanks to Sambavi!

Improved Authorization: The metastore can now optionally do authorization checks on the server side instead of on the client, providing you with a better security profile. Thanks to Sushanth!

Faster Simple Queries: Some simple queries that don't require aggregations, and therefore don't need MapReduce jobs, can now run faster. Thanks to Navis!

Better YARN Support: This release contains additional work aimed at making Hive work well with Hadoop YARN. While not all test cases are passing yet, there has been a lot of good progress made with this release. Thanks to Zhenxiao!

Union Optimization: Hive queries with unions will now result in a lower number of MapReduce jobs under certain conditions. Thanks to Namit!

Undo Your Drop Table: While not really truly ‘undo’, you can now reinstate your table after dropping it. Thanks to Andrew!

Show Create Table: This lets you see how you created your table. Thanks to Feng!

Support for Avro Data: Hive now has built-in support for reading/writing Avro data. Thanks to Jakob!

Skewed Joins: Hive’s support for joins involving skewed data is now improved. Thanks to Namit!

Robust Connection Handling at the Metastore Layer: Connection handling between a metastore client and server, and also between a metastore server and the database layer, has been improved. Thanks to Bhushan and Jean!

More Statistics: It's now possible to collect and store scalar-valued statistics for your tables and partitions. This will enable better query planning in upcoming releases. Thanks to Shreepadma!

Better-Looking HWI: HWI now uses a bootstrap javascript library. It looks really slick.

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-10-0-is-now-available/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&styleName=Text&projectId=12310843

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
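
To make the Cube and Rollup item above more concrete, here is a minimal Python sketch of which grouping sets the two constructs expand to. It only illustrates the standard SQL semantics (ROLLUP yields every prefix of the column list, CUBE yields every subset); it is not Hive's implementation, and the function names and the column names are made up for illustration.

```python
from itertools import combinations


def rollup_grouping_sets(columns):
    """Grouping sets for ROLLUP: every prefix of the column list, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_grouping_sets(columns):
    """Grouping sets for CUBE: every subset of the column list, largest first."""
    sets = []
    for n in range(len(columns), -1, -1):
        sets.extend(combinations(columns, n))
    return sets


if __name__ == "__main__":
    cols = ["country", "city"]  # hypothetical column names
    print(rollup_grouping_sets(cols))
    print(cube_grouping_sets(cols))
```

With n grouping columns, a rollup therefore aggregates at n + 1 levels and a cube at 2^n levels, which is why a cube over many columns becomes expensive quickly.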

4. Hive 0.11.0

  • ORCFile.  It’s Optimized.
    The ORC File (Optimized RC File) introduces key new features that speed up data access in Apache Hive. It adds metadata at the file and block level so that queries can use it to optimize access. Further, with ORC, only the bytes from the required columns are read from HDFS, which minimizes I/O and speeds up the query chain. These are major advances for Hive performance.

  • Improved Data Types
    As Apache Hive marches towards full SQL compatibility, the decimal data type was updated to make it more usable.

  • Analytic Functions
    Hive 0.11 introduces windowing functions for RANK, LEAD/LAG, ROW_NUMBER, FIRST_VALUE, LAST_VALUE and more. It also introduces aggregate OVER functions with PARTITION BY and ORDER BY.

  • Joins improved in Hive 0.11
    Both the broadcast join and the SMB join were improved considerably in Hive 0.11.  Both joins work without user hints, so that the Hive optimizer now picks the correct join rather than depending on the user to do so. More broadcast joins are now packed into a single MapReduce job, making star join queries much more efficient.

  • Implement HiveServer2

  • When writing a Hive table out to a file, users can specify a separator of their own choice

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Text&projectId=12310843
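
As a rough illustration of the analytic functions listed for this release, the Python sketch below computes ROW_NUMBER, RANK, and LAG over rows partitioned by one key and ordered by another. It models only the usual SQL semantics (ties share a RANK and the next rank skips ahead; LAG returns the previous row's value, or nothing for the first row of a partition). The function name and the sample field names are hypothetical, and this is not how Hive executes windowing.

```python
def annotate_window(rows, partition_key, order_key):
    """Add row_number, rank and lag columns per partition, ordered ascending."""
    # Group rows by partition value, like PARTITION BY.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[partition_key], []).append(row)

    result = []
    for part_rows in partitions.values():
        # Order rows inside the partition, like ORDER BY.
        ordered = sorted(part_rows, key=lambda r: r[order_key])
        rank = 0
        previous = None
        for position, row in enumerate(ordered, start=1):
            value = row[order_key]
            # RANK: a new value takes the current position; ties keep the old rank.
            if position == 1 or value != previous:
                rank = position
            # LAG: value from the preceding row in this partition, if any.
            lag = previous if position > 1 else None
            result.append({**row, "row_number": position, "rank": rank, "lag": lag})
            previous = value
    return result


if __name__ == "__main__":
    sample = [  # hypothetical data
        {"dept": "a", "salary": 10},
        {"dept": "a", "salary": 10},
        {"dept": "a", "salary": 20},
        {"dept": "b", "salary": 5},
    ]
    for annotated in annotate_window(sample, "dept", "salary"):
        print(annotated)
```

The difference between ROW_NUMBER and RANK only shows up on ties: the two rows with salary 10 get row numbers 1 and 2 but both get rank 1, and the next row gets rank 3.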

5. Hive 0.12.0

For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-12/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324312&styleName=Text&projectId=12310843

6. Hive 0.13.0

For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-13-completion-stinger-initiative/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324986&styleName=Text&projectId=12310843

7. Hive 0.14.0

[HIVE-5317] - Implement insert, update, and delete in Hive with full ACID support

[HIVE-5775] - Introduce Cost Based Optimizer to Hive

[HIVE-5823] - Support for DECIMAL primitive type in AvroSerDe

[HIVE-6455] - Scalable dynamic partitioning and bucketing optimization

[HIVE-6469] - skipTrash option in hive command line

[HIVE-6806] - CREATE TABLE should support STORED AS AVRO

[HIVE-7036] - get_json_object bug when extract list of list with index

[HIVE-7054] - Support ELT UDF in vectorized mode

[HIVE-7068] - Integrate AccumuloStorageHandler

[HIVE-7090] - Support session-level temporary tables in Hive

[HIVE-7158] - Use Tez auto-parallelism in Hive

[HIVE-7203] - Optimize limit 0

[HIVE-7255] - Allow partial partition spec in analyze command

[HIVE-7299] - Enable metadata only optimization on Tez

[HIVE-7341] - Support for Table replication across HCatalog instances

[HIVE-7390] - Make single quote character optional and configurable in BeeLine CSV/TSV output

[HIVE-7416] - provide context information to authorization checkPrivileges api call

[HIVE-7430] - Implement SMB join in tez

[HIVE-7446] - Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

[HIVE-7506] - MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

[HIVE-7509] - Fast stripe level merging for ORC

[HIVE-7547] - Add ipAddress and userName to ExecHook

[HIVE-7587] - Fetch aggregated stats from MetaStore

[HIVE-7654] - A method to extrapolate columnStats for partitions of a table

[HIVE-7826] - Dynamic partition pruning on Tez

[HIVE-8531] - Fold is not null filter if there are other comparison filter present on same column

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12326450&styleName=Text&projectId=12310843

8. Hive 1.0

This release has no new features.

9. Hive 1.1

[HIVE-3405] - UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase

[HIVE-7122] - Storage format for create like table

[HIVE-8435] - Add identity project remover optimization

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&styleName=Text&version=12329363
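
The behavior described for the initcap UDF (HIVE-3405) can be sketched in a few lines of Python. This is an illustration of the described semantics, not Hive's code. It assumes that words are delimited by whitespace, which the JIRA title above does not spell out.

```python
import re


def initcap(text):
    """Uppercase the first letter of each word and lowercase the rest.

    Assumption: a "word" is a maximal run of non-whitespace characters.
    Whitespace between words is preserved as-is.
    """
    def fix_word(match):
        word = match.group(0)
        return word[:1].upper() + word[1:].lower()

    return re.sub(r"\S+", fix_word, text)


if __name__ == "__main__":
    print(initcap("hELLO wORLD"))
```

Python's built-in `str.title()` is deliberately not used here, because it also capitalizes after apostrophes and other non-letters ("it's" becomes "It'S"), which is not what the description above asks for.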

10. Hive 1.2

[HIVE-7998] - Enhance JDBC Driver to not require class specification

[HIVE-9039] - Support Union Distinct

[HIVE-9188] - BloomFilter support in ORC

[HIVE-9277] - Hybrid Hybrid Grace Hash Join

[HIVE-9302] - Beeline add commands to register local jdbc driver names and jars

[HIVE-9780] - Add another level of explain for RDBMS audience

[HIVE-10038] - Add Calcite's ProjectMergeRule.

[HIVE-10099] - Enable constant folding for Decimal

[HIVE-10591] - Support limited integer type promotion in ORC

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843

11. Hive 2.0

  • [HIVE-686] - add UDF substring_index

  • [HIVE-3404] - Create quarter UDF

  • [HIVE-7926] - long-lived daemons for query fragment execution, I/O and caching

  • [HIVE-10591] - Support limited integer type promotion in ORC

  • [HIVE-10592] - ORC file dump in JSON format

  • [HIVE-10673] - Dynamically partitioned hash join for Tez

  • [HIVE-10761] - Create codahale-based metrics system for Hive

  • [HIVE-10785] - Support aggregate push down through joins

  • [HIVE-11103] - Add banker's rounding BROUND UDF

  • [HIVE-11461] - Transform flat AND/OR into IN struct clause

  • [HIVE-11488] - Add sessionId and queryId info to HS2 log

  • [HIVE-11593] - Add aes_encrypt and aes_decrypt UDFs

  • [HIVE-11600] - Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

  • [HIVE-11684] - Implement limit pushdown through outer join in CBO

  • [HIVE-11699] - Support special characters in quoted table names

  • [HIVE-11706] - Implement "show create database"

  • [HIVE-11775] - Implement limit push down through union all in CBO

  • [HIVE-11785] - Support escaping carriage return and new line for LazySimpleSerDe

  • [HIVE-11976] - Extend CBO rules to being able to apply rules only once on a given operator

  • [HIVE-12080] - Support auto type widening (int->bigint & float->double) for Parquet table

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332641&styleName=Text&projectId=12310843
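
Two of the UDFs in the list above, substring_index (HIVE-686) and BROUND (HIVE-11103), are easy to misread from their names alone, so here is a hedged Python sketch of what they compute. It assumes substring_index follows the familiar MySQL-style semantics (a positive count keeps everything left of the count-th delimiter, a negative count keeps everything right of the count-th delimiter from the end) and that BROUND is round-half-to-even. The handling of an empty delimiter and a zero count is my assumption, not something stated in the release notes.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def substring_index(text, delim, count):
    """Return the part of text before/after `count` occurrences of delim."""
    if count == 0 or delim == "":
        return ""  # assumption: nothing to select in these edge cases
    parts = text.split(delim)
    if count > 0:
        # Keep the first `count` pieces (everything left of that delimiter).
        return delim.join(parts[:count])
    # Negative count: keep the last `abs(count)` pieces.
    return delim.join(parts[count:])


def bround(value, digits=0):
    """Banker's rounding: exact halves go to the nearest even digit."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 gives a quantum of 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)


if __name__ == "__main__":
    print(substring_index("www.apache.org", ".", 2))
    print(substring_index("www.apache.org", ".", -2))
    print(bround(2.5), bround(3.5), bround(8.25, 1), bround(8.35, 1))
```

Going through `Decimal(str(value))` rather than rounding the float directly avoids surprises from binary floating point, where a literal such as 8.35 is not stored exactly.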

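References:

[1] New features in Hive 0.8.0 and 0.9.0  http://superlxw1234.iteye.com/blog/1564461

[2] Overview of new features in Hive 0.10 and 0.11  http://blog.csdn.net/lalaguozhe/article/details/11730817

[3] http://hive.apache.org/downloads.html

[4] Hive's roadmap for the next two years  http://www.infoq.com/cn/news/2014/09/hive

(1) ACID transaction support: users will be able to insert, update, and delete existing data. Hive will evolve from a traditional write-once, read-many system into one that supports analytics over changing data.
(2) Sub-second queries: users will be able to use Hive for scenarios with stricter response-time requirements, such as interactive dashboards and exploratory analysis.
(3) Full SQL:2011 Analytics support: users will be able to deploy complex reports on Hive using standard SQL, and do so faster, more simply, and more reliably. A powerful cost-based optimizer will keep tool-generated queries and complex queries running fast. At that point, Hive will offer on Hadoop the full expressive power that enterprise SQL users expect. Building on its existing support for window functions, user-defined functions, subqueries, Rollup, Cube, standard aggregates, inner joins, outer joins, semi joins, and cross joins, it will add support for non-equi joins, set operations (union, intersect, except), interval types, and more.
Stinger.next is planned to take 18 months and will be delivered in three phases. Transaction support is scheduled for release at the end of 2014, sub-second queries for the first half of 2015, and full SQL:2011 Analytics support for the end of 2015.
In addition, Hive will be integrated with the machine learning framework Spark, so that users can run machine learning models through Hive.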

