[Original] Uncle's Experience Sharing (26): Reading and Writing Elasticsearch Data from Hive via External Tables


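Reading and writing Elasticsearch data from Hive through an external table works much like reading and writing HBase data. The difference is that you need to download elasticsearch-hadoop-hive-6.6.2.jar and use the EsStorageHandler class it provides.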

Connect the massive data storage and deep processing power of Hadoop with the real-time search and analytics of Elasticsearch. The Elasticsearch-Hadoop (ES-Hadoop) connector lets you get quick insight from your big data and makes working in the Hadoop ecosystem even better.

Official site: https://www.elastic.co/products/hadoop
Download: https://www.elastic.co/downloads/hadoop

 

At the time of writing, the latest version is 6.6.2:

# wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-6.6.2.zip
# unzip elasticsearch-hadoop-6.6.2.zip

From the unpacked archive, use elasticsearch-hadoop-6.6.2/dist/elasticsearch-hadoop-hive-6.6.2.jar and register it in the Hive session:

add jar /path/to/elasticsearch-hadoop-hive-6.6.2.jar;

CREATE EXTERNAL TABLE hive_elasticsearch_table (
id string,
name string,
desc string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '$es_server1:9200,$es_server2:9200',
'es.index.auto.create' = 'false',
'es.resource' = 'testdoc/testtype',
'es.read.metadata' = 'true',
'es.mapping.names' = 'id:_metadata._id, name:name, desc:desc');

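The key settings are es.nodes, es.resource and es.mapping.names. The first is the address of the Elasticsearch servers ($es_server1 and $es_server2 are placeholders for your own hosts), the second is the index name and type name in index/type form, and the third is a one-to-one mapping from Hive column names to Elasticsearch field names. With these in place, you can read and write Elasticsearch data from Hive: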

select * from hive_elasticsearch_table limit 10;
insert into table hive_elasticsearch_table select '2', 'testname', 'testdesc';
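In practice, writes usually come from an existing Hive table rather than from literal values. A minimal sketch, assuming a hypothetical source table named source_table (not part of the original post) with string columns id, name and desc:

```sql
-- source_table is a hypothetical, regular Hive table used only for illustration.
-- Each selected row becomes one document in the index behind the external table.
-- The backticks keep `desc` from being parsed as the DESC keyword.
INSERT INTO TABLE hive_elasticsearch_table
SELECT s.id, s.name, s.`desc`
FROM source_table s;
```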

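However, the id that comes back is not the inserted value '2' but a hash-like string. The table does not yet say which column holds the document ID, so Elasticsearch auto-generates one: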

+------------------------------+--------------------------------+--------------------------------+--+
| hive_elasticsearch_table.id  | hive_elasticsearch_table.name  | hive_elasticsearch_table.desc  |
+------------------------------+--------------------------------+--------------------------------+--+
| 6mpoc2gBohlnD12tvBoF         | testname                       | testdesc                       |
+------------------------------+--------------------------------+--------------------------------+--+

To fix this, add one more setting, es.mapping.id, which defines which column is used as the document ID:

CREATE EXTERNAL TABLE hive_elasticsearch_table (
id string,
name string,
desc string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '$es_server1:9200,$es_server2:9200',
'es.index.auto.create' = 'false',
'es.resource' = 'testdoc/testtype',
'es.read.metadata' = 'true',
'es.mapping.id' = 'id',
'es.mapping.names' = 'id:_metadata._id, name:name, desc:desc');

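This time it works as expected: the newly inserted row keeps its own id (4), while the earlier row still has its auto-generated ID: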

+------------------------------+--------------------------------+--------------------------------+--+
| hive_elasticsearch_table.id  | hive_elasticsearch_table.name  | hive_elasticsearch_table.desc  |
+------------------------------+--------------------------------+--------------------------------+--+
| 6mpoc2gBohlnD12tvBoF         | testname                       | testdesc                       |
| 4                            | hello                          | world                          |
+------------------------------+--------------------------------+--------------------------------+--+
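One way to double-check that es.mapping.id is in effect is to write a row whose id already exists. This is only a sketch: it assumes the connector's default write operation (index, which replaces an existing document that has the same ID), and the 'world-updated' value is made up for illustration.

```sql
-- A row with id '4' already exists (see the output above); write it again with a changed desc.
INSERT INTO TABLE hive_elasticsearch_table SELECT '4', 'hello', 'world-updated';

-- If the document ID really comes from the id column, this should return one row for '4', not two.
SELECT * FROM hive_elasticsearch_table WHERE id = '4';
```

If the second insert had produced a new document with an auto-generated ID instead, that would suggest the table is still using the definition without es.mapping.id.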

 

For details on field type mapping, see: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/mapping.html
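As a rough illustration of that mapping, here is a sketch of a table with non-string columns. The table name, the index name testdoc_typed and the columns are all hypothetical, and the type notes in the comments reflect my reading of the linked mapping page, so check them against the documentation for your version:

```sql
-- Hypothetical table and index, for illustration only.
CREATE EXTERNAL TABLE hive_elasticsearch_typed (
id string,
-- bigint is expected to map to an Elasticsearch long
view_count bigint,
-- double is expected to map to an Elasticsearch double
score double,
-- timestamp is expected to map to an Elasticsearch date
created timestamp
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '$es_server1:9200,$es_server2:9200',
'es.resource' = 'testdoc_typed/testtype',
'es.mapping.id' = 'id');
```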

 

