Elasticsearch Getting Started Series (3): Documents, Indexing, Search, and Aggregations

Reposted article, 2016-08-05 09:08

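1. Documents

Objects in real applications often have complex data structures.

Elasticsearch is document-oriented: it can store entire objects or documents. It does more than store them, though. It also indexes the contents of each document so that they can be searched. In Elasticsearch, documents can be indexed, searched, sorted, and filtered.

Elasticsearch uses JSON as the serialization format for documents.

A user object represented as JSON: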

{
    "email": "john@smith.com",
    "first_name": "John",
    "last_name": "Smith",
    "info": {
        "bio": "Eco-warrior and defender of the weak",
        "age": 25,
        "interests": [ "dolphins", "whales" ]
    },
    "join_date": "2014/05/01"
}

Although the original user object is complex, its structure and meaning are fully captured in the JSON.
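As a small illustration of that point, the sketch below serializes the same user object to JSON and parses it back using Python's standard json module. The variable names are chosen for this example only.

```python
import json

# The user object from the text, written as a plain Python dict.
user = {
    "email": "john@smith.com",
    "first_name": "John",
    "last_name": "Smith",
    "info": {
        "bio": "Eco-warrior and defender of the weak",
        "age": 25,
        "interests": ["dolphins", "whales"],
    },
    "join_date": "2014/05/01",
}

# Serialize to a JSON string, the form in which a document would be sent.
document = json.dumps(user)

# Parsing the string back recovers the same nested structure.
restored = json.loads(document)
```

The nested info object and the interests array survive the round trip unchanged, which is what makes JSON a convenient format for whole documents.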

A simple starter tutorial: building an employee search directory

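2. Indexing

The first task is to store the employee data, with each document representing one employee. In Elasticsearch, the act of storing data is called indexing. Before indexing, however, we need to decide where the data should be stored.

In Elasticsearch, a document belongs to a type, and types live inside an index.

Elasticsearch compared with a traditional relational database:

Relational DB -> Databases -> Tables -> Rows -> Columns

Elasticsearch -> Indices -> Types -> Documents -> Fields

An Elasticsearch cluster can contain multiple indices (databases). Each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).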

By default, every field in a document is indexed (it has an inverted index). Only indexed fields are searchable.
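To make the idea of an inverted index concrete, here is a minimal sketch that maps each word to the IDs of the documents containing it. It assumes plain lowercasing and whitespace splitting, which is far simpler than the text analysis Elasticsearch actually performs, and the function name is hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs, field):
    """Map each lowercased word in `field` to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc[field].lower().split():
            index[word].add(doc_id)
    return index

# The "about" texts of the three employees used in this tutorial.
docs = {
    1: {"about": "I love to go rock climbing"},
    2: {"about": "I like to collect rock albums"},
    3: {"about": "I like to build cabinets"},
}

index = build_inverted_index(docs, "about")
rock_ids = sorted(index["rock"])          # documents that contain "rock"
climbing_ids = sorted(index["climbing"])  # documents that contain "climbing"
```

Looking up a word is then a single dictionary access instead of a scan over every document, which is why a field must be indexed before it can be searched.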

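To build the employee directory described above, we will do the following:

Index a document for each employee, containing all of that employee's information

Give each document the type employee

Place the employee type in the megacorp index

Store the megacorp index in the Elasticsearch cluster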

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

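The path /megacorp/employee/1 contains three pieces of information:

megacorp  the index name

employee  the type name

1  the ID of this employee

The request body (the JSON document) contains all the information about this employee.

We did not need to perform any extra administrative steps, such as creating the index or defining the data type of each field. We could index the document directly: Elasticsearch ships with defaults for everything, so all of that administration happens transparently.

Add more employees in the same format.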
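The sketch below shows one way the indexing requests for all three employees could be assembled from Python using only the standard library. The node address http://localhost:9200 and the helper name index_request are assumptions made for this example. The requests are built but not sent, so no running cluster is needed to follow along.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed address of a local node

def index_request(index, doc_type, doc_id, document):
    """Build (without sending) a PUT request for /{index}/{type}/{id}."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )

# The three employees that appear in this tutorial's search results.
employees = {
    1: {"first_name": "John", "last_name": "Smith", "age": 25,
        "about": "I love to go rock climbing", "interests": ["sports", "music"]},
    2: {"first_name": "Jane", "last_name": "Smith", "age": 32,
        "about": "I like to collect rock albums", "interests": ["music"]},
    3: {"first_name": "Douglas", "last_name": "Fir", "age": 35,
        "about": "I like to build cabinets", "interests": ["forestry"]},
}

put_requests = [
    index_request("megacorp", "employee", doc_id, doc)
    for doc_id, doc in employees.items()
]
# To actually send one against a live node: urllib.request.urlopen(put_requests[0])
```

Each request differs only in the ID at the end of the path and in the JSON body, which mirrors the index/type/ID structure described above.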

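3. Retrieval

Elasticsearch now holds some data.

①: Retrieve a single employee. Issue an HTTP GET request and specify the document's "address": the index, type, and ID.

GET /megacorp/employee/1

The response includes some metadata about the document: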

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
      "first_name" : "John",
      "last_name" : "Smith",
      "age" : 25,
      "about" : "I love to go rock climbing",
      "interests": [ "sports", "music" ]
  }
}

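We retrieve the document with the HTTP GET method. In the same way, we can use DELETE to delete a document and HEAD to check whether a document exists. To update an existing document, we simply PUT it again.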
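The following toy model illustrates what these four verbs mean for a single document. It is an in-memory stand-in written for this example, not a description of how Elasticsearch is implemented. The class name ToyStore and the changed age value are hypothetical.

```python
class ToyStore:
    """In-memory stand-in for one index/type; for illustration only."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source):
        # A repeated PUT replaces the document and bumps its version.
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        self._docs[doc_id] = (version, source)
        return version

    def get(self, doc_id):
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "found": True, "_source": source}

    def head(self, doc_id):
        # HEAD only answers whether the document exists.
        return doc_id in self._docs

    def delete(self, doc_id):
        # Returns True if a document was actually removed.
        return self._docs.pop(doc_id, None) is not None

store = ToyStore()
v1 = store.put("1", {"first_name": "John", "age": 25})
v2 = store.put("1", {"first_name": "John", "age": 26})  # update by PUT again
exists_before = store.head("1")
store.delete("1")
exists_after = store.head("1")
```

The second put returns a higher version number, which corresponds to the _version field in the response metadata shown earlier.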

②: Search for all employees

GET /megacorp/employee/_search

By default, the first 10 results are returned:

{
   "took": 6,
   "timed_out": false,
   "_shards": { ... },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "3",
            "_score": 1,
            "_source": {
               "first_name": "Douglas",
               "last_name": "Fir",
               "age": 35,
               "about": "I like to build cabinets",
               "interests": [ "forestry" ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 1,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}

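The response tells us not only which documents matched but also includes the full documents themselves.

③: Search for employees whose last name contains Smith, using a query-string search

GET /megacorp/employee/_search?q=last_name:Smith

The request still uses the _search endpoint, and the query is passed in the q= parameter.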

{
   ...
   "hits": {
      "total": 2,
      "max_score": 0.30685282,
      "hits": [
         {
            ...
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            ...
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}

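④: Querying with the DSL

Query-string search is handy for ad hoc searches from the command line, but it has limitations. Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and powerful queries.

The DSL (Domain Specific Language) is expressed as a JSON request body. For example, the earlier search for the last name Smith becomes:

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

This returns the same results as before. The only difference is that the query is sent in the request body rather than as a query-string parameter, and it uses a match clause.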
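To show how the two forms relate, here is a small sketch that turns a query string of the simple form field:value into the corresponding match request body. The helper query_string_to_match is hypothetical and handles only this one form, not the full query-string syntax.

```python
import json

def query_string_to_match(q):
    """Turn a simple "field:value" query string into a match request body."""
    field, _, value = q.partition(":")
    if not field or not value:
        raise ValueError("expected the form field:value")
    return {"query": {"match": {field: value}}}

body = query_string_to_match("last_name:Smith")
payload = json.dumps(body)  # the JSON text that would be sent as the request body
```

Building the body as a nested dict and serializing it at the end keeps the structure easy to extend when more clauses are added later.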

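⑤: A more complex query

Let's change the previous example to find employees whose last name is Smith and who are older than 30. To do that, we add a filter to the query.

GET /megacorp/employee/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 } <1>
                }
            },
            "query" : {
                "match" : {
                    "last_name" : "smith" <2>
                }
            }
        }
    }
}

<1> This part is a range filter. It finds all documents with an age greater than 30.

<2> This part is the same match query as before.

Note that the filtered query is the syntax of the Elasticsearch version this tutorial was written for. It was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause is used instead.

The response is: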

{
   ...
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            ...
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}

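⑥: Full-text search

The searches so far have been simple: looking up a specific name and filtering by age. Now let's try full-text search.

For example, let's search for all employees who enjoy "rock climbing":

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

This uses the same match query as before.

The result is: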

{
   ...
   "hits": {
      "total": 2,
      "max_score": 0.16273327,
      "hits": [
         {
            ...
            "_score": 0.16273327, <1>
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            ...
            "_score": 0.016878016, <2>
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}

<1><2> The relevance scores of the results

By default, Elasticsearch sorts results by relevance score, that is, by how well each document matches the query. John ranks first because his about field contains both words. Jane is still returned, with a lower score, because hers mentions "rock".
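The ranking idea can be illustrated with a deliberately simple score: the fraction of query words found in the text. This is not the formula Elasticsearch uses, and the numbers it produces differ from the _score values above. It only shows why a document matching both words ranks ahead of one matching a single word. The function name is hypothetical.

```python
def overlap_score(query, text):
    """Toy relevance: fraction of query words that occur in the text."""
    query_words = query.lower().split()
    text_words = set(text.lower().split())
    return sum(word in text_words for word in query_words) / len(query_words)

abouts = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}

scores = {name: overlap_score("rock climbing", about) for name, about in abouts.items()}

# Keep only matching documents and sort them best-first, as a search result would be.
ranking = sorted(
    (name for name, score in scores.items() if score > 0),
    key=lambda name: scores[name],
    reverse=True,
)
```

Under this toy score John gets 1.0, Jane 0.5, and Douglas 0, so the ordering matches the response even though the real scoring is more elaborate.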

⑦: Phrase search

To match an exact sequence of words or a phrase, simply change the match query to a match_phrase query:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}

The result is:

{
   ...
   "hits": {
      "total": 1,
      "max_score": 0.23013961,
      "hits": [
         {
            ...
            "_score": 0.23013961,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         }
      ]
   }
}

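⑧: Highlighting search results

Add a highlight parameter to the previous query:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

The same document is returned, and the matched words are marked with <em> tags: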

{
   ...
   "hits": {
      "total": 1,
      "max_score": 0.23013961,
      "hits": [
         {
            ...
            "_score": 0.23013961,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            },
            "highlight": {
               "about": [
                  "I love to go <em>rock</em> <em>climbing</em>" <1>
               ]
            }
         }
      ]
   }
}

<1> The highlighted fragment of the original text
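A rough approximation of this kind of highlighting can be written with a regular expression that wraps each matched word in a tag. It is a sketch assuming whole-word, case-insensitive matching, not the highlighter Elasticsearch uses, and the function name is hypothetical.

```python
import re

def highlight(text, query, tag="em"):
    """Wrap each whole-word occurrence of a query word in <tag>...</tag>."""
    words = [re.escape(word) for word in query.split()]
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)

fragment = highlight("I love to go rock climbing", "rock climbing")
```

Each word is wrapped separately, which is why the fragment in the response contains two <em> elements rather than one around the whole phrase.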

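4. Aggregations

Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.

For example, let's find the most popular interests among the employees:

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

Result: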

{
   ...
   "hits": { ... },
   "aggregations": {
      "all_interests": {
         "buckets": [
            { "key": "music", "doc_count": 2 },
            { "key": "forestry", "doc_count": 1 },
            { "key": "sports", "doc_count": 1 }
         ]
      }
   }
}

The results show that two employees are interested in music, one in forestry, and one in sports.
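The counting behind a terms aggregation can be sketched in a few lines with collections.Counter. This is an in-memory illustration of the idea only. Breaking ties alphabetically is a choice made for this sketch, and the function name terms_agg is hypothetical.

```python
from collections import Counter

employees = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]

def terms_agg(docs, field):
    """Count documents per distinct value of a multi-valued field, largest bucket first."""
    # set() ensures a document is counted once per value even if it repeats.
    counts = Counter(value for doc in docs for value in set(doc[field]))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]

buckets = terms_agg(employees, "interests")
```

Each distinct interest becomes one bucket, and doc_count is the number of documents that fall into it, just as GROUP BY with COUNT would produce in SQL.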

To add a condition, for example finding the most popular interests among employees named Smith, just add a matching query:

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}

Result:

  ...
  "all_interests": {
     "buckets": [
        { "key": "music", "doc_count": 2 },
        { "key": "sports", "doc_count": 1 }
     ]
  }

Aggregations also allow hierarchical rollups. For example, to compute the average age of employees for each interest:

GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}

Result:

  ...
  "all_interests": {
     "buckets": [
        {
           "key": "music",
           "doc_count": 2,
           "avg_age": { "value": 28.5 }
        },
        {
           "key": "forestry",
           "doc_count": 1,
           "avg_age": { "value": 35 }
        },
        {
           "key": "sports",
           "doc_count": 1,
           "avg_age": { "value": 25 }
        }
     ]
  }
