Elasticsearch Basics: Getting-Started Study Notes

Reposted from the original post, 2020-02-20 16:16. Tag: ElasticSearch

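Preface

These notes record my process of learning Elasticsearch from scratch, following the official documentation and running some tests of my own.

Installation

Since I am a beginner, I simply installed it by following the official documentation. My earlier post on getting started with ELK mostly covered the installation process, so I won't repeat it here.

Learning material

I searched for a long time, and most tutorials are fairly old. Even the official guide is written against 2.x, while the latest release on the official site has already moved on to 6. It is still fine for a basic introduction, so what follows works through the official guide.

After installing Elasticsearch and Kibana, open localhost:5601 and select Dev Tools.

Indexing (storing) employee documents

The test data source is a list of company employee records. Each employee's record is called a document, and adding a record is called indexing a document.

Enter the following in the console: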

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

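megacorp is the index name
employee is the type name
1 is the document id, which is also the employee's id

Place the cursor on the first line and click the green button to run it.

The console is a shorthand for storing the document; underneath, it still goes through the REST API that ES provides. The same request can be sent with Postman or curl against the ES address, i.e. localhost:9200. Postman does not allow a body on GET requests, so for requests that need a body, POST can be used instead.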
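As a rough illustration of the REST call behind the console shorthand, the sketch below builds the same PUT request with Python's standard library. The helper name build_index_request and the ES_URL constant are illustrative assumptions, not part of the original notes. The request is only constructed here; the commented-out lines show what would send it to a running node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local ES address, as in the notes


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
request = build_index_request("megacorp", "employee", 1, john)
print(request.get_method(), request.full_url)

# To actually send it to a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```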

That stores one document in ES. Next, store a few more:

PUT /megacorp/employee/2
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}

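Then we can take a look at them: click Management, Index Patterns, Configure an index pattern, enter megacorp, and confirm.

Click Discover to see the records we stored.

Retrieving a document

Once the data is stored, we want to query it back. Fetch the employee with id 1: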

GET /megacorp/employee/1

Response:
{
  "_index": "megacorp",
  "_type": "employee",
  "_id": "1",
  "_version": 5,
  "found": true,
  "_source": {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": [
      "sports",
      "music"
    ]
  }
}

Compared with saving a record, only the HTTP method differs.

PUT adds a document
GET retrieves it
DELETE deletes it
HEAD checks whether it exists
To update, simply PUT again
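The list above can be sketched as one small helper that varies only the HTTP method. This is a minimal sketch using Python's standard library; document_request and ES_URL are hypothetical names introduced for illustration, and the requests are built but not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local ES address


def document_request(method, index, doc_type, doc_id, document=None):
    """Build a request for one document; a JSON body is attached only if given."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    data = None
    headers = {}
    if document is not None:
        data = json.dumps(document).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, method=method, headers=headers)


# One request per operation listed above, all against the same URL.
operations = {
    "add or replace": document_request(
        "PUT", "megacorp", "employee", 1, {"first_name": "John"}
    ),
    "retrieve": document_request("GET", "megacorp", "employee", 1),
    "delete": document_request("DELETE", "megacorp", "employee", 1),
    "exists": document_request("HEAD", "megacorp", "employee", 1),
}
for name, req in operations.items():
    print(f"{name:15} {req.get_method():6} {req.full_url}")
```

A second PUT to the same id replaces the whole stored document, so an update has to send every field again. That is presumably why _version in the response above has already reached 5.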

Search lite

Besides findById, the most common need is querying by condition.

First, list everything:

GET /megacorp/employee/_search

By the way, the number of records can be obtained with _count:

GET /megacorp/employee/_count

To find the employees whose last_name is Smith:

GET /megacorp/employee/_search?q=last_name:Smith

Add a q parameter and query in the form field:value.
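When the same search is issued from code rather than the console, the q parameter has to be URL-encoded. The sketch below shows one way to build the URL with Python's standard library; lite_search_url and ES_URL are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed local ES address


def lite_search_url(index, doc_type, field, value):
    """Return the _search URL for a field:value query-string search."""
    query = urlencode({"q": f"{field}:{value}"})
    return f"{ES_URL}/{index}/{doc_type}/_search?{query}"


url = lite_search_url("megacorp", "employee", "last_name", "Smith")
print(url)
```

urlencode percent-encodes the colon as %3A; the server decodes it back to last_name:Smith.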

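Query DSL

Query-string search is handy for ad hoc searches from the command line, but it has its own limitations (see Search lite). Elasticsearch provides a rich, flexible query language called the Query DSL, which supports building more complex and robust queries.

The domain-specific language (DSL) is specified as a JSON request body. We can rewrite the previous search for all Smiths like this: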

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

A more complex search

Continue by modifying the query from the previous step:

GET /megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}

This adds a range filter that requires age to be greater than 30.

Result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}
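The bool query above can also be assembled in code and sent with POST, which helps with clients that refuse a body on GET. This is a minimal sketch with Python's standard library; surname_older_than, search_request and ES_URL are hypothetical names, and the request is built but not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local ES address


def surname_older_than(last_name, min_age):
    """Body for: last_name matches AND age > min_age (the filter is not scored)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": last_name}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


def search_request(index, doc_type, body):
    """Build a POST _search request carrying the query DSL as JSON."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


body = surname_older_than("smith", 30)
request = search_request("megacorp", "employee", body)
print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
```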

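Full-text search

The searches so far have been simple: a single name, filtered by age. Now let's try a somewhat more advanced full-text search, a task that traditional databases really struggle with.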

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

Result

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.53484553,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.53484553,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.26742277,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}

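The results are sorted by a relevance score, _score. Note that the second document, which matches only one of the two words (rock), is returned as well. To match the whole phrase exactly, use a different kind of query.

match_phrase matches the exact phrase.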

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
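The difference between match and match_phrase can be imitated locally with a toy model. The sketch below is only an illustration under strong assumptions: it tokenizes by lowercasing and splitting on whitespace, ignores scoring entirely, and is not how ES analyzes or executes queries. All function names are hypothetical.

```python
ABOUT = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}


def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(text, query):
    """True if the text contains at least one of the query terms."""
    terms = set(tokenize(query))
    return any(token in terms for token in tokenize(text))


def match_phrase(text, query):
    """True if the query terms appear adjacent and in the same order."""
    tokens, phrase = tokenize(text), tokenize(query)
    if not phrase:
        return False
    last_start = len(tokens) - len(phrase)
    return any(tokens[i:i + len(phrase)] == phrase for i in range(last_start + 1))


match_ids = [i for i, text in ABOUT.items() if match(text, "rock climbing")]
phrase_ids = [i for i, text in ABOUT.items() if match_phrase(text, "rock climbing")]
print(match_ids)   # [1, 2]
print(phrase_ids)  # [1]
```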

When we search on Baidu, the matched keywords are highlighted in the results. ES can also return the highlighted fragments:

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}
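On the client side, the highlight fragment in the response above is a plain string with em tags around each hit. A minimal sketch of pulling the matched terms back out, assuming the default em tags seen in that response (the helper names are hypothetical):

```python
import re

# Fragment copied from the highlight section of the response above.
fragment = "I love to go <em>rock</em> <em>climbing</em>"


def highlighted_terms(text):
    """Return the substrings wrapped in <em>...</em> tags, in order."""
    return re.findall(r"<em>(.*?)</em>", text)


def strip_tags(text):
    """Remove the <em> markers to recover the plain field text."""
    return re.sub(r"</?em>", "", text)


terms = highlighted_terms(fragment)
plain = strip_tags(fragment)
print(terms)  # ['rock', 'climbing']
print(plain)  # I love to go rock climbing
```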

Aggregations (group by)

SQL often involves statistical calculations such as sum, count, and avg. ES can do this as follows:

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

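aggs denotes an aggregation, all_interests is the name under which the result is returned, and terms groups the documents by each distinct value of the field and counts them. In other words, this statement counts documents per interests value. ES may then return: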

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "megacorp",
        "node": "iqHCjOUkSsWM2Hv6jT-xUQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

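This means that aggregating on a text field requires enabling the setting fielddata=true.

That requires changing the index mapping, which is comparable to altering a table schema in a relational database.

Modifying the index mapping

First, look at the current mapping: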

GET /megacorp/employee/_mapping

Result:

{
  "megacorp": {
    "mappings": {
      "employee": {
        "properties": {
          "about": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "age": {
            "type": "long"
          },
          "first_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "interests": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "last_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

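It is easy to see that this defines the type of each field. Fixing the earlier problem requires adding one setting to the interests field:

"fielddata": true

Update the mapping as follows: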


PUT /megacorp/employee/_mapping
{
  "properties": {
    "about": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "age": {
      "type": "long"
    },
    "first_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "interests": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "fielddata": true
    },
    "last_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

{
  "acknowledged": true
}

This indicates that the update succeeded. We can now continue with the earlier aggregation.
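The error message also names an alternative to fielddata: aggregating on a keyword field. The mapping shown earlier already contains an interests.keyword sub-field, so a sketch of that variant only changes the field name in the request body. The helper name terms_agg_body is a hypothetical one, and this variant was not run in the original notes.

```python
import json


def terms_agg_body(agg_name, field):
    """Body for a terms aggregation that returns buckets only (no hits)."""
    return {
        "size": 0,
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


# Aggregate on the keyword sub-field instead of the analyzed text field.
body = terms_agg_body("all_interests", "interests.keyword")
print(json.dumps(body, indent=2))
```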

Aggregation: group by count

In SQL this would be roughly:

select interests, count(*) from index_xxx
where last_name = 'smith'
group by interests;

In ES the query looks like this:

GET /megacorp/employee/_search
{
  "_source": false,
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "size": 0,
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}

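"_source": false keeps the source fields of each hit out of the response; the default is true.

"size": 0 means that no hits are returned. By default a search returns the first 10 hits; we don't need them here, only the aggregation results.

aggs denotes an aggregation.

all_interests is a user-defined name; any name will do.

terms groups the documents by the distinct values of a field, here interests, and counts the documents in each group.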

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "music",
          "doc_count": 2
        },
        {
          "key": "sports",
          "doc_count": 1
        }
      ]
    }
  }
}

This yields the document count for each value of the field. sum and avg can be computed in the same way.

GET /megacorp/employee/_search
{
    "_source": false, 
    "size": 0, 
    "aggs" : {
        "avg_age" : {
            "avg" : { "field" : "age" }
        },
        "sum_age" : {
            "sum" : { "field" : "age" }
        }
    }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avg_age": {
      "value": 30.666666666666668
    },
    "sum_age": {
      "value": 92
    }
  }
}
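As a sanity check, the aggregation results above can be reproduced locally from the three sample documents. This is only a sketch in plain Python with hand-copied data; it mirrors what the terms, sum, and avg aggregations report, not how ES computes them.

```python
from collections import Counter

# The three employees indexed earlier (only the fields needed here).
EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]

# terms aggregation restricted to last_name = smith.
# set() ensures each document is counted once per interest, like doc_count.
smiths = [e for e in EMPLOYEES if e["last_name"].lower() == "smith"]
interest_counts = Counter(i for e in smiths for i in set(e["interests"]))

# sum and avg aggregations over all documents.
ages = [e["age"] for e in EMPLOYEES]
sum_age = sum(ages)
avg_age = sum_age / len(ages)

print(interest_counts.most_common())  # [('music', 2), ('sports', 1)]
print(sum_age)  # 92
print(avg_age)
```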

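Summary

The above covers the first section of the official guide, the basic getting-started part. I only copied it out and ran through it once. There are no breakthroughs beyond that, but I added my own understanding along the way. It is enough to see how ES is basically used. The more detailed syntax can come later.

References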

https://www.elastic.co/guide/cn/elasticsearch/guide/current/_search_with_query_dsl.html
