Preface
These notes record my process of learning Elasticsearch from scratch, following the official documentation along with some tests of my own.
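Installation
As a beginner, I simply followed the official documentation to install it. The earlier post on getting started with ELK mainly covered the installation process, so it is not repeated here.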
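Tutorial
After a long search, most of the tutorials I found turned out to be fairly old. Even the official guide is written against 2.x, while the latest release on the official site has already reached 6. It is still fine for the basics, though, so what follows works through the official guide.
With Elasticsearch and Kibana installed, open localhost:5601 and select Dev Tools.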
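Indexing (Storing) Employee Documents
The test data is a list of a company's employee records. Each employee's record is called a document, and adding a record is called indexing a document.
In the console, enter: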
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
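- megacorp is the index name
- employee is the type name
- 1 is the id, which here is also the employee's id
Place the cursor on the first line and click the green button to run it.
This is a shorthand for storing a document; under the hood it still goes through the REST API that ES provides. The same request can be made with Postman or curl against the ES address, localhost:9200. For requests that carry a body, such as the searches later on, Postman does not allow a body on GET, so POST can be used instead.
That stores one document in ES. Next, store a few more: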
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
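Since the console is only a shortcut for the REST API, the same PUT can be prepared from a script. Below is a minimal sketch using only the Python standard library. The helper name `build_index_request` and the `ES_URL` constant are my own, not part of any Elasticsearch client, and it assumes a node listening on localhost:9200. The request is only built here; the commented lines at the end show how it could be sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local Elasticsearch node


def build_index_request(index, doc_type, doc_id, document, base_url=ES_URL):
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "megacorp",
    "employee",
    1,
    {
        "first_name": "John",
        "last_name": "Smith",
        "age": 25,
        "about": "I love to go rock climbing",
        "interests": ["sports", "music"],
    },
)
print(request.get_method(), request.full_url)

# To send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The JSON body is the same one typed into the console; only the transport differs.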
Next, we can look at the data: click Management, then Index Patterns, then Configure an index pattern, enter megacorp, and confirm.
Click Discover to see the records we stored.
Retrieving a Document
With the data stored, we want to query it back. To fetch the employee whose id is 1:
GET /megacorp/employee/1
Response:
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_version": 5,
"found": true,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
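Compared with saving a record, only the HTTP method differs.
- PUT adds a document
- GET retrieves it
- DELETE removes it
- HEAD checks whether it exists
- To update, simply PUT again (the new body replaces the whole document)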
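To make the verb-to-operation mapping concrete, and to see why the response above can show a `_version` greater than 1 after repeated PUTs, here is a toy in-memory model. It is not the Elasticsearch client or its real storage logic. `ToyIndex` is a hypothetical class of my own, and delete bookkeeping is deliberately simplified.

```python
class ToyIndex:
    """In-memory stand-in for one index/type; NOT real Elasticsearch."""

    def __init__(self):
        self._docs = {}      # id -> latest source document
        self._versions = {}  # id -> number of writes seen for that id

    def put(self, doc_id, source):
        # PUT creates the document or replaces it wholesale, bumping the version.
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        self._docs[doc_id] = dict(source)
        return {"_id": doc_id, "_version": self._versions[doc_id]}

    def get(self, doc_id):
        # GET returns the stored source, or found=False if the id is unknown.
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        return {
            "_id": doc_id,
            "_version": self._versions[doc_id],
            "found": True,
            "_source": self._docs[doc_id],
        }

    def head(self, doc_id):
        # HEAD carries no body, only a status code: 200 if present, 404 if not.
        return 200 if doc_id in self._docs else 404

    def delete(self, doc_id):
        # DELETE removes the document; returns True if something was removed.
        return self._docs.pop(doc_id, None) is not None


index = ToyIndex()
index.put("1", {"first_name": "John", "age": 25})
second = index.put("1", {"first_name": "John", "age": 26})  # PUT again = update
print(second["_version"])                 # 2
print(index.get("1")["_source"]["age"])   # 26
print(index.head("1"), index.head("99"))  # 200 404
```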
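Lightweight Search
Besides findById, the most common need is querying by condition.
First, list everything:
GET /megacorp/employee/_search
The number of records can also be retrieved with _count:
GET /megacorp/employee/_count
To find the employees whose last_name is Smith:
GET /megacorp/employee/_search?q=last_name:Smith
Add a q parameter and query in the form field:value.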
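When the same lightweight search is issued from code rather than the console, the field:value pair has to be URL-encoded. A small sketch with the standard library follows. `lite_search_url` is a helper name I made up, and localhost:9200 is assumed as before.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def lite_search_url(index, doc_type, field, value,
                    base_url="http://localhost:9200"):
    """Build the URL for a query-string search of the form q=field:value."""
    query = urlencode({"q": f"{field}:{value}"})  # percent-encodes the colon
    return f"{base_url}/{index}/{doc_type}/_search?{query}"


url = lite_search_url("megacorp", "employee", "last_name", "Smith")
print(url)

# Decoding the query string gives back the original field:value pair.
print(parse_qs(urlsplit(url).query))
```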
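Query Expressions
Query-string search is handy for ad hoc searches from the command line, but it has its own limitations (see Lightweight Search). Elasticsearch provides a rich, flexible query language called the query DSL, which supports building more complex and robust queries.
The domain-specific language (DSL) is specified using a JSON request body. We can rewrite the earlier search for all Smiths like this: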
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
More Complex Queries
Continuing to modify the previous query:
GET /megacorp/employee/_search
{
"query" : {
"bool": {
"must": {
"match" : {
"last_name" : "smith"
}
},
"filter": {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}
This adds a range filter requiring age to be greater than 30.
Result:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
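The logic of the bool query above (a must clause plus a range filter) can be mimicked locally to check the result. This is only a toy evaluation over the three sample employees, with naive lowercase and whitespace tokenisation standing in for the real analyzer, and it ignores scoring entirely. All function names are my own.

```python
EMPLOYEES = [
    {"_id": "1", "last_name": "Smith", "age": 25},
    {"_id": "2", "last_name": "Smith", "age": 32},
    {"_id": "3", "last_name": "Fir", "age": 35},
]


def tokens(text):
    # Crude stand-in for analysis: lowercase, split on whitespace.
    return text.lower().split()


def must_match(doc, field, query_text):
    # match: at least one query term appears in the field.
    doc_terms = set(tokens(doc[field]))
    return any(term in doc_terms for term in tokens(query_text))


def filter_range_gt(doc, field, bound):
    # range with "gt": keep only documents strictly above the bound.
    return doc[field] > bound


def bool_search(docs, match_field, match_text, range_field, gt):
    # A document is a hit only if it passes both the must and the filter clause.
    return [
        doc["_id"]
        for doc in docs
        if must_match(doc, match_field, match_text)
        and filter_range_gt(doc, range_field, gt)
    ]


hits = bool_search(EMPLOYEES, "last_name", "smith", "age", 30)
print(hits)  # ['2']
```

Only Jane Smith (id 2) passes both clauses, which agrees with the response shown above.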
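Full-Text Search
The searches so far have been simple: a single name, filtered by age. Now let's try a slightly more advanced full-text search, a task that traditional databases really struggle with.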
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
Result:
{
"took": 32,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.53484553,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.53484553,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.26742277,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}
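The results are sorted by the relevance score, _score. Note that a document matching only one of the words (rock) is returned as well.
For an exact match on the whole phrase, use a different kind of query: match_phrase matches the phrase exactly.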
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
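The difference between match and match_phrase can be illustrated with a toy model over the three about fields. This is a sketch under simplifying assumptions: whitespace tokenisation, no stemming, no scoring. It is not how Elasticsearch implements either query, and the function names are mine.

```python
ABOUT = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}


def tokens(text):
    return text.lower().split()


def match(text, query):
    # match: any single query term found in the text is enough.
    doc_terms = set(tokens(text))
    return any(term in doc_terms for term in tokens(query))


def match_phrase(text, query):
    # match_phrase: all query terms must appear adjacent and in order.
    doc_terms, wanted = tokens(text), tokens(query)
    n = len(wanted)
    return any(
        doc_terms[i:i + n] == wanted
        for i in range(len(doc_terms) - n + 1)
    )


match_ids = [i for i, text in ABOUT.items() if match(text, "rock climbing")]
phrase_ids = [i for i, text in ABOUT.items() if match_phrase(text, "rock climbing")]
print(match_ids)   # ['1', '2']
print(phrase_ids)  # ['1']
```

Document 2 contains rock but not climbing, so it survives match and drops out under match_phrase.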
When we search on Baidu, the matched keywords are highlighted; ES can return the highlighted positions as well.
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.5753642,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
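The highlight fragment in the response wraps each matched term in em tags. A rough local imitation is sketched below. The `highlight` function is my own illustration and does simple word-by-word comparison, whereas the real highlighter works from the analyzed terms and can split long fields into fragments.

```python
def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of text that equals a query term (case-insensitive)."""
    wanted = set(query.lower().split())
    return " ".join(
        f"{pre}{word}{post}" if word.lower() in wanted else word
        for word in text.split()
    )


fragment = highlight("I love to go rock climbing", "rock climbing")
print(fragment)  # I love to go <em>rock</em> <em>climbing</em>
```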
Aggregations (Group By)
SQL often involves statistical computations such as sum, count, and avg. In ES it can be done like this:
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
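aggs declares an aggregation, all_interests is the name under which the result is returned, and terms groups documents by each distinct value of a field and counts them. This statement therefore counts documents per value of interests. ES may then respond with: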
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "megacorp",
"node": "iqHCjOUkSsWM2Hv6jT-xUQ",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
},
"status": 400
}
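This means that aggregating on a text field requires enabling the setting fielddata=true (the message also mentions using a keyword field as an alternative).
That requires changing the index's mapping, which is comparable to altering a table schema in a relational database.
Modifying the Index Mapping
First, look at the current configuration: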
GET /megacorp/employee/_mapping
Result:
{
"megacorp": {
"mappings": {
"employee": {
"properties": {
"about": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"interests": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
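Put simply, this defines the type of each field. To solve the problem above, one setting must be added to the interests field:
"fielddata": true
The update looks like this: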
PUT /megacorp/employee/_mapping
{
"properties": {
"about": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"interests": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": true
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
Response:
{
"acknowledged": true
}
This indicates the update succeeded. We can now continue with the earlier aggregation.
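The error message also offered a second route: "Alternatively use a keyword field instead." The mapping shown earlier already lists a keyword sub-field under interests, and such a sub-field is addressed as interests.keyword. Below is a sketch of the request body for that variant. I did not run this variant in these notes, so treat it as an option to try rather than a verified result. `terms_agg_body` is a hypothetical helper.

```python
import json


def terms_agg_body(field, agg_name="all_interests"):
    """Build a search body that returns only a terms aggregation on field."""
    return {
        "size": 0,  # no hits, only the aggregation result
        "aggs": {agg_name: {"terms": {"field": field}}},
    }


# Aggregate on the keyword sub-field instead of enabling fielddata on the text field.
body = terms_agg_body("interests.keyword")
print(json.dumps(body, indent=2))
```

A keyword field holds the original value unanalyzed, so buckets would be keyed by the exact strings that were indexed.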
Aggregations: group by count
The SQL equivalent would be roughly:
select interests, count(*) from index_xxx
where last_name = 'smith'
group by interests;
In ES the query looks like this:
GET /megacorp/employee/_search
{
"_source": false,
"query": {
"match": {
"last_name": "smith"
}
},
"size": 0,
"aggs": {
"all_interests": {
"terms": {
"field": "interests"
}
}
}
}
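Setting _source to false keeps the properties of the matched items out of the response; the default is true.
"size": 0 means no hits are returned. By default a search returns the top 10 hits, which we do not need here; we only want the aggregation results.
aggs declares an aggregation.
all_interests is a user-defined name; any name will do.
terms performs the count, grouped by the values of the interests field.
Response: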
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"all_interests": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "sports",
"doc_count": 1
}
]
}
}
}
This gives the count for each value of the field. sum and avg can be computed the same way:
GET /megacorp/employee/_search
{
"_source": false,
"size": 0,
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
},
"sum_age" : {
"sum" : { "field" : "age" }
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"avg_age": {
"value": 30.666666666666668
},
"sum_age": {
"value": 92
}
}
}
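As a sanity check, the numbers from both aggregation responses can be recomputed by hand from the three sample documents. The snippet below is plain Python over in-memory data and is only meant to show what terms, sum and avg compute. The names are mine, and no Elasticsearch call is involved.

```python
from collections import Counter

EMPLOYEES = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Group by each distinct value of field and count documents per value."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))  # a document counts once per value
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common()
    ]


# Equivalent of: where last_name = 'smith' group by interests
smiths = [doc for doc in EMPLOYEES if doc["last_name"].lower() == "smith"]
buckets = terms_buckets(smiths, "interests")
print(buckets)

# Equivalent of the sum and avg aggregations over all employees
ages = [doc["age"] for doc in EMPLOYEES]
sum_age = sum(ages)
avg_age = sum_age / len(ages)
print(sum_age, avg_age)
```

The buckets come out as music with 2 documents and sports with 1, and the ages sum to 92, in line with the responses above.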
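Summary
The above covers the first part of the official guide, the basic introduction. I only copied it out and ran through it once, without going much beyond it, though I added some of my own understanding along the way. It is enough to see how ES is basically used; the more detailed syntax can come later.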