Elasticsearch Study Notes (Getting Started)



1. Elasticsearch requests and responses

Request structure

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
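  • VERB: the HTTP method: GET, POST, PUT, HEAD, or DELETE
  • PROTOCOL: http or https (https is only available when an https proxy sits in front of Elasticsearch)
  • HOST: the hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine
  • PORT: the port the Elasticsearch HTTP service listens on; the default is 9200
  • PATH: the API endpoint (for example, _count returns the number of documents in the cluster); PATH may contain multiple components, such as _cluster/stats or _nodes/stats/jvm
  • QUERY_STRING: optional query-string parameters; for example, ?pretty makes the request return more readable, pretty-printed JSON
  • BODY: a JSON-encoded request body (if the request needs one)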
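To make the URL template concrete, here is a minimal Python sketch that assembles a request URL from the components listed above. The helper name `build_url` is hypothetical and not part of any Elasticsearch client; it only uses the standard library.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(protocol, host, port, path, params=None):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>."""
    query = urlencode(params or {})
    return urlunsplit((protocol, f"{host}:{port}", "/" + path.lstrip("/"), query, ""))


# Count the documents in the cluster and ask for pretty-printed JSON.
print(build_url("http", "localhost", 9200, "_count", {"pretty": "true"}))
# A PATH with several components and no query string.
print(build_url("http", "localhost", 9200, "_nodes/stats/jvm"))
```

The sketch writes the parameter as `pretty=true`, which is the explicit form of the bare `?pretty` flag used in the curl examples.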

Creating with PUT (indexing a document)

$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
{
"first_name" :  "Douglas",
"last_name" :   "Fir",
"age" :         35,
"about":        "I like to build cabinets",
"interests":  [ "forestry" ]
}
'

{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true
}
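The same indexing request can be prepared from Python. This is a minimal sketch using only the standard library; `index_request` is a hypothetical helper, and the `Content-Type` header is an addition that the curl command in these notes does not send (newer Elasticsearch releases require it). The request is only built here, not sent.

```python
import json
from urllib.request import Request

employee = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
}


def index_request(index, doc_type, doc_id, document, base="http://localhost:9200"):
    """Build (but do not send) a PUT request that stores one document."""
    url = f"{base}/{index}/{doc_type}/{doc_id}?pretty"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


req = index_request("megacorp", "employee", 3, employee)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a node listening on localhost:9200.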

##GET requests (search)
###Retrieving a document

$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'

{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}
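The stored document sits under `_source`, next to metadata such as `_index`, `_id`, and `_version`. A small sketch of pulling it out of the response text; `extract_source` is a hypothetical helper, and treating `found: false` as "no document" is an assumption about how a miss is reported.

```python
import json

# Response text for GET /megacorp/employee/1, copied from the listing above.
raw = """
{
  "_index": "megacorp", "_type": "employee", "_id": "1", "_version": 1,
  "found": true,
  "_source": {"first_name": "John", "last_name": "Smith", "age": 25,
              "about": "I love to go rock climbing",
              "interests": ["sports", "music"]}
}
"""


def extract_source(response_text):
    """Return the stored document, or None when `found` is false."""
    response = json.loads(response_text)
    return response["_source"] if response.get("found") else None


doc = extract_source(raw)
print(doc["first_name"], doc["last_name"])
```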

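###Simple search
We still use the `megacorp` index and the `employee` type, but the path ends with the `_search` keyword instead of a document ID. The `hits` array in the response contains all three of our documents. By default, a search returns the top 10 results.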

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [ "forestry" ]
}
} ]
}
}

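Next, let's search for employees whose last name contains "Smith". We'll use a lightweight search method from the command line, often called query-string search because the query is passed the same way as a URL parameter: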

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'


{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.30685282,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.30685282,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
} ]
}
}
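Because the query travels inside the URL, it has to be percent-encoded once it contains characters such as colons or spaces. A minimal sketch, assuming a hypothetical helper named `query_string_search_url`:

```python
from urllib.parse import urlencode


def query_string_search_url(index, doc_type, query, base="http://localhost:9200"):
    """Build a query-string search URL; urlencode escapes the q parameter."""
    params = urlencode({"q": query, "pretty": "true"})
    return f"{base}/{index}/{doc_type}/_search?{params}"


print(query_string_search_url("megacorp", "employee", "last_name:Smith"))
```

The colon is emitted as `%3A`; the server decodes it back to `last_name:Smith` before parsing the query.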

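###Querying with the DSL
Query-string search is handy for ad hoc searches from the command line, but it has limitations (see the simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and powerful queries.

The DSL (Domain Specific Language) is expressed as a JSON request body. The earlier search for "Smith" can be written like this: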

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
'
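Since the DSL is plain JSON, a request body can be built as an ordinary data structure and serialized. A minimal sketch; `match_query` is a hypothetical helper, not a client-library function.

```python
import json


def match_query(field, text):
    """Return a request body holding a single match query."""
    return {"query": {"match": {field: text}}}


body = json.dumps(match_query("last_name", "Smith"), indent=2)
print(body)
```

The printed text is what would be passed to curl with `-d`.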

###A more complex search
Let's make the search a little more complicated. We still want to find employees whose last name is "Smith", but only those older than 30. The query adds a filter, which lets us run a structured search efficiently:

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"query" : {
"match" : {
"last_name" : "smith"
}
}
}
}
}
'


* The `range` part is a range filter. It finds all documents whose age is greater than 30; `gt` stands for "greater than".
* The `match` part is the same match query used before.

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.30685282,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
} ]
}
}
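The `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0, where a `bool` query with a `filter` clause takes its place. The sketch below builds both request bodies side by side; the helper names `filtered_body` and `bool_body` are hypothetical.

```python
import json


def filtered_body(match_field, text, range_field, gt):
    """Body in the older `filtered` form used in these notes."""
    return {"query": {"filtered": {
        "filter": {"range": {range_field: {"gt": gt}}},
        "query": {"match": {match_field: text}},
    }}}


def bool_body(match_field, text, range_field, gt):
    """Equivalent body for releases that no longer accept `filtered`."""
    return {"query": {"bool": {
        "must": {"match": {match_field: text}},             # scored clause
        "filter": {"range": {range_field: {"gt": gt}}},     # unscored clause
    }}}


print(json.dumps(bool_body("last_name", "smith", "age", 30), indent=2))
```

In both forms the range condition only includes or excludes documents, while the match clause decides the score.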

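###Full-text search
So far the searches have been simple: looking up a specific name and filtering by age. Let's try something more advanced: full-text search, a task that traditional databases handle poorly.

We will search for all employees who enjoy "rock climbing":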

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
'

You can see that we use the same `match` query as before to search the `about` field for "**rock climbing**". We get two matching documents:

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.16273327,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.16273327,<1>
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.016878016,<2>
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
} ]
}
}


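* <1><2> The relevance scores.

By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. The top-ranked `John Smith` clearly mentions "**rock climbing**" in his `about` field.

But why does `Jane Smith` also appear in the results? Because "**rock**" is mentioned in her `about` field. Since only "**rock**" is mentioned and "**climbing**" is not, her `_score` is lower than John's.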
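The reasoning above can be illustrated with a toy term-overlap check. This is not Elasticsearch's relevance formula, only a sketch of why one document matches both query terms and another matches one; the `tokenize` and `matched_terms` helpers and the lowercase word splitting are simplifying assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (a simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matched_terms(query, text):
    """Return the query terms that occur anywhere in the text."""
    doc_terms = set(tokenize(text))
    return [term for term in tokenize(query) if term in doc_terms]


about = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
    "Douglas": "I like to build cabinets",
}
for name, text in about.items():
    print(name, matched_terms("rock climbing", text))
```

John matches both terms, Jane matches only "rock", and Douglas matches none, which mirrors the ordering of the real result set.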

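###Phrase search
Finding individual words in a field is useful, but sometimes you want to match an exact sequence of words, that is, a phrase. For example, we want employee records that contain both "rock" and "climbing", adjacent to each other.

To do this, we only need to change the `match` query to a `match_phrase` query: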

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
'

{
"took" : 16,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.23013961,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.23013961,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
} ]
}
}
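The difference from a plain `match` is that the words must appear next to each other and in order. A toy sketch of that adjacency test, under the same simplified tokenization assumption as any word-splitting example; `contains_phrase` is a hypothetical helper and not how the engine is implemented.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (a simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(phrase, text):
    """True when the phrase terms occur consecutively and in order."""
    words, terms = tokenize(text), tokenize(phrase)
    n = len(terms)
    return n > 0 and any(words[i:i + n] == terms
                         for i in range(len(words) - n + 1))


print(contains_phrase("rock climbing", "I love to go rock climbing"))
print(contains_phrase("rock climbing", "I like to collect rock albums"))
```

Only John's text passes, which is why the phrase search returns a single hit.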

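###Highlighting our searches
Many applications like to **highlight** the matched keywords in each search result, so users can see why the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch.

Let's add a `highlight` parameter to the previous query: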

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
'

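When we run this query, it hits the same result as before, but the response now has a new section called `highlight`. It contains the text from the `about` field, with the matched words wrapped in `<em></em>` tags.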

{
"took" : 33,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.23013961,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.23013961,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
},
"highlight" : {
"about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
}
} ]
}
}
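To show what the highlighted fragment looks like, here is a toy sketch that wraps matched words in `<em>` tags on the client side. It only imitates the shape of the output; the `highlight` function below is hypothetical and unrelated to the server's highlighter.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap every word of `text` that appears in `terms` (case-insensitive)."""
    wanted = {term.lower() for term in terms}

    def wrap(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)


print(highlight("I love to go rock climbing", ["rock", "climbing"]))
```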

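##Aggregations
###Analytics
Finally, one more requirement remains: letting managers run analytics over the employee directory. Elasticsearch has a feature called **aggregations**, which lets you generate sophisticated analytics over your data. It is similar to `GROUP BY` in SQL, but much more powerful. For example, let's find the most popular interests among the employees: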

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
'

Query result:

{...
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "music",
"doc_count" : 2
}, {
"key" : "forestry",
"doc_count" : 1
}, {
"key" : "sports",
"doc_count" : 1
} ]
}
}
}

These figures are not precomputed; they are generated on the fly from the documents that match the current query.
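What the `terms` aggregation computes can be reproduced in memory for the three sample employees. This is a sketch of the counting logic only; `terms_buckets` is a hypothetical helper, and ordering ties by key is an assumption chosen to match the listing above.

```python
from collections import Counter

employees = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
    {"last_name": "Fir", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter(value for doc in docs for value in set(doc.get(field, [])))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


print(terms_buckets(employees, "interests"))

# Restricting the input documents first narrows the buckets accordingly.
smiths = [doc for doc in employees if doc["last_name"] == "Smith"]
print(terms_buckets(smiths, "interests"))
```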

If we want to know the most popular interests among people named "Smith", we just add the appropriate query:

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query": {
"match": {
"last_name": "smith"
}
},
"aggs": {
"all_interests": {
"terms": {
"field": "interests"
}
}
}
}
'

The all_interests aggregation now covers only the documents that match the query:

...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "sports",
"doc_count": 1
}
]
}


Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest:

$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests" },
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
}
}
'


The aggregation result is a bit more complicated this time, but it is still easy to follow:

...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2,
"avg_age": {
"value": 28.5
}
},
{
"key": "forestry",
"doc_count": 1,
"avg_age": {
"value": 35
}
},
{
"key": "sports",
"doc_count": 1,
"avg_age": {
"value": 25
}
}
]
}

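This result is richer than the previous one. We still get the list of interests and their counts (the number of employees with each interest), but each interest now also has an `avg_age` field showing the average age of the employees who share it.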
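The nested aggregation amounts to grouping by interest and then averaging the ages inside each group. A minimal in-memory sketch over the three sample employees; `avg_by_term` is a hypothetical helper, and the bucket ordering (count descending, then key) is an assumption chosen to match the listing above.

```python
from collections import defaultdict

employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def avg_by_term(docs, term_field, value_field):
    """Group documents by each term, then average `value_field` per group."""
    groups = defaultdict(list)
    for doc in docs:
        for term in set(doc.get(term_field, [])):
            groups[term].append(doc[value_field])
    buckets = [
        {"key": term, "doc_count": len(values),
         "avg_" + value_field: {"value": sum(values) / len(values)}}
        for term, values in groups.items()
    ]
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


for bucket in avg_by_term(employees, "interests", "age"):
    print(bucket)
```

For "music" the two ages are 25 and 32, so the average is (25 + 32) / 2 = 28.5, matching the response shown above.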

