Elasticsearch Study Notes (Getting Started)

1. Elasticsearch requests and responses

Request structure
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
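VERB The HTTP method: GET, POST, PUT, HEAD, or DELETE

PROTOCOL Either http or https (https is available only when an https proxy sits in front of Elasticsearch)

HOST The hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine

PORT The port of the Elasticsearch HTTP service; the default is 9200

PATH The API path (for example, _count returns the number of documents in the cluster). PATH can contain multiple components, such as _cluster/stats or _nodes/stats/jvm

QUERY_STRING Optional query-string parameters; for example, ?pretty makes the request return more readable, pretty-printed JSON

BODY A JSON-encoded request body (if the request needs one)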
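The way these components combine can be sketched in Python using only the standard library. The helper name `build_url` is hypothetical, and `pretty=true` is used here as the explicit form of the bare `?pretty` flag; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(protocol, host, port, path, query=None):
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>."""
    query_string = urlencode(query or {})
    # urlunsplit takes (scheme, netloc, path, query, fragment).
    return urlunsplit(
        (protocol, f"{host}:{port}", "/" + path.lstrip("/"), query_string, "")
    )


url = build_url("http", "localhost", 9200, "_cluster/stats", {"pretty": "true"})
print(url)
```

When no query parameters are given, the `?` is simply omitted from the result.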

Creating with PUT (indexing a document)

$ curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
{
"first_name" :  "Douglas",
"last_name" :   "Fir",
"age" :         35,
"about":        "I like to build cabinets",
"interests":  [ "forestry" ]
}
'
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true
}
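As a rough Python counterpart to the curl command above, the sketch below prepares the same PUT request with `urllib.request` without sending it. The helper name `build_index_request` and the default `base_url` are assumptions for illustration. The `Content-Type` header is also an addition that the curl command does not set; newer Elasticsearch releases require it.

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, doc_id, document,
                        base_url="http://localhost:9200"):
    """Prepare (but do not send) a PUT that indexes one document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{base_url}/{index}/{doc_type}/{doc_id}?pretty",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
}
request = build_index_request("megacorp", "employee", 3, employee)
print(request.get_method(), request.full_url)
# Sending it needs a running node: urllib.request.urlopen(request)
```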
##GET requests (search)
###Retrieving a document
$ curl -XGET 'http://localhost:9200/megacorp/employee/1?pretty'
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}
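###Simple search
We still use the `megacorp` index and the `employee` type, but instead of a document ID we end the path with the `_search` keyword. The `hits` array in the response contains all three of our documents. By default, a search returns the first 10 results.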
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [ "forestry" ]
}
} ]
}
}
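To show how a client might read this response, here is a small sketch that walks the `hits.hits` array. The JSON is a trimmed copy of the output above, and `summarize_hits` is a hypothetical helper name.

```python
import json

# Trimmed copy of the search response shown above.
response_text = """
{
  "hits": {
    "total": 3,
    "hits": [
      {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
      {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
      {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}}
    ]
  }
}
"""


def summarize_hits(response):
    """Return (id, full name) pairs in the order the hits were returned."""
    return [
        (hit["_id"],
         f'{hit["_source"]["first_name"]} {hit["_source"]["last_name"]}')
        for hit in response["hits"]["hits"]
    ]


summary = summarize_hits(json.loads(response_text))
for doc_id, name in summary:
    print(doc_id, name)
```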
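Next, let's search for employees whose last name contains "Smith". We'll use a lightweight search method on the command line. It is often called a query string search, because we pass the query as a URL parameter: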
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?q=last_name:Smith&pretty'
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.30685282,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.30685282,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
} ]
}
}
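Because the query travels inside the URL, it has to be URL-encoded once values contain spaces or special characters. A minimal sketch, assuming a hypothetical helper `query_string_search_url` and the same local node as above:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_string_search_url(index, doc_type, field, value,
                            base_url="http://localhost:9200"):
    """Build a _search URL that carries the query in the q parameter."""
    params = urlencode({"q": f"{field}:{value}", "pretty": "true"})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"


search_url = query_string_search_url("megacorp", "employee", "last_name", "Smith")
print(search_url)

# Decoding the query string recovers the original field:value expression.
decoded = parse_qs(urlsplit(search_url).query)
print(decoded["q"])
```

`urlencode` escapes the colon as `%3A`, which the server decodes back to `last_name:Smith`.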
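###Searching with the Query DSL
Query string search is handy for ad hoc searches from the command line, but it has its limitations (see the simple search section). Elasticsearch provides a rich, flexible query language called the Query DSL, which lets you build more complex and more powerful queries.

The DSL (Domain Specific Language) is expressed as a JSON request body. The earlier query for "Smith" can be written like this: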
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
'
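Since the DSL is plain JSON, a request body can be built as an ordinary data structure and serialized. The sketch below does this in Python; `match_query` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def match_query(field, text):
    """Return a request body holding a single match query."""
    return {"query": {"match": {field: text}}}


body = json.dumps(match_query("last_name", "Smith"), indent=2)
print(body)
```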
###More complicated searches
Let's make the search a little more complex. We still want employees whose last name is "Smith", but only those older than 30. The query adds a filter, which lets us run a structured search efficiently:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"query" : {
"match" : {
"last_name" : "smith"
}
}
}
}
}
'
* The `range` filter finds all documents whose `age` is greater than 30; `gt` is short for "greater than".
* The `match` query is the same one used earlier.
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.30685282,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
} ]
}
}
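One caveat for readers on newer versions: the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0, where the same search is written as a `bool` query with a `filter` clause. The sketch below builds both bodies side by side; the helper names are hypothetical.

```python
import json


def filtered_body(query, filter_clause):
    """The form used in these notes (older Elasticsearch versions)."""
    return {"query": {"filtered": {"filter": filter_clause, "query": query}}}


def bool_body(query, filter_clause):
    """The bool form: 'must' is scored, 'filter' only includes or excludes."""
    return {"query": {"bool": {"must": query, "filter": filter_clause}}}


smith = {"match": {"last_name": "smith"}}
older_than_30 = {"range": {"age": {"gt": 30}}}

old_style = filtered_body(smith, older_than_30)
new_style = bool_body(smith, older_than_30)
print(json.dumps(new_style, indent=2))
```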
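###Full-text search
The searches so far have been simple: looking up a specific name and filtering by age. Let's try something more advanced, full-text search, which traditional databases struggle to do well.

We will search for all employees who enjoy "rock climbing":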
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
'
As you can see, we use the same `match` query as before to search the `about` field for "**rock climbing**", and we get two matching documents:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.16273327,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.16273327,<1>
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
}, {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.016878016,<2>
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [ "music" ]
}
} ]
}
}
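* <1><2> The relevance scores.

By default, Elasticsearch sorts results by relevance score, that is, by how well each document matches the query. Clearly the top result, `John Smith`, states "**rock climbing**" explicitly in his `about` field.

But why does `Jane Smith` also appear in the results? Because "**rock**" is mentioned in her `about` field. Since only "**rock**" appears and "**climbing**" does not, her `_score` is lower than John's.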
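The reasoning above can be illustrated with a toy sketch that lists which query words occur in each `about` field. This is deliberately not Elasticsearch's scoring formula; it only shows why a document containing one of the two words still matches. The `tokens` and `matched_terms` helpers are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into words (a crude analyzer)."""
    return re.findall(r"\w+", text.lower())


def matched_terms(query, text):
    """Return the query words that also occur in the text."""
    words = set(tokens(text))
    return [term for term in tokens(query) if term in words]


about = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
    "Douglas Fir": "I like to build cabinets",
}
for name, text in about.items():
    print(name, matched_terms("rock climbing", text))
```

John matches both words, Jane matches only "rock", and Douglas matches neither, which mirrors the ordering of the real results.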

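###Phrase search
So far we can search for individual words in a field, which is fine, but sometimes you want to match an exact sequence of words, a phrase. For example, we want employee records that contain both "rock" and "climbing", next to each other.

To do this, we only need to change the `match` query to a `match_phrase` query: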
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
'
{
"took" : 16,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.23013961,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.23013961,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
}
} ]
}
}
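The difference from the plain `match` query can be sketched with a toy phrase check that requires the words to be adjacent and in order. It is a simplified illustration under that assumption, not how Elasticsearch implements `match_phrase`; both helpers are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into words (a crude analyzer)."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(phrase, text):
    """True if the phrase's words appear consecutively, in order, in the text."""
    words, wanted = tokens(text), tokens(phrase)
    return any(
        words[i:i + len(wanted)] == wanted
        for i in range(len(words) - len(wanted) + 1)
    )


john = "I love to go rock climbing"
jane = "I like to collect rock albums"
print(contains_phrase("rock climbing", john))
print(contains_phrase("rock climbing", jane))
```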
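###Highlighting our searches
Many applications like to **highlight** the matched keywords in each search result so users can see why a document matched the query. Highlighting snippets is easy in Elasticsearch.

Let's add a `highlight` parameter to the previous query: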
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
'
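When we run this query it hits the same result as before, but the response has a new section called `highlight`, which contains a text snippet from the `about` field with the matched words wrapped in `<em></em>` tags.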
{
"took" : 33,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.23013961,
"hits" : [ {
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_score" : 0.23013961,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests" : [ "sports", "music" ]
},
"highlight" : {
"about" : [ "I love to go <em>rock</em> <em>climbing</em>" ]
}
} ]
}
}
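A client usually reads the `highlight` section per hit. The sketch below pulls the marked words out of one hit, assuming the fragment wraps each match in `<em>` tags as described above; the sample hit is trimmed and `highlighted_terms` is a hypothetical helper.

```python
import re

# Trimmed hit, assuming <em> tags around each matched word.
hit = {
    "_id": "1",
    "highlight": {"about": ["I love to go <em>rock</em> <em>climbing</em>"]},
}


def highlighted_terms(hit, field):
    """Collect the words wrapped in <em>...</em> for one field of a hit."""
    terms = []
    for fragment in hit.get("highlight", {}).get(field, []):
        terms.extend(re.findall(r"<em>(.*?)</em>", fragment))
    return terms


terms = highlighted_terms(hit, "about")
print(terms)
```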
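##Aggregations
###Analytics
Finally, there is one more requirement to meet: letting managers run some analytics over the employee directory. Elasticsearch has a feature called **aggregations**, which lets you generate sophisticated analytics over your data. It resembles `GROUP BY` in SQL but is more powerful. For example, the following request counts employees per interest: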
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"aggs": {
"all_interests": {
"terms": { "field": "interests" }
}
}
}
'
Query result:
{...
"aggregations" : {
"all_interests" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "music",
"doc_count" : 2
}, {
"key" : "forestry",
"doc_count" : 1
}, {
"key" : "sports",
"doc_count" : 1
} ]
}
}
}
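To make the bucket counts concrete, the sketch below reproduces them in plain Python over the three sample employees. It only mimics what the `terms` aggregation reports; the sort order is chosen to match the output above, and `terms_buckets` is a hypothetical helper.

```python
from collections import Counter

employees = [
    {"first_name": "John", "interests": ["sports", "music"]},
    {"first_name": "Jane", "interests": ["music"]},
    {"first_name": "Douglas", "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count documents per distinct value of a multi-valued field."""
    counts = Counter(value for doc in docs for value in doc[field])
    # Highest count first, ties broken alphabetically by key.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(employees, "interests")
print(buckets)
```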
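These numbers are not precomputed; they are calculated on the fly from the documents that match the query.

If we want to know the most popular interests among everyone named "Smith", we only need to add the appropriate query: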
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"query": {
"match": {
"last_name": "smith"
}
},
"aggs": {
"all_interests": {
"terms": {
"field": "interests"
}
}
}
}
'
The `all_interests` aggregation now covers only the documents that match the query:
...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "sports",
"doc_count": 1
}
]
}
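The scoping effect of the query can be mimicked in plain Python: filter the documents first, then count. This is an illustration over the sample data, not a call to Elasticsearch, and `interests_of` is a hypothetical helper.

```python
from collections import Counter

employees = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]


def interests_of(docs, last_name):
    """Count interests only among documents whose last name matches."""
    matching = [
        doc for doc in docs if doc["last_name"].lower() == last_name.lower()
    ]
    return Counter(
        interest for doc in matching for interest in doc["interests"]
    )


smith_interests = interests_of(employees, "smith")
print(smith_interests.most_common())
```

"forestry" drops out because the only employee with that interest does not match the query.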
Aggregations also allow hierarchical rollups. For example, let's find the average age of employees for each interest:
$ curl -XGET 'http://localhost:9200/megacorp/employee/_search?pretty' -d '
{
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests" },
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
}
}
'
The aggregation result this time is a bit more complex, but still easy to understand:
...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2,
"avg_age": {
"value": 28.5
}
},
{
"key": "forestry",
"doc_count": 1,
"avg_age": {
"value": 35
}
},
{
"key": "sports",
"doc_count": 1,
"avg_age": {
"value": 25
}
}
]
}
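This result is richer than the previous one. We still get the list of interests and their counts (the number of employees with each interest), but each interest now has an additional `avg_age` field showing the average age of the employees who share it.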
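The `avg_age` values can be checked by hand with a short sketch over the three sample employees. It groups ages by interest and averages each group, which is what the nested aggregation reports; `avg_age_by_interest` is a hypothetical helper.

```python
from collections import defaultdict

employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Group ages by interest, then average each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in doc["interests"]:
            ages[interest].append(doc["age"])
    return {interest: sum(v) / len(v) for interest, v in ages.items()}


averages = avg_age_by_interest(employees)
print(averages)
```

For "music" the average is (25 + 32) / 2 = 28.5, matching the bucket above.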
