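All operations below work correctly in version 6.7.1.
C# ES client test project: https://gitee.com/dhclly/IceDog.ElasticSearchClient/tree/master/src/IceDog.ElasticSearchClient.MSTest
Documentation
- https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html Chinese edition of "Elasticsearch: The Definitive Guide" (the documentation is outdated, but because it is in Chinese it is convenient for a quick start)
- https://www.elastic.co/guide/en/elasticsearch/reference/6.7/getting-started.html English documentation for 6.7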
basic
GET /?pretty
curl 'http://localhost:9200/?pretty'
A normal response looks like the following
{
"name" : "id_Cdrf",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "OVVjOYXmRLmH0_x6QnS6sw",
"version" : {
"number" : "6.7.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "2f32220",
"build_date" : "2019-04-02T15:59:27.961366Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
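A client may want to check this response before running the operations below. The following is a minimal Python sketch that parses the JSON and extracts the version number. The function name `parse_version` and the trimmed sample payload are assumptions made for illustration; only fields that appear in the response above are read.

```python
import json


def parse_version(root_response: str) -> tuple:
    """Return the version from the root endpoint response as a tuple of ints."""
    info = json.loads(root_response)
    return tuple(int(part) for part in info["version"]["number"].split("."))


# Trimmed copy of the response shown above.
sample = '{"name": "id_Cdrf", "version": {"number": "6.7.1", "lucene_version": "7.7.0"}}'

version = parse_version(sample)
print(version)
# Tuples compare element by element, so this is a simple minimum-version check.
print(version >= (6, 7, 0))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly.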
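Operation log
All operations are based on the data set s_flights; see the file es-operation-datasource.md for details.
By default, a query for all documents actually returns only the first 10 hits.
Data description
Flight information (the comment after each field gives its mapping type):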
{
"FlightNum" : "EN9FHUD", // flight number, keyword
"DestCountry" : "CA", // destination country, keyword
"OriginWeather" : "Rain", // weather at the origin, keyword
"OriginCityName" : "Detroit", // origin city name, keyword
"AvgTicketPrice" : 798.6925673856011, // average ticket price, float
"DistanceMiles" : 1586.2909176475928, // distance from origin to destination in miles, float
"FlightDelay" : true, // whether the flight is delayed, boolean
"DestWeather" : "Rain", // weather at the destination, keyword
"Dest" : "Edmonton International Airport", // destination airport, keyword
"FlightDelayType" : "Security Delay", // flight delay type, keyword
"OriginCountry" : "US", // origin country, keyword
"dayOfWeek" : 6, // day of the week, integer
"DistanceKilometers" : 2552.8877705706477, // distance from origin to destination in kilometers, float
"timestamp" : "2019-04-28T06:25:17", // timestamp, date
"DestLocation" : { // destination coordinates, geo_point
"lat" : "53.30970001", // latitude
"lon" : "-113.5800018" // longitude
},
"DestAirportID" : "CYEG", // destination airport ID, keyword
"Carrier" : "Kibana Airlines", // airline name, keyword
"Cancelled" : false, // whether the flight is cancelled, boolean
"FlightTimeMin" : 451.3759823515883, // flight time in minutes, float
"Origin" : "Detroit Metropolitan Wayne County Airport", // origin airport, keyword
"OriginLocation" : { // origin coordinates, geo_point
"lat" : "42.21239853", // latitude
"lon" : "-83.35340118" // longitude
},
"DestRegion" : "CA-AB", // destination region, keyword
"OriginAirportID" : "DTW", // origin airport ID, keyword
"OriginRegion" : "US-MI", // origin region, keyword
"DestCityName" : "Edmonton", // destination city name, keyword
"FlightTimeHour" : 7.522933039193138, // flight time in hours, keyword
"FlightDelayMin" : 255 // flight delay in minutes, integer
}
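Operation log
The documents contain many fields, so the returned data is filtered as follows.
The requests below return only the five listed properties.
Lightweight (URI) search: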
GET /s_flights/_doc/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Request body search:
GET /s_flights/_doc/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
]
}
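The two filtering styles above carry the same field list, once as a URL parameter and once as a JSON array. The following Python sketch builds both from one list. The helper names `source_filter_uri` and `source_filter_body` are hypothetical, and the sketch only constructs the request strings; it does not contact a cluster.

```python
import json
from urllib.parse import urlencode

FIELDS = ["FlightNum", "Origin", "OriginCountry", "Dest", "DestCountry"]


def source_filter_uri(index: str, fields: list) -> str:
    """Lightweight form: the fields go into the _source URL parameter."""
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"/{index}/_search?" + urlencode({"_source": ",".join(fields)}, safe=",")


def source_filter_body(fields: list) -> str:
    """Request body form: the fields go into the _source array."""
    return json.dumps({"_source": fields})


print(source_filter_uri("s_flights", FIELDS))
print(source_filter_body(FIELDS))
```

Keeping the field list in one place makes it harder for the two forms to drift apart.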
Adding data
Not every field defined in the mapping has to be filled in.
Add a single document
PUT /s_flights/_doc/100
{
"FlightNum": "C12345A",
"Origin": "重慶",
"Dest": "北京"
}
Add a single document with an automatically generated id
POST /s_flights/_doc/
{
"FlightNum": "C12345A",
"Origin": "重慶",
"Dest": "北京"
}
Make sure a new document is created rather than an existing one updated
PUT /s_flights/_doc/BxOIj2oBXEJSGxQ5yn4k?op_type=create
{
"FlightNum": "C12345A",
"Origin": "重慶",
"Dest": "北京"
}
PUT /s_flights/_doc/BxOIj2oBXEJSGxQ5yn4k/_create
{
"FlightNum": "C12345A",
"Origin": "重慶",
"Dest": "北京"
}
doc as upsert
POST /s_flights/_doc/1/_update
{
"doc": {
"FlightNum": "C123333345A",
"Origin": "重222慶",
"Dest": "北京",
"bb":11
},
"doc_as_upsert": true
}
If the property already exists with the same value, updating again returns a noop (no operation) result
POST /s_flights/_doc/1/_update
{
"doc" : {
"moreInfo":{
"counter":1
}
}
}
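The noop behaviour can be pictured as "merge the partial doc into the stored source and report whether anything changed". The Python sketch below is a rough model of that idea under this assumption, not Elasticsearch's actual implementation; `merge_partial` and `update_result` are hypothetical names.

```python
def merge_partial(source: dict, partial: dict) -> bool:
    """Recursively merge `partial` into `source` in place; return True if anything changed."""
    changed = False
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(source.get(key), dict):
            # Nested objects are merged field by field.
            changed = merge_partial(source[key], value) or changed
        elif key not in source or source[key] != value:
            source[key] = value
            changed = True
    return changed


def update_result(source: dict, partial: dict) -> str:
    """Mimic the `result` field of an update response in this simplified model."""
    return "updated" if merge_partial(source, partial) else "noop"


doc = {"FlightNum": "C12345A"}
first = update_result(doc, {"moreInfo": {"counter": 1}})
second = update_result(doc, {"moreInfo": {"counter": 1}})
print(first, second)
```

The first call adds `moreInfo`, the second finds nothing to change, which corresponds to sending the same partial document twice.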
Bulk insert data
POST _bulk
{"create":{"_index":"s_flights","_type":"_doc","_id":"101"}}
{"FlightNum":"C12345B","Origin":"重慶","Dest":"上海"}
{"create":{"_index":"s_flights","_type":"_doc","_id":"102"}}
{"FlightNum":"C12345C","Origin":"重慶","Dest":"武漢"}
Deleting data
Delete the document whose _id is 100
DELETE /s_flights/_doc/100
Bulk delete data
POST _bulk
{"delete":{"_index":"s_flights","_type":"_doc","_id":"101"}}
{"delete":{"_index":"s_flights","_type":"_doc","_id":"102"}}
Modifying (updating) data
https://www.elastic.co/guide/en/elasticsearch/reference/6.7/docs-update.html
Update or add data
POST /s_flights/_doc/1/_update
{
"doc" : {
"moreInfo":{
"tags" : [ "airline","flight","aeroplane","airplane"],
"counter":1,
"memo_1":""
}
}
}
GET /s_flights/_doc/1?_source=moreInfo
Update field values with a script:
POST s_flights/_doc/1/_update
{
"script" : "ctx._source.moreInfo.counter+=1"
}
POST s_flights/_doc/1/_update
{
"script" : "ctx._source.moreInfo.memo_1='good time'"
}
GET /s_flights/_doc/1?_source=moreInfo
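If the document does not exist, the upsert content is used to create it first; when the request is executed again, the script runs and increments the counter.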
DELETE s_flights/_doc/1
POST s_flights/_doc/1/_update
{
"script": {
"source": "ctx._source.view_counter += params.count",
"lang": "painless",
"params": {
"count": 1
}
},
"upsert": {
"moreInfo": {
"tags": [
"airline",
"flight",
"aeroplane",
"airplane"
],
"counter": 1,
"memo_1": ""
},
"view_counter": 1
}
}
GET /s_flights/_doc/1
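Effect of the scripted_upsert flag: if the document does not exist, the upsert content is used as the initial document and the script is then run on it in the same request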
POST s_flights/_doc/1/_update
{
"scripted_upsert":true,
"script": {
"source": "ctx._source.view_counter += params.count",
"lang": "painless",
"params": {
"count": 1
}
},
"upsert": {
"moreInfo": {
"tags": [
"airline",
"flight",
"aeroplane",
"airplane"
],
"counter": 1,
"memo_1": ""
},
"view_counter": 1
}
}
GET /s_flights/_doc/1
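The difference between a plain upsert and `scripted_upsert` is easiest to see in the value of `view_counter` after the first request. The Python sketch below simulates both modes with an in-memory dictionary. It is a simplified model under the assumption described above, and `simulate_update` and `increment_view_counter` are hypothetical names standing in for the update API and the Painless script.

```python
import copy


def increment_view_counter(source: dict, params: dict) -> None:
    """Stand-in for the script: ctx._source.view_counter += params.count."""
    source["view_counter"] += params["count"]


def simulate_update(store: dict, doc_id: str, upsert: dict, params: dict,
                    scripted_upsert: bool = False) -> dict:
    """Apply one update request to the in-memory `store` and return the document."""
    if doc_id not in store:
        store[doc_id] = copy.deepcopy(upsert)
        if not scripted_upsert:
            # Plain upsert: the upsert content is stored as is, the script is skipped.
            return store[doc_id]
    increment_view_counter(store[doc_id], params)
    return store[doc_id]


upsert_doc = {"moreInfo": {"counter": 1}, "view_counter": 1}

plain = {}
plain_first = simulate_update(plain, "1", upsert_doc, {"count": 1})["view_counter"]
plain_second = simulate_update(plain, "1", upsert_doc, {"count": 1})["view_counter"]

scripted = {}
scripted_first = simulate_update(scripted, "1", upsert_doc, {"count": 1},
                                 scripted_upsert=True)["view_counter"]
print(plain_first, plain_second, scripted_first)
```

In this model the plain upsert needs two requests before the counter reaches 2, while the scripted upsert reaches it in one.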
Update document data with scripts
POST s_flights/_doc/1/_update
{
"script": {
"source": "ctx._source.moreInfo.counter_status = ctx._source.moreInfo.counter === params.count ? 'isEnough' : params.count",
"params": {
"count": 10
},
"lang": "painless"
}
}
GET /s_flights/_doc/1?_source=moreInfo
Querying data
https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
https://www.cnblogs.com/ghj1976/p/5293250.html
https://donlianli.iteye.com/blog/2094305
https://blog.csdn.net/weixin_43430036/article/details/83272018
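Lightweight (URI) search
Get all documents (only 10 are returned by default)
GET /s_flights/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Get the document whose id is 1
GET /s_flights/_doc/1?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Request only the response headers; the status code tells whether the document exists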
HEAD /s_flights/_doc/1
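Query documents whose origin country is US
GET /s_flights/_search?q=OriginCountry:US&_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Query documents whose origin country is US or CN (the parentheses keep both values bound to the OriginCountry field)
GET /s_flights/_search?q=OriginCountry:(US+CN)&_source=FlightNum,Origin,OriginCountry,Dest,DestCountry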
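When these URI searches are built in code, the `q` value has to be URL-encoded, and a space inside it separates query terms. In query string syntax a term without a field prefix is searched in the default field, so grouping the values in parentheses keeps them all on one field. The Python sketch below shows one way to build such a URL; `uri_search` is a hypothetical helper and nothing is sent to a server.

```python
from urllib.parse import urlencode, parse_qs


def uri_search(index: str, query: str, fields: list) -> str:
    """Build a lightweight search URL with an encoded q and _source parameter."""
    params = {"q": query, "_source": ",".join(fields)}
    return f"/{index}/_search?" + urlencode(params)


url = uri_search("s_flights", "OriginCountry:(US CN)", ["FlightNum", "OriginCountry"])
print(url)

# Decoding the query string again recovers the original values.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```

`urlencode` turns the space into `+`, which is the same form used in the hand-written requests above.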
Query the mapping
GET /s_flights/_mapping/_doc
Check cluster health
GET /_cluster/health
Wildcard query
Query documents whose origin country starts with C
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"wildcard": {
"OriginCountry": {
"value": "C*"
}
}
}
}
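Query flight numbers that start with F and end with M9 (each ? matches exactly one character, so the value has 7 characters)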
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"wildcard": {
"FlightNum": {
"value": "F????M9"
}
}
}
}
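The two wildcard characters used above behave like shell globbing: `*` matches any run of characters (including none) and `?` matches exactly one. The Python sketch below illustrates the patterns locally with `fnmatchcase`. This is only an analogy for reading the patterns, not how Elasticsearch evaluates them, and `fnmatch` additionally understands `[...]` sets. All sample values except FFEVPM9 are made up for the illustration.

```python
from fnmatch import fnmatchcase

# Only FFEVPM9 comes from the notes above; the other values are invented.
flight_numbers = ["FFEVPM9", "F1234M9", "FM9", "F12345M9", "EN9FHUD"]
countries = ["CN", "CA", "CH", "US"]

# "F????M9": F, then exactly four characters, then M9.
seven_chars = [n for n in flight_numbers if fnmatchcase(n, "F????M9")]
# "C*": C followed by anything.
starts_with_c = [c for c in countries if fnmatchcase(c, "C*")]

print(seven_chars)
print(starts_with_c)
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.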
Term query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Query documents whose FlightNum is FFEVPM9
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"term": {
"FlightNum": {
"value": "FFEVPM9"
}
}
}
}
Terms query
Query documents whose FlightNum is 6DJ0DZM or ILXJVIF
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"terms": {
"FlightNum": ["6DJ0DZM","ILXJVIF"]
}
}
}
terms set query
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html
Range query
Query flights whose distance is between 100 km and 200 km, both inclusive
GET /s_flights/_search
{
"_source": [
"FlightNum",
"DistanceKilometers"
],
"query": {
"range": {
"DistanceKilometers": {
"gte": 100,
"lte": 200
}
}
}
}
exists query
prefix query
regexp query
fuzzy query
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin"
],
"query": {
"fuzzy": {
"Origin": "bai"
}
}
}
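A fuzzy query matches terms within a small edit distance of the search term. With the default fuzziness of AUTO, terms of 0 to 2 characters must match exactly, terms of 3 to 5 characters allow one edit, and longer terms allow two. The Python sketch below computes the plain Levenshtein distance to show what "one edit" means for `bai`. It is an illustration only: how Elasticsearch treats transpositions depends on the query settings, and the comparison strings are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits under AUTO fuzziness, based on the term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


term = "bai"
for candidate in ["bari", "dubai", "bai"]:
    distance = edit_distance(term, candidate)
    print(candidate, distance, distance <= auto_fuzziness(term))
```

Under these rules `bari` is within reach of `bai`, while `dubai` is two edits away and is not.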
type query
ids query
Request body search (ad hoc)
Query all documents
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match_all": {}
}
}
The above is equivalent to
GET /s_flights/_search?_source=FlightNum,Origin,OriginCountry,Dest,DestCountry
Match query for documents whose origin country is US
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"OriginCountry": "US"
}
}
}
Query documents whose origin contains Shanghai or Tokyo (Origin is a text field, so the query text is analyzed into terms)
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Tokyo"
}
}
}
Match query for an origin containing Shanghai Tokyo (Origin.keyword is treated as a keyword field, so the value is not analyzed and the query returns no results)
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin.keyword": "Shanghai Tokyo"
}
}
}
Match query against Origin.keyword with the complete value (a keyword field is not analyzed, but the whole value can be matched as a single term); this returns results
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin.keyword": "Shanghai Hongqiao International Airport"
}
}
}
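Match versus match phrase for an origin containing Shanghai Hongqiao International (Origin is a text field, so the query text is analyzed)
Normal match query, which returns many documents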
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
}
}
Match phrase query, which returns only one document
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match_phrase": {
"Origin": "Shanghai Hongqiao International"
}
}
}
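The difference in result counts comes from how the analyzed terms are combined: `match` uses OR by default, so any one term is enough, while `match_phrase` needs all terms adjacent and in order. The Python sketch below models this with a lowercase whitespace tokenizer, which is a deliberate simplification of the real analyzer. The function names are hypothetical and the airport names other than the Shanghai Hongqiao one are only sample inputs.

```python
def tokenize(text: str) -> list:
    """Very rough stand-in for the analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(field_value: str, query: str) -> bool:
    """match with the default OR operator: any query term may appear."""
    tokens = set(tokenize(field_value))
    return any(term in tokens for term in tokenize(query))


def match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: all query terms must appear consecutively and in order."""
    doc, phrase = tokenize(field_value), tokenize(query)
    return any(doc[i:i + len(phrase)] == phrase
               for i in range(len(doc) - len(phrase) + 1))


query = "Shanghai Hongqiao International"
origins = [
    "Shanghai Hongqiao International Airport",
    "Shanghai Pudong International Airport",
    "Edmonton International Airport",
    "Detroit Metropolitan Wayne County Airport",
]
for origin in origins:
    print(origin, match_any(origin, query), match_phrase(origin, query))
```

Every airport whose name contains "International" satisfies the `match` model, which is why the plain match query returns so many documents.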
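Highlighting: run a match query for origins containing Shanghai Hongqiao International (Origin is a text field, so the query text is analyzed) and highlight the results, i.e. add extra tags to the returned text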
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
},
"highlight": {
"fields": {
"Origin":{}
}
}
}
Custom highlight tags
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
},
"highlight": {
"fields": {
"Origin":{
"pre_tags": "<span class='highlight '>",
"post_tags": "</span>"
}
}
}
}
Highlight tags set at the field level take precedence over the top-level ones.
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"match": {
"Origin": "Shanghai Hongqiao International"
}
},
"highlight": {
"fields": {
"Origin": {
"pre_tags": "<span class='highlight-origin'>",
"post_tags": "</span>"
}
},
"pre_tags": "<span class='highlight'>",
"post_tags": "</span>"
}
}
Query flights whose origin country is US and whose flight time is less than 100 minutes
GET /s_flights/_search
{
"_source": [
"FlightNum",
"Origin",
"OriginCountry",
"Dest",
"DestCountry"
],
"query": {
"bool": {
"must": {
"match": {
"OriginCountry": "US"
}
},
"filter": {
"range": {
"FlightTimeMin": {
"lt": 100
}
}
}
}
}
}
Query flight records whose OriginCountry is US or CA
GET /s_flights/_search?q=OriginCountry:(US+CA)
GET /s_flights/_search
{
"query": {
"terms": {
"OriginCountry": ["US","CA"]
}
}
}
Show only the fields OriginCountry and DestCountry in the results
GET /s_flights/_search
{
"_source": {
"includes": [ "OriginCountry", "DestCountry" ]
},
"query": {
"match_all": {}
}
}
GET /s_flights/_search
{
"_source":[ "OriginCountry", "DestCountry" ],
"query": {
"match_all": {}
}
}
Show only the field OriginCountry in the results
GET /s_flights/_search
{
"_source":"OriginCountry",
"query": {
"match_all": {}
}
}
Do not return _source
GET /s_flights/_search
{
"_source":false,
"query": {
"match_all": {}
}
}
Count the flights whose origin country is "US", "NL", "JP" or "CN", and for each origin country count the flights per destination country
GET /s_flights/_search
{
"_source":false,
"query": {
"terms": {
"OriginCountry": [
"US",
"NL",
"JP",
"CN"
]
}
},
"aggs": {
"all_origin": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"all_dest": {
"terms": {
"field": "DestCountry"
}
}
}
}
}
}
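The response to a nested terms aggregation like this one carries a `buckets` array per aggregation name, and each bucket has a `key`, a `doc_count` and its sub-aggregations. The Python sketch below flattens such a response into (origin, destination, count) rows. The function name `flatten_buckets` is hypothetical and the counts in the sample are invented, not real query results.

```python
def flatten_buckets(aggregations: dict) -> list:
    """Turn all_origin -> all_dest buckets into (origin, dest, doc_count) rows."""
    rows = []
    for origin in aggregations["all_origin"]["buckets"]:
        for dest in origin["all_dest"]["buckets"]:
            rows.append((origin["key"], dest["key"], dest["doc_count"]))
    return rows


# Hypothetical "aggregations" section of a search response; counts are made up.
sample_aggregations = {
    "all_origin": {
        "buckets": [
            {"key": "US", "doc_count": 5,
             "all_dest": {"buckets": [{"key": "US", "doc_count": 3},
                                      {"key": "CA", "doc_count": 2}]}},
            {"key": "JP", "doc_count": 1,
             "all_dest": {"buckets": [{"key": "CN", "doc_count": 1}]}},
        ]
    }
}

rows = flatten_buckets(sample_aggregations)
for row in rows:
    print(row)
```

Flat rows like these are easier to print as a table or load into a spreadsheet than the nested bucket structure.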
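Count the flights whose origin country is "US", "NL", "JP" or "CN", count the flights per destination country for each origin country, and also compute the minimum, maximum and average distance in kilometers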
GET /s_flights/_search
{
"_source": "OriginCountry",
"query": {
"terms": {
"OriginCountry": [
"US",
"NL",
"JP",
"CN"
]
}
},
"aggs": {
"all_origin": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"all_dest": {
"terms": {
"field": "DestCountry"
}
},
"minDistanceKilometers": {
"min": {
"field": "DistanceKilometers"
}
},
"maxDistanceKilometers": {
"max": {
"field": "DistanceKilometers"
}
},
"avgDistanceKilometers": {
"avg": {
"field": "DistanceKilometers"
}
}
}
},
"minDistanceKilometers": {
"min": {
"field": "DistanceKilometers"
}
},
"maxDistanceKilometers": {
"max": {
"field": "DistanceKilometers"
}
},
"avgDistanceKilometers": {
"avg": {
"field": "DistanceKilometers"
}
}
}
}
Query flights whose origin and destination countries are both US
GET /s_flights/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"OriginCountry": "US"
}
},
{
"match": {
"DestCountry": "US"
}
}
]
}
}
}
Query flights with no delay during the day 2019-04-28 whose origin country is "US", "NL" or "JP", and for each origin country count the flights per destination country
GET /s_flights/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match": {
"FlightDelayType": "No Delay"
}
}
]
}
},
{
"range": {
"timestamp": {
"gte": "2019-04-28 00:00:00",
"lt": "2019-04-29 00:00:00",
"time_zone": "+08:00",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
},
{
"terms": {
"OriginCountry": [
"US",
"NL",
"JP"
]
}
}
]
}
},
"aggs": {
"all_origin": {
"terms": {
"field": "OriginCountry"
},
"aggs": {
"all_dest": {
"terms": {
"field": "DestCountry"
}
}
}
}
}
}
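The `time_zone` parameter in the range clause means the bounds are read as +08:00 local times and converted to UTC before comparison. The Python sketch below computes the UTC instants for one local day, treated here as a half-open interval, and checks the sample document's timestamp against it. `day_bounds_utc` is a hypothetical helper, and the sketch assumes the stored timestamp has no offset and is therefore interpreted as UTC.

```python
from datetime import datetime, timedelta, timezone


def day_bounds_utc(day: str, utc_offset_hours: int) -> tuple:
    """Return the UTC start (inclusive) and end (exclusive) of a local calendar day."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=local_zone)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


start, end = day_bounds_utc("2019-04-28", 8)
print(start.isoformat(), end.isoformat())

# Timestamp of the sample document from the data description, taken as UTC.
sample = datetime(2019, 4, 28, 6, 25, 17, tzinfo=timezone.utc)
print(start <= sample < end)
```

Seen in UTC, the local day 2019-04-28 starts at 16:00 on 2019-04-27, which explains why documents from the previous UTC date can appear in the results.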
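Query that matches the destination weather, the origin airport ID and a flight time range, and excludes DestCountry values that start with I or end with E
Note: in the official sample data, fields such as the flight time in hours are mapped as keyword, i.e. as strings. A range query on such a field compares values as strings, so it sometimes returns data and sometimes does not. Range queries should target number or date fields.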
GET /s_flights/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"DestWeather": "Clear"
}
},
{
"match_phrase": {
"DestWeather": "Sunny"
}
}
]
}
},
{
"terms": {
"OriginAirportID": [
"SHA",
"DWC"
]
}
},
{
"range": {
"FlightTimeMin": {
"gte": 100,
"lte": 900
}
}
}
],
"must_not": [
{
"wildcard": {
"DestCountry": "*E"
}
},
{
"wildcard": {
"DestCountry": {
"value": "I*"
}
}
}
]
}
}
}
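The note above about range queries on keyword fields comes down to string ordering versus numeric ordering. The Python sketch below compares the two on a few made-up values; Python's string comparison is used here as a stand-in for keyword term ordering.

```python
# Invented flight-hour values stored as strings, as a keyword field would hold them.
values = ["7.52", "10.3", "100.0", "9.9"]


def in_range_as_strings(value: str, low: str, high: str) -> bool:
    """Range check the way a string field sees it: character by character."""
    return low <= value <= high


def in_range_as_numbers(value: str, low: str, high: str) -> bool:
    """Range check the way a numeric field sees it."""
    return float(low) <= float(value) <= float(high)


# Range 5..10: numerically two values match, as strings nothing does ("5" sorts after "10").
print([v for v in values if in_range_as_strings(v, "5", "10")])
print([v for v in values if in_range_as_numbers(v, "5", "10")])

# Range 1..2: numerically nothing matches, as strings "10.3" and "100.0" do.
print([v for v in values if in_range_as_strings(v, "1", "2")])
print([v for v in values if in_range_as_numbers(v, "1", "2")])
```

Results that look random on a keyword field are usually this ordering effect; mapping the field as a numeric type removes it.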
_mget queries
Get the documents whose ids are 1 and 2
POST /s_flights/_doc/_mget
{
"ids":[1,2]
}
Get the documents whose ids are 1 and 3
POST /s_flights/_doc/_mget
{
"docs": [
{
"_id": 1
},
{
"_id": 3
}
]
}
Official tutorial 4: pagination
GET /_search
GET /_search?timeout=10ms
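# Search all types in all indices
GET /_search
# Search all types in the gb index
GET /gb/_search
# Search all types in the gb and us indices
GET /gb,us/_search
# Search all types in any index whose name starts with g or u
GET /g*,u*/_search
# Search the user type in the gb index
GET /gb/user/_search
# Search the user and tweet types in the gb and us indices
GET /gb,us/user,tweet/_search
# Search the user and tweet types in all indices
GET /_all/user,tweet/_search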
GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
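In the same way that SQL uses the LIMIT keyword to return a single page of results, Elasticsearch accepts the from and size parameters:
# The number of results to return, 10 by default
size
# The number of initial results to skip, 0 by default
from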
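Converting a page number into these two parameters is a one-line calculation. The Python sketch below assumes 1-based page numbers; `page_params` is a hypothetical helper name.

```python
def page_params(page: int, size: int = 10) -> dict:
    """Translate a 1-based page number into the from and size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


for page in (1, 2, 3):
    print(page, page_params(page, size=5))
```

With a size of 5, pages 1, 2 and 3 map to from values of 0, 5 and 10, matching the three requests listed above.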
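Deep pagination in distributed systems
To understand why deep pagination is problematic, suppose we are searching an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results to obtain the overall top 10.
Now suppose we request page 1001, that is, results 10001 to 10010. Everything works in the same way, except that each shard has to produce its top 10010 results. The coordinating node then sorts all 50050 results and discards 50040 of them.
As this shows, in a distributed system the cost of sorting results grows with the depth of the page, because the coordinating node has to sort (number of shards) × (from + size) results. This is why web search engines do not return more than 1000 results for any query.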
https://www.elastic.co/guide/cn/elasticsearch/guide/current/pagination.html
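The numbers in the deep pagination example can be reproduced with a small calculation. The Python sketch below follows the description above: every shard produces from + size results and the coordinating node keeps only size of them. `deep_paging_cost` is a hypothetical helper name.

```python
def deep_paging_cost(shards: int, from_: int, size: int) -> dict:
    """Count the results produced per shard, collected by the coordinator, and discarded."""
    per_shard = from_ + size
    collected = shards * per_shard
    return {"per_shard": per_shard, "collected": collected, "discarded": collected - size}


print(deep_paging_cost(shards=5, from_=0, size=10))
print(deep_paging_cost(shards=5, from_=10000, size=10))
```

The second call reproduces the 50050 collected and 50040 discarded results from the example with 5 primary shards.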
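Bulk write operations with _bulk
The bulk API can perform multiple index or delete operations in a single request, which can greatly improve indexing performance.
Each action line and each data body must be on a single line.
Two lines make up one operation: the first line is the action, which can be index, create, update or delete, and the second line is the optional data body. When sending requests this way, set the Content-Type to application/x-ndjson.
The optional data body on the second line depends on the action type:
(1) index and create: the second line is the source document
(2) delete: there is no second line
(3) update: the second line can be a partial doc, an upsert or a script
The operations can be written directly to a text file and sent to the server with the curl command: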
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
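A file such as `requests` can be generated rather than written by hand. The Python sketch below builds the newline-delimited body following the rules above: one JSON object per line, no data line for delete, and a final newline at the end of the body. `bulk_body` is a hypothetical helper and the sketch only produces the text; it does not send it.

```python
import json


def bulk_body(operations: list) -> str:
    """Build a _bulk body from (action, metadata, body_or_None) tuples."""
    lines = []
    for action, metadata, body in operations:
        # First line of the pair: the action and its metadata.
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action == "delete":
            continue  # delete has no second line
        if body is None:
            raise ValueError(f"{action} needs a data body")
        # Second line: the data body, serialized onto a single line.
        lines.append(json.dumps(body, ensure_ascii=False))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


operations = [
    ("create", {"_index": "s_flights", "_type": "_doc", "_id": "101"},
     {"FlightNum": "C12345B", "Origin": "重慶", "Dest": "上海"}),
    ("delete", {"_index": "s_flights", "_type": "_doc", "_id": "102"}, None),
]

payload = bulk_body(operations)
print(payload, end="")
```

`json.dumps` escapes any line breaks inside values, so each object stays on one line as the format requires.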
https://www.elastic.co/guide/cn/elasticsearch/guide/current/_Document_Metadata.html
https://www.cnblogs.com/wangzhuxing/p/9351245.html
https://blog.51cto.com/13630803/2162641?source=dra
https://elasticsearch.cn/question/5340
https://blog.csdn.net/jianjun200607/article/details/51262976/
https://blog.csdn.net/huwei2003/article/details/47004745
https://www.cnblogs.com/wulaiwei/p/9319821.html