Es分頁查詢:from+size
{ "query": { "bool": { "must": [ { "term": { "architect.keyword": { "value": "郭鋒" } } }, { "range": { "NRunTime": { "lte": 100 } } } ] } }, "size": 10, "from": 100 }
from 相當於offset,size相當於每頁多少個,上邊例子中代表從第100個數據開始(第11頁),查詢出10條數據
Es多字段查詢 multi_match
{
"query": {
"bool": {
"must": {
"multi_match" : {
"query" : "search_key",
"type" : "best_fields",
"fields" : ["column1", "column2"], //字段column1、column2模糊匹配search_key
"analyzer" : "ik_smart" //漢字按ik_smart分詞
}
},
"filter": { //filter range lte(小於) hte(大於)
"range":{
"column3":{
"lte":1//小於
}
}
}
}
},
"stored_fields": ["column1", "column2", "column3", "column4","column5"],
"highlight" : {//高亮顯示
"fields" : {
"column1" : {},
"column2" : {}
}
}
}
Es match match_phrase查詢
match:會將查詢字段分隔,比如查詢java spark,采用match會分詞 java/spark,將es中包含java、spark、以及java***spark的查詢出來
match_phrase:不會講查詢字段分隔,比如查詢java spark,采用match_phrase會將es中包含 ***java spark***的內容查詢出來
match提高查詢召回率,match_phrase提高查詢精度
match查詢例子:

1、match查詢 GET /forum/article/_search { "query": { "bool": { "must": [ { "match": { "content": "java spark" } } ] } } } 查詢結果 { "took": 54, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 2, "max_score": 0.68640786, "hits": [ { "_index": "forum", "_type": "article", "_id": "2", "_score": 0.68640786, "_source": { "articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": false, "postDate": "2017-01-02", "tag": [ "java" ], "tag_cnt": 1, "view_cnt": 50, "title": "this is java blog", "content": "i think java is the best programming language", "sub_title": "learned a lot of course", "author_first_name": "Smith", "author_last_name": "Williams", "new_author_last_name": "Williams", "new_author_first_name": "Smith" } }, { "_index": "forum", "_type": "article", "_id": "5", "_score": 0.68324494, "_source": { "articleID": "DHJK-B-1395-#Ky5", "userID": 3, "hidden": false, "postDate": "2017-03-01", "tag": [ "elasticsearch" ], "tag_cnt": 1, "view_cnt": 10, "title": "this is spark blog", "content": "spark is best big data solution based on scala ,an programming language similar to java spark", "sub_title": "haha, hello world", "author_first_name": "Tonny", "author_last_name": "Peter Smith", "new_author_last_name": "Peter Smith", "new_author_first_name": "Tonny" } } ] } }
match_phrase查詢例子:

GET /forum/article/_search { "query": { "match_phrase": { "content": { "query": "java spark", "slop" : 50 } } } } 結果: { "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.5753642, "hits": [ { "_index": "forum", "_type": "article", "_id": "5", "_score": 0.5753642, "_source": { "articleID": "DHJK-B-1395-#Ky5", "userID": 3, "hidden": false, "postDate": "2017-03-01", "tag": [ "elasticsearch" ], "tag_cnt": 1, "view_cnt": 10, "title": "this is spark blog", "content": "spark is best big data solution based on scala ,an programming language similar to java spark", "sub_title": "haha, hello world", "author_first_name": "Tonny", "author_last_name": "Peter Smith", "new_author_last_name": "Peter Smith", "new_author_first_name": "Tonny" } } ] } }
結果發現match_phrase只會返回既包含java又包含spark的數據,召回率降低
參考:https://www.jianshu.com/p/18d80cafe145
分詞器
通過analyzer指定分詞器類型
系統默認分詞器:
① standard分詞器:單詞切分,將詞匯轉為小寫、去掉標點符號 ------按單詞分

POST _analyze { "analyzer": "standard", "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } text分詞結果: [ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
② whitespace分詞器:空格分詞,對字符沒有lowcase化 -----按空格分

POST _analyze { "analyzer": "whitespace", "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } 分詞結果: [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
③simple分詞器:通過非字母字符來分隔文本信息,有lowcase化,該分詞器去掉數字類型字符 ----按非首字母分

POST _analyze { "analyzer": "simple", "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } 分詞結果: [ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
④stop分詞器:通過非字母字符來分隔文本信息,同時去掉英文中a、an、the等常用字符,通過stopwords也可以自己設置需要過濾掉的單詞, 該分詞器去掉數字類型字符 ----按非首字母分 ,去a、an、the
PUT my_index { "settings": { "analysis": { "analyzer": { "my_stop_analyzer": { "type": "stop", "stopwords": ["the", "over"] } } } } } POST my_index/_analyze { "analyzer": "my_stop_analyzer", "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } 結果: [ quick, brown, foxes, jumped, lazy, dog, s, bone ]
中文分詞器:
① ik-max-world:會將文本做最細粒度的拆分;盡可能多的拆分出詞語
ik-smart:做最粗粒度的拆分;已被拆出詞語將不會再次被其他詞語占用
ik分詞器熱刺更新配置:
修改 IK 的配置文件 :ES 目錄/plugins/ik/config/ik/IKAnalyzer.cfg.xml
ik-max-world VS ik-smart例子;

curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '聯想是全球最大的筆記本廠商' #返回 { "tokens" : [ { "token" : "聯想", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 }, { "token" : "是", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1 }, { "token" : "全球", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 }, { "token" : "最大", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 3 }, { "token" : "的", "start_offset" : 7, "end_offset" : 8, "type" : "CN_CHAR", "position" : 4 }, { "token" : "筆記本", "start_offset" : 8, "end_offset" : 11, "type" : "CN_WORD", "position" : 5 }, { "token" : "筆記", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 6 }, { "token" : "本廠", "start_offset" : 10, "end_offset" : 12, "type" : "CN_WORD", "position" : 7 }, { "token" : "廠商", "start_offset" : 11, "end_offset" : 13, "type" : "CN_WORD", "position" : 8 } ] } # ik_smart curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_smart' -d '聯想是全球最大的筆記本廠商' # 返回 { "tokens" : [ { "token" : "聯想", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 }, { "token" : "是", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1 }, { "token" : "全球", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 }, { "token" : "最大", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 3 }, { "token" : "的", "start_offset" : 7, "end_offset" : 8, "type" : "CN_CHAR", "position" : 4 }, { "token" : "筆記本", "start_offset" : 8, "end_offset" : 11, "type" : "CN_WORD", "position" : 5 }, { "token" : "廠商", "start_offset" : 11, "end_offset" : 13, "type" : "CN_WORD", "position" : 6 } ] }
參考:https://segmentfault.com/a/1190000012553894
es聚合查詢
aggs
"aggs": { "NAME": {# 指定結果的名稱 "AGG_TYPE": {# 指定具體的聚合方法, TODO: # 聚合體內制定具體的聚合字段 } } TODO: # 該處可以嵌套聚合 }
例子:

{ "size": 0, "aggs": { "sum_install": { "date_histogram": { "field": "yyyymmdd", "interval": "day" }, "aggs": { "types": { "terms": { "field": "type.keyword", "size": 10 }, "aggs": { "install": { "sum": { "field": "install" } } } } } } } }
作用查詢每天,不同type對應install總量
range用法 && must should
range :field:lte gte
must:and操作 should:or操作
{
"query": {
"bool": {
"must": [
{"range": {
"recive_time": {
"gte": "2017-12-25T01:00:00.000Z",
"lte": "2017-12-25T02:10:00.000Z"
}
}},
{
"bool": {
"should": [
{"range": {
"live_delay": {
"gte": 1500
}
}},
{
"range": {
"stream_break_count.keyword": {
"gte": 1
}
}
}
]
}
}
]
}
}
}
must的兩個條件都必須滿足,should中的兩個條件至少滿足一個就可以
Es 原來索引中添加字段
PUT /my_index/_mapping/my_type
{ "properties": { "new_field_name": { "type": "string" } } }
給新添加字段賦值
POST my_index/_update_by_query //批量更新用_update_by_query語法
{ "script": { "lang": "painless", "inline": "ctx._source.new_field_name= '02'" } } //通過painless更新對象值
參考:https://blog.csdn.net/qq_36330643/article/details/79771652
雜亂查詢語法:
aggs+avg+max+min+order+cardinality(等價於count(distinct(a)))+filter+sort
terms相當於sql中的groupby
aggs:基於搜索查詢,可以嵌套組合復雜查詢語法
"aggs": { "NAME": {# 指定結果的名稱 "AGG_TYPE": {# 指定具體的聚合方法, TODO: # 聚合體內制定具體的聚合字段 } } TODO: # 該處可以嵌套聚合 }
一、聚合起步
1、創建索引
1.1 創建索引DSL實現
put cars
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
1.2 創建mysql庫表sql實現
CREATE TABLE `cars` (
`id` int(11) NOT NULL,
`price` int(11) DEFAULT NULL,
`color` varchar(255) DEFAULT NULL,
`make` varchar(255) DEFAULT NULL,
`sold` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
2、統計不同顏色車的數目
2.1 統計不同顏色車的DSL實現
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"popular_colors": {
"terms": {
"field": "color.keyword"
}
}
}
}
返回結果:
lve
2.2 統計不同顏色的mysql實現
select color, count(color) as cnt from cars group by color order by cnt desc;
返回結果:
red 4
green 2
blue 2
3、統計不同顏色車的平均價格
3.1 統計不同顏色車的平均價格DSL實現:
terms 相當於groupby
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
返回聚合結果:
lve
3.2 統計不同顏色車的平均價格sql實現:
select color, count(color) as cnt, avg(price) as avg_price from cars group by color order by cnt desc;
color cnt avg_price
red 4 32500.0000
green 2 21000.0000
blue 2 20000.0000
4、每種顏色汽車制造商的分布
4.1 統計每種顏色汽車制造商的分布dsl實現
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"make": {
"terms": {
"field": "make.keyword"
}
}
}
}
}
}
返回結果:
4.2 統計每種顏色汽車制造商的分布sql實現
說明:和dsl的實現不嚴格對應
select color, make from cars order by color;
color make
blue toyota
blue ford
green ford
green toyota
red bmw
red honda
red honda
red honda
5、統計每個制造商的最低價格、最高價格
5.1 統計每個制造商的最低、最高價格的DSL實現
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"make_class": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"min_price": {
"min": {
"field": "price"
}
},
"max_price": {
"max": {
"field": "price"
}
}
}
}
}
}
聚合結果:
5.2 統計每個制造商的最低、最高價格的sql實現
select make, min(price) as min_price, max(price) as max_price from cars group by make;
make min_price max_price
bmw 80000 80000
ford 25000 30000
honda 10000 20000
toyota 12000 15000
二、聚合進階
1、條形圖聚合
1.1 分段統計每個區間的汽車銷售價格總和
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"price": {
"histogram": {
"field": "price",
"interval": 20000
},
"aggs": {
"revenue": {
"sum": {
"field": "price"
}
}
}
}
}
}
汽車銷售價格區間:定義為20000;
分段統計price和用sum統計。
1.2 多維度度量不同制造商的汽車指標
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"makes": {
"terms": {
"field": "make.keyword",
"size": 10
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
輸出截取片段:
{
"key": "ford",
"doc_count": 2,
"stats": {
"count": 2,
"min": 25000,
"max": 30000,
"avg": 27500,
"sum": 55000,
"sum_of_squares": 1525000000,
"variance": 6250000,
"std_deviation": 2500,
"std_deviation_bounds": {
"upper": 32500,
"lower": 22500
}
}
}
2、按時間統計聚合
2.1 按月份統計制造商汽車銷量dsl實現
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd"
}
}
}
}
返回結果:
2.2 按月份統計制造商汽車銷量sql實現
SELECT make, count(make) as cnt, CONCAT(YEAR(sold),',',MONTH(sold)) AS data_time
FROM `cars`
GROUP BY YEAR(sold) DESC,MONTH(sold)
查詢結果如下:
make cnt data_time
bmw 1 2014,1
ford 1 2014,2
ford 1 2014,5
toyota 1 2014,7
toyota 1 2014,8
honda 1 2014,10
honda 2 2014,11
2.3 包含12月份的處理DSL實現
以上2.1 中沒有12月份的統計結果顯示。
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-31"
}
}
}
}
}
2.4 以季度為單位統計DSL實現
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"quarter",
"format":"yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-31"
}
},
"aggs":{
"per_make_sum":{
"terms":{
"field": "make.keyword"
},
"aggs":{
"sum_price":{
"sum":{ "field": "price"}
}
}
},
"top_sum": {
"sum": {"field":"price"}
}
}
}
}
}
2.5 基於搜索的(范圍限定)聚合操作
2.5.1 基礎查詢聚合
GET /cars/transactions/_search
{
"query" : {
"match" : {
"make.keyword" : "ford"
}
},
"aggs" : {
"colors" : {
"terms" : {
"field" : "color.keyword"
}
}
}
}
對應的sql實現:
select make, color from cars
where make = "ford";
結果返回如下:
make color
ford green
ford blue
三、過濾聚合
1. 過濾操作
統計全部汽車的平均價錢以及單品平均價錢;
GET /cars/transactions/_search
{
"size" : 0,
"query" : {
"match" : {
"make.keyword" : "ford"
}
},
"aggs" : {
"single_avg_price": {
"avg" : { "field" : "price" }
},
"all": {
"global" : {},
"aggs" : {
"avg_price": {
"avg" : { "field" : "price" }
}
}
}
}
}
等價於:
select make, color, avg(price) from cars
where make = "ford" ;
select avg(price) from cars;
2、范圍限定過濾(過濾桶)
我們可以指定一個過濾桶,當文檔滿足過濾桶的條件時,我們將其加入到桶內。
GET /cars/transactions/_search
{
"size" : 0,
"query":{
"match": {
"make": "ford"
}
},
"aggs":{
"recent_sales": {
"filter": {
"range": {
"sold": {
"from": "now-100M"
}
}
},
"aggs": {
"average_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
mysql的實現如下:
select *, avg(price) from cars where period_diff(date_format(now() , '%Y%m') , date_format(sold, '%Y%m')) > 30
and make = "ford";
mysql查詢結果如下:
id price color make sold avg
3 30000 green ford 2014-05-18 27500.0000
3、后過濾器
只過濾搜索結果,不過濾聚合結果——post_filter實現
GET /cars/transactions/_search
{
"query": {
"match": {
"make": "ford"
}
},
"post_filter": {
"term" : {
"color.keyword" : "green"
}
},
"aggs" : {
"all_colors": {
"terms" : { "field" : "color.keyword" }
}
}
}
post_filter 會過濾搜索結果,只展示綠色 ford 汽車。這在查詢執行過 后 發生,所以聚合不受影響。
小結
選擇合適類型的過濾(如:搜索命中、聚合或兩者兼有)通常和我們期望如何表現用戶交互有關。選擇合適的過濾器(或組合)取決於我們期望如何將結果呈現給用戶。
在 filter 過濾中的 non-scoring 查詢,同時影響搜索結果和聚合結果。
filter 桶影響聚合。
post_filter 只影響搜索結果。
四、多桶排序
4.1 內置排序
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color.keyword",
"order": {
"_count" : "asc"
}
}
}
}
}
4.2 按照度量排序
以下是按照汽車平均售價的升序進行排序。
過濾條件:汽車顏色;
聚合條件:平均價格;
排序條件:汽車的平均價格升序。
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"order": {
"avg_price": "asc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
多條件聚合后排序如下所示:
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"order": {
"stats.variance": "asc"
}
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
4.3 基於“深度”的度量排序
太復雜,不推薦!
五、近似聚合
cardinality的含義是“基數”;
5.1 統計去重后的數量
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
類似於:
SELECT COUNT(DISTINCT color) FROM cars;
以下:
以月為周期統計;
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"months": {
"date_histogram": {
"field": "sold",
"interval": "month"
},
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
}
}