ElasticSearch Notes for the ELK Stack, Part 1: Core Theory, DSL Syntax, and Operating ES from Java

Reprinted article. 2021-12-13 23:35. Category: Databases

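Core Theory and DSL Syntax

Preparation

What is ElasticSearch, and how does it relate to Lucene and Solr?

Looking this up is part of learning, so read the background in an encyclopedia or on the official site.

Download the Windows build of ElasticSearch

The Linux build is covered later.

Search for Elastic, go to the official site, and download from there. The version used here is 7.8.0.

Download Postman

Search for it and download it yourself.

The ElasticSearch directory layout

If you know Tomcat, these directories will look familiar.

Go into the bin directory and double-click elasticsearch.bat to start the ES service.

What does ELK stand for?

The three components shown in the figure: Elasticsearch, Logstash, and Kibana.

Notes

Make sure your JDK is version 1.8 or later; 1.8 is the minimum.

How ES (non-relational) concepts map to a relational database

Note: from ES 7.x on, type is deprecated; everything else is unchanged.

Anyone working with ES should memorize this mapping. The terms are not explained again later, because everything that follows operates on the things in this figure.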

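Core theory

Forward index and inverted index

Elasticsearch uses an inverted index.

An inverted index has three small parts:

Term: the smallest unit stored in or queried from the index. In plain words, an English word, or a Chinese character or phrase, that carries meaning in the content you search. For example, 我是中國人 can be split into the terms 我, 是, 中國人, 中國, and 國人. With enormous numbers of terms a naive structure cannot hold or search them efficiently, so the dictionary is classically organized as a B+ tree or a hash table (a HashMap, for instance); Lucene, which ES builds on, uses an FST-based term index.
Term dictionary: the collection of all terms, that is, every word or phrase that appears in the indexed content.
Posting list: for each term, the list of documents (and positions within them) in which the term occurs. It is loosely like looking up the position of an element in an array, but that analogy is only for intuition; the two are quite different.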
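
To make the three parts concrete, here is a minimal Java sketch of an inverted index. It is only an illustration under loud assumptions: the class InvertedIndexSketch and its whitespace analyzer are hypothetical, and real Lucene segments are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Elasticsearch or Lucene.
final class InvertedIndexSketch {
    // Term dictionary (sorted keys) mapped to posting lists of document ids.
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Toy analyzer (an assumption): lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Index one document; documents are assumed to be added one at a time.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId); // record each document id once per term
            }
        }
    }

    // Look up the posting list of a single term.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone case");
        System.out.println(index.lookup("phone"));
    }
}
```

A search for a term never scans the documents: it goes to the dictionary, reads the posting list, and only then fetches the documents by id.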

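type

A type is the counterpart of a table in a relational database. Just as a table lives inside a database, a type in ES is created under an index.

What is a table? Rows and columns, as many as you need. Types work the same way: you can define any number of them.
Each table stores a different kind of data, so what a table really does is classify or partition data. That is also the purpose of a type in ES.
And just as a relational database can hold N tables, ES lets you define N types.

In short: a type is like a relational table, it exists to classify or partition data, you can define N of them, and a type must be created under an index (it is a logical subdivision of the index).

However, types changed across ES versions, so the description above does not hold everywhere: 7.x deprecates types, and an index effectively has the single type _doc.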

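field

A field is like a column in a relational table: it labels one attribute of a document.

Common simple field types (note that a document id in ES is a string):

String: text (analyzed, tokenizable text) and keyword (exact values such as brand, country, or IP address). They differ as follows:
- text supports full-text search: the value is analyzed into terms, so a query that shares even one term with the value matches. With an analyzer that splits Chinese into single characters, searching for one character is enough to get a hit.
- keyword supports exact matching only, provided index is not false. keyword values are not analyzed, so the query must equal the whole value.
Numeric: long, integer, short, byte, double, float
Boolean: boolean
Date: date
Object: object
Geo types: geo_point and geo_shape
- geo_point: a single point defined by latitude and longitude, for example "32.54325453, 120.453254"
- geo_shape: a complex shape made of multiple geo_point values, for example the line "LINESTRING (-77.03653 38.897676, -77.009051 38.889939)"
Autocomplete type: completion

Note: there is no array type, but any field can hold multiple values of its type, which gives array-like behavior. For example: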

{
    "age": 21,	// integer
    "weight": 52.1,		// float
    "isMarried": false,		// boolean
    "info": "這就是一個屌絲女",		// string: text or keyword, depending on how the mapping constrains the field
    "email": "zixq8@slafjkl.com",	// string: text or keyword, depending on how the mapping constrains the field
    "score": [99.1, 99.5, 98.9],	// array-like: one field holding several values of the same type
    "name": {		// object
        "firstName": "紫",
        "lastName": "邪情"
    }
}

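Fields can also be copied: the copy_to property copies the current field's value into another field.

Use case: searching several fields together.

Note: the copy target does not show up in the stored document, but it really exists in the index and can be searched, like a virtual field.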

// Define the target field
"all": {
    "type": "text",
    "analyzer": "ik_max_word"
}


"name": {
    "type": "text",
    "analyzer": "ik_max_word",
    "copy_to": "all"		// copy the current field, name, into the all field
}

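document

A document is like a row in a relational table. It is the basic unit of information that can be indexed, that is, one record.

In other words, the data you search is made of documents: a web page, a product record, and so on.

Adding a document:

// This is the Kibana form. With a Postman-style client, prefix /{index}/_doc/{id} with the ES host address.
POST /{index}/_doc/{id}		// the id is given explicitly; if omitted, ES generates one
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "subField1": "value3",
        "subField2": "value4"
    },
    // ...
}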

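Getting a document by id:

GET /{index}/_doc/{id}

Deleting a document by id:

DELETE /{index}/_doc/{id}

Updating a document: there are two ways

Full update: overwrites the existing document. In effect it:
- deletes the document with the given id
- adds a new document with the same id
- Note: if the id does not exist, the second step still runs, so the update turns into an insert

// Syntax
PUT /{index}/_doc/{id}
{
    "field1": "value1",
    "field2": "value2",
    // ... omitted
}

Partial update: changes only some fields of the document with the given id

// Syntax
POST /{index}/_update/{id}
{
    "doc": {
         "fieldName": "new value"
    }
}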

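mapping

A mapping is structural information, a set of constraints.

Compare with a relational table again: which columns it has, whether a column may be null, what its default is, and so on. The ES mapping plays the same role: it sets the rules for how data is used.

A mapping constrains the documents in an index. Common mapping properties include:

index: whether to index the field, default true
analyzer: which analyzer to use
properties: the sub-fields of this field

The mapping is the key part of creating an index. For each field you need to decide:

the field name
the field data type
whether it takes part in search
whether it needs to be analyzed
if it is analyzed, which analyzer to use

Where:

the field name and data type can follow the names and types in the database table
whether a field takes part in search depends on the business; an image URL, for example, does not need to be searchable
whether to analyze depends on the content: a value that is a single unit needs no analysis, anything else does
for the analyzer, ik_max_word can be used throughout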
{
  "mappings": {
    "properties": {		// sub-fields
      "fieldName1":{		// the field name
        "type": "text",		// the field type
        "analyzer": "ik_smart"		// the analyzer for this field; the ik analyzer offers ik_smart and ik_max_word
      },
      "fieldName2":{
        "type": "keyword",
        "index": false		// whether the field is indexed, default true; declare false explicitly for fields you do not want searched
      },
      "fieldName3":{
        "properties": {
          "subField": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}

Creating an index together with its mapping:

// Format
PUT /{index}				// create the index
{						// and define the mapping at the same time
  "mappings": {
    "properties": {
      "fieldName":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "fieldName2":{
        "type": "keyword",
        "index": false
      },
      "fieldName3":{
        "properties": {
          "subField": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}



// Example
PUT /user
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": false
      },
      "name":{
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}

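index

An index is like a database in a relational system.

The name is a little confusing, because an ES index also does what a relational index does: it acts like a house number, an identifier that speeds up lookups (and not only queries; every CRUD operation goes through it). That might suggest it is not really like a database, but it is. In a relational system the database comes first, then the table structure (rows, columns, types, ...).

In ES the index comes first, then documents, fields, and so on. So structurally it corresponds to a relational database, while functionally it behaves like a relational index.

In short: an ES index is like a relational database, it speeds up queries the way a relational index does, a cluster can hold any number of indices, and index names must be all lowercase.

Also keep the inverted index in mind.

A relational database adds a B+ tree index on chosen columns to speed up retrieval. ElasticSearch uses a structure called an inverted index for the same purpose.

Creating an index: the equivalent of creating a database

# In Kibana
PUT /{index}

# In Postman or a similar client
http://ip:port/indexName     e.g. http://127.0.0.1:9200/create_index    	Method: PUT

Note: PUT is idempotent, meaning that repeating the operation any number of times leaves the server in the same state. Send the create request several times and compare the responses: the index is created once, and the repeats report that it already exists.

Idempotent methods: PUT, DELETE, GET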
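
The difference between an idempotent PUT and a non-idempotent POST can be sketched with an in-memory store. This is a toy model only: IdempotencySketch and its "auto-N" id scheme are hypothetical and have nothing to do with how ES generates ids.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; not an ES API.
final class IdempotencySketch {
    private final Map<String, String> docs = new LinkedHashMap<>();
    private int nextId = 1;

    // PUT-like: the caller supplies the id, so repeating the call leaves the same state.
    void put(String id, String json) {
        docs.put(id, json);
    }

    // POST-like: the store generates a fresh id on every call, so repeats add documents.
    String post(String json) {
        String id = "auto-" + nextId++;
        docs.put(id, json);
        return id;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        IdempotencySketch store = new IdempotencySketch();
        store.put("100001", "{\"title\":\"phone\"}");
        store.put("100001", "{\"title\":\"phone\"}");
        store.post("{\"title\":\"phone\"}");
        store.post("{\"title\":\"phone\"}");
        System.out.println(store.size());
    }
}
```

Two identical PUT calls leave one document, while two identical POST calls leave two, which is why a custom id goes with PUT.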

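Viewing indices:

# View one index
GET /{index}

# View all indices
GET /_cat/indices?v

Modifying an index:

The inverted index is not a complicated structure, but once the data structure changes (a different analyzer, for example) it has to be rebuilt, which is a disaster. For that reason the existing fields in a mapping cannot be changed once the index is created.

Although existing fields cannot be modified, new fields can be added to the mapping, because that does not affect the existing inverted index.

Syntax:

PUT /{index}/_mapping
{
  "properties": {
    "newFieldName":{
      "type": "integer"
        // ............
    }
  }
}

Deleting an index:

DELETE /{index}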

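Documents (_doc)

Creating a document with POST

Use this method when you want ES to generate a random id.

Note: the index must exist first, just as a relational row needs its database and table.

Syntax:

http://ip:port/indexName/_doc     e.g. http://ip:9200/create_index/_doc    Method: POST

Creating a document with PUT (idempotent, custom id)

Append the id you want to the path.

Querying documents (key topic)

Getting a single document by id

Syntax:

http://ip:port/indexName/_doc/id      e.g. http://ip:9200/create_index/_doc/100001     Method: GET

Getting all documents in an index

Syntax:

http://ip:port/indexName/_search    e.g. http://ip:9200/create_index/_search     Method: GET

Note: do not send a leftover document body with this request. The body of _search is parsed as query DSL, so a document body makes ES report:

Unknown key for a VALUE_STRING in [title]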

The response:

{
    "took": 69,   // time spent on the query, in milliseconds
    "timed_out": false,     // whether the query timed out
    "_shards": {    // shards; not covered yet, skip for now
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [   // the documents found in this index
            // .............................
        ]
    }
}

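Updating documents

Full update

How it works: the content is overwritten; you simply send the whole document again.

Syntax:

http://ip:port/indexName/_doc/id      e.g. http://ip:9200/create_index/_doc/100001     Method: PUT or POST

Partial update

Syntax:

http://ip:port/indexName/_update/id   e.g. http://ip:9200/create_index/_update/100001    Method: POST

Deleting documents

Send a DELETE request.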

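Querying documents with the DSL

Elasticsearch queries are expressed in a JSON-style DSL.

Categories of DSL queries

ElasticSearch provides a JSON-based DSL (Domain Specific Language) for defining queries. Common query types include:

Match all: returns every document, mostly for testing. For example: match_all
Full-text queries: analyze the user's input into terms, then match them against the inverted index. For example:
- match
- multi_match
Exact queries: look up data by an exact term value, usually on keyword, numeric, date, or boolean fields, so the search input is not analyzed. For example:
- ids
- range
- term
Geo queries: query by latitude and longitude. For example:
- geo_distance
- geo_bounding_box
Compound queries: combine the query types above into one query. For example:
- bool
- function_score
Aggregations: make statistics, analysis, and calculations over the data very convenient. For example:
- Bucket aggregations: group documents
- Metric aggregations: compute values such as max, min, and average
- Pipeline aggregations: aggregate over the results of other aggregations

The query syntax is basically the same for all of them, aggregations aside: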

GET /indexName/_search
{
  "query": {
    "queryType": {
      "condition": "value"
    }
  }
}



// Example: match all
GET /indexName/_search
{
  "query": {
    "match_all": {		// the query type is match_all
    }				  // no condition
  }
}

Every other query is just a variation of the query type and its conditions.

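Full-text queries

Definition: analyze the user's input into terms, then match them against the inverted index.

The basic flow of a full-text query:

Analyze the user's search text to get terms
Match the terms against the inverted index to get document ids
Fetch the documents by id and return them to the user

Use case: search boxes, such as the Baidu or Google search box.

Note: because matching is done with terms, the searched field must be an analyzed text field.

Common full-text queries:

match: queries a single field
multi_match: queries several fields; a document matches if any one field matches

match syntax:

GET /indexName/_search
{
  "query": {
    "match": {
      "field": "text to search for"
    }
  }
}


// Example: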
GET /indexName/_search
{
  "query": {
    "match": {
      "name": "紫邪情"
    }
  }
}

multi_match syntax:

GET /indexName/_search
{
  "query": {
    "multi_match": {
      "query": "text to search for",
      "fields": ["field1", "field2"]
    }
  }
}


// Example:
GET /indexName/_search
{
  "query": {
    "multi_match": {
      "query": "Java",
      "fields": ["username","title", "context"]
    }
  }
}

Note: the more fields are searched, the worse the query performs. Prefer copy_to plus a single-field query (that is, match).

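Exact queries

Definition: look up data by an exact term value, usually on keyword, numeric, date, or boolean fields, so the search input is not analyzed.

Common exact queries:

term: query by an exact term value
range: query by a range of values

term query

An exact query targets fields that are not analyzed, so the search input must also be a single unanalyzed value. A document matches only when the user's input equals the field value exactly. If the user types more than the stored value, nothing is found.

Syntax:

// term query
GET /indexName/_search
{
  "query": {
    "term": {
      "field": {
        "value": "value to match exactly"
      }
    }
  }
}


// Example: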
GET /indexName/_search
{
  "query": {
    "term": {
      "field": {
        "value": "遙遠的救世主"
      }
    }
  }
}
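
Why extra input breaks a term query but not a match query can be sketched in a few lines of Java. The class TermVsMatchSketch and its whitespace analyzer are hypothetical simplifications; real analyzers such as ik behave differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of term vs. match semantics; not an ES API.
final class TermVsMatchSketch {
    // Toy analyzer (an assumption): lowercase, then split on whitespace.
    static Set<String> analyze(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    // term on a keyword field: the whole stored value is compared, nothing is analyzed.
    static boolean termOnKeyword(String storedValue, String input) {
        return storedValue.equals(input);
    }

    // match on a text field: both sides are analyzed; one shared term is enough.
    static boolean matchOnText(String storedText, String input) {
        Set<String> indexed = analyze(storedText);
        for (String term : analyze(input)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(termOnKeyword("Crowne Plaza", "Crowne Plaza Shanghai"));
        System.out.println(matchOnText("Crowne Plaza hotel", "plaza shanghai"));
    }
}
```

The term call fails because the input is longer than the stored value, while the match call succeeds on the shared term plaza.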

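range query

Range queries are mostly used to filter numeric values by range, for example a price range.

Basic syntax:

// range query
GET /indexName/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 10, // gte means greater than or equal; gt means greater than
        "lte": 20 // lte means less than or equal; lt means less than
      }
    }
  }
}

// Example: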
GET /indexName/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10000,
        "lte": 20000
      }
    }
  }
}

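Geo queries

A geo query is simply a query by latitude and longitude. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html

Common use cases:

Ctrip: find hotels near me
Didi: find taxis near me
WeChat: find people near me

Bounding box query

The bounding box query, geo_bounding_box, finds all documents whose coordinates fall inside a rectangle.

You specify the top-left and bottom-right corners. Drawing a horizontal and a vertical line through each of the two points yields the rectangle, and every point inside it matches.

Syntax:

// geo_bounding_box query
GET /indexName/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": { // top-left corner
          "lat": 31.1,	// latitude of this point
          "lon": 121.5	// longitude of this point
        },
        "bottom_right": { // bottom-right corner
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}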
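
The containment test behind a bounding box can be sketched as four comparisons. This is a simplified illustration, not the ES implementation: BoundingBoxSketch is a hypothetical class and it assumes the box does not cross the antimeridian.

```java
// Hypothetical helper illustrating the geo_bounding_box idea; not an ES API.
final class BoundingBoxSketch {
    // Assumes leftLon <= rightLon, i.e. the box does not cross the 180° meridian.
    static boolean contains(double topLat, double leftLon,
                            double bottomLat, double rightLon,
                            double lat, double lon) {
        boolean latInside = lat <= topLat && lat >= bottomLat;
        boolean lonInside = lon >= leftLon && lon <= rightLon;
        return latInside && lonInside;
    }

    public static void main(String[] args) {
        // Corners taken from the syntax example: top_left 31.1,121.5 and bottom_right 30.9,121.7
        System.out.println(contains(31.1, 121.5, 30.9, 121.7, 31.0, 121.6));
    }
}
```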

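Nearby query / distance query

The nearby query, also called the distance query (geo_distance), finds all documents within a given distance of a center point.

In other words, pick a point on the map as the center, use the distance as the radius, and draw a circle: every coordinate inside the circle matches.

Syntax:

// geo_distance query
GET /indexName/_search
{
  "query": {
    "geo_distance": {
      "distance": "distance", // radius
      "field": "lat,lon" // center, written as latitude,longitude
    }
  }
}



// Example: within 15km of latitude 31.21, longitude 121.5
GET /indexName/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km", // radius
      "location": "31.21,121.5" // center
    }
  }
}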
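
The "circle on the map" test can be approximated with the haversine formula. This is a sketch under assumptions: GeoDistanceSketch is a hypothetical class, the Earth is treated as a sphere of radius 6371 km, and ES may compute distances differently.

```java
// Hypothetical helper illustrating the geo_distance idea; not an ES API.
final class GeoDistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // spherical-Earth assumption

    // Great-circle distance between two lat/lon points, in kilometers.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the point lies inside the circle around the center.
    static boolean within(double centerLat, double centerLon, double radiusKm,
                          double lat, double lon) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Center and radius taken from the example: 31.21,121.5 and 15km
        System.out.println(within(31.21, 121.5, 15.0, 31.25, 121.5));
    }
}
```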

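Compound queries

Compound queries combine simpler queries to express more complex search logic.

Two compound queries are common:

function score: lets you control relevance scoring and therefore document ranking
bool query: combines other queries with boolean logic to build complex searches

function_score query

The function_score query controls relevance scoring and therefore document ranking.

Take Baidu as an example: results are not ranked purely by relevance; whoever pays more ranks higher.

To control relevance scores by hand, use the function score query in elasticsearch.

Syntax overview:

A function score query has four parts:

Original query: the query part. Documents are searched with this condition and scored with the BM25 algorithm, giving the original score (query score)
Filter: the filter part. Only documents that match it are re-scored
Score function: documents that match the filter are run through this function to get the function score. There are four functions:
1. weight: the result is a constant
2. field_value_factor: the result is the value of a field in the document
3. random_score: the result is a random number
4. script_score: a custom scoring script
Boost mode: how the function score and the original query score are combined, including:
1. multiply: multiply the two
2. replace: replace the query score with the function score
3. others, such as sum, avg, max, min

A function score query runs as follows:

Search documents with the original query and compute the relevance score, called the original score (query score)
Filter the documents with the filter condition
For documents that pass the filter, apply the score function to get the function score
Combine the original score and the function score according to the boost mode; the result is the final relevance score

The key points are therefore:

Filter: decides which documents get their score modified
Score function: decides how the function score is computed
Boost mode: decides how the final score is computed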
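
The flow above can be sketched as a small Java function. It follows the description in this article (documents that miss the filter keep their query score) and uses a constant weight as the score function; FunctionScoreSketch and its names are hypothetical, and how real ES treats non-matching documents under modes other than multiply is not modeled here.

```java
// Hypothetical model of the function_score flow described above; not an ES API.
final class FunctionScoreSketch {
    enum BoostMode { MULTIPLY, REPLACE, SUM, AVG, MAX, MIN }

    // Combine the original query score with the function score.
    static double combine(double queryScore, double functionScore, BoostMode mode) {
        switch (mode) {
            case MULTIPLY: return queryScore * functionScore;
            case REPLACE:  return functionScore;
            case SUM:      return queryScore + functionScore;
            case AVG:      return (queryScore + functionScore) / 2.0;
            case MAX:      return Math.max(queryScore, functionScore);
            case MIN:      return Math.min(queryScore, functionScore);
            default:       throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    // Assumption taken from the text: only documents matching the filter are re-scored.
    static double finalScore(double queryScore, boolean matchesFilter,
                             double weight, BoostMode mode) {
        return matchesFilter ? combine(queryScore, weight, mode) : queryScore;
    }

    public static void main(String[] args) {
        System.out.println(finalScore(2.0, true, 10.0, BoostMode.MULTIPLY));
        System.out.println(finalScore(2.0, false, 10.0, BoostMode.MULTIPLY));
    }
}
```

With weight 10 and multiply, a matching document jumps from 2.0 to 20.0, which is how a paid result can be pushed to the top.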

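bool query

A bool query combines one or more query clauses, each of which is a sub-query. The clauses can be combined as:

must: every sub-query must match, like AND
should: sub-queries match optionally, like OR
must_not: must not match; does not contribute to the score, like NOT
filter: must match; does not contribute to the score

Note: the more fields take part in scoring, the slower the query. For multi-condition searches like this, the recommendation is:

Use a full-text query in must for the search-box keywords, so that they are scored
Put the other conditions in filter, so that they are not scored

Example: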

GET /indexName/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"city": "上海" }}
      ],
      "should": [
        {"term": {"brand": "皇冠假日" }},
        {"term": {"brand": "華美達" }}
      ],
      "must_not": [
        { "range": { "price": { "lte": 500 } }}
      ],
      "filter": [
        { "range": {"score": { "gte": 45 } }}
      ]
    }
  }
}

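Sorting

By default elasticsearch sorts by relevance score (_score), but custom sorting of the results is also supported. Sortable field types include keyword, numeric, geo point, and date.

Sorting on keyword, numeric, and date fields uses basically the same syntax.

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "FIELD": "desc"  // sort field and direction: asc or desc
    }
    // add more entries to sort on several fields
  ]
}

The sort condition is an array, so you can list several conditions. They apply in the order declared: when the first condition ties, the second one decides, and so on.

Sorting by geo distance is slightly different.

Tip: a way to get the latitude and longitude of your location: https://lbs.amap.com/demo/jsapi-v2/example/map/click-to-get-lnglat/

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance" : {
          "FIELD" : "lat,lon", // name of the geo_point field in the document, and the target point as latitude,longitude
          "order" : "asc", // sort direction
          "unit" : "km" // unit of the sort distance
      }
    }
  ]
}

This query means:

Take the given coordinate as the target point
For every document, compute the distance from the given field (which must be a geo_point) to the target point
Sort by that distance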

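Pagination

By default elasticsearch returns only the top 10 results. To fetch more you change the paging parameters, from and size:

from: the offset of the first document to return
size: how many documents to return

This is similar to limit ?, ? in MySQL.

Basic pagination

The basic paging syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0, // offset where the page starts, default 0
  "size": 10, // number of documents wanted
  "sort": [
    {"price": "asc"}
  ]
}

Pros: supports jumping to an arbitrary page
Cons: the deep paging problem; by default from + size cannot exceed 10000
Use case: search with random page access, as on Baidu, JD, Google, or Taobao

The deep paging problem

Suppose I want results 990 to 1000. The query looks like this:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "from": 990, // offset where the page starts, default 0
  "size": 10, // number of documents wanted
  "sort": [
    {"price": "asc"}
  ]
}

This fetches data starting at offset 990, that is, results 990 through 1000.

Internally, however, elasticsearch must first fetch results 0 to 1000 and then cut out the 10 results from 990 to 1000:

Fetching the top 1000 is no big deal when ES runs as a single node.

But sooner or later elasticsearch runs as a cluster. Say the cluster has 5 nodes and I want the top 1000: it is not enough for each node to return 200 results.

Documents in node A's top 200 might rank beyond 10000 once they are compared with another node's data.

So to get the top 1000 of the whole cluster, every node must return its own top 1000; the results are merged, re-ranked, and cut down to the top 1000 again.

What if I want results 9900 to 10000? Do I first fetch the top 10000? From every node? And merge all of that in memory?

When the paging depth is large, the merged data becomes huge and puts heavy pressure on memory and CPU, so elasticsearch rejects requests where from + size exceeds 10000.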
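
The merge step described above can be simulated in plain Java. This is a sketch only: DeepPagingSketch is a hypothetical class, each inner list stands in for one node's data, and sorting is by ascending price as in the example query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of from/size paging across nodes; not an ES API.
final class DeepPagingSketch {
    // Every node contributes its own top (from + size); the coordinator merges and cuts.
    static List<Integer> page(List<List<Integer>> nodes, int from, int size) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> node : nodes) {
            List<Integer> sorted = new ArrayList<>(node);
            Collections.sort(sorted);
            candidates.addAll(sorted.subList(0, Math.min(from + size, sorted.size())));
        }
        Collections.sort(candidates); // global re-ranking
        int start = Math.min(from, candidates.size());
        int end = Math.min(from + size, candidates.size());
        return new ArrayList<>(candidates.subList(start, end));
    }

    // Upper bound on how many candidates the coordinator has to hold in memory.
    static long coordinatorCandidates(int nodeCount, int from, int size) {
        return (long) nodeCount * (from + size);
    }

    public static void main(String[] args) {
        System.out.println(coordinatorCandidates(5, 990, 10));  // page 990..1000 on 5 nodes
        System.out.println(coordinatorCandidates(5, 9990, 10)); // a much deeper page
    }
}
```

The page itself always has size entries, but the number of candidates the coordinator merges grows with from, which is the cost the 10000 limit protects against.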

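For deep paging, ES offers two approaches (see the official documentation):

search after: requires sorting; it fetches the next page starting from the sort values of the last hit of the previous page. This is the officially recommended approach
- Pros: no overall limit (a single request still cannot have a size above 10000)
- Cons: pages can only be read forward one by one; jumping to an arbitrary page is not supported
- Use case: search without random page access, such as scrolling down on a phone
scroll: takes a snapshot of the sorted document ids and keeps it in memory. No longer recommended officially
- Pros: no overall limit (a single request still cannot have a size above 10000)
- Cons: uses extra memory, and the results are not real-time
- Use case: fetching or migrating very large data sets. Discouraged since ES 7.1; search after is suggested instead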

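Highlighting

Highlighting is implemented in two steps:

Wrap every keyword occurrence in the document in a tag, for example an <em> tag
Style the <em> tag with CSS on the page

Highlight syntax:

GET /indexName/_search
{
  "query": {
    "match": {
      "field": "TEXT" // query condition; highlighting requires a full-text query
    }
  },
  "highlight": {
    "fields": {
      "FIELD": { // the field to highlight
        "pre_tags": "<em>",  // opening tag placed before each highlighted term; em is also the ES default
        "post_tags": "</em>" // closing tag placed after each highlighted term
      }
    }
  }
}

Notes:

Highlighting marks keywords, so the search condition must contain keywords; a range query or similar will not work.
By default the highlighted field must be the same as the searched field, otherwise nothing is highlighted
To highlight a field other than the searched one, add the property require_field_match=false. This covers the case where the highlighted field and the searched field differ. For example:

GET /indexName/_search
{
  "query": {
    "match": {
      "name": "紫邪情" // query condition; highlighting requires a full-text query
    }
  },
  "highlight": {
    "fields": {
      "all": { // suppose all is filled from many other fields via copy_to, so the searched field name differs from the highlighted field
        "pre_tags": "<em>",
        "post_tags": "</em>",
        "require_field_match": "false"		// whether the highlighted field must match the searched field, default true
      }
    }
  }
}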

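Aggregations

Aggregations make statistics, analysis, and calculations over the data very convenient. For example:

Which phone brand is the most popular?
What are the average, highest, and lowest prices of these phones?
How do these phones sell month by month?

These statistics are much easier to write than the equivalent SQL, and they run very fast, close to real time.

Kinds of aggregations

Three kinds are common:

Bucket aggregations: group documents
- terms aggregation: group by field value, for example by brand or by country
- Date Histogram: group by date interval, for example one bucket per week or per month
Metric aggregations: compute values such as max, min, and average
- Avg: average
- Max: maximum
- Min: minimum
- Stats: max, min, avg, sum, and so on in one go
Pipeline aggregations: aggregate over the results of other aggregations

Note: aggregated fields must be keyword, date, numeric, or boolean, that is, anything but text. text fields are analyzed, and aggregations cannot work on analyzed values.

Bucket aggregations

Bucket aggregations group documents.

terms aggregation: group by field value, for example by brand or by country
Date Histogram: group by date interval, for example one bucket per week or per month

Syntax:

GET /indexName/_search
{
  "query": {	// a base query limits the scope of the aggregation; without it every document in the index is aggregated
    "queryType": {
      "condition": "value"
    }
  },
  "size": 0,  // size 0 returns only aggregation results, with an empty hits array
  "aggs": { // define the aggregations
    "AggName": { // name of the aggregation
      "aggType": { // aggregation type; see the official docs for more types
        "field": "value", // field to aggregate on
        "size": 20, // number of buckets wanted, default 10
        "order": {	// change the bucket order; the default is descending by document count
          "_key": "asc" // what to sort by, and in which direction
        }
      }
    }
  }
}

Example:

// Aggregation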
GET /indexName/_search
{
  "query": {
    "range": {
      "price": {
        "lte": 200
      }
    }
  }, 
  "size": 0, 
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 15,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
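
What the terms aggregation in this example computes can be mirrored in plain Java: group by brand, count, sort by count ascending, keep the first size buckets. TermsAggSketch is a hypothetical class; ties are broken by key order here, which is an assumption and not a statement about ES.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory equivalent of a terms aggregation ordered by _count asc.
final class TermsAggSketch {
    // Returns buckets formatted as "key:count".
    static List<String> bucketsByCountAsc(List<String> brands, int size) {
        Map<String, Long> counts = new TreeMap<>();
        for (String brand : brands) {
            counts.merge(brand, 1L, Long::sum); // one bucket per distinct value
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByValue()); // stable sort: ties stay in key order
        List<String> buckets = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            if (buckets.size() == size) {
                break;
            }
            buckets.add(e.getKey() + ":" + e.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> brands = Arrays.asList("A", "B", "A", "C", "A", "B");
        System.out.println(bucketsByCountAsc(brands, 15));
    }
}
```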

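Metric aggregations

Metric aggregations compute values such as max, min, and average.

Avg: average
Max: maximum
Min: minimum
Stats: max, min, avg, sum, and so on in one go

Syntax:

GET /indexName/_search
{
  "size": 0, 
  "aggs": {
    "aggName": { 
      "aggType": { 
        "field": "value", 
        "size": 20,
        "order": {
            "_key": "orderType"
        }
      },
      "aggs": { // sub-aggregation of the outer aggregation: computed separately for each bucket
        "subAggName": { // name of the sub-aggregation
          "aggType": { // aggregation type; stats, for example, computes min, max, avg, and more
            "field": "value" // field to aggregate on
          }
        }
      }
    }
  }
}


// Example: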
GET /indexName/_search
{
  "size": 0, 
  "aggs": {
    "brandAgg": { 
      "terms": { 
        "field": "brand", 
        "size": 20,
        "order": {
            "scoreAgg.avg": "asc"	// note: to order the buckets by a sub-aggregation, reference it as subAggName.metric
        }
      },
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}

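Autocomplete with completion

elasticsearch provides the Completion Suggester for autocomplete. It matches and returns terms that start with the user's input. To keep suggestions fast, the field has to satisfy some constraints:

The field used for completion must be of type completion
Its content is usually an array of the terms to complete

Use case: typing keywords into a search box and getting a list of suggestions below it.

For example, take this index:

// Create the index
PUT test
{
  "mappings": {
    "properties": {
      "title":{
        "type": "completion"	// the field type is completion
      }
    }
  }
}

Then insert this data:

// Sample data
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]	// the field holds an array of terms
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}

The query DSL:

// Autocomplete query
GET /test/_search
{
  "suggest": {
    "title_suggest": {	// name of the suggestion
      "text": "s", // the keyword
      "completion": {
        "field": "title", // field to complete on
        "skip_duplicates": true, // skip duplicates
        "size": 10 // return the first 10 results
      }
    }
  }
}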
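
The prefix matching behind this query can be imitated with a sorted set. This is an illustration only: CompletionSketch is a hypothetical class, matching is case-insensitive by assumption, and the real suggester uses a specialized in-memory structure and returns documents rather than bare strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical prefix suggester over the sample titles; not an ES API.
final class CompletionSketch {
    private final TreeSet<String> inputs = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

    void add(String... terms) {
        for (String term : terms) {
            inputs.add(term);
        }
    }

    // Walk the sorted set from the prefix onward and stop at the first non-match.
    List<String> suggest(String prefix, int size) {
        List<String> out = new ArrayList<>();
        for (String candidate : inputs.tailSet(prefix, true)) {
            boolean startsWith = candidate.regionMatches(true, 0, prefix, 0, prefix.length());
            if (!startsWith || out.size() == size) {
                break;
            }
            out.add(candidate);
        }
        return out;
    }

    public static void main(String[] args) {
        CompletionSketch suggester = new CompletionSketch();
        suggester.add("Sony", "WH-1000XM3");
        suggester.add("SK-II", "PITERA");
        suggester.add("Nintendo", "switch");
        System.out.println(suggester.suggest("s", 10));
    }
}
```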

Operating ES from Java (key section)

Working out how Java connects to ES

Create a Maven project yourself

Dependency management in the parent project

<properties>
    <ES-version>7.8.0</ES-version>
    <log4j-version>1.2.17</log4j-version>
    <junit-version>4.13.2</junit-version>
    <jackson-version>2.13.0</jackson-version>
    <fastjson.version>1.2.83</fastjson.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <!-- Note: this version must match the ES build you downloaded for Windows; the same holds
                          later on Linux, where the ES and Kibana versions are tied together in the same way
                -->
            <version>${ES-version}</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <!-- Note: do not use elasticsearch-client here by mistake.
                    It is already discouraged in 7.x and is abandoned entirely from 8.0 on
                 -->
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <!-- Again, mind the version -->
            <version>${ES-version}</version>
        </dependency>

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4j-version}</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit-version}</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>${jackson-version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fastjson.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

Working out the connection flow

Pull in the dependencies from the parent project

<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
    </dependency>

    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
    </dependency>

    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
    </dependency>
</dependencies>

The code:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;

import java.io.IOException;


public class ConnectionTest {
    /**
     * Read the steps in reverse order (1, then 2, then 3) to follow the reasoning
     */
    @Test
    public void test() throws IOException {

        // 3. Create the HttpHost
        HttpHost host = new HttpHost("127.0.0.1", 9200);	// takes: String hostname, int port
        // There is also an overload with a scheme parameter (http or https as needed); pass "http" if you want to set it

        // 2. Create the RestClientBuilder
        RestClientBuilder clientBuilder = RestClient.builder(host);
        // builder() is overloaded; the HttpHost... hosts variant is the one we want, hence the HttpHost above


        // 1. To connect we need a client, and the high-level client is already on the classpath
        RestHighLevelClient esClient = new RestHighLevelClient(clientBuilder);
        // The constructor needs a RestClientBuilder, so build one

        // 4. Release resources
        esClient.close();
    }
}

Working with ES indices from Java

Pull in the dependencies you need from the parent project

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
</dependency>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <scope>test</scope>
</dependency>

Wrapping the client in a helper

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

/**
 * @ClassName ESClientUtil
 * @Author ZiXieQing
 * @Date 2021/12/14
 * Version 1.0
 **/
public class ESClientUtil {

    private static final String HOST = "127.0.0.1";
    private static final Integer PORT = 9200;

    public static RestHighLevelClient getESClient() {
        return new RestHighLevelClient(RestClient.builder(new HttpHost(HOST, PORT)));
        // Alternative:
        // return new RestHighLevelClient(RestClient.builder(HttpHost.create("http://ip:9200")));
    }
}

Index operations

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.admin.indices.flush.FlushRequest;
import org.elasticsearch.action.admin.indices.flush.FlushResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

import static com.zixieqing.hotel.constant.MappingConstant.mappingContext;

/**
 * Tests for elasticsearch indices
 * Pattern: esClient.indices().xxx(xxxIndexRequest(IndexName), RequestOptions.DEFAULT)
 *      where xxx is the operation on the index, e.g. create, delete, get, flush, exists, ...
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest(classes = HotelApp.class)
public class o1IndexTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://ip:9200")));
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * Create an index together with its field mapping
     */
    @Test
    void createIndexAndMapping() throws IOException {
        // 1. Prepare the create-index request
        CreateIndexRequest request = new CreateIndexRequest("person");
        // 2. Attach the field mapping   arg 1: the mapping as a JSON string  arg 2: the content type
        request.source(mappingContext, XContentType.JSON);
        // 3. Send the request to actually create the index and its mapping
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        // Check whether creation succeeded
        System.out.println("response.isAcknowledged() = " + response.isAcknowledged());
        // Check whether the index exists
        boolean result = client.indices().exists(new GetIndexRequest("person"), RequestOptions.DEFAULT);
        System.out.println(result ? "index person exists" : "index person does not exist");
    }

    /**
     * Delete an index
     */
    @Test
    void deleteIndexTest() throws IOException {
        // Delete the given index
        AcknowledgedResponse response = client.indices()
                .delete(new DeleteIndexRequest("person"), RequestOptions.DEFAULT);
        // Check whether it succeeded
        System.out.println("response.isAcknowledged() = " + response.isAcknowledged());
    }

    // Existing mapping fields cannot be changed once the index is created, but new fields can be added

    /**
     * Get an index
     */
    @Test
    void getIndexTest() throws IOException {
        // Get the given index
        GetIndexResponse response = client.indices()
                .get(new GetIndexRequest("person"), RequestOptions.DEFAULT);
    }

    /**
     * Flush an index
     */
    @Test
    void flushIndexTest() throws IOException {
        // Flush the index
        FlushResponse response = client.indices().flush(new FlushRequest("person"), RequestOptions.DEFAULT);
        // Check whether it succeeded
        System.out.println("response.getStatus() = " + response.getStatus());
    }
}

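Working with ES documents from Java (key section)

A JSON library is also needed here; either jackson or fastjson works.

Lombok is imported as well, to save some typing.

Basic document CRUD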

import com.alibaba.fastjson.JSON;
import com.zixieqing.hotel.pojo.Hotel;
import com.zixieqing.hotel.pojo.HotelDoc;
import com.zixieqing.hotel.service.IHotelService;
import org.apache.http.HttpHost;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

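/**
 * Tests for elasticsearch documents
 * Pattern: esClient.xxx(xxxRequest(IndexName, docId), RequestOptions.DEFAULT)
 *      where xxx is the document operation, e.g.
 *          index   add a document
 *          delete  delete the document with the given id
 *          get     get the document with the given id
 *          update  partially update the document with the given id
 *
 * <p>@author       : ZiXieqing</p>
 */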

@SpringBootTest(classes = HotelApp.class)
public class o2DocumentTest {
    @Autowired
    private IHotelService service;

    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * Add a document
     */
    @Test
    void addDocumentTest() throws IOException {

        // 1. Prepare the document data to add
        // Load the record from the database by id
        Hotel hotel = service.getById(36934L);
        // When the table structure differs from the ES field mapping, convert the database record into an object that matches the mapping
        HotelDoc hotelDoc = new HotelDoc(hotel);

        // 2. Prepare the request: index name plus document id
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());

        // 3. Convert the data to JSON
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);

        // 4. Send the request to add the document; ES builds the inverted index from the data, which is why index() is called
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);

        // 5. Check the outcome; either call works, and both report CREATED for a newly added document
        System.out.println("response.getResult() = " + response.getResult());
        System.out.println("response.status() = " + response.status());
    }

    /**
     * Delete a document by id
     */
    @Test
    void deleteDocumentTest() throws IOException {
        // 1. Prepare the request
        DeleteRequest request = new DeleteRequest("indexName", "docId");

        // 2. Send the request
        DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
        // Check the outcome; OK means success
        System.out.println("response.status() = " + response.status());
    }

    /**
     * Get a document by id
     */
    @Test
    void getDocumentTest() throws IOException {
        // 1. Prepare the request
        GetRequest request = new GetRequest("indexName", "docId");

        // 2. Send the request and get the response
        GetResponse response = client.get(request, RequestOptions.DEFAULT);

        // 3. Parse the result
        HotelDoc hotelDoc = JSON.parseObject(response.getSourceAsString(), HotelDoc.class);
        System.out.println("hotelDoc = " + hotelDoc);
    }

    /**
     * Partially update fields of the document with the given index and id
     * A full update simply deletes the document with that id and adds a new document with the same id
     */
    @Test
    void updateDocumentTest() throws IOException {
        // 1. Prepare the request
        UpdateRequest request = new UpdateRequest("indexName", "docId");

        // 2. The field to change and its new value      note: arguments are key, value pairs separated by commas
        request.doc(
                "price",500
        );

        // 3. Send the request
        UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
        // Check the outcome; OK means success
        System.out.println("response.status() = " + response.status());
    }
}

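Bulk document operations

In essence a bulk request is just a wrapper: it can carry requests of different kinds (delete, update, add), which makes it easy to fill in a for loop.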

package com.zixieqing.hotel;

import com.alibaba.fastjson.JSON;
import com.zixieqing.hotel.pojo.Hotel;
import com.zixieqing.hotel.pojo.HotelDoc;
import com.zixieqing.hotel.service.IHotelService;
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.get.MultiGetItemResponse;
import org.elasticsearch.action.get.MultiGetRequest;
import org.elasticsearch.action.get.MultiGetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

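/**
 * Tests for elasticsearch bulk document operations
 * Pattern: EsClient.bulk(new BulkRequest()
 *                    .add(xxxRequest("indexName").id().source())
 *                    , RequestOptions.DEFAULT)
 * where xxx is the operation, e.g.
 *      index   add
 *      delete  delete
 *      update  update
 * Batch reads are not part of bulk; use mget for those (see mgetTest below)
 *
 * <p>@author       : ZiXieqing</p>
 */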

@SpringBootTest(classes = HotelApp.class)
public class o3BulkDocumentTest {
    @Autowired
    private IHotelService service;

    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

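    /**
     * Bulk-add documents to ES
     */
    @Test
    void bulkAddDocumentTest() throws IOException {
        // 1. Load the records from the database
        List<Hotel> hotels = service.list();

        // 2. Convert each record into the object that matches the ES mapping
        BulkRequest request = new BulkRequest();
        for (Hotel hotel : hotels) {
            HotelDoc hotelDoc = new HotelDoc(hotel);
            // Add one index request per document
            request.add(new IndexRequest("hotel")
                    .id(hotelDoc.getId().toString())
                    .source(JSON.toJSONString(hotelDoc), XContentType.JSON));
        }

        // 3. Send the request
        BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
        // status() is OK whenever the bulk call itself went through, even if single items failed,
        // so check hasFailures() as well
        System.out.println("response.status() = " + response.status());
        if (response.hasFailures()) {
            System.out.println(response.buildFailureMessage());
        }
    }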

    /**
     * Bulk-delete documents from ES
     */
    @Test
    void bulkDeleteDocumentTest() throws IOException {
        // 1. Collect the ids of the documents to delete
        List<Hotel> hotels = service.list();

        // 2. Build the request
        BulkRequest request = new BulkRequest();
        for (Hotel hotel : hotels) {
            // Add one delete request per id
            request.add(new DeleteRequest("hotel").id(hotel.getId().toString()));
        }

        // 3. Send the request
        BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
        // status() alone does not reveal item-level failures, so check hasFailures() as well
        System.out.println("response.status() = " + response.status());
        if (response.hasFailures()) {
            System.out.println(response.buildFailureMessage());
        }
    }

    
    // Bulk update follows the same pattern (add UpdateRequest objects); batch reads use the mget API


    /**
     * Batch get with mget
     */
    @Test
    void mgetTest() throws IOException {
        List<Hotel> hotels = service.list();

        // 1. Build the request
        MultiGetRequest request = new MultiGetRequest();
        for (Hotel hotel : hotels) {
            // Each item needs an index and a document id, so one request can read from several indices
            request.add("hotel", hotel.getId().toString());
        }

        // 2. Send the request and read the responses
        MultiGetResponse responses = client.mget(request, RequestOptions.DEFAULT);
        for (MultiGetItemResponse response : responses) {
            GetResponse resp = response.getResponse();
            // getResponse() is null when the item failed; print only documents that exist
            if (resp != null && resp.isExists()) {
                System.out.println("Fetched document = " + resp.getSourceAsString());
            }
        }
    }
}
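
When the source table is large, putting every row into one BulkRequest can make a single request body very big. A minimal sketch of splitting the list first, so that each chunk can go into its own bulk request. The helper name partition and the batch size are assumptions for illustration; neither comes from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BulkBatches {
    /** Splits items into consecutive chunks of at most batchSize elements (hypothetical helper). */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        // Each inner list would be turned into one BulkRequest
        for (List<Integer> batch : partition(ids, 2)) {
            System.out.println(batch);
        }
    }
}
```

In the test class above, the loop body would build one BulkRequest per batch and call client.bulk for each.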

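DSL document queries from Java

These queries all follow one pattern: first look at how the query was written as JSON in the DSL section, then decide which kind of query you need, and build it with QueryBuilders. Everything else stays the same.

Query everything: match all

match all: matches every document (a search returns only the first 10 hits unless size is set)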

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * ES DSL document query: match all, which matches everything (also called a full query)
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest
public class o1MatchAll {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }


    /**
     * Full query: match every document
     */
    @Test
    void matchAllTest() throws IOException {
        // 1. Build the request
        SearchRequest request = new SearchRequest("indexName");
        // 2. Choose the query type / build the DSL
        request.source().query(QueryBuilders.matchAllQuery());
        // 3. Send the request and get the response
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Process the response
        // 4.1 Get the hits object from the result
        SearchHits searchHits = response.getHits();
        // 4.2 Get the total from hits
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        // 4.3 Get the hits array
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits) {
            // 4.3.1 The source of each hit is the actual document; use it for your own logic from here
            String source = hit.getSourceAsString();
            System.out.println("source = " + source);
        }
    }
}

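How the Java code maps to the DSL syntax: request.source() stands for the JSON request body, query(...) for its "query" key, and QueryBuilders.matchAllQuery() for the match_all clause inside it.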

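Full-text search queries

match (single field) and multi match (multiple fields)

The response handling in the code below can be extracted into a shared method where that helps.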

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * DSL full-text search: the user input is analyzed into terms, which are then matched against the inverted index
 * match_query (single field) and multi_match_query (multiple fields)
 *
 * <p>@author       : ZiXieqing</p>
 */


@SpringBootTest
public class o2FullTextTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * match_query: single-field query
     */
    @Test
    void matchQueryTest() throws IOException {
        // 1. Build the request
        SearchRequest request = new SearchRequest("indexName");
        // 2. Build the DSL
        request.source().query(QueryBuilders.matchQuery("city", "上海"));
        // 3. Send the request and get the response
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }

    /**
     * multi match: multi-field query; a document matches if any one of the fields matches
     */
    @Test
    void multiMatchTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
        request.source().query(QueryBuilders.multiMatchQuery("成人用品", "name", "business"));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }
}

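Exact queries

Exact queries look up documents by exact term value. They are normally used on keyword, numeric, date, and boolean fields, so the search input is not analyzed.

range query and term query

term: query by exact term value

range: query by a range of values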

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * DSL exact queries: look up documents by exact term value, normally on keyword, numeric, date, and boolean fields, so the search input is NOT analyzed
 * range query and term query
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest
public class o3ExactTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
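     * term query: looks up the exact term value
     * Unlike the single-field match query, term does not analyze the search text, so the value must equal an indexed term exactly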
     */
    @Test
    void termTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
        request.source().query(QueryBuilders.termQuery("city", "深圳"));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }

    /**
     * range query: query by a range of values
     */
    @Test
    void rangeTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
        request.source().query(QueryBuilders.rangeQuery("price").lte(250));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }
}
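
The difference between term and match on an analyzed text field can be sketched with a toy inverted index. This is an illustration under simplifying assumptions, not how Elasticsearch is implemented: the analyzer below only lowercases and splits on whitespace, and the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TermVsMatch {
    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Toy analyzer: lowercases and splits on whitespace (real analyzers do far more). */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    void index(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** term: the value is looked up as-is, without analysis. */
    Set<Integer> term(String value) {
        return postings.getOrDefault(value, new TreeSet<Integer>());
    }

    /** match: the query text is analyzed first; a document matches if it contains any token. */
    Set<Integer> match(String queryText) {
        Set<Integer> result = new TreeSet<>();
        for (String token : analyze(queryText)) {
            result.addAll(term(token));
        }
        return result;
    }

    public static void main(String[] args) {
        TermVsMatch idx = new TermVsMatch();
        idx.index(1, "Home Inn Shanghai");
        idx.index(2, "Seven Days Shenzhen");
        System.out.println(idx.term("Home Inn"));  // no single indexed token equals "Home Inn"
        System.out.println(idx.match("Home Inn")); // analyzed into "home" and "inn" before lookup
    }
}
```

On a keyword field the whole value is stored as one term, which is why term queries are the usual choice there.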

Geo coordinate queries

geo_distance: nearby query

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * DSL geo queries
 * geo_bounding_box (rectangle) and geo_distance (nearby) queries; only geo_distance is shown here
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest
public class o4GeoTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * geo_distance: nearby query
     */
    @Test
    void geoDistanceTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
        request.source()
                .query(QueryBuilders.geoDistanceQuery("location")
                        .distance("15km").point(31.21,121.5));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }
}
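
Conceptually, a geo_distance query keeps the documents whose point lies within a radius of a centre point. A minimal sketch of that check using the haversine formula with a mean Earth radius; it is an approximation for illustration, not Elasticsearch's own distance computation, and the class and method names are assumptions.

```java
class GeoDistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, an approximation

    /** Great-circle distance in kilometres between two lat/lon points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True when the point lies within radiusKm of the centre, like a geo_distance filter. */
    static boolean within(double centerLat, double centerLon, double lat, double lon, double radiusKm) {
        return distanceKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Centre and radius taken from geoDistanceTest: lat 31.21, lon 121.5, 15 km
        System.out.println(within(31.21, 121.5, 31.25, 121.55, 15)); // a few kilometres away
        System.out.println(within(31.21, 121.5, 32.21, 121.5, 15));  // one degree of latitude away
    }
}
```

Note the argument order: as in point(31.21, 121.5) above, latitude comes first and longitude second.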

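Compound queries

function_score queries work along the same lines

bool query: must, should, must not, and filter

A bool query combines one or more query clauses, each of which is a sub-query. The clauses can be combined in these ways:

must: every sub-query must match, similar to AND
should: sub-queries match optionally, similar to OR (when must or filter clauses are present, should clauses only raise the score unless minimum_should_match is set)
must_not: must not match; does not contribute to the score; similar to NOT
filter: must match; does not contribute to the score

Note: the more fields take part in scoring, the slower the query. For multi-condition searches it is therefore recommended to:

Put the keyword from the search box into a must clause as a full-text query, so it is scored
Put all other conditions into filter clauses, so they are not scored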

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
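 * DSL compound queries: combine basic DSL queries to express more complex logic
 * function_score: score function query
 *
 * bool query
 *  must     every sub-query must match   i.e. AND      contributes to the score
 *  should   sub-queries match optionally i.e. OR       contributes to the score
 *  must not must not match               i.e. NOT      does not contribute to the score
 *  filter   must match                   i.e. filter   does not contribute to the score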
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest
public class o5Compound {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }


    /**
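     * bool query
     *  must     every sub-query must match   i.e. AND      contributes to the score
     *  should   sub-queries match optionally i.e. OR       contributes to the score
     *  must not must not match               i.e. NOT      does not contribute to the score
     *  filter   must match                   i.e. filter   does not contribute to the score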
     */
    @Test
    void boolTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // must clause, i.e. AND
        boolQueryBuilder.must(QueryBuilders.termQuery("city", "北京"));
        // should clause, i.e. OR
        boolQueryBuilder.should(QueryBuilders.multiMatchQuery("速8", "brand", "name"));
        // must not clause, i.e. NOT
        boolQueryBuilder.mustNot(QueryBuilders.rangeQuery("price").gte(250));
        // filter clause
        boolQueryBuilder.filter(QueryBuilders.termQuery("starName", "二鑽"));

        request.source().query(boolQueryBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }
}
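
The way the four clause types combine can be sketched with plain predicates. This is a toy model under stated assumptions: the class name and the scoring rule (one point per matching must or should clause) are invented for illustration and have nothing to do with Elasticsearch's real relevance scoring. What it does mirror is which clauses decide matching and which ones only affect the score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class BoolQuerySketch<T> {
    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> filter = new ArrayList<>();

    BoolQuerySketch<T> must(Predicate<T> p) { must.add(p); return this; }
    BoolQuerySketch<T> should(Predicate<T> p) { should.add(p); return this; }
    BoolQuerySketch<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }
    BoolQuerySketch<T> filter(Predicate<T> p) { filter.add(p); return this; }

    /** A document matches when all must/filter clauses hold and no must_not clause holds. */
    boolean matches(T doc) {
        for (Predicate<T> p : must) if (!p.test(doc)) return false;
        for (Predicate<T> p : filter) if (!p.test(doc)) return false;
        for (Predicate<T> p : mustNot) if (p.test(doc)) return false;
        // With no must/filter clauses, at least one should clause has to match
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty()) {
            return should.stream().anyMatch(p -> p.test(doc));
        }
        return true;
    }

    /** Toy score: one point per matching must or should clause; filter and must_not add nothing. */
    int score(T doc) {
        if (!matches(doc)) return 0;
        int points = must.size();
        for (Predicate<T> p : should) if (p.test(doc)) points++;
        return points;
    }

    public static void main(String[] args) {
        BoolQuerySketch<String> q = new BoolQuerySketch<String>()
                .must(s -> s.contains("beijing"))
                .should(s -> s.contains("super8"))
                .mustNot(s -> s.contains("closed"));
        System.out.println(q.score("beijing super8"));  // must and should both match
        System.out.println(q.score("beijing homeinn")); // still matches, only the must clause scores
    }
}
```

The second call shows why should is not a plain OR once a must clause is present: the document still matches, it just scores lower.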

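How the Java code maps to the DSL syntax: QueryBuilders.boolQuery() builds the "bool" object, and must(...), should(...), mustNot(...), and filter(...) each add one clause to the "must", "should", "must_not", and "filter" arrays.

fuzzy query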

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * DSL fuzzy query
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest
public class o6FuzzyTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * Fuzzy query
     */
    @Test
    void fuzzyTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
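        // fuzziness(Fuzziness.ONE) sets the allowed edit distance; the options are ZERO, ONE, TWO, and AUTO
        // Edit distance: how many characters may differ (be changed, added, or removed) between the value in fuzzyQuery("name","深圳") and an indexed term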
        request.source().query(QueryBuilders.fuzzyQuery("name", "深圳").fuzziness(Fuzziness.ONE));
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }
}
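
The "edit distance" behind fuzziness can be sketched with the classic Levenshtein algorithm. This is a simplified illustration with assumed class and method names: it counts only insertions, deletions, and substitutions, whereas Elasticsearch by default also treats swapping two adjacent characters as a single edit.

```java
class EditDistanceSketch {
    /** Classic Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when candidate is within maxEdits of term, the idea behind Fuzziness.ONE and TWO. */
    static boolean fuzzyMatches(String term, String candidate, int maxEdits) {
        return distance(term, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
        System.out.println(fuzzyMatches("hotel", "hotels", 1));
        // A swap of two adjacent letters counts as two substitutions in this simplified version
        System.out.println(distance("hotel", "hotle"));
    }
}
```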

Sorting and pagination

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
 * DSL sorting and pagination
 *
 * <p>@author       : ZiXieqing</p>
 */


@SpringBootTest
public class o7SortAndPageTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * sort: sorted query
     */
    @Test
    void sortTest() throws IOException {
        SearchRequest request = new SearchRequest("indexName");
        request.source()
                .query(QueryBuilders.matchAllQuery())
                .sort("price", SortOrder.ASC);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }

    /**
     * page: paginated query
     */
    @Test
    void pageTest() throws IOException {
        int page = 2, size = 20;
        SearchRequest request = new SearchRequest("indexName");
        request.source()
                .query(QueryBuilders.matchAllQuery())
                .from((page - 1) * size).size(size);

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Process the response; from here on the flow is always the same: parse the JSON result
        SearchHits searchHits = response.getHits();
        long total = searchHits.getTotalHits().value;
        System.out.println("Total hits: " + total);
        for (SearchHit hit : searchHits.getHits()) {
            String dataJson = hit.getSourceAsString();
            System.out.println("dataJson = " + dataJson);
        }
    }
}
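
The from/size arithmetic used in pageTest, together with the deep-paging limit, can be sketched as follows. The class and method names are assumptions for illustration; the limit of 10,000 is the default value of the index.max_result_window setting, and a search whose from + size exceeds it is rejected.

```java
class PageWindow {
    // Default value of index.max_result_window in Elasticsearch
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Zero-based offset for a one-based page number. */
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return Math.multiplyExact(page - 1, size);
    }

    /** from + size must not exceed the result window. */
    static boolean fitsWindow(int page, int size, int maxResultWindow) {
        return (long) from(page, size) + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        System.out.println(from(2, 20));
        System.out.println(fitsWindow(500, 20, DEFAULT_MAX_RESULT_WINDOW)); // 9980 + 20 is exactly the limit
        System.out.println(fitsWindow(501, 20, DEFAULT_MAX_RESULT_WINDOW)); // 10000 + 20 is past the limit
    }
}
```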

Highlight query

The response handling differs slightly, but the approach is the same.

package com.zixieqing.hotel.dsl_query_document;

import com.alibaba.fastjson.JSON;
import com.zixieqing.hotel.HotelApp;
import com.zixieqing.hotel.pojo.HotelDoc;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.CollectionUtils;

import java.io.IOException;
import java.util.Map;

/**
 * DSL highlight query
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest(classes = HotelApp.class)
public class o8HighLightTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    /**
     * Highlight query
     * The response is handled a little differently
     */
    @Test
    void highLightTest() throws IOException {
        SearchRequest request = new SearchRequest("hotel");
        request.source()
                .query(QueryBuilders.matchQuery("city", "北京"))
                .highlighter(SearchSourceBuilder.highlight()
                        .field("name")  // field to highlight
                        .preTags("<em>")    // opening HTML tag; em is the default
                        .postTags("</em>")  // closing tag
                        .requireFieldMatch(false));     // whether the highlighted field must be one the query searched

        // Send the request and get the response
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Process the response
        for (SearchHit hit : response.getHits()) {
            String originalData = hit.getSourceAsString();
            HotelDoc hotelDoc = JSON.parseObject(originalData, HotelDoc.class);
            System.out.println("Original data: " + originalData);

            // Get the highlight result
            // key: the highlighted field, field("name") above; value: the content with the tags added
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            if (!CollectionUtils.isEmpty(highlightFields)) {
                // Look up the highlight content by field name
                HighlightField name = highlightFields.get("name");
                if (name != null) {
                    // The fragments come back as an array
                    String highLightStr = name.getFragments()[0].string();
                    hotelDoc.setName(highLightStr);
                }
            }

            System.out.println("hotelDoc = " + hotelDoc);
        }
    }
}
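
Two ideas in the highlight test can be sketched in isolation: wrapping a matched term in tags, and preferring the highlighted fragment over the original field value when one exists. This is a toy version with assumed names; real highlighting works on analyzed terms, not on plain substring replacement.

```java
class HighlightSketch {
    /** Wraps every occurrence of term in text with the given tags (exact substring match only). */
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text;
        }
        return text.replace(term, preTag + term + postTag);
    }

    /** Uses the first highlighted fragment when there is one, otherwise keeps the original value. */
    static String pick(String original, String[] fragments) {
        return (fragments != null && fragments.length > 0) ? fragments[0] : original;
    }

    public static void main(String[] args) {
        String name = "Home Inn Beijing";
        String fragment = highlight(name, "Inn", "<em>", "</em>");
        System.out.println(pick(name, new String[] {fragment}));
        System.out.println(pick(name, null)); // no highlight for this field: keep the original
    }
}
```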

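How the code maps to the DSL syntax: request.source() represents the whole JSON body of the request, and highlighter(...) fills in its "highlight" section.

Aggregation queries

Aggregations make it very convenient to compute statistics, analysis, and calculations over the data.

There are three common kinds of aggregation:

Bucket aggregations: group documents
- TermAggregation: group by field value, for example by brand or by country
- Date Histogram: group by date interval, for example one group per week or per month
Metric aggregations: compute values such as maximum, minimum, and average
- Avg: average
- Max: maximum
- Min: minimum
- Stats: max, min, avg, sum, and so on in one go
Pipeline aggregations: aggregate over the results of other aggregations

Note: the fields used in an aggregation must be keyword, date, numeric, or boolean fields, in other words anything except text. A text field is analyzed into terms, and aggregations are not meant to run on analyzed text.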

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

/**
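 * Aggregations make it very convenient to compute statistics, analysis, and calculations over the data
 * Bucket aggregations: group documents
 *      TermAggregation: group by field value, for example by brand or by country
 *      Date Histogram: group by date interval, for example one group per week or per month
 *
 *  Metric aggregations: compute values such as maximum, minimum, and average
 *      Avg: average
 *      Max: maximum
 *      Min: minimum
 *      Stats: max, min, avg, sum, and so on in one go
 *
 *  Pipeline aggregations: aggregate over the results of other aggregations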
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest(classes = HotelApp.class)
public class o9AggregationTest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    @Test
    void aggregationTest() throws IOException {
        // Build the request
        SearchRequest request = new SearchRequest("indexName");
        // Assemble the DSL
        request.source()
                .size(0)
                .query(QueryBuilders
                        .rangeQuery("price")
                        .lte(250)
                )
                .aggregation(AggregationBuilders
                        .terms("brandAgg")
                        .field("brand")
                        .order(BucketOrder.aggregation("scoreAgg.avg",true))
                        .subAggregation(AggregationBuilders
                                .stats("scoreAgg")
                                .field("score")
                        )
                );

        // Send the request and get the response
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Process the response
        System.out.println("response = " + response);
        // Get all aggregation results
        Aggregations aggregations = response.getAggregations();
        // Get one aggregation by its name
        Terms brandAgg = aggregations.get("brandAgg");
        // Read the buckets of the terms aggregation
        List<? extends Terms.Bucket> buckets = brandAgg.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            // The bucket key is the grouped field value (here, the brand)
            String value = bucket.getKeyAsString();
            // Process the value as needed
            System.out.println("value = " + value);
        }
    }
}
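
What aggregationTest asks Elasticsearch to do (filter by price, bucket by brand, compute score statistics per bucket, order buckets by average score ascending) can be sketched in memory with Java streams. The Hotel class below is a minimal stand-in invented for the example, and the sample data is made up; only the shape of the computation follows the test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AggregationSketch {
    /** Minimal stand-in for a hotel document with the fields used in the aggregation. */
    static class Hotel {
        final String brand;
        final int price;
        final double score;

        Hotel(String brand, int price, double score) {
            this.brand = brand;
            this.price = price;
            this.score = score;
        }
    }

    /** Filters by price, groups by brand, computes score stats per group, orders by average ascending. */
    static Map<String, DoubleSummaryStatistics> brandStats(List<Hotel> hotels, int maxPrice) {
        Map<String, DoubleSummaryStatistics> grouped = hotels.stream()
                .filter(h -> h.price <= maxPrice)                               // range query: price <= maxPrice
                .collect(Collectors.groupingBy((Hotel h) -> h.brand,            // terms aggregation on brand
                        Collectors.summarizingDouble((Hotel h) -> h.score)));   // stats sub-aggregation on score
        List<Map.Entry<String, DoubleSummaryStatistics>> entries = new ArrayList<>(grouped.entrySet());
        // Order the buckets by average score, ascending
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, DoubleSummaryStatistics> e) -> e.getValue().getAverage()));
        Map<String, DoubleSummaryStatistics> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, DoubleSummaryStatistics> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Hotel> hotels = Arrays.asList(
                new Hotel("A", 200, 4.0), new Hotel("A", 240, 5.0),
                new Hotel("B", 100, 3.0), new Hotel("C", 300, 5.0));
        brandStats(hotels, 250).forEach((brand, stats) ->
                System.out.println(brand + " avg=" + stats.getAverage() + " count=" + stats.getCount()));
    }
}
```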

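How the request in aggregationTest is assembled: size(0) maps to "size": 0 (return no hits, only aggregation results), query(...) to the "query" key, and aggregation(...) to "aggs", where terms("brandAgg") names the bucket aggregation and subAggregation(...) nests the stats aggregation inside it.

How the response is read: response.getAggregations() corresponds to the "aggregations" object, get("brandAgg") to the entry with that name, getBuckets() to its "buckets" array, and getKeyAsString() to each bucket's "key".

Autocomplete (completion) query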

package com.zixieqing.hotel.dsl_query_document;

import com.zixieqing.hotel.HotelApp;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

/**
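 * Autocomplete with the completion type: the query matches terms that start with the user's input and returns them
 *  The field used for completion MUST be of type completion
 *  The field value is usually an array of the terms to complete from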
 *
 * <p>@author       : ZiXieqing</p>
 */

@SpringBootTest(classes = HotelApp.class)
public class o10Suggest {
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://ip:9200"))
        );
    }

    @AfterEach
    void tearDown() throws IOException {
        this.client.close();
    }

    @Test
    void completionTest() throws IOException {
        // Build the request
        SearchRequest request = new SearchRequest("hotel");
        // Build the DSL
        request.source()
                .suggest(new SuggestBuilder()
                        .addSuggestion(
                                "title_suggest",
                                SuggestBuilders.completionSuggestion("title")
                                        .prefix("s")
                                        .skipDuplicates(true)
                                        .size(10)
                        ));

        // Send the request and get the response
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // Parse the response
        // Get the whole suggest object
        Suggest suggest = response.getSuggest();
        // Get the suggestion by the name given above
        CompletionSuggestion titleSuggest = suggest.getSuggestion("title_suggest");
        // An entry's own text is only the prefix that was sent; the completions are its options
        for (CompletionSuggestion.Entry.Option option : titleSuggest.getOptions()) {
            // Get the text of each option
            String context = option.getText().string();
            // Process the text as needed
            System.out.println("context = " + context);
        }
    }
}
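
Prefix completion itself can be sketched with a sorted set: every entry that starts with the prefix sits in one contiguous run beginning at the first entry not smaller than the prefix. This is only an illustration with assumed names, not the in-memory structure Elasticsearch uses for completion fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class PrefixCompletionSketch {
    // A sorted set keeps entries in lexicographic order and drops duplicates
    private final TreeSet<String> entries = new TreeSet<>();

    void add(String suggestion) {
        entries.add(suggestion);
    }

    /** Returns up to size entries that start with prefix, in sorted order. */
    List<String> complete(String prefix, int size) {
        List<String> result = new ArrayList<>();
        // tailSet jumps to the first entry >= prefix; matching entries are contiguous from there
        for (String candidate : entries.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix) || result.size() >= size) {
                break;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        PrefixCompletionSketch sketch = new PrefixCompletionSketch();
        for (String s : new String[] {"shanghai", "shenzhen", "suzhou", "beijing", "shanghai"}) {
            sketch.add(s);
        }
        System.out.println(sketch.complete("s", 10));
        System.out.println(sketch.complete("sh", 1));
    }
}
```

The set's duplicate removal plays the role of skipDuplicates(true), and the size argument the role of size(10) in completionTest.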

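How the code maps to the DSL and the response: suggest(...) builds the "suggest" object, addSuggestion("title_suggest", ...) names the suggestion, and completionSuggestion("title") targets the completion field. In the response, getSuggest() returns the "suggest" object, getSuggestion("title_suggest") the suggestion with that name, and the text of each option is one completion candidate.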

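Synchronizing ES with MySQL

Synchronization here means: when the data in MySQL changes, the Elasticsearch index must change with it.

There are three common approaches: synchronous calls, asynchronous notification, and listening to the MySQL binlog.

1. Synchronous call: the service that writes to MySQL also calls the index update directly as part of the same operation.

Pros: simple and blunt to implement
Cons: tight coupling between business services

2. Asynchronous notification: after writing to MySQL, the service publishes a message to the MQ, and the search side consumes it and updates the index.

Pros: low coupling, moderate implementation difficulty
Cons: depends on the reliability of the MQ

3. Listening to the MySQL binlog: a listener watches the binlog for changes and triggers the index update.

Pros: removes the coupling between services entirely
Cons: enabling the binlog adds load to the database, and the implementation is complex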
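
The asynchronous notification approach can be sketched with an in-memory queue standing in for the MQ and a map standing in for the ES index. Everything here is hypothetical (the class, the event shape, the method names); a real setup would use a message broker on one side and index or delete requests on the other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncSyncSketch {
    enum Type { UPSERT, DELETE }

    /** Change event published after a database write (hypothetical message shape). */
    static class ChangeEvent {
        final Type type;
        final long id;
        final String json;

        ChangeEvent(Type type, long id, String json) {
            this.type = type;
            this.id = id;
            this.json = json;
        }
    }

    // Stand-ins: the queue plays the MQ, the map plays the ES index
    final BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();
    final Map<Long, String> index = new HashMap<>();

    /** Producer side: the service that owns MySQL publishes an event after its write. */
    void publish(ChangeEvent event) {
        queue.add(event);
    }

    /** Consumer side: the search service applies all pending events to the index. */
    int drain() {
        int applied = 0;
        ChangeEvent event;
        while ((event = queue.poll()) != null) {
            if (event.type == Type.UPSERT) {
                index.put(event.id, event.json);   // would be an index request against ES
            } else {
                index.remove(event.id);            // would be a delete request against ES
            }
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        AsyncSyncSketch sync = new AsyncSyncSketch();
        sync.publish(new ChangeEvent(Type.UPSERT, 1L, "{\"name\":\"hotel one\"}"));
        sync.publish(new ChangeEvent(Type.UPSERT, 2L, "{\"name\":\"hotel two\"}"));
        sync.publish(new ChangeEvent(Type.DELETE, 1L, null));
        System.out.println("applied " + sync.drain() + " events, index = " + sync.index);
    }
}
```

The producer never touches the index, which is the low coupling listed above; the price is that the index is only as up to date as the queue is reliable.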

Link to the advanced part

Address: https://www.cnblogs.com/xiegongzi/p/15770665.html
