Elasticsearch 7: Distributed Architecture and Distributed Search

Reposted article, originally published 2020-01-02 21:34 under Linux operations

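Distributed Features

Benefits of Elasticsearch's distributed design:

Horizontal scaling of storage
Higher availability: if some nodes stop serving, the cluster as a whole keeps working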

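Elasticsearch's Distributed Architecture

Clusters are distinguished by cluster name; the default is "elasticsearch"
The name can be changed in the configuration file or set on the command line with -E cluster.name="ops-es"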

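Nodes

A node is a single Elasticsearch instance:

It is essentially a Java process
One machine can run several Elasticsearch processes, but in production it is generally recommended to run a single instance per machine

Every node has a name, set in the configuration file or at startup with -E node.name=es01

After a node starts, it generates a UID, which is stored in the data directory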

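Coordinating Node

The node that receives and handles a request is called the Coordinating Node

It routes the request to the right node; for example, a create-index request is routed to the master node

Every node is a Coordinating Node by default

Setting all other roles to false turns a node into a dedicated Coordinating Node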

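Data Node

A node that can store data is called a Data Node

A node is a data node by default after startup; set node.data: false to disable this

Responsibilities of a Data Node

It stores shard data and is central to scaling data (the Master Node decides how shards are distributed across data nodes)

Adding data nodes addresses both horizontal scaling of data and single points of failure for data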

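Master Node

Responsibilities of the Master Node

Handles requests such as creating and deleting indices, and decides which node each shard is allocated to
Maintains and updates the cluster state

Master Node best practices

The master node is critical, so deployments must guard against it being a single point of failure
Configure several master-eligible nodes per cluster, and have each of them serve only the master role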

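Cluster State

The cluster state holds the information a cluster needs to operate:

Information about all nodes
All indices with their mappings and settings
Shard routing information

Every node keeps a copy of the cluster state

However, only the master node can modify the cluster state, and it is responsible for publishing changes to the other nodes

If any node could modify it, the copies of the cluster state would become inconsistent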

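Master-Eligible Nodes & the Election Process

Master-eligible nodes ping each other, and the node with the lowest node ID is elected master

The other nodes join the cluster without taking on the master role; once they detect that the elected master is gone, they elect a new one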

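The Split-Brain Problem

Split-brain is a classic network problem in distributed systems: a network failure leaves one node unable to reach the others. For example, in a three-node cluster where Node1 is the master and loses contact with Node2 and Node3:

Node2 and Node3 elect a new master
Node1 still acts as master of its own one-node cluster and keeps updating the cluster state
The result is two masters maintaining different cluster states, and when the network recovers there is no way to choose the correct one to restore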

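How to Avoid Split-Brain

Restrict elections with a quorum: an election can only proceed when the number of reachable master-eligible nodes is at least the quorum

quorum = (number of master-eligible nodes / 2) + 1, using integer division
With 3 master-eligible nodes, setting discovery.zen.minimum_master_nodes to 2 avoids split-brain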
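
The quorum rule can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names `quorum` and `can_elect` are made up for illustration and are not part of Elasticsearch.

```python
def quorum(master_eligible: int) -> int:
    """Minimum number of master-eligible nodes needed to hold an election."""
    return master_eligible // 2 + 1


def can_elect(reachable: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it sees at least a quorum."""
    return reachable >= quorum(master_eligible)


# Three master-eligible nodes, partitioned into {Node1} and {Node2, Node3}.
total = 3
print(quorum(total))        # 2
print(can_elect(1, total))  # False: Node1 alone cannot remain master
print(can_elect(2, total))  # True: Node2 and Node3 can elect a master
```

Because two disjoint partitions can never both hold a majority, at most one side is able to elect a master.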

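From 7.0 onward this setting is no longer needed

The minimum_master_nodes parameter was removed; Elasticsearch itself chooses the nodes that can form a quorum
A typical master election now completes in a very short time. Scaling the cluster up or down is safer and easier, and there are fewer configuration options that could lead to data loss
Nodes log their state more clearly, which helps diagnose why they cannot join the cluster or why no master can be elected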

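Primary Shard

Shards are the foundation of Elasticsearch's distributed storage

Primary shards and replica shards

Primary shards spread an index's data across the nodes

Primary shards let the data of one index be distributed over multiple Data Nodes, which gives horizontal scaling of storage
The number of primary shards is set when the index is created and cannot be changed afterwards by default; changing it requires reindexing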

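Choosing Shard Counts

How to plan the number of primary and replica shards for an index

Too few primary shards: for example, an index created with 1 primary shard
- If the index grows quickly, the cluster cannot scale its data by adding nodes
Too many primary shards: each shard ends up very small and nodes hold too many shards, which hurts performance
Too many replica shards reduce the overall write performance of the cluster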

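Documents Are Stored on Shards

A document is stored on one specific primary shard and its replica; for example, document 1 is stored on shards P0 and R0

Mapping documents to shards:

The mapping should spread documents evenly over all shards so hardware is fully used, rather than leaving some machines idle and others busy
Candidate approaches
- Random / round robin: with many shards, looking up document 1 may require querying many of them before it is found
- Maintain a document-to-shard lookup table: costly to maintain when the number of documents is large
- Compute on the fly: derive from document 1 itself which shard to fetch it from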

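The Document-to-Shard Routing Algorithm

shard = hash(_routing) % number_of_primary_shards

The hash function spreads documents evenly across shards
The default _routing value is the document id
A custom _routing value can be supplied, for example to place all products from the same country on the same shard
This formula is the fundamental reason the number of primary shards cannot be changed freely once the index settings are in place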
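
The effect of the routing formula can be illustrated with a small Python sketch. It assumes CRC32 from the standard library as a stand-in hash (Elasticsearch uses its own hash function), and `shard_for` is a hypothetical helper name.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards (CRC32 as a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1, 101)]

# The same routing value always maps to the same shard.
placement_3 = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}

# Recomputing with 4 primary shards sends many documents elsewhere,
# so existing documents could no longer be found where they were written.
placement_4 = {doc_id: shard_for(doc_id, 4) for doc_id in doc_ids}
moved = [d for d in doc_ids if placement_3[d] != placement_4[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Since the shard is recomputed from the routing value on every read, changing the divisor invalidates the location of documents that are already stored, which is why a change in primary shard count requires reindexing.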

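How Shards Work Internally

What is an ES shard

It is the smallest unit of work in ES: a Lucene index

Some questions:

Why is ES search near real time
How does ES keep data from being lost on power failure
Why does deleting a document not free space immediately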

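Immutability of the Inverted Index

The inverted index follows an immutable design: once generated, it cannot be changed
Benefits of immutability:
- No concurrent writes to the files need to be handled, which avoids the performance cost of locking
- Once an index is read into the kernel's file system cache, it stays there. As long as the file system cache has enough space, most requests are served from memory without hitting disk, which improves performance considerably
- Caches are easy to build and maintain, and the data can be compressed
The challenge of immutability: making a new document searchable requires rebuilding the whole index.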

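Lucene Index

In Lucene, a single inverted index file is called a Segment. Segments are self-contained and immutable; a collection of Segments forms a Lucene Index, which corresponds to a Shard in ES
When new documents are written, a new Segment is generated. A query searches all Segments and merges the results. Lucene keeps a file that records all Segments, called the Commit Point
Information about deleted documents is stored in ".del" files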

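What is Refresh

Writing the Index Buffer into a Segment is called a Refresh. A Refresh does not perform an fsync
Refresh frequency: once per second by default, configurable via index.refresh_interval. After a Refresh the data becomes searchable, which is why Elasticsearch search is near real time
Heavy write traffic therefore produces many Segments
A Refresh is also triggered when the Index Buffer fills up; its default size is 10% of the JVM heap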

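What is the Transaction Log

Writing a Segment to disk is relatively slow, so a Refresh first writes the Segment to the file system cache to make it searchable
To ensure no data is lost, every document that is indexed is also written to the Transaction Log. In recent versions the Transaction Log is flushed to disk by default. Each shard has its own Transaction Log
On an ES Refresh the Index Buffer is cleared, but the Transaction Log is not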

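What is Flush

ES Flush & Lucene Commit

A Flush performs these steps:
Calls Refresh, which empties the Index Buffer
Calls fsync, writing the Segments in the cache to disk
Clears the Transaction Log
A Flush is triggered every 30 minutes by default, or when the Transaction Log is full (512 MB by default)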
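
The interplay of Index Buffer, Refresh, Transaction Log, and Flush can be modelled with a toy class. This is a simplified sketch for intuition only: `ToyShard` and its attributes are invented here, and real shards involve far more machinery.

```python
class ToyShard:
    """Toy model of one shard: index buffer, segments, and transaction log."""

    def __init__(self):
        self.index_buffer = []  # documents not yet searchable
        self.segments = []      # searchable segments (in the file system cache)
        self.translog = []      # operations not yet covered by a commit
        self.committed = 0      # number of segments "fsynced" to disk

    def index(self, doc):
        # Each write goes to the buffer and to the transaction log.
        self.index_buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # The buffer becomes a new segment; no fsync, translog untouched.
        if self.index_buffer:
            self.segments.append(tuple(self.index_buffer))
            self.index_buffer = []

    def flush(self):
        # Refresh, "fsync" every segment, then clear the translog.
        self.refresh()
        self.committed = len(self.segments)
        self.translog = []

    def search(self, predicate):
        # Only segments are searched, never the index buffer.
        return [d for seg in self.segments for d in seg if predicate(d)]


shard = ToyShard()
shard.index({"id": 1, "name": "Emma"})
before = shard.search(lambda d: d["id"] == 1)  # empty: not refreshed yet
shard.refresh()
after = shard.search(lambda d: d["id"] == 1)   # the document is now searchable
translog_after_refresh = len(shard.translog)   # still 1
shard.flush()
translog_after_flush = len(shard.translog)     # 0
```

The gap between `index` and `refresh` is the "near real time" delay, and the transaction log is what would be replayed if the process died before `flush`.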

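What is Merge

Segments accumulate and need to be merged periodically
- Merging reduces the number of Segments and physically removes documents that were marked as deleted
ES and Lucene run Merge operations automatically
- A Merge can also be forced with POST my_index/_forcemerge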
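
A rough sketch of what a merge achieves, using plain Python lists as stand-ins for segments. The `merge_segments` function and the sample documents are hypothetical.

```python
def merge_segments(segments, deleted_ids):
    """Merge several segments into one, dropping documents marked as deleted."""
    merged = [
        doc
        for segment in segments
        for doc in segment
        if doc["id"] not in deleted_ids
    ]
    return [merged]  # a single, larger segment


segments = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
    [{"id": 4}, {"id": 5}],
]
deleted_ids = {2, 5}  # what a ".del" file would record

merged = merge_segments(segments, deleted_ids)
print(len(merged), [d["id"] for d in merged[0]])  # 1 [1, 3, 4]
```

Until a merge rewrites the segments, deleted documents are only filtered out at query time, which is why deleting a document does not free space immediately.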

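Distributed Search

An Elasticsearch search runs in two phases:

Phase 1: Query

Phase 2: Fetch

The user sends a search request to an ES node. On receiving it, the node acts as the Coordinating Node: for an index with 3 primary shards and 1 replica each, it picks one copy of each shard (3 of the 6 primary and replica shards) and sends the query to them
The selected shards run the query and sort the results. Each shard then returns From + Size sorted document IDs and sort values to the Coordinating Node
The Coordinating Node re-sorts the document ID lists obtained from the shards in the Query phase and selects the IDs of documents From through From + Size
It then fetches the full documents from the relevant shards with a multi get request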
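
The two phases can be sketched in Python as follows. The shard contents, scores, and function names are all invented for illustration; the sketch only mirrors the flow described above.

```python
import heapq

# Hypothetical data: each shard maps doc id -> (score, source).
SHARDS = [
    {"a": (0.9, "doc a"), "b": (0.4, "doc b"), "c": (0.7, "doc c")},
    {"d": (0.8, "doc d"), "e": (0.1, "doc e")},
    {"f": (0.6, "doc f"), "g": (0.5, "doc g")},
]


def query_phase(shard, from_, size):
    """Each shard returns its top from_ + size (score, doc_id) pairs."""
    hits = [(score, doc_id) for doc_id, (score, _) in shard.items()]
    return heapq.nlargest(from_ + size, hits)


def search(shards, from_, size):
    # Query phase: collect from_ + size candidates from every shard.
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in query_phase(shard, from_, size):
            candidates.append((score, doc_id, shard_no))
    # Coordinating node: re-sort globally and keep the requested page.
    candidates.sort(reverse=True)
    page = candidates[from_:from_ + size]
    # Fetch phase: retrieve full documents from the shards that own them.
    return [shards[shard_no][doc_id][1] for _, doc_id, shard_no in page]


print(search(SHARDS, from_=2, size=2))  # ['doc c', 'doc f']
```

Only IDs and sort values travel in the Query phase; full documents are pulled in the Fetch phase for the final page alone.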

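Potential Problems with Query Then Fetch

Performance:

Each shard has to collect From + Size documents
The coordinating node ends up processing number_of_shards * (From + Size) entries
This makes deep pagination expensive

Relevance scoring

Each shard computes relevance scores from its own data only, so scores are independent between shards and can deviate noticeably. When the total number of documents is small and there is more than one primary shard, the more primary shards there are, the less accurate the scores become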
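
To see why per-shard scoring drifts on small indices, compare a BM25-style inverse document frequency computed per shard with the same quantity computed over all documents. The shard sizes and term counts below are made-up numbers chosen only to make the gap visible.

```python
import math


def idf(doc_count: int, docs_with_term: int) -> float:
    """BM25-style inverse document frequency."""
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))


# Hypothetical tiny index: 20 documents split over two shards,
# with the search term unevenly distributed.
shard_a = {"docs": 10, "with_term": 1}
shard_b = {"docs": 10, "with_term": 6}

idf_a = idf(shard_a["docs"], shard_a["with_term"])
idf_b = idf(shard_b["docs"], shard_b["with_term"])
idf_global = idf(20, 7)

print(round(idf_a, 3), round(idf_b, 3), round(idf_global, 3))
```

The same term is weighted very differently on the two shards, so otherwise identical documents receive different scores depending on where they live. Elasticsearch offers the dfs_query_then_fetch search type, which gathers term statistics from all shards before scoring, at the cost of an extra round trip.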

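Pagination & Traversal

From: the starting position
Size: the number of documents to return

ES is distributed by nature: the data for a query is stored on multiple shards across multiple machines, yet the results still have to be sorted globally (by relevance score by default)

For a query with From=990, Size=10:

Each shard collects 1000 documents. The Coordinating Node then gathers all results, sorts them, and takes the top 1000, of which the last 10 are returned
The deeper the page, the more memory is used. To bound the memory cost of deep pagination, ES limits From + Size to 10000 documents by default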
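
The cost of deep pagination is simple arithmetic. The sketch below assumes a hypothetical index with 5 primary shards; `coordinator_entries` and `MAX_RESULT_WINDOW` are illustrative names, with the constant mirroring the default limit of 10000 mentioned above.

```python
MAX_RESULT_WINDOW = 10_000  # default limit on from + size


def coordinator_entries(number_of_shards: int, from_: int, size: int) -> int:
    """Entries the coordinating node must sort: number_of_shards * (from + size)."""
    if from_ + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return number_of_shards * (from_ + size)


# From=990, Size=10 on a hypothetical index with 5 primary shards.
print(coordinator_entries(5, 990, 10))   # 5000 entries sorted to return 10 documents
print(coordinator_entries(5, 9990, 10))  # 50000 entries for the same page size
```

The page size stays at 10 in both calls; only the offset grows, and the work grows with it.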

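Search After: Avoiding Deep Pagination

Search After avoids the performance problem of deep pagination and fetches the next page of documents in real time
- Specifying a page offset (From) is not supported
- Paging is forward only
The first search must specify a sort whose values are unique (adding _id to the sort guarantees uniqueness)
Each following search passes the sort values of the last document from the previous page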
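
The cursor idea behind Search After can be sketched with sorted tuples. The documents and the `search_after_page` helper are hypothetical; the sort key (age, _id) is unique because _id acts as the tiebreaker.

```python
# Hypothetical hits, each identified by its sort key (age, _id).
DOCS = [(25, "3"), (25, "5"), (26, "4"), (27, "7"), (27, "8"), (29, "18"), (30, "13")]


def search_after_page(docs, size, search_after=None):
    """Return the next `size` documents whose sort key is greater than the cursor."""
    ordered = sorted(docs)
    if search_after is not None:
        ordered = [d for d in ordered if d > search_after]
    return ordered[:size]


page1 = search_after_page(DOCS, size=3)
page2 = search_after_page(DOCS, size=3, search_after=page1[-1])
page3 = search_after_page(DOCS, size=3, search_after=page2[-1])
print(page1)  # [(25, '3'), (25, '5'), (26, '4')]
print(page2)  # [(27, '7'), (27, '8'), (29, '18')]
print(page3)  # [(30, '13')]
```

Each request only needs `size` entries past the cursor, regardless of how deep the page is. Without the _id tiebreaker, the two documents with age 25 would share a sort key and one of them could be skipped or repeated at a page boundary.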

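Bucket & Metric Aggregations and Nested Aggregations

Metric: a set of statistical methods
Bucket: a group of documents that satisfy a condition

Metric Aggregation

Single-value analysis

max, min, avg, sum
cardinality (similar to distinct count)

Multi-value analysis

stats, extended stats
percentile, percentile rank
top hits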

Demo

Creating sample data

# Define the mapping for the employees index
PUT /employees/ 
{
  "mappings":{
    "properties":{
      "age":{
        "type": "integer"
      },
      "gender":{
        "type": "keyword"
      },
      "job":{
        "type": "text",
        "fields":{
          "keyword": {
            "type": "keyword",
            "ignore_above": 50
          }
        }
      },
      "name":{
        "type": "keyword"
      },
      "salary":{
        "type" : "integer"
      }
    }
  }
}
# Insert the data
PUT /employees/_bulk
{ "index" : {  "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : {  "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : {  "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : {  "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : {  "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : {  "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : {  "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : {  "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : {  "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : {  "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : {  "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : {  "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : {  "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : {  "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : {  "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : {  "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : {  "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : {  "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : {  "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

Test examples

# Metric aggregation: find the lowest salary
POST employees/_search
{
  "size":0,
  "aggs": {
    "min_salary": {
      "min": {
        "field": "salary"
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_salary" : {
      "value" : 9000.0
    }
  }
}
# Metric aggregation: find the highest salary
POST employees/_search
{
  "size":0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_salary" : {
      "value" : 50000.0
    }
  }
}
# Several metric aggregations at once: lowest, highest, and average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    },
    "min_salary": {
      "min": {
        "field": "salary"
      }
    },
    "avg_salary": {
      "avg": {
        "field": "salary"
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_salary" : {
      "value" : 50000.0
    },
    "avg_salary" : {
      "value" : 24700.0
    },
    "min_salary" : {
      "value" : 9000.0
    }
  }
}
# A single aggregation that outputs multiple values: stats
POST employees/_search
{
  "size": 0,
  "aggs": {
    "stats_salary": {
      "stats": {
        "field":"salary"
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stats_salary" : {
      "count" : 20,
      "min" : 9000.0,
      "max" : 50000.0,
      "avg" : 24700.0,
      "sum" : 494000.0
    }
  }
}
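
As a cross-check of the stats output, the same figures can be recomputed in plain Python from the 20 salaries in the bulk request. This only verifies the arithmetic and says nothing about how Elasticsearch computes it internally.

```python
# Salaries of employees 1..20, in the order of the bulk request.
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

stats = {
    "count": len(salaries),
    "min": float(min(salaries)),
    "max": float(max(salaries)),
    "avg": sum(salaries) / len(salaries),
    "sum": float(sum(salaries)),
}
print(stats)
# {'count': 20, 'min': 9000.0, 'max': 50000.0, 'avg': 24700.0, 'sum': 494000.0}
```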

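Bucket Aggregation

Documents are assigned to different buckets according to some rule, which classifies them. ES provides the common Bucket Aggregations:

Terms
Numeric and date types
- Range / Date Range
- Histogram / Date Histogram
Nesting is supported (buckets within buckets)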

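Terms Aggregation

Terms Aggregation on a text field requires fielddata to be enabled
- keyword fields support Terms Aggregation by default
- text fields must enable fielddata in the mapping; buckets are then built from the analyzed terms

# Aggregate on the keyword sub-field of job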
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword"
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "jobs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Programmer",
          "doc_count" : 7
        },
        {
          "key" : "Javascript Programmer",
          "doc_count" : 4
        },
        {
          "key" : "QA",
          "doc_count" : 3
        },
        {
          "key" : "DBA",
          "doc_count" : 2
        },
        {
          "key" : "Web Designer",
          "doc_count" : 2
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1
        },
        {
          "key" : "Product Manager",
          "doc_count" : 1
        }
      ]
    }
  }
}

Running an aggregation on a text field requires enabling fielddata

# Enable fielddata on the text field so that it supports terms aggregation
PUT employees/_mapping
{
  "properties" : {
    "job":{
       "type":     "text",
       "fielddata": true
    }
  }
}
# Terms aggregation on the text field: the buckets are the analyzed terms
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job"
      }
    }
  }
}
# Query result; the buckets differ from those of the keyword field
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "jobs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "programmer",
          "doc_count" : 11
        },
        {
          "key" : "java",
          "doc_count" : 7
        },
        {
          "key" : "javascript",
          "doc_count" : 4
        },
        {
          "key" : "qa",
          "doc_count" : 3
        },
        {
          "key" : "dba",
          "doc_count" : 2
        },
        {
          "key" : "designer",
          "doc_count" : 2
        },
        {
          "key" : "manager",
          "doc_count" : 2
        },
        {
          "key" : "web",
          "doc_count" : 2
        },
        {
          "key" : "dev",
          "doc_count" : 1
        },
        {
          "key" : "product",
          "doc_count" : 1
        }
      ]
    }
  }
}
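
The difference between the two results can be reproduced with `collections.Counter`. The sketch approximates the analyzer by lowercasing and splitting on whitespace, which is an assumption that happens to hold for these simple job titles.

```python
from collections import Counter

# Job titles of the 20 sample employees.
jobs = (["Product Manager", "Dev Manager"] + ["Web Designer"] * 2 + ["QA"] * 3
        + ["Java Programmer"] * 7 + ["Javascript Programmer"] * 4 + ["DBA"] * 2)

# keyword: the whole value is one term.
keyword_buckets = Counter(jobs)

# text: approximate the analyzer by lowercasing and splitting on whitespace;
# each document counts once per distinct term it contains.
text_buckets = Counter(term for job in jobs for term in set(job.lower().split()))

print(keyword_buckets["Java Programmer"])       # 7
print(text_buckets["programmer"])               # 11
print(len(keyword_buckets), len(text_buckets))  # 7 10
```

"programmer" reaches 11 because it appears in both the 7 Java Programmer and the 4 Javascript Programmer documents, which is why aggregating on the text field answers a different question than aggregating on job.keyword.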

Counting distinct terms

# Cardinality of job.keyword. Terms aggregations on job.keyword and job produce different numbers of buckets
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": {
        "field": "job.keyword"
      }
    }
  }
}
# Query result
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "cardinate" : {
      "value" : 7
    }
  }
}

Bucketing by gender

# Aggregate on the gender keyword field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "gender": {
      "terms": {
        "field":"gender"
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "gender" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "male",
          "doc_count" : 12
        },
        {
          "key" : "female",
          "doc_count" : 8
        }
      ]
    }
  }
}

Specifying size

# Limit the number of buckets with size
POST employees/_search
{
  "size": 0,
  "aggs": {
    "ages_5": {
      "terms": {
        "field":"age",
        "size":3
      }
    }
  }
}
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ages_5" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 12,
      "buckets" : [
        {
          "key" : 25,
          "doc_count" : 3
        },
        {
          "key" : 32,
          "doc_count" : 3
        },
        {
          "key" : 27,
          "doc_count" : 2
        }
      ]
    }
  }
}

Bucket Size

# With top_hits size 3: full details of the 3 oldest employees in each job
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword"
      },
      "aggs":{
        "old_employee":{
          "top_hits":{
            "size":3,
            "sort":[
              {
                "age":{
                  "order":"desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
# Query result
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "jobs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Programmer",
          "doc_count" : 7,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 7,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "11",
                  "_score" : null,
                  "_source" : {
                    "name" : "Jenny",
                    "age" : 36,
                    "job" : "Java Programmer",
                    "gender" : "female",
                    "salary" : 38000
                  },
                  "sort" : [
                    36
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "15",
                  "_score" : null,
                  "_source" : {
                    "name" : "King",
                    "age" : 33,
                    "job" : "Java Programmer",
                    "gender" : "male",
                    "salary" : 28000
                  },
                  "sort" : [
                    33
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "9",
                  "_score" : null,
                  "_source" : {
                    "name" : "Gregory",
                    "age" : 32,
                    "job" : "Java Programmer",
                    "gender" : "male",
                    "salary" : 22000
                  },
                  "sort" : [
                    32
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Javascript Programmer",
          "doc_count" : 4,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 4,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "14",
                  "_score" : null,
                  "_source" : {
                    "name" : "Marshall",
                    "age" : 32,
                    "job" : "Javascript Programmer",
                    "gender" : "male",
                    "salary" : 25000
                  },
                  "sort" : [
                    32
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "18",
                  "_score" : null,
                  "_source" : {
                    "name" : "Catherine",
                    "age" : 29,
                    "job" : "Javascript Programmer",
                    "gender" : "female",
                    "salary" : 20000
                  },
                  "sort" : [
                    29
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "17",
                  "_score" : null,
                  "_source" : {
                    "name" : "Goodwin",
                    "age" : 25,
                    "job" : "Javascript Programmer",
                    "gender" : "male",
                    "salary" : 16000
                  },
                  "sort" : [
                    25
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "QA",
          "doc_count" : 3,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "6",
                  "_score" : null,
                  "_source" : {
                    "name" : "Lucy",
                    "age" : 31,
                    "job" : "QA",
                    "gender" : "female",
                    "salary" : 25000
                  },
                  "sort" : [
                    31
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "7",
                  "_score" : null,
                  "_source" : {
                    "name" : "Byrd",
                    "age" : 27,
                    "job" : "QA",
                    "gender" : "male",
                    "salary" : 20000
                  },
                  "sort" : [
                    27
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "5",
                  "_score" : null,
                  "_source" : {
                    "name" : "Rose",
                    "age" : 25,
                    "job" : "QA",
                    "gender" : "female",
                    "salary" : 18000
                  },
                  "sort" : [
                    25
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "19",
                  "_score" : null,
                  "_source" : {
                    "name" : "Boone",
                    "age" : 30,
                    "job" : "DBA",
                    "gender" : "male",
                    "salary" : 30000
                  },
                  "sort" : [
                    30
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "20",
                  "_score" : null,
                  "_source" : {
                    "name" : "Kathy",
                    "age" : 29,
                    "job" : "DBA",
                    "gender" : "female",
                    "salary" : 20000
                  },
                  "sort" : [
                    29
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Web Designer",
          "doc_count" : 2,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "4",
                  "_score" : null,
                  "_source" : {
                    "name" : "Rivera",
                    "age" : 26,
                    "job" : "Web Designer",
                    "gender" : "female",
                    "salary" : 22000
                  },
                  "sort" : [
                    26
                  ]
                },
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "3",
                  "_score" : null,
                  "_source" : {
                    "name" : "Tran",
                    "age" : 25,
                    "job" : "Web Designer",
                    "gender" : "male",
                    "salary" : 18000
                  },
                  "sort" : [
                    25
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "2",
                  "_score" : null,
                  "_source" : {
                    "name" : "Underwood",
                    "age" : 41,
                    "job" : "Dev Manager",
                    "gender" : "male",
                    "salary" : 50000
                  },
                  "sort" : [
                    41
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Product Manager",
          "doc_count" : 1,
          "old_employee" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "employees",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : null,
                  "_source" : {
                    "name" : "Emma",
                    "age" : 32,
                    "job" : "Product Manager",
                    "gender" : "female",
                    "salary" : 35000
                  },
                  "sort" : [
                    32
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

# Range buckets

# Bucket salaries into ranges; a custom key can be defined for a bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field":"salary",
        "ranges":[
          {
            "to":10000
          },
          {
            "from":10000,
            "to":20000
          },
          {
            "key":">20000",
            "from":20000
          }
        ]
      }
    }
  }
}
# Query result
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "salary_range" : {
      "buckets" : [
        {
          "key" : "*-10000.0",
          "to" : 10000.0,
          "doc_count" : 1
        },
        {
          "key" : "10000.0-20000.0",
          "from" : 10000.0,
          "to" : 20000.0,
          "doc_count" : 4
        },
        {
          "key" : ">20000",
          "from" : 20000.0,
          "doc_count" : 15
        }
      ]
    }
  }
}
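
The doc_count values above follow from the range boundaries: `from` is inclusive and `to` is exclusive, so a salary of exactly 20000 lands in the last bucket. A short Python check over the sample salaries (the `in_range` helper is illustrative):

```python
# Salaries of employees 1..20, in the order of the bulk request.
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]

ranges = [
    {"to": 10000},
    {"from": 10000, "to": 20000},
    {"key": ">20000", "from": 20000},
]


def in_range(value, r):
    """`from` is inclusive and `to` is exclusive."""
    return r.get("from", float("-inf")) <= value < r.get("to", float("inf"))


doc_counts = [sum(in_range(s, r) for s in salaries) for r in ranges]
print(doc_counts)  # [1, 4, 15]
```

Note that the custom key ">20000" is slightly misleading for this reason: the bucket also contains the employees who earn exactly 20000.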

# Salary histogram: salaries from 0 to 100,000, in buckets of 5000
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histrogram": {
      "histogram": {
        "field":"salary",
        "interval":5000,
        "extended_bounds":{
          "min":0,
          "max":100000

        }
      }
    }
  }
}
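
The original shows no output for the histogram request. As a sketch of the bucketing rule, assuming each value falls into the bucket keyed by floor(value / interval) * interval and that extended_bounds forces empty buckets across the whole 0 to 100000 span, the counts for the sample salaries can be derived in Python:

```python
from collections import Counter

# Salaries of employees 1..20, in the order of the bulk request.
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]
interval, lo, hi = 5000, 0, 100000

# Each value falls in the bucket keyed by floor(value / interval) * interval.
counts = Counter((s // interval) * interval for s in salaries)

# extended_bounds: create buckets across the whole min..max span, even when empty.
buckets = {key: counts.get(key, 0) for key in range(lo, hi + interval, interval)}

non_empty = {k: v for k, v in buckets.items() if v}
print(len(buckets))  # 21
print(non_empty)
# {5000: 1, 15000: 4, 20000: 6, 25000: 3, 30000: 3, 35000: 2, 50000: 1}
```

These numbers are computed from the sample data, not copied from an actual Elasticsearch response.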

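Bucket sub-aggregations: a sub-aggregation can be either a Bucket or a Metric aggregation

# Nested aggregation 1: bucket by job type and compute salary statistics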
POST employees/_search
{
  "size": 0,
  "aggs": {
    "Job_salary_stats": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "salary": {
          "stats": {
            "field": "salary"
          }
        }
      }
    }
  }
}
# Query result
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "Job_salary_stats" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Programmer",
          "doc_count" : 7,
          "salary" : {
            "count" : 7,
            "min" : 9000.0,
            "max" : 38000.0,
            "avg" : 25571.428571428572,
            "sum" : 179000.0
          }
        },
        {
          "key" : "Javascript Programmer",
          "doc_count" : 4,
          "salary" : {
            "count" : 4,
            "min" : 16000.0,
            "max" : 25000.0,
            "avg" : 19250.0,
            "sum" : 77000.0
          }
        },
        {
          "key" : "QA",
          "doc_count" : 3,
          "salary" : {
            "count" : 3,
            "min" : 18000.0,
            "max" : 25000.0,
            "avg" : 21000.0,
            "sum" : 63000.0
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "salary" : {
            "count" : 2,
            "min" : 20000.0,
            "max" : 30000.0,
            "avg" : 25000.0,
            "sum" : 50000.0
          }
        },
        {
          "key" : "Web Designer",
          "doc_count" : 2,
          "salary" : {
            "count" : 2,
            "min" : 18000.0,
            "max" : 22000.0,
            "avg" : 20000.0,
            "sum" : 40000.0
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "salary" : {
            "count" : 1,
            "min" : 50000.0,
            "max" : 50000.0,
            "avg" : 50000.0,
            "sum" : 50000.0
          }
        },
        {
          "key" : "Product Manager",
          "doc_count" : 1,
          "salary" : {
            "count" : 1,
            "min" : 35000.0,
            "max" : 35000.0,
            "avg" : 35000.0,
            "sum" : 35000.0
          }
        }
      ]
    }
  }
}
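
A nested aggregation is conceptually a group-by followed by a per-group computation. The Python sketch below mirrors the request above on the sample data: the outer grouping plays the role of the terms aggregation and the inner dictionary the role of the stats sub-aggregation. Variable names are illustrative.

```python
from collections import defaultdict

# (job, salary) of employees 1..20, in the order of the bulk request.
employees = [
    ("Product Manager", 35000), ("Dev Manager", 50000),
    ("Web Designer", 18000), ("Web Designer", 22000),
    ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000), ("Java Programmer", 9000),
    ("Java Programmer", 38000), ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]

# Outer terms aggregation: one bucket per job.
by_job = defaultdict(list)
for job, salary in employees:
    by_job[job].append(salary)

# Inner stats aggregation, computed independently inside each bucket.
job_salary_stats = {
    job: {"count": len(s), "min": min(s), "max": max(s),
          "avg": sum(s) / len(s), "sum": sum(s)}
    for job, s in by_job.items()
}
print(job_salary_stats["Java Programmer"])
```

Adding a further level, such as gender inside job, would correspond to grouping on the pair (job, gender) before computing the statistics.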

# Multiple levels of nesting: bucket by job type, then by gender, and compute salary statistics
POST employees/_search
{
  "size": 0,
  "aggs": {
    "Job_gender_stats": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "gender_stats": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "salary_stats": {
              "stats": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
  }
}
# Query result
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "Job_gender_stats" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Programmer",
          "doc_count" : 7,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "male",
                "doc_count" : 5,
                "salary_stats" : {
                  "count" : 5,
                  "min" : 9000.0,
                  "max" : 32000.0,
                  "avg" : 22200.0,
                  "sum" : 111000.0
                }
              },
              {
                "key" : "female",
                "doc_count" : 2,
                "salary_stats" : {
                  "count" : 2,
                  "min" : 30000.0,
                  "max" : 38000.0,
                  "avg" : 34000.0,
                  "sum" : 68000.0
                }
              }
            ]
          }
        },
        {
          "key" : "Javascript Programmer",
          "doc_count" : 4,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "male",
                "doc_count" : 3,
                "salary_stats" : {
                  "count" : 3,
                  "min" : 16000.0,
                  "max" : 25000.0,
                  "avg" : 19000.0,
                  "sum" : 57000.0
                }
              },
              {
                "key" : "female",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 20000.0,
                  "max" : 20000.0,
                  "avg" : 20000.0,
                  "sum" : 20000.0
                }
              }
            ]
          }
        },
        {
          "key" : "QA",
          "doc_count" : 3,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "female",
                "doc_count" : 2,
                "salary_stats" : {
                  "count" : 2,
                  "min" : 18000.0,
                  "max" : 25000.0,
                  "avg" : 21500.0,
                  "sum" : 43000.0
                }
              },
              {
                "key" : "male",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 20000.0,
                  "max" : 20000.0,
                  "avg" : 20000.0,
                  "sum" : 20000.0
                }
              }
            ]
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "female",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 20000.0,
                  "max" : 20000.0,
                  "avg" : 20000.0,
                  "sum" : 20000.0
                }
              },
              {
                "key" : "male",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 30000.0,
                  "max" : 30000.0,
                  "avg" : 30000.0,
                  "sum" : 30000.0
                }
              }
            ]
          }
        },
        {
          "key" : "Web Designer",
          "doc_count" : 2,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "female",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 22000.0,
                  "max" : 22000.0,
                  "avg" : 22000.0,
                  "sum" : 22000.0
                }
              },
              {
                "key" : "male",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 18000.0,
                  "max" : 18000.0,
                  "avg" : 18000.0,
                  "sum" : 18000.0
                }
              }
            ]
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "male",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 50000.0,
                  "max" : 50000.0,
                  "avg" : 50000.0,
                  "sum" : 50000.0
                }
              }
            ]
          }
        },
        {
          "key" : "Product Manager",
          "doc_count" : 1,
          "gender_stats" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "female",
                "doc_count" : 1,
                "salary_stats" : {
                  "count" : 1,
                  "min" : 35000.0,
                  "max" : 35000.0,
                  "avg" : 35000.0,
                  "sum" : 35000.0
                }
              }
            ]
          }
        }
      ]
    }
  }
}

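Pipeline Aggregations

The pipeline concept: a pipeline aggregation takes the results of other aggregations as its input and aggregates them again.

The output of a pipeline aggregation is added to the existing results. Depending on where it is placed, there are two types:

sibling: the result sits at the same level as the existing aggregation
- Min, Max, Avg and Sum Bucket
- Stats and Extended Stats Bucket
- Percentiles Bucket
parent: the result is embedded in the buckets of the existing aggregation
- Derivative
- Cumulative Sum
- Moving Function (moving window)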

# Job type with the lowest average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job":{
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}


# Job type with the highest average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "max_salary_by_job":{
      "max_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}


# Average of the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "avg_salary_by_job":{
      "avg_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}


# Statistics over the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "stats_salary_by_job":{
      "stats_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}


# Percentiles of the per-job average salaries
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "percentiles_salary_by_job":{
      "percentiles_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
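
As a rough illustration of what the sibling pipeline aggregations above compute, the following Python sketch applies the same reductions to a list of per-job average salaries. The bucket values are made-up sample numbers rather than output from the employees index, and the function name sibling_pipeline is hypothetical.

```python
from statistics import mean

# Hypothetical per-job average salaries, standing in for the
# "jobs>avg_salary" buckets_path (not real query output).
job_buckets = [
    {"key": "Java Programmer", "avg_salary": 25000.0},
    {"key": "QA", "avg_salary": 21000.0},
    {"key": "DBA", "avg_salary": 25000.0},
    {"key": "Dev Manager", "avg_salary": 50000.0},
]


def sibling_pipeline(buckets, metric):
    """Reduce one metric across all buckets, as the *_bucket aggregations do."""
    values = [b[metric] for b in buckets]
    lowest, highest = min(values), max(values)
    return {
        # min_bucket / max_bucket report the value and the matching bucket keys
        "min_bucket": {"value": lowest,
                       "keys": [b["key"] for b in buckets if b[metric] == lowest]},
        "max_bucket": {"value": highest,
                       "keys": [b["key"] for b in buckets if b[metric] == highest]},
        "avg_bucket": {"value": mean(values)},
        "stats_bucket": {"count": len(values), "min": lowest, "max": highest,
                         "avg": mean(values), "sum": sum(values)},
    }


result = sibling_pipeline(job_buckets, "avg_salary")
print(result["min_bucket"])
print(result["avg_bucket"])
```

The result is reported next to the jobs aggregation rather than inside each bucket, which is what "sibling" refers to.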



# Derivative of the average salary by age
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "derivative_avg_salary":{
          "derivative": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}


# Cumulative sum of the average salary by age
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "cumulative_salary":{
          "cumulative_sum": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}

# Moving function: minimum of the average salary over a window of 10 age buckets
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "moving_avg_salary":{
          "moving_fn": {
            "buckets_path": "avg_salary",
            "window":10,
            "script": "MovingFunctions.min(values)"
          }
        }
      }
    }
  }
}
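
The three parent pipeline aggregations above can be sketched in plain Python as follows. This is a simplified model under stated assumptions: the histogram buckets are invented sample values, the helper names are hypothetical, and the moving window is taken to cover the previous buckets without the current one, with an empty window shown as None.

```python
# Hypothetical (age, avg_salary) histogram buckets, ordered by age.
age_buckets = [(20, 9000.0), (21, 16000.0), (25, 14000.0), (26, 18000.0)]


def derivative(values):
    """Difference to the previous bucket; the first bucket has none."""
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]


def cumulative_sum(values):
    """Running total up to and including each bucket."""
    totals, running = [], 0.0
    for v in values:
        running += v
        totals.append(running)
    return totals


def moving_min(values, window):
    """Minimum over the previous `window` buckets, excluding the current one."""
    out = []
    for i in range(len(values)):
        previous = values[max(0, i - window):i]
        out.append(min(previous) if previous else None)
    return out


salaries = [salary for _, salary in age_buckets]
print(derivative(salaries))
print(cumulative_sum(salaries))
print(moving_min(salaries, window=2))
```

Each function returns one value per bucket, mirroring how a parent pipeline aggregation writes its result into every bucket of the enclosing histogram.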

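Scope and Sorting

By default, an Elasticsearch aggregation operates on the result set of the query.

Elasticsearch also supports the following ways of changing the scope of an aggregation:

Filter
Post Filter
Global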

# Scope
# Query scope: the aggregation only sees the documents matched by the query
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword"
        
      }
    }
  }
}


# Filter scope: the filter only narrows the older_person aggregation; all_jobs still sees every document
POST employees/_search
{
  "size": 0,
  "aggs": {
    "older_person": {
      "filter": {
        "range": {
          "age": {
            "gte": 35
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    },
    "all_jobs": {
      "terms": {
        "field":"job.keyword"
        
      }
    }
  }
}



# Post filter: a single request aggregates over all job types, while the returned hits are filtered to those matching the condition
POST employees/_search
{
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  },
  "post_filter": {
    "match": {
      "job.keyword": "Dev Manager"
    }
  }
}


# Global: the "all" aggregation ignores the query and runs over every document
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "gte": 40
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword"
        
      }
    },
    
    "all":{
      "global":{},
      "aggs":{
        "salary_avg":{
          "avg":{
            "field":"salary"
          }
        }
      }
    }
  }
}
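
The four scopes can be compared side by side with a small in-memory model. The sketch below is only an approximation of the behaviour described above: the documents are invented, and the search function with its query and post_filter callables is a hypothetical stand-in for the real request.

```python
from collections import Counter

# Hypothetical documents standing in for the employees index.
employees = [
    {"job": "Java Programmer", "age": 25, "salary": 20000},
    {"job": "Java Programmer", "age": 41, "salary": 30000},
    {"job": "Dev Manager", "age": 45, "salary": 50000},
    {"job": "QA", "age": 36, "salary": 22000},
]


def terms(docs):
    """Count documents per job, like a terms aggregation on job.keyword."""
    return dict(Counter(d["job"] for d in docs))


def search(docs, query=None, post_filter=None):
    matched = [d for d in docs if query is None or query(d)]
    return {
        # Default scope: aggregations see the documents matched by the query.
        "jobs": terms(matched),
        # filter aggregation: narrows the scope of this one aggregation only.
        "older_person": terms([d for d in matched if d["age"] >= 35]),
        # global aggregation: ignores the query and sees every document.
        "all_avg_salary": sum(d["salary"] for d in docs) / len(docs),
        # post_filter: applied to the hits after aggregations were computed.
        "hits": [d for d in matched if post_filter is None or post_filter(d)],
    }


response = search(
    employees,
    query=lambda d: d["age"] >= 40,
    post_filter=lambda d: d["job"] == "Dev Manager",
)
print(response["jobs"], response["all_avg_salary"], len(response["hits"]))
```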

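Sorting:

Specify order to sort the buckets by count and by key

By default, buckets are sorted by count in descending order
Specify size to control how many buckets are returned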

# Sorting with order
# By count, then by key
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword",
        "order":[
          {"_count":"asc"},
          {"_key":"desc"}
          ]
        
      }
    }
  }
}


# Sorting with order
# By a sub-aggregation: average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword",
        "order":[  {
            "avg_salary":"desc"
          }]
        
        
      },
    "aggs": {
      "avg_salary": {
        "avg": {
          "field":"salary"
        }
      }
    }
    }
  }
}


# Sorting with order
# By one value of a multi-value sub-aggregation: stats_salary.min
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field":"job.keyword",
        "order":[  {
            "stats_salary.min":"desc"
          }]
        
        
      },
    "aggs": {
      "stats_salary": {
        "stats": {
          "field":"salary"
        }
      }
    }
    }
  }
}
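
The ordering rules used in the three requests above can be mimicked with Python's sorted. This is a sketch on invented bucket values; the helper names are hypothetical and only model how the buckets end up ordered, not how Elasticsearch computes them across shards.

```python
# Hypothetical terms buckets with a nested avg_salary metric.
buckets = [
    {"key": "QA", "doc_count": 3, "avg_salary": 21000.0},
    {"key": "DBA", "doc_count": 2, "avg_salary": 25000.0},
    {"key": "Web Designer", "doc_count": 2, "avg_salary": 20000.0},
    {"key": "Java Programmer", "doc_count": 7, "avg_salary": 25571.0},
]


def order_by_count_then_key(items):
    """Model of "order": [{"_count": "asc"}, {"_key": "desc"}]."""
    # Python's sort is stable, so sort by the secondary criterion first.
    by_key = sorted(items, key=lambda b: b["key"], reverse=True)
    return sorted(by_key, key=lambda b: b["doc_count"])


def order_by_metric(items, metric, descending=True):
    """Model of ordering by a sub-aggregation such as avg_salary."""
    return sorted(items, key=lambda b: b[metric], reverse=descending)


by_count = [b["key"] for b in order_by_count_then_key(buckets)]
by_salary = [b["key"] for b in order_by_metric(buckets, "avg_salary")]
print(by_count)
print(by_salary)
```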

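UpdateByQuery & Reindex

Use cases:

Re-indexing is generally needed in the following situations:

The index mapping changes: field types, analyzers, or dictionary updates
The index settings change: for example, the number of primary shards
Data has to be migrated within a cluster or between clusters

APIs built into Elasticsearch:

UpdateByQuery: re-indexes the documents in place, in the existing index
Reindex: rebuilds the index into a different index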

Case 1

# Rebuild the index
DELETE blogs/

# Index a document
PUT blogs/_doc/1
{
  "content":"Hadoop is cool",
  "keyword":"hadoop"
}

# View the mapping
GET blogs/_mapping

# Update the mapping: add a sub-field that uses the english analyzer
PUT blogs/_mapping
{
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer":"english"
            }
          }
        }
      }
    }
# Index another document
PUT blogs/_doc/2
{
  "content":"Elasticsearch rocks",
    "keyword":"elasticsearch"
}

# Search for the newly indexed document
POST blogs/_search
{
  "query": {
    "match": {
      "content.english": "Elasticsearch"
    }
  }

}

# Search for the document indexed before the mapping change (the new sub-field was not populated for it)
POST blogs/_search
{
  "query": {
    "match": {
      "content.english": "Hadoop"
    }
  }
}


# Update all documents
POST blogs/_update_by_query
{

}

# After running update_by_query, search again for the document indexed earlier
POST blogs/_search
{
  "query": {
    "match": {
      "content.english": "Hadoop"
    }
  }
}
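
Why the first document only becomes searchable through content.english after update_by_query can be shown with a toy inverted index. Everything in this Python sketch is a hypothetical simplification: ToyIndex is not an Elasticsearch class, and the analyzer merely lowercases and splits on whitespace, whereas the real english analyzer also stems and removes stop words.

```python
def analyze_english(text):
    """Very rough stand-in for the english analyzer: lowercase and split."""
    return text.lower().split()


class ToyIndex:
    """Hypothetical toy index: sub-fields are only populated at index time."""

    def __init__(self):
        self.sub_fields = set()   # sub-fields present in the mapping
        self.sources = {}         # doc id -> original _source
        self.inverted = {}        # (field, term) -> set of doc ids

    def put_mapping(self, sub_field):
        # Changing the mapping does not touch documents that already exist.
        self.sub_fields.add(sub_field)

    def index(self, doc_id, source):
        self.sources[doc_id] = source
        for field in ["content"] + sorted(self.sub_fields):
            for term in analyze_english(source["content"]):
                self.inverted.setdefault((field, term), set()).add(doc_id)

    def update_by_query(self):
        # Re-index every document in place from its stored _source.
        for doc_id, source in list(self.sources.items()):
            self.index(doc_id, source)

    def match(self, field, text):
        hits = set()
        for term in analyze_english(text):
            hits |= self.inverted.get((field, term), set())
        return sorted(hits)


idx = ToyIndex()
idx.index(1, {"content": "Hadoop is cool"})
idx.put_mapping("content.english")
idx.index(2, {"content": "Elasticsearch rocks"})
before = idx.match("content.english", "Hadoop")
idx.update_by_query()
after = idx.match("content.english", "Hadoop")
print(before, after)
```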

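Case 2: changing the mapping of an existing field

Elasticsearch does not allow the type of an existing field to be changed in its mapping
The only option is to create a new index with the correct field types and re-import the data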

# View the current mapping
GET blogs/_mapping
# Result: the keyword field is of type text
{
  "blogs" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer" : "english"
            },
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "keyword" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
# Trying to change the type fails: Elasticsearch does not allow the type of an existing field to be modified
PUT blogs/_mapping
{
        "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer" : "english"
            }
          }
        },
        "keyword" : {
          "type" : "keyword"
        }
      }
}
# Create a new index with the new mapping
PUT blogs_fix/
{
  "mappings": {
        "properties" : {
        "content" : {
          "type" : "text",
          "fields" : {
            "english" : {
              "type" : "text",
              "analyzer" : "english"
            }
          }
        },
        "keyword" : {
          "type" : "keyword"
        }
      }    
  }
}
# Reindex API
POST  _reindex
{
  "source": {
    "index": "blogs"
  },
  "dest": {
    "index": "blogs_fix"
  }
}
# Check the new index
GET  blogs_fix/_doc/1
# Query result
{
  "_index" : "blogs_fix",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "content" : "Hadoop is cool",
    "keyword" : "hadoop"
  }
}
# Test a terms aggregation
POST blogs_fix/_search
{
  "size": 0,
  "aggs": {
    "blog_keyword": {
      "terms": {
        "field": "keyword",
        "size": 10
      }
    }
  }
}
# The field is now of type keyword; a terms aggregation needs a keyword field (or a text field with fielddata enabled)
# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "blog_keyword" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "elasticsearch",
          "doc_count" : 1
        },
        {
          "key" : "hadoop",
          "doc_count" : 1
        }
      ]
    }
  }
}

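Reindex summary

The Reindex API copies documents from one index to another

Scenarios for using the Reindex API:

Changing the number of primary shards of an index
Changing the type of a field in the mapping
Migrating data within a cluster or to and from another cluster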
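
For data migration it can help to copy only the documents that are still missing from the destination. The Python sketch below just builds the request body for POST _reindex as a dictionary; the helper name build_reindex_body and its only_missing flag are hypothetical, while op_type and conflicts are options of the Reindex API.

```python
def build_reindex_body(source_index, dest_index, only_missing=False):
    """Build the JSON body for POST _reindex as a plain dict."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if only_missing:
        # op_type "create" only adds documents absent from the destination;
        # conflicts "proceed" keeps going instead of aborting on a conflict.
        body["dest"]["op_type"] = "create"
        body["conflicts"] = "proceed"
    return body


plain = build_reindex_body("blogs", "blogs_fix")
resumable = build_reindex_body("blogs", "blogs_fix", only_missing=True)
print(plain)
print(resumable)
```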

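Ingest Pipeline & Painless Script

Ingest Node

A node type introduced in ES 5.0. With the default configuration, every node is an ingest node

It can pre-process data by intercepting Index and Bulk API requests
It transforms the data and hands it back to the Index or Bulk API

Data can be pre-processed without Logstash, for example:

Setting a default value for a field, renaming a field, or splitting a field
Running Painless scripts for more complex processing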

Demo

Create a document

# Blog data with three fields; tags are separated by commas
PUT tech_blogs/_doc/1
{
  "title":"Introducing big data......",
  "tags":"hadoop,elasticsearch,spark",
  "content":"You konw, for big data"
}

# Simulate a pipeline that splits tags on commas
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You konw, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}

# Also add a field to each document: the number of blog views
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set":{
          "field": "views",
          "value": 0
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You konw, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}
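
What the split and set processors do to each document can be modelled in a few lines of Python. This is a minimal sketch, assuming a plain comma separator (the real split processor treats the separator as a regular expression); the function names are hypothetical and not part of any Elasticsearch client.

```python
import copy


def split_processor(doc, field, separator):
    """Split a string field into a list, like the split processor."""
    value = doc[field]
    if not isinstance(value, str):
        raise TypeError(f"field [{field}] is not a string, cannot split")
    doc[field] = value.split(separator)


def set_processor(doc, field, value):
    """Set a field to a fixed value, like the set processor."""
    doc[field] = value


def simulate(processors, docs):
    """Run every processor, in order, over a copy of each document."""
    results = []
    for source in docs:
        doc = copy.deepcopy(source)
        for processor, kwargs in processors:
            processor(doc, **kwargs)
        results.append(doc)
    return results


pipeline = [
    (split_processor, {"field": "tags", "separator": ","}),
    (set_processor, {"field": "views", "value": 0}),
]
docs = [{"title": "Introducing big data......",
         "tags": "hadoop,elasticsearch,spark"}]
processed = simulate(pipeline, docs)
print(processed[0])
```

As with the _simulate endpoint, the input documents are left untouched and only the transformed copies are returned.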

The requests above only simulate a pipeline. Once testing is done, create a stored pipeline in Elasticsearch:

PUT _ingest/pipeline/blog_pipeline
{
  "description": "a blog pipeline",
  "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },

      {
        "set":{
          "field": "views",
          "value": 0
        }
      }
    ]
}

# View the pipeline
GET _ingest/pipeline/blog_pipeline

# Test the stored pipeline; only the array of documents needs to be supplied
POST _ingest/pipeline/blog_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}

# Test 2: clear the index
DELETE tech_blogs

# Index a document without the pipeline
PUT tech_blogs/_doc/1
{
  "title":"Introducing big data......",
  "tags":"hadoop,elasticsearch,spark",
  "content":"You konw, for big data"
}

# Index a document through the pipeline
PUT tech_blogs/_doc/2?pipeline=blog_pipeline
{
  "title": "Introducing cloud computering",
  "tags": "openstack,k8s",
  "content": "You konw, for cloud"
}


# Look at the two documents: one has been processed, the other has not
POST tech_blogs/_search
{}

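# update_by_query over all documents fails: document 2 was already processed, so its tags field is an array and cannot be split again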
POST tech_blogs/_update_by_query?pipeline=blog_pipeline
{
}

# Add a condition to update_by_query: only documents that do not have a views field yet
POST tech_blogs/_update_by_query?pipeline=blog_pipeline
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "views"
                }
            }
        }
    }
}
# Search again; this time document 1 has been processed by the pipeline as well
POST tech_blogs/_search
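
The failure of the unconditional update_by_query, and the effect of the must_not exists condition, can be reproduced with a small in-memory model. The Python sketch below is an assumption-laden simplification: the function names and the error message are invented, and the two documents only carry the fields needed for the illustration.

```python
def blog_pipeline(doc):
    """Toy version of blog_pipeline: split tags, then set views to 0."""
    if not isinstance(doc["tags"], str):
        raise TypeError("field [tags] is not a string, cannot split")
    return {**doc, "tags": doc["tags"].split(","), "views": 0}


index = {
    1: {"tags": "hadoop,elasticsearch,spark"},      # indexed without the pipeline
    2: {"tags": ["openstack", "k8s"], "views": 0},  # already processed
}


def update_by_query(docs, pipeline, query=lambda doc: True):
    """Re-run the pipeline over every document that matches the query."""
    return {doc_id: pipeline(doc) if query(doc) else doc
            for doc_id, doc in docs.items()}


try:
    update_by_query(index, blog_pipeline)
    failed = False
except TypeError:
    failed = True   # document 2 already holds a list in tags

# must_not exists(views): only touch documents that were never processed.
updated = update_by_query(index, blog_pipeline,
                          query=lambda d: "views" not in d)
print(failed, updated)
```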

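Some built-in processors

Split: splits a field into an array
Remove / Rename: removes or renames a field
Append: appends values to an array field, for example a new tag
Convert: converts a field to another type, for example from string to float
Date / JSON: parses dates; converts a JSON string into a JSON object
Date Index Name: routes documents passing through the processor to an index named after a date pattern
Fail: raises an exception so that the error message configured in the pipeline is returned to the user
Foreach: applies the same processor to every element of an array field
Grok: parses log lines using patterns
Gsub / Join / Split: string replacement, array to string, string to array
Lowercase / Uppercase: converts case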

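Painless

Introduced in ES 5.x and designed specifically for Elasticsearch, with a syntax that extends Java's
Since 6.0, Groovy, JavaScript and Python scripts are no longer supported; Painless is the scripting language to use
Painless supports all Java data types and a subset of the Java API
Painless scripts have the following characteristics:
- High performance and security
- Support for explicit types or dynamically defined types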

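Uses of Painless:

Processing document fields

Updating or deleting fields, and processing data in aggregations
Script Field: computing values for returned fields at query time
Function Score: customizing how documents are scored

Running scripts in an ingest pipeline

Processing data during Reindex and Update By Query operations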

#########Demo for Painless###############

# Add a Script processor
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "script": {
          "source": """
          if(ctx.containsKey("content")){
            ctx.content_length = ctx.content.length();
          }else{
            ctx.content_length=0;
          }


          """
        }
      },

      {
        "set":{
          "field": "views",
          "value": 0
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You konw, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}
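
For readers less familiar with Painless, the logic of the script processor above reads as follows when rendered in Python. The function name content_length_script is hypothetical, and ctx is modelled as an ordinary dictionary.

```python
def content_length_script(ctx):
    """Python rendering of the Painless script in the script processor."""
    if "content" in ctx:
        ctx["content_length"] = len(ctx["content"])
    else:
        ctx["content_length"] = 0
    return ctx


with_content = content_length_script({"content": "You konw, for cloud"})
without_content = content_length_script({"title": "no content"})
print(with_content["content_length"], without_content["content_length"])
```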


DELETE tech_blogs
PUT tech_blogs/_doc/1
{
  "title":"Introducing big data......",
  "tags":"hadoop,elasticsearch,spark",
  "content":"You konw, for big data",
  "views":0
}

POST tech_blogs/_update/1
{
  "script": {
    "source": "ctx._source.views += params.new_views",
    "params": {
      "new_views":100
    }
  }
}

# Check the views counter
POST tech_blogs/_search
{

}

# Store the script in the cluster state
POST _scripts/update_views
{
  "script":{
    "lang": "painless",
    "source": "ctx._source.views += params.new_views"
  }
}

POST tech_blogs/_update/1
{
  "script": {
    "id": "update_views",
    "params": {
      "new_views":1000
    }
  }
}
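
The interplay between a stored script and its params can be sketched in Python. The dictionary stored_scripts and the helpers put_script and update are hypothetical stand-ins for the cluster state and the _update API; the script body mirrors ctx._source.views += params.new_views.

```python
# Hypothetical in-memory stand-in for scripts stored in the cluster state.
stored_scripts = {}


def put_script(script_id, func):
    stored_scripts[script_id] = func


def update(doc, script_id, params):
    """Run a stored script against a document with the given params."""
    ctx = {"_source": doc}
    stored_scripts[script_id](ctx, params)
    return ctx["_source"]


def update_views(ctx, params):
    # Mirrors: ctx._source.views += params.new_views
    ctx["_source"]["views"] += params["new_views"]


put_script("update_views", update_views)
doc = {"title": "Introducing big data......", "views": 0}
update(doc, "update_views", {"new_views": 100})
update(doc, "update_views", {"new_views": 1000})
print(doc["views"])
```

Keeping the changing value in params rather than in the script source means the same script text is reused for every call, which is the usual reason for parameterizing scripts.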


GET tech_blogs/_search
{
  "script_fields": {
    "rnd_views": {
      "script": {
        "lang": "painless",
        "source": """
          java.util.Random rnd = new Random();
          doc['views'].value+rnd.nextInt(1000);
        """
      }
    }
  },
  "query": {
    "match_all": {}
  }
}

