Advanced Elasticsearch Usage: Hot-Cold Separation

Reposted article. 2018-04-18 10:20 · elasticsearch

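Background

User requirement: queries on recent data should be fast, while queries on older historical data may run more slowly.

For developers, this means separating hot and cold data. Two prerequisites must be met:

Hardware: hardware with different processing speeds, at minimum disks with different read/write speeds, such as SSDs and mechanical hard disks (HDDs).
Software configuration: the ability to store different data on different disks, for example recent data on SSD and older historical data on HDD. On Linux, storing data on different disks simply means storing it under different paths.

In Elasticsearch, hot-cold separation relies mainly on the shard allocation settings.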

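ES configuration: shard allocation settings

Custom shard allocation rules

cluster.routing.allocation.awareness.attributes
Uses one or more node attributes as the basis for distributing shards.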

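# Set the node attribute rack_id to the value rack_one
# (Elasticsearch 5.0 and later write custom node attributes as node.attr.<name>)
node.rack_id: rack_one
# Use the rack_id attribute as the shard allocation rule
cluster.routing.allocation.awareness.attributes: rack_id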

Several attributes can be used as allocation rules, for example:

cluster.routing.allocation.awareness.attributes: rack_id,zone

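Note: when awareness attributes are set, shards are not allocated to nodes that have no value set for those attributes.

Forced awareness

When we do not want all copies of a shard to end up on a group of nodes that share the same awareness attribute value, we can force awareness by declaring in advance the values the attribute can take.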

cluster.routing.allocation.awareness.force.zone.values: zone1,zone2  
cluster.routing.allocation.awareness.attributes: zone

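Start two nodes with node.zone set to zone1, then create an index with 5 shards and 1 replica. Once the index is created, only the 5 primary shards are allocated (no replicas). The replicas are allocated only after we start a node with node.zone set to zone2.

The configuration above makes the zone attribute the allocation rule and declares zone1 and zone2 as its values. Because a replica is not allocated to the same class of nodes as its primary, the replicas go to the zone2 nodes.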
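
The zone1/zone2 walkthrough above can be illustrated with a small model. This is a simplified sketch, not Elasticsearch's actual allocation decider: it assumes each declared zone value may hold at most an even share of a shard's copies, and it ignores how many nodes each zone has. The function name allocatable_copies is hypothetical.

```python
import math

def allocatable_copies(replicas, forced_values, live_values):
    """Copies of ONE shard (primary + replicas) that can be placed.

    Simplified model: each forced attribute value may hold at most
    ceil(copies / len(forced_values)) copies, and only values that
    currently have live nodes can hold any copy at all.
    """
    copies = 1 + replicas
    per_value_cap = math.ceil(copies / len(forced_values))
    usable = [v for v in forced_values if v in live_values]
    return min(copies, per_value_cap * len(usable))

forced = ["zone1", "zone2"]
# Only zone1 nodes are running: the primary is placed, the replica waits.
only_zone1 = allocatable_copies(1, forced, {"zone1"})
# After a zone2 node joins, the replica can be allocated as well.
both_zones = allocatable_copies(1, forced, {"zone1", "zone2"})
print(only_zone1, both_zones)
```

Under this model the 5-shard, 1-replica index places 5 copies while only zone1 is up and all 10 once zone2 joins, which matches the behaviour described in the text.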

Shard allocation filtering

include/exclude filters control where shards are allocated. These filters can be set at the index level or at the cluster level.

# One line per node, each in that node's own configuration file
node.tag: hot
node.tag: cold
node.tag: value3

"index.routing.allocation.include.tag": "hot"
"index.routing.allocation.exclude.tag" : "value3"

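The values of include and exclude filters are matched with wildcard support, for example value*. One special attribute name is _ip, which matches a node's IP address.
A node can of course have several attributes; all attribute names and values are set in its configuration file. For example, the following configures one node with two attributes: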

node.group1: group1_value1   
node.group2: group2_value4

In the same way, several include and exclude filters can be set together, for example:

curl -XPUT localhost:9200/test/_settings -d '{
    "index.routing.allocation.include.group1" : "xxx",
    "index.routing.allocation.include.group2" : "yyy",
    "index.routing.allocation.exclude.group3" : "zzz"
}'

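These settings can be applied to a live index through the index update settings API, which makes it possible to move index shards in real time.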
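
To make the include/exclude semantics concrete, here is a minimal sketch of how such filters could be evaluated against a node's attributes. It is a simplified model and not Elasticsearch code: it assumes `*` is the only wildcard, that a node passes include when any include attribute matches, and that any exclude match rejects the node. All function names are hypothetical.

```python
import re

def simple_match(pattern, value):
    """Match value against a pattern in which '*' is the only wildcard."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, value) is not None

def any_filter_matches(filters, node_attrs):
    """True if any filter attribute matches one of its comma-separated patterns."""
    for attr, patterns in filters.items():
        value = node_attrs.get(attr)
        if value is None:
            continue  # the node does not define this attribute
        if any(simple_match(p.strip(), value) for p in patterns.split(",")):
            return True
    return False

def node_allowed(node_attrs, include=None, exclude=None):
    """Decide whether a shard may be placed on a node (simplified model)."""
    if exclude and any_filter_matches(exclude, node_attrs):
        return False
    if include and not any_filter_matches(include, node_attrs):
        return False
    return True

nodes = [{"tag": "hot"}, {"tag": "cold"}, {"tag": "value3"}]
allowed = [n["tag"] for n in nodes
           if node_allowed(n, include={"tag": "hot"}, exclude={"tag": "value*"})]
print(allowed)
```

With the three node.tag values used in the text, only the hot node passes both filters.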

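Cluster-wide filters can also be defined and applied to a running cluster through the cluster update settings API. They are useful for decommissioning nodes. The following excludes a node by its IP address: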

curl -XPUT localhost:9200/_cluster/settings -d '{      
    "transient" : {      
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"      
    }      
}'
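
The same two settings updates can also be prepared from a script. The sketch below only builds the HTTP requests with the Python standard library and does not send them; the helper name settings_request is hypothetical, and the Content-Type header is an assumption that matters on newer Elasticsearch releases, which reject bodies without it.

```python
import json
import urllib.request

def settings_request(base_url, path, settings):
    """Build (but do not send) a PUT request carrying a JSON settings body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Index-level filter, same body as the curl example for the test index.
index_req = settings_request(
    "http://localhost:9200",
    "/test/_settings",
    {
        "index.routing.allocation.include.group1": "xxx",
        "index.routing.allocation.include.group2": "yyy",
        "index.routing.allocation.exclude.group3": "zzz",
    },
)

# Cluster-level filter that drains the node with IP 10.0.0.1.
cluster_req = settings_request(
    "http://localhost:9200",
    "/_cluster/settings",
    {"transient": {"cluster.routing.allocation.exclude._ip": "10.0.0.1"}},
)

print(index_req.get_method(), index_req.full_url)
print(cluster_req.get_method(), cluster_req.full_url)
```

Passing either request to urllib.request.urlopen would send it to a running cluster at that address.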

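Hot-cold separation in practice

step1: Designate hot and cold nodes

node.tag: hot                     # on each hot node
node.tag: cold                    # on each cold node
node.max_local_storage_nodes: 2   # allow two ES processes per machine (optional)

step2: Create indices on a time-based schedule, for example one per day or one per week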

Index template logstash: the template applied to every index whose name matches logstash*.

PUT /_template/logstash
{
        "order": 0,
        "template": "logstash*",
        "settings": {
            "index.routing.allocation.include.tag": "hot",
            "index.refresh_interval": "30s",
            "index.number_of_replicas": "1",
            "index.number_of_shards": "1",
            "index.translog.flush_threshold_ops": "30000"
        }
}

"index.routing.allocation.include.tag": "hot" means that newly created indices are allocated to nodes where node.tag = hot.
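
As an illustration of time-based index naming, the sketch below generates one index name per day in the logstash-YYYY.MM.DD form, which the logstash* template pattern matches. The helper name daily_index_name is hypothetical, and the date format is an assumption chosen to agree with the '%Y.%m.%d' timestring used in the Curator configuration in this article.

```python
from datetime import date, timedelta

def daily_index_name(day, prefix="logstash-"):
    """Name of the index that holds one calendar day of data."""
    return prefix + day.strftime("%Y.%m.%d")

start = date(2018, 4, 16)
names = [daily_index_name(start + timedelta(days=i)) for i in range(3)]
print(names)
```

Because every generated name starts with logstash, each new daily index picks up the template settings and is created on the hot nodes.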

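step3: Use a scheduled job to move historical indices to the cold nodes.
The newest indices live on hot nodes, and historical indices are moved to cold nodes on a schedule. There are 2 ways to do this:

Option 1: write your own script that marks historical indices as cold: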

PUT /index_name/_settings
{
   "index.routing.allocation.include.tag" : "cold"
}

The data of the old index then migrates automatically to the cold nodes.
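
A home-grown script for this step mostly needs to decide which indices are old enough. The following is a minimal sketch under stated assumptions: index names end in a %Y.%m.%d date, "old" means dated at least a given number of days before today, and the list of index names is supplied by the caller rather than fetched from a cluster. The function name indices_older_than is hypothetical.

```python
from datetime import date, datetime, timedelta

# Settings body that retargets an index to the cold nodes (from the text).
COLD_SETTINGS = {"index.routing.allocation.include.tag": "cold"}

def indices_older_than(names, days, today, prefix="logstash-", fmt="%Y.%m.%d"):
    """Return the index names dated at least `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indices
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day <= cutoff:
            old.append(name)
    return old

names = [
    "logstash-2018.04.10",
    "logstash-2018.04.15",
    "logstash-2018.04.17",
    "logstash-latest",
    "metrics-2018.04.01",
]
to_move = indices_older_than(names, days=3, today=date(2018, 4, 18))
for name in to_move:
    # Each of these would receive COLD_SETTINGS via PUT /<name>/_settings.
    print("PUT /%s/_settings" % name, COLD_SETTINGS)
```

Run from a daily scheduled job, a script along these lines would apply the cold setting to each selected index.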

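Option 2: install Curator, the Elasticsearch command-line management tool, and write a Curator action file that moves historical indices to the cold nodes at 4 o'clock every day: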

pip install "elasticsearch-curator==5.*"

# Move logstash* indices older than 3 days to the cold nodes.
# Entries 1 and 2 belong under the top-level actions: key of the Curator action file.
# key/allocation_type match the node.tag attribute and the include filter used in the template.
1:
    action: allocation
    description: "Apply shard allocation filtering rules to the specified indices"
    options:
      key: tag
      value: cold
      allocation_type: include
      wait_for_completion: true
      timeout_override:
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

Force-merge the indices down to 1 segment per shard.

2:
    action: forcemerge
    description: "Perform a forceMerge on selected indices to 'max_num_segments' per shard"
    options:
      max_num_segments: 1
      delay:
      timeout_override: 21600 
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

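Discussion: read-write separation

Similar to read-write separation in MySQL, the idea is to keep replica data on specific nodes. This cannot be achieved through automatic configuration, but index shards can be assigned manually with the reroute API.

To take full manual control, however, you must first set cluster.routing.allocation.disable_allocation to true so that ES stops allocating shards automatically (newer releases replace this setting with cluster.routing.allocation.enable: none). Otherwise, when you move a shard from one node to another, ES moves one of the target node's shards back to the first node.

There are three operations: move, cancel, and allocate. Each is described below:

move
Moves a shard from one node to another. Takes the index name and shard number.
cancel
Cancels the allocation of a shard. Takes the index name and shard number. The node parameter names the node on which the in-progress allocation is cancelled. The allow_primary parameter permits cancelling the allocation of a primary shard.
allocate
Allocates an unassigned shard to a given node. Takes the index name and shard number. The node parameter names the target node. The allow_primary parameter can force allocation of a primary shard, though this may cause data loss.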

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [
        {
            "move" : {
                "index" : "test", "shard" : 0,
                "from_node" : "node1", "to_node" : "node2"
            }
        },
        {
            "cancel" : {
                "index" : "test", "shard" : 0, "node" : "node1"
            }
        },
        {
            "allocate" : {
                "index" : "test", "shard" : 1, "node" : "node3"
            }
        }
    ]
}'
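
Hand-written reroute bodies are easy to get wrong, so it can help to build them programmatically. The sketch below assembles the same three commands as Python dictionaries and serializes them; the helper names are hypothetical, and the command names simply mirror the ones used in this article (newer Elasticsearch releases changed the allocate command).

```python
import json

def move(index, shard, from_node, to_node):
    """Command that moves a started shard between two nodes."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}

def cancel(index, shard, node, allow_primary=False):
    """Command that cancels an in-progress allocation on a node."""
    body = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        body["allow_primary"] = True
    return {"cancel": body}

def allocate(index, shard, node, allow_primary=False):
    """Command that assigns an unassigned shard to a node."""
    body = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        body["allow_primary"] = True  # may lose data, per the text
    return {"allocate": body}

def reroute_body(*commands):
    """Serialize commands into the JSON body for POST /_cluster/reroute."""
    return json.dumps({"commands": list(commands)}, indent=2)

body = reroute_body(
    move("test", 0, "node1", "node2"),
    cancel("test", 0, "node1"),
    allocate("test", 1, "node3"),
)
print(body)
```

Since json.dumps produces the braces and commas, every command ends up as its own well-formed object in the commands array.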

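Notes:

Indexing on an ES replica is a real indexing process that consumes CPU and I/O, and an indexing request only counts as complete once the required number of shard copies has been written. So even if you send search requests only to replicas, writes can still be affected.

For that reason you can enable only one replica, because the requirement on the number of copies written applies only from two replicas upward.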
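
The "from two replicas upward" rule can be checked with a little arithmetic. The sketch below assumes the quorum write-consistency rule documented for older Elasticsearch releases (1.x and 2.x), where the quorum is int((primary + number_of_replicas) / 2) + 1 and is enforced only when number_of_replicas is greater than 1; later releases replaced this mechanism. The function name is hypothetical.

```python
def required_active_copies(number_of_replicas):
    """Shard copies that must be active before a write proceeds (quorum rule)."""
    if number_of_replicas <= 1:
        # Quorum is not enforced; the primary alone is enough.
        return 1
    # One primary plus the replicas, integer-halved, plus one.
    return (1 + number_of_replicas) // 2 + 1

for replicas in range(4):
    print(replicas, "replica(s) ->", required_active_copies(replicas), "active copies required")
```

With a single replica the write needs only the primary, so a slow or busy replica node does not block indexing on that condition.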

Querying hot and cold data

GET /_search?preference=xyzabc123
{
    "query": {
        "match": {
            "title": "elasticsearch"
        }
    }
}

https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-preference.html?q=preference
