High disk usage on Elasticsearch nodes leaves the ES cluster's indices without replicas

Reposted article. Published 2020-01-04 22:07. Category: ELK

I. The problem
II. Cause of the problem
III. Solutions
IV. Further notes

I. The problem

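While checking our production es cluster recently, I found that the indices from the last two days had no replicas and that the cluster status was yellow.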

II. Cause of the problem

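The server hosting es still had free disk space; usage was only about 89%. In principle es should have kept using the disk and created the index replicas, so we consulted the official documentation and found the following three settings.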

cluster.routing.allocation.disk.watermark.low

Controls the low watermark for disk usage. It defaults to 85%, meaning that Elasticsearch will not allocate shards to nodes that have more than 85% disk used. It can also be set to an absolute byte value (like 500mb) to prevent Elasticsearch from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices or, specifically, any shards that have never previously been allocated.

cluster.routing.allocation.disk.watermark.high

Controls the high watermark. It defaults to 90%, meaning that Elasticsearch will attempt to relocate shards away from a node whose disk usage is above 90%. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.

cluster.routing.allocation.disk.watermark.flood_stage

Controls the flood stage watermark. It defaults to 95%, meaning that Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.

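From this we can see that, by default, once disk usage on a node exceeds 85%, es stops allocating new shards to that node (the primary shards of newly created indices are the exception). Once usage exceeds 90%, es tries to relocate shards from that node to other nodes. Once usage exceeds 95%, every index that has a shard on that node is set to read-only. Our disk usage of about 89% was above the low watermark, so the primary shards of each new index were still created but their replicas could not be allocated. That is why the recent indices had no replicas and the cluster was yellow.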
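To make the three thresholds concrete, here is a minimal Python sketch that maps a node's disk usage percentage to the watermark it has crossed. The function name `watermark_state` and the labels it returns are hypothetical and not part of Elasticsearch; only the default thresholds (85%, 90%, 95%) come from the documentation quoted above.

```python
def watermark_state(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Return which disk watermark (if any) a node has exceeded.

    Hypothetical helper for illustration only. The defaults are the
    documented default watermarks.
    """
    if not low <= high <= flood_stage:
        raise ValueError("expected low <= high <= flood_stage")
    if used_percent > flood_stage:
        return "flood_stage"  # indices with shards on this node become read-only
    if used_percent > high:
        return "high"         # es tries to relocate shards away from this node
    if used_percent > low:
        return "low"          # no new shards allocated here (new primaries excepted)
    return "ok"


# The server in this article was at roughly 89% disk usage.
print(watermark_state(89))  # prints "low"
```

A node at 89% falls between the low and high watermarks, which matches the symptom: nothing is moved away, but new replica shards are not placed on it.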

III. Solutions

1. Expand the disk

……

2. Delete some old indices

See the article "elasticsearch scheduled index deletion, second version".
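The referenced article is not reproduced here, so the following is only a rough Python sketch of the selection step, not that article's script. It assumes daily indices named like `logstash-2020.01.04`; the prefix, the date format, and the 30-day retention are all assumptions for illustration.

```python
from datetime import date, datetime, timedelta


def indices_to_delete(index_names, today, keep_days=30, prefix="logstash-"):
    """Return the daily indices whose date suffix is older than keep_days.

    Assumes names such as "logstash-2020.01.04"; anything else is ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave this index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["logstash-2019.11.30", "logstash-2020.01.03", ".kibana"]
print(indices_to_delete(names, today=date(2020, 1, 4)))
# prints ['logstash-2019.11.30']
```

Each returned name would then be removed with a DELETE request to that index; the sketch deliberately stops before sending anything.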

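3. Change the es settings

Edit the configuration file (requires restarting es)
Change dynamically (via the API, no restart needed)

The es defaults for the low and high watermarks are 85% and 90%; we change them to 90% and 95%.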

3.1 Edit the configuration file (requires restarting es)

Add the following to elasticsearch.yml:

cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 90%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.routing.allocation.disk.watermark.flood_stage: 98%

3.2 Dynamic change

A dynamic change is one made through the es API. A transient change is temporary; a persistent change is permanent.

API endpoint: /_cluster/settings

Note that the cluster.routing.allocation.disk.watermark.flood_stage setting only exists from version 6.0 onward. Version 5 does not have it and does not support it: when I included it while changing a 5.6 cluster, the request failed with "reason":"persistent setting [cluster.routing.allocation.disk.watermark.flood_stage], not dynamically updateable"},"status":400

Official documentation for version 5.6: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/disk-allocator.html
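One way to avoid that error is to build the request body according to the cluster's major version. The sketch below is an assumption-laden illustration: `watermark_settings` is a hypothetical helper, and the caller is expected to know the major version already (for example from the root endpoint of the cluster).

```python
import json


def watermark_settings(major_version, low, high, flood_stage=None, scope="transient"):
    """Build a body for PUT /_cluster/settings.

    Hypothetical helper: flood_stage is only included for version 6 and
    later, because 5.x rejects it as described above.
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    prefix = "cluster.routing.allocation.disk.watermark."
    settings = {prefix + "low": low, prefix + "high": high}
    if flood_stage is not None and major_version >= 6:
        settings[prefix + "flood_stage"] = flood_stage
    return {scope: settings}


# On a 5.6 cluster the flood_stage value is silently left out.
print(json.dumps(watermark_settings(5, "90%", "95%", flood_stage="98%"), indent=2))
```

Dropping the key silently is a design choice made for brevity; raising an error instead would be equally reasonable if the caller should be told.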

View the current es settings

Send a GET request to /_cluster/settings to see the current es settings.

curl 172.1.2.208:9200/_cluster/settings
{
	"persistent": {
		"xpack": {
			"monitoring": {
				"collection": {
					"enabled": "true"
				}
			}
		}
	},
	"transient": {
		"cluster": {
			"routing": {
				"allocation": {
					"disk": {
						"watermark": {
							"low": "90%",
							"high": "95%"
						}
					}
				}
			},
			"info": {
				"update": {
					"interval": "1m"
				}
			}
		}
	}
}
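The response nests the settings, while the update requests in this article use dotted names. A small Python sketch of how the two forms correspond (the helper name `flatten_settings` is made up for this example; the sample data is the transient part of the response above):

```python
def flatten_settings(nested, parent=""):
    """Turn nested settings into a flat {dotted.name: value} dictionary."""
    flat = {}
    for key, value in nested.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


transient = {
    "cluster": {
        "routing": {"allocation": {"disk": {"watermark": {"low": "90%", "high": "95%"}}}},
        "info": {"update": {"interval": "1m"}},
    }
}
for name, value in sorted(flatten_settings(transient).items()):
    print(name, "=", value)
```

The API can also return this flat form directly when the request carries the `flat_settings=true` query parameter.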

Permanent change: persistent

The setting survives a restart.

{"persistent": 
   {  
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.info.update.interval": "1m"
    }
}

Temporary change: transient

The setting is lost after a full cluster restart.

{"transient": 
   {  
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.info.update.interval": "1m"
    }
}

Example:

root@111:~# curl -H "Content-Type: application/json"  -XPUT  172.1.2.208:9200/_cluster/settings  -d '{"transient": {    "cluster.routing.allocation.disk.watermark.low": "90%", "cluster.routing.allocation.disk.watermark.high": "95%", "cluster.info.update.interval": "1m"}}'

{"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"disk":{"watermark":{"low":"90%","high":"95%"}}}},"info":{"update":{"interval":"1m"}}}}}
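For scripts, the same request can be prepared with Python's standard library. This is a minimal sketch rather than a full client: `settings_request` is a hypothetical helper that only builds the request, and the address is the one used in the curl example above.

```python
import json
import urllib.request


def settings_request(base_url, body):
    """Build (but do not send) a PUT /_cluster/settings request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


body = {
    "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.info.update.interval": "1m",
    }
}
req = settings_request("http://172.1.2.208:9200", body)
print(req.get_method(), req.full_url)

# Sending it needs a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping the build step separate from the send step makes it easy to inspect the body before it touches a production cluster.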

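IV. Further notes

As the official documentation quoted above shows, the watermarks do not have to be percentages. They can also be set as an absolute size such as 500mb, in which case the value is the amount of free space that must remain on the node.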
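The difference between the two forms can be sketched in Python: a percentage is compared against used space, an absolute value against free space. The helper `exceeds_watermark` and its unit table are assumptions for illustration (units are treated as powers of 1024), not Elasticsearch's own parser.

```python
# Assumed unit table for this sketch; multipliers are powers of 1024.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def exceeds_watermark(watermark, total_bytes, free_bytes):
    """Return True if a node with the given disk figures is past the watermark."""
    value = watermark.strip().lower()
    if value.endswith("%"):
        used_percent = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_percent > float(value[:-1])
    # Check the two-letter units first so "500mb" is not read as "500m" + "b".
    for unit in ("kb", "mb", "gb", "tb", "b"):
        if value.endswith(unit):
            return free_bytes < float(value[: -len(unit)]) * _UNITS[unit]
    raise ValueError(f"unrecognised watermark value: {watermark!r}")


GB = 1024**3
# A 100 GB disk with 11 GB free is 89% used.
print(exceeds_watermark("85%", 100 * GB, 11 * GB))    # prints True
print(exceeds_watermark("500mb", 100 * GB, 11 * GB))  # prints False
```

With the same disk, the percentage form blocks allocation at 85% while a 500mb limit still leaves plenty of headroom, which is why absolute values can be the better fit for large disks.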
