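Pitfalls encountered while optimizing an Elasticsearch cluster
Original architecture
New architecture
The vision was beautiful, the process was painful, and the result is nice
After the architecture changes above, the ES cluster runs rock-solid. It handles aggregations and data writes with ease and is basically in an unbeatable position. Even in the worst case, where aggregations fail, simple log queries are not affected.
Of course, the migration ran into various problems along the way. Brief notes follow: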
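Problem 1: We changed the cluster architecture by converting the original master nodes directly into data nodes and deploying three newly added, lower-spec machines as the master nodes of a new cluster. The original master and data nodes were then supposed to join the new cluster as data nodes. The catch is that the original data nodes already hold data, and that data carries a cluster UUID marker, which is the UUID of the original cluster. Joining such a node to the new cluster fails with the following error:
"Caused by: org.elasticsearch.transport.RemoteTransportException: [hot-8][172.18.0.2:9300][internal:cluster/coordination/join/validate]",
"Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid yawFIpzSS-erlTvqNOLI_g than local cluster uuid IsAK0BSURZyZoVTI3eopfw, rejecting",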
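Solution: the simplest fix is to keep the original master nodes as master nodes.
But that raises another issue. In the new architecture the master nodes do not need high specs, while the original masters are very high-spec machines, so keeping them as masters wastes resources (most of us run in the cloud, and yearly or monthly billing for such configurations is notoriously expensive). The way around this is to start the original masters normally and make sure the data nodes can rejoin the original cluster, that is, restore the cluster to its original state. Once the cluster is healthy again (this is the key point), join the new master nodes to the old cluster. When all of them have joined, the cluster has 6 master nodes. Then take the old masters offline; because the cluster itself never changed, its UUID stays the same. Finally, convert the old masters into data nodes and join them to the cluster that is now run by the new masters. This cleanly resolves the first problem (which, admittedly, was a pit we fell into because the plan had not been fully thought through).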
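The two checks in this procedure can be sketched in Python. This is only a rough sketch under assumptions not stated in the post: the cluster is on a 7.x release where `GET /` on each node reports a `cluster_uuid` field, and the old masters are withdrawn from voting through `POST /_cluster/voting_config_exclusions` with the `node_names` parameter (7.8 or later) before they are shut down. The node names and base URL in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def same_cluster(root_responses):
    """Return True if every node's parsed `GET /` body reports the same cluster_uuid."""
    uuids = {resp["cluster_uuid"] for resp in root_responses}
    return len(uuids) == 1


def voting_exclusion_url(base_url, old_master_names):
    """Build the URL to POST to so the old masters stop counting as voters.

    Assumes the node_names form of the voting config exclusions API.
    """
    query = urlencode({"node_names": ",".join(old_master_names)}, safe=",")
    return f"{base_url.rstrip('/')}/_cluster/voting_config_exclusions?{query}"
```

For example, `voting_exclusion_url("http://localhost:9200", ["old-master-1", "old-master-2", "old-master-3"])` gives the URL to POST to once all 6 masters are in the cluster. After the old masters are stopped, a `DELETE /_cluster/voting_config_exclusions` clears the exclusion list, and `same_cluster` can confirm that old and new nodes still report one UUID.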
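Problem 2: various issues encountered while regenerating the certificates for the new cluster and setting passwords
# Generate passwords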
[root@28ef648fe6a6 elasticsearch]# bin/elasticsearch-setup-passwords interactive
Unexpected response code [500] from calling GET http://10.105.6.223:9200/_security/_authenticate?pretty
It doesn't look like the X-Pack security feature is enabled on this Elasticsearch node.
Please check if you have enabled X-Pack security in your elasticsearch.yml configuration file.
ERROR: X-Pack Security is disabled by configuration.
Fix: edit elasticsearch.yml and enable the X-Pack related settings.
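The post does not list which settings were turned on. A reasonable guess is `xpack.security.enabled: true` (usually together with `xpack.security.transport.ssl.enabled: true` on a multi-node cluster). Under that assumption, here is a small Python sketch that checks a config file before running the password tool. It only understands flat `key: value` lines, not nested YAML.

```python
def security_enabled(yml_text):
    """Return True if the elasticsearch.yml text sets xpack.security.enabled to true.

    Simplified parser: handles only flat dotted keys, one per line.
    """
    for raw in yml_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.strip() == "xpack.security.enabled":
            return value.strip().lower() == "true"
    return False
```

Running it over the contents of config/elasticsearch.yml on each node is a quick way to spot a node where the setting is still commented out.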
[root@79d23e3e6630 elasticsearch]# bin/elasticsearch-setup-passwords interactive
Your cluster health is currently RED.
This means that some cluster data is unavailable and your cluster is not fully functional.
It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.
Do you want to continue with the password setup process [y/N]y
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Connection failure to: http://10.105.5.201:9200/_security/user/apm_system/_password?pretty failed: Read timed out
ERROR: Failed to set password for user [apm_system].
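# Fix
# The cluster was in red status. Until it recovers, setting passwords keeps failing, so wait for the cluster to recover and then set the passwords.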
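Waiting for recovery can be automated with the cluster health API. The sketch below assumes that yellow status is good enough for the password tool (the warning above only mentions RED) and that a 60s timeout is acceptable; both values are illustrative, not taken from the post.

```python
from urllib.parse import urlencode


def health_wait_url(base_url, wanted="yellow", timeout="60s"):
    """Build a GET /_cluster/health URL that blocks until the status is at least `wanted`."""
    query = urlencode({"wait_for_status": wanted, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


def ready_for_passwords(health):
    """Decide from a parsed _cluster/health body whether to run the password tool."""
    if health.get("timed_out", False):
        return False  # the wait expired before the requested status was reached
    return health.get("status") in ("yellow", "green")
```

Only when `ready_for_passwords` returns True for the response would bin/elasticsearch-setup-passwords interactive be run.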
[root@79d23e3e6630 elasticsearch]# bin/elasticsearch-setup-passwords interactive
Failed to authenticate user 'elastic' against http://10.105.5.201:9200/_security/_authenticate?pretty
Possible causes include:
* The password for the 'elastic' user has already been changed on this cluster
* Your elasticsearch node is running against a different keystore
This tool used the keystore at /usr/share/elasticsearch/config/elasticsearch.keystore
ERROR: Failed to verify bootstrap password
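# Fix
# The passwords have already been set, and this tool cannot set them a second time. To set them again anyway, delete the elasticsearch.keystore file and restart the cluster.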
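If the current password of the elastic user is still known, a gentler alternative may be to change passwords through the same `_security/user/<name>/_password` endpoint that appears in the failed request earlier, instead of touching the keystore. This is a sketch under that assumption, not what the post did. It only builds the request; sending it with `urllib.request.urlopen` is left to the caller.

```python
import base64
import json
import urllib.request


def change_password_request(base_url, user, new_password, auth_user, auth_password):
    """Build (but do not send) a POST to /_security/user/<user>/_password."""
    url = f"{base_url.rstrip('/')}/_security/user/{user}/_password"
    # HTTP basic auth with an account that is allowed to manage users.
    token = base64.b64encode(f"{auth_user}:{auth_password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"password": new_password}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

All argument values are placeholders to be filled in by the operator; nothing here resets a password that has been lost.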
The end result: the cluster is now very stable.