I. Problem

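I had been looking into how to monitor ES and wanted to cut corners: rather than pulling metrics through the API and doing the calculations myself, I looked for a ready-made plugin or monitoring tool where installing an agent would be enough. That led me to x-pack. Once the plugin is installed, the ES cluster has to be restarted. For the production cluster I assumed that, since it is a cluster, restarting the nodes one at a time would be safe. I overestimated it: after restarting a single node, the whole cluster went down...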

II. What I did

1. System

[centos@ip-172-0-0-233 bin]$ cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core)

2. ES version

[centos@ip-172-0-0-233 bin]$ ./elasticsearch --version
Version: 5.0.2, Build: f6b4951/2016-11-24T10:07:18.101Z, JVM: 1.8.0_131

3. Killing the process

ps -ef | grep elasticsearch   # find the Elasticsearch PID
kill -9 <pid>                 # force-kill with SIGKILL

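I regretted this as soon as I had done it. Not every service should be stopped this way, and I do not know whether this step contributed to the cluster going down.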
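For comparison, Elasticsearch's documented rolling-restart procedure stops a node with a plain SIGTERM (`kill <pid>`, not `kill -9`) and wraps the restart in a few REST calls. The sketch below only builds that list of calls and never contacts a cluster. The helper name `rolling_restart_requests` is hypothetical, and this is not what was done in the incident described here.

```python
import json

def rolling_restart_requests():
    """Return the REST calls that bracket a single node restart.

    Hypothetical helper: it only describes the requests (method, path, body);
    sending them to the cluster is left to the caller.
    """
    def allocation(mode):
        # Transient settings do not survive a full cluster restart.
        return {"transient": {"cluster.routing.allocation.enable": mode}}

    before = [
        ("PUT", "/_cluster/settings", allocation("none")),  # pause shard reallocation
        ("POST", "/_flush/synced", None),                   # helps shards recover faster
    ]
    # Between the two lists: stop the node with SIGTERM, then start it again.
    after = [
        ("GET", "/_cat/nodes", None),                       # confirm the node rejoined
        ("PUT", "/_cluster/settings", allocation("all")),   # re-enable allocation
        ("GET", "/_cluster/health?wait_for_status=green", None),
    ]
    return before, after

if __name__ == "__main__":
    before, after = rolling_restart_requests()
    for method, path, body in before + after:
        print(method, path, json.dumps(body) if body else "")
```

Run the `before` calls, restart one node, run the `after` calls, and move on to the next node only once the cluster reports green.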

4. Error log

[2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [lang-painless]
[2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [percolator]
[2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [reindex]
[2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [transport-netty3]
[2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] loaded module [transport-netty4]
[2019-10-17T08:43:39,084][INFO ][o.e.p.PluginsService     ] [node-1] no plugins loaded
[2019-10-17T08:43:41,612][INFO ][o.e.n.Node               ] [node-1] initialized
[2019-10-17T08:43:41,613][INFO ][o.e.n.Node               ] [node-1] starting ...
[2019-10-17T08:43:41,812][INFO ][o.e.t.TransportService   ] [node-1] publish_address {172.0.0.16:9300}, bound_addresses {172.30.36.146:9300}
[2019-10-17T08:43:41,817][INFO ][o.e.b.BootstrapCheck     ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks

[2019-10-17T08:44:11,833][WARN ][o.e.n.Node               ] [node-1] timed out while waiting for initial discovery state - timeout: 30s
[2019-10-17T08:44:11,839][INFO ][o.e.h.HttpServer         ] [node-1] publish_address {172.0.0.16:9200}, bound_addresses {172.30.36.146:9200}
[2019-10-17T08:44:11,839][INFO ][o.e.n.Node               ] [node-1] started
[2019-10-17T08:44:12,001][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,001][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,003][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,010][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,010][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,228][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,758][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,759][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,760][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,814][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,814][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,815][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,815][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,817][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,817][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,817][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,820][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,820][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,821][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,822][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,822][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,823][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,824][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,826][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,827][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,827][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,828][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,828][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,830][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:12,830][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-1] no known master node, scheduling a retry
[2019-10-17T08:44:42,012][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2019-10-17T08:44:42,012][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2019-10-17T08:44:42,013][WARN ][r.suppressed             ] path: /_cluster/state/metadata, params: {metric=metadata}
org.elasticsearch.discovery.MasterNotDiscoveredException
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:214) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:350) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:240) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:957) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.0.2.jar:5.0.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2019-10-17T08:44:42,013][WARN ][r.suppressed             ] path: /_cluster/state/metadata, params: {metric=metadata}
org.elasticsearch.discovery.MasterNotDiscoveredException
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:214) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:350) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:240) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:957) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.0.2.jar:5.0.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2019-10-17T08:44:42,760][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [node-1] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2019-10-17T08:44:42,761][WARN ][r.suppressed             ] path: /_cluster/state/metadata, params: {metric=metadata}
org.elasticsearch.discovery.MasterNotDiscoveredException
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:214) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:350) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:240) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:957) [elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [elasticsearch-5.0.2.jar:5.0.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
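Read top to bottom, the log shows node-1 starting, waiting 30s for an initial discovery state, and then failing every master-level request because no master was found. A small sketch for condensing a log like this into counts per level and logger is given below. The regular expression is written against the line format shown above and is an assumption for any other log layout; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# [2019-10-17T08:44:11,833][WARN ][o.e.n.Node               ] [node-1] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+?)\s*\]\s*(?P<msg>.*)$"
)

def summarize(lines):
    """Count (level, logger) pairs; stack-trace lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts
```

Feeding the log file to `summarize` makes the pattern easier to see: one discovery timeout WARN from `o.e.n.Node`, followed by a flood of DEBUG retries and `r.suppressed` warnings.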

5. Configuration file

cluster.name: lile
node.name: node-1
bootstrap.memory_lock: true
network.host: 172.0.0.16
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.0.0.16","172.0.0.17","172.0.0.18"]
discovery.zen.minimum_master_nodes: 2
http.cors.enabled: true 
http.cors.allow-origin: "*"
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
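Since discovery depends on the nodes reaching each other on the transport port (9300 in the log above), one quick check is whether each host in discovery.zen.ping.unicast.hosts accepts a TCP connection. The sketch below is a plain socket test, not an Elasticsearch API call; `transport_reachable` is a hypothetical helper name, and a successful connection only shows that the port is open, not that the node will join.

```python
import socket

def transport_reachable(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `transport_reachable("172.0.0.17")` and `transport_reachable("172.0.0.18")` from node-1 would help rule out a firewall or routing problem before touching any discovery timeouts.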

III. Solution

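No amount of restarting helped. Everything I found online said a restart fixes it, but repeated restarts made no difference. With discovery.zen.minimum_master_nodes set to 1 the nodes did start, but all three of them became master. Later I came across the parameter below; after adding it and restarting every node, the cluster recovered.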

discovery.zen.ping_timeout: 60s
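On why setting discovery.zen.minimum_master_nodes to 1 produced three masters: with Zen discovery this value is meant to be a quorum of the master-eligible nodes, (N / 2) + 1. With a value of 1, a node that cannot see its peers may elect itself, which matches what happened here. A minimal sketch of the arithmetic, with hypothetical function names:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N / 2) + 1 using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1

def can_split(master_eligible, configured):
    """True if two disjoint groups of nodes could each reach the configured minimum."""
    return 2 * configured <= master_eligible
```

For the three-node cluster in this post, `minimum_master_nodes(3)` gives 2, the value already in the configuration file, and `can_split(3, 1)` is true while `can_split(3, 2)` is false. Raising discovery.zen.ping_timeout keeps the quorum at 2 and only gives the nodes longer to find each other.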

IV. Root cause analysis

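I have not dug into it yet. My guess is that the time the nodes were given to discover each other was too short, so they never found one another; with minimum_master_nodes set to 2, at least two nodes must see each other before the cluster can form.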
