While wiring up Spring with JedisCluster to operate a Redis cluster today, I suddenly hit a baffling exception:
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
I searched around for a long time without finding a fix, and eventually traced the problem to the cluster itself. Sure enough, one of the cluster nodes had died.
OK, let's solve it (this only covers detecting a downed cluster, or a cluster that does not match its expected configuration).
How do we confirm that the cluster has a problem?
1. Connect to any one of your cluster nodes with the client tool: ./redis-cli -h <ip> -p <port> -c
Enter a few test commands and see whether you get an error like the one below (CRC16 maps each key to a slot automatically, so a simple test is just set a a, set b b, and so on):
127.0.0.1:8001> set nima nia
-> Redirected to slot [16259] located at :0
Could not connect to Redis at :0: Name or service not known
Could not connect to Redis at :0: Name or service not known
not connected>
Note the redirect target ":0": an empty host and port means the node that owns slot 16259 has dropped out of the cluster.
2. Verify with redis-trib.rb (by default this file lives in the src directory of the extracted Redis source tree):
./redis-3.0.0/src/redis-trib.rb check IP:8001 | more    (8001 can be any port in your cluster; piping to more just pages the output while it scans every node)
[root@rebirth redis-cluster]# ./redis-trib.rb check 169.254.18.18:8001    (my redis-trib.rb is in the cluster directory)
Connecting to node 169.254.18.18:8001: OK
Connecting to node 169.254.18.18:8005: OK
Connecting to node 169.254.18.18:8006: OK
Connecting to node 169.254.18.18:8003: OK
Connecting to node 169.254.18.18:8002: OK
Connecting to node 169.254.18.18:8004: OK
>>> Performing Cluster Check (using node 169.254.18.18:8001)
M: d7d92360722c41bd710c9bfa8ebaff46df8230e2 169.254.18.18:8001
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: 6fa989430c84e882becc18af9064a89bf4a4d7de 169.254.18.18:8005
slots: (0 slots) slave
replicates cdba77220c27f07a24e7d93e61441a3219fac88d
S: 4804979442ac59b91558b2da228270b4218f1e90 169.254.18.18:8006
slots: (0 slots) slave
replicates 842bc2d2fe084dbae685953806c2b5f30f016e0b
M: 842bc2d2fe084dbae685953806c2b5f30f016e0b 169.254.18.18:8003
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: cdba77220c27f07a24e7d93e61441a3219fac88d 169.254.18.18:8002
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: f2ff252f8820291c9640474943ea7ebdccb08ddb 169.254.18.18:8004
slots: (0 slots) slave
replicates d7d92360722c41bd710c9bfa8ebaff46df8230e2
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
Check whether this output matches the cluster you configured: did every node connect, and is any node missing? If something is missing, your cluster configuration has a problem.
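Eyeballing a long check transcript is error-prone, so here is a small helper of my own (summarize_check is a hypothetical name, not part of redis-trib) that counts masters, slaves, and total assigned slots in a saved transcript. A healthy cluster assigns all 16384 hash slots.

```shell
# Hypothetical helper: summarize a `redis-trib.rb check` transcript.
# Counts "M:" / "S:" node lines and sums the "(N slots)" figures.
summarize_check() {
  awk '
    /^M:/ { masters++ }                     # master node line
    /^S:/ { slaves++ }                      # slave (replica) node line
    / slots\)/ {                            # a "slots:... (N slots) ..." line
      for (i = 1; i < NF; i++)
        if ($(i + 1) == "slots)") { n = $i; sub(/\(/, "", n); slots += n }
    }
    END { printf "masters=%d slaves=%d slots=%d\n", masters, slaves, slots }
  '
}

# Example: ./redis-trib.rb check 169.254.18.18:8001 | summarize_check
```

For the transcript above this would report masters=3 slaves=3 slots=16384; anything other than 16384 slots means coverage is incomplete.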
If the cluster configuration is indeed broken, read on.
Solution:
1. Delete the nodes.conf file from every Redis node. Wherever you see it under your cluster directories, rm -rf it and you're done.
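Rather than deleting the file node by node, you can sweep the whole cluster directory in one go. A minimal sketch (clean_nodes_conf is my own name; stop the Redis instances first, or they may rewrite the file on shutdown):

```shell
# Hypothetical helper: delete every nodes.conf under a cluster root.
# $1: the directory that contains your 8001..8006 node directories.
clean_nodes_conf() {
  find "$1" -type f -name 'nodes.conf' -print -delete
}

# Example: clean_nodes_conf /usr/local/redis-cluster
```

The -print flag lists each file as it is removed, so you can confirm one was found per node.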
2. Re-create the cluster. For example, with our six Redis instances (the full setup steps are beyond this post; search for a detailed guide):
./redis-trib.rb create --replicas 1 169.254.18.18:8001 169.254.18.18:8002 169.254.18.18:8003 169.254.18.18:8004 169.254.18.18:8005 169.254.18.18:8006
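Typing six host:port pairs by hand invites typos, so a small helper (cluster_nodes is a hypothetical name of mine) can expand a host and port range into the node list the create command expects:

```shell
# Hypothetical helper: print "host:first host:first+1 ... host:last".
cluster_nodes() {
  host=$1; first=$2; last=$3
  nodes=
  for port in $(seq "$first" "$last"); do
    nodes="$nodes $host:$port"
  done
  printf '%s\n' "${nodes# }"   # strip the leading space
}

# Example:
# ./redis-trib.rb create --replicas 1 $(cluster_nodes 169.254.18.18 8001 8006)
```

With --replicas 1, redis-trib takes the first three nodes as masters and assigns the remaining three as their replicas.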