Notes on problems encountered when integrating a Consul cluster with Spring Cloud

Reposted article. 2019-11-25 17:02

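A couple of days ago I wanted to form the production Consul servers into a cluster, but I only had two machines. A two-server cluster cannot survive the loss of either machine: Raft needs a quorum of more than half the servers, which for two servers means both, so at least three are required to tolerate one failure (https://www.consul.io/docs/internals/consensus.html#deployment-table). Even so, Consul started on both machines without any errors, and server:8500/ui/ showed that the services had indeed joined the cluster. In production, however, services dispatched through the gateway returned "微服務異常" ("microservice exception"), caused by Zuul reporting "a failure occurs on a route". As soon as I shut down the services on one of the machines, everything worked again.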

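To track this down, I first tried to reproduce and fix it locally with two machines: my own machine (10.0.42.94) and a development machine next to it (10.0.41.110). The Consul start commands for the two machines were: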

nohup /bin/bash -c '/opt/consul agent -server --retry-join=10.0.42.94 -ui -bootstrap-expect=2 -data-dir=/usr/local/consul -node=devslave -advertise=10.0.41.110 -bind=0.0.0.0 -client=0.0.0.0' > /data/logs/consul/consul.log &

consul.exe agent -server -ui -bootstrap-expect=2 -data-dir=D:\data-dir\consul -node=devmaster  -advertise=10.0.42.94 -bind=0.0.0.0 -client=0.0.0.0

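At this point the muc service was running on both 94 and 110. For requests sent to the gateway, Zuul looks up the service named muc and load-balances the requests across the machines. But when I used Postman to send requests to the gateway on 110, http://10.0.41.110:7979/muc/auth/code/image, every request went to 110, whereas when I sent requests to the local gateway, some went to 94 and some to 110. In other words, the gateway on 110 was not load balancing. Yet the services listed in the two Consul UIs, 10.0.42.94:8500/ui and 10.0.41.110:8500/ui, were identical. The one oddity was that the address of my local machine was shown as localhost [screenshot: Consul UI showing the local node's address as localhost].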

Clicking into the entry showed the details [screenshot: detail view for the local node].

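The address there turned out to be the intranet IP of the wireless network adapter, while the other machine showed the IP of its regular adapter. So I suspected this was the cause: because my machine had registered with Consul using the wireless adapter's IP, requests arriving at the other machine could not reach my machine and were therefore never dispatched to it, whereas requests sent to my machine could still find the service on the other machine. Why the wireless adapter's IP was picked is something I did not figure out, but after disabling the wireless adapter and restarting Consul and the services, the gateway's load balancing worked normally.
The Consul UI then looked like this [screenshot: Consul UI after the restart].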

As can be seen, the address is no longer localhost, and the health check address has also become the regular IP.
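
The question of why the wireless adapter was chosen stays open here. One option that was not tried in this post, but that Spring Cloud Commons offers, is to steer address selection through the spring.cloud.inetutils properties rather than disabling the adapter. A minimal sketch, assuming the wired network is 10.0.42.x (inferred from the machine's address above; the real subnet may differ):

```yaml
spring:
  cloud:
    inetutils:
      # Assumption: the wired adapter is on 10.0.42.x. Entries are matched
      # against each candidate address as a regex or as a plain prefix,
      # so an address on another subnet would not be preferred.
      preferred-networks:
        - 10.0.42
    consul:
      discovery:
        # Register the selected IP instead of the hostname.
        prefer-ip-address: true
```

Which file these settings go in matters, as the later part of this post finds out.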
With the local cluster working, I moved on to the test environment. The two machines were started with:

nohup /bin/bash -c '/opt/consul agent -server  -ui -bootstrap-expect=2 -data-dir=/usr/local/consul -node=testmaster -advertise=192.168.101.220 -bind=0.0.0.0 -client=0.0.0.0' > /data/logs/consul/consul.log &
nohup /bin/bash -c '/opt/consul agent -server --retry-join=192.168.101.220 -ui -bootstrap-expect=2 -data-dir=/usr/local/consul -node=testslave -advertise=192.168.101.221 -bind=0.0.0.0 -client=0.0.0.0' > /data/logs/consul/consul.log &

Unfortunately, the two machines could not elect a leader:

    2019/11/25 17:53:17 [WARN]  raft: not part of stable configuration, aborting election
    2019/11/25 17:53:18 [ERR] agent: failed to sync remote state: No cluster leader

After some googling I found https://learn.hashicorp.com/consul/day-2-operations/outage and https://support.hashicorp.com/hc/en-us/articles/115015603408-Consul-Errors-And-Warnings, which says:

[WARN] raft: not part of stable configuration, aborting election

-> This means you don’t have a complete peers.json on all the servers (the server is not seeing itself in the peer configuration). You’ll need to stop all the servers and create an identical peers.json file on each, which includes all the server IP:port pairs. Once they all have the same peers.json file you can start them again.

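In the end I deleted everything under /usr/local/consul (the -data-dir) on both machines and restarted, and the problem went away, presumably because this cleared stale Raft state left over from earlier runs. I did not create a peers.json as the documentation describes. Note that wiping the data directory also discards everything Consul had stored there.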

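[Screenshot: service registered in Consul with a hostname as its address.] As can be seen, a hostname was being used as the address here, but on 33, curl http://iZbp1guuix5grexo50gpgzZ:8000/actuator/health could not connect, so this was probably the problem.
So I added the following line to /etc/hosts, and it worked.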

192.168.101.221 iZbp1guuix5grexo50gpgzZ iZbp1guuix5grexo50gpgzZ

So why was a hostname used as the address? It turned out that only the config service was registered with an IP.

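Every project's application-dev.yml contained prefer-ip-address: true, but it only took effect in the Spring Cloud Config project, because all the other projects fetch their configuration from Spring Cloud Config. I then realised that the following settings in application-dev.yml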

    consul:
      host: rockysaas-consul
      port: 8500
      discovery:
        prefer-ip-address: true
        instance-id: instance-${spring.cloud.client.ip-address}-${spring.application.name}-${server.port}
        service-name: ${spring.application.name}

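had no effect at all (I could write any host there and nothing changed). It seems that once Spring Cloud Consul is combined with Spring Cloud Config, the Consul settings have to be placed in bootstrap.yml to take effect. (The "Distributed Configuration with Consul" section of the Spring Cloud Consul documentation mentions this: in that case the configuration is loaded during the "bootstrap" phase.)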
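
A minimal sketch of what moving the settings into bootstrap.yml could look like. It reuses the values from the application-dev.yml fragment above and assumes they sit under spring.cloud (the fragment's indentation suggests this, but the parent keys are not shown in the original). The application name muc is borrowed from earlier in the post purely for illustration:

```yaml
# bootstrap.yml (illustrative sketch, not the author's actual file)
spring:
  application:
    name: muc   # assumption: replace with the real service name
  cloud:
    consul:
      host: rockysaas-consul
      port: 8500
      discovery:
        # Register the instance by IP rather than by hostname.
        prefer-ip-address: true
        instance-id: instance-${spring.cloud.client.ip-address}-${spring.application.name}-${server.port}
        service-name: ${spring.application.name}
```

With the Consul settings loaded this early, a wrong host value should at least produce a visible effect, which is a quick way to confirm the file is actually being read.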
