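An earlier article covered taking a service offline gracefully, see:
How to safely and gracefully take a Spring Cloud service offline in Eureka
That approach is not smooth for Ribbon calls, however. As soon as the shutdown request arrives, the service stops immediately. Consumers have not yet noticed that the instance is gone, so they keep sending requests to it and those requests fail.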
There are three possible solutions. Solution 1: enable retries (only if the interfaces are idempotent).
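The idea behind solution 1 can be sketched in plain Java: when a call to one instance fails, the same call is retried against the next instance. `RetryingCaller` and `callWithRetry` are hypothetical names used only for illustration; in a real Spring Cloud setup this role would be played by Ribbon's own retry settings (for example `ribbon.MaxAutoRetriesNextServer`), not by hand-written code.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: retries an idempotent call against the next instance on failure. */
final class RetryingCaller {

    /**
     * Calls each server in order until one call succeeds.
     * This is only safe for idempotent calls, because a failed attempt
     * may already have been processed by the server.
     */
    static <T> T callWithRetry(List<String> servers, Function<String, T> call) {
        RuntimeException last = null;
        for (String server : servers) {
            try {
                return call.apply(server);
            } catch (RuntimeException ex) {
                last = ex; // remember the failure and move on to the next instance
            }
        }
        if (last == null) {
            throw new IllegalStateException("no servers available");
        }
        throw last; // every instance failed: surface the last error
    }
}
```

If the first instance has just been shut down, the caller only sees an error when every remaining instance fails as well.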
Solution 2: take the service offline with pause (recommended).
The steps are as follows:
1. Service provider configuration
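    # Disable security checks on the management endpoints
    management.security.enabled=false
    # Enable the pause endpoint
    endpoints.pause.enabled=true
    # Do not require authentication for the pause endpoint
    endpoints.pause.sensitive=false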
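Because these management endpoints are sensitive, add a filter that restricts access to an IP whitelist.
Reference code: Restricting actuator management endpoints with an IP whitelist (adding a filter in Spring Boot)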
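The referenced filter code is not reproduced here, so the following is only a minimal sketch of the decision such a filter would make. `ManagementIpWhitelist` is a hypothetical class; it assumes the filter passes in the request path and the caller's address (for example from `request.getRequestURI()` and `request.getRemoteAddr()`) and rejects the request, say with a 403, when `isAllowed` returns false.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical whitelist check that a servlet filter could delegate to. */
final class ManagementIpWhitelist {
    private final Set<String> allowedIps;
    private final Set<String> protectedPaths;

    ManagementIpWhitelist(Set<String> allowedIps, Set<String> protectedPaths) {
        // Defensive copies so later changes to the arguments do not affect the check
        this.allowedIps = new HashSet<>(allowedIps);
        this.protectedPaths = new HashSet<>(protectedPaths);
    }

    /** Returns true when the request may proceed. */
    boolean isAllowed(String path, String remoteAddr) {
        if (!protectedPaths.contains(path)) {
            return true; // not a management endpoint, so no restriction applies
        }
        return allowedIps.contains(remoteAddr);
    }
}
```

Exact string matching keeps the sketch short; a production filter would probably also need to handle proxies and address ranges.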
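2. Service consumer configuration
    # Fetch the latest registry information every 2 seconds
    eureka.client.registry-fetch-interval-seconds=2
    # Refresh Ribbon's cached server list every 2 seconds (value in milliseconds)
    ribbon.ServerListRefreshInterval=2000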
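3. Release procedure
    # Pause the service
    curl -X POST http://127.0.0.1:<port>/pause
    # Wait for consumers to drop the instance from their caches
    sleep 6
    # Kill the old process
    kill -9 <pid>
    # Start the service
    java -jar xx.jar
    # Check that /health returns 200
    curl -I -m 10 -o /dev/null -s -w %{http_code} http://127.0.0.1:<port>/health
Keep running the health check for N seconds. If it fails, roll back the release and abort the release of the remaining nodes.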
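Note: the theoretical maximum for the sleep here is eureka.client.registry-fetch-interval-seconds + (ribbon.ServerListRefreshInterval + eureka.client.registry-fetch-interval-seconds) = 2 + (2 + 2) = 6 seconds.
The sum in parentheses applies only when the two timers happen to just miss each other. To be safe, add another second or two on top of the value given by this formula.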
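The formula can be written as a small helper so the sleep value follows the configured intervals instead of being hard-coded. `PauseWaitCalculator` is a hypothetical name, and the sketch assumes both intervals are passed in seconds.

```java
/** Hypothetical helper that evaluates the worst-case wait formula from the text. */
final class PauseWaitCalculator {

    /**
     * Worst-case wait in seconds:
     * registryFetch + (ribbonRefresh + registryFetch) + safetyMargin.
     * All arguments are in seconds.
     */
    static int worstCaseWaitSeconds(int registryFetchSeconds,
                                    int ribbonRefreshSeconds,
                                    int safetyMarginSeconds) {
        return registryFetchSeconds
                + (ribbonRefreshSeconds + registryFetchSeconds)
                + safetyMarginSeconds;
    }
}
```

With the 2-second intervals used in this article, `worstCaseWaitSeconds(2, 2, 0)` gives 6, and a 2-second safety margin raises it to 8.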
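Why call /health? Mainly to warm up the service (chiefly the database connection pool, the Jedis connection pool, and so on), so that services with very short timeouts do not time out on their first request.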
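The health check step could also be driven from Java rather than curl. The sketch below is one possible reading of "keep checking for N seconds": it requires a number of consecutive 200 responses before the node is treated as healthy. `HealthGate`, its method names, and the timeout values are assumptions, and it issues a GET where the curl command above sends a HEAD request.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.function.IntSupplier;

/** Hypothetical release gate that polls a health endpoint. */
final class HealthGate {

    /** Returns the HTTP status of a GET request, or -1 when the request fails. */
    static int httpStatus(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(10000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (Exception ex) {
            return -1; // connection refused, timeout, malformed URL, ...
        }
    }

    /**
     * Polls the probe until it has returned 200 for requiredSuccesses
     * consecutive attempts. Returns false when maxAttempts is exhausted
     * or the thread is interrupted, which should trigger a rollback.
     */
    static boolean waitUntilHealthy(IntSupplier probe, int requiredSuccesses,
                                    int maxAttempts, long delayMillis) {
        int streak = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            streak = (probe.getAsInt() == 200) ? streak + 1 : 0;
            if (streak >= requiredSuccesses) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A release script might call `HealthGate.waitUntilHealthy(() -> HealthGate.httpStatus("http://127.0.0.1:8080/health"), 3, 30, 1000L)`, where port 8080 is only a placeholder, and roll back when the result is false.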
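4. Eureka server configuration
    # Evict expired registrations every 5 seconds
    # If releases follow the procedure above, this can be left at its default value
    eureka.server.eviction-interval-timer-in-ms=5000
    # Disable self-preservation
    # Intranet services do not need protection against network partitions
    eureka.server.enable-self-preservation=false
    # A newly registered service can be discovered within 5 seconds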
Solution 3: extend Tomcat's shutdown hook (not recommended; it stops working if you switch to another container).
    import java.time.Duration;
    import java.time.LocalDateTime;
    import java.util.concurrent.Executor;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    import org.apache.catalina.connector.Connector;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.context.embedded.tomcat.TomcatConnectorCustomizer;
    import org.springframework.context.ApplicationListener;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.event.ContextClosedEvent;

    import lombok.extern.slf4j.Slf4j;

    /**
     * Gracefully shuts down Tomcat.
     *
     * @author yangzl
     * @date April 2, 2019
     */
    @Slf4j
    @Configuration
    public class TomcatGracefulShutdown
            implements TomcatConnectorCustomizer, ApplicationListener<ContextClosedEvent> {

        // Holds the configured wait time
        @Autowired
        private ShutdownProperties properties;

        private volatile Connector connector;

        @Override
        public void customize(Connector connector) {
            // Keep a reference so the connector can be paused on shutdown
            this.connector = connector;
        }

        @Override
        public void onApplicationEvent(final ContextClosedEvent event) {
            final LocalDateTime startShutdown = LocalDateTime.now();
            try {
                log.info("We are now in down mode, please wait " + properties.getWaitSecond() + " second(s)...");
                if (connector == null) {
                    log.info("We are running unit test ... ");
                    Thread.sleep(properties.getWaitSecond() * 1000L);
                    return;
                }
                // Stop accepting new requests
                connector.pause();
                final Executor executor = connector.getProtocolHandler().getExecutor();
                if (executor instanceof ThreadPoolExecutor) {
                    log.info("executor is ThreadPoolExecutor");
                    final ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                    // Let in-flight requests finish, up to the configured wait time
                    threadPoolExecutor.shutdown();
                    if (!threadPoolExecutor.awaitTermination(properties.getWaitSecond(), TimeUnit.SECONDS)) {
                        log.warn("Tomcat thread pool did not shut down gracefully within "
                                + properties.getWaitSecond() + " second(s). Proceeding with force shutdown");
                    } else {
                        log.debug("Tomcat thread pool is empty, we stop now");
                    }
                }
            } catch (final InterruptedException ex) {
                log.error("The await termination has been interrupted : " + ex.getMessage());
                Thread.currentThread().interrupt();
            } finally {
                // Measure here so the early-return and interrupted paths are timed as well
                final long seconds = Duration.between(startShutdown, LocalDateTime.now()).getSeconds();
                log.info("Shutdown performed in " + seconds + " second(s)");
            }
        }
    }
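When shutdown is called, Tomcat waits M seconds before exiting, so the effect is roughly the same as the second solution. However, the final exit sometimes reports errors, and the approach works only with Tomcat, so it is not general enough.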
