Sure enough, a good memory is no match for taking notes: even the simplest things get forgotten if they are not written down!
This post first analyzes Eureka's caching architecture, and then builds on that to speed up how quickly services notice each other's changes.
How the Eureka-Client fetches registry information
The eureka-client fetches registry information in one of two ways: a full fetch or a delta (incremental) fetch.
When the Eureka-Client starts up, it first performs a full fetch and caches the registry locally. The code is as follows:
@Inject
DiscoveryClient(ApplicationInfoManager applicationInfoManager, EurekaClientConfig config,
                AbstractDiscoveryClientOptionalArgs args, Provider<BackupRegistry> backupRegistryProvider) {
    // ... (constructor abridged)
    // if fetching the registry is enabled and the initial full fetch fails,
    // fall back to the backup registry
    if (clientConfig.shouldFetchRegistry() && !fetchRegistry(false)) {
        fetchRegistryFromBackup();
    }
}
With the following configured in the project:
eureka.client.fetch-registry=true
the fetchRegistry method is called and the full registry is fetched from the eureka-server.
When the Eureka-Client starts up, it also initializes a scheduled cache-refresh task:
private void initScheduledTasks() {
    if (clientConfig.shouldFetchRegistry()) {
        // registry cache refresh timer
        int registryFetchIntervalSeconds = clientConfig.getRegistryFetchIntervalSeconds();
        int expBackOffBound = clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
        scheduler.schedule(
                new TimedSupervisorTask(
                        "cacheRefresh",
                        scheduler,
                        cacheRefreshExecutor,
                        registryFetchIntervalSeconds,
                        TimeUnit.SECONDS,
                        expBackOffBound,
                        new CacheRefreshThread()
                ),
                registryFetchIntervalSeconds, TimeUnit.SECONDS);
    }
}
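One detail worth pointing out: scheduler.schedule(...) above only schedules a single execution. The periodic behaviour comes from TimedSupervisorTask re-scheduling itself at the end of every run, doubling its delay (up to expBackOffBound times the base interval) when a run times out and resetting it after a successful run. Below is a minimal sketch of that self-rescheduling idea, simplified and not the actual Netflix implementation:

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of a self-rescheduling supervised task: after every run it
// schedules itself again, backing off exponentially (up to a bound) when the wrapped
// task times out, and resetting the delay after a successful run.
class SelfReschedulingTask implements Runnable {
    private final ScheduledExecutorService scheduler;
    private final ExecutorService executor;
    private final Runnable task;
    private final long baseDelaySeconds;
    private final long maxDelaySeconds;
    private final AtomicLong delaySeconds;

    SelfReschedulingTask(ScheduledExecutorService scheduler, ExecutorService executor,
                         Runnable task, long baseDelaySeconds, int expBackOffBound) {
        this.scheduler = scheduler;
        this.executor = executor;
        this.task = task;
        this.baseDelaySeconds = baseDelaySeconds;
        this.maxDelaySeconds = baseDelaySeconds * expBackOffBound;
        this.delaySeconds = new AtomicLong(baseDelaySeconds);
    }

    @Override
    public void run() {
        try {
            // run the real work with a timeout equal to the current delay
            executor.submit(task).get(delaySeconds.get(), TimeUnit.SECONDS);
            delaySeconds.set(baseDelaySeconds);                            // success: reset the delay
        } catch (TimeoutException e) {
            // timed out: back off exponentially, capped at maxDelaySeconds
            delaySeconds.set(Math.min(maxDelaySeconds, delaySeconds.get() * 2));
        } catch (Exception e) {
            // interrupted / failed: keep the current delay and try again later
        } finally {
            // the key point: re-schedule ourselves, which is what makes the task periodic
            scheduler.schedule(this, delaySeconds.get(), TimeUnit.SECONDS);
        }
    }
}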
A CacheRefreshThread task runs every registryFetchIntervalSeconds (default 30) seconds, and that task ultimately calls the same fetchRegistry method:
private boolean fetchRegistry(boolean forceFullRegistryFetch) {
    try {
        Applications applications = getApplications();
        if (clientConfig.shouldDisableDelta()
                || (!Strings.isNullOrEmpty(clientConfig.getRegistryRefreshSingleVipAddress()))
                || forceFullRegistryFetch
                || (applications == null)
                || (applications.getRegisteredApplications().size() == 0)
                || (applications.getVersion() == -1)) //Client application does not have latest library supporting delta
        {
            getAndStoreFullRegistry();
        } else {
            getAndUpdateDelta(applications);
        }
        applications.setAppsHashCode(applications.getReconcileHashCode());
    } catch (Throwable e) {
        logger.error(PREFIX + appPathIdentifier + " - was unable to refresh its cache! status = " + e.getMessage(), e);
        return false;
    } finally {
        if (tracer != null) {
            tracer.stop();
        }
    }
    // Notify about cache refresh before updating the instance remote status
    onCacheRefreshed();
    // Update remote status based on refreshed data held in the cache
    updateInstanceRemoteStatus();
    // registry was fetched successfully, so return true
    return true;
}
fetchRegistry first decides whether to perform a full or a delta fetch, then requests the registry from the server, updates the local registry on success, and finally fires a CacheRefreshed event.
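On the delta path, the server returns only the instances that changed since the last fetch, each tagged with an ActionType (ADDED / MODIFIED / DELETED). The client merges them into its local Applications and then compares the recomputed reconcile hash code with the hash code the server sent along with the delta; if they differ, the local view has drifted and a full fetch is performed instead. Here is a simplified sketch of that merge-then-reconcile step (not the actual getAndUpdateDelta code; the full-fetch fallback is left as a placeholder):

import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.shared.Application;
import com.netflix.discovery.shared.Applications;

// Simplified illustration of applying a delta and reconciling via the apps hash code.
void applyDelta(Applications localApps, Applications delta) {
    for (Application app : delta.getRegisteredApplications()) {
        for (InstanceInfo instance : app.getInstances()) {
            Application existing = localApps.getRegisteredApplications(instance.getAppName());
            switch (instance.getActionType()) {
                case ADDED:
                case MODIFIED:
                    if (existing == null) {
                        localApps.addApplication(new Application(instance.getAppName()));
                        existing = localApps.getRegisteredApplications(instance.getAppName());
                    }
                    existing.addInstance(instance);   // add or replace this instance
                    break;
                case DELETED:
                    if (existing != null) {
                        existing.removeInstance(instance);
                    }
                    break;
            }
        }
    }
    // reconcile: if the locally computed hash no longer matches the server's,
    // the local registry has drifted and a full fetch is required
    if (!localApps.getReconcileHashCode().equals(delta.getAppsHashCode())) {
        // getAndStoreFullRegistry();  // placeholder: fall back to a full fetch
    }
}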
How the Eureka-Server manages registry information
When a client request reaches the server, the registry information is returned through the ResponseCache:
@GET
public Response getContainers(@PathParam("version") String version,
                              @HeaderParam(HEADER_ACCEPT) String acceptHeader,
                              @HeaderParam(HEADER_ACCEPT_ENCODING) String acceptEncoding,
                              @HeaderParam(EurekaAccept.HTTP_X_EUREKA_ACCEPT) String eurekaAccept,
                              @Context UriInfo uriInfo,
                              @Nullable @QueryParam("regions") String regionsStr) {
    boolean isRemoteRegionRequested = null != regionsStr && !regionsStr.isEmpty();
    String[] regions = null;
    if (!isRemoteRegionRequested) {
        EurekaMonitors.GET_ALL.increment();
    } else {
        regions = regionsStr.toLowerCase().split(",");
        Arrays.sort(regions); // So we don't have different caches for same regions queried in different order.
        EurekaMonitors.GET_ALL_WITH_REMOTE_REGIONS.increment();
    }
    // check whether access is allowed
    if (!registry.shouldAllowAccess(isRemoteRegionRequested)) {
        return Response.status(Status.FORBIDDEN).build();
    }
    CurrentRequestVersion.set(Version.toEnum(version));
    // response data format
    KeyType keyType = Key.KeyType.JSON;
    String returnMediaType = MediaType.APPLICATION_JSON;
    if (acceptHeader == null || !acceptHeader.contains(HEADER_JSON_VALUE)) {
        keyType = Key.KeyType.XML;
        returnMediaType = MediaType.APPLICATION_XML;
    }
    // response cache key
    Key cacheKey = new Key(Key.EntityType.Application,
            ResponseCacheImpl.ALL_APPS,
            keyType, CurrentRequestVersion.get(), EurekaAccept.fromString(eurekaAccept), regions
    );
    Response response;
    if (acceptEncoding != null && acceptEncoding.contains(HEADER_GZIP_VALUE)) {
        // look the registry information up in the response cache by cacheKey
        response = Response.ok(responseCache.getGZIP(cacheKey))
                .header(HEADER_CONTENT_ENCODING, HEADER_GZIP_VALUE)
                .header(HEADER_CONTENT_TYPE, returnMediaType)
                .build();
    } else {
        response = Response.ok(responseCache.get(cacheKey))
                .build();
    }
    return response;
}
The key piece is the get method of responseCache:
String get(final Key key, boolean useReadOnlyCache) {
    Value payload = getValue(key, useReadOnlyCache);
    if (payload == null || payload.getPayload().equals(EMPTY_PAYLOAD)) {
        return null;
    } else {
        return payload.getPayload();
    }
}
private final ConcurrentMap<Key, Value> readOnlyCacheMap = new ConcurrentHashMap<Key, Value>();

private final LoadingCache<Key, Value> readWriteCacheMap;

this.readWriteCacheMap =
        CacheBuilder.newBuilder().initialCapacity(1000)
                .expireAfterWrite(serverConfig.getResponseCacheAutoExpirationInSeconds(), TimeUnit.SECONDS)
                .removalListener(new RemovalListener<Key, Value>() {
                    @Override
                    public void onRemoval(RemovalNotification<Key, Value> notification) {
                        Key removedKey = notification.getKey();
                        if (removedKey.hasRegions()) {
                            Key cloneWithNoRegions = removedKey.cloneWithoutRegions();
                            regionSpecificKeys.remove(cloneWithNoRegions, removedKey);
                        }
                    }
                })
                .build(new CacheLoader<Key, Value>() {
                    @Override
                    public Value load(Key key) throws Exception {
                        if (key.hasRegions()) {
                            Key cloneWithNoRegions = key.cloneWithoutRegions();
                            regionSpecificKeys.put(cloneWithNoRegions, key);
                        }
                        Value value = generatePayload(key);
                        return value;
                    }
                });
Value getValue(final Key key, boolean useReadOnlyCache) {
    Value payload = null;
    try {
        if (useReadOnlyCache) {
            // try the read-only cache first
            final Value currentPayload = readOnlyCacheMap.get(key);
            if (currentPayload != null) {
                payload = currentPayload;
            } else {
                // not in the read-only cache: load it from the read-write cache
                payload = readWriteCacheMap.get(key);
                readOnlyCacheMap.put(key, payload);
            }
        } else {
            payload = readWriteCacheMap.get(key);
        }
    } catch (Throwable t) {
        logger.error("Cannot get value for key :" + key, t);
    }
    return payload;
}
This is a two-level cache: data is read from readOnlyCacheMap first, and only on a miss is it read from readWriteCacheMap. readOnlyCacheMap is a plain ConcurrentMap, while readWriteCacheMap is a Guava cache with an initial capacity of 1000 whose entries expire 180s after they are written.
How do the two maps exchange data? A timer task compares them every 30 seconds and, whenever the values differ, overwrites the readOnlyCacheMap entry with the value from readWriteCacheMap:
if (shouldUseReadOnlyResponseCache) {
    timer.schedule(getCacheUpdateTask(),
            new Date(((System.currentTimeMillis() / responseCacheUpdateIntervalMs) * responseCacheUpdateIntervalMs)
                    + responseCacheUpdateIntervalMs),
            responseCacheUpdateIntervalMs);
}
private TimerTask getCacheUpdateTask() {
    return new TimerTask() {
        @Override
        public void run() {
            logger.debug("Updating the client cache from response cache");
            for (Key key : readOnlyCacheMap.keySet()) {
                try {
                    CurrentRequestVersion.set(key.getVersion());
                    Value cacheValue = readWriteCacheMap.get(key);
                    Value currentCacheValue = readOnlyCacheMap.get(key);
                    // compare the values held by the two caches
                    if (cacheValue != currentCacheValue) {
                        readOnlyCacheMap.put(key, cacheValue);
                    }
                } catch (Throwable th) {
                    logger.error("Error while updating the client cache from response cache", th);
                }
            }
        }
    };
}
So now we know that readOnlyCacheMap gets its data from readWriteCacheMap and that the two are synchronized every 30s. That leaves one question: where does the data in readWriteCacheMap come from?
Find usages on the readWriteCacheMap field did not turn up anything conclusive, so I set a breakpoint inside the CacheLoader passed to build():
this.readWriteCacheMap =
        CacheBuilder.newBuilder().initialCapacity(1000)
                .expireAfterWrite(serverConfig.getResponseCacheAutoExpirationInSeconds(), TimeUnit.SECONDS)
                .removalListener(new RemovalListener<Key, Value>() {
                    @Override
                    public void onRemoval(RemovalNotification<Key, Value> notification) {
                        Key removedKey = notification.getKey();
                        if (removedKey.hasRegions()) {
                            Key cloneWithNoRegions = removedKey.cloneWithoutRegions();
                            regionSpecificKeys.remove(cloneWithNoRegions, removedKey);
                        }
                    }
                })
                .build(new CacheLoader<Key, Value>() {
                    @Override
                    public Value load(Key key) throws Exception {
                        if (key.hasRegions()) {
                            Key cloneWithNoRegions = key.cloneWithoutRegions();
                            regionSpecificKeys.put(cloneWithNoRegions, key);
                        }
                        // breakpoint set here
                        Value value = generatePayload(key);
                        return value;
                    }
                });
The breakpoint showed that the values in readWriteCacheMap are populated by this CacheLoader: whenever readWriteCacheMap.get(key) misses (or the entry has expired), load() runs and generates a fresh payload. In practice that is triggered from the synchronization task we just saw (and from getValue when the read-write cache misses):
private TimerTask getCacheUpdateTask() {
    return new TimerTask() {
        @Override
        public void run() {
            logger.debug("Updating the client cache from response cache");
            for (Key key : readOnlyCacheMap.keySet()) {
                try {
                    CurrentRequestVersion.set(key.getVersion());
                    // this get() triggers CacheLoader.load() when the key is missing or expired
                    Value cacheValue = readWriteCacheMap.get(key);
                    Value currentCacheValue = readOnlyCacheMap.get(key);
                    // compare the values held by the two caches
                    if (cacheValue != currentCacheValue) {
                        readOnlyCacheMap.put(key, cacheValue);
                    }
                } catch (Throwable th) {
                    logger.error("Error while updating the client cache from response cache", th);
                }
            }
        }
    };
}
Good, now we know when loading is triggered; next let's look at how the data itself is produced. Roughly speaking, the values in readWriteCacheMap are generated from the registry field of AbstractInstanceRegistry:
private final AbstractInstanceRegistry registry;

private Value generatePayload(Key key) {
    Stopwatch tracer = null;
    try {
        String payload;
        switch (key.getEntityType()) {
            case Application:
                boolean isRemoteRegionRequested = key.hasRegions();
                if (ALL_APPS.equals(key.getName())) {
                    if (isRemoteRegionRequested) {
                        tracer = serializeAllAppsWithRemoteRegionTimer.start();
                        payload = getPayLoad(key, registry.getApplicationsFromMultipleRegions(key.getRegions()));
                    } else {
                        tracer = serializeAllAppsTimer.start();
                        payload = getPayLoad(key, registry.getApplications());
                    }
                } else if (ALL_APPS_DELTA.equals(key.getName())) {
                    if (isRemoteRegionRequested) {
                        tracer = serializeDeltaAppsWithRemoteRegionTimer.start();
                        versionDeltaWithRegions.incrementAndGet();
                        versionDeltaWithRegionsLegacy.incrementAndGet();
                        payload = getPayLoad(key,
                                registry.getApplicationDeltasFromMultipleRegions(key.getRegions()));
                    } else {
                        tracer = serializeDeltaAppsTimer.start();
                        versionDelta.incrementAndGet();
                        versionDeltaLegacy.incrementAndGet();
                        payload = getPayLoad(key, registry.getApplicationDeltas());
                    }
                } else {
                    tracer = serializeOneApptimer.start();
                    payload = getPayLoad(key, registry.getApplication(key.getName()));
                }
                break;
            case VIP:
            case SVIP:
                tracer = serializeViptimer.start();
                payload = getPayLoad(key, getApplicationsForVip(key, registry));
                break;
            default:
                logger.error("Unidentified entity type: " + key.getEntityType() + " found in the cache key.");
                payload = "";
                break;
        }
        return new Value(payload);
    } finally {
        if (tracer != null) {
            tracer.stop();
        }
    }
}
The registry field inside AbstractInstanceRegistry is itself a nested, two-level map. Client registrations, renewals (heartbeats) and cancellations are all stored through this registry:
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
        = new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
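As a rough illustration of how this nested map is used, the sketch below shows, in simplified form, what register / renew / cancel boil down to. The real AbstractInstanceRegistry additionally handles read/write locks, replication, self-preservation and more; instanceInfo here simply stands for the incoming instance.

// Outer key: application name; inner key: instance id; value: the instance's lease.
Map<String, Lease<InstanceInfo>> apps = registry.get(instanceInfo.getAppName());
if (apps == null) {
    ConcurrentHashMap<String, Lease<InstanceInfo>> newApps = new ConcurrentHashMap<>();
    apps = registry.putIfAbsent(instanceInfo.getAppName(), newApps);
    if (apps == null) {
        apps = newApps;   // this thread created the inner map
    }
}
// register: store a lease for this instance (90s duration by default)
apps.put(instanceInfo.getId(), new Lease<>(instanceInfo, Lease.DEFAULT_DURATION_IN_SECS));
// renew (heartbeat): refresh the lease's last-update timestamp
Lease<InstanceInfo> lease = apps.get(instanceInfo.getId());
if (lease != null) {
    lease.renew();
}
// cancel: remove the lease so the instance disappears from the registry
apps.remove(instanceInfo.getId());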
A scheduled task runs against registry every 60s to evict expired entries:
evictionTimer.schedule(evictionTaskRef.get(),
        serverConfig.getEvictionIntervalTimerInMs(),   // 60 * 1000 ms by default
        serverConfig.getEvictionIntervalTimerInMs());

@Override
public void run() {
    try {
        long compensationTimeMs = getCompensationTimeMs();
        logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
        evict(compensationTimeMs);
    } catch (Throwable e) {
        logger.error("Could not run the evict task", e);
    }
}
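The evict(compensationTimeMs) call walks every lease in registry and removes the expired ones. Conceptually (the real check is Lease.isExpired(additionalLeaseMs)), a lease is expired when it was explicitly cancelled, or when no heartbeat has been seen for longer than the lease duration (90s by default) plus the compensation time, which accounts for delays of the eviction task itself, for example due to GC pauses or clock skew. A simplified version of that test:

// Simplified form of the expiry test applied to every lease during eviction.
// evictionTimestamp > 0 means the lease was explicitly cancelled; otherwise the
// lease expires once no renew() has been recorded for leaseDurationMs
// (90_000 by default) plus the compensation time.
boolean isExpired(long evictionTimestamp, long lastRenewalTimestamp,
                  long leaseDurationMs, long compensationTimeMs) {
    return evictionTimestamp > 0
            || System.currentTimeMillis() > lastRenewalTimestamp + leaseDurationMs + compensationTimeMs;
}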
To summarize:
Client registrations, renewals and cancellations all go to the server, which stores the data in the two-level registry map. Every 60s a scheduled task checks whether the leases held in registry have expired (a lease is valid for 90s), and every 30s another scheduled task refreshes the entries in readWriteCacheMap and synchronizes readWriteCacheMap with readOnlyCacheMap.
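In other words, before a consumer sees a change it has to travel the whole chain: registry → readWriteCacheMap → readOnlyCacheMap (synced every 30s) → Eureka Client local cache (fetched every 30s) → Ribbon server list (refreshed every 30s), with the 60s eviction task and the 90s lease sitting in front of that chain when an instance dies without deregistering.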
Speeding up service-change awareness
Given the flow above, imagine that a service goes down abnormally and the server never receives a cancel request. Then:
- At 0s the service goes down abruptly, without its Eureka Client sending a cancel request;
- At 29s the first eviction check runs: the lease has not yet gone 90s without renewal, so nothing happens;
- At 89s the second eviction check runs: still under 90s, so nothing happens;
- At 149s the third eviction check runs: the lease has now gone more than 90s without renewal, so the instance is removed from registry;
- At 179s the scheduled task refreshes readWriteCacheMap and copies the change from readWriteCacheMap into readOnlyCacheMap;
- At 209s the Eureka Client refreshes its local cache from the Eureka Server's readOnlyCacheMap;
- At 239s Ribbon refreshes its server list from the Eureka Client.
(Ribbon has its own cache refresh interval as well, 30s by default.)
So in the worst case, the time it takes a service consumer to notice the change approaches 240s.
How can this be improved?
On the server side:
Shorten the interval of the registry eviction task
Shorten the interval of the task that synchronizes the two caches
For a small system, readOnlyCacheMap can simply be switched off
On the service provider side:
Shorten the heartbeat interval
Shorten the lease expiration time
On the service consumer side:
Shorten the Ribbon refresh interval
Shorten the registry fetch interval (registryFetchIntervalSeconds)
On the Eureka Server, change the following settings:
# How often the eureka server refreshes readOnlyCacheMap. Note that clients read from readOnlyCacheMap,
# so this interval determines how quickly changes in readWriteCacheMap become visible to them.
# default: 30s
eureka.server.responseCacheUpdateIntervalMs=3000
# Expiry time for entries in the eureka server's readWriteCacheMap. An entry is only reloaded after it
# expires (or is invalidated); on reload the data is read again from registry, which is a ConcurrentHashMap.
# With active eviction enabled (below), there is usually no need to change this.
# default: 180s
eureka.server.responseCacheAutoExpirationInSeconds=180
# Enable active eviction and run the eviction check every 3s.
# The Eureka Server periodically checks for instances that have not sent a heartbeat within the lease
# duration (set on the client via eureka.instance.lease-expiration-duration-in-seconds, default 90s)
# and deregisters them; the check interval is eureka.server.eviction-interval-timer-in-ms (default 60s).
eureka.server.eviction-interval-timer-in-ms=3000
On the Eureka service provider, change the following settings:
# Lease expiration: if the Eureka Server receives no heartbeat for this long, the instance is evicted.
# Note: eviction only happens as often as the server's eviction task runs (eureka.server.eviction-interval-timer-in-ms),
# so shorten that as well; this value is usually set to about three times the renewal interval.
# default: 90s
eureka.instance.lease-expiration-duration-in-seconds=15
# Heartbeat (renewal) interval: the client sends a heartbeat to the server this often.
# default: 30s
eureka.instance.lease-renewal-interval-in-seconds=5
On the Eureka service consumer, change the following settings:
# How often the eureka client refreshes its local registry cache
# default: 30s
eureka.client.registryFetchIntervalSeconds=5
# How often the eureka client's Ribbon refreshes its server list
# default: 30s
ribbon.ServerListRefreshInterval=5000
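As a rough sanity check, redoing the earlier worst-case walk-through with these tuned values (same chain of delays assumed): up to 15s for the lease to expire, plus up to 3s until the next eviction pass, plus up to 3s for the readOnlyCacheMap sync, plus up to 5s for the client's registry fetch, plus up to 5s for Ribbon's refresh, which is on the order of 30s in the worst case instead of close to 240s with the defaults.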