Project architecture:
Some of the components in use:
Spring Cloud Alibaba (Nacos + Gateway + OpenFeign) + Spring Boot 2.x + Redis
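Background:
As the user base has grown recently, the user service occasionally hits Redis connection timeouts during peak hours.
For example, reading an SMS verification code from Redis, storing the token in Redis after a successful login, and any other code path that touches Redis can throw RedisConnectionFailureException.
Exception log: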
2021-03-02 17:24:42.595 ERROR [d03f845825644cee8753539f24d840ad] [http-nio-7122-exec-32] c.l.c.b.e.GlobalExceptionHandler - java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:65)
    at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:42)
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
    at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:135)
    at org.springframework.data.redis.connection.jedis.JedisStringCommands.convertJedisAccessException(JedisStringCommands.java:751)
    at org.springframework.data.redis.connection.jedis.JedisStringCommands.get(JedisStringCommands.java:67)
    at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:260)
    at org.springframework.data.redis.connection.DefaultStringRedisConnection.get(DefaultStringRedisConnection.java:398)
    at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
    at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
    at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
    at com.xxxx.xxx.xxx.utils.RedisUtil.get(RedisUtil.java:242)
Redis-related Maven dependencies:
<!-- redis -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
    <exclusions>
        <exclusion>
            <groupId>io.lettuce</groupId>
            <artifactId>lettuce-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
</dependency>
Redis configuration (single node, no distributed deployment):
spring:
  redis:
    pool:
      maxActive: 300
      maxIdle: 100
      maxWait: 1000
    host: xxxxxxxxx
    port: 6379
    password:
    timeout: 2000
    database: 5
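Investigation:
The possible causes I considered were:
Cause 1. The code contains a query like keys *. Redis executes commands on a single thread, so with a large data set a single long-running command blocks everything behind it and client requests time out. Queries like keys * are very expensive for Redis.
Cause 2. The timeout in the Redis configuration is too short: the previous request has not finished, the next request cannot be executed, and it eventually times out and fails.
Cause 3. The Redis connection pool is too small. Prometheus monitoring shows the user service peaks at 180 requests, so the pool may be too small to hand out a Redis connection, which makes the call fail.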
Regarding cause 1:
I searched the project code and found no keys * style queries, so this possibility was ruled out.
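Regarding cause 2:
While RedisConnectionFailureException was occurring, I confirmed that the number of connections on the Redis server peaked at 15 and that the timeout in the configuration file was 2000 ms. Since the check for cause 1 had already confirmed there were no very slow queries,
this possibility was ruled out as well.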
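With causes 1 and 2 both ruled out, that left cause 3: a problem with the number of connections.
The configuration sets the maximum number of connections to 300, far above the peak of 180, so the configured values looked fine.
So I tested this configuration in the development environment. The project uses the Jedis connection pool rather than Lettuce. (Note: in Spring Boot 2.x, spring-boot-starter-data-redis uses Lettuce by default. To use Jedis, exclude the default Lettuce dependency and add the Jedis dependency, as in the Maven dependencies above.)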
Tracing further into the source code,
the class that holds the pool size settings is:
package org.apache.commons.pool2.impl;

public class GenericObjectPoolConfig<T> extends BaseObjectPoolConfig<T> {
    public static final int DEFAULT_MAX_TOTAL = 8;
    public static final int DEFAULT_MAX_IDLE = 8;
    public static final int DEFAULT_MIN_IDLE = 0;
    private int maxTotal = 8;
    private int maxIdle = 8;
    private int minIdle = 0;
    ...
}
This configuration class is loaded when the connection pool is initialized at application startup:
package org.springframework.data.redis.connection.jedis;

import java.time.Duration;
import java.util.Optional;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocketFactory;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.lang.Nullable;

/**
 * Default implementation of {@literal JedisClientConfiguration}.
 *
 * @author Mark Paluch
 * @author Christoph Strobl
 * @since 2.0
 */
class DefaultJedisClientConfiguration implements JedisClientConfiguration {

    private final boolean useSsl;
    private final Optional<SSLSocketFactory> sslSocketFactory;
    private final Optional<SSLParameters> sslParameters;
    private final Optional<HostnameVerifier> hostnameVerifier;
    private final boolean usePooling;
    private final Optional<GenericObjectPoolConfig> poolConfig;
    private final Optional<String> clientName;
    private final Duration readTimeout;
    private final Duration connectTimeout;

    DefaultJedisClientConfiguration(boolean useSsl, @Nullable SSLSocketFactory sslSocketFactory,
            @Nullable SSLParameters sslParameters, @Nullable HostnameVerifier hostnameVerifier,
            boolean usePooling, @Nullable GenericObjectPoolConfig poolConfig,
            @Nullable String clientName, Duration readTimeout, Duration connectTimeout) {
        this.useSsl = useSsl;
        this.sslSocketFactory = Optional.ofNullable(sslSocketFactory);
        this.sslParameters = Optional.ofNullable(sslParameters);
        this.hostnameVerifier = Optional.ofNullable(hostnameVerifier);
        this.usePooling = usePooling;
        this.poolConfig = Optional.ofNullable(poolConfig);
        this.clientName = Optional.ofNullable(clientName);
        this.readTimeout = readTimeout;
        this.connectTimeout = connectTimeout;
    }
    ...
}
Debugging showed that after loading, the pool was still using the default sizes:
public static final int DEFAULT_MAX_TOTAL = 8;
public static final int DEFAULT_MAX_IDLE = 8;
public static final int DEFAULT_MIN_IDLE = 0;
private int maxTotal = 8;
private int maxIdle = 8;
private int minIdle = 0;
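To get a feel for what a cap of 8 means under load, the pool can be modeled as a counting semaphore. This is only a toy sketch: the class name PoolCapSketch and its methods are hypothetical, it does not reproduce the actual commons-pool2 or Jedis behavior, and the request count and wait time are made-up numbers.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a bounded connection pool (hypothetical, not commons-pool2). */
public class PoolCapSketch {
    private final Semaphore permits;
    private final long maxWaitMillis;

    public PoolCapSketch(int maxTotal, long maxWaitMillis) {
        this.permits = new Semaphore(maxTotal);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Tries to borrow a slot, waiting at most maxWaitMillis; false means the wait ran out. */
    public boolean borrow() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a previously borrowed slot to the pool. */
    public void giveBack() {
        permits.release();
    }

    /** Borrows `requests` slots without returning any and counts how many were granted. */
    public static int countGranted(int maxTotal, int requests, long maxWaitMillis) {
        PoolCapSketch pool = new PoolCapSketch(maxTotal, maxWaitMillis);
        int granted = 0;
        for (int i = 0; i < requests; i++) {
            if (pool.borrow()) {
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // 20 simultaneous holders against the default cap and against the intended cap.
        System.out.println("cap 8:   " + countGranted(8, 20, 10));
        System.out.println("cap 300: " + countGranted(300, 20, 10));
    }
}
```

With the default cap, every caller beyond the eighth concurrent holder has to wait for a slot, while the intended cap of 300 would serve all of them immediately.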
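This is probably where the problem lies: the maximum number of connections set in the configuration file never took effect. It turned out that this section of the configuration is no longer recognized: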
redis:
  pool:
    maxActive: 300
    maxIdle: 100
    maxWait: 1000
It needs to be changed to:
redis:
  jedis:
    pool:
      maxActive: 300
      maxIdle: 100
      max-wait: 1000ms
After the change and a restart, the settings take effect and the pool sizes match the configured values.
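The silent fallback can be illustrated with a toy property lookup. This is a minimal sketch assuming only what the fix above shows, namely that the Jedis pool settings are read from under spring.redis.jedis.pool. The class PrefixBindingSketch and its method are hypothetical and far simpler than Spring Boot's real binder.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy property lookup (hypothetical, not Spring Boot's binder). */
public class PrefixBindingSketch {
    /** Same value as GenericObjectPoolConfig.DEFAULT_MAX_TOTAL. */
    public static final int DEFAULT_MAX_TOTAL = 8;

    /** Reads "<prefix>.max-active", or falls back to the default when the key is absent. */
    public static int resolveMaxTotal(Map<String, String> props, String prefix) {
        String raw = props.get(prefix + ".max-active");
        return raw == null ? DEFAULT_MAX_TOTAL : Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        String boundPrefix = "spring.redis.jedis.pool";

        // Old-style key: present in the file, but nothing looks it up.
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put("spring.redis.pool.max-active", "300");

        // New-style key: found under the prefix that is actually read.
        Map<String, String> newStyle = new HashMap<>();
        newStyle.put("spring.redis.jedis.pool.max-active", "300");

        System.out.println("old key: " + resolveMaxTotal(oldStyle, boundPrefix));
        System.out.println("new key: " + resolveMaxTotal(newStyle, boundPrefix));
    }
}
```

A key that nothing reads produces no error and leaves the default in place, which is why the misconfiguration only showed up while debugging.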