[Redis Connection Timeout] Troubleshooting a RedisConnectionFailureException in production


Project architecture:

  Some of the components:

  SpringCloudAlibaba(Nacos+Gateway+OpenFeign)+SpringBoot2.x+Redis

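Background:

  As the user base has grown, the user service has recently started to hit occasional Redis connection timeouts during peak hours.

  For example, reading an SMS verification code from Redis, or storing the token in Redis after a successful login: any code path that touches Redis can throw RedisConnectionFailureException.

  Exception log: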

2021-03-02 17:24:42.595 ERROR [d03f845825644cee8753539f24d840ad] [http-nio-7122-exec-32] c.l.c.b.e.GlobalExceptionHandler -java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
        at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:65)
        at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:42)
        at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
        at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
        at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:135)
        at org.springframework.data.redis.connection.jedis.JedisStringCommands.convertJedisAccessException(JedisStringCommands.java:751)
        at org.springframework.data.redis.connection.jedis.JedisStringCommands.get(JedisStringCommands.java:67)
        at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:260)
        at org.springframework.data.redis.connection.DefaultStringRedisConnection.get(DefaultStringRedisConnection.java:398)
        at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
        at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60)
        at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228)
        at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188)
        at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
        at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
        at com.xxxx.xxx.xxx.utils.RedisUtil.get(RedisUtil.java:242)

  Redis-related Maven dependencies:

  <!-- redis -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>io.lettuce</groupId>
                    <artifactId>lettuce-core</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
        </dependency>

 

  Redis configuration (single node, no distributed deployment):

spring:
  redis:
    pool:
      maxActive: 300
      maxIdle: 100
      maxWait: 1000
    host: xxxxxxxxx
    port: 6379
    password:
    timeout: 2000
    database: 5

 

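Troubleshooting:

  The possible causes I considered:

  Cause 1. The code runs keys * or a similar query. Redis executes commands on a single thread, so with a large data set one long-running command holds up everything queued behind it and client requests time out. keys *-style queries are very expensive for Redis.

  Cause 2. The timeout in the Redis configuration is too short: the previous request has not finished, so the next one cannot run, and it eventually times out and fails.

  Cause 3. The Redis connection pool is configured with too few connections. Prometheus monitoring shows the user service peaking at 180 requests during busy periods, so the pool may be too small for requests to obtain a Redis connection, which makes them fail.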

  

  On cause 1:

    I went through the project code and found no keys *-style queries, so this possibility was ruled out.
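    A check like this can also be done mechanically. The following is only a rough sketch, not the procedure used in this investigation: KeysUsageScanner and its method names are made up for illustration, and the src/main/java default is an assumed project layout. It lists every Java source line that contains a .keys( call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists Java source lines that look like a Redis KEYS call. */
final class KeysUsageScanner {

    // Matches calls such as redisTemplate.keys("user:*") or jedis.keys(pattern).
    private static final Pattern KEYS_CALL = Pattern.compile("\\.keys\\s*\\(");

    static List<String> findKeysCalls(Path sourceRoot) throws IOException {
        List<Path> javaFiles;
        // Collect all .java files under the root, in a stable order.
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            javaFiles = paths
                    .filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : javaFiles) {
            List<String> lines = Files.readAllLines(file);
            for (int i = 0; i < lines.size(); i++) {
                if (KEYS_CALL.matcher(lines.get(i)).find()) {
                    // Report as path:lineNumber: source line
                    hits.add(file + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default layout; pass another root directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
        findKeysCalls(root).forEach(System.out::println);
    }
}
```

    The pattern is deliberately coarse and also matches unrelated methods such as Hashtable.keys(), so every hit still needs a manual look.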

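  On cause 2:

    When RedisConnectionFailureException occurred, the Redis server's connection count peaked at only 15, and the timeout in the configuration file is 2000 ms. Since cause 1 had already confirmed that there are no particularly slow queries, this possibility was ruled out as well.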
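    For reference, "Read timed out" means the socket was already connected but no reply arrived within the read timeout. My understanding is that the timeout: 2000 setting ends up as that read timeout in the Jedis client, but that mapping is an assumption and is not shown in this post. The sketch below reproduces the same java.net.SocketTimeoutException with plain sockets and no Redis involved; ReadTimeoutDemo and the 200 ms value are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Illustrative only: a client read against a server that never answers. */
final class ReadTimeoutDemo {

    /** Returns true if the read gives up with a SocketTimeoutException. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // The server socket listens on an ephemeral loopback port. The OS completes
        // the TCP handshake from its backlog, but nothing ever writes a reply.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // read timeout in milliseconds
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives or the timeout expires
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("Caught: " + e.getMessage());
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimesOut(200)); // hypothetical 200 ms timeout
    }
}
```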

  

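  With causes 1 and 2 ruled out, that left cause 3: a problem with the number of connections.

  The configuration sets the maximum number of connections to 300, well above the peak of 180, so the configured values looked fine.

  So I tested this configuration in the development environment. The project uses the Jedis connection pool rather than the Lettuce one. (Note: in SpringBoot2.x, spring-boot-starter-data-redis uses Lettuce by default. To use the Jedis connection pool, exclude the default lettuce-core dependency and add Jedis, as in the Maven dependencies above.)

  Tracing further into the source code:

  The class that holds the connection count settings is: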

package org.apache.commons.pool2.impl;

public class GenericObjectPoolConfig<T> extends BaseObjectPoolConfig<T> {
    public static final int DEFAULT_MAX_TOTAL = 8;
    public static final int DEFAULT_MAX_IDLE = 8;
    public static final int DEFAULT_MIN_IDLE = 0;
    private int maxTotal = 8;
    private int maxIdle = 8;
    private int minIdle = 0;
...

}

  This configuration class is loaded when the connection pool is initialized at project startup:

    

package org.springframework.data.redis.connection.jedis;

import java.time.Duration;
import java.util.Optional;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocketFactory;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.lang.Nullable;

/**
 * Default implementation of {@literal JedisClientConfiguration}.
 *
 * @author Mark Paluch
 * @author Christoph Strobl
 * @since 2.0
 */
class DefaultJedisClientConfiguration implements JedisClientConfiguration {

    private final boolean useSsl;
    private final Optional<SSLSocketFactory> sslSocketFactory;
    private final Optional<SSLParameters> sslParameters;
    private final Optional<HostnameVerifier> hostnameVerifier;
    private final boolean usePooling;
    private final Optional<GenericObjectPoolConfig> poolConfig;
    private final Optional<String> clientName;
    private final Duration readTimeout;
    private final Duration connectTimeout;

    DefaultJedisClientConfiguration(boolean useSsl, @Nullable SSLSocketFactory sslSocketFactory,
            @Nullable SSLParameters sslParameters, @Nullable HostnameVerifier hostnameVerifier, boolean usePooling,
            @Nullable GenericObjectPoolConfig poolConfig, @Nullable String clientName, Duration readTimeout,
            Duration connectTimeout) {

        this.useSsl = useSsl;
        this.sslSocketFactory = Optional.ofNullable(sslSocketFactory);
        this.sslParameters = Optional.ofNullable(sslParameters);
        this.hostnameVerifier = Optional.ofNullable(hostnameVerifier);
        this.usePooling = usePooling; 
        this.poolConfig = Optional.ofNullable(poolConfig);
        this.clientName = Optional.ofNullable(clientName);
        this.readTimeout = readTimeout;
        this.connectTimeout = connectTimeout;
    }
...

}

  Debugging showed that, after loading, the pool was still using the default connection counts:

    public static final int DEFAULT_MAX_TOTAL = 8;
    public static final int DEFAULT_MAX_IDLE = 8;
    public static final int DEFAULT_MIN_IDLE = 0;
    private int maxTotal = 8;
    private int maxIdle = 8;
    private int minIdle = 0;
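  To get a feel for what a pool of 8 means against the peak of 180, here is a back-of-the-envelope sketch. It assumes that the 180 requests all arrive at once and that each holds its connection for 50 ms. Both are assumptions made for the arithmetic, not measurements from this system, and PoolWaitEstimate is a made-up name.

```java
/** Rough arithmetic model of waiting for a pooled connection (illustrative only). */
final class PoolWaitEstimate {

    /**
     * Worst-case time a request waits for a free connection when `concurrent`
     * requests arrive at once, the pool holds `poolSize` connections, and each
     * request keeps its connection for `holdMillis`.
     */
    static long worstCaseWaitMillis(int concurrent, int poolSize, long holdMillis) {
        if (concurrent <= 0 || poolSize <= 0 || holdMillis < 0) {
            throw new IllegalArgumentException("concurrent and poolSize must be > 0, holdMillis >= 0");
        }
        // Requests are served in waves of poolSize; the last wave waits for all earlier waves.
        long waves = (concurrent + poolSize - 1L) / poolSize; // ceil(concurrent / poolSize)
        return (waves - 1) * holdMillis;
    }

    public static void main(String[] args) {
        long holdMillis = 50; // hypothetical hold time per request, not measured
        System.out.println(worstCaseWaitMillis(180, 8, holdMillis));   // prints 1100
        System.out.println(worstCaseWaitMillis(180, 300, holdMillis)); // prints 0
    }
}
```

  Under these assumptions the last requests queue for about 1.1 seconds with 8 connections and do not queue at all with 300.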

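This is probably where the problem lies: the maximum number of connections set in the configuration file never took effect. It turns out that this section of the configuration is no longer recognized (SpringBoot2.x does not read pool settings from spring.redis.pool):
  redis:
    pool:
      maxActive: 300
      maxIdle: 100
      maxWait: 1000
 It needs to be changed to: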
  redis:
      jedis:
        pool:
          maxActive: 300
          maxIdle: 100
          max-wait: 1000ms
 
        

  After the change and a restart, the settings took effect and the pool values matched the configuration.
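  The failure mode is easy to miss because an unrecognized key is ignored without any error, and the default simply stays in place. The following is a toy model of that behaviour, not Spring Boot's actual binder: PoolPropertyLookup and its methods are made up for illustration, and the key normalization is a simplification of how maxActive and max-active are treated as the same name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of looking up one pool property with a default (not Spring Boot's binder). */
final class PoolPropertyLookup {

    static final int DEFAULT_MAX_ACTIVE = 8; // same value as DEFAULT_MAX_TOTAL above

    // Reduce a key to a uniform form: dashes removed, lowercase,
    // so that maxActive and max-active compare equal.
    static String canonical(String key) {
        return key.replace("-", "").toLowerCase(Locale.ROOT);
    }

    static int resolveMaxActive(Map<String, String> properties) {
        Map<String, String> byCanonicalKey = new HashMap<>();
        properties.forEach((key, value) -> byCanonicalKey.put(canonical(key), value));
        // Only this key is looked up; anything else in the map is silently ignored.
        String value = byCanonicalKey.get(canonical("spring.redis.jedis.pool.max-active"));
        return value == null ? DEFAULT_MAX_ACTIVE : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put("spring.redis.pool.maxActive", "300");

        Map<String, String> newStyle = new HashMap<>();
        newStyle.put("spring.redis.jedis.pool.maxActive", "300");

        System.out.println(resolveMaxActive(oldStyle)); // prints 8: key not recognized, default kept
        System.out.println(resolveMaxActive(newStyle)); // prints 300
    }
}
```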




