問題背景
在業務使用redis過程中,出現了read timeout 的異常。
問題排查
直接原因
運維查詢redis慢查詢日志,發現在異常時間節點,有redis慢查詢日志,執行sadd 命令花費了1秒鍾。但由於redis是單線程應用,執行單條命令的阻塞,會造成其他命令的排隊等候,導致read timeout。
深入排查-為什么sadd這么慢呢
為什么sadd這么慢呢?查閱redis文檔看到,sadd操作的復雜度是O(1)的,實際使用本機docker搭建redis進行測試,使用腳本進行sadd,直到800W以上的量級才偶爾出現100毫秒以上的情況。(測試過程詳見后面)
搭建redis環境
偷懶在本機就行測試,使用docker跑起了redis應用,過程如下:
docker pull redis # 使用redis3.x版本docker run -itv ~/redis.conf:/redis.conf -p 32768:6379 --name myredis5 -d redis redis-server /redis.conf
測試腳本
#coding=utf-8import timeimport redisimport random
r = redis.Redis(host='x.x.x.x', port=xxxx, decode_responses=True)
k = 'key4'tarr = []
st = time.clock()
st2 = time.clock()
r.sadd(k, 1) # 創建連接也會有耗時for i in range(1, 1600000):
t1 = time.clock() * 1000
rn = random.randint(100000000000, 20000000000000)
r.sadd(k, rn)
t2 = time.clock() * 1000
c = t2 - t1
tarr.append(str(c)) if c > 100: print i, cprint time.clock()
s = "\n".join(tarr)with open('./result.txt', 'w') as f:
f.write(s)
測試結果
到達800W的時候開始偶爾出現sadd需要100ms的現象。
問題分析
查詢了很多資料,無意中看到redis del操作復雜度為O(n),這里補充一下超時的更多背景,舉例如下:
慢查詢日志時間:16號00點00分01秒,命令為sadd prefix_20180215, 且key有過期時間。
看到這里答案已經呼之欲出,是不是sadd觸發了redis是過期刪除操作,同時由於del命令的復雜度為O(n),時間花在了刪除過期數據上。
測試重現
for i in range(1, 1000000): t1 = time.clock() * 1000 rn = random.randint(100000000000, 20000000000000) r.sadd(k, rn) t2 = time.clock() * 1000 c = t2 - t1 tarr.append(str(c)) if c > 100: print i, c x = int(time.time()) x += 10 #延時10每秒過期r.expire(k, 10)while True: y = time.time() t1 = time.clock() * 1000 rn = random.randint(1, 1000000000) r.sadd(k, rn) t2 = time.clock() * 1000 tarr.append(str(c)) if c > 100:#復現sadd慢查詢的情況 print i, c if y > x + 5: # 超時時間就break breakprint time.clock()
重現的步驟很簡單,
-
給某個key sadd上足夠的數據(百萬級)
-
給key設置一個相對過期時間。
-
持續調用sadd命令,記錄調用時間。
-
最后觀察redis的慢查詢日志。
如猜想一樣,慢查詢日志中出現了SADD命令,耗時1秒。
解決方案與總結
由於redis 對於集合鍵的del操作復雜度均為O(n),所以對於集合鍵,最好設置通過分片,避免單個key的值過大。
另外,redis4.0已經通過配置支持延時刪除,可以通過lazyfree_lazy_expire/azyfree_lazy_eviction/lazyfree_lazy_server_del 來實現異步刪除的操作,避免異步阻塞
延伸閱讀
最后,讓我們來看看redis3.x和4.x處理刪除key的源碼吧。
redis 有三種淘汰key的機制,分別為
-
del命令
-
被動淘汰(當請求命令對應的鍵過期時進行刪除)
-
主動刪除(redis主動對鍵進行淘汰,回收內存)
我們先看看redis3.x版本中上面三種淘汰機制的入口代碼。
del命令 - delCommand
void delCommand(client *c) {
int deleted = 0, j; for (j = 1; j < c->argc; j++) {
expireIfNeeded(c->db,c->argv[j]); if (dbDelete(c->db,c->argv[j])) {
signalModifiedKey(c->db,c->argv[j]);
notifyKeyspaceEvent(NOTIFY_GENERIC, "del",c->argv[j],c->db->id);
server.dirty++;
deleted++;
}
}
addReplyLongLong(c,deleted);
}
處理流程相當的簡單,先檢查鍵是否過期,然后調用dbDelete進行刪除
被動淘汰 - expireIfNeeded
int expireIfNeeded(redisDb *db, robj *key) { mstime_t when = getExpire(db,key); //獲取過期時間
mstime_t now; if (when < 0) return 0; /* No expire for this key */
/* Don't expire anything while loading. It will be done later. */
if (server.loading) return 0; /* If we are in the context of a Lua script, we claim that time is
* blocked to when the Lua script started. This way a key can expire
* only the first time it is accessed and not in the middle of the
* script execution, making propagation to slaves / AOF consistent.
* See issue #1525 on Github for more information. */
now = server.lua_caller ? server.lua_time_start : mstime(); // 過去當前時間
/* If we are running in the context of a slave, return ASAP:
* the slave key expiration is controlled by the master that will
* send us synthesized DEL operations for expired keys.
*
* Still we try to return the right information to the caller,
* that is, 0 if we think the key should be still valid, 1 if
* we think the key is expired at this time. */
if (server.masterhost != NULL) return now > when; /* Return when this key has not expired */
if (now <= when) return 0;
/* Delete the key */
server.stat_expiredkeys++;
propagateExpire(db,key); // 把過期時間傳遞出去(從庫、AOF備份等)
notifyKeyspaceEvent(NOTIFY_EXPIRED,
"expired",key,db->id); // 對db內的鍵發生的變動進行通知,適用於pubsub 通過pubsub來傳遞消息,可以用來作為redis的執行監控
return dbDelete(db,key);
}
主動淘汰 - serverCron
server.c文件
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
/**
* sth not important
*/
...
/* We need to do a few operations on clients asynchronously. */
clientsCron(); /* Handle background operations on Redis databases. */
databasesCron(); /**
* sth not important
*/
...
server.cronloops++; return 1000/server.hz;
}/* This function handles 'background' operations we are required to do
* incrementally in Redis databases, such as active key expiring, resizing,
* rehashing. */void databasesCron(void) { /* Expire keys by random sampling. Not required for slaves
* as master will synthesize DELs for us. */
if (server.active_expire_enabled && server.masterhost == NULL)
activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW); /**
* sth not important
*/
}/* Try to expire a few timed out keys. The algorithm used is adaptive and
* will use few CPU cycles if there are few expiring keys, otherwise
* it will get more aggressive to avoid that too much memory is used by
* keys that can be removed from the keyspace.
*
* No more than CRON_DBS_PER_CALL databases are tested at every
* iteration.
*
* This kind of call is used when Redis detects that timelimit_exit is
* true, so there is more work to do, and we do it more incrementally from
* the beforeSleep() function of the event loop.
*
* Expire cycle type:
*
* If type is ACTIVE_EXPIRE_CYCLE_FAST the function will try to run a
* "fast" expire cycle that takes no longer than EXPIRE_FAST_CYCLE_DURATION
* microseconds, and is not repeated again before the same amount of time.
*
* If type is ACTIVE_EXPIRE_CYCLE_SLOW, that normal expire cycle is
* executed, where the time limit is a percentage of the REDIS_HZ period
* as specified by the REDIS_EXPIRELOOKUPS_TIME_PERC define. */void activeExpireCycle(int type) { int dbs_per_call = CRON_DBS_PER_CALL; /* We usually should test CRON_DBS_PER_CALL per iteration, with
* two exceptions:
*
* 1) Don't test more DBs than we have.
* 2) If last time we hit the time limit, we want to scan all DBs
* in this iteration, as there is work to do in some DB and we don't want
* expired keys to use memory for too much time. */
if (dbs_per_call > server.dbnum || timelimit_exit)
dbs_per_call = server.dbnum; //每次清理掃描的數據庫數
/* We can use at max ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC percentage of CPU time
* per iteration. Since this function gets called with a frequency of
* server.hz times per second, the following is the max amount of
* microseconds we can spend in this function. */
timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100;
timelimit_exit = 0; if (timelimit <= 0) timelimit = 1; if (type == ACTIVE_EXPIRE_CYCLE_FAST)
timelimit = ACTIVE_EXPIRE_CYCLE_FAST_DURATION; /* in microseconds. */
for (j = 0; j < dbs_per_call; j++) { int expired;
redisDb *db = server.db+(current_db % server.dbnum); /* Increment the DB now so we are sure if we run out of time
* in the current DB we'll restart from the next. This allows to
* distribute the time evenly across DBs. */
current_db++; /* Continue to expire if at the end of the cycle more than 25%
* of the keys were expired. */
// 如果有超過25%的鍵過期了則繼續掃描
do { unsigned long num, slots; long long now, ttl_sum; int ttl_samples; /* If there is nothing to expire try next DB ASAP. */
if ((num = dictSize(db->expires)) == 0) { //當前沒有需要過期的鍵
db->avg_ttl = 0; break;
}
slots = dictSlots(db->expires);
now = mstime(); /* When there are less than 1% filled slots getting random
* keys is expensive, so stop here waiting for better times...
* The dictionary will be resized asap. */
if (num && slots > DICT_HT_INITIAL_SIZE &&
(num*100/slots < 1)) break; /* The main collection cycle. Sample random keys among keys
* with an expire set, checking for expired ones. */
expired = 0;
ttl_sum = 0;
ttl_samples = 0; if (num > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP)
num = ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP; // 3.2.11為20次
while (num--) {
dictEntry *de; long long ttl; if ((de = dictGetRandomKey(db->expires)) == NULL) break; //隨機獲取一個鍵
ttl = dictGetSignedIntegerVal(de)-now; if (activeExpireCycleTryExpire(db,de,now)) expired++; if (ttl > 0) { /* We want the average TTL of keys yet not expired. */
ttl_sum += ttl;
ttl_samples++;
}
} /**
* 這里有一些控制刪除時間的邏輯和其他邏輯。
*/
if (timelimit_exit) return; /* We don't repeat the cycle if there are less than 25% of keys
* found expired in the current DB. */
} while (expired > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4); // 20次 / 4
}
}/* ======================= Cron: called every 100 ms ======================== *//* Helper function for the activeExpireCycle() function.
* This function will try to expire the key that is stored in the hash table
* entry 'de' of the 'expires' hash table of a Redis database.
*
* If the key is found to be expired, it is removed from the database and
* 1 is returned. Otherwise no operation is performed and 0 is returned.
*
* When a key is expired, server.stat_expiredkeys is incremented.
*
* The parameter 'now' is the current time in milliseconds as is passed
* to the function to avoid too many gettimeofday() syscalls. */int activeExpireCycleTryExpire(redisDb *db, dictEntry *de, long long now) { long long t = dictGetSignedIntegerVal(de); if (now > t) {
sds key = dictGetKey(de);
robj *keyobj = createStringObject(key,sdslen(key));
propagateExpire(db,keyobj);
dbDelete(db,keyobj);
notifyKeyspaceEvent(NOTIFY_EXPIRED, "expired",keyobj,db->id);
decrRefCount(keyobj);
server.stat_expiredkeys++; return 1;
} else { return 0;
}
}
主動刪除的調用路徑為serverCron -> databasesCron -> activeExpireCycle -> activeExpireCycleTryExpire, 我們主要看看activeExpireCycleTryExpire。
主動淘汰是通過隨機采樣來進行刪除的,隨機的算法很簡單,就是通過random來進行的,先random出slot,然后random出slot上的鏈表中的某個節點。另外會根據刪除時間長短和過期鍵的數量來決定一次 主動淘汰的掃描db數量和次數。
順帶說說,serverCron是redis的 周期任務,通過定時器注冊,databasesCron除了主動淘汰鍵,還會做rehash、resize等事情。
底層調用
三種機制雖然不同,但他們調用的底層都是相同的——dbDelete方法。
db.c 文件
/* Delete a key, value, and associated expiration entry if any, from the DB */int dbDelete(redisDb *db, robj *key) { /* Deleting an entry from the expires dict will not free the sds of
* the key, because it is shared with the main dictionary. */
if (dictSize(db->expires) > 0) dictDelete(db->expires,key->ptr); if (dictDelete(db->dict,key->ptr) == DICT_OK) { if (server.cluster_enabled) slotToKeyDel(key); return 1;
} else { return 0;
}
}
dict.c文件
int dictDelete(dict *ht, const void *key) { return dictGenericDelete(ht,key,0);
}/* Search and remove an element */static int dictGenericDelete(dict *d, const void *key, int nofree)
{
unsigned int h, idx;
dictEntry *he, *prevHe;
int table; if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */
if (dictIsRehashing(d)) _dictRehashStep(d);
h = dictHashKey(d, key); for (table = 0; table <= 1; table++) {
idx = h & d->ht[table].sizemask;
he = d->ht[table].table[idx];
prevHe = NULL; while(he) { if (key==he->key || dictCompareKeys(d, key, he->key)) { /* Unlink the element from the list */
if (prevHe)
prevHe->next = he->next; else
d->ht[table].table[idx] = he->next; if (!nofree) {
dictFreeKey(d, he);
dictFreeVal(d, he);
}
zfree(he);
d->ht[table].used--; return DICT_OK;
}
prevHe = he;
he = he->next;
} if (!dictIsRehashing(d)) break;
} return DICT_ERR; /* not found */}/* ------------------------------- Macros ------------------------------------*/#define dictFreeVal(d, entry) \
if ((d)->type->valDestructor) \
(d)->type->valDestructor((d)->privdata, (entry)->v.val)
server.c
/* Db->dict, keys are sds strings, vals are Redis objects. */dictType dbDictType = {
dictSdsHash, /* hash function */
NULL, /* key dup */
NULL, /* val dup */
dictSdsKeyCompare, /* key compare */
dictSdsDestructor, /* key destructor */
dictObjectDestructor /* val destructor */};void dictObjectDestructor(void *privdata, void *val){
DICT_NOTUSED(privdata); if (val == NULL) return; /* Values of swapped out keys as set to NULL */
decrRefCount(val);
}
object.c
void decrRefCount(robj *o) { if (o->refcount <= 0) serverPanic("decrRefCount against refcount <= 0"); if (o->refcount == 1) { switch(o->type) { case OBJ_STRING: freeStringObject(o); break; case OBJ_LIST: freeListObject(o); break; case OBJ_SET: freeSetObject(o); break; case OBJ_ZSET: freeZsetObject(o); break; case OBJ_HASH: freeHashObject(o); break; default: serverPanic("Unknown object type"); break;
}
zfree(o);
} else {
o->refcount--;
}
}
void freeSetObject(robj *o) { switch (o->encoding) { case OBJ_ENCODING_HT:
dictRelease((dict*) o->ptr); break; case OBJ_ENCODING_INTSET:
zfree(o->ptr); break; default:
serverPanic("Unknown set encoding type");
}
}
可以看到核心的刪除是在dictFreeVal里,對應了一個宏,這個宏調用的是對應dictType的 valDestructor,也就是dbDictType里指定的dictObjectDestructor函數,對應的刪除操作在decrRefCount(嚴格來說是通過引用計數來管理聲明周期)
decrRefCount內對每種數據類型有對應的釋放方法,我們來看set的釋放方法freeSetObject方法。根據Set的兩種數據類型有兩種處理方式,intset只需要釋放指針就好了,如果是哈希表則調用dictRelease方法。
dict.c
/* Clear & Release the hash table */void dictRelease(dict *d)
{
_dictClear(d,&d->ht[0],NULL);
_dictClear(d,&d->ht[1],NULL);
zfree(d);
}/* Destroy an entire dictionary */int _dictClear(dict *d, dictht *ht, void(callback)(void *)) {
unsigned long i; /* Free all the elements */
for (i = 0; i < ht->size && ht->used > 0; i++) {
dictEntry *he, *nextHe; if (callback && (i & 65535) == 0) callback(d->privdata); if ((he = ht->table[i]) == NULL) continue; while(he) {
nextHe = he->next;
dictFreeKey(d, he);
dictFreeVal(d, he);
zfree(he);
ht->used--;
he = nextHe;
}
} /* Free the table and the allocated cache structure */
zfree(ht->table); /* Re-initialize the table */
_dictReset(ht); return DICT_OK; /* never fails */}
至此(dictClear方法)我們可以看到這是一個O(N)的過程,需要遍歷ht每一個元素並進行刪除,所以都存在阻塞redis的風險。(即使是主動淘汰的機制)
這一點在redis4.x系列已經通過延遲刪除解決。
