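To understand snapshots in LevelDB, you first need to understand the SequenceNumber. Every write advances the SequenceNumber: inserting key1, key2, key3, key4 one after another gives them the SequenceNumbers 1, 2, 3, 4. Merged writes (a WriteBatch, or several writers combined into one group commit) work slightly differently. The whole batch is stamped with only its starting SequenceNumber, and each entry takes the next number as it is applied to the memtable. So if key1 is written alone, then key2, key3, key4 as one batch, then key5 alone, the batch is stamped with 2, its entries get 2, 3, 4, and key5 gets 5.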
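The sketch below models that numbering rule. It is not LevelDB code: SequenceModel, Stamped and ApplyBatch are hypothetical names. It only assumes, as in LevelDB's DBImpl::Write and MemTableInserter, that a batch is stamped with last_sequence + 1 and that every Put/Delete inside it consumes one number.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of LevelDB's sequence assignment, not LevelDB's own API.
using SequenceNumber = uint64_t;

struct Stamped {
  std::string key;
  SequenceNumber seq;
};

class SequenceModel {
 public:
  // Like DBImpl::Write: the batch is stamped with last_sequence + 1 and
  // last_sequence then advances by the number of entries in the batch.
  std::vector<Stamped> ApplyBatch(const std::vector<std::string>& keys) {
    assert(!keys.empty());  // this model requires a non-empty batch
    SequenceNumber seq = last_sequence_ + 1;  // the batch's starting sequence
    std::vector<Stamped> out;
    for (const std::string& k : keys) {
      out.push_back({k, seq});
      ++seq;  // like MemTableInserter: each Put/Delete consumes one number
    }
    last_sequence_ += keys.size();
    return out;
  }

  SequenceNumber LastSequence() const { return last_sequence_; }

 private:
  SequenceNumber last_sequence_ = 0;
};
```

Calling ApplyBatch({"key1"}), then ApplyBatch({"key2", "key3", "key4"}), then ApplyBatch({"key5"}) stamps the entries 1, then 2, 3, 4, then 5, and LastSequence() ends at 5.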
Each key-value pair is inserted into the memtable in the following format, where both size fields are varint32-encoded:
internal_key_size | internal_key | value_size | value
The internal_key carries the SequenceNumber. Its format is:
key | SequenceNumber | type (value type)
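In other words, the SequenceNumber is stored together with every key-value pair. Concretely, the SequenceNumber and the one-byte type are packed as (sequence << 8) | type into a fixed 8-byte trailer appended to the user key, which is why sequence numbers are limited to 56 bits.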
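A minimal sketch of both layouts, assuming LevelDB's conventions: a 56-bit sequence and 8-bit type packed into a little-endian fixed64 trailer, varint32 length prefixes, kTypeDeletion = 0 and kTypeValue = 1. The helper names (AppendFixed64, MakeInternalKey, MakeMemtableEntry, ExtractSequence, ...) are hypothetical, not LevelDB's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using SequenceNumber = uint64_t;
enum ValueType : uint8_t { kTypeDeletion = 0x0, kTypeValue = 0x1 };

// Pack the 56-bit sequence and the 8-bit type into one 64-bit trailer.
uint64_t PackSequenceAndType(SequenceNumber seq, ValueType t) {
  assert(seq < (uint64_t{1} << 56));
  return (seq << 8) | t;
}

// Append v as 8 little-endian bytes.
void AppendFixed64(std::string* dst, uint64_t v) {
  for (int i = 0; i < 8; ++i) dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}

// Append v as a varint32: 7 bits per byte, high bit set means "more bytes follow".
void AppendVarint32(std::string* dst, uint32_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// internal_key = user_key | fixed64((seq << 8) | type)
std::string MakeInternalKey(const std::string& user_key, SequenceNumber seq, ValueType t) {
  std::string ik = user_key;
  AppendFixed64(&ik, PackSequenceAndType(seq, t));
  return ik;
}

// entry = varint32(internal_key_size) | internal_key | varint32(value_size) | value
std::string MakeMemtableEntry(const std::string& user_key, SequenceNumber seq,
                              ValueType t, const std::string& value) {
  std::string ik = MakeInternalKey(user_key, seq, t);
  std::string entry;
  AppendVarint32(&entry, static_cast<uint32_t>(ik.size()));
  entry += ik;
  AppendVarint32(&entry, static_cast<uint32_t>(value.size()));
  entry += value;
  return entry;
}

// Read the sequence number back out of the 8-byte trailer.
SequenceNumber ExtractSequence(const std::string& ik) {
  assert(ik.size() >= 8);
  uint64_t packed = 0;
  for (int i = 0; i < 8; ++i)
    packed |= uint64_t{static_cast<unsigned char>(ik[ik.size() - 8 + i])} << (8 * i);
  return packed >> 8;
}
```

For MakeInternalKey("key1", 5, kTypeValue) the trailer's first byte is the type (0x01) and the second is the sequence (0x05), so the SequenceNumber really does travel inside every stored key.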
Next, let's look at the snapshot API. The interface and implementation are:
const Snapshot* DBImpl::GetSnapshot() {
  MutexLock l(&mutex_);
  return snapshots_.New(versions_->LastSequence());
}

void DBImpl::ReleaseSnapshot(const Snapshot* s) {
  MutexLock l(&mutex_);
  snapshots_.Delete(reinterpret_cast<const SnapshotImpl*>(s));
}
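snapshots_ is a doubly linked list that tracks the live snapshots. GetSnapshot creates a new snapshot carrying the current (last) SequenceNumber and appends it to the list; ReleaseSnapshot removes it from the list. Because snapshots are appended in creation order, the oldest snapshot, which has the smallest SequenceNumber, is always at the head of the list.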
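A simplified model of that list, assuming the layout of LevelDB's SnapshotList: a circular doubly linked list with a dummy head, so the oldest snapshot sits at head.next and the newest at head.prev. SnapshotNode and the member names here are hypothetical, and the real code additionally relies on the caller holding mutex_.

```cpp
#include <cassert>
#include <cstdint>

using SequenceNumber = uint64_t;

struct SnapshotNode {
  SequenceNumber number = 0;
  SnapshotNode* prev = nullptr;
  SnapshotNode* next = nullptr;
};

class SnapshotList {
 public:
  SnapshotList() { head_.prev = head_.next = &head_; }  // empty: head points to itself
  ~SnapshotList() {
    while (!empty()) Delete(head_.next);  // free snapshots that were never released
  }
  SnapshotList(const SnapshotList&) = delete;
  SnapshotList& operator=(const SnapshotList&) = delete;

  bool empty() const { return head_.next == &head_; }
  SnapshotNode* oldest() const { assert(!empty()); return head_.next; }
  SnapshotNode* newest() const { assert(!empty()); return head_.prev; }

  // Append at the tail, so the list stays ordered by creation time
  // (and therefore by non-decreasing sequence number).
  SnapshotNode* New(SequenceNumber seq) {
    SnapshotNode* s = new SnapshotNode;
    s->number = seq;
    s->next = &head_;
    s->prev = head_.prev;
    s->prev->next = s;
    s->next->prev = s;
    return s;
  }

  // O(1) unlink: any snapshot can be released, not only the oldest one.
  void Delete(SnapshotNode* s) {
    s->prev->next = s->next;
    s->next->prev = s->prev;
    delete s;
  }

 private:
  SnapshotNode head_;  // dummy node; never holds a real snapshot
};
```

The dummy head keeps New and Delete free of empty-list special cases, and oldest() is the value compaction needs as its lower bound.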
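So how does LevelDB keep the data a snapshot can see from being deleted? In LevelDB, compaction is the only place where data is physically removed, so let's look at the core of DBImpl::DoCompactionWork: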
 1 Status DBImpl::DoCompactionWork(CompactionState* compact) {
 2   //...................
 3   if (snapshots_.empty()) {
 4     compact->smallest_snapshot = versions_->LastSequence();
 5   } else {
 6     compact->smallest_snapshot = snapshots_.oldest()->number_;
 7   }
 8
 9   // Release mutex while we're actually doing the compaction work
10   mutex_.Unlock();
11
12   Iterator* input = versions_->MakeInputIterator(compact->compaction);
13   input->SeekToFirst();
14   Status status;
15   ParsedInternalKey ikey;
16   std::string current_user_key;
17   bool has_current_user_key = false;
18   SequenceNumber last_sequence_for_key = kMaxSequenceNumber;
19   for (; input->Valid() && !shutting_down_.Acquire_Load(); ) {
20     //..............................
21     // Handle key/value, add to state, etc.
22     bool drop = false;
23     if (!ParseInternalKey(key, &ikey)) {
24       // Do not hide error keys
25       current_user_key.clear();
26       has_current_user_key = false;
27       last_sequence_for_key = kMaxSequenceNumber;
28     } else {
29       if (!has_current_user_key ||
30           user_comparator()->Compare(ikey.user_key,
31                                      Slice(current_user_key)) != 0) {
32         // First occurrence of this user key
33         current_user_key.assign(ikey.user_key.data(), ikey.user_key.size());
34         has_current_user_key = true;
35         last_sequence_for_key = kMaxSequenceNumber;
36       }
37
38       if (last_sequence_for_key <= compact->smallest_snapshot) {
39         // Hidden by an newer entry for same user key
40         drop = true;    // (A)
41       } else if (ikey.type == kTypeDeletion &&
42                  ikey.sequence <= compact->smallest_snapshot &&
43                  compact->compaction->IsBaseLevelForKey(ikey.user_key)) {
44         // For this user key:
45         // (1) there is no data in higher levels
46         // (2) data in lower levels will have larger sequence numbers
47         // (3) data in layers that are being compacted here and have
48         //     smaller sequence numbers will be dropped in the next
49         //     few iterations of this loop (by rule (A) above).
50         // Therefore this deletion marker is obsolete and can be dropped.
51         drop = true;
52       }
53
54       last_sequence_for_key = ikey.sequence;
55     }
56
57     if (!drop) {
58       //..............................
59     }
60
61     input->Next();
62   }
63 }
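On line 6, compact->smallest_snapshot is set to the SequenceNumber of the oldest snapshot (or, on line 4, to the last SequenceNumber when no snapshot exists). Next, an iterator over the compaction inputs is created. Internal keys are ordered by user key ascending and then by SequenceNumber descending, so the versions of one key, key_a, come out newest first, for example in the order:
(key_a, value5) -------- (key_a, value4) -------- (key_a, value3) -------- (key_a, value2) -------- (key_a, value1)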
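When the loop reaches (key_a, value5), the first occurrence of key_a, lines 33-35 run and reset last_sequence_for_key to kMaxSequenceNumber, so this newest version can never be dropped by rule (A). Line 54 then sets last_sequence_for_key to the SequenceNumber of (key_a, value5). On the next iteration, at (key_a, value4), line 38 compares last_sequence_for_key with compact->smallest_snapshot. If it is less than or equal to smallest_snapshot, the newer version value5 is already visible to even the oldest snapshot, and therefore to every snapshot and to current reads, so no reader can ever see value4 and it is dropped during compaction. Otherwise, (key_a, value4) can still be dropped if all three of the following hold: it is a deletion marker, its SequenceNumber is less than or equal to the oldest snapshot's SequenceNumber, and IsBaseLevelForKey reports that no level deeper than the compaction's output level contains key_a. In every other case the entry is kept.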
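The two drop rules can be replayed on a toy input. In this sketch, Entry, CompactEntries and the is_base_level callback are hypothetical: the callback stands in for Compaction::IsBaseLevelForKey, the corrupt-key branch (lines 23-27) is left out, and the input is assumed to be already sorted the way the compaction iterator yields it (user key ascending, SequenceNumber descending). Values are omitted because the rules only look at keys, sequence numbers and types.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using SequenceNumber = uint64_t;
enum ValueType { kTypeDeletion, kTypeValue };

struct Entry {
  std::string user_key;
  SequenceNumber seq;
  ValueType type;
};

const SequenceNumber kMaxSequenceNumber = (uint64_t{1} << 56) - 1;

// Returns the entries that survive compaction under rule (A) and the tombstone rule.
std::vector<Entry> CompactEntries(const std::vector<Entry>& input,
                                  SequenceNumber smallest_snapshot,
                                  const std::function<bool(const std::string&)>& is_base_level) {
  std::vector<Entry> output;
  std::string current_user_key;
  bool has_current_user_key = false;
  SequenceNumber last_sequence_for_key = kMaxSequenceNumber;
  for (const Entry& e : input) {
    // Precondition: versions of the same key arrive newest first.
    assert(!has_current_user_key || e.user_key != current_user_key ||
           e.seq < last_sequence_for_key);
    if (!has_current_user_key || e.user_key != current_user_key) {
      current_user_key = e.user_key;  // first (newest) occurrence of this key
      has_current_user_key = true;
      last_sequence_for_key = kMaxSequenceNumber;
    }
    bool drop = false;
    if (last_sequence_for_key <= smallest_snapshot) {
      drop = true;  // rule (A): a newer version is visible to every snapshot
    } else if (e.type == kTypeDeletion && e.seq <= smallest_snapshot &&
               is_base_level(e.user_key)) {
      drop = true;  // obsolete tombstone: no deeper level holds this key
    }
    last_sequence_for_key = e.seq;
    if (!drop) output.push_back(e);
  }
  return output;
}
```

With key_a written at sequences 5, 4, 3, 2, 1 and smallest_snapshot = 3, the sketch keeps 5, 4 and 3: value3 is what the oldest snapshot reads, and value4 survives because smallest_snapshot only records the oldest snapshot, so a newer snapshot at 4 might still exist. With no snapshots (smallest_snapshot equal to the last sequence, 5) only value5 remains.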
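This compaction logic makes sure an old snapshot keeps reading the old values instead of seeing later updates, which is exactly the guarantee a snapshot provides.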
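On reads, a snapshot can be passed to Get through ReadOptions::snapshot. Get then seeks using the snapshot's SequenceNumber instead of the latest one, so key-value pairs whose SequenceNumber is larger than the snapshot's are skipped, and the read returns the value as of the snapshot rather than any later value.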
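A sketch of that lookup, using a std::map with LevelDB's internal-key ordering in place of the memtable skiplist. InternalKey, Table and GetAt are hypothetical names. Unlike real LevelDB, the comparator ignores the value type (LevelDB packs it into the trailer and seeks with kValueTypeForSeek); that simplification is harmless here only because every entry has a distinct SequenceNumber.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using SequenceNumber = uint64_t;
enum ValueType { kTypeDeletion, kTypeValue };

struct InternalKey {
  std::string user_key;
  SequenceNumber seq;
};

// Internal-key order: user key ascending, then sequence descending (newest first).
struct InternalKeyLess {
  bool operator()(const InternalKey& a, const InternalKey& b) const {
    if (a.user_key != b.user_key) return a.user_key < b.user_key;
    return a.seq > b.seq;
  }
};

using Table = std::map<InternalKey, std::pair<ValueType, std::string>, InternalKeyLess>;

// Seek to (key, snapshot_seq). Versions newer than the snapshot sort before the
// seek target and are skipped; the first entry found is the newest version of
// key with seq <= snapshot_seq, if there is one.
std::optional<std::string> GetAt(const Table& t, const std::string& key,
                                 SequenceNumber snapshot_seq) {
  auto it = t.lower_bound(InternalKey{key, snapshot_seq});
  if (it == t.end() || it->first.user_key != key) return std::nullopt;  // no visible version
  if (it->second.first == kTypeDeletion) return std::nullopt;           // deleted as of the snapshot
  return it->second.second;
}
```

With versions of "k" at sequences 1 ("v1"), 3 ("v3") and 5 (a deletion), reading at snapshot 2 returns "v1", at 4 returns "v3", and at 5 returns nothing because the key had been deleted by then.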
