Redis Bloom Filter: Features and Principles

Reposted article, 2021-04-23 18:00. Tags: middleware / backend / redis

Quick install and first steps
RedisBloom-func
- Parameter setup
  - BF.RESERVE
- Adding items
  - BF.ADD
  - BF.MADD
  - BF.INSERT
- Checking items
  - BF.EXISTS
  - BF.MEXISTS
- Other commands
Further notes
RedisBloom benchmarking
References

Quick install and first steps

build

git clone https://github.com/RedisBloom/RedisBloom.git
cd RedisBloom
make
-----
The commands above produce the redisbloom.so file.

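Loading redisbloom dynamically

# MODULE LOAD /redisbloom.so (path to the compiled .so)
List the loaded modules with MODULE LIST:
1) 1) "name"  plugin name
   2) "bf"    module name
   3) "ver"   module version number
   4) (integer) 999999 
# Unload a module at runtime
# MODULE UNLOAD <module name>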

Loading at startup

# Assuming you have a redis build from the unstable branch:
./redis-server --loadmodule ./redisbloom.so (path to the compiled .so)

redis-server --loadmodule /path/to/redisbloom.so INITIAL_SIZE 400 ERROR_RATE 0.004
The default error rate is 0.01 and the default initial capacity is 100.

RedisBloom-func

Parameter setup

BF.RESERVE

Format: BF.RESERVE {key} {error_rate} {capacity} [EXPANSION {expansion}] [NONSCALING]
eg: bf.reserve key3 0.1 5 NONSCALING
OK
127.0.0.1:6379> bf.add key3 0
(integer) 1
127.0.0.1:6379> bf.add key3 1
(integer) 1
127.0.0.1:6379> bf.add key3 2
(integer) 1
127.0.0.1:6379> bf.add key3 3
(integer) 1
127.0.0.1:6379> bf.add key3 4
(integer) 1
127.0.0.1:6379> bf.add key3 5
(error) ERR non scaling filter is full
The capacity is set to 5 and scaling is disabled, so adding the sixth element fails with "ERR non scaling filter is full".

Parameters:

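key: name of the filter
error_rate: the desired false-positive rate; the lower it is, the more space the filter needs.
capacity: the initial capacity; once the number of stored elements exceeds it, the false-positive rate rises.
Optional parameters
EXPANSION: once the data added to the Bloom filter reaches the initial capacity, the filter automatically creates a sub-filter whose size is the previous filter's size multiplied by expansion. The default expansion is 2, so by default each new sub-filter is twice as large as the previous one.
NONSCALING: with this option set, the filter does not grow once the data added reaches the initial capacity, and further adds fail with (error) ERR non scaling filter is full.
Note: a BloomFilter scales by adding layers. With every added layer, a lookup may have to walk through several layers, and each layer has (by default) twice the capacity of the previous one. The default error_rate is 0.01 and the default capacity is 100.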

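Adding items

BF.ADD

BF.ADD {key} {item}
eg: BF.ADD key0 v0
(integer) 1

Purpose: add one element to the Bloom filter named by key

key: name of the filter
item: a single element
Return value: 1: newly added, 0: already added. If a capacity was set and scaling is disabled, a full filter returns (error) ERR non scaling filter is full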

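BF.MADD

BF.MADD {key} {item ...}
eg: BF.MADD key0 v1 v2
1) (integer) 1
2) (integer) 1

Purpose: add multiple elements to the Bloom filter named by key

key: name of the filter
item: one or more elements
Return value (array): 1: newly added, 0: already added. If a capacity was set and scaling is disabled, a full filter returns (error) ERR non scaling filter is full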

BF.INSERT

BF.INSERT {key} [CAPACITY {cap}] [ERROR {error}] [EXPANSION {expansion}] [NOCREATE] [NONSCALING] ITEMS {item ...}
eg: bf.insert bfinKey0 CAPACITY 5 ERROR 0.1 EXPANSION 2  NONSCALING ITEMS item1 item2
1) (integer) 1
2) (integer) 1

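Purpose: add multiple elements to the Bloom filter named by key. The size and error rate can be given at insert time, and you can control whether the filter is created automatically when it does not exist.
Parameters

key: name of the filter
CAPACITY: [ignored if the filter already exists].
ERROR: [ignored if the filter already exists].
expansion: the Bloom filter automatically creates a sub-filter whose size is the previous filter's size multiplied by expansion. The default expansion is 2, so by default each new sub-filter is twice as large as the previous one.
NOCREATE: if set, the Bloom filter is not created when it does not exist. Use it to keep filter creation strictly separate from element insertion. It cannot be combined with CAPACITY or ERROR.
NONSCALING: with this option set, the filter does not grow once the data added reaches the initial capacity, and further adds fail with (error) ERR non scaling filter is full.
ITEMS: the list of elements to insert into the filter. This parameter is required.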

Checking items

BF.EXISTS

BF.EXISTS {key} {item}
eg: BF.EXISTS key0 v1
(integer) 1

Purpose: check whether one element exists in the BloomFilter

key: name of the filter
item: a single value
Return value: 1: exists, 0: does not exist

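BF.MEXISTS

BF.MEXISTS {key} {item ...}
eg: BF.MEXISTS key0 v1 v2
1) (integer) 1
2) (integer) 1

Purpose: check in one call whether several elements exist in the BloomFilter

key: name of the filter
item: one or more values
Return value (array): 1: exists, 0: does not exist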

Other commands

BF.SCANDUMP

BF.SCANDUMP {key} {iter}
eg:BF.SCANDUMP key0 0
1) (integer) 1
2) "\x04\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00\x90\x00\x00\x00\x00\x00\x00\x00\x80\x04\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00{\x14\xaeG\xe1zt?\xe9\x86/\xb25\x0e&@\b\x00\x00\x00d\x00\x00\x00\x00\x00\x00\x00\x00"

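Purpose: persist a Bloom filter incrementally (incremental save)

key: name of the filter
iter: pass 0 on the first call, then the iterator value returned by the previous call;
Return value: successive (iter, data) pairs until (0, NULL), which means the dump is complete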

BF.LOADCHUNK

BF.LOADCHUNK {key} {iter} {data}

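Purpose: load Bloom data that was persisted with SCANDUMP

key: name of the target Bloom filter;
iter: the iterator value returned by SCANDUMP, paired one-to-one with data;
data: the data chunk returned by SCANDUMP;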
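
The dump and load loop described above can be sketched in Python as follows. This is a minimal sketch, not RedisBloom client code: `execute` is a hypothetical stand-in for any function that sends a raw command and returns the reply (for example the `execute_command` method of a redis-py client), and `dump_filter` and `load_filter` are names made up for this illustration.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[int, bytes]


def dump_filter(execute: Callable[..., list], key: str) -> List[Chunk]:
    """Collect every (iter, data) chunk that BF.SCANDUMP yields for `key`."""
    chunks: List[Chunk] = []
    cursor = 0  # the first call always passes 0
    while True:
        cursor, data = execute("BF.SCANDUMP", key, cursor)
        if cursor == 0:  # (0, NULL) marks the end of the dump
            break
        chunks.append((cursor, data))
    return chunks


def load_filter(execute: Callable[..., object], key: str, chunks: List[Chunk]) -> None:
    """Replay the chunks into `key`, keeping each data block with its iterator."""
    for cursor, data in chunks:
        execute("BF.LOADCHUNK", key, cursor, data)
```

Each chunk is stored together with the iterator value that came back with it, because BF.LOADCHUNK expects exactly that pairing.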

BF.INFO

BF.INFO {key}	
eg:bf.info key1
 1) Capacity 
 2) (integer) 7
 3) Size
 4) (integer) 416
 5) Number of filters
 6) (integer) 3
 7) Number of items inserted
 8) (integer) 5
 9) Expansion rate
10) (integer) 2

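Purpose: query information about the Bloom filter named by key
Return value:

Capacity: the preset capacity;
Size: the actual space used (how it is computed still needs to be confirmed);
Number of filters: number of filter layers;
Number of items inserted: number of elements actually inserted;
Expansion rate: expansion factor for sub-filters (default 2);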

BF.DEBUG

BF.DEBUG {key}
eg: bf.debug key1
1) "size:5"
2) "bytes:8 bits:64 hashes:5 hashwidth:64 capacity:1 size:1 ratio:0.05"
3) "bytes:8 bits:64 hashes:6 hashwidth:64 capacity:2 size:2 ratio:0.025"
4) "bytes:8 bits:64 hashes:7 hashwidth:64 capacity:4 size:2 ratio:0.0125"

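Purpose: show the BloomFilter's internal details (such as the number of elements and the error rate of each layer)
Return value:

size: number of elements inserted into the BloomFilter;
Details for each BloomFilter layer
- bytes: number of bytes used;
- bits: number of bits used, bits = bytes * 8;
- hashes: number of hash functions in this layer;
- hashwidth: width of the hash function;
- capacity: capacity of this layer (the first layer has the capacity set when the BloomFilter was created; second layer capacity = first layer capacity * expansion, and so on);
- size: number of elements inserted into this layer (the per-layer sizes add up to the BloomFilter's total size);
- ratio: error rate of this layer (first layer = the error rate set at creation * 0.5; the second layer is 0.5 times the first, and so on; ratio does not depend on expansion);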
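
The capacity and ratio rules listed above can be checked with a few lines of Python. This is a minimal sketch that only restates those two rules; `layer_params` is a made-up helper name, and the input values (capacity 1, error rate 0.1, expansion 2) are inferred from the bf.debug key1 output shown above rather than stated in it.

```python
from typing import List, Tuple


def layer_params(capacity: int, error_rate: float, expansion: int = 2,
                 layers: int = 3) -> List[Tuple[int, float]]:
    """Return (capacity, ratio) for each layer, following the rules listed above."""
    result = []
    layer_capacity = capacity
    ratio = error_rate * 0.5  # first layer: configured error rate * 0.5
    for _ in range(layers):
        result.append((layer_capacity, ratio))
        layer_capacity *= expansion  # the next layer grows by `expansion`
        ratio *= 0.5                 # the next layer halves the ratio
    return result


if __name__ == "__main__":
    print(layer_params(1, 0.1))  # [(1, 0.05), (2, 0.025), (4, 0.0125)]
```

The three pairs line up with the capacity and ratio fields of the three layers in the bf.debug output.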

Further notes

How RedisBloom works, in brief

hash

A Bloom filter is an array of many bits. When an element is ‘added’ to a bloom filter, the element is hashed. Then bit[hashval % nbits] is set to 1

Reducing hash collisions

In order to reduce the risk of collisions, an entry may use more than one bit
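
To make the two quoted ideas concrete (hash the element to pick a bit, and use more than one bit per entry), here is a minimal sketch in Python. It is an illustration only, not RedisBloom's implementation: `TinyBloom` is a made-up class, and SHA-256 salted with the hash index is used here purely for convenience as a stand-in for the module's own hash function.

```python
import hashlib
from typing import Iterator


class TinyBloom:
    """A toy Bloom filter: nbits bits, nhashes bit positions per item."""

    def __init__(self, nbits: int, nhashes: int) -> None:
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # One position per hash: salt the item with the hash index, then
        # reduce the hash value modulo the number of bits.
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, item: str) -> bool:
        # True only if every position is set; false positives are possible,
        # false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

With a single hash, two items collide as soon as they share one bit; with several hashes, a false positive requires all of an item's bits to have been set by other items.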

Example: how it works in Redis (figure)

RedisBloom: number of hash functions vs. error rate

Formula for the number of hash functions in the source code

int bloom_init(struct bloom *bloom, uint64_t entries, double error, unsigned options) {
    // ...
    bloom->bpe = calc_bpe(error);
    bloom->hashes = (int)ceil(0.693147180559945 * bloom->bpe); // ln(2) 
    // ...
}
static double calc_bpe(double error) {
    static const double denom = 0.480453013918201; // ln(2)^2
    double num = log(error);

    double bpe = -(num / denom);
    if (bpe < 0) {
        bpe = -bpe;
    }
    return bpe;
}

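// ceil() returns the smallest integer greater than or equal to its argument
// ln(2) ≈ 0.693147180559945
// ln(2)^2 ≈ 0.480453013918201
// log(error): the natural logarithm (base e); C's log() is not base 10

So RedisBloom computes the number of hash functions as k = ceil( -ln(error) / ln(2)^2 * ln(2) ) = ceil( -ln(error) / ln(2) ),
which matches the formula derived for Bloom filters in [布隆過濾器 (Bloom Filter) 詳解](https://www.cnblogs.com/allensun/archive/2011/02/16/1956532.html)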

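Conclusion

The lower the error rate, the more hash functions are needed.

Using bf.reserve and bf.debug to create filters and inspect them gives the following relation between the error rate and the optimal number of hash functions in RedisBloom:

Error rate {error_rate}	Optimal number of hash functions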
0.1	5
0.01	8
0.001	11
0.0001	15
0.00001	18
0.000001	21
0.0000001	25

eg:
bf.reserve bf0.1-2 0.1 100
bf.debug bf0.1-2
1) "size:0"
2) "bytes:80 bits:640 hashes:5 hashwidth:64 capacity:100 size:0 ratio:0.05"
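
The table above can be reproduced by porting the two C snippets to Python. This is a sketch under one assumption taken from the bf.debug output: the first layer is built with half the configured error rate (error rate 0.1 shows ratio 0.05). `hashes_for` is a made-up helper name.

```python
import math

LN2 = 0.693147180559945          # ln(2), same constant as the C source
LN2_SQUARED = 0.480453013918201  # ln(2)^2, same constant as the C source


def calc_bpe(error: float) -> float:
    """Bits per element, a direct port of calc_bpe() from the C source."""
    return abs(-(math.log(error) / LN2_SQUARED))  # math.log is the natural log


def hashes_for(error_rate: float) -> int:
    """Hash count of the first layer for a configured error rate."""
    first_layer_error = error_rate * 0.5  # assumption: ratio shown by bf.debug
    return math.ceil(LN2 * calc_bpe(first_layer_error))


if __name__ == "__main__":
    for rate in (0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001):
        print(rate, hashes_for(rate))
```

Without the 0.5 factor the same formula gives 4 hash functions for an error rate of 0.1, not the 5 that bf.debug reports, which is why the halved ratio is applied first.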

RedisBloom: storage size vs. error rate and capacity

Formula in the source code

int bloom_init(struct bloom *bloom, uint64_t entries, double error, unsigned options) {
	// ...
  bloom->bpe = calc_bpe(error);
  bits = bloom->bits = (uint64_t)(entries * bloom->bpe);
  // ...
}
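That is: bits = entries * ( -ln(error) / ln(2)^2 )

Conclusion

The smaller the error rate {error_rate}, the more storage is needed; the larger the number of elements {capacity} set at creation, the more storage is needed. If far more elements are actually inserted than were planned for, accuracy drops.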

Error rate {error_rate}	Number of elements {capacity}	Memory used (MB)
0.01	100,000	0.13146 MB (bytes:137848)
0.01	1 million	1.3146 MB (bytes:1378472)
0.01	10 million	13.146 MB (bytes:13784696)
0.001	100,000	0.18859 MB (bytes:197760)
0.001	1 million	1.8859 MB (bytes:1977536)
0.001	10 million	18.859 MB (bytes:19775360)
0.0001	1 million	2.4572 MB (bytes:2576608)
0.0001	10 million	24.572 MB (bytes:25766016)
0.0001	100 million	245.72 MB (bytes:257660152)
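
The byte counts in the 0.01 and 0.001 rows can be recomputed from the formula. The sketch below rests on two assumptions that are not stated in the snippet above: the first layer uses half the configured error rate (the ratio that bf.debug reports), and the bit count is rounded up to whole 64-bit words, which is inferred from the measured byte counts. `filter_bytes` is a made-up helper name.

```python
import math

LN2_SQUARED = 0.480453013918201  # ln(2)^2, same constant as the C source


def filter_bytes(capacity: int, error_rate: float) -> int:
    """Estimated size in bytes of the first layer of a filter."""
    # Assumption: the layer is sized for error_rate * 0.5, as bf.debug shows.
    bpe = -math.log(error_rate * 0.5) / LN2_SQUARED
    bits = int(capacity * bpe)  # same truncation as the (uint64_t) cast
    # Assumption: storage is rounded up to a multiple of 64 bits.
    words = (bits + 63) // 64
    return words * 8


if __name__ == "__main__":
    for rate in (0.01, 0.001):
        for capacity in (100_000, 1_000_000, 10_000_000):
            size = filter_bytes(capacity, rate)
            print(rate, capacity, size, round(size / 1024 / 1024, 5))
```

Dividing by 1024 * 1024 gives the megabyte figures in the table, so "MB" there means 2^20 bytes.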

RedisBloom's official default error_rate is 0.01 and its default capacity is 100.

RedisBloom expansion mechanism

Experiment

1. Create a RedisBloom filter with capacity 5: bf.reserve keyExp 0.1 5
   
2. Add 5 items: bf.madd keyExp 1 2 3 4 5   
   bf.debug keyExp
   1) "size:5"
   2) "bytes:8 bits:64 hashes:5 hashwidth:64 capacity:5 size:5 ratio:0.05"
3. Add "1" again: bf.madd keyExp 1 
   Check the filter state: no expansion happened
   bf.debug keyExp
   1) "size:5"
   2) "bytes:8 bits:64 hashes:5 hashwidth:64 capacity:5 size:5 ratio:0.05"
   
4. Add a sixth key: bf.madd keyExp 6 
   Check the filter state: an expansion happened
   bf.debug keyExp
   1) "size:6"
   2) "bytes:8 bits:64 hashes:5 hashwidth:64 capacity:5 size:5 ratio:0.05"
   3) "bytes:16 bits:128 hashes:6 hashwidth:64 capacity:10 size:1 ratio:0.025"

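Conclusion

1. Insert m elements and count how many were actually inserted into the BloomFilter;
2. If the number actually inserted > the BloomFilter's capacity, an expansion is triggered;
3. The expansion factor is the expansion set when the BloomFilter was created (default 2);

Notes:

Expansion is triggered only when the number actually inserted > capacity; when the number actually inserted = capacity, no expansion happens.
"Actually inserted" means the insert succeeded. If an item is not yet in the filter but a hash collision makes the insert fail, it does not count as actually inserted.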
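
The trigger rule can be modelled in a few lines of Python. This is a simplified sketch of the bookkeeping only, not RedisBloom's code: `ScalingModel` is a made-up class, and an exact Python set stands in for the probabilistic membership test, so it cannot reproduce inserts that fail because of hash collisions.

```python
from typing import List


class ScalingModel:
    """Tracks per-layer capacity and size the way the experiment above reports them."""

    def __init__(self, capacity: int, expansion: int = 2) -> None:
        self.expansion = expansion
        self.capacities: List[int] = [capacity]
        self.sizes: List[int] = [0]
        self._seen = set()  # stand-in for the Bloom filter's membership test

    def add(self, item: str) -> bool:
        if item in self._seen:
            return False  # not an actual insert, so it never triggers expansion
        if self.sizes[-1] >= self.capacities[-1]:
            # The newest layer is full: stack a larger layer on top of it.
            self.capacities.append(self.capacities[-1] * self.expansion)
            self.sizes.append(0)
        self._seen.add(item)
        self.sizes[-1] += 1
        return True


if __name__ == "__main__":
    model = ScalingModel(capacity=5)
    for item in ["1", "2", "3", "4", "5", "1", "6"]:
        model.add(item)
    print(model.capacities, model.sizes)  # [5, 10] [5, 1]
```

Replaying the experiment's inserts gives one full layer of capacity 5 and a second layer of capacity 10 holding one item, the same shape as the final bf.debug output.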

RedisBloom benchmarking

redis-benchmark is the performance testing tool that ships with Redis and is an effective way to measure the performance of a Redis server. Its options are listed below.

Usage: redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]

 -h <hostname>      Server hostname (default 127.0.0.1)
 -p <port>          Server port (default 6379)
 -s <socket>        Server socket (overrides host and port)
 -a <password>      Password for Redis Auth
 -c <clients>       Number of parallel connections (default 50)
 -n <requests>      Total number of requests (default 100000)
 -d <size>          Data size of SET/GET value in bytes (default 2)
 --dbnum <db>        SELECT the specified db number (default 0)
 -k <boolean>       1=keep alive 0=reconnect (default 1)
 -r <keyspacelen>   Use random keys for SET/GET/INCR, random values for SADD
  Using this option the benchmark will expand the string __rand_int__
  inside an argument with a 12 digits number in the specified range
  from 0 to keyspacelen-1. The substitution changes every time a command
  is executed. Default tests use this to hit random keys in the
  specified range.
 -P <numreq>        Pipeline <numreq> requests. Default 1 (no pipeline).
 -e                 If server replies with errors, show them on stdout.
                    (no more than 1 error per second is displayed)
 -q                 Quiet. Just show query/sec values
 --csv              Output in CSV format
 -l                 Loop. Run the tests forever
 -t <tests>         Only run the comma separated list of tests. The test
                    names are the same as the ones produced as output.
 -I                 Idle mode. Just open N idle connections and wait.

References

RedisBloom

Bloom Filter Datatype for Redis

Redis 6.0 vs. older versions: performance comparison
