http://blog.itpub.net/31561269/viewspace-2639083/
https://juejin.im/post/5cfd060ee51d4556f76e8067
Suitable scenarios
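- Protecting the database from pass-through lookups: Google Bigtable, Apache HBase, Apache Cassandra, and PostgreSQL use Bloom filters to reduce disk lookups for rows or columns that do not exist. Avoiding these costly disk lookups considerably improves query performance. The same applies to the business scenario described at the start: when the data set is too large to keep in a cache, requests need to be intercepted before they reach the database.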
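- Cache outage: when the cache goes down, using a Bloom filter in its place causes a certain amount of misjudgment. Besides the filter's inherent false-positive rate, the cache as it stood before the outage did not necessarily cover all the data in the DB, so a request after the outage for data that was never requested before is judged incorrectly. As an emergency measure during a cache outage, this is probably tolerable.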
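- Web interceptor: intercept repeated identical requests to guard against attacks. On a user's first request, the request parameters are added to the Bloom filter; on the second request, the interceptor first checks whether the parameters hit the Bloom filter. This can improve the cache hit rate.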
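- Malicious URL detection: the Chrome browser used a Bloom filter to check whether a URL was malicious. Any URL was first checked against a local Bloom filter, and only if the filter returned a positive result was a full check of the URL performed (with the user warned if that check also returned a positive result).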
- Bitcoin speed-up: Bitcoin uses Bloom filters to speed up wallet synchronization.
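The database guard from the first item in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Bigtable, HBase, Cassandra, or PostgreSQL implement it: `BloomFilter`, `GuardedStore`, the bit-array size, and the number of hash positions are all hypothetical names and values chosen for the example, and the "disk" is just a dictionary.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: m bits, k positions per key (assumed sizes)."""

    def __init__(self, m_bits=8192, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class GuardedStore:
    """Hypothetical store that consults the filter before the slow lookup."""

    def __init__(self, rows):
        self.rows = dict(rows)      # stands in for on-disk data
        self.filter = BloomFilter()
        for key in self.rows:
            self.filter.add(key)
        self.disk_lookups = 0       # counts how often the slow path runs

    def get(self, key):
        if not self.filter.might_contain(key):
            return None             # definitely absent: skip the disk lookup
        self.disk_lookups += 1      # possibly present: pay for the lookup
        return self.rows.get(key)


store = GuardedStore({"user:1": "alice", "user:2": "bob"})
print(store.get("user:1"))    # alice
print(store.get("user:404"))  # None
```

A key that was added is always reported as possibly present, so existing rows are never lost; a missing key almost always skips the slow lookup, and in the rare false-positive case the lookup runs and still returns nothing.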
1.2 Application scenarios
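- Protecting the database from pass-through lookups. Google Bigtable, HBase, Cassandra, and PostgreSQL use Bloom filters to reduce disk lookups for rows or columns that do not exist. Avoiding these costly disk lookups considerably improves query performance.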
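- In business scenarios, determining whether a user has already read a given video or article, as on Douyin or Toutiao. This does produce some false positives (an unread item is occasionally treated as read), but the user is never shown duplicate content. Another case I ran into was a competition-style social feature that needed to determine whether a user was in a competition and, if so, update the competition content. A Bloom filter works there too and reduces the number of DB or cache queries for users who are not in a competition.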
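- Cache outage and cache breakdown scenarios. Normally the service checks whether the user is in the cache, returns the result directly if so, and queries the DB if not. A burst of requests for cold data then misses the cache en masse and can cause an avalanche effect. Here a Bloom filter can act as an index in front of the cache: the cache is queried only when the key is in the Bloom filter, and a cache miss then falls through to the DB. If the key is not in the Bloom filter, the request returns immediately.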
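- Web interceptor: intercept identical requests to prevent repeated attacks. On a user's first request, the request parameters are added to the Bloom filter; on the second request, the interceptor first checks whether the parameters hit the Bloom filter. This can improve the cache hit rate.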
Link: https://juejin.im/post/5cfd060ee51d4556f76e8067
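The "Bloom filter as an index in front of the cache" flow from section 1.2 (filter, then cache, then DB) can be sketched as follows. This is an illustration under assumptions: `BloomFilter`, `lookup`, and the `stats` counters are hypothetical names, the cache and DB are plain dictionaries, and the filter is assumed to have been loaded with every key that exists in the DB.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: m bits, k positions per key (assumed sizes)."""

    def __init__(self, m_bits=8192, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def lookup(key, bloom, cache, db, stats):
    """Filter first, then cache, then DB (hypothetical helper)."""
    if not bloom.might_contain(key):
        stats["rejected"] += 1      # definitely not in the DB: return at once
        return None
    if key in cache:
        stats["cache_hits"] += 1
        return cache[key]
    stats["db_reads"] += 1          # cache miss: fall through to the DB
    value = db.get(key)
    if value is not None:
        cache[key] = value          # warm the cache for later requests
    return value


db = {"user:1": "alice"}
cache = {}
stats = {"rejected": 0, "cache_hits": 0, "db_reads": 0}
bloom = BloomFilter()
for k in db:
    bloom.add(k)                    # assumption: filter preloaded with all DB keys

first = lookup("user:1", bloom, cache, db, stats)     # read from the DB
second = lookup("user:1", bloom, cache, db, stats)    # served from the cache
missing = lookup("user:404", bloom, cache, db, stats) # normally rejected by the filter
print(first, second, missing, stats)
```

The sketch only holds if the filter is kept in sync with the DB: a key written to the DB but never added to the filter would be rejected even though it exists.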
- The servers of Akamai Technologies, a content delivery provider, use Bloom filters to prevent "one-hit-wonders" from being stored in its disk caches. One-hit-wonders are web objects requested by users just once, something that Akamai found applied to nearly three-quarters of their caching infrastructure. Using a Bloom filter to detect the second request for a web object and caching that object only on its second request prevents one-hit wonders from entering the disk cache, significantly reducing disk workload and increasing disk cache hit rates.[10]
- Google Bigtable, Apache HBase, Apache Cassandra, and PostgreSQL[11] use Bloom filters to reduce the disk lookups for non-existent rows or columns. Avoiding costly disk lookups considerably increases the performance of a database query operation.[12]
- The Google Chrome web browser used to use a Bloom filter to identify malicious URLs. Any URL was first checked against a local Bloom filter, and only if the Bloom filter returned a positive result was a full check of the URL performed (and the user warned, if that too returned a positive result).[13][14]
- Microsoft Bing uses multi-level hierarchical Bloom filters for its search index, BitFunnel. Bloom filters provided lower cost than the previous Bing index, which was based on inverted files.[15]
- The Squid Web Proxy Cache uses Bloom filters for cache digests.[16]
- Bitcoin uses Bloom filters to speed up wallet synchronization.[17]
- The Venti archival storage system uses Bloom filters to detect previously stored data.[18]
- The SPIN model checker uses Bloom filters to track the reachable state space for large verification problems.[19]
- The Cascading analytics framework uses Bloom filters to speed up asymmetric joins, where one of the joined data sets is significantly larger than the other (often called Bloom join in the database literature).[20]
- The Exim mail transfer agent (MTA) uses Bloom filters in its rate-limit feature.[21]
- Medium uses Bloom filters to avoid recommending articles a user has previously read.[22]
- Ethereum uses Bloom filters for quickly finding logs on the Ethereum blockchain.
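The Akamai item in the list above (admit an object to the disk cache only on its second request) can be sketched as below. This is a rough illustration of the idea and not Akamai's implementation: `BloomFilter`, `SecondHitCache`, the `origin` callback, and the status strings are hypothetical, and the disk cache is a dictionary.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: m bits, k positions per key (assumed sizes)."""

    def __init__(self, m_bits=8192, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class SecondHitCache:
    """Hypothetical cache that stores an object only once it is requested again."""

    def __init__(self):
        self.seen = BloomFilter()   # remembers URLs requested at least once
        self.disk_cache = {}        # stands in for the disk cache

    def fetch(self, url, origin):
        if url in self.disk_cache:
            return self.disk_cache[url], "hit"
        body = origin(url)          # go to the origin server
        if self.seen.might_contain(url):
            self.disk_cache[url] = body   # second (or later) request: admit
            return body, "miss-admitted"
        self.seen.add(url)          # first request: remember it, do not cache
        return body, "miss-not-cached"


def origin(url):
    # Stand-in for fetching the object from the origin server.
    return "content of " + url


cdn = SecondHitCache()
statuses = [cdn.fetch("/logo.png", origin)[1] for _ in range(3)]
print(statuses)  # ['miss-not-cached', 'miss-admitted', 'hit']
```

A false positive in the filter only means that an occasional one-hit wonder is admitted on its first request, which wastes a little disk space but never serves wrong content.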
https://juejin.im/post/5de1e37c5188256e8e43adfc