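At its core, a hash algorithm maps one element (the input) to another element (the hash value). Hash functions fall into two groups: cryptographic hash functions and non-cryptographic hash functions.
Cryptographic hash functions:
A cryptographic hash function aims to guarantee a number of security properties. Most importantly, it should be hard to find collisions or to recover the original input (a pre-image) from a hash value, and the hash value should appear random.
Examples of cryptographic hashes include MD5 (now considered broken) and SHA256.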
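As a small illustration, both functions named above are available through Python's standard `hashlib` module. The sketch below is not part of the original notes, and the helper name `hex_digest` is a hypothetical one chosen for this example. It shows that the digest length is fixed by the algorithm and that changing one input character gives an unrelated-looking digest.

```python
import hashlib


def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the hex digest of `data` under the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Output length is fixed by the algorithm, not by the input size.
print(hex_digest("md5", b"abc"))     # 128-bit digest, 32 hex characters
print(hex_digest("sha256", b"abc"))  # 256-bit digest, 64 hex characters

# Changing a single character yields a digest with no obvious relation
# to the previous one.
print(hex_digest("sha256", b"abd"))
```

MD5 appears here only because the notes list it; for anything security-related a function such as SHA256 would be the safer choice.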
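Non-cryptographic hash functions:
These only try to avoid collisions for non-malicious input. In exchange for weaker guarantees, they are typically faster. If the data set is small, or the collision rate does not matter much, you can even pick an algorithm that produces shorter hash values and so takes less space.
Examples of non-cryptographic hashes include MurmurHash, CRC32, and DJB.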
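To make this concrete, here is a minimal sketch of two such functions in Python. CRC32 comes from the standard `zlib` module. "DJB" is assumed here to mean Daniel J. Bernstein's djb2 string hash, and the names `djb2` and `bucket_index` are hypothetical helpers written for this example, not library functions.

```python
import zlib


def djb2(data: bytes) -> int:
    """djb2 hash (h = h * 33 + byte), truncated to 32 bits."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def bucket_index(data: bytes, n_buckets: int) -> int:
    """Map a key to a hash-table bucket using CRC32."""
    return zlib.crc32(data) % n_buckets


print(hex(zlib.crc32(b"123456789")))  # standard CRC-32 check value: 0xcbf43926
print(djb2(b"a"))                     # 5381 * 33 + 97 = 177670
print(bucket_index(b"user:42", 16))   # some index in range(16)
```

Both functions are cheap to compute, but neither resists deliberately constructed collisions, which is the trade-off described above.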
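SMHasher: a test suite for evaluating hash functions
When judging how good a hash algorithm is, people usually cite its results on the SMHasher test suite.
SMHasher tests how a hash function behaves, covering the following aspects:
- Sanity: whether the function is usable at all
- Performance: how long it takes to compute a hash
- Differentials: how likely inputs are to produce the same hash, and the smallest input differences that can lead to identical hashes
- Keysets: how evenly the hash values are distributed
For the full set of tests, see: https://github.com/aappleby/smhasher/wiki/SMHasher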
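The Performance item can be approximated with a rough timing loop. The sketch below is not how SMHasher itself measures speed; it is a simplified Python illustration, and `throughput_mb_per_s` is a hypothetical helper. Absolute numbers depend heavily on the machine and on the library build, so treat any single run as indicative only.

```python
import hashlib
import time
import zlib


def throughput_mb_per_s(hash_fn, payload: bytes, repeats: int = 50) -> float:
    """Hash `payload` `repeats` times and return megabytes processed per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        hash_fn(payload)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(payload) * repeats / elapsed / 1e6


payload = bytes(1024 * 1024)  # 1 MiB of zero bytes
candidates = [
    ("crc32", zlib.crc32),
    ("sha256", lambda data: hashlib.sha256(data).digest()),
]
for name, fn in candidates:
    print(f"{name}: {throughput_mb_per_s(fn, payload):.0f} MB/s")
```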
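Two commonly cited criteria are whether a hash function passes the avalanche test and the chi-squared test.
1. Avalanche Test
Passing this test means that a tiny change in the input causes a significant change in the output, so that the change is statistically indistinguishable from a random one. For example: MurmurHash3("abd", 123) = 454173339; MurmurHash3("abe", 123) = 4085872068.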
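A simplified version of this check can be sketched in Python: flip each input bit in turn and count how many output bits change. For a well-mixing hash, about half of the output bits should flip on average. MurmurHash3 is not in the Python standard library, so the sketch uses the first 4 bytes of SHA-256 as a stand-in 32-bit hash; that substitution, and the names `hash32` and `avalanche_ratio`, are assumptions made for this example. SMHasher's own test looks at per-bit bias rather than only the overall mean.

```python
import hashlib
import random


def hash32(data: bytes) -> int:
    """Stand-in 32-bit hash: first 4 bytes of SHA-256 (assumption, not MurmurHash3)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def avalanche_ratio(hash_fn, key_len: int = 8, n_keys: int = 200, seed: int = 0) -> float:
    """Flip every input bit of random keys and return the mean fraction
    of the 32 output bits that change."""
    rng = random.Random(seed)
    flipped = 0
    trials = 0
    for _ in range(n_keys):
        key = bytearray(rng.getrandbits(8) for _ in range(key_len))
        base = hash_fn(bytes(key))
        for bit in range(key_len * 8):
            mutated = bytearray(key)
            mutated[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
            diff = (base ^ hash_fn(bytes(mutated))) & 0xFFFFFFFF
            flipped += bin(diff).count("1")
            trials += 32
    return flipped / trials


print(f"{avalanche_ratio(hash32):.3f}")                  # near 0.5: good mixing
print(f"{avalanche_ratio(lambda data: sum(data)):.3f}")  # byte sum: far below 0.5
```

The byte-sum function is included only as a deliberately weak contrast: flipping one input bit changes a handful of low output bits at most.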
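2. Chi-Squared Test
Uniformity: a hash function is generally expected to spread its hash values evenly over the hash space. Divide the hash space into n equal subspaces and compute p hash values. The expected number of hash values per subspace is then f_0 = p / n, and the number actually observed in the i-th subspace is f_i. The statistic χ² measures how far the f_i deviate from a uniform distribution, so the uniformity of a hash function can be judged with a chi-squared goodness-of-fit test.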
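The notes do not spell out the statistic. In the standard Pearson form it is χ² = Σ (f_i − f_0)² / f_0, summed over the n subspaces, and under the uniformity hypothesis it approximately follows a chi-squared distribution with n − 1 degrees of freedom. The Python sketch below computes it for CRC32 over a synthetic key set; the key format, the bucket count, and the helper names `bucket_counts` and `chi_squared` are assumptions chosen for illustration.

```python
import zlib


def bucket_counts(keys, n_buckets: int, hash_fn) -> list:
    """Split the 32-bit hash space into n equal contiguous ranges and count hits."""
    counts = [0] * n_buckets
    for key in keys:
        h = hash_fn(key) & 0xFFFFFFFF
        counts[(h * n_buckets) >> 32] += 1
    return counts


def chi_squared(counts: list) -> float:
    """Pearson statistic: sum of (f_i - f_0)^2 / f_0 with f_0 = p / n."""
    p = sum(counts)
    f0 = p / len(counts)
    return sum((fi - f0) ** 2 / f0 for fi in counts)


n = 64
keys = [f"key-{i}".encode() for i in range(64_000)]
stat = chi_squared(bucket_counts(keys, n, zlib.crc32))
print(f"chi^2 = {stat:.1f} with {n - 1} degrees of freedom")
```

A value far above n − 1 suggests clustering in some subspaces. The exact rejection threshold depends on the chosen significance level and would be read from a chi-squared table or computed with a statistics library.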
For details, see:
1. https://crypto.stackexchange.com/questions/3690/why-is-sha-1-considered-broken
Every cryptographic hash function is a hash function. But not every hash function is a cryptographic hash.
A cryptographic hash function aims to guarantee a number of security properties. Most importantly that it's hard to find collisions or pre-images and that the output appears random. (There are a few more properties, and "hard" has well defined bounds in this context, but that's not important here.)
Non cryptographic hash functions just try to avoid collisions for non malicious input. Some aim to detect accidental changes in data (CRCs), others try to put objects into different buckets in a hash table with as few collisions as possible.
In exchange for weaker guarantees they are typically much faster.
I'd still call MD5 a cryptographic hash function, since it aimed to provide security. But it's broken, and thus no longer usable as a cryptographic hash. On the other hand, when you have a non cryptographic hash function, you can't really call it "broken", since it never tried to be secure in the first place.