Java中String的hash函數分析

本文轉載自查看原文 2014-03-19 10:54 2444

轉載自：http://blog.csdn.net/hengyunabc/article/details/7198533

JDK6的源碼：

[java]view plaincopy 
           
    /** 
     * Returns a hash code for this string. The hash code for a 
     * <code>String</code> object is computed as 
     * <blockquote><pre> 
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] 
     * </pre></blockquote> 
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the 
     * <i>i</i>th character of the string, <code>n</code> is the length of 
     * the string, and <code>^</code> indicates exponentiation. 
     * (The hash value of the empty string is zero.) 
     * 
     * @return  a hash code value for this object. 
     */  
    public int hashCode() {  
 int h = hash;  
 if (h == 0) {  
     int off = offset;  
     char val[] = value;  
     int len = count;  
   
            for (int i = 0; i < len; i++) {  
                h = 31*h + val[off++];  
            }  
            hash = h;  
        }  
        return h;  
    }  

以字符串"123"為例：

字符'1'的ascii碼是49

hashCode = （49*31 + 50）*31 + 51

或者這樣看：

hashCode=（'1' * 31 + '2' ） * 31 + '3'

可見實際可以看作是一種權重的算法，在前面的字符的權重大。

這樣有個明顯的好處，就是前綴相同的字符串的hash值都落在鄰近的區間。

好處有兩點：

1.可以節省內存，因為hash值在相鄰，這樣hash的數組可以比較小。比如當用HashMap，以String為key時。

2.hash值相鄰，如果存放在容器，比好HashSet，HashMap中時，實際存放的內存的位置也相鄰，則存取的效率也高。（程序局部性原理）

以31為倍數，原因了31的二進制全是1，則可以有效地離散數據。

最后看下，兩個字符串，由Eclipse生成的代碼是如何計算hash值的：

[java]view plaincopy 
           
 public class Name{  
     String firstName;  
     String lastName;  
     @Override  
     public int hashCode() {  
         final int prime = 31;  
         int result = 1;  
         result = prime * result  
                 + ((firstName == null) ? 0 : firstName.hashCode());  
         result = prime * result  
                 + ((lastName == null) ? 0 : lastName.hashCode());  
         return result;  
     }  
   
     @Override  
     public boolean equals(Object obj) {  
         if (this == obj)  
             return true;  
         if (obj == null)  
             return false;  
         if (getClass() != obj.getClass())  
             return false;  
         Name other = (Name) obj;  
         if (firstName == null) {  
             if (other.firstName != null)  
                 return false;  
         } else if (!firstName.equals(other.firstName))  
             return false;  
         if (lastName == null) {  
             if (other.lastName != null)  
                 return false;  
         } else if (!lastName.equals(other.lastName))  
             return false;  
         return true;  
     }     
 }  

可見，還是以31為倍數， hashCode = firstName.hashCode() * 31 + lastName.hashCode() 。

BTW：Java的字符串的hash做了緩存，第一次才會真正算，以后都是取緩存值。

eclipse生成的equals函數質量也很高，各種情況都考慮到了。

總結：字符串hash函數，不僅要減少沖突，而且要注意相同前綴的字符串生成的hash值要相鄰。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 imadjust函數分析一 scanf()函數分析 smali函數分析 FFmpeg: FFmepg中的sws_scale() 函數分析 OSTaskCreate()函數分析,OS_TCBInit()函數分析... ... ... ... 關於container_of函數分析 Qt deletelater函數分析(1) Java String.split()函數分隔回車注意事項 java String.split()函數的用法分析。。。Java String.split()函數的用法分析。。。