More on Equals and GetHashCode

Reposted article (2012-02-26 13:28), category: Fundamentals

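This post, like the previous one ("Finding common numbers in two arrays: a look at Hashtable"), lays the groundwork for an upcoming analysis of how Dictionary is implemented.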

I. Two logically equal instances

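Two objects being equal can mean that two variables refer to the same object, but more often it means logical equality. What is logical equality? It means the two objects are equal under some given criterion. For example, suppose a boy is named Liu Yihong and a girl is also named Liu Yihong. The two differ in height, hobbies and even gender, but judged by name alone they are the same. Equals usually expresses this kind of logical equality. Some things cannot be meaningfully compared: comparing the intelligence of a person and a tree makes no sense, because a tree has no intelligence. We can still say, however, that a person and a tree are not equal.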
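To make "logically equal" concrete, here is a minimal sketch assuming a hypothetical Person type whose only equality criterion is the name, plus a hypothetical Tree type that a person is simply never equal to:

```csharp
using System;

// Hypothetical type: two Person objects are "logically equal" when their
// names match, regardless of height or gender.
public sealed class Person
{
    public string Name { get; }
    public int HeightCm { get; }
    public string Gender { get; }

    public Person(string name, int heightCm, string gender)
    {
        Name = name;
        HeightCm = heightCm;
        Gender = gender;
    }

    public override bool Equals(object obj)
    {
        // Anything that is not a Person (a Tree, null, ...) is simply "not equal".
        var other = obj as Person;
        return other != null && string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    // Hash only the field that Equals compares, so equal persons share a hash code.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

// Hypothetical type with no notion of equality to Person.
public sealed class Tree
{
    public string Name { get; }

    public Tree(string name) { Name = name; }
}
```

GetHashCode deliberately uses the same field as Equals; Section II explains why that matters.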

II. Object's GetHashCode method

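An algorithm for computing a hash code should involve at least one instance field. Object has no meaningful instance fields and knows nothing about the fields of its derived types, so it has no notion of logical equality. Its default GetHashCode is therefore identity-based: the value is tied to the object instance, stays the same for that instance, and usually differs between distinct instances. It is not guaranteed to be unique, though (see the MSDN passage below), so it must not be used as a unique identifier for objects, even within a single AppDomain.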

Here is the implementation of Object in the .NET Framework:


    [Serializable]
    public class Object
    {
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        public Object()
        {
        }
        public virtual string ToString()
        {
            return this.GetType().ToString();
        }
        [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
        public virtual bool Equals(object obj)
        {
            return RuntimeHelpers.Equals(this, obj);
        }
        [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
        public static bool Equals(object objA, object objB)
        {
            return objA == objB || (objA != null && objB != null && objA.Equals(objB));
        }
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success), TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
        public static bool ReferenceEquals(object objA, object objB)
        {
            return objA == objB;
        }
        [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
        public virtual int GetHashCode()
        {
            return RuntimeHelpers.GetHashCode(this);
        }
        [SecuritySafeCritical]
        [MethodImpl(MethodImplOptions.InternalCall)]
        public extern Type GetType();
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
        protected virtual void Finalize()
        {
        }
        [SecuritySafeCritical]
        [MethodImpl(MethodImplOptions.InternalCall)]
        protected extern object MemberwiseClone();
        [SecurityCritical]
        private void FieldSetter(string typeName, string fieldName, object val)
        {
            FieldInfo fieldInfo = this.GetFieldInfo(typeName, fieldName);
            if (fieldInfo.IsInitOnly)
            {
                throw new FieldAccessException(Environment.GetResourceString("FieldAccess_InitOnly"));
            }
            Message.CoerceArg(val, fieldInfo.FieldType);
            fieldInfo.SetValue(this, val);
        }
        private void FieldGetter(string typeName, string fieldName, ref object val)
        {
            FieldInfo fieldInfo = this.GetFieldInfo(typeName, fieldName);
            val = fieldInfo.GetValue(this);
        }
        private FieldInfo GetFieldInfo(string typeName, string fieldName)
        {
            Type type = this.GetType();
            while (null != type && !type.FullName.Equals(typeName))
            {
                type = type.BaseType;
            }
            if (null == type)
            {
                throw new RemotingException(string.Format(CultureInfo.CurrentCulture, Environment.GetResourceString("Remoting_BadType"), new object[]
                {
                    typeName
                }));
            }
            FieldInfo field = type.GetField(fieldName, BindingFlags.IgnoreCase | BindingFlags.Instance | BindingFlags.Public);
            if (null == field)
            {
                throw new RemotingException(string.Format(CultureInfo.CurrentCulture, Environment.GetResourceString("Remoting_BadField"), new object[]
                {
                    fieldName, 
                    typeName
                }));
            }
            return field;
        }
    }
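The listing shows that Object.Equals and Object.GetHashCode both delegate to RuntimeHelpers, so both are identity-based unless a derived type overrides them. A small sketch checking this with a hypothetical PlainBox class (RuntimeHelpers.GetHashCode is a real public API in System.Runtime.CompilerServices; IdentityDemo is made up for illustration):

```csharp
using System.Runtime.CompilerServices;

// Hypothetical class that overrides neither Equals nor GetHashCode,
// so it inherits Object's identity-based implementations.
public class PlainBox
{
    public int Value;
}

public static class IdentityDemo
{
    // True when o.GetHashCode() equals the identity hash RuntimeHelpers
    // computes for the instance, i.e. no content-based override is in play.
    public static bool UsesIdentityHash(object o)
    {
        return o.GetHashCode() == RuntimeHelpers.GetHashCode(o);
    }
}
```

Two PlainBox objects with the same Value are not Equal, because the inherited Equals compares references. Note that the sketch does not assert that their hash codes differ: as discussed above, that is likely but not guaranteed.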

Why do hash codes exist?

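A hash code helps compute where an object belongs in a hash table, and being able to store objects in a hash table is clearly useful: on average, a lookup by key takes roughly constant time instead of a scan over every element.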
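A minimal sketch of how a hash code can be turned into a slot index, assuming a table with a fixed number of buckets. BucketDemo.BucketIndex is hypothetical; it follows the idea used by .NET Framework's Dictionary (clear the sign bit, then take the remainder), but real collections differ in details such as resizing and prime bucket counts:

```csharp
using System;

public static class BucketDemo
{
    // Hypothetical helper: map a key's hash code to a bucket index.
    public static int BucketIndex(object key, int bucketCount)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));

        // Clear the sign bit so negative hash codes still give a non-negative index.
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        return hash % bucketCount;
    }
}
```

For an int key the hash code is the value itself, so key 10 lands in bucket 3 of a 7-bucket table, and key -1 (after clearing the sign bit) lands in bucket 1.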

That explains what a hash code is for, but why do our own types need to provide a correct one?

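Because once a type defines Equals, the implementations of System.Collections.Hashtable, System.Collections.Generic.Dictionary and several other collections require that two equal objects not only return true from Equals but also have the same hash code.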

This is an assumed precondition, and those types are built on top of it. If a type violates it, the collections misbehave when that type is used in them.

Here is a passage from MSDN:

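"A hash code is a numeric value that is used to identify an object during equality testing. It can also serve as an index for an object in a collection. The GetHashCode method is suitable for use in hashing algorithms and data structures such as a hash table. The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes."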

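The point of this passage is: if two objects are equal, their hash codes must be equal; but two unequal objects may also have equal hash codes. When unequal objects share a hash code, that is called a hash collision.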
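Collisions cost time but not correctness, because the collection still calls Equals on every candidate whose hash code matches. A sketch assuming a hypothetical GridKey type whose hash is deliberately weak (X ^ Y), so (1, 2) and (2, 1) collide:

```csharp
using System.Collections.Generic;

// Hypothetical key type with a deliberately weak hash: (1, 2) and (2, 1)
// both hash to 3 even though they are not equal.
public sealed class GridKey
{
    public readonly int X;
    public readonly int Y;

    public GridKey(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        var other = obj as GridKey;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        return X ^ Y;
    }
}

public static class CollisionDemo
{
    // Both colliding keys can be stored: Dictionary uses Equals to tell them apart.
    public static Dictionary<GridKey, string> Build()
    {
        var map = new Dictionary<GridKey, string>();
        map.Add(new GridKey(1, 2), "first");
        map.Add(new GridKey(2, 1), "second");
        return map;
    }
}
```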

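On the 32-bit .NET Framework of the time, the following two different string objects produced the same hash code (string hash values differ across runtime versions and, on .NET Core and later, are randomized per process, so running this today will most likely print two different values):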

            string str1 = "NB0903100006";
            string str2 = "NB0904140001";
            Console.WriteLine(str1.GetHashCode());
            Console.WriteLine(str2.GetHashCode());

This is because string overrides Object's GetHashCode. The implementation from the .NET Framework source looks like this:


        public override int GetHashCode() {
            unsafe { 
                fixed (char *src = this) {
                    Contract.Assert(src[this.Length] == '\0', "src[this.Length] == '\\0'");
                    Contract.Assert( ((int)src)%4 == 0, "Managed string should start at 4 bytes boundary");
 
#if WIN32
                    int hash1 = (5381<<16) + 5381; 
#else 
                    int hash1 = 5381;
#endif 
                    int hash2 = hash1;

#if WIN32
                    // 32bit machines. 
                    int* pint = (int *)src;
                    int len = this.Length; 
                    while(len > 0) { 
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                        if( len <= 2) { 
                            break;
                        }
                        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
                        pint += 2; 
                        len  -= 4;
                    } 
#else 
                    int     c;
                    char *s = src; 
                    while ((c = s[0]) != 0) {
                        hash1 = ((hash1 << 5) + hash1) ^ c;
                        c = s[1];
                        if (c == 0) 
                            break;
                        hash2 = ((hash2 << 5) + hash2) ^ c; 
                        s += 2; 
                    }
#endif 
#if DEBUG
                    // We want to ensure we can change our hash function daily.
                    // This is perfectly fine as long as you don't persist the
                    // value from GetHashCode to disk or count on String A 
                    // hashing before string B.  Those are bugs in your code.
                    hash1 ^= ThisAssembly.DailyBuildNumber; 
#endif 
                    return hash1 + (hash2 * 1566083941);
                } 
            }
        }
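For readers who find the unsafe pointer code hard to follow, here is a hypothetical managed rewrite of the non-WIN32 branch only. It walks the string by Length instead of stopping at '\0' (so it differs for strings with embedded nulls), drops the DEBUG-only DailyBuildNumber mixing, and is not what current .NET runtimes use:

```csharp
using System;

public static class LegacyStringHash
{
    // Two djb2-style accumulators (hash * 33 ^ c) fed with alternating
    // characters, combined at the end, as in the non-WIN32 branch above.
    public static int Compute(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        unchecked
        {
            int hash1 = 5381;
            int hash2 = hash1;

            for (int i = 0; i < s.Length; i += 2)
            {
                hash1 = ((hash1 << 5) + hash1) ^ s[i];       // even positions
                if (i + 1 >= s.Length)
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ s[i + 1];   // odd positions
            }

            return hash1 + (hash2 * 1566083941);
        }
    }
}
```

Because the same characters always flow through the same arithmetic, equal strings always get equal results, which is exactly the "equal objects, equal hash codes" requirement; unequal strings usually, but not always, get different results.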

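Ultimately, the hash code exists to help compute a position; it is not meant to decide whether two objects are equal. That job still belongs to Equals. In short, the best practice is to keep the two consistent.
That is why .NET defines the equality comparison interface IEqualityComparer<T>, which contains both Equals and GetHashCode. The constructors of Dictionary<TKey, TValue> and HashSet<T> accept an IEqualityComparer<T>, as do those of ConcurrentDictionary<TKey, TValue>, KeyedCollection<TKey, TItem> and SynchronizedKeyedCollection<K, T>. (SortedSet<T>, by contrast, orders its elements and takes an IComparer<T> in its constructors.)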

Here is an example:

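using System.Collections.Generic;

public class Box
{
    public Box(int height, int length, int width)
    {
        Height = height;
        Length = length;
        Width = width;
    }

    public int Height { get; private set; }
    public int Length { get; private set; }
    public int Width { get; private set; }
}

class BoxEqualityComparer : IEqualityComparer<Box>
{
    // Two boxes are equal when all three dimensions match.
    public bool Equals(Box b1, Box b2)
    {
        if (ReferenceEquals(b1, b2))
            return true;
        if (b1 == null || b2 == null)
            return false;
        return b1.Height == b2.Height && b1.Length == b2.Length
            && b1.Width == b2.Width;
    }

    // Uses only the fields compared in Equals, so equal boxes hash equally.
    public int GetHashCode(Box bx)
    {
        return bx.Height ^ bx.Length ^ bx.Width;
    }
}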
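The Box comparer above is a hand-written IEqualityComparer<T>; the BCL also ships ready-made ones. A minimal sketch passing the real StringComparer.OrdinalIgnoreCase to the Dictionary and HashSet constructors (ComparerDemo and its methods are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // StringComparer.OrdinalIgnoreCase implements IEqualityComparer<string>;
    // its Equals and GetHashCode both ignore case, so they stay consistent.
    public static Dictionary<string, int> CaseInsensitiveCounts(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var word in words)
        {
            int current;
            counts.TryGetValue(word, out current); // current is 0 when the key is missing
            counts[word] = current + 1;
        }
        return counts;
    }

    public static HashSet<string> CaseInsensitiveSet(IEnumerable<string> words)
    {
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }
}
```

"Apple", "apple" and "APPLE" all count as one key here, because the comparer passed to the constructor, not string's own Equals, decides equality.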

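If you construct a Dictionary<TKey, TValue> without passing an object that implements this interface, EqualityComparer<TKey>.Default is used. Which comparer that turns out to be depends on which interfaces the key type implements.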

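For example, if the key type implements IEquatable<MyKey>:

struct MyKey : IEquatable<MyKey> {
    public int Id;
    // This method will be called
    public bool Equals(MyKey that) { return Id == that.Id; }
    public override bool Equals(object obj) { return obj is MyKey && Equals((MyKey)obj); }
    public override int GetHashCode() { return Id; }
}

whereas for the following type, which does not implement it, Default is a different comparer instance:

struct MyKey {
    public int Id;
    // This method will be called (the argument is boxed)
    public override bool Equals(object that) { return that is MyKey && Id == ((MyKey)that).Id; }
    public override int GetHashCode() { return Id; }
}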
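To observe which overload EqualityComparer<T>.Default ends up calling, here is a sketch with two hypothetical structs that count calls (the counters exist only for this demonstration). EqualityComparer<T>.Default is the real BCL property; the concrete comparer classes behind it are runtime implementation details:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key that implements IEquatable<T> and counts both overloads.
public struct EquatableKey : IEquatable<EquatableKey>
{
    public static int TypedCalls;
    public static int ObjectCalls;
    public int Id;

    public bool Equals(EquatableKey other)
    {
        TypedCalls++;
        return Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is EquatableKey && Id == ((EquatableKey)obj).Id;
    }

    public override int GetHashCode() { return Id; }
}

// Hypothetical key that only overrides Equals(object).
public struct PlainKey
{
    public static int ObjectCalls;
    public int Id;

    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is PlainKey && Id == ((PlainKey)obj).Id;
    }

    public override int GetHashCode() { return Id; }
}
```

Comparing two EquatableKey values through Default should go through the strongly typed Equals, while two PlainKey values go through Equals(object), which boxes the argument on every comparison.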

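Two objects with the same hash code are only possibly equal; how likely that is depends on how your hash function is implemented. The better the hash function, the more likely a hash match means real equality, and the better the hash table performs, because fewer elements end up in the same bucket and collisions are rarer.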

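Conceivably, the worst possible implementation returns a hard-coded integer. Every key then lands in the same bucket, the hash table degenerates into a linked list, and lookup time becomes O(n):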

        public override int GetHashCode()
        {
            return 31;
        }
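A sketch of what a constant hash code does to a Dictionary, assuming a hypothetical ConstantHashKey that counts its Equals calls. Lookups still return correct results, but every key shares one bucket, so a single lookup may call Equals on many entries; the exact count depends on how the runtime orders entries within a bucket:

```csharp
using System.Collections.Generic;

// Hypothetical key whose GetHashCode always returns 31, as in the listing above.
public sealed class ConstantHashKey
{
    public static int EqualsCalls;
    public readonly int Id;

    public ConstantHashKey(int id) { Id = id; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as ConstantHashKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode() { return 31; }
}

public static class ConstantHashDemo
{
    // Fills a dictionary with n keys, then counts the Equals calls made by one
    // lookup. Returns -1 if the key was (unexpectedly) not found.
    public static int EqualsCallsForLookup(int n, int idToFind)
    {
        var map = new Dictionary<ConstantHashKey, int>();
        for (int i = 0; i < n; i++)
            map.Add(new ConstantHashKey(i), i);

        ConstantHashKey.EqualsCalls = 0;
        bool found = map.ContainsKey(new ConstantHashKey(idToFind));
        return found ? ConstantHashKey.EqualsCalls : -1;
    }
}
```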

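A good hash function usually means striving to "produce unequal hash codes for unequal objects", while never forgetting that "equal objects must have equal hash codes". The first is a goal to get as close to as possible; the second is mandatory.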

Unequal objects sharing a hash code only hurts performance, whereas equal objects (Equals returns true) with different hash codes break the whole precondition.

Therefore, when computing the hash code, avoid fields that are not used in Equals; otherwise Equals may return true while the hash codes differ.
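What happens when the hash code uses a field that Equals ignores, sketched with a hypothetical BrokenKey that is intentionally wrong:

```csharp
using System.Collections.Generic;

// Hypothetical BROKEN key: Equals compares only Id, but GetHashCode also
// mixes in Revision, so two equal keys can have different hash codes.
public sealed class BrokenKey
{
    public readonly int Id;
    public readonly int Revision;

    public BrokenKey(int id, int revision) { Id = id; Revision = revision; }

    public override bool Equals(object obj)
    {
        var other = obj as BrokenKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return unchecked(Id * 31 + Revision); // bug: Revision is not part of Equals
    }
}

public static class BrokenKeyDemo
{
    public static bool LookupWithOtherRevision()
    {
        var map = new Dictionary<BrokenKey, string> { { new BrokenKey(1, 1), "v1" } };
        // Equal to the stored key according to Equals, yet the lookup misses.
        return map.ContainsKey(new BrokenKey(1, 2));
    }
}
```

BrokenKey(1, 1) and BrokenKey(1, 2) are Equal, but their hash codes are 32 and 33, so the dictionary never even calls Equals on the stored entry.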

III. Logically equal but entirely distinct instances

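As in the example from Section I, two people can share a name without being the same person. As discussed above, we normally use Equals to decide whether two objects are equal, but some data structures, such as Dictionary, first use the hash code to decide where to look and only then call Equals. If GetHashCode is not overridden in such a case, things go wrong.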

A brief description of the process:

1. When a key/value pair is added to a collection based on a hash table, the collection first gets the key's hash code, which determines the slot in the internal array where the pair is placed.

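2. When we check whether a key exists in the collection, it again gets the key's hash code, the same value that was used to compute the storage position. So if the hash code has changed in the meantime, the previously stored object can no longer be found, because the computed array index is wrong.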
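Step 2 is also why a key must not change its hash code while it is stored. A sketch assuming a hypothetical MutableKey whose hash depends on a public, writable field:

```csharp
using System.Collections.Generic;

// Hypothetical mutable key: its hash code depends on a field that can change.
public sealed class MutableKey
{
    public int Id; // mutable on purpose, to show the problem

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode() { return Id; }
}

public static class MutableKeyDemo
{
    // Looks up the very same key instance before and after changing its Id.
    public static void LookupBeforeAndAfterMutation(out bool before, out bool after, out int count)
    {
        var key = new MutableKey { Id = 1 };
        var map = new Dictionary<MutableKey, string>();
        map.Add(key, "value");

        before = map.ContainsKey(key);
        key.Id = 2; // the stored hash code (1) no longer matches key.GetHashCode() (2)
        after = map.ContainsKey(key);
        count = map.Count;
    }
}
```

The entry is still in the dictionary (Count stays 1); it simply cannot be reached any more. Keeping every field that participates in Equals and GetHashCode readonly, as the Staff example below does, avoids this.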

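If GetHashCode is not overridden, it is inherited from Object, whose implementation generally returns a different value for every newly created object, even when Equals says two of them are equal.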

Example:

using System;
using System.Collections.Generic;

public class Staff
{
    private readonly string ID;
    private readonly string name;

    public Staff(string ID, string name)
    {
        this.ID = ID;
        this.name = name;
    }

    // Equals is overridden, but GetHashCode is not: it is still Object's identity-based hash.
    public override bool Equals(object obj)
    {
        if (obj == this)
            return true;
        if (!(obj is Staff))
            return false;

        var staff = (Staff)obj;

        return name == staff.name && ID == staff.ID;
    }
}

public class HashtableTest
{
    public static void Main()
    {
        Staff a = new Staff("123", "langxue");
        Staff b = new Staff("123", "langxue");
        Console.WriteLine(a.Equals(b)); // prints True

        var dic = new Dictionary<Staff, int>();

        dic.Add(new Staff("123", "langxue"), 0213);

        // Almost always prints False: the two instances have different identity-based hash codes
        Console.WriteLine(dic.ContainsKey(new Staff("123", "langxue")));
    }
}

So we need to override GetHashCode. A common approach is XOR (exclusive OR: each result bit is 1 exactly when the two input bits differ):


public struct Point {
   public int x;
   public int y; 

   //other methods

   public override int GetHashCode() {
      return x ^ y;
   }
}
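XOR is cheap, but it is symmetric and cancels equal values: (1, 2) and (2, 1) share a hash, and every point with x == y hashes to 0. A sketch comparing it with the common multiply-by-a-prime-and-add pattern (the helper names are hypothetical; on .NET Core 2.1 and later, System.HashCode.Combine(x, y) is a built-in alternative):

```csharp
public static class HashCombineDemo
{
    // x ^ y is symmetric: (1, 2) and (2, 1) collide, and every (n, n) hashes to 0.
    public static int XorHash(int x, int y)
    {
        return x ^ y;
    }

    // Classic "multiply by a prime, then add" pattern; argument order now matters.
    public static int PrimeHash(int x, int y)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + x;
            hash = hash * 31 + y;
            return hash;
        }
    }
}
```

PrimeHash(1, 2) is 16370 while PrimeHash(2, 1) is 16400, so swapping coordinates no longer produces a collision.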

Of course, here we can simply reuse the GetHashCode override that .NET provides for string:

        public override int GetHashCode()
        {
            return (ID + name).GetHashCode();
        }

The code with the override:


using System;
using System.Collections.Generic;

public class Staff
{
    private readonly string ID;
    private readonly string name;

    public Staff(string ID, string name)
    {
        this.ID = ID;
        this.name = name;
    }

    public override bool Equals(object obj)
    {
        if (obj == this)
            return true;
        if (!(obj is Staff))
            return false;

        var staff = (Staff)obj;

        return name == staff.name && ID == staff.ID;
    }

    // Uses the same two fields as Equals, so equal Staff objects get equal hash codes.
    public override int GetHashCode()
    {
        return (ID + name).GetHashCode();
    }
}

public class HashtableTest
{
    public static void Main()
    {
        Staff a = new Staff("123", "langxue");
        Staff b = new Staff("123", "langxue");

        Console.WriteLine(a.Equals(b)); // prints True

        var dic = new Dictionary<Staff, int>();

        dic.Add(new Staff("123", "langxue"), 0213);

        Console.WriteLine(dic.ContainsKey(new Staff("123", "langxue"))); // prints True
    }
}
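(ID + name).GetHashCode() satisfies the rule, but it allocates a new string on every call, and different pairs can concatenate to the same text: ("12", "3") and ("1", "23") both become "123" and therefore always collide. A sketch of a hypothetical StaffV2 that combines the two field hashes instead; its Equals compares the same two fields as before:

```csharp
// Hypothetical variant of Staff whose GetHashCode combines the field hashes
// instead of hashing the concatenated string.
public class StaffV2
{
    private readonly string ID;
    private readonly string name;

    public StaffV2(string ID, string name)
    {
        this.ID = ID;
        this.name = name;
    }

    public override bool Equals(object obj)
    {
        var staff = obj as StaffV2;
        return staff != null && name == staff.name && ID == staff.ID;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (ID == null ? 0 : ID.GetHashCode());
            hash = hash * 31 + (name == null ? 0 : name.GetHashCode());
            return hash;
        }
    }
}
```

Since string hash codes are randomized per process on modern .NET, the sketch does not promise specific values, only that equal StaffV2 objects hash equally.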

IV. Some recommended practices:

1. Don't try to improve performance by excluding some of an object's key fields from the hash code.

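Doing so loosens the constraint and makes objects harder to tell apart, so more of them end up with the same hash code; they then collide in the hash table and performance suffers.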
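A sketch of the effect, assuming hypothetical employees who all belong to the same department: a hash built only from the department code gives every one of them the same value, while including the name spreads them out:

```csharp
using System.Collections.Generic;

public static class HashCoverageDemo
{
    // Hypothetical "optimized" hash that leaves out the name.
    public static int NarrowHash(int dept, string name)
    {
        return dept;
    }

    // Hash that covers both fields used for equality.
    public static int WideHash(int dept, string name)
    {
        unchecked
        {
            return dept * 31 + (name == null ? 0 : name.GetHashCode());
        }
    }

    // Counts distinct hash codes for `count` employees in department 7.
    public static int DistinctHashes(bool wide, int count)
    {
        var seen = new HashSet<int>();
        for (int i = 0; i < count; i++)
        {
            string name = "employee" + i;
            seen.Add(wide ? WideHash(7, name) : NarrowHash(7, name));
        }
        return seen.Count;
    }
}
```

With the narrow hash, all 100 employees fall into a single bucket, which is the constant-hash situation from Section II in disguise.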

With these two posts as groundwork, we are ready for the next post's analysis of the Dictionary source code.
