The hash problem in Python


ref:http://heipark.iteye.com/blog/1743819

Consider the following example:

class Item(object):

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __repr__(self):
        return "Item(%s, %s)" % (self.foo, self.bar)

print(set([Item('1', '2'), Item('1', '2')]))

# Output (Python 3): {Item(1, 2), Item(1, 2)}

Logically, the two objects in the set look identical, so one would expect the set to contain only one of them.

In practice, that is not what happens.

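A set decides whether two elements are the same by first comparing their hash values and, only if the hash values match, comparing the elements with ==. An element's hash value is obtained through the built-in hash() function (which calls the __hash__() magic method internally).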

According to the documentation:

All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().

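From this we learn that the immutable built-in objects are hashable while mutable containers are not, and that instances of user-defined classes are hashable by default, compare unequal to everything but themselves, and get a hash value derived from their id. This id (ref:https://docs.python.org/3/library/functions.html#id) can be thought of as the object's address in memory (in CPython it is exactly that). The two Item instances are distinct objects with different ids, so the output of the example is no longer surprising.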
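The default behavior described above can be checked with a small sketch. The class name Plain is hypothetical and only serves as an illustration; it defines neither __eq__ nor __hash__, so it falls back to the identity-based defaults.

```python
class Plain:
    """Hypothetical class that defines neither __eq__ nor __hash__."""

    def __init__(self, value):
        self.value = value


a = Plain(1)
b = Plain(1)

equal_to_itself = (a == a)   # True: default equality is identity
equal_to_twin = (a == b)     # False: distinct objects, same attributes
set_size = len({a, b})       # 2: both objects are kept

# Mutable built-in containers such as lists are not hashable at all.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(equal_to_itself, equal_to_twin, set_size, list_is_hashable)
# prints: True False 2 False
```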

As for custom implementations of __hash__(), the documentation (ref:https://docs.python.org/3/reference/datamodel.html#object.__hash__) says:

it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple.

It also gives an example:

def __hash__(self):
    return hash((self.name, self.nick, self.color))
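To see what a tuple-based __hash__ does on its own, here is a minimal sketch. The Person class is hypothetical; it borrows the attribute names from the documentation's snippet and deliberately defines only __hash__, not __eq__. Two instances with the same fields then share a hash value, yet still compare unequal, so a set keeps both.

```python
class Person:
    """Hypothetical class that defines __hash__ but not __eq__."""

    def __init__(self, name, nick, color):
        self.name = name
        self.nick = nick
        self.color = color

    def __hash__(self):
        # Pack the relevant fields into a tuple and hash the tuple.
        return hash((self.name, self.nick, self.color))


p1 = Person("Ada", "al", "red")
p2 = Person("Ada", "al", "red")

same_hash = (hash(p1) == hash(p2))  # True: same fields, same tuple hash
compare_equal = (p1 == p2)          # False: default equality is identity
set_size = len({p1, p2})            # 2: equal hashes alone are not enough

print(same_hash, compare_equal, set_size)
# prints: True False 2
```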

Here we need one more concept: hashable. The documentation defines it as follows:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

In short, a hashable object must implement both __hash__() and __eq__().
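One way to probe hashability is sketched below. The helper is_hashable is a hypothetical name introduced here; collections.abc.Hashable is part of the standard library. Note that the isinstance check only looks for a usable __hash__ method on the type, so a tuple containing a list passes it even though actually hashing that tuple fails.

```python
from collections.abc import Hashable


def is_hashable(obj):
    """Hypothetical helper: try to hash the object and report the result."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


tuple_checks = (isinstance((1, 2), Hashable), is_hashable((1, 2)))
list_checks = (isinstance([1, 2], Hashable), is_hashable([1, 2]))
# A tuple is Hashable by type, but hashing fails if it contains a list.
nested_checks = (isinstance((1, [2]), Hashable), is_hashable((1, [2])))

print(tuple_checks, list_checks, nested_checks)
# prints: (True, True) (False, False) (True, False)
```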

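We have already seen how to implement the hash method, but implementing __hash__ alone is not enough to make the opening example produce the expected result. The reason is as follows (ref:https://docs.python.org/3/reference/datamodel.html#object.__hash__):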

If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

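If a class defines __eq__ but not __hash__, the opening example still does not work. In Python 3, such a class has its __hash__ implicitly set to None, so its instances are unhashable and putting them into a set raises TypeError. In Python 2, the instances keep the default id-based hash, so the two Items have different hash values and both end up in the set.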
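The Python 3 behavior can be observed with a short sketch. EqOnly is a hypothetical class that mirrors Item but defines only __eq__.

```python
class EqOnly:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __eq__(self, other):
        if isinstance(other, EqOnly):
            return self.foo == other.foo and self.bar == other.bar
        return False


# Python 3 sets __hash__ to None when only __eq__ is defined.
hash_is_none = EqOnly.__hash__ is None

try:
    {EqOnly('1', '2'), EqOnly('1', '2')}
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(hash_is_none, raised_type_error)
# prints: True True
```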

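So whenever you define __hash__, define __eq__ as well.

__eq__ is needed to decide what happens when two objects in the set have the same hash value. Equal hash values alone do not prove that two objects are equal, so the set falls back to __eq__ to settle the question.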

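__eq__ is not called every time an element is put into a set. If the new element's hash value differs from those of all existing elements, there is no need to call __eq__ at all. If __eq__ had to be called even when two elements have different hash values, hashing would lose its purpose.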
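This can be observed by counting __eq__ calls. The sketch below uses a hypothetical Probe class whose hash value is supplied by the caller, which is an artificial setup meant only for illustration. It relies on the behavior of CPython's set implementation, which compares stored hash values before falling back to __eq__.

```python
class Probe:
    """Hypothetical class with a caller-chosen hash and an __eq__ counter."""

    eq_calls = 0  # class-level counter of __eq__ invocations

    def __init__(self, name, fixed_hash):
        self.name = name
        self.fixed_hash = fixed_hash

    def __hash__(self):
        return self.fixed_hash

    def __eq__(self, other):
        Probe.eq_calls += 1
        return isinstance(other, Probe) and self.name == other.name


# Different hash values: the set never needs to ask __eq__.
Probe.eq_calls = 0
different = {Probe("a", 1), Probe("b", 2)}
calls_with_different_hashes = Probe.eq_calls

# Same hash value: __eq__ decides whether the objects are really equal.
Probe.eq_calls = 0
colliding = {Probe("a", 1), Probe("b", 1)}
calls_with_same_hash = Probe.eq_calls

print(len(different), calls_with_different_hashes)
print(len(colliding), calls_with_same_hash >= 1)
```

Both sets end up with two elements, because the names differ. Only the colliding pair triggers a call to __eq__.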

So add the following to the class in the opening example:

    ...
    def __eq__(self, other):
        if isinstance(other, Item):
            return (self.foo == other.foo) and (self.bar == other.bar)
        else:
            return False

    def __hash__(self):
        # Hash the same fields that __eq__ compares, packed into a tuple
        return hash((self.foo, self.bar))
    ...

and it works: the set now contains only one Item.
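Putting the pieces together, a complete version of the example might look like the sketch below. It assumes Python 3 and hashes a tuple of the fields, as the documentation advises. It also assumes that foo and bar are themselves hashable and are not modified while the object sits in a set.

```python
class Item:

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __repr__(self):
        return "Item(%s, %s)" % (self.foo, self.bar)

    def __eq__(self, other):
        if isinstance(other, Item):
            return self.foo == other.foo and self.bar == other.bar
        return False

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.foo, self.bar))


items = {Item('1', '2'), Item('1', '2')}
found = Item('1', '2') in items  # lookup by value now works too

print(items, found)
# prints: {Item(1, 2)} True
```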

 

