Hashable, flexible identifier in python
I'm trying to make some sort of hashable identifier in Python; I need it to identify nodes in a graph. The trouble is that different nodes have different attributes. If the attributes of these nodes are represented as dictionaries mapping attributes to values:
idA = {'type':'A', 'name':'a_100'}
idB = {'type':'B', 'name':'b_3', 'value':7}
I want __hash__() and __eq__() to use the tuple pairs ((key1, value1), (key2, value2), ...).
Dictionaries would be ideal for this, because I'm going to check these properties fairly frequently, and dictionary lookup should be efficient (I'm using many identifiers and each will have many attributes). But dictionaries are not hashable.
A frozenset of the tuple pairs would hash properly, but would it be efficient for lookup?
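A small sketch of that trade-off using only built-ins: a dict cannot be hashed, `frozenset(d.items())` can (as long as every value is hashable), but reading one attribute back out of the frozenset means scanning its pairs. The helper `get_attr` is hypothetical, written only for illustration.

```python
idA = {'type': 'A', 'name': 'a_100'}

# A plain dict is unhashable, so it cannot be a set member or a dict key.
try:
    hash(idA)
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# A frozenset of the (key, value) pairs is hashable if every value is.
frozen_a = frozenset(idA.items())
nodes = {frozen_a: 'node A'}

# Insertion order does not matter: the same pairs find the same entry.
same_a = frozenset({'name': 'a_100', 'type': 'A'}.items())

def get_attr(pairs, key):
    """Hypothetical helper: O(n) scan, because a frozenset has no key-based lookup."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)

print(dict_is_hashable)            # False
print(nodes[same_a])               # node A
print(get_attr(frozen_a, 'name'))  # a_100
```

Converting back with `dict(frozen_a)` restores O(1) lookup, at the cost of building a new dict each time.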
If I declare an empty class and then set attributes on it, that does what I want (possibly using a dictionary under the hood), but I don't know how to hash it. Maybe there's some way to hash its member values using inspect or dir()?
class identifier():
    pass

idA = identifier()
idA.type = 'A'
idA.name = 'a_100'
If there is a way to use a hash (and == operator) based on tuple pairs of (attribute, value), then this would also do what I want.
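A sketch of that idea that needs neither inspect nor dir(): `vars(obj)` returns the instance's `__dict__`, so the hash can be taken over `frozenset(vars(self).items())` and equality can compare the two dicts. The keyword-argument constructor is an addition for convenience; plain attribute assignment still works as in the question.

```python
class identifier:
    """Sketch: hash and equality derived from the instance's attribute dict."""

    def __init__(self, **attrs):
        # Convenience constructor (an addition); idA.type = 'A' still works too.
        vars(self).update(attrs)

    def __hash__(self):
        # vars(self) is self.__dict__; every attribute value must be hashable.
        return hash(frozenset(vars(self).items()))

    def __eq__(self, other):
        if not isinstance(other, identifier):
            return NotImplemented
        return vars(self) == vars(other)

idA = identifier()
idA.type = 'A'
idA.name = 'a_100'

idB = identifier(type='B', name='b_3', value=7)

seen = {idA, idB}
print(identifier(name='a_100', type='A') in seen)  # True
print(idB.value)                                     # 7
```

The catch is that nothing stops you from setting another attribute after the object is in a set or used as a dict key; its hash then changes and the container can no longer find it. Treat instances as frozen once they have been hashed.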
Or is there some workaround that yields an equivalent data type satisfying this SAT-style analogy: frozenset is to set as ? is to dict
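One way to fill in the blank is a small read-only mapping. A sketch, where the class name `FrozenDict` is hypothetical: it copies the input into a private dict, exposes lookup through `collections.abc.Mapping` (which supplies `get`, `keys`, `items`, `__contains__` and `__eq__` from the three abstract methods), and hashes the frozenset of its items.

```python
from collections.abc import Mapping

class FrozenDict(Mapping):
    """Hypothetical frozenset-style counterpart to dict: read-only and hashable."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)  # private copy, so callers cannot mutate it

    def __getitem__(self, key):
        return self._data[key]  # average O(1), same as a dict

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Mapping defines __eq__, so it is unhashable unless __hash__ is defined here.
        # Every value must be hashable.
        return hash(frozenset(self._data.items()))

    def __repr__(self):
        return f'FrozenDict({self._data!r})'

idA = FrozenDict(type='A', name='a_100')
idB = FrozenDict({'type': 'B', 'name': 'b_3', 'value': 7})

graph = {idA: [idB]}
print(idA['name'])                                          # a_100
print(graph[FrozenDict(name='a_100', type='A')] == [idB])  # True
```

Because there is no `__setitem__`, `idA['x'] = 1` raises `TypeError`. The leading underscore on `_data` is only a convention, so code that deliberately reaches inside can still change it.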
Thanks for your help.
Edit:
Is this the right direction?
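class identifier(dict):
    def to_frozenset(self):
        # (key, value) pairs; every value must itself be hashable
        return frozenset(self.items())

    def __hash__(self):
        return hash(self.to_frozenset())

    def __eq__(self, rhs):
        # Return NotImplemented for non-identifiers instead of raising AttributeError
        if not isinstance(rhs, identifier):
            return NotImplemented
        return self.to_frozenset() == rhs.to_frozenset()

    def __ne__(self, rhs):
        return not self == rhs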
This shifts the computational complexity: looking up an identifier's attribute is fast, but hashing an identifier or comparing two identifiers for equality is slow. If there were a way to cache its hash (and prevent its dictionary from changing once the hash is cached), and we were guaranteed few hash collisions between identifiers (so that full equality checks are rare), maybe that would be a good solution? Let me know what you think!
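A sketch of exactly that idea, keeping the dict subclass: the hash is computed once and stored, and every dict method that mutates in place is overridden to raise once the hash exists. The `_hash` attribute and the "freeze on first hash" policy are assumptions for illustration; `__ior__` (`d |= other`) exists on dict only from Python 3.9.

```python
class identifier(dict):
    """Sketch: a dict subclass that caches its hash and refuses changes afterwards."""

    _hash = None  # class-level default; set per instance on the first hash() call

    def __hash__(self):
        if self._hash is None:
            # Computed once, then reused; every value must be hashable.
            self._hash = hash(frozenset(self.items()))
        return self._hash

    def _check_mutable(self):
        if self._hash is not None:
            raise TypeError('identifier cannot change after it has been hashed')

    # Guard each dict method that mutates in place.
    def __setitem__(self, key, value):
        self._check_mutable()
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self._check_mutable()
        super().__delitem__(key)

    def __ior__(self, other):  # d |= other, Python 3.9+
        self._check_mutable()
        return super().__ior__(other)

    def update(self, *args, **kwargs):
        self._check_mutable()
        super().update(*args, **kwargs)

    def setdefault(self, key, default=None):
        self._check_mutable()
        return super().setdefault(key, default)

    def pop(self, *args):
        self._check_mutable()
        return super().pop(*args)

    def popitem(self):
        self._check_mutable()
        return super().popitem()

    def clear(self):
        self._check_mutable()
        super().clear()

idA = identifier(type='A', name='a_100')
idA['extra'] = 1           # allowed: not hashed yet
del idA['extra']
index = {idA: 'node A'}    # using idA as a key hashes it, which freezes it
try:
    idA['name'] = 'changed'
except TypeError as exc:
    print(exc)             # identifier cannot change after it has been hashed
```

Equality here is the inherited dict comparison, which agrees with the hash. CPython's dict and set lookups compare the stored full hash before calling `__eq__`, so with a cached hash the full comparison mostly runs only on a genuine match or a real hash collision.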
Comments (3)
There is no frozendict. But a collections.namedtuple is an approximation to that behaviour which may suit you.
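A sketch of the namedtuple approach, assuming one namedtuple type per node kind (the type names `NodeA` and `NodeB` are hypothetical). Fields are read by name, instances are immutable, and hashing and equality come from the underlying tuple.

```python
from collections import namedtuple

# One namedtuple type per node kind; the fields follow the question's examples.
NodeA = namedtuple('NodeA', ['type', 'name'])
NodeB = namedtuple('NodeB', ['type', 'name', 'value'])

idA = NodeA(type='A', name='a_100')
idB = NodeB(type='B', name='b_3', value=7)

graph = {idA: [idB], idB: []}
print(idA.name)                     # a_100
print(graph[NodeA('A', 'a_100')])   # [NodeB(type='B', name='b_3', value=7)]
```

Because equality is plain tuple equality, `NodeA('A', 'a_100')` also equals the bare tuple `('A', 'a_100')`; keeping the `type` field in every kind is what keeps different node kinds apart. The trade-off is that each kind needs its field names fixed up front.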
Don't inherit from dict, encapsulate it. That way you can make sure it won't be changed.
As for caching, you can remember to_frozenset or its hash. Depending on the use pattern, remember the hash, which allows you to quickly return on hashing and inequality, and compare the frozensets only if the hashes match.
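A sketch combining both suggestions: the dict is wrapped (composition) rather than inherited, so no mutating methods are exposed, and both the frozenset and its hash are computed once in `__init__`. Equality rejects on a hash mismatch and compares the frozensets only when the hashes match. The class name `Identifier` and the `attrs` property are illustrative assumptions; `types.MappingProxyType` is the standard library's read-only view of a dict.

```python
from types import MappingProxyType

class Identifier:
    """Hypothetical sketch: encapsulates a dict instead of inheriting from it."""

    def __init__(self, **attrs):
        self._attrs = dict(attrs)                      # private copy; no setters exposed
        self._frozen = frozenset(self._attrs.items())  # remembered once; values must be hashable
        self._hash = hash(self._frozen)                # remembered once

    @property
    def attrs(self):
        # Read-only view: O(1) lookups, but item assignment raises TypeError.
        return MappingProxyType(self._attrs)

    def __getitem__(self, key):
        return self._attrs[key]

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, Identifier):
            return NotImplemented
        if self._hash != other._hash:
            return False                        # cheap rejection: contents must differ
        return self._frozen == other._frozen    # full comparison only when hashes match

    def __repr__(self):
        return f'Identifier(**{self._attrs!r})'

idA = Identifier(type='A', name='a_100')
idB = Identifier(type='B', name='b_3', value=7)
print(idA['name'])                                 # a_100
print(idA == Identifier(name='a_100', type='A'))   # True
print(idA != idB)                                  # True
```

Python 3 derives `!=` from `__eq__` automatically, so no `__ne__` is needed. Privacy is by convention only: code that reaches into `_attrs` directly can still change it.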
That said, you're much too worried about performance for someone who hasn't coded a benchmark yet. Build the simplest possible implementation. If it's fast you're done. Otherwise, benchmark it, then find an incremental way to improve measured results.
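A minimal benchmarking sketch with the standard `timeit` module, in the spirit of measuring before optimizing. The 50-attribute workload and the three functions are hypothetical stand-ins; substitute your real identifiers and access pattern. No timings are given here because they depend entirely on the machine and data.

```python
import timeit

# Hypothetical workload: one identifier with many attributes, as the question describes.
attrs = {f'attr{i}': i for i in range(50)}

def hash_recomputed():
    # Simplest approach: rebuild the frozenset and hash it on every call.
    return hash(frozenset(attrs.items()))

cached = hash(frozenset(attrs.items()))

def hash_cached():
    # The hash is computed once up front and simply returned.
    return cached

def attribute_lookup():
    return attrs['attr25']

results = {
    name: timeit.timeit(fn, number=10_000)
    for name, fn in [('hash, recomputed', hash_recomputed),
                     ('hash, cached', hash_cached),
                     ('attribute lookup', attribute_lookup)]
}
for name, seconds in results.items():
    print(f'{name:>18}: {seconds:.4f} s per 10,000 calls')
```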
I'm not sure this solves your problem, but if you want an object to be hashable, you can implement it in this fashion:
You'll get the object's data in a structured tuple format, along with the class name as a kind of hash signature. You can even extend dict to use in this class.
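The code for this answer was not preserved, so the following is only a guess at the approach described, not the original: the key is a tuple of the class name plus the sorted `(attribute, value)` pairs from `vars(self)`, and both `__hash__` and `__eq__` use that same key. The class name `Node` and the helper `_key` are hypothetical. Sorting never compares values, because attribute names are unique strings.

```python
class Node:
    """Sketch: hashable object keyed on its class name plus sorted (attribute, value) pairs."""

    def __init__(self, **attrs):
        vars(self).update(attrs)

    def _key(self):
        # Structured tuple, e.g. ('Node', (('name', 'a_100'), ('type', 'A')))
        return (type(self).__name__, tuple(sorted(vars(self).items())))

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self._key() == other._key()

class TypedNode(Node):
    pass

a1 = Node(type='A', name='a_100')
a2 = Node(name='a_100', type='A')
t1 = TypedNode(type='A', name='a_100')
print(a1 == a2, hash(a1) == hash(a2))  # True True
print(a1 == t1)                        # False: same attributes, different class name
print(a1._key())                       # ('Node', (('name', 'a_100'), ('type', 'A')))
```

Including the class name means otherwise identical attribute sets from different classes stay distinct, which matches the "hash signature" role the answer describes.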