Is transitivity required for __eq__ in Python?

Posted 01-14 09:57


I'm implementing my own class with a custom __eq__, and I'd like it to return True for things that are not "equal" in a mathematical sense but "match" in a fuzzy way.

An issue with this, however, is that it breaks transitivity in the mathematical sense: a == b and b == c can both hold while a == c does not.

Question: is Python dependent on __eq__ being transitive? Will what I'm trying to do break things, or can I do this as long as I'm careful not to assume transitivity myself?

Use case

I want to match telephone numbers with one another, and they may be formatted either internationally or for domestic use only (without a country code). If no country code is specified, a number should equal the same number with any country code; if one is specified, it should only equal numbers with the same country code or with none.

So:

  • Of course, +31 6 12345678 should equal +31 6 12345678, and 06 12345678 should equal 06 12345678
  • +31 6 12345678 should equal 06 12345678 (and vice versa)
  • +49 6 12345678 should equal 06 12345678 (and vice versa)
  • But +31 6 12345678 should not be equal to +49 6 12345678

I don't have a need for hashing (and so won't implement it), so that at least makes life easier.
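A minimal sketch of the matching rules listed above, to make the loss of transitivity concrete. The class name `PhoneNumber`, its fields, and the `parse` helper are hypothetical, and the parser assumes the simple space-separated formats from the examples (a leading `+` marks a country code; otherwise leading zeros are dropped as a trunk prefix). As in the question, no `__hash__` is defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # stop dataclass from generating an exact-match __eq__
class PhoneNumber:
    country_code: Optional[str]  # None means "domestic format, no country code"
    national: str                # national digits without the trunk prefix

    @classmethod
    def parse(cls, text: str) -> "PhoneNumber":
        # Hypothetical parser: assumes the space-separated formats from the question.
        parts = text.split()
        if parts[0].startswith("+"):
            return cls(parts[0][1:], "".join(parts[1:]))
        # Domestic format: strip the leading trunk prefix "0".
        return cls(None, "".join(parts).lstrip("0"))

    def __eq__(self, other):
        if not isinstance(other, PhoneNumber):
            return NotImplemented
        if self.national != other.national:
            return False
        # A missing country code acts as a wildcard.
        if self.country_code is None or other.country_code is None:
            return True
        return self.country_code == other.country_code

    # No __hash__ here: defining __eq__ alone makes Python set __hash__ to None.


nl = PhoneNumber.parse("+31 6 12345678")
de = PhoneNumber.parse("+49 6 12345678")
domestic = PhoneNumber.parse("06 12345678")

print(nl == domestic, domestic == de)  # True True
print(nl == de)                        # False: == is not transitive here
```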

Answers (2)

躲猫猫 2025-01-21 09:57:07


There is no MUST, only a SHOULD, for comparisons to be consistent with the commonly understood relations. Python explicitly does not enforce this, and float is a built-in type that behaves differently because of float("nan").

Python Language Reference, Expressions: Value comparisons

[…]
User-defined classes that customize their comparison behavior should follow some consistency rules, if possible:

  • […]
  • Comparison should be symmetric. In other words, the following expressions should have the same result:
    • x == y and y == x
    • x != y and y != x
    • x < y and y > x
    • x <= y and y >= x
  • Comparison should be transitive. The following (non-exhaustive) examples illustrate that:
    • x > y and y > z implies x > z
    • x < y and y <= z implies x < z

Python does not enforce these consistency rules. In fact, the not-a-number values are an example for not following these rules.
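To illustrate the NaN case the documentation mentions: NaN is not even equal to itself, and the built-in containers paper over this by checking identity before equality (for lists, tuples, sets, and dicts, `x in y` is documented as equivalent to `any(x is e or x == e for e in y)`). The variable name is arbitrary.

```python
nan = float("nan")

print(nan == nan)                      # False: NaN is not equal to itself
print(nan in [nan])                    # True: the container checks `is` before ==
print([nan] == [nan])                  # True: element comparison also short-circuits on identity
print(float("nan") in [float("nan")])  # False: two distinct NaN objects
```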

Still, keep in mind that exceptions are incredibly rare and subject to being ignored: most people would treat float as having total order, for example. Using uncommon comparison relations can seriously increase maintenance effort.
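A small example of that maintenance risk: code that treats float as totally ordered can give order-dependent answers once a NaN slips in. In CPython, the built-in `max` keeps its current candidate unless a later item compares greater, and every ordering comparison involving NaN is False.

```python
import math

nan = float("nan")

first = max(nan, 1.0)   # 1.0 > nan is False, so nan is kept
second = max(1.0, nan)  # nan > 1.0 is False, so 1.0 is kept

print(first, second)    # nan 1.0
print(math.isnan(first), second == 1.0)  # True True
```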


Canonical ways to model "fuzzy matching" via operators are subset, subsequence, or containment relations expressed with asymmetric operators.

  • set and frozenset support >, >= and so on to indicate that one set encompasses all values of another.
    >>> a, b = {1, 5, 6, 8}, {5, 6}
    >>> a >= a, a >= b, b >= a
    (True, True, False)
    
  • str and bytes support in to indicate that one sequence contains the other as a substring.
    >>> a, b = "+31 6 12345678", "6 12345678"
    >>> a in b, b in a
    (False, True)
    
  • range and the ipaddress network types support in to indicate that a specific item is covered.
    >>> from ipaddress import IPv4Address, IPv4Network
    >>> IPv4Address('192.0.2.6') in IPv4Network('192.0.2.0/28')
    True
    

Notably, while these operators may be transitive, they are not symmetric. For example, a >= b and c >= b imply neither b >= c nor a >= c (nor the reverse).
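A quick check of that statement with plain sets (values chosen arbitrarily): two sets can both contain a third one while being incomparable with each other.

```python
a, b, c = {1, 2}, {1}, {1, 3}

print(a >= b, c >= b)  # True True: both a and c contain b
print(b >= c)          # False: containment is not symmetric
print(a >= c, c >= a)  # False False: a and c are incomparable
```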

Practically, one could model "number without country code" as the superset of "number with country code" for the same number. This means that 06 12345678 >= +31 6 12345678 and 06 12345678 >= +49 6 12345678, but not vice versa. For a symmetric comparison, one would use a >= b or b >= a instead of a == b.
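A minimal sketch of that suggestion, assuming numbers are already normalized into a (country code, national digits) pair; the `PhoneNumber` class and the `matches` helper are hypothetical names. Exact equality stays the dataclass default (and is therefore transitive and hashable), while the fuzzy relation lives in `>=`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # generated __eq__/__hash__ compare fields exactly
class PhoneNumber:
    country_code: Optional[str]  # None: domestic number, country unspecified
    national: str                # national digits, trunk prefix already removed

    def __ge__(self, other):
        """self >= other: self covers every number that other can denote."""
        if not isinstance(other, PhoneNumber):
            return NotImplemented
        if self.national != other.national:
            return False
        # Without a country code, self covers any country; with one, only itself.
        return self.country_code is None or self.country_code == other.country_code

    # No __le__: for `a <= b`, Python falls back to the reflected `b >= a`.


def matches(a: PhoneNumber, b: PhoneNumber) -> bool:
    """Symmetric fuzzy match built from the asymmetric >= relation."""
    return a >= b or b >= a


nl = PhoneNumber("31", "612345678")
de = PhoneNumber("49", "612345678")
domestic = PhoneNumber(None, "612345678")

print(domestic >= nl, nl >= domestic)                # True False
print(matches(nl, domestic), matches(domestic, de))  # True True
print(matches(nl, de), nl == domestic)               # False False
```

Keeping `==` exact means sets and dicts of these objects behave normally, and callers opt into fuzzy matching explicitly.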

静谧幽蓝 2025-01-21 09:57:07

The __eq__ method should be transitive; at least, that is what dictionaries assume.

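class A:
    def __init__(self, name):
        self.name = name
        self.values = []  # objects this instance compares equal to (set below)

    def __eq__(self, other):
        # Equal only to the objects listed in self.values (deliberately non-transitive)
        for element in self.values:
            if element is other:
                return True
        return False

    def __hash__(self):
        # Constant hash: all instances collide, so dicts must fall back on __eq__
        return 0

    def __repr__(self):
        return self.name

x, y, z = A('x'), A('y'), A('z')
x.values = [x, y]
y.values = [x, y, z]
z.values = [y, z]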

print(x == y)
--> True

print(y == z)
--> True

print(x == z)
--> False

print({**{x:1},**{y:2, z: 3}})
--> {x: 3}

print({**{x:1},**{z:3, y:2}})
--> {x: 1, z: 2}

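{**{x:1},**{y: 2, z:3}} is meant to be the union of two dictionaries, yet it ends up with a single key: y == z already collapses {y: 2, z: 3} into {y: 3}, and x == y then merges that entry into x. No one expects a dictionary to lose a key through an update.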

print(z in {**{x:1},**{y:2, z: 3}})
--> False

By changing the key order in the second dictionary, you can even get dictionaries of different sizes:

print(len({**{x:1},**{y:2, z: 3}}))
--> 1

print(len({**{x:1},**{z:3, y:2}}))
--> 2
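The dictionary demonstration relies on `__hash__`, which the question plans to leave out. As a hedged sketch (the `Approx` class and `dedupe` helper are hypothetical), plain lists are affected as well: a naive order-preserving deduplication built on `in` returns results of different lengths depending on input order, mirroring the dictionaries above. It also shows that defining `__eq__` without `__hash__` makes instances unhashable in Python 3.

```python
class Approx:
    """Hypothetical value whose == means 'within TOLERANCE' (not transitive)."""

    TOLERANCE = 5

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Approx):
            return NotImplemented
        return abs(self.value - other.value) <= self.TOLERANCE

    def __repr__(self):
        return f"Approx({self.value})"


def dedupe(items):
    """Keep the first of each group of 'equal' items, using list membership (==)."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


a, b, c = Approx(0), Approx(4), Approx(8)  # a == b and b == c, but a != c

print(dedupe([a, b, c]))  # [Approx(0), Approx(8)]
print(dedupe([b, a, c]))  # [Approx(4)]

# __eq__ without __hash__: Python sets __hash__ to None, so hash() raises TypeError.
try:
    hash(a)
except TypeError:
    print("Approx instances are unhashable")
```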