Case-insensitive comparison of sets in Python

Posted on 2024-08-05 08:47:34

I have two sets (although I can do lists, or whatever):

a = frozenset(('Today','I','am','fine'))
b = frozenset(('hello','how','are','you','today'))

I want to get:

frozenset(['Today'])

or at least:

frozenset(['today'])

The second option is doable if I lowercase everything, I presume, but I'm looking for a more elegant way. Is it possible to do

a.intersection(b) 

in a case-insensitive manner?

Shortcuts in Django are also fine since I'm using that framework.

Example using the intersection function from the answer below (I couldn't figure out how to format this in a comment):

print(intersection('Today I am fine tomorrow'.split(),
                   'Hello How a re you TODAY and today and Today and Tomorrow'.split(),
                   key=str.lower))

[(['tomorrow'], ['Tomorrow']), (['Today'], ['TODAY', 'today', 'Today'])]

Comments (4)

小草泠泠 2024-08-12 08:47:34

Here's a version that works for any pair of iterables:

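def intersection(iterableA, iterableB, key=lambda x: x):
    """Return the intersection of two iterables with respect to the `key` function.

    Each result is a pair of lists: the items from each iterable that share a key.
    """
    def unify(iterable):
        # Group items by key(item), keeping every original item.
        d = {}
        for item in iterable:
            d.setdefault(key(item), []).append(item)
        return d

    A, B = unify(iterableA), unify(iterableB)

    # Keep only the keys present in both groupings.
    return [(A[k], B[k]) for k in A if k in B]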

Example:

print(intersection('Today I am fine'.split(),
                   'Hello How a re you TODAY'.split(),
                   key=str.lower))
# -> [(['Today'], ['TODAY'])]
朦胧时间 2024-08-12 08:47:34

Unfortunately, even if you COULD "change on the fly" the comparison-related special methods of the sets' items (__lt__ and friends -- actually, only __eq__ is needed, the way sets are currently implemented, but that's an implementation detail) -- and you can't, because they belong to a built-in type, str -- that wouldn't suffice, because __hash__ is also crucial and by the time you want to do your intersection it's already been applied, putting the sets' items in different hash buckets from where they'd need to end up to make intersection work the way you want (i.e., no guarantee that 'Today' and 'today' are in the same bucket).
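To make the hashing point concrete, here is a small sketch (written for Python 3; the names `plain` and `lowered` are only illustrative). Strings that differ only in case are unequal and are hashed independently, so a built-in set keeps them as separate members. They only coincide if they are normalized before the set is built.

```python
# Strings differing only in case compare unequal, so a built-in set
# stores them as two distinct members.
plain = {'Today', 'today'}
print(len(plain))            # 2
print('Today' in {'today'})  # False

# Normalizing *before* building the set is what makes them coincide.
lowered = {s.lower() for s in ('Today', 'today')}
print(lowered)               # {'today'}
```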

So, for your purposes, you inevitably need to build new data structures -- if you consider it "inelegant" to have to do that at all, you're plain out of luck: built-in sets just don't carry around the HUGE baggage and overhead that would be needed to allow people to change comparison and hashing functions, which would bloat things by 10 times (or more) for the sake of a need felt in (maybe) one use case in a million.

If you have frequent needs connected with case-insensitive comparison, you should consider subclassing or wrapping str (overriding comparison and hashing) to provide a "case insensitive str" type (cistr, say; the example here simply calls it ci) -- and then, of course, make sure that only instances of that type are (e.g.) added to your sets (&c) of interest (either by subclassing set &c, or simply by taking care). To give an oversimplified example...:

class ci(str):
  def __hash__(self):
    return hash(self.lower())
  def __eq__(self, other):
    return self.lower() == other.lower()

class cifrozenset(frozenset):
  def __new__(cls, seq=()):
    return frozenset((ci(x) for x in seq))

a = cifrozenset(('Today','I','am','fine'))
b = cifrozenset(('hello','how','are','you','today'))

print(a.intersection(b))

This does emit frozenset(['Today']), as per your expressed desire. Of course, in real life you'd probably want to do MUCH more overriding (for example...: the way I have things here, any operation on a cifrozenset returns a plain frozenset, losing the precious case-independence special feature -- you'd probably want to ensure that a cifrozenset is returned each time instead, and, while quite feasible, that's NOT trivial).

情深缘浅 2024-08-12 08:47:34

First, don't you mean a.intersection(b)? The intersection (if case insensitive) would be set(['today']). The difference would be set(['i', 'am', 'fine']).

Here are two ideas:

1.) Write a function to convert the elements of both sets to lowercase and then do the intersection. Here's one way you could do it:

>>> intersect_with_key = lambda s1, s2, key=lambda i: i: set(map(key, s1)).intersection(map(key, s2))
>>> fs1 = frozenset('Today I am fine'.split())
>>> fs2 = frozenset('Hello how are you TODAY'.split())
>>> intersect_with_key(fs1, fs2)
set([])
>>> intersect_with_key(fs1, fs2, key=str.lower)
set(['today'])
>>>

This is not very efficient, though, because the conversion has to be redone and new sets created on each call.

2.) Extend the frozenset class to keep a case-insensitive copy of the elements. Override the intersection method to use the case-insensitive copy of the elements. This would be more efficient.
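The second idea could look roughly like the sketch below (Python 3). The class name CaseInsensitiveFrozenSet and the _lowered attribute are hypothetical, chosen only for illustration. The lowercased copy is computed once when the set is built and reused by every intersection call.

```python
class CaseInsensitiveFrozenSet(frozenset):
    """A frozenset of strings that also keeps a lowercased copy of its elements."""

    def __new__(cls, iterable=()):
        self = super().__new__(cls, iterable)
        # Computed once at construction, reused by every intersection call.
        self._lowered = frozenset(s.lower() for s in self)
        return self

    def intersection(self, other):
        # Reuse the cached copy when the other operand has one.
        if isinstance(other, CaseInsensitiveFrozenSet):
            other_lowered = other._lowered
        else:
            other_lowered = frozenset(s.lower() for s in other)
        return self._lowered & other_lowered


fs1 = CaseInsensitiveFrozenSet('Today I am fine'.split())
fs2 = CaseInsensitiveFrozenSet('Hello how are you TODAY'.split())
result = fs1.intersection(fs2)
print(result)  # frozenset({'today'})
```

The original elements stay available through normal iteration and membership tests; only the overridden intersection uses the lowercased copy, and it returns lowercased strings.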

懵少女 2024-08-12 08:47:34
>>> a_, b_ = map(set, [map(str.lower, a), map(str.lower, b)])
>>> a_ & b_
set(['today'])

Or, with fewer maps:

>>> a_ = set(map(str.lower, a))
>>> b_ = set(map(str.lower, b))
>>> a_ & b_
set(['today'])
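If the original spelling from a should be kept (the first result the question asks for), a small variation on the same idea is to lowercase only b and filter a against it. This is a sketch in Python 3 syntax; the name b_lowered is illustrative.

```python
a = frozenset(('Today', 'I', 'am', 'fine'))
b = frozenset(('hello', 'how', 'are', 'you', 'today'))

# Lowercase b once, then keep the items of a (in their original case)
# whose lowercase form appears in it.
b_lowered = {s.lower() for s in b}
common = frozenset(s for s in a if s.lower() in b_lowered)
print(common)  # frozenset({'Today'})
```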