Case-insensitive comparison of sets in Python
I have two sets (although I can do lists, or whatever):
a = frozenset(('Today','I','am','fine'))
b = frozenset(('hello','how','are','you','today'))
I want to get:
frozenset(['Today'])
or at least:
frozenset(['today'])
The second option is doable if I lowercase everything, I presume, but I'm looking for a more elegant way. Is it possible to do
a.intersection(b)
in a case-insensitive manner?
Shortcuts in Django are also fine since I'm using that framework.
Example from intersection method below (I couldn't figure out how to get this formatted in a comment):
print(intersection('Today I am fine tomorrow'.split(),
                   'Hello How are you TODAY and today and Today and Tomorrow'.split(),
                   key=str.lower))
[(['tomorrow'], ['Tomorrow']), (['Today'], ['TODAY', 'today', 'Today'])]
Comments (4)
Here's a version that works for any pair of iterables:
Example:
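A minimal sketch of such a function, assuming it buckets each iterable by key(item) and returns, for every key both sides share, a pair (items_from_first, items_from_second), which is the shape of the output quoted in the question. Only the name intersection and the key parameter come from that usage; the body below is an assumption, written for Python 3.7+.

```python
from collections import defaultdict


def intersection(iterable1, iterable2, key=lambda x: x):
    """Sketch: pair up items of both iterables that share the same key(item).

    Returns a list of (items_from_iterable1, items_from_iterable2) tuples,
    one per key present in both. Hypothetical reconstruction; only the
    name and the `key` parameter come from the question's usage.
    """
    # Bucket each iterable by its normalized key, keeping original spellings.
    groups1 = defaultdict(list)
    for item in iterable1:
        groups1[key(item)].append(item)
    groups2 = defaultdict(list)
    for item in iterable2:
        groups2[key(item)].append(item)
    # `in` does not create entries in a defaultdict, so groups2 is untouched.
    # Dicts keep insertion order (Python 3.7+), so pairs follow iterable1.
    return [(groups1[k], groups2[k]) for k in groups1 if k in groups2]


result = intersection('Today I am fine tomorrow'.split(),
                      'Hello How are you TODAY and today and Today and Tomorrow'.split(),
                      key=str.lower)
print(result)
```

This prints [(['Today'], ['TODAY', 'today', 'Today']), (['tomorrow'], ['Tomorrow'])]. The pairs follow the first iterable's order, so they can come out in a different order from the Python 2 output quoted in the question.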
Unfortunately, even if you COULD "change on the fly" the comparison-related special methods of the sets' items (__lt__ and friends -- actually, only __eq__ is needed given the way sets are currently implemented, but that's an implementation detail) -- and you can't, because they belong to a built-in type, str -- that wouldn't suffice, because __hash__ is also crucial, and by the time you want to do your intersection it has already been applied, putting the sets' items in different hash buckets from where they'd need to end up to make intersection work the way you want (i.e., there's no guarantee that 'Today' and 'today' are in the same bucket).
So, for your purposes, you inevitably need to build new data structures -- if you consider it "inelegant" to have to do that at all, you're plain out of luck: built-in sets just don't carry around the HUGE baggage and overhead that would be needed to allow people to change comparison and hashing functions, which would bloat things by 10 times (or more) for the sake of a need felt in (maybe) one use case in a million.
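A short Python 3 illustration of both obstacles described above. The names lower_eq and EqOnly are hypothetical. Assigning to an attribute of the built-in str type raises TypeError, and a str subclass that overrides only __eq__ gets its __hash__ set to None, so it cannot be put into a set at all until a matching __hash__ is supplied.

```python
def lower_eq(self, other):
    """The case-insensitive equality we would like str to have."""
    return self.lower() == other.lower()


# Obstacle 1: built-in types such as str cannot be patched at runtime.
try:
    str.__eq__ = lower_eq
    patched = True
except TypeError:
    patched = False
print("str patched:", patched)  # str patched: False


class EqOnly(str):
    """Hypothetical str subclass that overrides __eq__ but not __hash__."""

    def __eq__(self, other):
        return lower_eq(self, other)


# Obstacle 2: Python 3 sets __hash__ to None on a class that defines
# __eq__ without __hash__, so its instances cannot go into a set.
try:
    frozenset([EqOnly("Today")])
    hashable = True
except TypeError:
    hashable = False
print("EqOnly usable in a set:", hashable)  # EqOnly usable in a set: False
```

Overriding __hash__ so that it agrees with the new __eq__ is what the wrapper-type approach in the next paragraph relies on.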
If you have frequent needs connected with case-insensitive comparison, you should consider subclassing or wrapping str (overriding comparison and hashing) to provide a "case insensitive str" type, cistr -- and then, of course, make sure that only instances of cistr are (e.g.) added to your sets (&c) of interest (either by subclassing set &c, or simply by paying care). An oversimplified example along these lines (a cistr type plus a cifrozenset to hold its instances) does emit frozenset(['Today']), as per your expressed desire. Of course, in real life you'd probably want to do MUCH more overriding (for example...: the way I have things here, any operation on a cifrozenset returns a plain frozenset, losing the precious case-independence special feature -- you'd probably want to ensure that a cifrozenset is returned each time instead, and, while quite feasible, that's NOT trivial).
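A minimal Python 3 sketch of that approach, assuming case-insensitivity means comparing lower() forms. The names cistr and cifrozenset follow the text above, but these class bodies are assumptions. This cifrozenset overrides only intersection and &, so those two keep returning a cifrozenset. Union, difference and the rest still return plain frozensets, as the caveat above says.

```python
class cistr(str):
    """Case-insensitive str: compares and hashes by its lowercase form."""

    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it must be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, so hash the lowercase form.
        return hash(self.lower())


class cifrozenset(frozenset):
    """frozenset whose items are all stored as cistr instances."""

    def __new__(cls, iterable=()):
        return super().__new__(cls, (cistr(item) for item in iterable))

    def intersection(self, *others):
        # Rebuild the result explicitly so it stays a cifrozenset;
        # surviving items keep the spelling they have in self.
        result = self
        for other in others:
            other = cifrozenset(other)
            result = cifrozenset(item for item in result if item in other)
        return result

    def __and__(self, other):
        if not isinstance(other, (set, frozenset)):
            return NotImplemented
        return self.intersection(other)


a = cifrozenset(('Today', 'I', 'am', 'fine'))
b = cifrozenset(('hello', 'how', 'are', 'you', 'today'))
common = a & b
print(type(common).__name__, sorted(common))  # cifrozenset ['Today']
```

Because cistr hashes its lowercase form, 'Today' and 'today' land in the same hash bucket and then compare equal. That is exactly what plain str cannot offer.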
First, don't you mean a.intersection(b)? The intersection (if case insensitive) would be set(['today']). The difference would be set(['i', 'am', 'fine']).
Here are two ideas:
1.) Write a function to convert the elements of both sets to lowercase and then do the intersection. Here's one way you could do it:
This is not very efficient though because the conversion and new sets would have to be created on each call.
2.) Extend the frozenset class to keep a case-insensitive copy of the elements. Override the intersection method to use the case-insensitive copy of the elements. This would be more efficient.
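A minimal Python 3 sketch of both ideas. The names ci_intersection, CIFrozenSet and its _lower attribute are hypothetical, and this intersection override takes a single argument rather than frozenset's variadic form. Both return lowercase strings, i.e. the frozenset(['today']) option from the question.

```python
def ci_intersection(a, b):
    """Idea 1: lowercase both sides, then intersect.

    New lowercase sets are built on every call, which is the
    inefficiency mentioned above.
    """
    return frozenset(map(str.lower, a)).intersection(map(str.lower, b))


class CIFrozenSet(frozenset):
    """Idea 2: a frozenset that also caches a lowercase copy of its items."""

    def __new__(cls, iterable=()):
        self = super().__new__(cls, iterable)
        # Built once per instance instead of once per intersection call.
        self._lower = frozenset(item.lower() for item in self)
        return self

    def intersection(self, other):
        # Reuse the cached copy when the other side is also a CIFrozenSet.
        if isinstance(other, CIFrozenSet):
            other_lower = other._lower
        else:
            other_lower = frozenset(item.lower() for item in other)
        return self._lower & other_lower


a = frozenset(('Today', 'I', 'am', 'fine'))
b = frozenset(('hello', 'how', 'are', 'you', 'today'))
print(ci_intersection(a, b))                        # frozenset({'today'})
print(CIFrozenSet(a).intersection(CIFrozenSet(b)))  # frozenset({'today'})
```

The saving in idea 2 only shows up when the same sets are intersected repeatedly, since the lowercase copies are paid for once at construction time.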
Or... with fewer maps:
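One way to read "fewer maps", sketched under that assumption: lowercase only one side, once, and filter the other against it. As a side effect the result keeps the first set's original spelling, which gives the frozenset(['Today']) form the question asked for first.

```python
a = frozenset(('Today', 'I', 'am', 'fine'))
b = frozenset(('hello', 'how', 'are', 'you', 'today'))

# Lowercase b a single time into a lookup set...
b_lower = {item.lower() for item in b}
# ...then keep a's items, in their original spelling, whose lowercase form is in it.
common = frozenset(item for item in a if item.lower() in b_lower)
print(common)  # frozenset({'Today'})
```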