Why is Python's handling of collections not uniform?
Sets and lists are handled differently in Python, and there seems to be no uniform way to work with both. For example, an item is added to a set with the add method, but to a list with the append method. I am aware that there are different semantics behind this, but there are also common semantics, and an algorithm that works with a collection often cares more about the commonalities than about the differences. The C++ STL shows that this can work, so why is there no such concept in Python?
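For reference, the shared semantics already have names in the standard library's collections.abc module. In this sketch (the describe function is hypothetical), an algorithm that needs only len(), membership tests, and iteration accepts any Collection, including both list and set. Insertion, however, belongs to different interfaces: MutableSequence for lists and MutableSet for sets.

```python
from collections.abc import Collection, MutableSequence, MutableSet


def describe(items: Collection) -> tuple:
    """Hypothetical algorithm that relies only on what list and set share:
    len(), membership tests with `in`, and iteration (used by sum())."""
    return len(items), 0 in items, sum(items)


print(describe([3, 0, 3]))  # (3, True, 6)
print(describe({3, 0, 3}))  # (2, True, 3)

# Both types are Collections, but insertion is defined by different ABCs.
print(isinstance([], MutableSequence))  # True  (append comes from here)
print(isinstance(set(), MutableSet))    # True  (add comes from here)
print(isinstance([], MutableSet))       # False
```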
Edit: In C++ I can use an output_iterator to store values in an (almost) arbitrary type of collection, including lists and sets. I can write an algorithm that takes such an iterator as an argument and writes elements to it. The algorithm is then completely agnostic to the kind of container (or other device, such as a file) that backs the iterator. If the backing container is a set that ignores duplicates, that is the caller's decision. My specific problem is that several times now I have used, for instance, a list for a certain task and later decided that a set is more appropriate. Then I have to change append to add in several places in my code. I am just wondering why Python has no concept for such cases.
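One common Python counterpart to the output_iterator idea is to pass the insertion method itself as a callable. This is only an illustrative sketch: copy_evens is a hypothetical algorithm, and the caller chooses the backing store by passing list.append, set.add, or a function that writes to a file-like object.

```python
import io
from typing import Callable, Iterable


def copy_evens(source: Iterable[int], sink: Callable[[int], object]) -> None:
    """Hypothetical algorithm: write every even number from source to sink.

    It only knows that sink can be called with one value, much like a C++
    algorithm that only knows it was given an output iterator.
    """
    for value in source:
        if value % 2 == 0:
            sink(value)


data = [4, 1, 2, 4, 7, 6]

as_list = []
copy_evens(data, as_list.append)  # keeps order and duplicates
print(as_list)                    # [4, 2, 4, 6]

as_set = set()
copy_evens(data, as_set.add)      # duplicates ignored: the caller's decision
print(sorted(as_set))             # [2, 4, 6]

buf = io.StringIO()               # stands in for a file
copy_evens(data, lambda v: buf.write(f"{v}\n"))
```

With this design, switching from a list to a set changes only the call site, not the algorithm.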
Answers (3)
The direct answer: it's a design flaw.
You should be able to insert into any container where generic insertion makes sense (e.g. excluding dict) using the same method name. There should be a consistent, generic name for insertion, e.g. add, corresponding to set.add and list.append, so you can add to a container without having to care as much about what you're inserting into. Using different names for this operation in different types is a gratuitous inconsistency, and it sets a poor baseline: the library should encourage user-defined containers to use a consistent API, rather than providing largely incompatible APIs for each basic container.
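Python has no such generic name built in, but a minimal sketch of what this answer describes can be assembled with functools.singledispatch. The insert function below is hypothetical, not part of Python. It dispatches on the collections.abc interfaces, so it should also cover other mutable sequences and sets, and it raises TypeError for types such as dict where generic insertion doesn't make sense.

```python
from collections.abc import MutableSequence, MutableSet
from functools import singledispatch


@singledispatch
def insert(container, item):
    """Hypothetical generic insertion; the fallback rejects unsupported types."""
    raise TypeError(f"no generic insertion for {type(container).__name__}")


@insert.register
def _(container: MutableSequence, item):
    container.append(item)  # sequences: always add, at the end


@insert.register
def _(container: MutableSet, item):
    container.add(item)  # sets: add, ignoring duplicates


items = []
seen = set()
for x in [3, 1, 3]:
    insert(items, x)
    insert(seen, x)

print(items)         # [3, 1, 3]
print(sorted(seen))  # [1, 3]
```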
That said, it's not often a practical problem in this case: most of the time, when a function's result is a list of items, you can implement it as a generator instead. Generators let both cases be handled consistently (from the perspective of the function), and they support other forms of iteration as well.
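A minimal sketch of that approach, using a hypothetical generator words_longer_than: the function only yields results, and each caller decides which collection, if any, to build from them.

```python
from typing import Iterator


def words_longer_than(text: str, n: int) -> Iterator[str]:
    """Hypothetical example: yield each word in text longer than n characters.

    The function never decides what kind of collection the results end up in.
    """
    for word in text.split():
        if len(word) > n:
            yield word


text = "red green red blue green"

as_list = list(words_longer_than(text, 3))  # order and duplicates kept
as_set = set(words_longer_than(text, 3))    # duplicates collapsed

print(as_list)          # ['green', 'blue', 'green']
print(sorted(as_set))   # ['blue', 'green']
```

Switching from a list to a set then means changing a single call site rather than every append in the function body.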
The add and append methods are different. Sets are unordered and contain unique elements, whereas append suggests that the item is always added, and specifically at the end.
Sets and lists can both be treated as iterables; that is their common semantics, and your algorithms are free to use it.
If you have an algorithm that depends on some sort of addition, you simply can't rely on sets, tuples, lists, dicts, and strings behaving the same.
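The sketch below illustrates this point with a hypothetical helper, add_all_and_count, that silently assumes each insertion grows the container by one element. That assumption holds for a list but not for a set, and tuples, strings, and dicts have neither an append nor an add method.

```python
def add_all_and_count(add, container, values):
    """Hypothetical helper: insert every value via `add` and report how much
    `container` grew. It assumes every insertion adds exactly one element."""
    before = len(container)
    for value in values:
        add(value)
    return len(container) - before


values = [2, 1, 2]

lst = []
print(add_all_and_count(lst.append, lst, values))  # 3: append always adds
st = set()
print(add_all_and_count(st.add, st, values))       # 2: the duplicate 2 is ignored

# Immutable types and mappings offer neither method at all.
for obj in ((), "", {}):
    print(type(obj).__name__, hasattr(obj, "append"), hasattr(obj, "add"))
```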
The actual reason is probably just related to Python history.
The set type wasn't built in until Python 2.4, and it was based on the sets module, which itself wasn't in the standard library until Python 2.3. Obviously, changing the semantics of the set type could break a host of existing code that relied on the original sets module, and language designers generally shy away from breaking existing code without a major version release.
You can blame the original module author if you like, but keep in mind that user-defined types and built-in types necessarily lived in different universes until Python 2.2, which meant you couldn't directly extend a built-in type, and probably allowed module authors to feel OK about not maintaining consistent collection semantics.