How do I use itertools.groupby when the key value is inside the elements of the iterable?
To illustrate, I start with a list of 2-tuples:
import itertools
import operator
raw = [(1, "one"),
       (2, "two"),
       (1, "one"),
       (3, "three"),
       (2, "two")]
for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp).pop()[1])
yields:
1 one
2 two
1 one
3 three
2 two
In an attempt to investigate why:
for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp))
# ---- OUTPUT ----
1 [(1, 'one')]
2 [(2, 'two')]
1 [(1, 'one')]
3 [(3, 'three')]
2 [(2, 'two')]
Even this will give me the same output:
for key, grp in itertools.groupby(raw, key=operator.itemgetter(0)):
    print(key, list(grp))
I want to get something like:
1 one, one
2 two, two
3 three
I think this is because the key sits inside each tuple in the list, while the tuple gets moved around as a whole. Is there a way to get my desired output? Maybe groupby() isn't suited for this task?
groupby() clusters consecutive elements of the iterable that have the same key. To produce the output you want, you must first sort raw.
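The sort-then-group approach described above can be sketched roughly as follows. The helper name group_sorted is made up for this illustration and is not part of the original answer; the sketch assumes the keys are mutually comparable so that they can be sorted.

```python
import itertools
import operator

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]


def group_sorted(pairs):
    """Hypothetical helper: sort by the first item, then group equal keys."""
    get_key = operator.itemgetter(0)
    # groupby() only merges adjacent items, so equal keys must be made adjacent.
    ordered = sorted(pairs, key=get_key)
    return [(key, [word for _, word in grp])
            for key, grp in itertools.groupby(ordered, key=get_key)]


for key, words in group_sorted(raw):
    print(key, ", ".join(words))
```

Because the same key function is used for both sorting and grouping, every run that groupby() sees covers all items with that key.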
I think there is a cleaner way to get your desired result: build a mapping d from each key to its values, then sort only the keys. Building d is O(n), and sorted() then runs over just the unique keys instead of the entire dataset.
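One possible reading of this answer, as a minimal sketch: it assumes d is a collections.defaultdict of lists, which the surviving text does not confirm, and the helper name group_with_dict is invented here.

```python
from collections import defaultdict

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]


def group_with_dict(pairs):
    """Hypothetical helper: collect the values for each key in one O(n) pass."""
    d = defaultdict(list)  # assumption: the answer's d was a dict of lists
    for key, word in pairs:
        d[key].append(word)
    return d


d = group_with_dict(raw)
# Only the unique keys are sorted here, not the whole dataset.
for key in sorted(d):
    print(key, ", ".join(d[key]))
```

This avoids groupby() entirely, so the input order no longer matters for grouping; sorting is needed only to control the order of the printed keys.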
From the docs:
Since you are sorting the tuples lexicographically anyway, you can just call sorted:
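A rough sketch of what calling sorted without a key might look like here; the helper name group_lexicographic is an assumption for illustration. Note that plain sorted() compares whole tuples, so the second items must also be comparable whenever two keys tie.

```python
import itertools
import operator

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]


def group_lexicographic(pairs):
    """Hypothetical helper: sort whole tuples, then group on the first item."""
    # Tuples compare item by item, so sorting them also orders by the key.
    ordered = sorted(pairs)
    return {key: [word for _, word in grp]
            for key, grp in itertools.groupby(ordered, key=operator.itemgetter(0))}


for key, words in group_lexicographic(raw).items():
    print(key, ", ".join(words))
```

Unlike sorting with a key function, this also orders the values within each group, which may or may not be wanted.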