How do I use itertools.groupby when the key is inside the elements of the iterable?

Posted 2024-09-13 03:56:44

To illustrate, I start with a list of 2-tuples:

import itertools
import operator

raw = [(1, "one"),
       (2, "two"),
       (1, "one"),
       (3, "three"),
       (2, "two")]

for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp).pop()[1])

yields:

1 one
2 two
1 one
3 three
2 two

In an attempt to investigate why:

for key, grp in itertools.groupby(raw, key=lambda item: item[0]):
    print(key, list(grp))

# ---- OUTPUT ----
1 [(1, 'one')]
2 [(2, 'two')]
1 [(1, 'one')]
3 [(3, 'three')]
2 [(2, 'two')]

Even this will give me the same output:

for key, grp in itertools.groupby(raw, key=operator.itemgetter(0)):
    print(key, list(grp))

I want to get something like:

1 one, one
2 two, two
3 three

I suspect this is because the key sits inside each tuple in the list, while the tuple is handled as a single unit. Is there a way to get my desired output? Maybe groupby() isn't suited to this task?

栀梦 2024-09-20 03:56:44

groupby clusters consecutive elements of the iterable that have the same key.
To produce the output you want, you must first sort raw so that elements with equal keys are adjacent.

for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
    print(key, list(map(operator.itemgetter(1), grp)))

# 1 ['one', 'one']
# 2 ['two', 'two']
# 3 ['three']
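
As a rough sketch of the same idea, the sort can use the grouping key alone instead of comparing whole tuples. Since sorted() is stable, the values in each group keep their original relative order. The helper name group_sorted is made up for this illustration and is not part of itertools.

```python
import itertools
import operator

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]

first = operator.itemgetter(0)


def group_sorted(pairs):
    """Return [(key, [values, ...]), ...] with one entry per distinct key.

    Hypothetical helper for illustration only.
    """
    # Sort by the grouping key so that equal keys become adjacent.
    ordered = sorted(pairs, key=first)
    # groupby now sees each key as one consecutive run.
    return [(key, [value for _, value in grp])
            for key, grp in itertools.groupby(ordered, key=first)]


for key, values in group_sorted(raw):
    print(key, ", ".join(values))
# 1 one, one
# 2 two, two
# 3 three
```

Joining the values with ", " gives the exact layout asked for in the question.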

你穿错了嫁妆 2024-09-20 03:56:44

I think a cleaner way to get your desired result is this.

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for k, v in raw:
...     d[k].append(v)
...
>>> for k, v in sorted(d.items()):
...     print(k, v)
...
1 ['one', 'one']
2 ['two', 'two']
3 ['three']

Building d is O(n), and sorted() now runs over only the unique keys instead of the entire dataset.
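
A self-contained version of this approach might look like the sketch below. The function name group_by_first is invented for the example, and the output is formatted the way the question asks rather than as a list.

```python
from collections import defaultdict

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]


def group_by_first(pairs):
    """Collect the second item of each pair under its first item.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        # A missing key starts out as an empty list.
        groups[key].append(value)
    return dict(groups)


groups = group_by_first(raw)
# Only the distinct keys are sorted, not the whole input.
for key in sorted(groups):
    print(key, ", ".join(groups[key]))
# 1 one, one
# 2 two, two
# 3 three
```

Unlike groupby, this does not need the input to be ordered in any way, because every element is routed to its group by a dictionary lookup.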

情丝乱 2024-09-20 03:56:44

From the docs:

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.

Since you are sorting the tuples lexicographically anyway, you can just call sorted:

for key, grp in itertools.groupby(sorted(raw), key=operator.itemgetter(0)):
    print(key, list(map(operator.itemgetter(1), grp)))
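
The uniq-like behaviour described in the docs can be seen by listing only the keys that groupby emits, first for the data as given and then for the sorted data. This is a small sketch using the question's raw list. The variable names are chosen for the example.

```python
import itertools
import operator

raw = [(1, "one"), (2, "two"), (1, "one"), (3, "three"), (2, "two")]

first = operator.itemgetter(0)

# Unsorted input: a new group starts every time the key changes,
# so keys 1 and 2 each appear twice.
unsorted_keys = [key for key, _ in itertools.groupby(raw, key=first)]

# Sorted input: equal keys are adjacent, so each key appears once.
sorted_keys = [key for key, _ in
               itertools.groupby(sorted(raw, key=first), key=first)]

print(unsorted_keys)  # [1, 2, 1, 3, 2]
print(sorted_keys)    # [1, 2, 3]
```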
