在 Python 中使用 Groupby 对 Access 表中的列进行分组

发布于 2024-12-09 15:19:35 字数 1239 浏览 2 评论 0原文

我已经使用 itertools 模块中的 groupby 函数有一段时间了（就像几天一样），

for k, g in groupby(data, keyfunc):

我真的很难理解语法。我尝试了各种不同的方法，但我真的不知道要为“data”和“keyfunc”输入什么，并让它在打印语句中返回我想要的数据组。

我正在做的是循环访问表中的行。

我将一个变量设置为访问表中的游标搜索（已排序）并拉出我想要的列。

for row in cursor:
    print row.ROAD_TYPE

这将返回：

TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
COLLECTOR HIGHWAY
COLLECTOR HIGHWAY
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
MAJOR ROAD
MAJOR HIGHWAY

我想将这些值分组在一起，并让它为我返回字符串值，以便它打印如下内容：

TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD

COLLECTOR HIGHWAY
COLLECTOR HIGHWAY

ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD

MAJOR ROAD

MAJOR HIGHWAY

最后，我想根据这些新组对第二列进行分组，以便我得到如下内容：

TRUNK ROAD  M1
TRUNK ROAD  M1

TRUNK ROAD  M2

TRUNK ROAD  M3


COLLECTOR HIGHWAY  M1

COLLECTOR HIGHWAY  M2


ACCESS ROAD  M1
ACCESS ROAD  M1

ACCESS ROAD  M3
ACCESS ROAD  M3

ACCESS ROAD  M7
ACCESS ROAD  M7

ACCESS ROAD  M8


MAJOR ROAD  M8


MAJOR HIGHWAY  M8

我知道这可能比我自己想象的要简单得多，我觉得有一个简单的答案，但我完全被难住了，我似乎无法在互联网上找到以某种方式解释 groupby 语法的示例我明白。请感受一下

原文

I have been playing with the groupby function from the itertools module for a while now (like days)

for k, g in groupby(data, keyfunc):

I am really having trouble understanding the syntax. I've tried a variety of different things, but I really don't know what to put in for "data" and "keyfunc" and get it to return the groups of data I want in a print statement.

What am I doing is looping through rows in an access table.

I set a variable to a cursor search (which is sorted) in the access table and pull out the column I want.

for row in cursor:
    print row.ROAD_TYPE

This returns:

TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
COLLECTOR HIGHWAY
COLLECTOR HIGHWAY
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
MAJOR ROAD
MAJOR HIGHWAY

I want to group these values together and have it return the string value for me so it prints something like this:

TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD

COLLECTOR HIGHWAY
COLLECTOR HIGHWAY

ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD

MAJOR ROAD

MAJOR HIGHWAY

Finally, I want to group a second column based on these new groups so that I get something like this:

TRUNK ROAD  M1
TRUNK ROAD  M1

TRUNK ROAD  M2

TRUNK ROAD  M3


COLLECTOR HIGHWAY  M1

COLLECTOR HIGHWAY  M2


ACCESS ROAD  M1
ACCESS ROAD  M1

ACCESS ROAD  M3
ACCESS ROAD  M3

ACCESS ROAD  M7
ACCESS ROAD  M7

ACCESS ROAD  M8


MAJOR ROAD  M8


MAJOR HIGHWAY  M8

I know this is probably way less difficult than I've made it out for myself, I feel there's a simple answer, but I'm completely stumped and I can't seem to find an example on the internet that explains the groupby syntax in a way that I understand. Please feel f

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

誰ツ都不明白 2024-12-16 15:19:35

import itertools as it
for key, group in it.groupby(cursor, lambda row: row.ROAD_TYPE):
    for sec_col,pairs in it.groupby(group, lambda row: row.SECOND_COLUMN):
        for row in pairs:
            print('{t}  {s}'.format(t=row.ROAD_TYPE,s=row.SECOND_COLUMN))
        print
    print

这里有两个例子来帮助理解groupby：

[list(g) for k, g in it.groupby('AAAABBBCCD')]
# [['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]

上面，所有项目都是相同的，归为一组。

现在我们添加一个关键函数 keyfunc：

keyfunc=lambda x: x//3
data=range(13)
[list(g) for k,g in it.groupby(data,keyfunc)]
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12]]
[k for k,g in it.groupby(data,keyfunc)]
# [0, 1, 2, 3, 4]

现在我们不再按数据中的项目进行分组，而是根据每个 x 的 keyfunc(x) 进行分组> 在数据中。

import itertools as it
for key, group in it.groupby(cursor, lambda row: row.ROAD_TYPE):
    for sec_col,pairs in it.groupby(group, lambda row: row.SECOND_COLUMN):
        for row in pairs:
            print('{t}  {s}'.format(t=row.ROAD_TYPE,s=row.SECOND_COLUMN))
        print
    print

Here are two examples to help grok groupby:

[list(g) for k, g in it.groupby('AAAABBBCCD')]
# [['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]

Above, all items which are the same, are grouped together.

Now we add a key function, keyfunc:

keyfunc=lambda x: x//3
data=range(13)
[list(g) for k,g in it.groupby(data,keyfunc)]
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12]]
[k for k,g in it.groupby(data,keyfunc)]
# [0, 1, 2, 3, 4]

Now instead of grouping by the items in data, we group according to keyfunc(x) for each x in data.

回复收藏 0 原文

遥远的她 2024-12-16 15:19:35

itertools.groupby 背后的想法是解决以下基本问题：“我想迭代一个序列，分析其中的每个事物，并根据分析过程以不同的方式处理每个事物”。 groupby 将序列分成组，但保留原始序列的顺序。

from itertools import groupby
i = groupby(xrange(12), lambda v: v / 3)
results = [(x[0], list(x[1])) for x in i]
print str(results)

上面打印：[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9, 10, 11])]

您会注意到，为了获得结果列表，我必须进行一些额外的处理。 groupby 实际上给你的是迭代：

i = groupby(xrange(12), lambda v: v / 3)
print str(i.next())

上面的打印： (0,)

所以这个想法是在你的程序中你会说key, valueiter = i.next()，测试key，然后将valueiter传递给正确的处理函数/方法。

那么这里有什么价值呢？好吧，您不必继续测试处理函数中的键来判断何时停止。 groupby 会自动为您停止每个“子迭代器”。

与 SQL 中的 GROUP BY 子句不同，groupby 不做的事情是确保处理原始可迭代中具有相同键值的所有内容立刻。键值可以在 groupby 之外重复。这是合理的，因为目标是保留原始序列的顺序，而不是在内存中累积所有内容。例如：

i = groupby(xrange(6), lambda v: v % 3)  # note: modulo division
results = [(x[0], list(x[1])) for x in i]
print str(results)

以上打印：[(0, [0]), (1, [1]), (2, [2]), (0, [3]), (1, [4]) ，(2，[5])]。键值重复，每个子迭代器仅生成一个数据。就性能而言，这是 groupby 的最坏情况，这意味着您在使用此工具时必须遵循特定模型。

因此，这样的事情：

i = groupby(xrange(12), lambda v: v / 3)
results = dict([(x[0], list(x[1])) for x in i])  # beware of dict() here!
print str(results)

只有当您先验知道您的键值将永远重复时，才是正确的。

The idea behind itertools.groupby is to solve the basic problem of: "I want to iterate through a sequence, analyze each thing in it, and depending on the analysis process each thing a different way". groupby separates the sequence into groups, but preserving the order of the original sequence.

from itertools import groupby
i = groupby(xrange(12), lambda v: v / 3)
results = [(x[0], list(x[1])) for x in i]
print str(results)

The above prints: [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9, 10, 11])]

You will notice that in order to get the results list I had to do some extra processing. What groupby actually gives you is meant for iteration:

i = groupby(xrange(12), lambda v: v / 3)
print str(i.next())

The above prints: (0, <itertools._grouper object at 0x020BF3D0>)

So the idea is that in your program you will say key, valueiter = i.next(), test the key, and then pass valueiter to the correct processing function/method.

So what's the value here? Well, you don't have to continue to test the key within your processing functions to tell when to stop. groupby will stop each "sub-iterator" for you automatically.

What groupby does not do, unlike the GROUP BY clause in SQL, is ensure that everything with the same key value from the original iterable is processed at once. Key values can repeat out of groupby. This is justified because the objective is to preserve the order of the original sequence, and not to accumulate everything in memory. For example:

i = groupby(xrange(6), lambda v: v % 3)  # note: modulo division
results = [(x[0], list(x[1])) for x in i]
print str(results)

The above prints: [(0, [0]), (1, [1]), (2, [2]), (0, [3]), (1, [4]), (2, [5])]. The key values repeat, and each sub-iterator produces only a single datum. This is the worst-case scenario, in terms of performance, for groupby, and it means that you have to follow a certain model in working with this tool.

So something like this:

i = groupby(xrange(12), lambda v: v / 3)
results = dict([(x[0], list(x[1])) for x in i])  # beware of dict() here!
print str(results)

is only correct if you know a priori that your key values will never repeat.

回复收藏 0 原文

眼泪淡了忧伤 2024-12-16 15:19:35

您的两个 groupby 示例的完成方式完全相同，除了您选择的 key

from itertools import groupby
from operator import attrgetter

for key, rows in groupby(cursor, key=attrgetter('ROAD_TYPE')):
    for row in rows:
        print row.ROAD_TYPE
    print

for key, rows in groupby(cursor, key=attrgetter('ROAD_TYPE', 'OTHER_COLUMN')):
    for row in rows:
        print row.ROAD_TYPE, row.OTHER_COLUMN
    print

在这两种情况下 key 都将是 attrgetter() 的结果，但您实际上并不需要它，因为您将迭代按该键分组的行。当然，如果 cursor 按分组所依据的相同键排序，则这一切都可以正常工作。

Your two groupby examples are done the exact same way, except for your choice of key

from itertools import groupby
from operator import attrgetter

for key, rows in groupby(cursor, key=attrgetter('ROAD_TYPE')):
    for row in rows:
        print row.ROAD_TYPE
    print

for key, rows in groupby(cursor, key=attrgetter('ROAD_TYPE', 'OTHER_COLUMN')):
    for row in rows:
        print row.ROAD_TYPE, row.OTHER_COLUMN
    print

In both cases key will be the result of attrgetter(), but you won't actually need it, as you will be iterating the rows grouped by that key. Of course, this all works correctly, if cursor is sorted by the same key that you group by.

回复收藏 0 原文

~没有更多了~