在 Python 中使用 Groupby 对 Access 表中的列进行分组
我已经使用 itertools 模块中的 groupby 函数有一段时间了(就像几天一样),
for k, g in groupby(data, keyfunc):
我真的很难理解语法。我尝试了各种不同的方法,但我真的不知道要为“data”和“keyfunc”输入什么,并让它在打印语句中返回我想要的数据组。
我正在做的是循环访问表中的行。
我将一个变量设置为访问表中的游标搜索(已排序)并拉出我想要的列。
for row in cursor:
print row.ROAD_TYPE
这将返回:
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
COLLECTOR HIGHWAY
COLLECTOR HIGHWAY
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
MAJOR ROAD
MAJOR HIGHWAY
我想将这些值分组在一起,并让它为我返回字符串值,以便它打印如下内容:
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
COLLECTOR HIGHWAY
COLLECTOR HIGHWAY
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
MAJOR ROAD
MAJOR HIGHWAY
最后,我想根据这些新组对第二列进行分组,以便我得到如下内容:
TRUNK ROAD M1
TRUNK ROAD M1
TRUNK ROAD M2
TRUNK ROAD M3
COLLECTOR HIGHWAY M1
COLLECTOR HIGHWAY M2
ACCESS ROAD M1
ACCESS ROAD M1
ACCESS ROAD M3
ACCESS ROAD M3
ACCESS ROAD M7
ACCESS ROAD M7
ACCESS ROAD M8
MAJOR ROAD M8
MAJOR HIGHWAY M8
我知道这可能比我自己想象的要简单得多,我觉得有一个简单的答案,但我完全被难住了,我似乎无法在互联网上找到以某种方式解释 groupby 语法的示例我明白。请感受一下
I have been playing with the groupby function from the itertools module for a while now (like days)
for k, g in groupby(data, keyfunc):
I am really having trouble understanding the syntax. I've tried a variety of different things, but I really don't know what to put in for "data" and "keyfunc" and get it to return the groups of data I want in a print statement.
What am I doing is looping through rows in an access table.
I set a variable to a cursor search (which is sorted) in the access table and pull out the column I want.
for row in cursor:
print row.ROAD_TYPE
This returns:
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
COLLECTOR HIGHWAY
COLLECTOR HIGHWAY
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
MAJOR ROAD
MAJOR HIGHWAY
I want to group these values together and have it return the string value for me so it prints something like this:
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
TRUNK ROAD
COLLECTOR HIGHWAY
COLLECTOR HIGHWAY
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
ACCESS ROAD
MAJOR ROAD
MAJOR HIGHWAY
Finally, I want to group a second column based on these new groups so that I get something like this:
TRUNK ROAD M1
TRUNK ROAD M1
TRUNK ROAD M2
TRUNK ROAD M3
COLLECTOR HIGHWAY M1
COLLECTOR HIGHWAY M2
ACCESS ROAD M1
ACCESS ROAD M1
ACCESS ROAD M3
ACCESS ROAD M3
ACCESS ROAD M7
ACCESS ROAD M7
ACCESS ROAD M8
MAJOR ROAD M8
MAJOR HIGHWAY M8
I know this is probably way less difficult than I've made it out for myself, I feel there's a simple answer, but I'm completely stumped and I can't seem to find an example on the internet that explains the groupby syntax in a way that I understand. Please feel f
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这里有两个例子来帮助理解groupby:
上面,所有项目都是相同的,归为一组。
现在我们添加一个关键函数
keyfunc
:现在我们不再按数据中的项目进行分组,而是根据每个
x
的keyfunc(x)
进行分组> 在数据
中。Here are two examples to help grok groupby:
Above, all items which are the same, are grouped together.
Now we add a key function,
keyfunc
:Now instead of grouping by the items in data, we group according to
keyfunc(x)
for eachx
indata
.itertools.groupby 背后的想法是解决以下基本问题:“我想迭代一个序列,分析其中的每个事物,并根据分析过程以不同的方式处理每个事物”。
groupby
将序列分成组,但保留原始序列的顺序。上面打印:
[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9, 10, 11])]
您会注意到,为了获得
结果
列表,我必须进行一些额外的处理。groupby
实际上给你的是迭代:上面的打印:
(0,)
所以这个想法是在你的程序中你会说
key, valueiter = i.next()
,测试key
,然后将valueiter
传递给正确的处理函数/方法。那么这里有什么价值呢?好吧,您不必继续测试处理函数中的键来判断何时停止。
groupby
会自动为您停止每个“子迭代器”。与 SQL 中的 GROUP BY 子句不同,groupby 不做的事情是确保处理原始可迭代中具有相同键值的所有内容立刻。键值可以在
groupby
之外重复。这是合理的,因为目标是保留原始序列的顺序,而不是在内存中累积所有内容。例如:以上打印:
[(0, [0]), (1, [1]), (2, [2]), (0, [3]), (1, [4]) ,(2,[5])]
。键值重复,每个子迭代器仅生成一个数据。就性能而言,这是groupby
的最坏情况,这意味着您在使用此工具时必须遵循特定模型。因此,这样的事情:
只有当您先验知道您的键值将永远重复时,才是正确的。
The idea behind
itertools.groupby
is to solve the basic problem of: "I want to iterate through a sequence, analyze each thing in it, and depending on the analysis process each thing a different way".groupby
separates the sequence into groups, but preserving the order of the original sequence.The above prints:
[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9, 10, 11])]
You will notice that in order to get the
results
list I had to do some extra processing. Whatgroupby
actually gives you is meant for iteration:The above prints:
(0, <itertools._grouper object at 0x020BF3D0>)
So the idea is that in your program you will say
key, valueiter = i.next()
, test thekey
, and then passvalueiter
to the correct processing function/method.So what's the value here? Well, you don't have to continue to test the key within your processing functions to tell when to stop.
groupby
will stop each "sub-iterator" for you automatically.What
groupby
does not do, unlike theGROUP BY
clause in SQL, is ensure that everything with the same key value from the original iterable is processed at once. Key values can repeat out ofgroupby
. This is justified because the objective is to preserve the order of the original sequence, and not to accumulate everything in memory. For example:The above prints:
[(0, [0]), (1, [1]), (2, [2]), (0, [3]), (1, [4]), (2, [5])]
. The key values repeat, and each sub-iterator produces only a single datum. This is the worst-case scenario, in terms of performance, forgroupby
, and it means that you have to follow a certain model in working with this tool.So something like this:
is only correct if you know a priori that your key values will never repeat.
您的两个
groupby
示例的完成方式完全相同,除了您选择的key
在这两种情况下
key
都将是 attrgetter() 的结果,但您实际上并不需要它,因为您将迭代按该键分组的行。当然,如果cursor
按分组所依据的相同键排序,则这一切都可以正常工作。Your two
groupby
examples are done the exact same way, except for your choice ofkey
In both cases
key
will be the result of attrgetter(), but you won't actually need it, as you will be iterating the rows grouped by that key. Of course, this all works correctly, ifcursor
is sorted by the same key that you group by.