Python list/dict comprehension: summing one key grouped by another key in the same dict
I've been thinking about how to convert this to a one-liner, if possible:

    activities = [
        {'type': 'Run', 'distance': 12345, 'other_stuff': other ...},
        {'type': 'Ride', 'distance': 12345, 'other_stuff': other ...},
        {'type': 'Swim', 'distance': 12345, 'other_stuff': other ...},
    ]

Currently I am using:

    from collections import defaultdict

    # Sum the distance of every activity, keyed by activity type
    grouped_distance = defaultdict(int)
    for activity in activities:
        act_type = activity['type']
        grouped_distance[act_type] += activity['distance']
    # {'Run': 12345, 'Ride': 12345, 'Swim': 12345}

I have tried

    grouped_distance = {activity['type']:[sum(activity['distance']) for activity in activities]}

but this does not work: it says activity['type'] is not defined.

Edit: Fixed some variable typos, as noticed by @Samwise.

Update: I ran a benchmark on all the solutions that were posted, using 10 million items with 10 different types:

Method 1 (Counter): 7.43s
Method 2 (itertools @chepner): 8.64s
Method 3 (groups @Dmig): 19.34s
Method 4 (pandas @d.b): 32.73s
Method 5 (Dict @d.b): 10.95s

Tested on a Raspberry Pi 4 to make the differences easier to see. Do correct me if I have named a method wrongly.
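
The benchmarked solutions are not reproduced in this post. As a rough illustration only, "Method 1 (Counter)" could look something like the sketch below. This is an assumption about what that method did, not the code that was timed; the sample data and the function name sum_by_type_counter are made up for the example.

```python
from collections import Counter

# Hypothetical sample data; the real records carry extra fields.
activities = [
    {'type': 'Run', 'distance': 100},
    {'type': 'Ride', 'distance': 250},
    {'type': 'Run', 'distance': 50},
]

def sum_by_type_counter(items):
    """Accumulate distance per activity type in a single pass."""
    totals = Counter()
    for item in items:
        # Missing keys start at 0, so no explicit initialisation is needed.
        totals[item['type']] += item['distance']
    return totals

grouped_distance = sum_by_type_counter(activities)
print(dict(grouped_distance))
```

Like the defaultdict loop, this walks the list once, which may be part of why both approaches sit near the fast end of the timings.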

Thank you, everyone. @Dmig, @Mark and @juanpa.arrivillaga have piqued my interest in performance. Shorter/neater ≠ higher performance. I only wanted to ask whether I could write this as a one-liner so that it looks neater, but I have learnt a lot more than that.

Comments (2)

Your solution is good as it is, but if you really want a one-liner:
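
The one-liner itself is missing from the text at this point. A minimal sketch consistent with the answer's own description (it iterates over the initial list once per distinct type) might look like the following. It is a reconstruction under that assumption, not necessarily the code originally posted, and the sample data is hypothetical.

```python
# Hypothetical sample data; the real records carry extra fields.
activities = [
    {'type': 'Run', 'distance': 100},
    {'type': 'Ride', 'distance': 250},
    {'type': 'Run', 'distance': 50},
]

# A single expression, wrapped over several lines for readability:
# collect the distinct types first, then re-scan the whole list for each one.
grouped_distance = {
    t: sum(a['distance'] for a in activities if a['type'] == t)
    for t in {a['type'] for a in activities}
}
print(grouped_distance)
```

With N activities and T distinct types this does roughly N × T comparisons, which matches the slowdown described in this answer.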

UPD: I did some performance research, comparing this answer to the answer that uses groupby(sorted(...)). It turns out that, on ten million entries with 10 different types, this approach loses to groupby(sorted(...)), taking 18.14 seconds against 10.12. So, while it is more readable, it is less efficient on bigger lists, especially ones with more distinct types (because it iterates over the initial list once per distinct type).

But take note: the straightforward approach from the question takes only 5 seconds! This answer only shows a one-liner for educational purposes; the solution from the question performs much better. You should not use this instead of the one in the question unless, as I said, you really want or need a one-liner.

Use itertools.groupby.

When iterating over the groupby instance, k will be one of the activity types, and v will be an iterable of activities having type k.
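
The code for this answer is not included in the text. A sketch of how itertools.groupby could be applied here is shown below; it is an illustration rather than the answer's original code, and the sample data and the by_type helper name are assumptions. Note that groupby only groups consecutive items with equal keys, so the list is sorted by type first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data; the real records carry extra fields.
activities = [
    {'type': 'Run', 'distance': 100},
    {'type': 'Ride', 'distance': 250},
    {'type': 'Run', 'distance': 50},
]

by_type = itemgetter('type')

# k is an activity type; v is an iterable of the activities with that type.
grouped_distance = {
    k: sum(a['distance'] for a in v)
    for k, v in groupby(sorted(activities, key=by_type), key=by_type)
}
print(grouped_distance)
```

The sort costs O(N log N), so a single-pass loop like the one in the question can still be faster on large inputs.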