pandas groupby cummax just assigns the original value instead of updating the max-so-far
I have this dataframe:
       type  run  corrected_episode  Reward
0  notsweet    0                  0    35.0
1  notsweet    0                100    20.0
2  notsweet    0                200    20.0
3  notsweet    0                300    22.0
4  notsweet    0                400    20.0
I want to create a new column, best_so_far, that has a monotonically increasing value for the corresponding Reward grouped by type, run, and corrected_episode. Easy enough, right? Except the following happens when I try to use groupby and cummax:
foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.cummax()
yields:
       type  run  corrected_episode  Reward  best_so_far
0  notsweet    0                  0    35.0         35.0
1  notsweet    0                100    20.0         20.0
2  notsweet    0                200    20.0         20.0
3  notsweet    0                300    22.0         22.0
4  notsweet    0                400    20.0         20.0
The "best so far", well, isn't the best. I get the same results if I use
foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.apply(lambda x: x.cummax())
I know this is possible because I've done it dozens of times with other dataframes; there's just something weird about this one that keeps this simple procedure from working.
Comments (2)
You can try removing corrected_episode from the groupby keys.
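A minimal sketch of why that helps, assuming the five rows shown in the question make up the whole frame (the variable names sizes_three_keys, sizes_two_keys, and best_so_far are illustrative, not from the original post). With corrected_episode among the keys, every group contains a single row, so the cumulative maximum of each group is just that row's own Reward:

```python
import pandas as pd

# Rebuild the five rows shown in the question.
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [0, 100, 200, 300, 400],
    "Reward": [35.0, 20.0, 20.0, 22.0, 20.0],
})

# Grouping by all three columns: corrected_episode is unique within each
# (type, run) pair here, so every group holds exactly one row.
sizes_three_keys = foo.groupby(["type", "run", "corrected_episode"]).size()

# Grouping by type and run only: one group holding all five rows.
sizes_two_keys = foo.groupby(["type", "run"]).size()

# The running maximum is now taken across the rows of each (type, run) group.
best_so_far = foo.groupby(["type", "run"])["Reward"].cummax()
```

On this sample, best_so_far stays at 35.0 for every row, because the first Reward is already the largest.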
After posting this I of course discovered what happened, but I'm going to share what I did to fix it here, because this is the kind of violation of the Principle of Least Astonishment that pandas is prone to.
The solution was to do this, instead:
foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()
That is, I over-specified the grouping columns by including corrected_episode, which had the unintended effect of executing cummax() on just that one element. However, I had originally included corrected_episode to ensure that the order of the rows was correct; that is, the dataframe was actually the result of massaging a lot of data (you are seeing a teeny tiny subset), and the order of the data wasn't necessarily sane for cummax() to work as I envisioned.
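One way to address the ordering concern without putting corrected_episode in the group keys is to sort by it first and then group by type and run only. The sketch below is an assumption about how that could look, not what the original poster necessarily did; the shuffled sample rows and the name ordered are made up for illustration.

```python
import pandas as pd

# Made-up sample: two runs whose rows are deliberately out of episode order.
foo = pd.DataFrame({
    "type": ["notsweet"] * 6,
    "run": [0, 1, 0, 1, 0, 1],
    "corrected_episode": [200, 0, 0, 100, 100, 200],
    "Reward": [20.0, 5.0, 35.0, 9.0, 40.0, 7.0],
})

# Sort so that, within each (type, run) group, rows follow episode order.
ordered = foo.sort_values(["type", "run", "corrected_episode"])

# cummax keeps the index of `ordered`, and column assignment aligns on the
# index, so each value lands on its original row even though foo is unsorted.
foo["best_so_far"] = ordered.groupby(["type", "run"])["Reward"].cummax()
```

Here corrected_episode only controls the order in which cummax sees the rows; it no longer splits them into single-row groups.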