pandas groupby cummax just assigns the original values instead of updating the max-so-far

Posted 2025-01-12 07:18:14


I have this dataframe:

    type        run   corrected_episode   Reward
0   notsweet    0     0                   35.0
1   notsweet    0     100                 20.0
2   notsweet    0     200                 20.0
3   notsweet    0     300                 22.0
4   notsweet    0     400                 20.0

I want to create a new column, best_so_far, that has a monotonically increasing value for the corresponding Reward grouped by type, run, and corrected_episode. Easy enough, right? Except the following happens when I try to use groupby and cummax:

foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.cummax()

yields:

    type        run   corrected_episode   Reward   best_so_far
0   notsweet    0     0                   35.0     35.0
1   notsweet    0     100                 20.0     20.0
2   notsweet    0     200                 20.0     20.0
3   notsweet    0     300                 22.0     22.0
4   notsweet    0     400                 20.0     20.0

The "best so far", well, isn't the best. I get the same results if I use foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.apply(lambda x: x.cummax())

I know this is possible because I've done it dozens of times with other dataframes. There's just something weird about this one that stops this simple procedure from working.


不知在何时 2025-01-19 07:18:14

You can try removing corrected_episode from the groupby keys:

foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()
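A minimal sketch of why this helps, assuming the five sample rows from the question (the frame is rebuilt by hand here, and the names group_sizes and over_specified are only illustrative): with corrected_episode among the keys, every row lands in its own group, so each cumulative max is just that row's own Reward.

```python
import pandas as pd

# Sample rows rebuilt by hand from the question.
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [0, 100, 200, 300, 400],
    "Reward": [35.0, 20.0, 20.0, 22.0, 20.0],
})

# Grouping on all three keys puts each row in a group of size 1,
# so cummax() within a group simply returns the row's own Reward.
keys_all = ["type", "run", "corrected_episode"]
group_sizes = foo.groupby(keys_all).size()
over_specified = foo.groupby(keys_all).Reward.cummax()

# Grouping on type and run only keeps all five rows in one group,
# so the running maximum carries across episodes.
foo["best_so_far"] = foo.groupby(["type", "run"]).Reward.cummax()
print(foo)
```

Checking the group sizes this way is a quick diagnostic whenever a grouped cumulative operation seems to return its input unchanged.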


内心激荡 2025-01-19 07:18:14

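After posting this, I of course figured out what was going on, but I'll share here what I did to solve it, since this is a case where pandas is prone to violating the principle of least surprise.

The solution is to do this: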

foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()

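That is, I over-specified the grouping columns by including corrected_episode, which had the unintended effect of performing cummax() over just that one element. However, I originally included corrected_episode to make sure the rows were in the right order. The dataframe is actually the result of processing a lot of data (you're seeing a tiny subset), and the data won't necessarily be in the order cummax() needs to work as I envisioned.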
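One way to handle the ordering concern without adding corrected_episode to the group keys is to sort on it first and then group on type and run only. The sketch below is an assumption about how that could look, not the author's actual code; the shuffled rows and Reward values are hypothetical and chosen so that row order matters.

```python
import pandas as pd

# Hypothetical rows, deliberately out of episode order.
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [300, 0, 400, 100, 200],
    "Reward": [30.0, 10.0, 25.0, 20.0, 15.0],
})

# Put the rows in episode order within each (type, run) pair.
# corrected_episode is used for sorting only, not for grouping.
foo = foo.sort_values(["type", "run", "corrected_episode"]).reset_index(drop=True)

# The running maximum now follows episode order inside each group.
foo["best_so_far"] = foo.groupby(["type", "run"]).Reward.cummax()
print(foo)
```

Sorting controls the order in which cummax() sees the rows, while the group keys control where the running maximum resets, so the two concerns stay separate.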