pandas groupby cummax just assigns the original value instead of updating the max-so-far

Posted on 2025-01-12 07:18:14

I have this dataframe:

    type      run   corrected_episode   Reward
0   notsweet    0   0                   35.0
1   notsweet    0   100                 20.0
2   notsweet    0   200                 20.0
3   notsweet    0   300                 22.0
4   notsweet    0   400                 20.0

I want to create a new column, best_so_far, that has a monotonically increasing value for the corresponding Reward grouped by type, run, and corrected_episode. Easy enough, right? Except the following happens when I try to use groupby and cummax:

foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.cummax() yields:

    type      run   corrected_episode   Reward  best_so_far
0   notsweet    0   0                   35.0    35.0
1   notsweet    0   100                 20.0    20.0
2   notsweet    0   200                 20.0    20.0
3   notsweet    0   300                 22.0    22.0
4   notsweet    0   400                 20.0    20.0

The "best so far", well, isn't the best. I get the same results if I use foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.apply(lambda x: x.cummax())

I know this is possible because I've done it dozens of times with other dataframes. There's something odd about this one that makes this simple procedure fail.

Comments (2)

不知在何时 2025-01-19 07:18:14

You can try removing corrected_episode from the groupby keys:

foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()
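
To see why dropping the key helps, here is a minimal sketch that rebuilds the five rows from the question (the construction of foo is my assumption about how the frame looks, not the asker's actual pipeline). It checks the group sizes under the original keys and then applies the shorter key list:

```python
import pandas as pd

# Rebuild the sample rows shown in the question (assumed construction).
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [0, 100, 200, 300, 400],
    "Reward": [35.0, 20.0, 20.0, 22.0, 20.0],
})

# With corrected_episode in the keys, every group contains exactly one row,
# so a cumulative max within a group can only return that row's own Reward.
sizes = foo.groupby(["type", "run", "corrected_episode"]).size()
print(sizes.max())  # 1

# Grouping by type and run only lets the maximum accumulate across episodes.
foo["best_so_far"] = foo.groupby(["type", "run"])["Reward"].cummax()
print(foo["best_so_far"].tolist())  # [35.0, 35.0, 35.0, 35.0, 35.0]
```

Checking size() on the groupby is a quick way to spot keys that are too fine-grained before reaching for a cumulative operation.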

内心激荡 2025-01-19 07:18:14

After posting this, I of course discovered what had happened. I'm sharing the fix here because this is the kind of violation of the Principle of Least Astonishment that pandas is prone to.

The solution was to do this instead:

foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()

That is, I over-specified the grouping columns by including corrected_episode. Every (type, run, corrected_episode) combination appears only once, so each group held a single row, and cummax() over a single row just returns that row's own Reward. I had originally included corrected_episode to make sure the rows were processed in the right order: the dataframe is the result of massaging a lot of data (you are seeing a teeny tiny subset), and the row order wasn't necessarily sane enough for cummax() to work as I envisioned.
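
For the ordering concern, one option is to sort by corrected_episode before grouping rather than grouping on it. The sketch below uses made-up, shuffled rows (the Reward values and row order are hypothetical, chosen so that the order matters) and relies on pandas aligning the result back to the original frame by index:

```python
import pandas as pd

# Hypothetical rows, deliberately out of episode order.
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [300, 0, 400, 100, 200],
    "Reward": [40.0, 10.0, 25.0, 30.0, 20.0],
})

# Sort so each (type, run) group is visited in episode order, then take the
# running maximum per group. corrected_episode controls the order only; it is
# not a grouping key.
ordered = foo.sort_values(["type", "run", "corrected_episode"])

# The resulting Series keeps the original index labels, so assigning it back
# to foo puts each value on the right row even though foo itself is unsorted.
foo["best_so_far"] = ordered.groupby(["type", "run"])["Reward"].cummax()

in_episode_order = foo.sort_values("corrected_episode")["best_so_far"].tolist()
print(in_episode_order)  # [10.0, 30.0, 30.0, 40.0, 40.0]
```

Without the sort, the first row (episode 300, Reward 40.0) would set the running maximum for every later row, which is the ordering problem described above.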
