Adding a column with specific values when doing a groupby

Posted 2025-01-10 18:50:18

I have a DataFrame that looks something like:

df:

date                          price     bool
---------------------------------------------
2022-01-03 22:00:00+01:00     109.65    False
2022-01-03 22:00:00+01:00      80.00    False
2022-01-03 22:00:00+01:00      65.79    True
2022-01-03 22:00:00+01:00      50.00    True
2022-01-03 23:00:00+01:00      47.00    False
2022-01-03 23:00:00+01:00      39.95    True
2022-01-03 23:00:00+01:00      39.47    False
2022-01-03 23:00:00+01:00      29.96    False
2022-01-03 23:00:00+01:00      22.47    True

If I call df.groupby("date"), I get two groups, one per date. That is fine. What I would also like is to add a new column to each group that holds the maximum price among the rows where bool == True, repeated for every row of the group. The resulting data frames would be:

df_groupby_object1:

date                          price     bool      max_price
-----------------------------------------------------------
2022-01-03 22:00:00+01:00     109.65    False      65.79
2022-01-03 22:00:00+01:00      80.00    False      65.79
2022-01-03 22:00:00+01:00      65.79    True       65.79
2022-01-03 22:00:00+01:00      50.00    True       65.79

df_groupby_object2:

date                           price     bool      max_price
-----------------------------------------------------------
2022-01-03 23:00:00+01:00      47.00    False      39.95
2022-01-03 23:00:00+01:00      39.95    True       39.95
2022-01-03 23:00:00+01:00      39.47    False      39.95
2022-01-03 23:00:00+01:00      29.96    False      39.95
2022-01-03 23:00:00+01:00      22.47    True       39.95

I could probably just iterate through the groups and create an extra column that way, but I was wondering whether this can be done directly with groupby?

夏天碎花小短裙 2025-01-17 18:50:18

Use GroupBy.transform to get the maximum price per date, counting only the rows where bool is True. Series.where replaces the price in the non-matching rows with NaN, and max skips NaN values:

df['max_price'] = df['price'].where(df['bool']).groupby(df['date']).transform('max')

Details:

print(df['price'].where(df['bool']))
0      NaN
1      NaN
2    65.79
3    50.00
4      NaN
5    39.95
6      NaN
7      NaN
8    22.47
Name: price, dtype: float64
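
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the one-liner above. The way the frame is constructed (pd.to_datetime on the timestamp strings) and the intermediate name masked are my own assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Rebuild the sample data from the question (construction is an assumption).
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-03 22:00:00+01:00"] * 4 + ["2022-01-03 23:00:00+01:00"] * 5
    ),
    "price": [109.65, 80.00, 65.79, 50.00, 47.00, 39.95, 39.47, 29.96, 22.47],
    "bool": [False, False, True, True, False, True, False, False, True],
})

# Keep price only where bool is True; every other row becomes NaN.
masked = df["price"].where(df["bool"])

# Per-date maximum of the masked prices, broadcast back to every row.
df["max_price"] = masked.groupby(df["date"]).transform("max")

# Iterating the groups now yields frames that already carry max_price.
for date, group in df.groupby("date"):
    print(date)
    print(group)
```

Because transform returns a Series aligned with the original index, the assignment fills max_price for every row, including those where bool is False. A date with no True rows at all would end up with NaN in max_price.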
