Pandas: assign groupby rolling mean results as a new column that respects the initial dataframe
My goal is to calculate the moving average of the avg_price column for each customer.
The dataframe looks like this:
                      customer  avg_price  avg_price2  count1  count2      rate
date_time
2022-06-11 00:00:00  Customer1     4.4656        1.25      36   11084  0.003237
2022-06-11 00:00:00  Customer2     7.8873        0.92      10   22150  0.000451
2022-06-11 00:00:00  Customer3     2.3016        1.37       1    2521  0.000397
2022-06-11 00:00:00  Customer4     3.2421        1.05     221   98973  0.002228
2022-06-11 00:00:00  Customer5     1.0050        0.94       2     410  0.004854
...                        ...        ...         ...     ...     ...       ...
2022-06-21 10:00:00  Customer1     4.9450        1.99     340  118000  0.002873
2022-06-21 10:00:00  Customer2     4.0643        2.06     268   20850  0.012691
2022-06-21 10:00:00  Customer3     3.7034        1.00      25    5100  0.004878
2022-06-21 10:00:00  Customer4     5.0367        2.64    2098  118251  0.017433
2022-06-21 10:00:00  Customer5     2.7429        1.57      50   11900  0.004184
I believe I have already found a way to do the calculation, so now I'm trying to figure out how to assign the groupby result to the dataframe as a new column.
df.groupby('customer')['avg_price'].rolling(50).mean()
The result looks like this:
customer   date_time
Customer1  2022-06-11 00:00:00          NaN
           2022-06-11 06:44:00          NaN
           2022-06-11 08:07:00          NaN
           2022-06-11 08:10:00          NaN
           2022-06-11 08:20:00          NaN
                                     ...
Customer2  2022-06-21 09:56:00    14.048820
           2022-06-21 09:57:00    14.060702
           2022-06-21 09:58:00    14.054652
           2022-06-21 09:59:00    14.024310
           2022-06-21 10:00:00    14.020486
Name: avg_price, Length: 228442, dtype: float64
When I tried to add this to the dataframe as a column, I got the following error:
df['ma_24h'] = df.groupby('customer')['avg_price'].rolling(50).mean().reset_index(level=0, drop=True)
ValueError: cannot reindex from a duplicate axis
I want to add the calculated column (the results without the customer level) to the existing dataframe so that it respects the existing index, which is the date_time column.
Comments (1)
I believe I found a way thanks to this post: How to include Moving Average with Pandas based on Values on other Columns
I changed the rolling window to 24H. The main issue I had was with the index and getting the values; apparently I had to sort the index and then just select the values.
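Roughly, the idea can be sketched as below. This is only a minimal sketch, not necessarily the exact code from the linked post: the small sample frame is made up for illustration, the sort by customer and then date_time is my assumption about what "sort the index" needs to achieve, and it assumes a reasonably recent pandas where a time-based window is allowed as long as each group is sorted by time.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question:
# a date_time index with one row per customer per timestamp.
df = pd.DataFrame(
    {
        "customer": ["Customer1", "Customer2", "Customer1", "Customer2", "Customer1"],
        "avg_price": [1.0, 10.0, 3.0, 20.0, 5.0],
    },
    index=pd.DatetimeIndex(
        [
            "2022-06-11 00:00",
            "2022-06-11 00:00",
            "2022-06-11 12:00",
            "2022-06-11 12:00",
            "2022-06-12 06:00",
        ],
        name="date_time",
    ),
)

# Sort by customer, then by time, so the row order matches the order in which
# groupby(...).rolling(...) returns its results (one customer after another).
df = (
    df.reset_index()
    .sort_values(["customer", "date_time"])
    .set_index("date_time")
)

# 24-hour time-based window, computed separately for each customer.
rolled = df.groupby("customer")["avg_price"].rolling("24h").mean()

# Assign by position (plain NumPy values) rather than by label, so pandas
# never has to reindex on the duplicated date_time index.
df["ma_24h"] = rolled.to_numpy()

print(df)
```

Because the values are assigned by position, this only works if the dataframe is in the same order as the groupby result, which is why the sort comes first.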
If there are other, simpler ways to accomplish this, I'm open to them.
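One possibly simpler option is groupby(...).transform, which returns a result with the same index and row order as the original dataframe, so nothing has to be re-sorted by customer or aligned by hand. The following is a sketch under assumptions: the sample data is made up, the 24-hour window mirrors the one mentioned above, and the only requirement is that each customer's rows are in time order.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame(
    {
        "customer": ["Customer1", "Customer2", "Customer1", "Customer2", "Customer1"],
        "avg_price": [1.0, 10.0, 3.0, 20.0, 5.0],
    },
    index=pd.DatetimeIndex(
        [
            "2022-06-11 00:00",
            "2022-06-11 00:00",
            "2022-06-11 12:00",
            "2022-06-11 12:00",
            "2022-06-12 06:00",
        ],
        name="date_time",
    ),
)

# A time-based rolling window needs each customer's timestamps in order.
df = df.sort_index()

# transform applies the function to each customer's avg_price series and
# returns a Series with the same index and row order as df, so the result
# can be assigned directly even though date_time contains duplicates.
df["ma_24h"] = df.groupby("customer")["avg_price"].transform(
    lambda s: s.rolling("24h").mean()
)

print(df)
```

The lambda is called once per customer, so on a very large frame this may be slower than the built-in groupby rolling; it trades some speed for not having to manage the index manually.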