PANDAS-分配组滚动的均值结果，以尊重初始数据框架的新列

发布于 2025-02-09 18:20:22 字数 1929 浏览 2 评论 0原文

我的目标是计算每个客户的avg_price列的移动平均线。

数据帧看起来像这样：

    customer    avg_price   avg_price2  count1  count2  rate
date_time                       
2022-06-11 00:00:00 Customer1   4.4656  1.25    36  11084   0.003237
2022-06-11 00:00:00 Customer2   7.8873  0.92    10  22150   0.000451
2022-06-11 00:00:00 Customer3   2.3016  1.37    1   2521    0.000397
2022-06-11 00:00:00 Customer4   3.2421  1.05    221 98973   0.002228
2022-06-11 00:00:00 Customer5   1.0050  0.94    2   410     0.004854
... ... ... ... ... ... ...
2022-06-21 10:00:00 Customer1   4.9450  1.99    340 118000  0.002873
2022-06-21 10:00:00 Customer2   4.0643  2.06    268 20850   0.012691
2022-06-21 10:00:00 Customer3   3.7034  1.00    25  5100    0.004878
2022-06-21 10:00:00 Customer4   5.0367  2.64    2098    118251  0.017433
2022-06-21 10:00:00 Customer5   2.7429  1.57    50  11900   0.004184

现在，我已经相信我找到了一种进行计算的方法，因此我试图找出一种将该组的结果分配给数据框的方法，作为新列。

df.groupby('customer')['avg_price'].rolling(50).mean()

结果看起来像这样：

customer  date_time          
Customer1   2022-06-11 00:00:00          NaN
            2022-06-11 06:44:00          NaN
            2022-06-11 08:07:00          NaN
            2022-06-11 08:10:00          NaN
            2022-06-11 08:20:00          NaN
                                     ...    
Customer2   2022-06-21 09:56:00    14.048820
            2022-06-21 09:57:00    14.060702
            2022-06-21 09:58:00    14.054652
            2022-06-21 09:59:00    14.024310
            2022-06-21 10:00:00    14.020486
Name: avg_price, Length: 228442, dtype: float64

当我尝试将其添加到数据框中作为列时，我收到以下错误：

df['ma_24h'] = df.groupby('customer' ['avg_price'].rolling(50).mean().reset_index(level=0,drop=True)

ValueError: cannot reindex from a duplicate axis

我想添加计算的列，没有客户的结果到现有数据框，以便它尊重已经存在的索引date_time列。

原文

My goal is to to calculate the moving average line for avg_price column each customer.

The dataframe looks like this:

    customer    avg_price   avg_price2  count1  count2  rate
date_time                       
2022-06-11 00:00:00 Customer1   4.4656  1.25    36  11084   0.003237
2022-06-11 00:00:00 Customer2   7.8873  0.92    10  22150   0.000451
2022-06-11 00:00:00 Customer3   2.3016  1.37    1   2521    0.000397
2022-06-11 00:00:00 Customer4   3.2421  1.05    221 98973   0.002228
2022-06-11 00:00:00 Customer5   1.0050  0.94    2   410     0.004854
... ... ... ... ... ... ...
2022-06-21 10:00:00 Customer1   4.9450  1.99    340 118000  0.002873
2022-06-21 10:00:00 Customer2   4.0643  2.06    268 20850   0.012691
2022-06-21 10:00:00 Customer3   3.7034  1.00    25  5100    0.004878
2022-06-21 10:00:00 Customer4   5.0367  2.64    2098    118251  0.017433
2022-06-21 10:00:00 Customer5   2.7429  1.57    50  11900   0.004184

Now, I already believe I found a way to do the calculation so I'm trying to figure out a way to assign the results of this group to the dataframe as a new column.

df.groupby('customer')['avg_price'].rolling(50).mean()

The results looks like this:

customer  date_time          
Customer1   2022-06-11 00:00:00          NaN
            2022-06-11 06:44:00          NaN
            2022-06-11 08:07:00          NaN
            2022-06-11 08:10:00          NaN
            2022-06-11 08:20:00          NaN
                                     ...    
Customer2   2022-06-21 09:56:00    14.048820
            2022-06-21 09:57:00    14.060702
            2022-06-21 09:58:00    14.054652
            2022-06-21 09:59:00    14.024310
            2022-06-21 10:00:00    14.020486
Name: avg_price, Length: 228442, dtype: float64

When I tried adding this to the dataframe as a column, I got the following error:

df['ma_24h'] = df.groupby('customer' ['avg_price'].rolling(50).mean().reset_index(level=0,drop=True)

ValueError: cannot reindex from a duplicate axis

I want to add the calculated column, the results without customer to the existing dataframe so that it respects the already existing index which the date_time column.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

始终不够 2025-02-16 18:20:22

我相信我为这篇文章找到了一种方法：如何根据其他列上的值将熊猫的移动平均值纳入

df['ma_24h'] = (df.groupby(['customer'])['avg_price']
                .rolling('24H', min_periods=360)
                .mean()
                .sort_index(level=1)
                .values)

我将滚动变为24小时。我遇到的主要问题是与索引并获得值，显然我必须对索引进行排序，然后选择值。

如果还有其他更简单的方法可以实现这一目标，我为他们开放。

I believe I found a way thanks to this post: How to include Moving Average with Pandas based on Values on other Columns

df['ma_24h'] = (df.groupby(['customer'])['avg_price']
                .rolling('24H', min_periods=360)
                .mean()
                .sort_index(level=1)
                .values)

I changed the rolling to 24H. The main issue I had was with the index and getting the values and apparently I had to sort the index and just select the values.

If there are other simpler ways to accomplish this, I'm open for them.

回复收藏 0 原文

~没有更多了~