Pandas: compute mean after aggregation

Posted on 2025-01-29 19:25:50

I am trying to compute rolling means of values after a groupby. My dataset looks like this:

import pandas as pd
df = pd.DataFrame({'day': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03', '2020-01-03','2020-01-03'], 
               'weather': ['rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun'], 
               'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})

Now, I want to have a rolling mean per day, and per weather.

While

>>> df.groupby(['day', 'weather']).value.mean()
day         weather
2020-01-01  rain        2
            sun         3
2020-01-02  rain        6
            sun         7
2020-01-03  rain       10
            sun        11

correctly computes the mean, the rolling version of it does not seem to work:

>>> df.groupby(['day', 'weather']).value.rolling(2).mean()
day         weather    
2020-01-01  rain     0      NaN
                     2      2.0
            sun      1      NaN
                     3      3.0
2020-01-02  rain     4      NaN
                     6      6.0
            sun      5      NaN
                     7      7.0
2020-01-03  rain     8      NaN
                     10    10.0
            sun      9      NaN
                     11    11.0
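Why the rolling version stays inside a single day: the groupby keys include day, and in this sample every (day, weather) group holds exactly two rows. A window of 2 therefore only ever sees values from the same day, and its one non-NaN result is just that group's own mean. A small check, rebuilding the sample data so the snippet runs on its own:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'day': ['2020-01-01'] * 4 + ['2020-01-02'] * 4 + ['2020-01-03'] * 4,
    'weather': ['rain', 'sun'] * 6,
    'value': list(range(1, 13)),
})

keys = ['day', 'weather']

# Rows per (day, weather) group: 2 everywhere in this sample
sizes = df.groupby(keys).size()
print(sizes)

# Rolling window of 2 inside each group, then keep the last (non-NaN) value per group
rolled = df.groupby(keys)['value'].rolling(2).mean()
last = rolled.groupby(level=keys).last()

# That value coincides with the plain per-group mean
means = df.groupby(keys)['value'].mean()
print((last == means).all())
```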

What's the right way of doing it?

I would expect an output that is the mean over multiple days, i.e. (ignore the index):

day         weather    
2020-01-01  rain     2      2.0
            sun      3      3.0
2020-01-02  rain     6      4.0
            sun      5      5.0
2020-01-03  rain     8      8.0
            sun      9      9.0
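Read back from the numbers, the expected values match a 2-day rolling mean taken over the daily per-weather means (this is an interpretation of the table, not stated by the asker). Writing m(d, w) for the mean of value on day d (1 = 2020-01-01, 2 = 2020-01-02, 3 = 2020-01-03) and weather w, symbols introduced here only for illustration:

```latex
r_{d,w} = \frac{m_{d-1,w} + m_{d,w}}{2} \quad (d \ge 2), \qquad r_{1,w} = m_{1,w}

% daily means: m_{\cdot,\mathrm{rain}} = (2, 6, 10), \; m_{\cdot,\mathrm{sun}} = (3, 7, 11)
r_{2,\mathrm{rain}} = \tfrac{2 + 6}{2} = 4, \quad r_{3,\mathrm{rain}} = \tfrac{6 + 10}{2} = 8, \quad
r_{2,\mathrm{sun}} = \tfrac{3 + 7}{2} = 5, \quad r_{3,\mathrm{sun}} = \tfrac{7 + 11}{2} = 9
```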

Answer by 放我走吧, 2025-02-05 19:25:50

I think you are referring to the NaN values? Your window is set to 2, so the first value of each group is NaN because of min_periods. From the documentation:

For a window that is specified by an integer, min_periods will default
to the size of the window.
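To see that default in isolation, here is a minimal sketch on a plain two-element Series (values 1 and 3, as in the rain group on 2020-01-01):

```python
import pandas as pd

# Two values from one (day, weather) group, e.g. rain on 2020-01-01
s = pd.Series([1, 3])

# Default: min_periods equals the window size (2), so the first row is NaN
default = s.rolling(2).mean()

# min_periods=1: a window containing a single observation is allowed
relaxed = s.rolling(2, min_periods=1).mean()

print(default.tolist())  # [nan, 2.0]
print(relaxed.tolist())  # [1.0, 2.0]
```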

df.groupby(['day', 'weather']).value.rolling(2, min_periods=1).mean()

day         weather    
2020-01-01  rain     0      1.0
                     2      2.0
            sun      1      2.0
                     3      3.0
2020-01-02  rain     4      5.0
                     6      6.0
            sun      5      6.0
                     7      7.0
2020-01-03  rain     8      9.0
                     10    10.0
            sun      9     10.0
                     11    11.0

Is that what you are looking for?

UPDATE

You want a mean value for each day and weather, and then a rolling mean of those daily means over x days (if I understand correctly).
Try this:

out = df.groupby(['day', 'weather'], as_index=False)['value'].mean()
print(out)

          day weather  value
0  2020-01-01    rain    2.0
1  2020-01-01     sun    3.0
2  2020-01-02    rain    6.0
3  2020-01-02     sun    7.0
4  2020-01-03    rain   10.0
5  2020-01-03     sun   11.0
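Aside, not part of the original answer: the same daily means can be unstacked into wide format, one row per day and one column per weather, so that a plain rolling window over rows is a window over days for each weather separately. A sketch, assuming every weather value occurs on every day (a missing combination would show up as NaN):

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['2020-01-01'] * 4 + ['2020-01-02'] * 4 + ['2020-01-03'] * 4,
    'weather': ['rain', 'sun'] * 6,
    'value': list(range(1, 13)),
})

# Daily mean per weather, pivoted: rows are days, columns are weather values
daily = df.groupby(['day', 'weather'])['value'].mean().unstack('weather')

# Rolling over rows (days); each weather column is handled independently
rolled_wide = daily.rolling(2, min_periods=1).mean()
print(rolled_wide)
```

The window here counts rows, not calendar days, so a day with no data at all would simply be skipped rather than treated as a gap.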

With the daily means you can compute the rolling window per weather:

out['rolling_mean'] = out.groupby('weather', as_index=False)['value'].rolling(2,min_periods=1).mean()['value']

print(out)

          day weather  value  rolling_mean
0  2020-01-01    rain    2.0           2.0
1  2020-01-01     sun    3.0           3.0
2  2020-01-02    rain    6.0           4.0
3  2020-01-02     sun    7.0           5.0
4  2020-01-03    rain   10.0           8.0
5  2020-01-03     sun   11.0           9.0
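The assignment above depends on the index returned by groupby(...).rolling(...) lining up with the index of out, which can be fragile. A variant that sidesteps the alignment question uses groupby(...).transform, which returns a Series on the same index as out. This is a swapped-in technique, not the answer's original line, and it assumes the rows are already in day order within each weather (true here, because groupby sorts by its keys by default):

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['2020-01-01'] * 4 + ['2020-01-02'] * 4 + ['2020-01-03'] * 4,
    'weather': ['rain', 'sun'] * 6,
    'value': list(range(1, 13)),
})

# Step 1: daily mean per (day, weather), as in the answer
out = df.groupby(['day', 'weather'], as_index=False)['value'].mean()

# Step 2: 2-day rolling mean per weather; transform returns a Series
# on out's own index, so it can be assigned back as a column directly
out['rolling_mean'] = (
    out.groupby('weather')['value']
       .transform(lambda s: s.rolling(2, min_periods=1).mean())
)
print(out)
```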
