Pandas: compute rolling mean after groupby
I am trying to compute rolling means of values after a groupby.
My dataset looks like this:
import pandas as pd
df = pd.DataFrame({'day': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03', '2020-01-03','2020-01-03'],
'weather': ['rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun', 'rain', 'sun'],
'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})
Now, I want to have a rolling mean per day and per weather.
While
>>> df.groupby(['day', 'weather']).value.mean()
day         weather
2020-01-01  rain        2
            sun         3
2020-01-02  rain        6
            sun         7
2020-01-03  rain       10
            sun        11
properly computes the mean, the rolling version of it does not seem to work:
>>> df.groupby(['day', 'weather']).value.rolling(2).mean()
day         weather
2020-01-01  rain     0      NaN
                     2      2.0
            sun      1      NaN
                     3      3.0
2020-01-02  rain     4      NaN
                     6      6.0
            sun      5      NaN
                     7      7.0
2020-01-03  rain     8      NaN
                     10    10.0
            sun      9      NaN
                     11    11.0
What's the right way of doing it?
I would expect an output that is the mean over multiple days, i.e. (ignore the index):
day         weather
2020-01-01  rain     2     2.0
            sun      3     3.0
2020-01-02  rain     6     4.0
            sun      5     5.0
2020-01-03  rain     8     8.0
            sun      9     9.0
Comments (1)
I think you are referring to the NaN values? Your window is set to 2, so the first value of each group will be set to NaN because of min_periods. Here is a quote from the documentation:
Was that what you were looking for?
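To make the min_periods point concrete, here is a minimal sketch on a two-element series (the values are the 'rain' entries of the first day; the variable names are only illustrative). With an integer window, min_periods defaults to the window size, so the first position has too few observations and becomes NaN:

```python
import pandas as pd

# Two values, like the 'rain' rows of 2020-01-01 in the question
s = pd.Series([1, 3])

# Default: min_periods equals the window size (2), so the first result is NaN
default = s.rolling(2).mean()

# Relaxed: a single observation is enough to produce a value
relaxed = s.rolling(2, min_periods=1).mean()

print(default.tolist())
print(relaxed.tolist())
```

Passing min_periods=1 is one way to avoid the leading NaN, at the cost of the first value being an average over a shorter window.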
UPDATE
You want a mean value for each day and weather, and then a rolling mean over x days of the computed means (if I understand that right).
Try this:
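The code for this step is not preserved on this page. A minimal sketch of what the daily mean could look like, using the DataFrame from the question (the name daily_mean is an assumption, not taken from the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['2020-01-01'] * 4 + ['2020-01-02'] * 4 + ['2020-01-03'] * 4,
    'weather': ['rain', 'sun'] * 6,
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
})

# One mean per (day, weather) pair; reset_index turns the group keys
# back into ordinary columns so they can be grouped on again later
daily_mean = df.groupby(['day', 'weather'])['value'].mean().reset_index()
print(daily_mean)
```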
With the daily mean you can compute your rolling window:
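The code for this step is also missing here, so the following is only a sketch under stated assumptions: a window of 2 days, min_periods=1 so the first day keeps its own mean (which matches the expected output in the question), and the hypothetical names daily_mean and rolling_mean.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['2020-01-01'] * 4 + ['2020-01-02'] * 4 + ['2020-01-03'] * 4,
    'weather': ['rain', 'sun'] * 6,
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
})

# Step 1: mean per (day, weather); rows come out sorted by day, then weather
daily_mean = df.groupby(['day', 'weather'])['value'].mean().reset_index()

# Step 2: rolling mean of the daily means, computed separately per weather.
# The window counts rows; with one row per day and weather, 2 rows = 2 days.
daily_mean['rolling_mean'] = (
    daily_mean.groupby('weather')['value']
    .transform(lambda s: s.rolling(2, min_periods=1).mean())
)
print(daily_mean)
```

Using transform keeps the result aligned with the rows of daily_mean, so it can be assigned directly as a new column. If some days are missing for a given weather, a row-based window no longer corresponds to calendar days, and a time-based window on a datetime index may be a better fit.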