Resampling a dataframe and distributing values over the new sample frequency

Posted on 2025-02-01 18:24:37 · 830 characters · 0 views · 0 comments

How do I upsample a dataframe using resample() to get the initial values divided over the new sample frequency?

Dataframe with monthly sample frequency

                       date        revenue
0 2021-11-01 00:00:00+00:00        300
1 2021-10-01 00:00:00+00:00        500
2 2021-09-01 00:00:00+00:00        100
3 2021-08-01 00:00:00+00:00        50
4 2021-07-01 00:00:00+00:00        200
5 2021-06-01 00:00:00+00:00        150

Approximate expected Dataframe with revenue divided over the days in that month

                                 revenue
date                                    
2021-06-01 00:00:00+00:00    4.8
2021-06-02 00:00:00+00:00    4.8
2021-06-03 00:00:00+00:00    4.8
2021-06-04 00:00:00+00:00    4.8
2021-06-05 00:00:00+00:00    4.8
...                                  ...
2021-11-28 00:00:00+00:00    9.6
2021-11-29 00:00:00+00:00    9.6
2021-11-30 00:00:00+00:00    9.6

I.e., I want to make sure that each value is divided by the number of days in that specific month.

Comments (1)

泛泛之交 2025-02-08 18:24:37

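You can use asfreq to convert the time series from monthly to daily frequency, use ffill to forward-fill the values, and then divide the revenue by the daysinmonth attribute of the DatetimeIndex to get the revenue distributed per day. The NaN row appended at the end of the last month makes asfreq extend the daily range through the whole final month.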

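import numpy as np
import pandas as pd

s = df.set_index('date')
# Append a NaN row at the last month's end so the daily range covers that month
s.loc[s.index.max() + pd.offsets.MonthEnd()] = np.nan

# Upsample to daily, forward-fill, then divide by the days in each month
s = s.asfreq('D').ffill()
s['revenue'] /= s.index.daysinmonth

print(s)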
                             revenue
date                                
2021-06-01 00:00:00+00:00   5.000000
2021-06-02 00:00:00+00:00   5.000000
2021-06-03 00:00:00+00:00   5.000000
2021-06-04 00:00:00+00:00   5.000000
2021-06-05 00:00:00+00:00   5.000000
...
2021-07-24 00:00:00+00:00   6.451613
2021-07-25 00:00:00+00:00   6.451613
...
2021-11-30 00:00:00+00:00  10.000000
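
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from the question and assumes the date column is already a timezone-aware datetime. The names monthly, daily_index, daily and totals are only illustrative. Instead of appending a NaN row, this variant builds the daily index explicitly with pd.date_range, then uses resample() to check that the daily values add back up to the monthly totals.

```python
import pandas as pd

# Sample data from the question: monthly revenue with UTC timestamps.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-11-01", "2021-10-01", "2021-09-01",
         "2021-08-01", "2021-07-01", "2021-06-01"],
        utc=True,
    ),
    "revenue": [300, 500, 100, 50, 200, 150],
})

# Monthly series, sorted chronologically.
monthly = df.set_index("date")["revenue"].sort_index().astype(float)

# Daily index from the first month start to the end of the last month.
daily_index = pd.date_range(
    monthly.index.min(),
    monthly.index.max() + pd.offsets.MonthEnd(),
    freq="D",
)

# Place each monthly value on the 1st, forward-fill through the month,
# then divide by the number of days in the corresponding month.
daily = monthly.reindex(daily_index).ffill()
daily = daily / daily.index.days_in_month

# Sanity check: summing the daily values per month should give back
# the original monthly revenue (up to floating-point error).
totals = daily.resample("MS").sum()

print(daily.head())
print(totals)
```

June has 30 days, so each June day gets 150 / 30 = 5.0, which matches the output above. One thing to watch for: calling resample('D').ffill() directly on data that contains the NaN end row would, as far as I can tell, leave the last day as NaN, because that fill is index-based rather than value-based. That is why the fill here is done with a separate .ffill() call after reindexing.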