Pandas - resample a monthly series to hourly

Posted on 2025-02-06 06:01:56

Suppose I have a multi-index pandas DataFrame with two index levels, month_begin and month_end:

import numpy as np
import pandas as pd

multi_index = pd.MultiIndex.from_tuples([("2022-03-01", "2022-03-31"), 
                                  ("2022-04-01", "2022-04-30"), 
                                  ("2022-05-01", "2022-05-31"),
                                  ("2022-06-01", "2022-06-30")])

multi_index.names = ['month_begin', 'month_end']

df = pd.DataFrame(np.random.rand(4,100), index=multi_index)
df
                              0         1   ...        98        99
month_begin month_end                       ...                    
2022-03-01  2022-03-31  0.322032  0.205307  ...  0.975128  0.673460
2022-04-01  2022-04-30  0.113813  0.278981  ...  0.951049  0.090765
2022-05-01  2022-05-31  0.777918  0.842734  ...  0.667831  0.274189
2022-06-01  2022-06-30  0.221407  0.555711  ...  0.745158  0.648246

I would like to resample the data so that each month's values are repeated for every hour of that month:

                              0         1   ...        98        99
                                            ...                    
2022-03-01 00:00       0.322032  0.205307  ...  0.975128  0.673460
2022-03-01 01:00       0.322032  0.205307  ...  0.975128  0.673460
2022-03-01 02:00       0.322032  0.205307  ...  0.975128  0.673460
...
2022-06-30 22:00       0.221407  0.555711  ...  0.745158  0.648246
2022-06-30 23:00       0.221407  0.555711  ...  0.745158  0.648246 

I know I can use resample(), but I am struggling with how to do this. Does anybody have a clue?

Comments (1)

随梦而飞# 2025-02-13 06:01:56

IIUC, try a list comprehension with pd.date_range, followed by explode:

df['Date'] = [pd.date_range(s, e, freq='H') for s, e in df.index]

df_out = df.explode('Date').set_index('Date')

Output:

                           0         1   ...        98        99
Date                                     ...                    
2022-03-01 00:00:00  0.396311  0.138263  ...  0.637640  0.106366
2022-03-01 01:00:00  0.396311  0.138263  ...  0.637640  0.106366
2022-03-01 02:00:00  0.396311  0.138263  ...  0.637640  0.106366
2022-03-01 03:00:00  0.396311  0.138263  ...  0.637640  0.106366
2022-03-01 04:00:00  0.396311  0.138263  ...  0.637640  0.106366
...                       ...       ...  ...       ...       ...
2022-06-29 20:00:00  0.129921  0.654878  ...  0.619212  0.142297
2022-06-29 21:00:00  0.129921  0.654878  ...  0.619212  0.142297
2022-06-29 22:00:00  0.129921  0.654878  ...  0.619212  0.142297
2022-06-29 23:00:00  0.129921  0.654878  ...  0.619212  0.142297
2022-06-30 00:00:00  0.129921  0.654878  ...  0.619212  0.142297

[2836 rows x 100 columns]
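
Note that the output above ends at 2022-06-30 00:00:00, because pd.date_range(s, e) stops at midnight at the start of each month_end day, whereas the question asks for rows through 23:00 on the last day. A minimal sketch of one way to cover the full last day, assuming month_end is meant to be inclusive; the names hourly_ranges, lengths, hourly_index and df_hourly are illustrative and not from the original answer, and np.repeat is swapped in for explode:

```python
import numpy as np
import pandas as pd

multi_index = pd.MultiIndex.from_tuples(
    [("2022-03-01", "2022-03-31"),
     ("2022-04-01", "2022-04-30"),
     ("2022-05-01", "2022-05-31"),
     ("2022-06-01", "2022-06-30")],
    names=["month_begin", "month_end"],
)
df = pd.DataFrame(np.random.rand(4, 100), index=multi_index)

# One hourly range per row, extended to 23:00 on the month_end day
# (assumption: month_end is inclusive).
hourly_ranges = [
    pd.date_range(start, pd.Timestamp(end) + pd.Timedelta(hours=23),
                  freq=pd.offsets.Hour())
    for start, end in df.index
]

# Number of hourly rows each monthly row expands into.
lengths = [len(r) for r in hourly_ranges]

# Concatenate the per-month ranges into a single DatetimeIndex.
hourly_index = hourly_ranges[0].append(hourly_ranges[1:]).rename("Date")

# Repeat each monthly row once per hour of its month.
df_hourly = pd.DataFrame(
    np.repeat(df.to_numpy(), lengths, axis=0),
    index=hourly_index,
    columns=df.columns,
)

print(df_hourly.shape)      # (2928, 100), i.e. 122 days * 24 hours
print(df_hourly.index[-1])  # 2022-06-30 23:00:00
```

Passing pd.offsets.Hour() instead of a string alias sidesteps the 'H' versus 'h' spelling, which differs between pandas versions.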