Daily maximum in pandas with missing dates

Posted on 2025-01-10 18:41:19

I am currently somewhat stuck on getting the daily maximum for my dataset.
It looks like this:

                      Date  Value
0      1996-03-07 21:30:00  360.0
1      1996-03-07 21:45:00  360.0
2      1996-03-07 22:00:00  360.0
3      1996-03-07 22:15:00  360.0
4      1996-03-07 22:30:00  360.0
...                    ...    ...
867882 2021-02-03 12:45:00  361.9
867883 2021-02-03 13:00:00  361.8
867884 2021-02-03 13:15:00  361.8
867885 2021-02-03 13:30:00  361.7
867886 2021-02-03 13:45:00  361.8
[867887 rows x 2 columns]

The problem is that entire days are missing from the dataset.
If I understood correctly, the grouper in pandas needs continuous days to work properly.
So I refilled the dates:

df.set_index('Date', inplace=True)
all_days = pd.date_range(df.index.min(), df.index.max(), freq='15T')
df = df.reindex(all_days)

But when I now run my code to get the daily maximum

daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:, 0]]

I get the following error message:

The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.

When I check df.index, I get

DatetimeIndex(['1996-03-07 21:30:00', '1996-03-07 21:45:00',
           '1996-03-07 22:00:00', '1996-03-07 22:15:00',
           '1996-03-07 22:30:00', '1996-03-07 22:45:00',
           '1996-03-07 23:00:00', '1996-03-07 23:15:00',
           '1996-03-07 23:30:00', '1996-03-07 23:45:00',
           ...
           '2021-02-03 11:30:00', '2021-02-03 11:45:00',
           '2021-02-03 12:00:00', '2021-02-03 12:15:00',
           '2021-02-03 12:30:00', '2021-02-03 12:45:00',
           '2021-02-03 13:00:00', '2021-02-03 13:15:00',
           '2021-02-03 13:30:00', '2021-02-03 13:45:00'],
          dtype='datetime64[ns]', length=873474, freq='15T')

and checking the dtype of my Value column returns dtype('float64').

I am probably missing something very obvious here, but I'm honestly not familiar at all with dtypes and date formats.


Comments (1)

美羊羊 2025-01-17 18:41:19

I think the problem here is that no column is selected after the groupby, so idxmax returns a DataFrame rather than a Series:

daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D'))['Value'].idxmax()]
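For comparison, here is a minimal sketch of getting the daily maximum without reindexing first. It assumes a small made-up frame with the same `Date` and `Value` columns as in the question; the sample values and the names `daily_max_values`, `valid`, and `idx` are illustrative only. `resample('D').max()` returns NaN for days that have no data, and grouping by the normalized index only creates groups for days that actually occur, so `idxmax` never sees an empty day.

```python
import pandas as pd

# Hypothetical sample data: 1996-03-08 is missing entirely.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "1996-03-07 21:30:00",
        "1996-03-07 21:45:00",
        "1996-03-09 00:00:00",
        "1996-03-09 00:15:00",
    ]),
    "Value": [360.0, 361.0, 362.5, 361.8],
})
df = df.set_index("Date")

# Daily maximum values; days without any data show up as NaN.
daily_max_values = df["Value"].resample("D").max()

# Full rows at which each daily maximum occurs.
# Drop NaN values first, then group by calendar day so that
# only days present in the data form a group.
valid = df.dropna(subset=["Value"])
idx = valid.groupby(valid.index.normalize())["Value"].idxmax()
daily_maximum = valid.loc[idx]

print(daily_max_values)
print(daily_maximum)
```

If only the maximum values are needed, the `resample` line alone should be enough; the `idxmax` part is only useful when the timestamp of each maximum matters as well.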