Daily maximum in pandas with missing dates
I am currently somewhat stuck on getting the daily maximum for my dataset.
It looks like this:
Date Value
0 1996-03-07 21:30:00 360.0
1 1996-03-07 21:45:00 360.0
2 1996-03-07 22:00:00 360.0
3 1996-03-07 22:15:00 360.0
4 1996-03-07 22:30:00 360.0
... ... ...
867882 2021-02-03 12:45:00 361.9
867883 2021-02-03 13:00:00 361.8
867884 2021-02-03 13:15:00 361.8
867885 2021-02-03 13:30:00 361.7
867886 2021-02-03 13:45:00 361.8
[867887 rows x 2 columns]
The problem is that entire days are missing from the dataset.
If I understood correctly, the Grouper in pandas needs continuous days to work properly.
So I refilled the dates:
df.set_index('Date', inplace=True)
all_days = pd.date_range(df.index.min(), df.index.max(), freq='15T')
df = df.reindex(all_days)
But when I now run my code to get the daily maximum
daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:, 0]]
I get the following error message:
The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
When I check df.index, I get
DatetimeIndex(['1996-03-07 21:30:00', '1996-03-07 21:45:00',
'1996-03-07 22:00:00', '1996-03-07 22:15:00',
'1996-03-07 22:30:00', '1996-03-07 22:45:00',
'1996-03-07 23:00:00', '1996-03-07 23:15:00',
'1996-03-07 23:30:00', '1996-03-07 23:45:00',
...
'2021-02-03 11:30:00', '2021-02-03 11:45:00',
'2021-02-03 12:00:00', '2021-02-03 12:15:00',
'2021-02-03 12:30:00', '2021-02-03 12:45:00',
'2021-02-03 13:00:00', '2021-02-03 13:15:00',
'2021-02-03 13:30:00', '2021-02-03 13:45:00'],
dtype='datetime64[ns]', length=873474, freq='15T')
and checking the dtype of my Value column returns dtype('float64').
I am probably missing something very obvious here, but I'm honestly not familiar at all with dtypes and date formats.
I think the problem here is that no column is selected after groupby, so idxmax returns a DataFrame rather than a Series.
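A minimal sketch of what this seems to point to, not a confirmed fix for the original data: select the Value column before calling idxmax, so the result is a Series of timestamps that can be passed to df.loc. The sample frame below is made up to mimic the question (15-minute readings with one whole day missing). Grouping by df.index.normalize() instead of reindexing is my own assumption about a convenient way to avoid groups that contain no data.

```python
import pandas as pd

# Hypothetical stand-in for the question's data:
# 15-minute readings on 1996-03-07 and 1996-03-09, with 1996-03-08 missing entirely.
index = pd.date_range("1996-03-07 21:30", periods=10, freq="15min").append(
    pd.date_range("1996-03-09 00:00", periods=4, freq="15min")
)
values = [360.0, 360.0, 361.0, 360.5, 362.0, 360.0, 360.0, 361.5, 360.0, 360.0,
          359.0, 363.5, 363.0, 361.0]
df = pd.DataFrame({"Value": values}, index=index)
df.index.name = "Date"

# Select the column first, so idxmax returns a Series (one timestamp per day).
# Grouping by the calendar day of each row only creates groups for days that
# actually have data, so the frame does not need to be reindexed beforehand.
idx_of_max = df.groupby(df.index.normalize())["Value"].idxmax()

# Rows holding each day's maximum, with their original 15-minute timestamps.
daily_maximum = df.loc[idx_of_max]
print(daily_maximum)

# If only the maximum values are needed (missing days shown as NaN),
# resampling the column is enough.
daily_max_values = df["Value"].resample("D").max()
print(daily_max_values)
```

With this approach the missing day simply does not appear in daily_maximum, while daily_max_values keeps it as a NaN row.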