Why does the date format change with np.where, and how can I prevent it?
I have the following subset of data from a dataframe.
{'NID': {131598: '215026851',
131599: '215026851',
131600: '215026851',
131601: '215026851',
131602: '215026851',
131603: '215026851',
131604: '215026851',
131605: '215026851',
131606: '215026851'},
'AbCode': {131598: 0,
131599: 0,
131600: 0,
131601: 0,
131602: 0,
131603: 1,
131604: 0,
131605: 0,
131606: 0},
'ABdat': {131598: Timestamp('2018-01-24 00:00:00'),
131599: Timestamp('2019-01-25 00:00:00'),
131600: NaT,
131601: Timestamp('2019-11-08 00:00:00'),
131602: Timestamp('2020-01-24 00:00:00'),
131603: Timestamp('2020-02-15 00:00:00'),
131604: Timestamp('2020-10-16 00:00:00'),
131605: Timestamp('2020-10-26 00:00:00'),
131606: NaT}}
When printed as a table, the data looks like this:
              NID  AbCode       ABdat
131598  215026851       0  2018-01-24
131599  215026851       0  2019-01-25
131600  215026851       0         NaT
131601  215026851       0  2019-11-08
131602  215026851       0  2020-01-24
131603  215026851       1  2020-02-15
131604  215026851       0  2020-10-16
131605  215026851       0  2020-10-26
131606  215026851       0         NaT
I would like to set ABdat to missing (NaT) where AbCode = 0, and to ABdat minus 7 days where AbCode = 1.
I wrote the following np.where call to do this:
breed_info['ABdat'] = np.where(breed_info.AbCode == 1, breed_info['ABdat'] - pd.DateOffset(days=7), breed_info['ABdat'].isnull)
This is the output:
              NID  AbCode  ABdat
131598  215026851       0  <bound method Series.isnull of 49017 ...
131599  215026851       0  <bound method Series.isnull of 49017 ...
131600  215026851       0  <bound method Series.isnull of 49017 ...
131601  215026851       0  <bound method Series.isnull of 49017 ...
131602  215026851       0  <bound method Series.isnull of 49017 ...
131603  215026851       1  1581120000000000000
131604  215026851       0  <bound method Series.isnull of 49017 ...
131605  215026851       0  <bound method Series.isnull of 49017 ...
131606  215026851       0  <bound method Series.isnull of 49017 ...
Could you please explain why the date format changes, and how I can prevent it?
Thanks
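The integer in row 131603 is 2020-02-08 (2020-02-15 minus 7 days) written as nanoseconds since 1970-01-01. The sketch below reproduces both symptoms with two made-up rows instead of the full frame, and uses pd.Timedelta(days=7) in place of pd.DateOffset(days=7), which gives the same dates here. Writing `.isnull` without parentheses hands np.where the method object itself rather than NaT; once one branch is an arbitrary Python object, the result has dtype=object, and NumPy turns datetime64[ns] values into plain integers when casting them to object.

```python
import numpy as np
import pandas as pd

# Two illustrative rows (not the asker's full frame): AbCode 0 and AbCode 1.
dates = pd.Series(np.array(["2019-01-25", "2020-02-15"], dtype="datetime64[ns]"))
is_ab = np.array([False, True])  # stands in for breed_info.AbCode == 1

shifted = dates - pd.Timedelta(days=7)  # 2020-02-15 -> 2020-02-08

# `dates.isnull` without parentheses is a bound method object, not NaT.
fill = dates.isnull
result = np.where(is_ab, shifted, fill)

print(result.dtype)  # object: a datetime64[ns] branch mixed with a method object
print(result[0])     # the bound method itself, printed as "<bound method ...>"
print(result[1])     # 1581120000000000000 (nanoseconds since 1970-01-01)
```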
Answer
The simplest option is to stay in pandas and use a pandas method such as Series.where.
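A minimal sketch of the Series.where route, assuming a hypothetical three-row stand-in for breed_info (the answer's own code is not preserved here). It swaps pd.DateOffset(days=7) for pd.Timedelta(days=7), which gives the same result for these tz-naive dates. Series.where keeps values where the condition is True and fills the other rows with the column's own missing value, NaT, so the column stays datetime64[ns].

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the asker's breed_info frame.
breed_info = pd.DataFrame({
    "AbCode": [0, 1, 0],
    "ABdat": np.array(["2019-01-25", "2020-02-15", "NaT"], dtype="datetime64[ns]"),
})

# Shift every date back 7 days, then keep only the AbCode == 1 rows;
# Series.where fills the remaining rows with NaT and keeps datetime64[ns].
shifted = breed_info["ABdat"] - pd.Timedelta(days=7)
breed_info["ABdat"] = shifted.where(breed_info["AbCode"] == 1)

print(breed_info)
```

Series.mask is the mirror image: `shifted.mask(breed_info["AbCode"] != 1)` should give the same column.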
With np.where, a hacky solution is to use a helper Series.
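One guess at what the helper-Series workaround might look like, since the answer's code is not preserved: build an all-NaT Series with dtype datetime64[ns] on the same index and pass it as the third argument, so both branches share that dtype and np.where never falls back to object. The three-row frame and the name `all_nat` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for breed_info.
breed_info = pd.DataFrame({
    "AbCode": [0, 1, 0],
    "ABdat": np.array(["2019-01-25", "2020-02-15", "NaT"], dtype="datetime64[ns]"),
})

# Helper Series: all NaT, same index, same datetime64[ns] dtype as ABdat.
all_nat = pd.Series(pd.NaT, index=breed_info.index, dtype="datetime64[ns]")

shifted = breed_info["ABdat"] - pd.Timedelta(days=7)
# Both branches are datetime64[ns], so np.where returns datetime64[ns] as well.
breed_info["ABdat"] = np.where(breed_info["AbCode"] == 1, shifted, all_nat)

print(breed_info["ABdat"].dtype)  # datetime64[ns]
```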
The helper is needed because if you pass pd.NaT directly, np.where returns the underlying NumPy array with the dates as integers (nanoseconds since 1970-01-01). I think the reason is a bug.
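A quick way to compare the two NaT flavours, using made-up dates: pd.NaT is a pandas object that NumPy treats as a generic Python object, which the answer reports yields integer dates. NumPy's own np.datetime64("NaT") promotes cleanly to datetime64[ns], so it is another way to keep the dtype; this is a swapped-in alternative, not the answer's code.

```python
import numpy as np
import pandas as pd

dates = pd.Series(np.array(["2019-01-25", "2020-02-15"], dtype="datetime64[ns]"))
is_ab = np.array([False, True])

# Passing pd.NaT: per the answer, the dates come back as nanosecond integers.
with_pd_nat = np.where(is_ab, dates, pd.NaT)
print(with_pd_nat.dtype, with_pd_nat)

# Passing NumPy's NaT keeps a datetime64[ns] result.
with_np_nat = np.where(is_ab, dates, np.datetime64("NaT"))
print(with_np_nat.dtype)  # datetime64[ns]
```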