Why does the date format change with np.where, and how can I stop it?

Posted 2025-01-10 19:03:10

I have the following subset of data from a dataframe.

import pandas as pd
from pandas import NaT, Timestamp

breed_info = pd.DataFrame({'NID': {131598: '215026851',
  131599: '215026851',
  131600: '215026851',
  131601: '215026851',
  131602: '215026851',
  131603: '215026851',
  131604: '215026851',
  131605: '215026851',
  131606: '215026851'},
 'AbCode': {131598: 0,
  131599: 0,
  131600: 0,
  131601: 0,
  131602: 0,
  131603: 1,
  131604: 0,
  131605: 0,
  131606: 0},
 'ABdat': {131598: Timestamp('2018-01-24 00:00:00'),
  131599: Timestamp('2019-01-25 00:00:00'),
  131600: NaT,
  131601: Timestamp('2019-11-08 00:00:00'),
  131602: Timestamp('2020-01-24 00:00:00'),
  131603: Timestamp('2020-02-15 00:00:00'),
  131604: Timestamp('2020-10-16 00:00:00'),
  131605: Timestamp('2020-10-26 00:00:00'),
  131606: NaT}})

When formatted, the data looks like this:

              NID  AbCode      ABdat
131598  215026851       0 2018-01-24
131599  215026851       0 2019-01-25
131600  215026851       0        NaT
131601  215026851       0 2019-11-08
131602  215026851       0 2020-01-24
131603  215026851       1 2020-02-15
131604  215026851       0 2020-10-16
131605  215026851       0 2020-10-26
131606  215026851       0        NaT

I would like to set ABdat to missing (NaT) where AbCode = 0, and to ABdat minus 7 days where AbCode = 1.

I wrote the following np.where code to do this:

breed_info['ABdat'] = np.where(breed_info.AbCode == 1, breed_info['ABdat'] - pd.DateOffset(days=7), breed_info['ABdat'].isnull)

The output looks like this:

          NID   AbCode       ABdat
131598  215026851   0   <bound method Series.isnull of 49017 ...
131599  215026851   0   <bound method Series.isnull of 49017 ...
131600  215026851   0   <bound method Series.isnull of 49017 ...
131601  215026851   0   <bound method Series.isnull of 49017 ...
131602  215026851   0   <bound method Series.isnull of 49017 ...
131603  215026851   1   1581120000000000000
131604  215026851   0   <bound method Series.isnull of 49017 ...
131605  215026851   0   <bound method Series.isnull of 49017 ...
131606  215026851   0   <bound method Series.isnull of 49017 ...
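Two separate effects combine in the output above. The following minimal sketch reproduces both. It assumes pandas and NumPy are installed and uses a hypothetical two-row Series in place of the question's data. First, `breed_info['ABdat'].isnull` without parentheses is the bound method itself, not a boolean mask. Second, when np.where has to combine a `datetime64[ns]` array with an arbitrary Python object, it falls back to dtype `object`. In an object array, NumPy stores nanosecond datetimes as plain integers, because Python's `datetime` cannot hold nanosecond precision.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row example; force nanosecond resolution to match the question.
dates = pd.Series(pd.to_datetime(["2020-02-15", "2020-10-16"])).astype("datetime64[ns]")
cond = np.array([True, False])

# Without parentheses, .isnull is the bound method, not a boolean Series.
print(callable(dates.isnull))  # True

# NumPy must find a common dtype for datetime64[ns] and a method object,
# so it falls back to dtype=object and the nanosecond dates become ints.
out = np.where(cond, dates - pd.DateOffset(days=7), dates.isnull)
print(out.dtype)  # object
print(out[0])     # 1581120000000000000
```

The integer is 2020-02-08 (2020-02-15 minus 7 days) expressed as nanoseconds since the Unix epoch, which matches the value in the question's output.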

Could you please explain why the date format changes and how I can prevent it?

Thanks

梦一生花开无言 replied on 2025-01-17 19:03:10

The simplest approach is to use a pandas method such as Series.where:

breed_info['ABdat'] = ((breed_info['ABdat'] - pd.DateOffset(days=7))
                       .where(breed_info.AbCode == 1))
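The following self-contained version of the Series.where approach runs on a hypothetical three-row frame, which is an assumed, reduced stand-in for `breed_info` and not the question's full data. It shows that the column keeps its datetime dtype. Series.where fills the rows where the condition is False with NaT instead of converting anything to object.

```python
import pandas as pd

# Hypothetical three-row stand-in for breed_info (not the question's full data).
breed_info = pd.DataFrame({
    "AbCode": [0, 1, 0],
    "ABdat": pd.to_datetime(["2020-01-24", "2020-02-15", None]),
})

# Shift every date back 7 days, then keep the result only where AbCode == 1.
# Rows where the condition is False become NaT; the column stays datetime64.
breed_info["ABdat"] = (
    (breed_info["ABdat"] - pd.DateOffset(days=7))
    .where(breed_info["AbCode"] == 1)
)
print(breed_info)
```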

A hackier solution uses np.where with a helper Series:

breed_info['ABdat'] = np.where(breed_info.AbCode == 1,
                               breed_info['ABdat'] - pd.DateOffset(days=7),
                               pd.Series(pd.NaT, index=breed_info.index))
print (breed_info)
              NID  AbCode      ABdat
131598  215026851       0        NaT
131599  215026851       0        NaT
131600  215026851       0        NaT
131601  215026851       0        NaT
131602  215026851       0        NaT
131603  215026851       1 2020-02-08
131604  215026851       0        NaT
131605  215026851       0        NaT
131606  215026851       0        NaT
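The helper Series appears to work because `pd.Series(pd.NaT, index=...)` already has a datetime64 dtype. np.where therefore receives two datetime64 arrays and has no reason to fall back to object. The following minimal check uses hypothetical three-row data in place of `breed_info`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row data standing in for breed_info["ABdat"].
dates = pd.Series(pd.to_datetime(["2020-01-24", "2020-02-15", "2020-10-16"]))
cond = np.array([False, True, False])

# The helper Series built from the NaT scalar is datetime64, not object.
helper = pd.Series(pd.NaT, index=dates.index)
print(helper.dtype)

# Both branches are datetime64, so the result stays datetime64 as well.
out = np.where(cond, dates - pd.DateOffset(days=7), helper)
print(out.dtype)
```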

This is needed because if you pass pd.NaT directly, np.where returns the underlying NumPy values, with the dates as integers (nanoseconds):

breed_info['ABdat'] = np.where(breed_info.AbCode == 1, 
                               breed_info['ABdat'] - pd.DateOffset(days=7),
                               pd.NaT)
print (breed_info)
              NID  AbCode                ABdat
131598  215026851       0                  NaT
131599  215026851       0                  NaT
131600  215026851       0                  NaT
131601  215026851       0                  NaT
131602  215026851       0                  NaT
131603  215026851       1  1581120000000000000
131604  215026851       0                  NaT
131605  215026851       0                  NaT
131606  215026851       0                  NaT

I think the reason is a bug.
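The following option is not from the answer and is offered only as a sketch. It relies on the fact that `pd.NaT` is a pandas object rather than a NumPy `datetime64`. NumPy therefore treats it as a generic Python object, and the result becomes dtype `object`, in which the nanosecond dates turn into integers. Passing NumPy's own typed missing value, `np.datetime64("NaT", "ns")`, keeps both branches datetime64. The two-row data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for breed_info["ABdat"], forced to nanoseconds.
dates = pd.Series(pd.to_datetime(["2020-02-15", "2020-10-16"])).astype("datetime64[ns]")
cond = np.array([True, False])

# pd.NaT is not a NumPy datetime64, so NumPy falls back to dtype=object.
with_pd_nat = np.where(cond, dates - pd.DateOffset(days=7), pd.NaT)
print(with_pd_nat.dtype)  # object

# np.datetime64("NaT", "ns") is a typed missing value, so the dtype is kept.
with_np_nat = np.where(cond, dates - pd.DateOffset(days=7), np.datetime64("NaT", "ns"))
print(with_np_nat.dtype)  # datetime64[ns]
```

An array like `with_np_nat` can be assigned straight back to a column (for example `breed_info['ABdat'] = ...`), and the column keeps its datetime dtype.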
