Convert a string date column with ordinal days, abbreviated month names, and four-digit years to %Y-%m-%d

Posted on 2025-01-13 04:40:21

Given the following df, whose string date column uses an ordinal number for the day, an abbreviated name for the month, and a four-digit year:

             date       oil       gas
0    1st Oct 2021       428        99
1   10th Sep 2021       401       101
2    2nd Oct 2020       189        74
3   10th Jan 2020       659       119
4    1st Nov 2019       691       130
5   30th Aug 2019       742       162
6   10th May 2019       805       183
7   24th Aug 2018       860       182
8    1st Sep 2017       759       183
9   10th Mar 2017       617       151
10  10th Feb 2017       591       149
11  22nd Apr 2016       343        88
12  10th Apr 2015       760       225
13  23rd Jan 2015      1317       316

How can we parse the date column into the standard %Y-%m-%d format?

My ideas so far: 1. use re to strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the day string while keeping the day number; 2. convert the abbreviated month name to a number (%b does not seem to do it); 3. finally convert the result to %Y-%m-%d.

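Code that may be useful for the first step (applied through the .str accessor, because a compiled pattern's sub() expects a single string, not a Series):

df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)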
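The three steps can be sketched on plain strings with only the standard library. This is a minimal illustration, not a definitive implementation: the helper name `parse_ordinal_date` is hypothetical, and it assumes English month abbreviations (`%b` depends on the active locale).

```python
import re
from datetime import datetime

# Matches an ordinal suffix only when it directly follows a digit,
# so letters inside month names are left alone.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)")


def parse_ordinal_date(text):
    """Hypothetical helper: turn '1st Oct 2021' into '2021-10-01'."""
    # Step 1: drop the ordinal indicator, keeping the day number.
    cleaned = ORDINAL_SUFFIX.sub("", text.strip())
    # Step 2: %d = day, %b = abbreviated month name, %Y = four-digit year.
    parsed = datetime.strptime(cleaned, "%d %b %Y")
    # Step 3: format as %Y-%m-%d.
    return parsed.strftime("%Y-%m-%d")


samples = ["1st Oct 2021", "22nd Apr 2016", "23rd Jan 2015", "30th Aug 2019"]
converted = [parse_ordinal_date(s) for s in samples]
print(converted)
```

Once the suffix is gone, `%b` handles the month abbreviation directly, so no manual month-to-number mapping is needed.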

References:

https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months

偷得浮生 2025-01-20 04:40:21

pd.to_datetime already handles this case if you don't specify the format parameter:

>>> pd.to_datetime(df['date'])
0    2021-10-01
1    2021-09-10
2    2020-10-02
3    2020-01-10
4    2019-11-01
5    2019-08-30
6    2019-05-10
7    2018-08-24
8    2017-09-01
9    2017-03-10
10   2017-02-10
11   2016-04-22
12   2015-04-10
13   2015-01-23
Name: date, dtype: datetime64[ns]
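The result above has a datetime64 dtype, which pandas merely displays in ISO form. If literal %Y-%m-%d strings are needed, `.dt.strftime` can produce them. Format inference can also behave differently across pandas versions, so a more explicit alternative is sketched below: strip the ordinal suffix first, then pass a format. The small frame and the `date_str` column name are assumptions made for illustration.

```python
import pandas as pd

# A small subset of the question's data, for illustration only.
df = pd.DataFrame(
    {"date": ["1st Oct 2021", "10th Sep 2021", "22nd Apr 2016", "23rd Jan 2015"]}
)

# Remove the ordinal suffix after the day number, keeping the digits.
cleaned = df["date"].str.replace(r"(\d+)(?:st|nd|rd|th)\b", r"\1", regex=True)

# Parse with an explicit format instead of relying on inference.
parsed = pd.to_datetime(cleaned, format="%d %b %Y")

# Convert the datetime values to %Y-%m-%d strings.
df["date_str"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```

Keeping the parsed datetime column is usually preferable for sorting or resampling; the string column is only needed for export or display.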
