Convert a string date column with ordinal day, abbreviated month name, and normal year to %Y-%m-%d
Given the following df with a string date column that contains an ordinal number for the day, an abbreviated month name for the month, and a normal year:
             date   oil  gas
0    1st Oct 2021   428   99
1   10th Sep 2021   401  101
2    2nd Oct 2020   189   74
3   10th Jan 2020   659  119
4    1st Nov 2019   691  130
5   30th Aug 2019   742  162
6   10th May 2019   805  183
7   24th Aug 2018   860  182
8    1st Sep 2017   759  183
9   10th Mar 2017   617  151
10  10th Feb 2017   591  149
11  22nd Apr 2016   343   88
12  10th Apr 2015   760  225
13  23rd Jan 2015  1317  316
I'm wondering how we could parse the date column into the standard %Y-%m-%d format.
My ideas so far:
1. Strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the day string with re, keeping the day number.
2. Convert the abbreviated month name to a number (%b does not seem to work here).
3. Finally, convert the result to %Y-%m-%d.
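Code that may be useful for the first step (the pattern has to be applied through the .str accessor, because re's sub() expects a single string rather than a Series):
df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)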
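The three steps listed in the question can be sketched with the standard library alone, one string at a time. This is only an illustration under two assumptions: the helper name to_iso_date is hypothetical, and the process runs under an English (or default C) locale, where %b is the strptime directive for abbreviated month names such as Oct or Sep. Once the ordinal suffix is gone, %b can parse the month directly, so no separate name-to-number mapping is needed.

```python
import re
from datetime import datetime

# Matches an ordinal suffix only when it directly follows a digit,
# so month names such as "Aug" are left untouched.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)")


def to_iso_date(text):
    """Hypothetical helper: '1st Oct 2021' -> '2021-10-01'."""
    cleaned = ORDINAL_SUFFIX.sub("", text)           # step 1: drop st/nd/rd/th
    parsed = datetime.strptime(cleaned, "%d %b %Y")  # step 2: %b reads the month name
    return parsed.strftime("%Y-%m-%d")               # step 3: format as %Y-%m-%d


samples = ["1st Oct 2021", "10th Sep 2021", "22nd Apr 2016", "23rd Jan 2015"]
converted = [to_iso_date(s) for s in samples]
print(converted)
# ['2021-10-01', '2021-09-10', '2016-04-22', '2015-01-23']
```

On a DataFrame, the same helper could be applied with df['date'].map(to_iso_date), although a vectorised pandas call is usually preferable for large columns.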
References:
https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months
Comments (1)
pd.to_datetime already handles this case if you don't specify the format parameter:
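A minimal sketch of that idea, using a few rows of the question's data. It rests on one assumption that is not part of the original answer: how pandas behaves without a strptime pattern depends on the version. Older releases parse each string independently by default, while pandas 2.x and later infer a single format from the first value, which may not fit later rows that carry a different ordinal suffix. On those versions the sketch therefore passes format="mixed", the documented way to request per-element parsing.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1st Oct 2021", "10th Sep 2021", "2nd Oct 2020", "23rd Jan 2015"],
    "oil": [428, 401, 189, 1317],
    "gas": [99, 101, 74, 316],
})

# Assumption: on pandas >= 2 a format-less call infers one format from the
# first value, so per-element parsing is requested with format="mixed".
# Older versions already parse element by element when no format is given.
major_version = int(pd.__version__.split(".")[0])
parse_kwargs = {"format": "mixed"} if major_version >= 2 else {}

# Parse the ordinal dates, then render them as %Y-%m-%d strings.
df["date"] = pd.to_datetime(df["date"], **parse_kwargs).dt.strftime("%Y-%m-%d")
print(df)
```

Dropping the final .dt.strftime call keeps the column as datetime64 values, which is usually more convenient for sorting and filtering than formatted strings.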