Sorting dates in different formats with Python

Posted on 2025-02-09 06:34:22

I am having a hard time sorting dates that come in different formats. I have a Series whose entries contain dates in many different formats, and I need to extract them and sort them chronologically.
So far I have set up separate regexes for fully numerical dates (01/01/1989), dates with a month name (Mar 12 1989, March 1989 or 12 Mar 1989) and dates where only the year is given (see code below):

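pat1 = r'(\d{0,2}[/-]\d{0,2}[/-]\d{2,4})'  # matches mm/dd/yy and mm/dd/yyyy
# optional day, month name (abbreviated or full), optional day, 4-digit year
pat2 = r'((\d{1,2})?\W?(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\W+(\d{1,2})?\W?\d{4})'
pat3 = r'((?<!\d)(\d{4})(?!\d))'  # a standalone 4-digit year
finalpat = pat1 + "|" + pat2 + "|" + pat3
# df1 is the input Series; keep the first match found in each entry
df2 = df1.str.extractall(finalpat).groupby(level=0).first()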

I now have a DataFrame in which the matches of the different regex patterns above land in different columns, and I need to convert them into usable datetimes.

The problem is that I have dates like Mar 12 1989, 12 Mar 1989 and Mar 1989 (no day) in the same column of my DataFrame.
If that column did not mix the two formats (Month dd YYYY and dd Month YYYY), I could easily do this:

df3 = df2.copy()

# map month abbreviations to their two-digit numbers
dico = {"Jan": '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

# trim each month name to its first 3 letters, then replace it with its number
df3[1] = df3[1].str.replace(r"(?<=[A-Z][a-z]{2})\w*", "", regex=True)
for key, item in dico.items():
    df3[1] = df3[1].str.replace(key, item, regex=False)
# add 01 if no day is given
df3[1] = df3[1].str.replace(r"^(\d{1,2}/\d{4})", r'01/\g<1>', regex=True)

df3[1] = pd.to_datetime(df3[1], format='%d/%m/%Y').dt.strftime('%Y%m%d')

where df3[1] is the column of interest. I use a dictionary to change month names to their numbers and get the dates in the form I want.
The problem is that with two date formats (Mar 12 1989 and 12 Mar 1989), one of the two will be transformed incorrectly.

Is there a way to discriminate between the date formats and apply a different transformation to each?

Thanks a lot
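
One possible way to tell "Month dd YYYY" from "dd Month YYYY" is to let the regex itself record where the day was found, using named groups. The following is only a minimal sketch under assumptions: the names MONTHS, MONTH_DATE and parse_month_date are hypothetical, the pattern is a simplified variant of pat2 rather than the asker's exact expression, and a missing day is assumed to mean the 1st of the month.

```python
import re
from datetime import date

# Hypothetical lookup table: three-letter month abbreviation -> month number.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Hypothetical pattern: optional day before the month name, the month name
# (abbreviated or spelled out), optional day after it, then a 4-digit year.
MONTH_DATE = re.compile(
    r"(?:(?P<day_first>\d{1,2})\W+)?"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\W+"
    r"(?:(?P<day_after>\d{1,2})\W+)?"
    r"(?P<year>\d{4})(?!\d)"
)

def parse_month_date(text):
    """Return a date for the first month-name date in text, or None."""
    m = MONTH_DATE.search(text)
    if m is None:
        return None
    # Whichever day group matched tells us the format; assume day 1 if neither did.
    day = m.group("day_first") or m.group("day_after") or "1"
    return date(int(m.group("year")), MONTHS[m.group("month")], int(day))

samples = ["Mar 12 1989", "12 Mar 1989", "Mar 1989"]
parsed = [parse_month_date(s) for s in samples]
```

Because each alternative is captured in its own group, no string rewriting is needed before building the date, and the resulting date objects sort chronologically.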

回忆躺在深渊里 2025-02-16 06:34:22

The problem I have is that I got dates like Mar 12 1989 and 12 Mar 1989
and Mar 1989 (no day) in the same column of my dataframe.

pandas.to_datetime can cope with that. Consider the following example:

import pandas as pd
df = pd.DataFrame({'d_str':["Mar 12 1989", "12 Mar 1989", "Mar 1989"]})
df['d_dt'] = pd.to_datetime(df.d_str)
print(df)

Output:

         d_str       d_dt
0  Mar 12 1989 1989-03-12
1  12 Mar 1989 1989-03-12
2     Mar 1989 1989-03-01

Now you can sort by d_dt, since it has dtype datetime64[ns], but keep in mind that a missing day is treated as the 1st day of the given month. Be warned, though, that this might fail if your data contain dates in middle-endian format (mm/dd/yy).
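
If automatic inference is not reliable enough for your data, one alternative is to try a list of explicit formats in a fixed order. Note also that pandas 2.0 and later infer a single format from the first value, so a mixed-format column may need format="mixed" there. The sketch below uses only the standard library; FORMATS and parse_date are hypothetical names, the format list and its order are assumptions to adapt to your data, and the month names assume an English locale.

```python
from datetime import datetime

# Candidate formats, tried in order (an assumption, adjust to your data).
# Order matters for ambiguous input: 01/02/1989 fits both %m/%d/%Y and
# %d/%m/%Y, so list only the interpretation you trust.
FORMATS = [
    "%b %d %Y",   # Mar 12 1989
    "%d %b %Y",   # 12 Mar 1989
    "%b %Y",      # Mar 1989 (day defaults to 1)
    "%B %d %Y",   # March 12 1989
    "%d %B %Y",   # 12 March 1989
    "%B %Y",      # March 1989
    "%m/%d/%Y",   # 03/12/1989, read as month/day/year
    "%Y",         # 1989 (month and day default to 1)
]

def parse_date(text):
    """Return a datetime for the first format that fits text, or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format does not fit, try the next one
    return None

samples = ["12 Mar 1990", "Mar 1989", "Mar 12 1989", "1988"]
ordered = sorted(samples, key=parse_date)
```

Strings that match none of the formats come back as None, so they should be filtered out or handled separately before sorting.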
