Removing the noise (hours) when parsing dates in Y/M/D format

Posted on 2025-01-17 19:01:42

I am parsing the dates in my dataset, but I keep running into ParserError because the hours are often in the wrong format. I've decided to skip the hours and focus only on years, months, and days.

These are the variants I have for date:

| Startdate |
| --- |
| March 23, 2022 6:00 |
| March 23, 2022 7:0 |
| March 23, 2022 7: |
| March 23, 2022 7 |

For now, only the first date/row parses successfully. I currently skip the other rows, but I would like to include them as well by simply ignoring the hours.


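    # Assumption: parse is dateutil.parser.parse, which raises ParserError
    from dateutil.parser import parse

    for date in df_en['Startdate']:
        try:
            parse(date).date()
        except Exception:
            # rows with a malformed hour part end up here and are skipped
            pass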

What is the right way to still parse the other dates without having to bother with hours?

I tried converting the time into a valid hour format. Using pd.to_datetime did not work because the month is a string (March), not the number 3. When I manually changed it to 3, it still raised the error ValueError: unconverted data remains: :00. Since the hours are not relevant, I just want to skip them.

Source: https://serveanswer.com/questions/converting-to-datetime-parsererror-unknown-string-format-2022-02-17-7


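    import pandas as pd

    dates = ['December 1, 2021 6:00', 'March 23, 2022 6']

    for date in dates:
        # str.replace treats the pattern literally and returns a new string,
        # so this line leaves date unchanged
        date.replace(' (\d{1})', ' 0\\1')
        # %m expects a numeric month, so a month name does not match here
        pd.to_datetime(date, format='%m %d, %Y %H')
        print(date)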
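For reference, the two failures described above can be reproduced with the standard library's datetime.strptime, which accepts the same strftime-style directives. This is only a minimal sketch: the helper try_format is a hypothetical name introduced for illustration, and %B assumes an English locale.

```python
from datetime import datetime

sample = "March 23, 2022 6:00"


def try_format(text, fmt):
    """Return the parsed datetime, or the ValueError message if parsing fails.

    Hypothetical helper, used only to show the error messages side by side.
    """
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        return str(exc)


# %m expects a numeric month, so the month name does not match at all.
numeric_month = try_format(sample, "%m %d, %Y %H")

# %B matches the full month name, but ":00" is left over after %H.
leftover = try_format(sample, "%B %d, %Y %H")

# Including the minutes in the format parses this particular variant.
full = try_format(sample, "%B %d, %Y %H:%M")

print(numeric_month)
print(leftover)
print(full)
```

A single format string cannot cover all four variants in the table, which is why dropping the hour part before parsing is the simpler route.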

End goal:

| Year | Month | Day |
| --- | --- | --- |
| 2022 | March | 23 |
| 2022 | March | March |

Comments (2)

萌面超妹 2025-01-24 19:01:42

If you just need year/month/day columns, there's actually no need to parse to datetime. Just work with the strings by splitting and rearranging them, for example:

    import pandas as pd

    df = pd.DataFrame({'Startdate': ['December 1, 2021 6:00', 'March 23, 2022 6']})

    # split on ", " or on a single space (raw string avoids an invalid escape)
    parts = df['Startdate'].str.split(r', | ')
    df['year'], df['month'], df['day'] = parts.str[2], parts.str[0], parts.str[1]

    print(df)
    #                Startdate  year     month day
    # 0  December 1, 2021 6:00  2021  December   1
    # 1       March 23, 2022 6  2022     March  23
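A variation on the same string-only idea, sketched here under the assumption that every value starts with "<Month name> <day>, <4-digit year>": a regular expression with named groups can pull out the three pieces directly and ignore whatever follows the year. The pattern and the column names are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Startdate": [
        "March 23, 2022 6:00",
        "March 23, 2022 7:0",
        "March 23, 2022 7:",
        "March 23, 2022 7",
    ]
})

# Named groups become column names; anything after the year is not matched
# and is therefore ignored.
pattern = r"^(?P<month>[A-Za-z]+)\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})"
parts = df["Startdate"].str.extract(pattern)

# Reorder to the Year | Month | Day layout; all values are still strings.
result = parts[["year", "month", "day"]]
print(result)
```

Rows that do not fit the pattern come back as missing values rather than raising, so they are easy to spot afterwards.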

櫻之舞 2025-01-24 19:01:42

I guess you can just drop the hour part:

    import pandas as pd

    dates = ['March 23, 2022 6:00', 'March 23, 2022 7:0', 'March 23, 2022 7:', 'March 23, 2022 7']
    # drop the last space-separated token (the hour part) before parsing
    pd.to_datetime([' '.join(x.split(' ')[:-1]) for x in dates])
    # DatetimeIndex(['2022-03-23', '2022-03-23', '2022-03-23', '2022-03-23'], dtype='datetime64[ns]', freq=None)

After that, you can use df['date'].dt.year (and likewise .dt.month and .dt.day) to extract the year, month, and day.
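Applied to a DataFrame column, the same approach might look like the sketch below. It assumes the year is always the first four-digit number in the string and that month names are in English; the column names date, year, month, and day are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Startdate": [
        "March 23, 2022 6:00",
        "March 23, 2022 7:0",
        "March 23, 2022 7:",
        "March 23, 2022 7",
        "December 1, 2021 6:00",
    ]
})

# Keep the text up to and including the 4-digit year; drop the rest.
date_only = df["Startdate"].str.replace(r"(\d{4}).*$", r"\1", regex=True)

# errors="coerce" turns anything still unparseable into NaT instead of raising.
df["date"] = pd.to_datetime(date_only, format="%B %d, %Y", errors="coerce")

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month_name()
df["day"] = df["date"].dt.day
print(df[["year", "month", "day"]])
```

Cutting at the year instead of dropping the last token also covers rows that have no hour part at all, at the cost of relying on the year being the first four-digit number.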
