I am parsing the dates in my dataset, but I keep running into ParserError because the hours are often in the wrong format. I've decided to skip the hours and focus only on years, months, and days.
These are the variants I have for the date:
| Startdate |
| --- |
| March 23, 2022 6:00 |
| March 23, 2022 7:0 |
| March 23, 2022 7: |
| March 23, 2022 7 |
For now, only the first row parses successfully. I currently skip the other rows, but I would like to include them as well by simply excluding the hours.
for date in df_en['Startdate']:
    try:
        parse(date).date()
    except Exception:
        pass
What is the right way to still parse the other dates without having to bother with hours?
I've tried to convert the time into a valid hours format. Using pd.to_datetime did not work because the month was a string (March) rather than a number (3). When I manually changed it to 3, it still raised the error "ValueError: unconverted data remains: :00". Since the hours are not relevant, I just want to skip them.
Source: https://serveanswer.com/questions/converting-to-datetime-parsererror-unknown-string-format-2022-02-17-7
dates = ['December 1, 2021 6:00', 'March 23, 2022 6']
for date in dates:
    date.replace(' (\d{1})', ' 0\\1')
    pd.to_datetime(date, format='%m %d, %Y %H')
    print(date)
End goal:
| Year | Month | Day |
| --- | --- | --- |
| 2022 | March | 23 |
| 2022 | March | 23 |
Comments (2)
If you just need year/month/day columns, there's actually no need to parse to datetime. You can work on the strings directly by splitting and rearranging them.
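The splitting idea could look roughly like this. It is only a sketch: it assumes every value follows the "Month D, YYYY ..." layout shown in the question and that the column is called Startdate as in the question's code. The df built here is sample data for illustration, not the asker's real dataset.

```python
import pandas as pd

# Sample data copied from the question (illustrative only).
df = pd.DataFrame({
    "Startdate": [
        "March 23, 2022 6:00",
        "March 23, 2022 7:0",
        "March 23, 2022 7:",
        "March 23, 2022 7",
    ]
})

# Split on whitespace: token 0 is the month name, token 1 is the day
# (with a trailing comma), token 2 is the year. The hour token is ignored.
parts = df["Startdate"].str.split(expand=True)

result = pd.DataFrame({
    "Year": parts[2].astype(int),
    "Month": parts[0],
    "Day": parts[1].str.rstrip(",").astype(int),
})
print(result)
```

Because only the first three tokens are used, it does not matter whether the hour part is well formed, truncated, or missing.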
I guess you can just drop the hour part. After that you can use df['date'].dt.year (and likewise .dt.month and .dt.day) to extract the year, month, and day.
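One way this might be done, as a sketch rather than the commenter's exact code: cut each string down to its "Month D, YYYY" prefix with a regular expression, parse that with pd.to_datetime, and then read the parts through the .dt accessor. The regular expression and the new column names (date, Year, Month, Day) are assumptions made for this example, and %B expects English month names.

```python
import pandas as pd

# Sample data taken from the question's examples (illustrative only).
df = pd.DataFrame({
    "Startdate": [
        "March 23, 2022 6:00",
        "March 23, 2022 7:0",
        "March 23, 2022 7:",
        "March 23, 2022 7",
        "December 1, 2021 6:00",
    ]
})

# Keep only the "Month D, YYYY" prefix; anything after the 4-digit year
# (the hour part, valid or not) is discarded.
date_only = df["Startdate"].str.extract(r"^(\w+ \d{1,2}, \d{4})", expand=False)

# %B matches the full month name, so no manual month-to-number step is needed.
df["date"] = pd.to_datetime(date_only, format="%B %d, %Y")

df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month_name()
df["Day"] = df["date"].dt.day
print(df[["Year", "Month", "Day"]])
```

Compared with pure string splitting, this keeps a real datetime column around, which helps if the dates need to be sorted or filtered later.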