Separating date and time in Pandas

Posted 2025-01-11 19:38:31 · 1465 words · 0 views · 0 comments

I have a data file with timestamps that look like this:

Time
3/4/2022 15:00
3/4/2022 14:00
3/4/2022 13:00

It gets loaded into pandas with a column name of "Time". I am trying to create two new datetime64 type columns, one with the date and one with the time (hour). I have explored a few solutions to this problem on StackOverflow but am still having issues. Quick note, I need the final columns to not be objects so I can use pandas and numpy functionality.

I load the dataframe and create two new columns like so:

import pandas as pd

df = pd.read_csv('C:\\Users\\...\\xyz.csv')
df['Date'] = pd.to_datetime(df['Time']).dt.date
df['Hour'] = pd.to_datetime(df['Time']).dt.time

This works but the Date and Hour columns are now objects.

I run this to convert the date to my desired datetime64 data type and it works:

df['Date'] = pd.to_datetime(df['Date'])

However, when I try to use this same code on the Hour column, I get an error:

TypeError: <class 'datetime.time'> is not convertible to datetime

I did some digging and found the following code which runs:

df['Hour'] = pd.to_datetime(df['Hour'], format='%H:%M:%S')

However the actual output includes a generic date like so:

[screenshot of the Hour column output, image not preserved]

When I try to run code referencing the Hour column like so:

import numpy as np

HourVarb = '15:00:00'
df['Test'] = np.where(df['Hour'] == HourVarb, 1, np.nan)

It runs but doesn't produce the result I want.

Perhaps my HourVarb variable is in the wrong format for the numpy code? Or is the 1/1/1900 date causing problems, so that the format %H:%M:%S needs to change? My end goal is to be able to reference the hour and the date to filter out specific date/hour combinations. Please help.

One note, when I change the HourVarb to '1/1/1900 15:00:00' the code above works as intended, but I'd still like to understand if there is a cleaner way that removes the date. Thanks
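
For reference, here is a minimal sketch of where the generic date comes from. It assumes a small hand-built Series (`hours`, a hypothetical stand-in for the Hour strings, not the actual file): when the format contains only time fields, the date part falls back to 1900-01-01, and comparing on the time-of-day component avoids having to spell that date out.

```python
from datetime import time

import pandas as pd

# Hypothetical stand-in for the Hour values; not the asker's actual data file.
hours = pd.Series(["15:00:00", "14:00:00", "13:00:00"])

# With a time-only format, the missing date fields default to 1900-01-01.
parsed = pd.to_datetime(hours, format="%H:%M:%S")
print(parsed.iloc[0])  # 1900-01-01 15:00:00

# Compare only the time-of-day part, so the placeholder date does not matter.
mask = parsed.dt.time == time(15, 0)
print(mask.tolist())  # [True, False, False]
```

Note that `parsed.dt.time` is again an object column, so this only sidesteps the placeholder date in the comparison; it does not change the stored dtype.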


Comments (1)

儭儭莪哋寶赑 2025-01-18 19:38:31

I'm not sure I understand the problem with the 'object' datatypes of these columns.

I loaded the data you provided this way:

import pandas as pd

df = pd.read_csv('xyz.csv')
df['Time'] = pd.to_datetime(df['Time'])  # parse once, then reuse the datetime column
df['Date'] = df['Time'].dt.date
df['Hour'] = df['Time'].dt.time
print(df.dtypes)

And I get these data types:

Time    datetime64[ns]
Date            object
Hour            object

The fact that Date and Hour are object types should not be a problem. The underlying values are standard-library datetime.date and datetime.time objects:

print(type(df.Date.iloc[0]))
print(type(df.Hour.iloc[0]))

<class 'datetime.date'>
<class 'datetime.time'>

This means you can use these columns as such. For example:

print(df['Date'] + pd.Timedelta('1D'))

What are you trying to do that is requiring the column dtype to be a Pandas dtype?
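
If native dtypes really are required, one possible alternative is sketched below. It is not part of the approach above, and it rests on assumptions: the DataFrame is built in memory as a stand-in for the CSV, the timestamps are taken to be month/day/year, and the column name `TimeOfDay` is made up for illustration. The idea is to keep the date as a midnight-normalized datetime64 column and to store the hour either as an integer or as a timedelta since midnight.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the 'Time' column in the question.
df = pd.DataFrame({"Time": ["3/4/2022 15:00", "3/4/2022 14:00", "3/4/2022 13:00"]})

# Assumption: the strings are month/day/year.
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %H:%M")

# Date part as datetime64: same day, time set to midnight.
df["Date"] = df["Time"].dt.normalize()

# Hour of day as an integer column.
df["Hour"] = df["Time"].dt.hour

# Time of day as a timedelta64 column (elapsed time since midnight).
df["TimeOfDay"] = df["Time"] - df["Date"]

print(df.dtypes)
```

With this layout a comparison such as `df["Hour"] == 15` or `df["TimeOfDay"] == pd.Timedelta(hours=15)` needs no placeholder date.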

UPDATE

Here is how you achieve the last part of your question:

from datetime import datetime, time

hourVarb = datetime.strptime("15:00:00", '%H:%M:%S').time()
# or hourVarb = time(15, 0)
df['Test'] = df['Hour'] == hourVarb
print(df['Test'])

0     True
1    False
2    False
3    False
Name: Test, dtype: bool
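
The same comparison extends to the date/hour combinations mentioned in the question. The following is a rough sketch on a hand-built DataFrame (a hypothetical stand-in for the CSV); the names `match` and `filtered` are illustrative only.

```python
from datetime import date, time

import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded file.
df = pd.DataFrame(
    {"Time": pd.to_datetime(["2022-03-04 15:00", "2022-03-04 14:00", "2022-03-04 13:00"])}
)
df["Date"] = df["Time"].dt.date
df["Hour"] = df["Time"].dt.time

# Boolean mask for one specific date/hour combination.
match = (df["Date"] == date(2022, 3, 4)) & (df["Hour"] == time(15, 0))

# Flag the matching rows, as in the question's np.where call.
df["Test"] = np.where(match, 1, np.nan)

# Or drop the matching rows entirely.
filtered = df[~match]
print(filtered)
```

Comparing against `date` and `time` objects rather than strings keeps both sides of the comparison the same Python type.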