Separating date and time in Pandas

Posted 2025-01-11 19:38:31

I have a data file with timestamps that look like this:

It gets loaded into pandas with a column named "Time". I am trying to create two new datetime64 columns, one with the date and one with the time (hour). I have explored a few solutions to this problem on StackOverflow but am still having issues. Note that I need the final columns not to be object dtype, so that I can use pandas and numpy functionality on them.

I load the dataframe and create two new columns like so:

df = pd.read_csv('C:\\Users\\...\\xyz.csv')
df['Date'] = pd.to_datetime(df['Time']).dt.date
df['Hour'] = pd.to_datetime(df['Time']).dt.time

This works but the Date and Hour columns are now objects.

I run this to convert the date to my desired datetime64 data type and it works:

df['Date'] = pd.to_datetime(df['Date'])

However, when I try to use this same code on the Hour column, I get an error:

TypeError: <class 'datetime.time'> is not convertible to datetime

I did some digging and found the following code which runs:

df['Hour'] = pd.to_datetime(df['Hour'], format='%H:%M:%S')

However the actual output includes a generic date like so:

When I try to run code referencing the Hour column like so:

HourVarb = '15:00:00'
df['Test'] = np.where(df['Hour']==HourVarb,1,np.nan)

It runs but doesn't produce the result I want.

Perhaps my HourVarb variable is in the wrong format for the numpy code? Alternatively, is the 1/1/1900 causing problems, so that the format %H:%M:%S needs to change? My end goal is to be able to reference the hour and the date to filter out specific date/hour combinations. Please help.

One note, when I change the HourVarb to '1/1/1900 15:00:00' the code above works as intended, but I'd still like to understand if there is a cleaner way that removes the date. Thanks

儭儭莪哋寶赑 2025-01-18 19:38:31

I'm not sure I understand the problem with the 'object' datatypes of these columns.

I loaded the data you provided this way:

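import pandas as pd

df = pd.read_csv('xyz.csv')
df['Time'] = pd.to_datetime(df['Time'])  # parse once, then reuse the datetime64 column
df['Date'] = df['Time'].dt.date          # Python datetime.date objects (object dtype)
df['Hour'] = df['Time'].dt.time          # Python datetime.time objects (object dtype)
print(df.dtypes)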

And I get these data types:

Time    datetime64[ns]
Date            object
Hour            object

The fact that Date and Hour are object types should not be a problem. The elements stored in those columns are Python datetime.date and datetime.time objects:

print(type(df.Date.iloc[0]))
print(type(df.Hour.iloc[0]))

<class 'datetime.date'>
<class 'datetime.time'>

This means you can use these columns as such. For example:

print(df['Date'] + pd.Timedelta('1D'))

What are you trying to do that is requiring the column dtype to be a Pandas dtype?
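
If a native pandas dtype really is required, as the question states, one possible approach is to avoid .dt.date and .dt.time altogether. The sketch below is only an illustration under assumptions: the sample timestamps are made up (the real file contents are not shown in the question), and the column names TimeOfDay and HourNum are hypothetical. It keeps the date as a midnight-normalized datetime64 column and stores the time of day as a timedelta64 offset from midnight, so no placeholder date such as 1/1/1900 is involved.

```python
import pandas as pd

# Made-up sample standing in for the "Time" column of xyz.csv.
df = pd.DataFrame({
    "Time": [
        "2020-01-05 15:00:00",
        "2020-01-05 16:30:00",
        "2020-01-06 15:00:00",
    ]
})
df["Time"] = pd.to_datetime(df["Time"])

# Date part as a datetime64 column: same day, time set to midnight.
df["Date"] = df["Time"].dt.normalize()

# Time-of-day part as a timedelta64 column: offset from midnight.
df["TimeOfDay"] = df["Time"] - df["Date"]

# Plain integer hour, handy for simple hour-based filters.
df["HourNum"] = df["Time"].dt.hour

print(df.dtypes)
```

With this layout, a comparison such as df["TimeOfDay"] == pd.Timedelta(hours=15) works directly on the timedelta column, without attaching a dummy date to the value being compared.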

UPDATE

Here is how you achieve the last part of your question:

from datetime import datetime, time

hourVarb = datetime.strptime("15:00:00", '%H:%M:%S').time()
# or hourVarb = time(15, 0)
df['Test'] = df['Hour'] == hourVarb
print(df['Test'])

0     True
1    False
2    False
3    False
Name: Test, dtype: bool
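
To tie this back to the np.where call and the date/hour filtering goal from the question, here is a minimal sketch. It assumes made-up sample timestamps and hypothetical names (target_hour, mask, filtered); it is not taken from the original data. The first part compares the object-dtype Hour column against a datetime.time value, as in the snippet above; the second part builds a combined date/hour mask directly from the datetime64 Time column.

```python
from datetime import time

import numpy as np
import pandas as pd

# Made-up sample standing in for the parsed "Time" column.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2020-01-05 15:00:00",
        "2020-01-05 16:30:00",
        "2020-01-06 15:00:00",
    ])
})
df["Hour"] = df["Time"].dt.time

# Compare against a datetime.time object rather than a string.
target_hour = time(15, 0)
df["Test"] = np.where(df["Hour"] == target_hour, 1, np.nan)

# Combined date/hour condition built from the datetime64 column.
mask = (
    (df["Time"].dt.normalize() == pd.Timestamp("2020-01-05"))
    & (df["Time"].dt.hour == 15)
)

# "Filter out" the matching rows by keeping everything else.
filtered = df[~mask]
print(filtered)
```

Note that dt.hour == 15 matches any time from 15:00:00 through 15:59:59, whereas the datetime.time comparison matches only exactly 15:00:00; which one is appropriate depends on the data.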
