Separating date and time in Pandas
I have a data file with timestamps that look like this:
It gets loaded into pandas with a column name of "Time". I am trying to create two new datetime64 type columns, one with the date and one with the time (hour). I have explored a few solutions to this problem on StackOverflow but am still having issues. Quick note: I need the final columns to not be of object dtype, so that I can use pandas and numpy functionality.
I load the dataframe and create two new columns like so:
import pandas as pd
df = pd.read_csv('C:\\Users\\...\\xyz.csv')
df['Date'] = pd.to_datetime(df['Time']).dt.date
df['Hour'] = pd.to_datetime(df['Time']).dt.time
This works, but the Date and Hour columns are now of object dtype.
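The object dtype comes from `.dt.date`, which returns Python `datetime.date` objects. A minimal sketch of one alternative, using made-up timestamps rather than the poster's file: `.dt.normalize()` sets each time to midnight but keeps the column datetime64, so no round trip through objects is needed. The `Day` column name is hypothetical.

```python
import pandas as pd

# Hypothetical timestamps standing in for the "Time" column.
df = pd.DataFrame({"Time": ["2021-03-01 14:00:00",
                            "2021-03-01 15:00:00",
                            "2021-03-02 15:00:00"]})
ts = pd.to_datetime(df["Time"])  # parse once and reuse

df["Date"] = ts.dt.date        # datetime.date objects -> dtype object
df["Day"] = ts.dt.normalize()  # midnight of each day -> stays datetime64

print(df.dtypes)
```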
I run this to convert the date to my desired datetime64 data type and it works:
df['Date'] = pd.to_datetime(df['Date'])
However, when I try to use this same code on the Hour column, I get an error:
TypeError: <class 'datetime.time'> is not convertible to datetime
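The message indicates that pandas will not convert bare `datetime.time` values: a time of day has no date, so there is no point in time to build. If the goal is a non-object time column, one option (a swapped-in technique, not what the post uses) is to store the time of day as a `timedelta64` offset from midnight. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical timestamps standing in for the parsed "Time" column.
ts = pd.to_datetime(pd.Series(["2021-03-01 14:00:00",
                               "2021-03-01 15:00:00",
                               "2021-03-02 15:30:00"]))

# Time of day as a duration since midnight: dtype timedelta64, not object.
hour = ts - ts.dt.normalize()

# A Timedelta can be built directly from an "HH:MM:SS" string.
is_three_pm = hour == pd.Timedelta("15:00:00")
print(hour.dtype, is_three_pm.tolist())
```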
I did some digging and found the following code which runs:
df['Hour'] = pd.to_datetime(df['Hour'], format='%H:%M:%S')
However, the actual output includes a generic date, like so:
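The generic date can be reproduced in isolation. When a format string has no date fields, Python's `datetime.strptime` fills in 1900-01-01, and pandas' `format=` parsing does the same here. A sketch with a hypothetical input string:

```python
import pandas as pd

# A time-only string parsed with a time-only format.
parsed = pd.to_datetime(pd.Series(["15:00:00"]), format="%H:%M:%S")

# The format has no date fields, so the date falls back to 1900-01-01.
print(parsed.iloc[0])  # 1900-01-01 15:00:00
```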
When I try to run code referencing the Hour column like so:
import numpy as np
HourVarb = '15:00:00'
df['Test'] = np.where(df['Hour'] == HourVarb, 1, np.nan)
It runs but doesn't produce the result I want.
Perhaps my HourVarb variable is in the wrong format for the numpy code? Alternatively, is the 1/1/1900 date causing problems, meaning the format '%H:%M:%S' needs to change? My end goal is to be able to reference the hour and the date to filter out specific date/hour combinations. Please help.
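On the format question: the Hour column holds full timestamps dated 1900-01-01, while the string '15:00:00' carries no date, so the two are unlikely to compare equal. That would also be consistent with '1/1/1900 15:00:00' working. If only the hour matters, a sketch that compares the integer hour instead (hypothetical data; `HourVarb` is changed to an int here):

```python
import numpy as np
import pandas as pd

# Hypothetical Hour column as produced by format='%H:%M:%S' (dated 1900-01-01).
hours = pd.to_datetime(pd.Series(["14:00:00", "15:00:00", "15:30:00"]),
                       format="%H:%M:%S")
df = pd.DataFrame({"Hour": hours})

HourVarb = 15  # an integer hour rather than the string '15:00:00'

# .dt.hour yields integers, so the placeholder date plays no part in the test.
# Note: this matches any time within the 15:00 hour, including 15:30.
df["Test"] = np.where(df["Hour"].dt.hour == HourVarb, 1, np.nan)
print(df)
```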
One note: when I change HourVarb to '1/1/1900 15:00:00', the code above works as intended, but I'd still like to understand whether there is a cleaner way that removes the date. Thanks
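One way to keep `HourVarb` as a plain time string and still ignore the placeholder date is to format only the time part with `.dt.strftime`. A sketch with hypothetical data; because it compares strings, `HourVarb` must use the same zero-padded HH:MM:SS layout.

```python
import numpy as np
import pandas as pd

# Hypothetical Hour column carrying the 1900-01-01 placeholder date.
df = pd.DataFrame({"Hour": pd.to_datetime(pd.Series(["14:00:00", "15:00:00"]),
                                          format="%H:%M:%S")})
HourVarb = "15:00:00"  # kept as a plain time string

# Render just the time part as "HH:MM:SS"; the date never enters the comparison.
df["Test"] = np.where(df["Hour"].dt.strftime("%H:%M:%S") == HourVarb, 1, np.nan)
print(df)
```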
I'm not sure I understand the problem with the 'object' datatypes of these columns.
I loaded the data you provided this way:
And I get these data types:
The fact that Date and Hour have the object dtype should not be a problem. The underlying values are datetime types:
This means you can use these columns directly. For example:
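A minimal sketch of that idea, using hypothetical timestamps split the same way as in the question: each cell is a standard-library `datetime.date` or `datetime.time`, so ordinary comparisons against those types work element-wise.

```python
import datetime
import pandas as pd

# Hypothetical timestamps split into object columns, as in the question.
ts = pd.to_datetime(pd.Series(["2021-03-01 14:00:00",
                               "2021-03-01 15:00:00",
                               "2021-03-02 15:00:00"]))
df = pd.DataFrame({"Date": ts.dt.date, "Hour": ts.dt.time})

# The dtype is object, but every cell is a standard-library date/time value...
print(type(df["Date"].iloc[0]), type(df["Hour"].iloc[0]))

# ...so comparisons against datetime.date / datetime.time work element-wise.
at_three = df[df["Hour"] == datetime.time(15, 0)]
print(at_three)
```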
What are you trying to do that requires the column dtype to be a Pandas dtype?
UPDATE
Here is how you achieve the last part of your question:
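One possible way to filter out specific date/hour combinations, sketched with hypothetical data and assuming the object-dtype Date and Hour columns described above. The `drop_pairs` set is an illustrative name, not from the original.

```python
import datetime
import pandas as pd

# Hypothetical timestamps split into object columns, as in the question.
ts = pd.to_datetime(pd.Series(["2021-03-01 14:00:00",
                               "2021-03-01 15:00:00",
                               "2021-03-02 15:00:00"]))
df = pd.DataFrame({"Date": ts.dt.date, "Hour": ts.dt.time})

# Hypothetical (date, hour) combinations to filter out.
drop_pairs = {
    (datetime.date(2021, 3, 1), datetime.time(15, 0)),
    (datetime.date(2021, 3, 2), datetime.time(15, 0)),
}

# Keep a row only if its (Date, Hour) pair is not in drop_pairs.
keep = [(d, h) not in drop_pairs for d, h in zip(df["Date"], df["Hour"])]
result = df[keep]
print(result)
```

A set lookup keeps the check cheap even when many combinations need to be excluded.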