Calculate the time difference between two Pandas columns in hours and minutes
I have two columns, fromdate and todate, in a dataframe.
import pandas as pd
data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
df = pd.DataFrame(data)
I add a new column, diff, to find the difference between the two dates using:
df['diff'] = df['fromdate'] - df['todate']
I get the diff column, but it contains days when the difference is more than 24 hours.
todate fromdate diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000
How do I convert my results to only hours and minutes (i.e. days are converted to hours)?
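The days in that output come from how pandas displays a timedelta: the value is shown as whole days plus a sub-day remainder, even though it is stored as a single duration. A minimal sketch using the existing Series.dt.components accessor makes that split visible and folds the days back into hours; the variable names parts and total_hours are illustrative.

```python
import pandas as pd

# Sample data from the question, parsed into datetime64 columns
df = pd.DataFrame({
    'todate': pd.to_datetime(['2014-01-24 13:03:12.050',
                              '2014-01-27 11:57:18.240',
                              '2014-01-23 10:07:47.660']),
    'fromdate': pd.to_datetime(['2014-01-26 23:41:21.870',
                                '2014-01-27 15:38:22.540',
                                '2014-01-23 18:50:41.420']),
})
df['diff'] = df['fromdate'] - df['todate']

# One column per displayed part: days, hours, minutes, seconds, ...
parts = df['diff'].dt.components

# Fold the days back into the hour count; minutes stay in their own column
total_hours = parts['days'] * 24 + parts['hours']
print(total_hours.tolist())
print(parts['minutes'].tolist())
```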
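Pandas timestamp differences return timedelta values: a Series of dtype timedelta64[ns] when subtracting columns, or a pd.Timedelta (a subclass of datetime.timedelta) when subtracting single timestamps. The difference can easily be converted into hours using the astype method, as in (df['fromdate'] - df['todate']).astype('timedelta64[h]'), which yields the difference in whole hours (minutes are dropped).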
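Since pandas 2.x no longer allows .astype('timedelta64[h]'), a comparable whole-hour result can be sketched with floor division by a one-hour pd.Timedelta instead. This is a swapped-in technique rather than this answer's own code, and it assumes the question's fromdate/todate columns.

```python
import pandas as pd

df = pd.DataFrame({
    'todate': pd.to_datetime(['2014-01-24 13:03:12.050',
                              '2014-01-27 11:57:18.240',
                              '2014-01-23 10:07:47.660']),
    'fromdate': pd.to_datetime(['2014-01-26 23:41:21.870',
                                '2014-01-27 15:38:22.540',
                                '2014-01-23 18:50:41.420']),
})
diff = df['fromdate'] - df['todate']

# Floor division by a one-hour Timedelta: days are folded into hours,
# leftover minutes and seconds are dropped, and the result is an integer Series
whole_hours = diff // pd.Timedelta(hours=1)
print(whole_hours.tolist())
```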
This was driving me bonkers, as the .astype() solution above didn't work for me. But I found another way. I haven't timed it or anything, but it might work for others out there:
...
if you want hours. Or:
...
if you want minutes.
UPDATE: There used to be a helpful comment here that mentioned using .total_seconds() for time periods spanning multiple days. Since it's gone, I've updated the answer.
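The point of the UPDATE is that .seconds and .total_seconds() differ once a span exceeds a day. A minimal sketch with the first row of the question's data (the names t1, t2 and delta are illustrative, not the answer's original code):

```python
import pandas as pd

# First row of the question's data: a gap of 2 days 10:38:09.82
t1 = pd.Timestamp('2014-01-24 13:03:12.050')
t2 = pd.Timestamp('2014-01-26 23:41:21.870')
delta = t2 - t1  # a pd.Timedelta

# .seconds is only the sub-day remainder (0 to 86399), so the 2 days are lost
partial_hours = delta.seconds / 3600

# .total_seconds() covers the whole span, days included
hours = delta.total_seconds() / 3600
minutes = delta.total_seconds() / 60
print(partial_hours, hours, minutes)
```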
... days + hours. Minutes are not included.
A column showing hours and minutes, as hh:mm or x hours y minutes, would require additional calculations and string formatting.
This answer uses timedelta math to get the total hours or total minutes as a float, and is faster than using .astype('timedelta64[h]').
In pandas v2.0.0, .astype('timedelta64[h]') is not allowed.
timedelta objects: see supported operations.
The sample data is already of datetime64[ns] dtype. It is required that all relevant columns are converted using pandas.to_datetime().
Tested in python 3.11.2, pandas 2.0.1, numpy 1.24.3
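The timedelta math described above can be sketched as follows, assuming the question's data; dividing by a one-unit pd.Timedelta gives float totals. The hh_mm column name and the zero-padded hh:mm layout are illustrative assumptions, showing the kind of additional calculation and string formatting a combined hours-and-minutes column needs.

```python
import pandas as pd

# Question's data; pd.to_datetime makes sure the columns are datetime64
df = pd.DataFrame({
    'todate': pd.to_datetime(['2014-01-24 13:03:12.050',
                              '2014-01-27 11:57:18.240',
                              '2014-01-23 10:07:47.660']),
    'fromdate': pd.to_datetime(['2014-01-26 23:41:21.870',
                                '2014-01-27 15:38:22.540',
                                '2014-01-23 18:50:41.420']),
})
df['diff'] = df['fromdate'] - df['todate']

# Timedelta math: divide by a one-unit Timedelta to get float totals
df['hours'] = df['diff'] / pd.Timedelta(hours=1)
df['minutes'] = df['diff'] / pd.Timedelta(minutes=1)

# Hypothetical 'hh:mm' column (assumes non-negative differences):
# count whole minutes, then split them into hours and leftover minutes
whole_minutes = df['diff'] // pd.Timedelta(minutes=1)
df['hh_mm'] = ((whole_minutes // 60).astype(str).str.zfill(2)
               + ':' + (whole_minutes % 60).astype(str).str.zfill(2))
print(df[['hours', 'minutes', 'hh_mm']])
```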
Other methods
.total_seconds() was added and merged when the core developer was on vacation, and would not have been approved. This is also why there are no other .total_xx methods.
Other Resources
... dateutil maintainer:
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
(df.from_date - df.to_date).dt.total_seconds() / 3600
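Both expressions quoted above produce float hours. A quick sketch on the question's data, with the columns renamed to from_date and to_date to match those expressions, checks that they agree. The division form carries over to other units just by changing the pd.Timedelta, whereas the total_seconds route needs a manual conversion factor.

```python
import pandas as pd

df = pd.DataFrame({
    'to_date': pd.to_datetime(['2014-01-24 13:03:12.050',
                               '2014-01-27 11:57:18.240',
                               '2014-01-23 10:07:47.660']),
    'from_date': pd.to_datetime(['2014-01-26 23:41:21.870',
                                 '2014-01-27 15:38:22.540',
                                 '2014-01-23 18:50:41.420']),
})

# Division by a unit Timedelta
by_division = (df.from_date - df.to_date) / pd.Timedelta(hours=1)

# total_seconds() followed by a manual seconds-to-hours factor
by_total_seconds = (df.from_date - df.to_date).dt.total_seconds() / 3600

# Floating-point results, so compare with a small tolerance
print(((by_division - by_total_seconds).abs() < 1e-9).all())
```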
pandas.Series.dt.total_seconds
.dt accessor
The dateutil module provides powerful extensions to the standard datetime module.
%%timeit test
By default, time differences in pandas have nanosecond resolution, i.e. timedelta64[ns], so one way to convert them into seconds/minutes/hours/etc. is to divide the nanosecond representation by 10**9 to get seconds, by 60*10**9 to get minutes, and so on. This method is at least 3 times faster than the other methods suggested on this page.¹
PS: The above code assumes that you want the difference in whole seconds, minutes, hours, etc., so it uses integer division (//). If you want the fractions as well, use true division (/) instead. That said, if you want the exact difference, consider converting it into a higher resolution (milliseconds/microseconds/etc.) instead of fractional seconds/minutes/hours.
¹ Some benchmarks using Trenton McKinney's setup:
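A minimal sketch of the nanosecond approach described in this answer follows. It is a reconstruction under stated assumptions, not the answer's own code: the integer nanoseconds are taken with .astype('int64') after forcing timedelta64[ns] (newer pandas versions can store other resolutions, which would change the divisor), and the data is assumed to contain no NaT values.

```python
import pandas as pd

df = pd.DataFrame({
    'todate': pd.to_datetime(['2014-01-24 13:03:12.050',
                              '2014-01-27 11:57:18.240',
                              '2014-01-23 10:07:47.660']),
    'fromdate': pd.to_datetime(['2014-01-26 23:41:21.870',
                                '2014-01-27 15:38:22.540',
                                '2014-01-23 18:50:41.420']),
})

# Force nanosecond resolution, then take the underlying integer nanoseconds
# (assumes no NaT values, which have no meaningful integer value here)
diff_ns = (df['fromdate'] - df['todate']).astype('timedelta64[ns]').astype('int64')

# Integer division (//): whole seconds, minutes and hours
df['diff_seconds'] = diff_ns // 10**9
df['diff_minutes'] = diff_ns // (60 * 10**9)
df['diff_hours'] = diff_ns // (3600 * 10**9)

# True division (/): keeps the fractional part
df['diff_hours_float'] = diff_ns / (3600 * 10**9)
print(df[['diff_seconds', 'diff_minutes', 'diff_hours', 'diff_hours_float']])
```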