Calculate the time difference between two Pandas columns in hours and minutes

Posted on 2025-01-12 21:15:21

I have two columns, fromdate and todate, in a dataframe.

import pandas as pd

data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}

df = pd.DataFrame(data)

I add a new column, diff, to find the difference between the two dates using

df['diff'] = df['fromdate'] - df['todate']

I get the diff column, but it contains days, when there's more than 24 hours.

                   todate                 fromdate                    diff
0 2014-01-24 13:03:12.050  2014-01-26 23:41:21.870  2 days 10:38:09.820000
1 2014-01-27 11:57:18.240  2014-01-27 15:38:22.540  0 days 03:41:04.300000
2 2014-01-23 10:07:47.660  2014-01-23 18:50:41.420  0 days 08:42:53.760000

How do I convert my results to only hours and minutes (i.e. days are converted to hours)?


Comments (4)

酒中人 2025-01-19 21:15:21

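Subtracting two pandas Timestamp columns returns a Series of dtype timedelta64[ns] (each element is a pandas.Timedelta, a subclass of datetime.timedelta). On older pandas versions this can be converted into whole hours with the astype method, like so (pandas 2.x rejects the 'timedelta64[h]' resolution, as the error in another answer on this page shows):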

import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')

to yield,

0    58
1     3
2     8
dtype: float64
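
For readers whose pandas version rejects the hour resolution in astype, a minimal sketch of one alternative is floor division by a one-hour Timedelta. The variable names `to`, `fr` and `whole_hours` are illustrative and not part of the answer above; the data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
to = pd.Series(pd.to_datetime(['2014-01-24 13:03:12.050000',
                               '2014-01-27 11:57:18.240000',
                               '2014-01-23 10:07:47.660000']))
fr = pd.Series(pd.to_datetime(['2014-01-26 23:41:21.870000',
                               '2014-01-27 15:38:22.540000',
                               '2014-01-23 18:50:41.420000']))

# Floor-dividing by a one-hour Timedelta keeps only the whole hours,
# with days already folded into the hour count.
whole_hours = (fr - to) // pd.Timedelta(hours=1)
print(whole_hours.tolist())
```

This should give the same whole-hour values as the output above (58, 3 and 8) without relying on the hour resolution in astype.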
丶情人眼里出诗心の 2025-01-19 21:15:21

This was driving me bonkers as the .astype() solution above didn't work for me. But I found another way. Haven't timed it or anything, but might work for others out there:

import pandas as pd

t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')

print(pd.Timedelta(t2 - t1).total_seconds() / 3600.0)

...if you want hours. Or:

print(pd.Timedelta(t2 - t1).total_seconds() / 60.0)

...if you want minutes.

UPDATE: There used to be a helpful comment here that mentioned using .total_seconds() for time periods spanning multiple days. Since it's gone, I've updated the answer to use .total_seconds(), because .seconds drops whole days.
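
To make the point about multi-day spans concrete, here is a small sketch comparing the two attributes. The end time two days later is a hypothetical value chosen for illustration, and the variable names are not from the answer.

```python
import pandas as pd

t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/3/2015 03:30')  # hypothetical end time, two days later

delta = pd.Timedelta(t2 - t1)

# .seconds holds only the sub-day remainder (2 h 30 min here).
hours_without_days = delta.seconds / 3600.0
# .total_seconds() covers the whole span, days included.
hours_total = delta.total_seconds() / 3600.0

print(hours_without_days, hours_total)  # 2.5 50.5
```

The 48 hours contributed by the two full days only show up in the total_seconds() result.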

短叹 2025-01-19 21:15:21
  • How do I convert my results to only hours and minutes?
    • The accepted answer only returns days + hours. Minutes are not included.
  • To provide a column that has hours and minutes as hh:mm or x hours y minutes, would require additional calculations and string formatting.
  • This answer shows how to get either total hours or total minutes as a float, using timedelta math, and is faster than using .astype('timedelta64[h]').
  • Pandas Time Deltas User Guide
  • Pandas Time series / date functionality User Guide
  • python timedelta objects: See supported operations.
  • The following sample data is already a datetime64[ns] dtype. It is required that all relevant columns are converted using pandas.to_datetime().
  • Tested in python 3.11.2, pandas 2.0.1, numpy 1.24.3
import pandas as pd

# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}

# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)

# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
df['time_delta'] = (df.from_date - df.to_date)

# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)

# create a column with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)

# display(df)
                  to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000
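
The bullets above note that an hh:mm column needs extra calculation and string formatting. One possible sketch follows; the helper `to_hh_mm` and the column name `hh_mm` are hypothetical, and the helper assumes non-negative differences and simply truncates leftover seconds.

```python
import pandas as pd

# Same sample data as above.
data = {'to_date': pd.to_datetime(['2014-01-24 13:03:12.050000',
                                   '2014-01-27 11:57:18.240000',
                                   '2014-01-23 10:07:47.660000']),
        'from_date': pd.to_datetime(['2014-01-26 23:41:21.870000',
                                     '2014-01-27 15:38:22.540000',
                                     '2014-01-23 18:50:41.420000'])}
df = pd.DataFrame(data)
df['time_delta'] = df.from_date - df.to_date


def to_hh_mm(td):
    """Hypothetical helper: format a non-negative Timedelta as total hours:minutes."""
    total_minutes = int(td.total_seconds() // 60)  # leftover seconds are truncated
    hours, minutes = divmod(total_minutes, 60)     # days are folded into hours
    return f"{hours:02d}:{minutes:02d}"


df['hh_mm'] = df['time_delta'].map(to_hh_mm)
print(df['hh_mm'].tolist())  # ['58:38', '03:41', '08:42']
```

The result is a string column, so it is suited to display rather than further arithmetic; keep the timedelta or float columns for calculations.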

Other methods

  • An item of note from the podcast in Other Resources: .total_seconds() was added and merged while the core developer was on vacation, and would not otherwise have been approved.
    • This is also why there aren't other .total_xx methods.
# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0    211089.82
1     13264.30
2     31373.76
dtype: float64

# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0    2
1    0
2    0
dtype: int64

# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0    38289
1    13264
2    31373
dtype: int64
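
As a further sketch along the same lines, the `.dt.components` accessor splits each timedelta into days, hours, minutes and so on, which allows whole hours and leftover minutes to be combined by hand. The Series below is rebuilt from the time_delta values shown earlier, and the names `td`, `parts`, `total_hours` and `minutes` are illustrative.

```python
import pandas as pd

# Timedeltas copied from the time_delta column shown earlier.
td = pd.Series(pd.to_timedelta(['2 days 10:38:09.820000',
                                '0 days 03:41:04.300000',
                                '0 days 08:42:53.760000']))

# One column per unit: days, hours, minutes, seconds, ...
parts = td.dt.components

# Fold whole days into the hour count; minutes are the sub-hour remainder.
total_hours = parts['days'] * 24 + parts['hours']
minutes = parts['minutes']

print(total_hours.tolist(), minutes.tolist())  # [58, 3, 8] [38, 41, 42]
```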

Other Resources

%%timeit test

import pandas as pd

# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)

%timeit (df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
24.2 ms ± 2.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit (df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'
半枫 2025-01-19 21:15:21

By default, a time difference in pandas has nanosecond resolution, i.e. timedelta64[ns], so one way to convert it into seconds/minutes/hours/etc. is to divide its integer nanosecond representation by 10**9 for seconds, by 60*10**9 for minutes, and so on. In the benchmarks below, this method is at least 3 times faster than the other methods suggested on this page (see footnote 1).

df['diff_in_seconds'] = df['from_date'].sub(df['to_date']).view('int64') // 10**9
df['diff_in_minutes'] = df['from_date'].sub(df['to_date']).view('int64') // (60*10**9)
df['diff_in_hours'] = df['from_date'].sub(df['to_date']).view('int64') // (3600*10**9)

PS: The above code assumes that you want the difference in whole seconds, minutes, hours etc., so it uses integer division (//); if you want the fractions as well, use true division (/) instead. That said, if you want the exact difference, then instead of fractional seconds/minutes/hours, consider converting the difference into a higher resolution (milliseconds/microseconds/etc.).
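
A hedged variant of the same idea, for setups where `.view` is unavailable or emits a deprecation warning: cast with `astype` instead. The explicit cast to timedelta64[ns] is an added assumption-guard so that the integers really are nanoseconds whatever resolution the column started with; the variable names are illustrative.

```python
import pandas as pd

to_date = pd.Series(pd.to_datetime(['2014-01-24 13:03:12.050000',
                                    '2014-01-27 11:57:18.240000',
                                    '2014-01-23 10:07:47.660000']))
from_date = pd.Series(pd.to_datetime(['2014-01-26 23:41:21.870000',
                                      '2014-01-27 15:38:22.540000',
                                      '2014-01-23 18:50:41.420000']))

diff = from_date - to_date

# Force nanosecond resolution first, then take the raw integer values.
nanoseconds = diff.astype('timedelta64[ns]').astype('int64')

whole_hours = nanoseconds // (3600 * 10**9)       # integer division: whole hours
fractional_hours = nanoseconds / (3600 * 10**9)   # true division: keeps the fraction

print(whole_hours.tolist())
print(fractional_hours.round(6).tolist())
```

No timing claim is made for this variant; the benchmarks in this answer were measured with `.view`.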


1 Some benchmarks using Trenton McKinney's setup:

data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')]*1000000, 
        'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]*1000000}
df = pd.DataFrame(data)
df['Diff'] = df['from_date'] - df['to_date']

%timeit df['Diff'].view('int64') // (3600*10**9)
# 11 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['Diff'] // pd.Timedelta(hours=1)
# 36.7 ms ± 2.99 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['Diff'].astype('timedelta64[h]')
# 46.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['Diff'].dt.total_seconds() // 3600
# 169 ms ± 7.71 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)