How best to find the time difference between a desired sequence of events?
Say I have the following dataframe:
import pandas as pd
df = pd.DataFrame(columns=['Time', 'Event'])
df.Time = ['06:15.2', '06:15.2', '06:45.3', '43:26.8', '43:26.8', '43:57.9', '27:30.0', '27:31.1', '27:31.1', '27:31.1', '27:31.1', '32:41.0', '10:22.6', '10:22.6', '10:54.7', '11:30.3', '11:30.3']
df.Event = ['login', 'other', 'logout', 'login', 'other', 'logout', 'login', 'other', 'other', 'login', 'logout', 'logout', 'login', 'other', 'logout', 'login', 'other']
I want to find the time difference between a login and the logout that comes immediately after the login. That is, if there are multiple logins and then one logout, I want to find the time difference between the last login and the logout.
My attempt to do this is as follows:
import numpy as np
import datetime
diff = np.zeros(len(df))
for i in range(len(df)):
    if df.Event[i] == 'login':
        for j in range(len(df[i:])):
            if df.Event[j] == 'logout':
                diff[j] = (datetime.datetime.strptime(df.Time[j], '%M:%S.%f') - datetime.datetime.strptime(df.Time[i], '%M:%S.%f')) / datetime.timedelta(milliseconds=1)
df['Diff'] = diff
df
But it doesn't produce the right result:
The correct result should be: the values in the Diff column for the logout rows should be 30.1, 31.1, 0.0, 0.0 (not sure how to handle this case, as this logout comes after a logout; considering pairwise logins and logouts would add another level of complexity), and 32.1.
Is there any Pythonic way of doing this?
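For reference, the pairing rule described above (each logout is matched with the most recent login since the previous logout, and a logout with no such login gets 0) can be sketched with vectorized pandas operations. This is only one possible approach, not the asker's or the answerer's code. It assumes the Diff values are wanted in seconds (the expected values above read as seconds, although the attempt divides by one millisecond), and that a login and its logout never straddle an hour boundary, since the Time strings carry no hour. The intermediate names (t, block, has_login, last_login) are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ['06:15.2', '06:15.2', '06:45.3', '43:26.8', '43:26.8', '43:57.9',
             '27:30.0', '27:31.1', '27:31.1', '27:31.1', '27:31.1', '32:41.0',
             '10:22.6', '10:22.6', '10:54.7', '11:30.3', '11:30.3'],
    "Event": ['login', 'other', 'logout', 'login', 'other', 'logout',
              'login', 'other', 'other', 'login', 'logout', 'logout',
              'login', 'other', 'logout', 'login', 'other'],
})

# Parse 'MM:SS.f' strings as timedeltas by prefixing a zero hour (assumption:
# the missing hour does not change between a login and its logout).
t = pd.to_timedelta("00:" + df["Time"])

is_login = df["Event"].eq("login")
is_logout = df["Event"].eq("logout")

# Label each row with a block id that changes right after every logout,
# so a logout shares its block with the rows leading up to it.
block = is_logout.cumsum().shift(fill_value=0)

# A logout only counts as paired if its block contains at least one login.
has_login = is_login.groupby(block).transform("any")

# Time of the most recent login at or before each row.
last_login = t.where(is_login).ffill()

# Seconds since the last login, kept only on paired logout rows; 0 elsewhere.
elapsed = (t - last_login).dt.total_seconds()
df["Diff"] = elapsed.where(is_logout & has_login, 0.0)
```

With the sample data, the five logout rows should come out as roughly 30.1, 31.1, 0.0, 0.0 and 32.1, and every other row as 0. The second 0.0 is the "logout after a logout" case: its block contains no login, so it is masked rather than measured against the earlier login.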
Comments (2)
I've got some very long and inefficient code, but it works. The first function is used later to turn times into the string format you want. The Times and Events columns are turned into lists, and then a list comprehension turns the time strings into minutes.

I haven't tested whether the code can handle hour values in the Times column, but theoretically it should. It cannot deal with times rolling over to the next day, so be aware! Also, if the hour values are very large, this code will unnecessarily eat up memory just to store the large times, since times are stored in minutes; in that case, I'd advise working in chunks and subtracting a certain number of hours from all the times in question.

Here is the output: