Counting consecutive rows in a dataframe with a difference in minutes

Posted on 2025-01-16 13:31:58

I have a dataframe that looks like this:

Name	Site	Time
Manual	BCN	3/10/2022 11:23:13 PM
Manual	BCN	3/10/2022 11:38:47 PM
Automatic	Madrid	3/10/2022 11:40:32 PM
Manual	BCN	3/10/2022 11:39:47 PM
Manual	BCN	3/11/2022 12:44:47 AM

It consists of a Name column, a Site column and a Time column. What I'm looking for is to count the rows where Name and Site are equal and the time between consecutive instances is less than 20 minutes. In this case the output would be Manual, BCN -> 3 times, as the 5th row is an hour away from the preceding rows. The data is sorted by Time.

What I have tried is to group by Name and Site and then apply a diff to Time, to no avail.

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
g = (df.groupby(['Site', 'Name'])['Time'].diff().ne(pd.Timedelta(minutes=20))
       .groupby([df['Site'], df['Name']]).cumsum())
groups = df.groupby(['Site', g])['Time']
new_df = df.assign(count=groups.transform('size'))

This returns the count of all values, not only the ones that satisfy the time delta. The file itself is quite big, with many different Name and Site values.

Many thanks

Edit 1.
To clarify, I'm looking at pairs of values: in this case the first row with the second one, then the second one with the third one, and so on. I'm exploring a solution with a for loop that filters by Name and Site.

Thanks
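
The pairwise comparison described in the edit can be made visible row by row. The following is a minimal sketch, not the asker's code: it rebuilds the five sample rows by hand, and the column names `gap_min` and `within_20` are made up for illustration. It uses a strict "less than 20 minutes" test, as worded in the question.

```python
import pandas as pd

# The five sample rows from the question, typed in by hand.
df = pd.DataFrame({
    "Name": ["Manual", "Manual", "Automatic", "Manual", "Manual"],
    "Site": ["BCN", "BCN", "Madrid", "BCN", "BCN"],
    "Time": ["3/10/2022 11:23:13 PM", "3/10/2022 11:38:47 PM",
             "3/10/2022 11:40:32 PM", "3/10/2022 11:39:47 PM",
             "3/11/2022 12:44:47 AM"],
})
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %I:%M:%S %p")
df = df.sort_values("Time", ignore_index=True)

# Gap to the previous row of the same (Name, Site) pair, in minutes.
# The first row of each pair has no predecessor, so its gap is NaN.
df["gap_min"] = df.groupby(["Name", "Site"])["Time"].diff().dt.total_seconds() / 60

# NaN < 20 evaluates to False, so first rows are not flagged.
df["within_20"] = df["gap_min"] < 20

print(df)
```

Under this reading, Manual/BCN has two flagged pairs, which together span three rows; whether the expected answer is the pair count or the row count is a choice the question leaves open.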

画▽骨i 2025-01-23 13:31:58

IIUC, try:

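import pandas as pd

df["Time"] = pd.to_datetime(df["Time"])
df = df.sort_values("Time", ignore_index=True)

# Per (Name, Site) group: minutes between consecutive rows. The first row's
# NaN gap is filled with 0, so it also counts as being within 20 minutes.
output = df.groupby(["Name", "Site"])["Time"].apply(
    lambda x: x.diff().dt.total_seconds().div(60).fillna(0).le(20).sum()
)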

>>> output
Name       Site  
Automatic  Madrid    1
Manual     BCN       3
Name: Time, dtype: int64
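
For a large file, the same count can probably be computed without a Python-level `apply`. The sketch below is an alternative under stated assumptions, not the answerer's code: the sample frame is rebuilt by hand, `gap`, `within` and `counts` are illustrative names, and the first row of each group is counted (as `fillna(0)` does above) by treating a missing gap as within range.

```python
import pandas as pd

# The five sample rows from the question, typed in by hand.
df = pd.DataFrame({
    "Name": ["Manual", "Manual", "Automatic", "Manual", "Manual"],
    "Site": ["BCN", "BCN", "Madrid", "BCN", "BCN"],
    "Time": ["3/10/2022 11:23:13 PM", "3/10/2022 11:38:47 PM",
             "3/10/2022 11:40:32 PM", "3/10/2022 11:39:47 PM",
             "3/11/2022 12:44:47 AM"],
})
df["Time"] = pd.to_datetime(df["Time"], format="%m/%d/%Y %I:%M:%S %p")
df = df.sort_values("Time", ignore_index=True)

# Gap to the previous row within each (Name, Site) group; NaT for first rows.
gap = df.groupby(["Name", "Site"])["Time"].diff()

# Inclusive threshold to match .le(20); first rows (NaT gap) are counted too.
within = gap.isna() | (gap <= pd.Timedelta(minutes=20))

# Sum the boolean flags per (Name, Site) pair.
counts = within.groupby([df["Name"], df["Site"]]).sum()
print(counts)
```

Replacing `<=` with `<` gives the strict "less than 20 minutes" reading from the question, and dropping the `gap.isna()` term counts only pairs of rows rather than rows.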
