Iteratively calculating pandas group differences

Posted on 2025-02-06 03:44:07

Hi, I created a column called "group" based on whether the rows before and after are fruit.

How could I create the new_group column? It's based on 10-minute gaps between fruit purchases. The dataframe is sorted by person, then time.

person   time_bought  product  group  new_group
abby     2:21         fruit     1       1
abby     2:24         other     
abby     2:25         fruit     2       1  (2:25 is within 10 minutes of 2:21, so part of the same group)
abby     10:35        fruit     2       2  
abby     10:40        other
abby     10:42        fruit     3       2  (10:42 is within 10 minutes of 10:35)
abby     10:53        fruit     4       3  (10:53 is not within 10 minutes of 10:42)
abby     11:04        fruit     d
barry    12:00        fruit     1

I tried:

m = df.groupby(["person", "group"]).time_bought.diff()
df["new_group"] = df.groupby(["person", "group"]).mask(m).ffill()

Comments (1)

幼儿园老大 2025-02-13 03:44:07

To generate your new group, you can use:

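# True where the gap to the same person's previous row exceeds 10 minutes
m1 = pd.to_datetime(df['time_bought']).groupby(df['person']).diff().gt('10min')
# Each True starts a new group; add 1 so numbering starts at 1
df['new_group'] = m1.cumsum().add(1)

Output: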

  person time_bought product group  new_group
0   abby        2:21   fruit     1          1
1   abby        2:24   other  None          1
2   abby        2:25   fruit     2          1
3   abby       10:35   fruit     2          2
4   abby       10:40   other  None          2
5   abby       10:42   fruit     3          2
6   abby       10:53   fruit     4          3
7   abby       11:04   fruit     d          4
8  barry       12:00   fruit     1          4
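
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the question's table; the helper name `gap_groups` and the explicit `format="%H:%M"` are my own assumptions (the snippet above lets pandas infer the format), so treat it as an illustration rather than the definitive approach.

```python
import pandas as pd

# Sample data copied from the question; only the columns the grouping needs.
df = pd.DataFrame({
    "person": ["abby"] * 8 + ["barry"],
    "time_bought": ["2:21", "2:24", "2:25", "10:35", "10:40",
                    "10:42", "10:53", "11:04", "12:00"],
    "product": ["fruit", "other", "fruit", "fruit", "other",
                "fruit", "fruit", "fruit", "fruit"],
})


def gap_groups(frame, gap="10min"):
    """Hypothetical helper: running group id that increases after each gap > `gap`."""
    # Assumption: times are same-day "H:MM" strings, so an explicit format is safe.
    times = pd.to_datetime(frame["time_bought"], format="%H:%M")
    # Gap to the same person's previous row; NaT on each person's first row.
    new_block = times.groupby(frame["person"]).diff().gt(pd.Timedelta(gap))
    # NaT compares as False, so the counter does not restart when the person changes.
    return new_block.cumsum().add(1)


df["new_group"] = gap_groups(df)
print(df)
```

Because the first row of each person has no previous row, its difference is NaT and never counts as a gap. That is why barry's row continues with abby's last number instead of starting again at 1.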

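Optional masking: hide the value on non-fruit rows and on the last row of each person (the intended logic is unclear from the question):

# Rows that are not fruit
m2 = df['product'].ne('fruit')
# Last row of each person (the next row belongs to someone else, or there is none)
m3 = df['person'].ne(df['person'].shift(-1))
# Mask those rows, then convert to a nullable integer dtype so <NA> can sit next to ints
df['new_group'] = m1.cumsum().add(1).mask(m2|m3).convert_dtypes()

Output: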

  person time_bought product group  new_group
0   abby        2:21   fruit     1          1
1   abby        2:24   other  None       <NA>
2   abby        2:25   fruit     2          1
3   abby       10:35   fruit     2          2
4   abby       10:40   other  None       <NA>
5   abby       10:42   fruit     3          2
6   abby       10:53   fruit     4          3
7   abby       11:04   fruit     d       <NA>
8  barry       12:00   fruit     1       <NA>
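
The masking step can be checked end to end with a similar sketch. As before, the data is rebuilt from the question's table and the explicit `format="%H:%M"` is an assumption of mine; the mask names `m1`, `m2` and `m3` follow the snippet above.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "person": ["abby"] * 8 + ["barry"],
    "time_bought": ["2:21", "2:24", "2:25", "10:35", "10:40",
                    "10:42", "10:53", "11:04", "12:00"],
    "product": ["fruit", "other", "fruit", "fruit", "other",
                "fruit", "fruit", "fruit", "fruit"],
})

# Assumption: times are same-day "H:MM" strings.
times = pd.to_datetime(df["time_bought"], format="%H:%M")
# True where the gap to the same person's previous row exceeds 10 minutes.
m1 = times.groupby(df["person"]).diff().gt(pd.Timedelta("10min"))

m2 = df["product"].ne("fruit")                # non-fruit rows
m3 = df["person"].ne(df["person"].shift(-1))  # last row of each person

# mask() blanks the selected rows; convert_dtypes() then picks a nullable dtype.
df["new_group"] = m1.cumsum().add(1).mask(m2 | m3).convert_dtypes()
print(df)
```

Note that `m3` also blanks barry's only row, since a person with a single purchase is both the first and the last row of that person. Whether that is wanted depends on the rule the question left unspecified.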