Pandas diff by group

Posted on 2025-02-06 19:29:21, 841 characters, 2 views, 0 comments

How could I create the new_group column? It is based on 10-minute gaps between fruit purchases if the row above is fruit, and on 2-minute gaps if the row above is other. The dataframe is sorted.

person   time_bought  product    new_group
abby     2:21         fruit        1
abby     2:25         fruit        1  (2:25 is within 10 minutes of 2:21, so part of the same group)
abby     10:35        fruit        2  
abby     10:40        other
abby     10:42        fruit        2  (10:42 is within 2 minutes of 10:40)
abby     10:53        fruit        3  (10:53 is not within 10 minutes of 10:42)
barry    12:00        fruit        1
...

I tried:

m1 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('10min')
m2 = df.product.shift(1)=="other"
m3 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('2min')
df['new_group'] = m1.cumsum().mask(m2, m3)

Comments (1)

彩扇题诗 2025-02-13 19:29:21

IIUC, you can use a dictionary to hold the reference thresholds, then use a variation of your own code:

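import pandas as pd

# gap threshold to apply, keyed by the product of the previous row
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
# map the custom threshold based on the previous row's product (per person)
ref = df.groupby('person')['product'].shift().map(thresh)

# True where the gap to the previous row exceeds that row's threshold
m1 = pd.to_datetime(df['time_bought']).groupby(df['person']).diff().gt(ref)
# True for non-fruit rows, which get no group number
m2 = df['product'].ne('fruit')

# running count of group starts per person, 1-based; non-fruit rows become NaN
df['new_group'] = m1.groupby(df['person']).cumsum().add(1).mask(m2)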

Output:

  person time_bought product  new_group
0   abby        2:21   fruit        1.0
1   abby        2:25   fruit        1.0
2   abby       10:35   fruit        2.0
3   abby       10:40   other        NaN
4   abby       10:42   fruit        2.0
5   abby       10:53   fruit        3.0
6  barry       12:00   fruit        1.0
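
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample rows from the question and applies the same idea. Two things in it are assumptions rather than part of the answer: the times are parsed with `pd.to_timedelta` (treating `H:MM` as a duration since midnight on a single day) instead of `pd.to_datetime`, and the result is cast to the nullable `Int64` dtype so the group numbers print as integers. Note that each row is compared with the row immediately above it for the same person.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "person": ["abby"] * 6 + ["barry"],
    "time_bought": ["2:21", "2:25", "10:35", "10:40", "10:42", "10:53", "12:00"],
    "product": ["fruit", "fruit", "fruit", "other", "fruit", "fruit", "fruit"],
})

# Assumption: times are "H:MM" on one day, so append seconds and parse as durations.
t = pd.to_timedelta(df["time_bought"] + ":00")

# Threshold depends on the product of the previous row within the same person.
thresh = {"fruit": pd.Timedelta("10min"), "other": pd.Timedelta("2min")}
ref = df.groupby("person")["product"].shift().map(thresh)

# A new group starts where the gap to the previous row exceeds its threshold.
# The first row of each person has no previous row (NaT), which compares as False.
new_start = t.groupby(df["person"]).diff().gt(ref)
is_fruit = df["product"].eq("fruit")

# Count group starts per person (1-based) and blank out non-fruit rows.
df["new_group"] = (
    new_start.groupby(df["person"]).cumsum().add(1).where(is_fruit).astype("Int64")
)

print(df)
```

Using `where(is_fruit)` is equivalent to the answer's `mask(m2)` with the condition inverted; the `Int64` cast is optional and only affects how the column is displayed.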