Group by differences in pandas

Posted 2025-02-06 19:29:21


How can I create the new_group column? Groups should be split on 10-minute gaps if the row above is fruit, and on 2-minute gaps if the row above is other. The DataFrame is already sorted.

person   time_bought  product    new_group
abby     2:21         fruit        1
abby     2:25         fruit        1  (2:25 is within 10 minutes of 2:21, so it is part of the same group)
abby     10:35        fruit        2
abby     10:40        other
abby     10:42        fruit        2  (10:42 is within 2 minutes of 10:40, the other row above it)
abby     10:53        fruit        3  (10:53 is not within 10 minutes of 10:42)
barry    12:00        fruit        1
...
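To reproduce the example, the sample rows above can be entered as a DataFrame. This is a sketch that keeps `time_bought` as "H:MM" strings, as shown in the table, and omits `new_group`, since that is the column to compute:

```python
import pandas as pd

# sample rows from the table above, already sorted by person and time
df = pd.DataFrame({
    'person': ['abby'] * 6 + ['barry'],
    'time_bought': ['2:21', '2:25', '10:35', '10:40', '10:42', '10:53', '12:00'],
    'product': ['fruit', 'fruit', 'fruit', 'other', 'fruit', 'fruit', 'fruit'],
})
```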

I tried:

m1 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('10min')
m2 = df.product.shift(1)=="other"
m3 = df.loc[df['product'].eq('fruit'), 'time_bought'].groupby(df['person']).diff().gt('2min')
df['new_group'] = m1.cumsum().mask(m2, m3)


Answer by 彩扇题诗, 2025-02-13 19:29:21

IIUC, you can use a dictionary to hold the threshold for each product, then use a variation of your own code:

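import pandas as pd

# the allowed gap depends on the product in the row above
thresh = {'fruit': pd.Timedelta('10min'), 'other': pd.Timedelta('2min')}
# map custom threshold based on previous row product (missing for each person's first row)
ref = df.groupby('person')['product'].shift().map(thresh)

# compare each delta to the custom threshold
# (format='%H:%M' parses the "H:MM" strings explicitly instead of inferring a format)
m1 = pd.to_datetime(df['time_bought'], format='%H:%M').groupby(df['person']).diff().gt(ref)
m2 = df['product'].ne('fruit')

# count the breaks per person, start numbering at 1, and blank out non-fruit rows
df['new_group'] = m1.groupby(df['person']).cumsum().add(1).mask(m2)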

output:

  person time_bought product  new_group
0   abby        2:21   fruit        1.0
1   abby        2:25   fruit        1.0
2   abby       10:35   fruit        2.0
3   abby       10:40   other        NaN
4   abby       10:42   fruit        2.0
5   abby       10:53   fruit        3.0
6  barry       12:00   fruit        1.0
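As a cross-check, the same rule can be spelled out as an explicit row-by-row loop. This is a sketch with a hypothetical helper, `assign_groups` (not part of the original answer), that mirrors the vectorized logic: within a person, a row opens a new group when its gap from the row above exceeds the threshold for the product in that row above, and only fruit rows get a group number. It is slower than the vectorized version but makes the rule easy to inspect:

```python
from datetime import datetime, timedelta

import pandas as pd

# threshold is chosen by the product in the row above
THRESH = {'fruit': timedelta(minutes=10), 'other': timedelta(minutes=2)}


def assign_groups(df: pd.DataFrame) -> list:
    """Hypothetical loop version of the grouping rule; expects rows sorted by person and time."""
    groups = []
    prev_person = prev_time = prev_product = None
    counter = 0
    for person, time_str, product in df[['person', 'time_bought', 'product']].itertuples(index=False):
        time = datetime.strptime(time_str, '%H:%M')
        if person != prev_person:
            counter = 1  # each person starts again at group 1
        elif time - prev_time > THRESH[prev_product]:
            counter += 1  # gap too large for the row above: open a new group
        # only fruit rows get a group number; other rows stay empty
        groups.append(counter if product == 'fruit' else None)
        prev_person, prev_time, prev_product = person, time, product
    return groups


df = pd.DataFrame({
    'person': ['abby'] * 6 + ['barry'],
    'time_bought': ['2:21', '2:25', '10:35', '10:40', '10:42', '10:53', '12:00'],
    'product': ['fruit', 'fruit', 'fruit', 'other', 'fruit', 'fruit', 'fruit'],
})
df['new_group'] = assign_groups(df)
print(df)
```

Note that in both versions an other row also advances the counter when its own gap exceeds 10 minutes, so a fruit bought shortly after such a row lands in a new group.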
