Difference between multiple dates based on an ID column in a pandas DataFrame
My practical case is the following: for a store, I would like to know the interval (in days) between each customer's 1st and 2nd visits, between the 2nd and 3rd, and so on.
I have a Python dataset with 2 columns (an ID identifying the customer for each visit, and the date of the visit):
data = {'Id': ['A', 'B','A','B','A','A'],
'Date': ['01/03/2022', '03/03/2022', '05/03/2022', '07/03/2022', '09/03/2022','11/03/2022']
}
My question: how many days are there between the 1st and 2nd visits for customers who have come 4 times? The same question applies between the 2nd and 3rd visits, and so on.
Comments (2)
The output you expect is unclear, but let's compute a 2D table with the customers as the index and the visits as the columns:
Output:
This gives, for each customer, the number of days since the previous visit (if more than 3 visits).
NB. You could remove the first column here, as it will always be undefined (NaT).
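The table described in this answer can be sketched as follows. This is a minimal reconstruction, not necessarily the answerer's exact code: it assumes the dates are in day/month/year order, and the names `visit`, `visits`, `gaps`, `gap_days` and `four_visits` are chosen here for illustration.

```python
import pandas as pd

data = {
    'Id': ['A', 'B', 'A', 'B', 'A', 'A'],
    'Date': ['01/03/2022', '03/03/2022', '05/03/2022',
             '07/03/2022', '09/03/2022', '11/03/2022'],
}
df = pd.DataFrame(data)

# Assumption: the strings are day/month/year.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df = df.sort_values(['Id', 'Date'])

# Number each customer's visits 1, 2, 3, ... in chronological order.
df['visit'] = df.groupby('Id').cumcount() + 1

# One row per customer, one column per visit number, holding the visit date.
visits = df.pivot(index='Id', columns='visit', values='Date')

# Difference between each column and the one before it; the first column is all NaT.
gaps = visits.diff(axis=1)

# Express the gaps as a number of days (NaN where there is no such visit).
gap_days = gaps / pd.Timedelta(days=1)

# Keep only customers with exactly 4 recorded visits, as asked in the question.
four_visits = gap_days[visits.notna().sum(axis=1) == 4]
print(gap_days)
```

Column `2` of `gap_days` then holds the days between the 1st and 2nd visits, column `3` the days between the 2nd and 3rd, and so on.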
You can first cast them to pd.Timestamp objects and sort the IDs and dates:
Then you can simply calculate the difference between consecutive timestamps, multiplied by a boolean mask of whether the row belongs to the same user as the previous one:
This gives you timedelta objects, from which you can see how many days have passed.
Your output will be:
Or just filter out the first visits afterwards:
Which gives you:
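The steps in this answer can be sketched as follows. This is a minimal reconstruction under stated assumptions: the dates are taken to be day/month/year, the names `same_customer`, `since_previous` and `repeat_visits` are illustrative, and `Series.where` is used here to blank out the rows that start a new customer instead of multiplying the timedeltas by the boolean mask.

```python
import pandas as pd

data = {
    'Id': ['A', 'B', 'A', 'B', 'A', 'A'],
    'Date': ['01/03/2022', '03/03/2022', '05/03/2022',
             '07/03/2022', '09/03/2022', '11/03/2022'],
}
df = pd.DataFrame(data)

# Assumption: the strings are day/month/year.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Sort so that each customer's visits are adjacent and in chronological order.
df = df.sort_values(['Id', 'Date']).reset_index(drop=True)

# True where the previous row belongs to the same customer.
same_customer = df['Id'].eq(df['Id'].shift())

# Gap to the previous row, kept only when that row is the same customer;
# each customer's first visit becomes NaT.
df['since_previous'] = df['Date'].diff().where(same_customer)

# Alternatively, drop the first visits altogether.
repeat_visits = df[same_customer]
print(repeat_visits)
```

The `since_previous` column holds timedelta values; `repeat_visits['since_previous'].dt.days` converts them to whole days.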