Analyzing customer data with a pandas DataFrame
I am currently analyzing the customers of a business. I want to find out how many customers purchased Product 1 before Product 2. I have data for all orders in the time period I want to analyze, in a DataFrame structured like this:
Customer ID | Order ID | Index of Order | Products on Order (concatenated into a string) |
---|---|---|---|
Customer 1 | Order 111 | 1 | Product 1, Product 3, Product 4 |
Customer 1 | Order 112 | 2 | Product 2 |
Customer 2 | Order 113 | 1 | Product 2, Product 4 |
Customer 2 | Order 114 | 2 | Product 1 |
The order index represents where the order falls within a customer's lifespan, i.e. the customer's first order, second order, etc., numbered as integers starting at 1. I have an idea of how to set this up, but I can't translate it into Pandas/Python code.
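To experiment, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the snake_case column names (`customer_id`, `order_id`, `order_index`, `products`) are assumptions standing in for the real ones, and it assumes products in the string are separated by a comma and a space. Splitting the string into a list is what makes exact product matching possible later.

```python
import pandas as pd

# Sample orders from the table above (column names are assumed).
df = pd.DataFrame(
    {
        "customer_id": ["Customer 1", "Customer 1", "Customer 2", "Customer 2"],
        "order_id": ["Order 111", "Order 112", "Order 113", "Order 114"],
        "order_index": [1, 2, 1, 2],
        "products": [
            "Product 1, Product 3, Product 4",
            "Product 2",
            "Product 2, Product 4",
            "Product 1",
        ],
    }
)

# Turn the concatenated string into a real list so membership tests are exact
# ("Product 1" will not accidentally match "Product 10").
df["product_list"] = df["products"].str.split(", ")
print(df[["customer_id", "order_index", "product_list"]])
```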
Essentially, I want to use this logic:

- How many customers have an order containing Product 1 with a lower order index than some order containing Product 2?
Each record is an order. In the table above, Customer 1 should be counted but Customer 2 should not, because Customer 1 purchased Product 1 before Product 2, whereas Customer 2 purchased Product 2 first.
I don't really have any code yet beyond cleaning the data into a workable dataset from which I can try to derive these insights. I started spinning my wheels and wanted to reach out here for ideas or solutions.
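The logic described in the question maps fairly directly onto a self-join. A minimal sketch, assuming the hypothetical column names `customer_id`, `order_index` and `products` and a comma-plus-space separator: pair every order containing Product 1 with every order from the same customer containing Product 2, keep the pairs where the Product 1 order has the strictly lower index, and count distinct customers.

```python
import pandas as pd

# Sample data mirroring the question's table (column names are assumed).
df = pd.DataFrame(
    {
        "customer_id": ["Customer 1", "Customer 1", "Customer 2", "Customer 2"],
        "order_index": [1, 2, 1, 2],
        "products": [
            "Product 1, Product 3, Product 4",
            "Product 2",
            "Product 2, Product 4",
            "Product 1",
        ],
    }
)

# Exact membership test on the split list rather than a substring match.
items = df["products"].str.split(", ")
has_p1 = df.loc[items.apply(lambda ps: "Product 1" in ps), ["customer_id", "order_index"]]
has_p2 = df.loc[items.apply(lambda ps: "Product 2" in ps), ["customer_id", "order_index"]]

# Every (Product 1 order, Product 2 order) pair belonging to the same customer.
pairs = has_p1.merge(has_p2, on="customer_id", suffixes=("_p1", "_p2"))

# Keep pairs where Product 1 was ordered strictly earlier, then count customers.
earlier = pairs[pairs["order_index_p1"] < pairs["order_index_p2"]]
count = earlier["customer_id"].nunique()
print(count)  # prints 1 for the sample data (only Customer 1)
```

Because the comparison is strict, an order containing both products does not count on its own. The self-join can grow large for customers with many matching orders; since some Product 1 order precedes some Product 2 order exactly when the earliest Product 1 index is below the latest Product 2 index, a per-customer `min`/`max` aggregation gives the same count more cheaply.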
Answers (3)
Here is one approach: first create another DataFrame with the products exploded (one row per product), then use it to slice based on your targets:
Output:

Intermediate df2:
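The code from this answer did not survive, so the following is only a sketch of the approach it describes, not the original: explode the product strings into one row per product (the intermediate `df2`), slice `df2` down to the target products, and take each customer's first order index for each product. `targets` and the column names are hypothetical.

```python
import pandas as pd

# Sample data mirroring the question's table (column names are assumed).
df = pd.DataFrame(
    {
        "customer_id": ["Customer 1", "Customer 1", "Customer 2", "Customer 2"],
        "order_index": [1, 2, 1, 2],
        "products": [
            "Product 1, Product 3, Product 4",
            "Product 2",
            "Product 2, Product 4",
            "Product 1",
        ],
    }
)
targets = ["Product 1", "Product 2"]  # hypothetical: the two products to compare

# Intermediate df2: one row per (order, product) pair.
df2 = df.assign(product=df["products"].str.split(", ")).explode("product")

# Slice to the target products, find each customer's first order index per
# product, and unstack so each product becomes its own column.
first_idx = (
    df2[df2["product"].isin(targets)]
    .groupby(["customer_id", "product"])["order_index"]
    .min()
    .unstack("product")
)

# NaN (product never bought) compares as False, so those customers drop out.
result = first_idx[first_idx[targets[0]] < first_idx[targets[1]]]
print(first_idx)
print(len(result))
```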
Just build it up piece by piece.
Note that this counts a customer only if their first purchase of Product 1 came before their first purchase of Product 2.
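The distinction in that note matters for customers who buy Product 2 more than once. In the sketch below, a hypothetical "Customer 3" orders Product 2, then Product 1, then Product 2 again. Comparing first purchases excludes this customer, while the question's wording ("an order containing Product 1 with a lower order index than an order containing Product 2") includes them; that wording is equivalent to comparing the earliest Product 1 order with the latest Product 2 order. Column names are assumed, as in the question's table.

```python
import pandas as pd

# Hypothetical customer: Product 2, then Product 1, then Product 2 again.
df = pd.DataFrame(
    {
        "customer_id": ["Customer 3"] * 3,
        "order_index": [1, 2, 3],
        "products": ["Product 2", "Product 1", "Product 2, Product 4"],
    }
)

# One row per (order, product) pair.
df2 = df.assign(product=df["products"].str.split(", ")).explode("product")


def order_indexes(product):
    """Order indexes, grouped by customer, of orders containing `product`."""
    return df2.loc[df2["product"] == product].groupby("customer_id")["order_index"]


p1 = order_indexes("Product 1")
p2 = order_indexes("Product 2")

# Collect per-customer summaries in one aligned frame (missing values are NaN).
summary = pd.concat(
    {"first_p1": p1.min(), "first_p2": p2.min(), "last_p2": p2.max()}, axis=1
)
# Rule from the note: first Product 1 purchase before first Product 2 purchase.
summary["first_vs_first"] = summary["first_p1"] < summary["first_p2"]
# Rule from the question: some Product 1 order precedes some Product 2 order.
summary["any_earlier"] = summary["first_p1"] < summary["last_p2"]
print(summary)
```

Which rule is right depends on what "before" should mean for the analysis.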
I was able to get the desired result with the following code: