使用熊猫数据范围对客户数据进行分析

发布于 2025-02-09 10:06:20 字数 769 浏览 1 评论 0原文

我目前正在对企业的客户进行分析。我试图得出的是产品2之前有多少客户购买了产品1。我目前在要分析的时间段的所有订单中都有所有订单的数据。我将其包含在一个数据框架中，即：

客户ID	订单订单	订单产品的订单订单索引	（串联到字符串中）
客户1	订单111	1	产品1，产品3，产品4
客户1	订单112	2	产品2
客户2客户2	订单113	1	产品2，产品4
客户2	订单114	2	产品1

<代码>订单索引表示订单属于客户寿命的位置在1。我有一个想法，即如何设置此设置，但我无法将其连接到pandas/python中的执行。

从本质上讲，我想使用逻辑：

有多少客户的订单包含较低订单索引的订单比包含产品2的订单2。

每个记录都是订单。在上表示例中，我想计算客户1，而不是2，因为客户1在产品2之前购买了产品1

。我可以尝试从中获得这些见解。我开始旋转车轮，并想在这里伸出援手以获取想法或解决方案。

原文

I am currently working on an analysis of customers at a business. What I am trying to derive is how many customers purchased Product 1 before Product 2. I currently have data for all orders for the time period I am looking to analyze. I have it in a dataframe that is:

Customer ID	Order ID	Index of Order	Products on Order (concatenated into a string)
Customer 1	Order 111	1	Product 1, Product 3, Product 4
Customer 1	Order 112	2	Product 2
Customer 2	Order 113	1	Product 2, Product 4
Customer 2	Order 114	2	Product 1

The index of order represents where the order falls within the lifespan of a customer, i.e. someone's first order, second order, etc. indexed as an integer starting at 1. I have an idea of how to set this up but I can't connect it to execution within Pandas/Python.

Essentially I want to use the logic:

How many customers have an order containing Product 1 with a lower order index than an order containing Product 2.

Each record is an order. In the above table example I would like for Customer 1 to be counted, but not 2 because customer 1 purchased Product 1 before Product 2.

I don't really have any code right now outside of cleaning the data to get to a workable data set that I can attempt to derive these insights from. I started spinning my wheels and wanted to reach out here for ideas or solutions.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

辞取 2025-02-16 10:06:21

这是一种方法，首先使用爆炸产品创建另一个数据框，然后根据您的目标使用它切片：

df2 = (df
 .assign(product=df['Products on Order'].str.split(', '))
 .explode('product')
)

target = ('Product 1', 'Product 2')
s0 = df2.loc[df2['product'].eq(target[0])].set_index('Customer ID')['Index of Order']
s1 = df2.loc[df2['product'].eq(target[1])].set_index('Customer ID')['Index of Order']
s1.sub(s0).gt(0)

输出：

Customer ID
Customer 1     True
Customer 2    False
Name: Index of Order, dtype: bool

中间df2：

  Customer ID   Order ID  Index of Order                Products on Order    product
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 1
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 3
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 4
1  Customer 1  Order 112               2                        Product 2  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 4
3  Customer 2  Order 114               2                        Product 1  Product 1

Here is one approach, first create another DataFrame with exploded products, then use it to slice based on your targets:

df2 = (df
 .assign(product=df['Products on Order'].str.split(', '))
 .explode('product')
)

target = ('Product 1', 'Product 2')
s0 = df2.loc[df2['product'].eq(target[0])].set_index('Customer ID')['Index of Order']
s1 = df2.loc[df2['product'].eq(target[1])].set_index('Customer ID')['Index of Order']
s1.sub(s0).gt(0)

output:

Customer ID
Customer 1     True
Customer 2    False
Name: Index of Order, dtype: bool

intermediate df2:

  Customer ID   Order ID  Index of Order                Products on Order    product
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 1
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 3
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 4
1  Customer 1  Order 112               2                        Product 2  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 4
3  Customer 2  Order 114               2                        Product 1  Product 1

回复收藏 0 原文

花海 2025-02-16 10:06:21

只需逐步建立它即可。

total_customers = df['Customer ID'].nunique()

desired_customers = 0
first_product = 'Product 1'
second_product = 'Product 2'

for customer in df['Customer ID'].unique():
    orders = df[df['Customer ID'] == customer]
    order_contents = orders['Products on Order (concatenated into a string)'].sum()

    if first_product in order_contents and second_product in order_contents:
        first_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(first_product)]['Index of Order'].iloc[0]
        second_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(second_product)]['Index of Order'].iloc[0]
        
        if first_prod_ind < second_prod_ind:
            desired_customers += 1

print(f"Customers who bought {first_product} first: {desired_customers}")
print(f"Total customers: {total_customers}")

请注意，仅当客户第一次购买产品1是在他们第一次购买产品2之前就计算的。

Just build it up piece by piece.

total_customers = df['Customer ID'].nunique()

desired_customers = 0
first_product = 'Product 1'
second_product = 'Product 2'

for customer in df['Customer ID'].unique():
    orders = df[df['Customer ID'] == customer]
    order_contents = orders['Products on Order (concatenated into a string)'].sum()

    if first_product in order_contents and second_product in order_contents:
        first_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(first_product)]['Index of Order'].iloc[0]
        second_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(second_product)]['Index of Order'].iloc[0]
        
        if first_prod_ind < second_prod_ind:
            desired_customers += 1

print(f"Customers who bought {first_product} first: {desired_customers}")
print(f"Total customers: {total_customers}")

Note this only counts customers if the first time they bought Product 1 was before the first time they bought Product 2.

回复收藏 0 原文

长途伴 2025-02-16 10:06:21

我能够使用以下代码完成所需的结果

df['Prod_1'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 1'), df['Index of Order'], 0)
df['Prod_2'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 2'), df['Index of Order'], 0)
df['Prod_1'] = df.groupby('Customer ID')['Prod_1'].transform(max)
df['Prod_2'] = df.groupby('Customer ID')['Prod_2'].transform(max)
df = df.loc[np.where(df['Prod_1'].lt(df['Prod_2'], True, False))]
df['Customer ID'].unique().tolist()

I was able to accomplish the desired results with the following code

df['Prod_1'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 1'), df['Index of Order'], 0)
df['Prod_2'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 2'), df['Index of Order'], 0)
df['Prod_1'] = df.groupby('Customer ID')['Prod_1'].transform(max)
df['Prod_2'] = df.groupby('Customer ID')['Prod_2'].transform(max)
df = df.loc[np.where(df['Prod_1'].lt(df['Prod_2'], True, False))]
df['Customer ID'].unique().tolist()

回复收藏 0 原文

~没有更多了~