使用熊猫数据范围对客户数据进行分析

发布于 2025-02-09 10:06:20 字数 769 浏览 1 评论 0原文

我目前正在对企业的客户进行分析。我试图得出的是产品2之前有多少客户购买了产品1。我目前在要分析的时间段的所有订单中都有所有订单的数据。我将其包含在一个数据框架中,即:

客户ID订单订单订单产品的订单订单索引(串联到字符串中)
客户1订单1111产品1,产品3,产品4
客户1订单1122产品2
客户2客户2订单1131产品2,产品4
客户2订单1142产品1

<代码>订单索引表示订单属于客户寿命的位置在1。我有一个想法,即如何设置此设置,但我无法将其连接到pandas/python中的执行。

从本质上讲,我想使用逻辑:

  • 有多少客户的订单包含较低订单索引的订单比包含产品2的订单2。

每个记录都是订单。在上表示例中,我想计算客户1,而不是2,因为客户1在产品2之前购买了产品1

。我可以尝试从中获得这些见解。我开始旋转车轮,并想在这里伸出援手以获取想法或解决方案。

I am currently working on an analysis of customers at a business. What I am trying to derive is how many customers purchased Product 1 before Product 2. I currently have data for all orders for the time period I am looking to analyze. I have it in a dataframe that is:

Customer IDOrder IDIndex of OrderProducts on Order (concatenated into a string)
Customer 1Order 1111Product 1, Product 3, Product 4
Customer 1Order 1122Product 2
Customer 2Order 1131Product 2, Product 4
Customer 2Order 1142Product 1

The index of order represents where the order falls within the lifespan of a customer, i.e. someone's first order, second order, etc. indexed as an integer starting at 1. I have an idea of how to set this up but I can't connect it to execution within Pandas/Python.

Essentially I want to use the logic:

  • How many customers have an order containing Product 1 with a lower order index than an order containing Product 2.

Each record is an order. In the above table example I would like for Customer 1 to be counted, but not 2 because customer 1 purchased Product 1 before Product 2.

I don't really have any code right now outside of cleaning the data to get to a workable data set that I can attempt to derive these insights from. I started spinning my wheels and wanted to reach out here for ideas or solutions.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

辞取 2025-02-16 10:06:21

这是一种方法,首先使用爆炸产品创建另一个数据框,然后根据您的目标使用它切片:

df2 = (df
 .assign(product=df['Products on Order'].str.split(', '))
 .explode('product')
)

target = ('Product 1', 'Product 2')
s0 = df2.loc[df2['product'].eq(target[0])].set_index('Customer ID')['Index of Order']
s1 = df2.loc[df2['product'].eq(target[1])].set_index('Customer ID')['Index of Order']
s1.sub(s0).gt(0)

输出:

Customer ID
Customer 1     True
Customer 2    False
Name: Index of Order, dtype: bool

中间df2

  Customer ID   Order ID  Index of Order                Products on Order    product
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 1
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 3
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 4
1  Customer 1  Order 112               2                        Product 2  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 4
3  Customer 2  Order 114               2                        Product 1  Product 1

Here is one approach, first create another DataFrame with exploded products, then use it to slice based on your targets:

df2 = (df
 .assign(product=df['Products on Order'].str.split(', '))
 .explode('product')
)

target = ('Product 1', 'Product 2')
s0 = df2.loc[df2['product'].eq(target[0])].set_index('Customer ID')['Index of Order']
s1 = df2.loc[df2['product'].eq(target[1])].set_index('Customer ID')['Index of Order']
s1.sub(s0).gt(0)

output:

Customer ID
Customer 1     True
Customer 2    False
Name: Index of Order, dtype: bool

intermediate df2:

  Customer ID   Order ID  Index of Order                Products on Order    product
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 1
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 3
0  Customer 1  Order 111               1  Product 1, Product 3, Product 4  Product 4
1  Customer 1  Order 112               2                        Product 2  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 2
2  Customer 2  Order 113               1             Product 2, Product 4  Product 4
3  Customer 2  Order 114               2                        Product 1  Product 1
花海 2025-02-16 10:06:21

只需逐步建立它即可。

total_customers = df['Customer ID'].nunique()

desired_customers = 0
first_product = 'Product 1'
second_product = 'Product 2'

for customer in df['Customer ID'].unique():
    orders = df[df['Customer ID'] == customer]
    order_contents = orders['Products on Order (concatenated into a string)'].sum()

    if first_product in order_contents and second_product in order_contents:
        first_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(first_product)]['Index of Order'].iloc[0]
        second_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(second_product)]['Index of Order'].iloc[0]
        
        if first_prod_ind < second_prod_ind:
            desired_customers += 1

print(f"Customers who bought {first_product} first: {desired_customers}")
print(f"Total customers: {total_customers}")

请注意,仅当客户第一次购买产品1是在他们第一次购买产品2之前就计算的。

Just build it up piece by piece.

total_customers = df['Customer ID'].nunique()

desired_customers = 0
first_product = 'Product 1'
second_product = 'Product 2'

for customer in df['Customer ID'].unique():
    orders = df[df['Customer ID'] == customer]
    order_contents = orders['Products on Order (concatenated into a string)'].sum()

    if first_product in order_contents and second_product in order_contents:
        first_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(first_product)]['Index of Order'].iloc[0]
        second_prod_ind = orders[orders['Products on Order (concatenated into a string)'].str.contains(second_product)]['Index of Order'].iloc[0]
        
        if first_prod_ind < second_prod_ind:
            desired_customers += 1

print(f"Customers who bought {first_product} first: {desired_customers}")
print(f"Total customers: {total_customers}")

Note this only counts customers if the first time they bought Product 1 was before the first time they bought Product 2.

长途伴 2025-02-16 10:06:21

我能够使用以下代码完成所需的结果

df['Prod_1'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 1'), df['Index of Order'], 0)
df['Prod_2'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 2'), df['Index of Order'], 0)
df['Prod_1'] = df.groupby('Customer ID')['Prod_1'].transform(max)
df['Prod_2'] = df.groupby('Customer ID')['Prod_2'].transform(max)
df = df.loc[np.where(df['Prod_1'].lt(df['Prod_2'], True, False))]
df['Customer ID'].unique().tolist()

I was able to accomplish the desired results with the following code

df['Prod_1'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 1'), df['Index of Order'], 0)
df['Prod_2'] = np.where(df['Products on Order (concatenated into a string)'].str.contains('Product 2'), df['Index of Order'], 0)
df['Prod_1'] = df.groupby('Customer ID')['Prod_1'].transform(max)
df['Prod_2'] = df.groupby('Customer ID')['Prod_2'].transform(max)
df = df.loc[np.where(df['Prod_1'].lt(df['Prod_2'], True, False))]
df['Customer ID'].unique().tolist()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文