Correct way to calculate a probability given a condition
I have some data which shows how many orders were made by a certain customer group that bought a certain product type:
[image: https://i.sstatic.net/wce85.png]
And data in the same format, but showing how many refunds were made:
I am trying to answer a question:
What is the probability that an order is made by a customer in the group [A - B] and is refunded?
My approach was:
being_in_group = df_final[df_final.customer_group.isin(['A','B'])]\
    .groupby('customer_group')\
    .agg({'order_id': 'count'}).sum(axis = 0)
all_orders = df_final.groupby('customer_group').agg({'order_id': 'count'})\
    .sum(axis = 0)
p_being_in_group = round(being_in_group / all_orders, 5)

being_refunded = df_final[(df_final.refund == True) & (df_final.customer_group.isin(['A','B']))]\
    .groupby('customer_group')\
    .agg({'order_id': 'count'})\
    .sum(axis = 0)
# or taking all customer groups
being_refunded_all = df_final[(df_final.refund == True)]\
    .groupby('customer_group')\
    .agg({'order_id': 'count'})\
    .sum(axis = 0)
p_being_refunded = round(being_refunded / all_orders, 5)
p_being_refunded_all = round(being_refunded_all / all_orders, 5)

p_final_1 = p_being_in_group * p_being_refunded * 100
p_final_2 = p_being_in_group * p_being_refunded_all * 100
I am wondering whether that is the correct approach: calculating the probability of an order being made by group A or B and then checking the refunded orders. Should I check the refunded orders in all of the data, or only in the data where customer_group is A or B?
Comments (1)
Okay, so first, a bit of a technicality: with a given sample dataset you can calculate proportions, not probabilities. You can assume that the probability will be relatively close to the proportion, but that is just an assumption. More about the difference here.
To find this proportion, you have to identify 2 sets of observations. The first is the total observation space that interests us. In your case we don't make any assumptions, so we take the whole observation space, that is, all the orders. Sum every number in all_orders and that will be our total number of events.
The second set we have to find is the one that matches all the conditions.
We have 2 conditions:
1. The order was made by a customer in group A or B.
2. The order was refunded.
The second one is easy: if an order is refunded, it is counted in the second table.
So now we just have to count the ones in there that satisfy the first condition (13).
Divide 13 by whatever the sum of all_orders is, and you get the overall proportion of orders that are refunded and in group A or B.
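The counting described above can be sketched in pandas roughly as follows. This is only an illustration: the toy rows are made up (they are not the data from the question), and the sketch assumes df_final has one row per order with the customer_group and refund columns used in the question.

```python
import pandas as pd

# Hypothetical toy data, one row per order; column names follow the question.
df_final = pd.DataFrame({
    "order_id": range(1, 11),
    "customer_group": ["A", "A", "B", "B", "C", "C", "C", "D", "D", "D"],
    "refund": [True, False, True, False, True, False, False, False, False, False],
})

in_group = df_final["customer_group"].isin(["A", "B"])  # condition 1
refunded = df_final["refund"]                            # condition 2

total_orders = len(df_final)                 # whole observation space
n_joint = int((in_group & refunded).sum())   # orders meeting both conditions

# Proportion of all orders that are in group A or B and refunded.
p_joint = n_joint / total_orders

# Equivalent decomposition: share of orders in the group, times the
# share of refunds among the orders of that group only.
p_group = in_group.mean()
p_refund_given_group = refunded[in_group].mean()

print(p_joint, p_group * p_refund_given_group)
```

In the question's terms, being_refunded / all_orders is already this joint proportion, so multiplying it by p_being_in_group again would apply the group condition twice. Multiplying p_being_in_group by the refund share of all orders (p_being_refunded_all) gives the joint proportion only if refunds are independent of the customer group.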