Finding the top proportions within subgroups

Posted on 2025-01-18 01:57:53

I have a data set, of which I can share a small piece:

library(dplyr)  # provides %>%, group_by(), summarise() and sample_n()

ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)
df <- data.frame(ID, country, sold_items)

> df %>% sample_n(5)
  ID country sold_items
1  3      UK          0
2  1     USA          1
3  3      UK          0
4  2     USA          0
5  1  Canada          0

I was able to find the sales rate for each country as follows:

df %>% group_by(country) %>% 
  summarise(n_total=n(), per_total=round(n()/nrow(.),digits= 4)*100,
            sales_rate=sum(sold_items[sold_items==1])/n_total * 100)

 country n_total per_total sales_rate
1 Canada        5      29.4         80
2 Mexico        4      23.5         75
3 UK            4      23.5         25
4 USA           4      23.5         50
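
As a side check on the rate above, and assuming `sold_items` only ever holds 0 or 1 (true for the sample, not verified for the full data set), the `sold_items[sold_items==1]` filter does not change the sum, so the rate is just the group mean times 100. A minimal base R sketch of that equivalence (`rate_filtered` and `rate_mean` are names made up for this illustration):

```r
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)

# Rate as written in the question: sum only the entries equal to 1
rate_filtered <- tapply(sold_items, country,
                        function(x) sum(x[x == 1]) / length(x) * 100)

# Equivalent form for 0/1 data: the group mean, scaled to percent
rate_mean <- tapply(sold_items, country, mean) * 100

# TRUE when both forms agree for every country
all.equal(rate_filtered, rate_mean)
```

If `sold_items` can take values other than 0 and 1, the two forms differ and the filtered version is the one that matches the question.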

However, I need to add 5 separate columns for each country, showing the top 2 sales IDs in that country and the sales proportion of each top ID in that country. For example, in Canada the sales_rate is 80 (%), and I need to know how much of it comes from top_ID_1 (%) and how much from top_ID_2 (%). Eventually I also need one column that aggregates the names of the top IDs in each country.
So, my ideal hypothetical data set would look like:

  country n_total per_total sales_rate top_ID_1 top_ID_1 (%) top_ID_2 top_ID_2 (%) names_top_IDs
1 Canada        5      29.4         80
2 Mexico        4      23.5         75
3 UK            4      23.5         25
4 USA           4      23.5         50

往昔成烟 2025-01-25 01:57:53

I think this will do the trick:

df %>% 
  group_by(country) %>% 
  summarise(
    n_total = n(), 
    per_total = round(n()/nrow(.),digits= 4)*100,
    sales_rate = sum(sold_items)/n_total * 100, 
    top_ID_1 = names(sort(table(ID), decreasing = TRUE)[1]), 
    top_ID_1_per = sum(sold_items[ID == top_ID_1]/n_total * 100)
  )

Although I don't think it will handle ties nicely...
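
One possible way to extend this to the top two IDs and make ties deterministic is sketched below. It rests on assumptions not confirmed by the asker: IDs are ranked by the number of sold items (the code above ranks by how often an ID appears in the country), each percentage is that ID's sold items divided by the country's row count (so the two values are shares of `sales_rate`), and ties are broken by the smaller ID. The names `country_stats`, `top_ids` and `result`, and the `_pct` column suffix, are made up for this illustration.

```r
library(dplyr)

ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)
df <- data.frame(ID, country, sold_items)

# Per-country totals, same definitions as in the question
country_stats <- df %>%
  group_by(country) %>%
  summarise(
    n_total    = n(),
    per_total  = round(n() / nrow(df), digits = 4) * 100,
    sales_rate = sum(sold_items) / n() * 100,
    .groups    = "drop"
  )

# Sold items per (country, ID), best seller first within each country.
# Assumption: ties in `sold` are broken by the smaller ID.
top_ids <- df %>%
  count(country, ID, wt = sold_items, name = "sold") %>%
  arrange(country, desc(sold), ID) %>%
  group_by(country) %>%
  summarise(
    top_ID_1      = ID[1],
    top_ID_1_sold = sold[1],
    top_ID_2      = ID[2],      # NA if the country has only one ID
    top_ID_2_sold = sold[2],
    names_top_IDs = paste(head(ID, 2), collapse = ", "),
    .groups       = "drop"
  )

# Combine and express each top ID's sales as a percentage of the country's rows
result <- country_stats %>%
  left_join(top_ids, by = "country") %>%
  mutate(
    top_ID_1_pct = top_ID_1_sold * 100 / n_total,
    top_ID_2_pct = top_ID_2_sold * 100 / n_total
  ) %>%
  select(country, n_total, per_total, sales_rate,
         top_ID_1, top_ID_1_pct, top_ID_2, top_ID_2_pct, names_top_IDs)

result
```

On the sample data, Canada comes out with ID 4 first (2 sold out of 5 rows, 40%) and ID 2 second (20%). ID 2 is second only because its tie with ID 3 is broken by the smaller ID. If tied IDs should all be reported instead, `slice_max()` with `with_ties = TRUE` may be a better starting point than indexing the first two rows.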
