Find the largest proportion within subgroups

Posted 2025-01-18 01:57:53

I have a data set, of which I can share a small piece:

library(dplyr)  # provides %>%, sample_n(), group_by(), summarise()

ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)
df <- data.frame(ID, country, sold_items)

> df %>% sample_n(5)
  ID country sold_items
1  3      UK          0
2  1     USA          1
3  3      UK          0
4  2     USA          0
5  1  Canada          0

I was able to find the sales rate for each country as follows:

df %>% group_by(country) %>% 
  summarise(n_total=n(), per_total=round(n()/nrow(.),digits= 4)*100,
            sales_rate=sum(sold_items[sold_items==1])/n_total * 100)
  country n_total per_total sales_rate
1 Canada        5      29.4         80
2 Mexico        4      23.5         75
3 UK            4      23.5         25
4 USA           4      23.5         50

However, I need to add 5 separate columns for each country: the top 2 sales IDs in that country, the sales proportion of each of those top IDs, and finally one column that aggregates the names of the top IDs in each country. For example, in Canada the sales_rate is 80 (%), and I need to know how much of it comes from top_ID_1 (%) and how much from top_ID_2 (%).
So my ideal hypothetical data set would look like:

  country n_total per_total sales_rate top_ID_1 top_ID_1 (%) top_ID_2 top_ID_2 (%) names_top_IDs
1 Canada        5      29.4         80
2 Mexico        4      23.5         75
3 UK            4      23.5         25
4 USA           4      23.5         50

Comments (1)

往昔成烟 2025-01-25 01:57:53

I think this will do the trick:

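df %>% 
  group_by(country) %>% 
  summarise(
    n_total = n(), 
    per_total = round(n()/nrow(.),digits= 4)*100,
    sales_rate = sum(sold_items)/n_total * 100, 
    # most frequent ID in the country (by row count, returned as a character string)
    top_ID_1 = names(sort(table(ID), decreasing = TRUE)[1]), 
    # items sold by that ID, as a percentage of all rows in the country
    top_ID_1_per = sum(sold_items[ID == top_ID_1])/n_total * 100
  )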

Although I don't think it will handle ties nicely...
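
For the second top ID and the ties, here is a rough sketch of one possible extension, not a definitive solution. Note that the snippet above picks the most frequent ID per country, whereas this sketch ranks IDs by items sold, which is only one reading of the question. The tie-breaking rule (the smaller ID wins) and the helper names `country_stats`, `top_ids`, `sold`, `pct` and `result` are assumptions made for illustration.

```r
library(dplyr)

# Sample data from the question
ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)
df <- data.frame(ID, country, sold_items)

# Per-country totals, same columns as in the question
country_stats <- df %>%
  group_by(country) %>%
  summarise(
    n_total    = n(),
    per_total  = round(n() / nrow(df) * 100, 2),
    sales_rate = sum(sold_items) / n() * 100,
    .groups = "drop"
  )

# Items sold per ID within each country, ranked by items sold.
# Assumption: ties are broken by the smaller ID (arbitrary rule).
top_ids <- df %>%
  group_by(country, ID) %>%
  summarise(sold = sum(sold_items), .groups = "drop") %>%
  left_join(select(country_stats, country, n_total), by = "country") %>%
  mutate(pct = sold / n_total * 100) %>%
  arrange(country, desc(sold), ID) %>%
  group_by(country) %>%
  summarise(
    top_ID_1      = ID[1],
    top_ID_1_pct  = pct[1],
    top_ID_2      = ID[2],   # NA if the country has only one ID
    top_ID_2_pct  = pct[2],
    names_top_IDs = paste(head(ID, 2), collapse = ", "),
    .groups = "drop"
  )

result <- left_join(country_stats, top_ids, by = "country")
print(result)
```

With the sample data, Canada's top IDs come out as 4 (40%) and 2 (20%); ID 3 also sold one item there, so which of 2 or 3 lands in second place depends entirely on the assumed tie-break. Changing the `arrange()` call is the place to plug in a different rule.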
