Find the top proportions within subgroups
I have a data set, of which I can share a small piece:
library(dplyr)  # provides %>%, group_by(), summarise(), sample_n()

ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)
df <- data.frame(ID, country, sold_items)
> df %>% sample_n(5)
ID country sold_items
1 3 UK 0
2 1 USA 1
3 3 UK 0
4 2 USA 0
5 1 Canada 0
I was able to find the sales rate for each country as follows:
df %>%
  group_by(country) %>%
  summarise(n_total = n(),
            per_total = round(n() / nrow(.), digits = 4) * 100,
            sales_rate = sum(sold_items[sold_items == 1]) / n_total * 100)
country n_total per_total sales_rate
1 Canada 5 29.4 80
2 Mexico 4 23.5 75
3 UK 4 23.5 25
4 USA 4 23.5 50
However, I need to add five more columns for each country: the top two selling IDs in that country, the share of sales attributable to each of them, and one column that lists the names of the top IDs in each country. For example, in Canada the sales_rate is 80 (%), and I need to know how much of it comes from top_ID_1 and how much from top_ID_2.
So my ideal, hypothetical data set would look like this:
country n_total per_total sales_rate top_ID_1 top_ID_1(%) top_ID_2 top_ID_2(%) names_top_IDs
1 Canada 5 29.4 80
2 Mexico 4 23.5 75
3 UK 4 23.5 25
4 USA 4 23.5 50
Comments (1)
I think this will do the trick:
Although I don't think it will handle ties nicely...
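The code for this answer is not shown on this page, so what follows is not the answerer's original solution. It is only a minimal sketch of one way to get the requested columns, using dplyr plus tidyr (tidyr is an assumption; the question only uses dplyr). It also assumes that each top ID's share is measured against the country's row count (n_total), so that the shares add up to at most sales_rate, and that IDs with no sales never count as "top". The column names top_pct_1 and top_pct_2 are made up here to stand in for top_ID_1(%) and top_ID_2(%). Ties are broken explicitly by taking the smaller ID.

```r
library(dplyr)
library(tidyr)

ID <- c(1,1,2,3,3,1,2,4,2,1,2,1,4,3,1,2,3)
country <- c("USA","Canada","Mexico","UK","UK","Mexico",
             "USA","Canada","Canada","Mexico","UK","Mexico","Canada","Canada",
             "USA","USA","UK")
sold_items <- c(1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,0,0)
df <- data.frame(ID, country, sold_items)

# Per-country totals, same figures as in the question
by_country <- df %>%
  group_by(country) %>%
  summarise(n_total = n(),
            per_total = round(n() / nrow(df), digits = 4) * 100,
            sales_rate = mean(sold_items) * 100,
            .groups = "drop")

# Items sold per ID within each country, keeping the best two
top_ids <- df %>%
  group_by(country, ID) %>%
  summarise(sold = sum(sold_items), .groups = "drop") %>%
  filter(sold > 0) %>%                           # assumption: zero sales is never "top"
  group_by(country) %>%
  arrange(desc(sold), ID, .by_group = TRUE) %>%  # ties go to the smaller ID
  slice_head(n = 2) %>%
  mutate(rank = row_number(),
         names_top_IDs = paste(ID, collapse = ", ")) %>%
  ungroup()

# One row per country: top_ID_1, top_ID_2, top_pct_1, top_pct_2
top_wide <- top_ids %>%
  left_join(select(by_country, country, n_total), by = "country") %>%
  mutate(pct = sold / n_total * 100) %>%         # share of the country's rows
  select(country, names_top_IDs, rank, ID, pct) %>%
  pivot_wider(names_from = rank, values_from = c(ID, pct),
              names_glue = "top_{.value}_{rank}")

result <- by_country %>%
  left_join(top_wide, by = "country") %>%
  select(country, n_total, per_total, sales_rate,
         top_ID_1, top_pct_1, top_ID_2, top_pct_2, names_top_IDs)

print(result)
```

With the sample data, Canada's top ID is 4 (40% of Canada's rows). IDs 2 and 3 tie for second place with one sale each, and the sort picks ID 2. The UK and the USA each have only one ID with any sales, so their top_ID_2 and top_pct_2 are NA. To keep every tied ID instead, slice_max(sold, n = 2) with its default with_ties = TRUE could replace the arrange/slice_head pair, but the wide layout would then need more than two column pairs.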