How do I calculate a cumulative sum for a specific column in R?
I have sales data by year and by product, for example:
Year <- c(2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012)
Model <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)
First, I need to calculate a "Share" column, which represents each product's share of total sales within its year.
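For reference, here is a minimal base R sketch of the share and cumulative share calculation. It assumes that Sale has to be converted from character to numeric first (it is quoted in the example data), that shares are expressed in percent, and that the cumulative share is accumulated from the largest share downwards within each year. The column name Cum_share is my own choice, not something taken from the question.

```r
Year  <- rep(c(2010, 2011, 2012), each = 5)
Model <- rep(c("a", "b", "c", "d", "e"), times = 3)
Sale  <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)

# Sale is stored as character in the example, so convert it before doing arithmetic
df$Sale <- as.numeric(df$Sale)

# Share of each product within its year, in percent
df$Share <- 100 * df$Sale / ave(df$Sale, df$Year, FUN = sum)

# Sort by year and descending share (assumed ordering), then accumulate within each year
df <- df[order(df$Year, -df$Share), ]
df$Cum_share <- ave(df$Share, df$Year, FUN = cumsum)

print(df, row.names = FALSE)
```

With this ordering the last row of each year has a cumulative share of 100.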
Then I compute the cumulative share like this:
In the third step, I need to identify the products that together account for up to 70% of total sales in the last year (2012 in this case), keep only these products in the whole data frame, add a ranking column (based on the last year), and summarise all remaining products in a category "other". So the final data frame should look like this:
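One way to read the three steps, sketched with dplyr. This is only an illustration under a few assumptions: "up to 70%" is taken to mean that a product is kept while its cumulative share in the last year is at most 70, ties in sales are broken alphabetically, and the names threshold, last_year, keep, Rank and result are made up for this example.

```r
library(dplyr)

df <- data.frame(
  Year  = rep(c(2010, 2011, 2012), each = 5),
  Model = rep(c("a", "b", "c", "d", "e"), times = 3),
  Sale  = as.numeric(c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12"))
)

threshold <- 70  # percent; the cut-off mentioned in the question

# Rank the products of the last year by sales and accumulate their shares
last_year <- df %>%
  filter(Year == max(Year)) %>%
  arrange(desc(Sale), Model) %>%            # alphabetical tie-break is an assumption
  mutate(Share     = 100 * Sale / sum(Sale),
         Cum_share = cumsum(Share),
         Rank      = row_number())

# Products whose cumulative share stays within the threshold
keep <- last_year %>%
  filter(Cum_share <= threshold) %>%
  select(Model, Rank)

# Collapse everything else into "other", then recompute shares per year
result <- df %>%
  mutate(Model = if_else(Model %in% keep$Model, Model, "other")) %>%
  group_by(Year, Model) %>%
  summarise(Sale = sum(Sale), .groups = "drop") %>%
  group_by(Year) %>%
  mutate(Share = 100 * Sale / sum(Sale)) %>%
  ungroup() %>%
  left_join(keep, by = "Model") %>%         # "other" gets Rank NA
  arrange(Year, Rank)

print(result)
```

The "other" rows have no rank here (NA) and sort to the end of each year; whether they should get a rank of their own depends on the output you expect.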
This is a fairly complex data wrangling task, but can be achieved using dplyr:
Note that in the output this code has kept groups a and c, rather than c and d, as in your expected output. This is because a and d have the same value in the final year (16.6), and therefore either could be chosen.
Created on 2022-04-21 by the reprex package (v2.0.1)
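To make the tie concrete, here is a small base R sketch using only the 2012 figures from the question. The helper keep_up_to and its prefer argument are hypothetical names introduced for this illustration; the idea is simply that whichever of a and d sorts first among equal shares ends up inside the 70% cut-off.

```r
sale_2012 <- c(a = 33, b = 32, c = 89, d = 33, e = 12)
share <- 100 * sale_2012 / sum(sale_2012)

round(share[c("a", "d")], 1)  # both are 16.6

# Hypothetical helper: keep products while the cumulative share is <= threshold.
# Among equal shares, the product listed earlier in `prefer` is ranked first.
keep_up_to <- function(share, threshold = 70, prefer = names(share)) {
  ord <- order(-share, match(names(share), prefer))
  sorted <- share[ord]
  names(sorted)[cumsum(sorted) <= threshold]
}

keep_up_to(share)                                         # "c" "a"
keep_up_to(share, prefer = c("d", "a", "b", "c", "e"))    # "c" "d"
```

So if the expected output needs d rather than a, the tie-break rule has to be stated explicitly.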
gives
I don't understand what you mean by step 3. How do you decide which products to keep?