根据年份和其他分类变量添加共享列中的r中的数据框架

发布于 2025-01-22 15:35:16 字数 652 浏览 0 评论 0原文

我现在有销售数据,状况和产品

Year <- c(2010,2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012,2012)
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12",18,10,17)
Condition <- c("New","New","New","Used","Used","Used","New","New","New","Used","Used","Used","New","New","New","Used","Used","Used")
Product <- c("a","b","c","a","b","c","a","b","c","a","b","c","a","b","c","a","b","c")
df <- data.frame(Year,Condition, Product, Sale)

,现在我想根据每年的状况变量来计算每种产品的份额。我尝试了以下代码,但根据总数按年和“条件”计算得出

df$percentage <- df$Sale/sum(df$Sale)*100

I have sales data by year, condition and products

Year <- c(2010,2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012,2012)
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12",18,10,17)
Condition <- c("New","New","New","Used","Used","Used","New","New","New","Used","Used","Used","New","New","New","Used","Used","Used")
Product <- c("a","b","c","a","b","c","a","b","c","a","b","c","a","b","c","a","b","c")
df <- data.frame(Year,Condition, Product, Sale)

Now I want to calculate the share of each product by condition variable within each year. I tried the following code, but it calculates based on total no by year and "condition"

df$percentage <- df$Sale/sum(df$Sale)*100

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

待天淡蓝洁白时 2025-01-29 15:35:17

首先使用type.convert(as.is = true)将从字符到数字,

然后按所需列进行分组,然后应用汇总

请注意,在您提供的数据框中,由于您提供的数据,您将获得100个百分比:

此伪造的数据,

set.seed(123)
Year <- sample(c(2010, 2011, 2012), 18, replace = TRUE)
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12",18,10,17)
Condition <- sample(c("Used","New"), 18, replace = TRUE)
Product <- sample(c("a","b","c"), 18, replace = TRUE)
df <- data.frame(Year,Condition, Product, Sale)

使用

library(dplyr)
df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(Year, Product, Condition) %>% 
  summarise(percentage = Sale/sum(Sale)*100)

您将获得:

    Year Product Condition percentage
   <int> <chr>   <chr>          <dbl>
 1  2010 a       Used            83.2
 2  2010 a       Used            16.8
 3  2010 c       New            100  
 4  2011 a       New            100  
 5  2011 a       Used            42.9
 6  2011 a       Used            14.3
 7  2011 a       Used            42.9
 8  2011 b       New            100  
 9  2011 c       New             49.2
10  2011 c       New             50.8
11  2012 a       Used            63.8
12  2012 a       Used            36.2
13  2012 b       New            100  
14  2012 b       Used            69.7
15  2012 b       Used            30.3
16  2012 c       New            100  
17  2012 c       Used            34.8
18  2012 c       Used            65.2

更新:保留 sale> sale column:用 替换突变

df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(Year, Product, Condition) %>% 
  mutate(percentage = paste(round(Sale/sum(Sale)*100, 1), "%"))
    Year Condition Product  Sale percentage
   <int> <chr>     <chr>   <int> <chr>     
 1  2012 Used      a          30 63.8 %    
 2  2012 New       c          45 100 %     
 3  2012 Used      b          23 69.7 %    
 4  2011 Used      a          33 42.9 %    
 5  2012 Used      c          24 34.8 %    
 6  2011 Used      a          11 14.3 %    
 7  2011 New       a          56 100 %     
 8  2011 New       b          19 100 %     
 9  2012 Used      c          45 65.2 %    
10  2010 New       c          56 100 %     
11  2011 Used      a          33 42.9 %    
12  2011 New       c          32 49.2 %    
13  2010 Used      a          89 83.2 %    
14  2011 New       c          33 50.8 %    
15  2012 New       b          12 100 %     
16  2010 Used      a          18 16.8 %    
17  2012 Used      b          10 30.3 %    
18  2012 Used      a          17 36.2 % 

First convert Sale from character to numeric with type.convert(as.is = TRUE),

then group by the desired columns and apply summarise:

Note that in your provided dataframe you will get 100 for percentage because of your provided data:

With this fake data

set.seed(123)
Year <- sample(c(2010, 2011, 2012), 18, replace = TRUE)
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12",18,10,17)
Condition <- sample(c("Used","New"), 18, replace = TRUE)
Product <- sample(c("a","b","c"), 18, replace = TRUE)
df <- data.frame(Year,Condition, Product, Sale)

using this code

library(dplyr)
df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(Year, Product, Condition) %>% 
  summarise(percentage = Sale/sum(Sale)*100)

you will get:

    Year Product Condition percentage
   <int> <chr>   <chr>          <dbl>
 1  2010 a       Used            83.2
 2  2010 a       Used            16.8
 3  2010 c       New            100  
 4  2011 a       New            100  
 5  2011 a       Used            42.9
 6  2011 a       Used            14.3
 7  2011 a       Used            42.9
 8  2011 b       New            100  
 9  2011 c       New             49.2
10  2011 c       New             50.8
11  2012 a       Used            63.8
12  2012 a       Used            36.2
13  2012 b       New            100  
14  2012 b       Used            69.7
15  2012 b       Used            30.3
16  2012 c       New            100  
17  2012 c       Used            34.8
18  2012 c       Used            65.2

Update: to keep Sale column: replace summarise with mutate

df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(Year, Product, Condition) %>% 
  mutate(percentage = paste(round(Sale/sum(Sale)*100, 1), "%"))
    Year Condition Product  Sale percentage
   <int> <chr>     <chr>   <int> <chr>     
 1  2012 Used      a          30 63.8 %    
 2  2012 New       c          45 100 %     
 3  2012 Used      b          23 69.7 %    
 4  2011 Used      a          33 42.9 %    
 5  2012 Used      c          24 34.8 %    
 6  2011 Used      a          11 14.3 %    
 7  2011 New       a          56 100 %     
 8  2011 New       b          19 100 %     
 9  2012 Used      c          45 65.2 %    
10  2010 New       c          56 100 %     
11  2011 Used      a          33 42.9 %    
12  2011 New       c          32 49.2 %    
13  2010 Used      a          89 83.2 %    
14  2011 New       c          33 50.8 %    
15  2012 New       b          12 100 %     
16  2010 Used      a          18 16.8 %    
17  2012 Used      b          10 30.3 %    
18  2012 Used      a          17 36.2 % 
爱她像谁 2025-01-29 15:35:17

这是使用ave()基本解决方案。您可以用所需的任何其他内容替换AVE中的分组变量。

within(df, {
  perc1 = ave(as.numeric(Sale), Year, Product, FUN = proportions) * 100
  perc2 = sprintf("%.1f %%", perc1)
})

   Year Condition Product Sale  perc2     perc1
1  2010       New       a   30 47.6 %  47.61905
2  2010       New       b   45 65.2 %  65.21739
3  2010       New       c   23 67.6 %  67.64706
4  2010      Used       a   33 52.4 %  52.38095
5  2010      Used       b   24 34.8 %  34.78261
6  2010      Used       c   11 32.4 %  32.35294
7  2011       New       a   56 50.0 %  50.00000
8  2011       New       b   19 36.5 %  36.53846
9  2011       New       c   45 58.4 %  58.44156
10 2011      Used       a   56 50.0 %  50.00000
11 2011      Used       b   33 63.5 %  63.46154
12 2011      Used       c   32 41.6 %  41.55844

Here is a base solution using ave(). You can replace grouping variables in ave with any others you want.

within(df, {
  perc1 = ave(as.numeric(Sale), Year, Product, FUN = proportions) * 100
  perc2 = sprintf("%.1f %%", perc1)
})

   Year Condition Product Sale  perc2     perc1
1  2010       New       a   30 47.6 %  47.61905
2  2010       New       b   45 65.2 %  65.21739
3  2010       New       c   23 67.6 %  67.64706
4  2010      Used       a   33 52.4 %  52.38095
5  2010      Used       b   24 34.8 %  34.78261
6  2010      Used       c   11 32.4 %  32.35294
7  2011       New       a   56 50.0 %  50.00000
8  2011       New       b   19 36.5 %  36.53846
9  2011       New       c   45 58.4 %  58.44156
10 2011      Used       a   56 50.0 %  50.00000
11 2011      Used       b   33 63.5 %  63.46154
12 2011      Used       c   32 41.6 %  41.55844
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文