How best to calculate relative shares of different columns in R?

Posted 2025-02-07 08:25:01

Below is the sample data and code. I have two issues. First, I need the indtotal column to be the sum for each twodigit code and to stay constant within that code, as shown below. The reason is so that I can divide one column by the other to arrive at the smbshare number. When I try the following,

second <- first %>% 
  group_by(twodigit,smb) %>% 
  summarize(indtotal = sum(employment))

it breaks the total down by both twodigit and smb, so indtotal is no longer a single total per twodigit code.
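A minimal sketch of the difference, assuming dplyr is installed: `summarize()` collapses each group to one row, while `mutate()` inside `group_by(twodigit)` keeps every row and repeats the group sum, so `indtotal` stays constant within a two-digit code. The data vectors are copied from the question; `with_total` is a hypothetical name used only here.

```r
library(dplyr)

# Sample data copied from the question
naicstest  <- c(512131, 512141, 521921, 522654, 512131, 536978, 541214, 531214, 621112, 541213,
                551212, 574121, 569887, 541211, 523141, 551122, 512312, 521114, 522112)
employment <- c(11, 130, 315, 17, 190, 21, 22, 231, 15, 121, 19, 21, 350, 110, 515, 165, 12, 110, 111)
smb        <- c(1, 2, 3, 1, 3, 1, 1, 3, 1, 2, 1, 1, 4, 2, 4, 3, 1, 2, 2)

first <- data.frame(naicstest, employment, smb) %>%
  mutate(twodigit = substr(naicstest, 1, 2))

# mutate() (unlike summarize()) keeps all 19 rows; grouping only by twodigit
# makes sum(employment) the same value on every row of a two-digit code
with_total <- first %>%
  group_by(twodigit) %>%
  mutate(indtotal = sum(employment)) %>%
  ungroup()

with_total %>% filter(twodigit == "51")
```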
The second issue is getting it to produce a 0 when a value does not exist. The best example is twodigit code 51 with smb = 4. When a given two-digit code does not have all 4 distinct smb values, I am looking for it to produce a 0 for the missing ones.
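For the missing-combination problem, one base R alternative (a sketch, not the approach used in the answer) is `xtabs()`: it sums employment over the full grid of twodigit and smb values, so an empty cell such as twodigit 51 / smb 4 comes out as 0 instead of being dropped. `ave()` then repeats each code's total on every row. The name `result` is hypothetical.

```r
# Sample data copied from the question (base R only, no packages needed)
naicstest  <- c(512131, 512141, 521921, 522654, 512131, 536978, 541214, 531214, 621112, 541213,
                551212, 574121, 569887, 541211, 523141, 551122, 512312, 521114, 522112)
employment <- c(11, 130, 315, 17, 190, 21, 22, 231, 15, 121, 19, 21, 350, 110, 515, 165, 12, 110, 111)
smb        <- c(1, 2, 3, 1, 3, 1, 1, 3, 1, 2, 1, 1, 4, 2, 4, 3, 1, 2, 2)

first <- data.frame(naicstest, employment, smb)
first$twodigit <- substr(first$naicstest, 1, 2)

# Sum employment for every twodigit x smb cell; empty cells become 0
smbtotal <- xtabs(employment ~ twodigit + smb, data = first)

# Long format: one row per twodigit/smb pair
result <- as.data.frame(smbtotal, responseName = "smbtotal")

# Per-code total repeated on every row, then the share
result$indtotal <- ave(result$smbtotal, result$twodigit, FUN = sum)
result$smbshare <- result$smbtotal / result$indtotal

result <- result[order(result$twodigit, result$smb), ]
head(result, 8)
```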

Note: smb is short for small business

library(dplyr)

naicstest <- c(512131,512141,521921,522654,512131,536978,541214,531214,621112,541213,551212,574121,569887,541211,523141,551122,512312,521114,522112)
employment <- c(11,130,315,17,190,21,22,231,15,121,19,21,350,110,515,165,12,110,111)
smb <- c(1,2,3,1,3,1,1,3,1,2,1,1,4,2,4,3,1,2,2)

first <- data.frame(naicstest, employment, smb)

first <- first %>% mutate(twodigit = substr(naicstest, 1, 2))

second <- first %>% group_by(twodigit) %>% summarize(indtotal = sum(employment))
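Because `second` already holds one `indtotal` per two-digit code, another route is to summarise by both keys and join `second` back on `twodigit`. This is a sketch assuming dplyr 1.0.0 or later (for `.groups = "drop"`); `smbtotals` and `joined` are hypothetical names. It only addresses the first issue: combinations that never occur, such as twodigit 51 with smb = 4, are still absent from `joined`.

```r
library(dplyr)

# Sample data and the question's own per-code totals
naicstest  <- c(512131, 512141, 521921, 522654, 512131, 536978, 541214, 531214, 621112, 541213,
                551212, 574121, 569887, 541211, 523141, 551122, 512312, 521114, 522112)
employment <- c(11, 130, 315, 17, 190, 21, 22, 231, 15, 121, 19, 21, 350, 110, 515, 165, 12, 110, 111)
smb        <- c(1, 2, 3, 1, 3, 1, 1, 3, 1, 2, 1, 1, 4, 2, 4, 3, 1, 2, 2)

first  <- data.frame(naicstest, employment, smb) %>%
  mutate(twodigit = substr(naicstest, 1, 2))
second <- first %>% group_by(twodigit) %>% summarize(indtotal = sum(employment))

# One row per twodigit/smb pair that actually occurs; drop the grouping
smbtotals <- first %>%
  group_by(twodigit, smb) %>%
  summarize(smbtotal = sum(employment), .groups = "drop")

# Attach the constant per-code total to each row, then divide
joined <- smbtotals %>%
  left_join(second, by = "twodigit") %>%
  mutate(smbshare = smbtotal / indtotal)

joined
```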

The desired result is below:

  twodigit  indtotal  smb  smbtotal         smbshare
  51        343       1    23 (11+12)       23/343
  51        343       2    130              130/343
  51        343       3    190              190/343
  51        343       4    0                0/343
  52        1068      1    17               17/1068
  52        1068      2    221 (110+111)    221/1068
  52        1068      3    315              315/1068
  52        1068      4    515              515/1068
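Written out, the columns in this table relate as follows, where t is a two-digit code and s ranges over the smb values 1 to 4; a (t, s) pair with no matching rows has an empty sum and therefore contributes 0:

```latex
\text{smbtotal}_{t,s} = \sum_{i:\ \text{twodigit}_i = t,\ \text{smb}_i = s} \text{employment}_i,
\qquad
\text{indtotal}_t = \sum_{s=1}^{4} \text{smbtotal}_{t,s},
\qquad
\text{smbshare}_{t,s} = \frac{\text{smbtotal}_{t,s}}{\text{indtotal}_t}
```

For example, smbshare for t = 51, s = 1 is 23/343 ≈ 0.0671, and for t = 52, s = 1 it is 17/1068 ≈ 0.0159.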

长发绾君心 2025-02-14 08:25:01

This gives you all the columns you need, but in a slightly different order. You could use select() or relocate() to put them in the order you want, I suppose:

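library(dplyr)
library(tidyr)  # complete() comes from tidyr

first %>%
  group_by(twodigit, smb) %>%
  summarize(smbtotal = sum(employment)) %>%
  ungroup() %>%
  # add every missing twodigit/smb combination with smbtotal = 0
  complete(twodigit, smb, fill = list(smbtotal = 0)) %>%
  group_by(twodigit) %>%
  # indtotal is constant within each twodigit; smbshare divides by it
  mutate(
    indtotal = sum(smbtotal),
    smbshare = smbtotal / indtotal
  )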

`summarise()` has grouped output by 'twodigit'. You can override using the `.groups` argument.
# A tibble: 32 × 5
# Groups:   twodigit [8]
   twodigit   smb smbtotal indtotal smbshare
   <chr>    <dbl>    <dbl>    <dbl>    <dbl>
 1 51           1       23      343   0.0671
 2 51           2      130      343   0.379 
 3 51           3      190      343   0.554 
 4 51           4        0      343   0     
 5 52           1       17     1068   0.0159
 6 52           2      221     1068   0.207 
 7 52           3      315     1068   0.295 
 8 52           4      515     1068   0.482 
 9 53           1       21      252   0.0833
10 53           2        0      252   0     
# … with 22 more rows
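Following the reordering suggestion above, here is a sketch that also returns an ungrouped tibble in the question's column order. It assumes dplyr 1.0.0 or later (for `relocate()` and the `.groups` argument) and tidyr for `complete()`; `result` is a hypothetical name. Passing `.groups = "drop"` replaces the separate `ungroup()` call and silences the `summarise()` message.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
naicstest  <- c(512131, 512141, 521921, 522654, 512131, 536978, 541214, 531214, 621112, 541213,
                551212, 574121, 569887, 541211, 523141, 551122, 512312, 521114, 522112)
employment <- c(11, 130, 315, 17, 190, 21, 22, 231, 15, 121, 19, 21, 350, 110, 515, 165, 12, 110, 111)
smb        <- c(1, 2, 3, 1, 3, 1, 1, 3, 1, 2, 1, 1, 4, 2, 4, 3, 1, 2, 2)

first <- data.frame(naicstest, employment, smb) %>%
  mutate(twodigit = substr(naicstest, 1, 2))

result <- first %>%
  group_by(twodigit, smb) %>%
  summarize(smbtotal = sum(employment), .groups = "drop") %>%
  # fill in absent twodigit/smb pairs with smbtotal = 0
  complete(twodigit, smb, fill = list(smbtotal = 0)) %>%
  group_by(twodigit) %>%
  mutate(indtotal = sum(smbtotal), smbshare = smbtotal / indtotal) %>%
  ungroup() %>%
  # move columns into the layout shown in the question
  relocate(twodigit, indtotal, smb, smbtotal, smbshare)

print(result, n = 8)
```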