How best to calculate the relative shares of different columns in R?
Below is the sample data and code. I have two issues. First, I need the indtotal column to be the sum of employment by twodigit code, and to stay constant within each twodigit group, as shown below. The reason is that I can then divide one column by the other to arrive at the smbshare number. When I try the following,
second <- first %>%
group_by(twodigit,smb) %>%
summarize(indtotal = sum(employment))
it breaks the total down by both twodigit and smb.
The second issue is producing a 0 when a value does not exist. The best example is twodigit code 51 with smb = 4. When a given twodigit code does not have all 4 distinct smb values, I want the missing combinations to appear with a 0.
Note: smb is short for small business.
library(dplyr)  # provides %>%, mutate, group_by, summarize

naicstest <- c(512131,512141,521921,522654,512131,536978,541214,531214,621112,541213,551212,574121,569887,541211,523141,551122,512312,521114,522112)
employment <- c(11,130,315,17,190,21,22,231,15,121,19,21,350,110,515,165,12,110,111)
smb <- c(1,2,3,1,3,1,1,3,1,2,1,1,4,2,4,3,1,2,2)
first <- data.frame(naicstest, employment, smb)
# twodigit is the first two digits of the NAICS code
first <- first %>% mutate(twodigit = substr(naicstest, 1, 2))
# industry total per twodigit code
second <- first %>% group_by(twodigit) %>% summarize(indtotal = sum(employment))
The desired result (first two codes only) is below:
twodigit  indtotal  smb  smbtotal       smbshare
51        343       1    23 (11+12)     23/343
51        343       2    130            130/343
51        343       3    190            190/343
51        343       4    0              0/343
52        1068      1    17             17/1068
52        1068      2    221 (110+111)  221/1068
52        1068      3    315            315/1068
52        1068      4    515            515/1068
Comments (1)
This gives you all the columns you need, but in a slightly different order. You could use select or relocate to get them in the order you want, I suppose.
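One way to get the table described in the question, offered as a minimal sketch rather than the definitive approach: it assumes dplyr 1.0.0 or later (for relocate and the .groups argument) and tidyr are installed, and it uses tidyr's complete() to add the missing twodigit/smb combinations with a 0. The name result is only illustrative.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is available for complete()

# Sample data from the question
naicstest  <- c(512131, 512141, 521921, 522654, 512131, 536978, 541214,
                531214, 621112, 541213, 551212, 574121, 569887, 541211,
                523141, 551122, 512312, 521114, 522112)
employment <- c(11, 130, 315, 17, 190, 21, 22, 231, 15, 121, 19, 21, 350,
                110, 515, 165, 12, 110, 111)
smb        <- c(1, 2, 3, 1, 3, 1, 1, 3, 1, 2, 1, 1, 4, 2, 4, 3, 1, 2, 2)

first <- data.frame(naicstest, employment, smb) %>%
  mutate(twodigit = substr(naicstest, 1, 2))

result <- first %>%
  # Sum employment within each twodigit/smb pair
  group_by(twodigit, smb) %>%
  summarize(smbtotal = sum(employment), .groups = "drop") %>%
  # Add a row for every twodigit/smb pair that is absent, with smbtotal = 0
  complete(twodigit, smb, fill = list(smbtotal = 0)) %>%
  # mutate (not summarize) keeps every row, so the industry total is
  # repeated on each row of the same twodigit group
  group_by(twodigit) %>%
  mutate(indtotal = sum(smbtotal),
         smbshare = smbtotal / indtotal) %>%
  ungroup() %>%
  # Move indtotal next to twodigit to match the desired column order
  relocate(indtotal, .after = twodigit)

print(result)
```

The key design choice is using mutate after group_by(twodigit) instead of summarize: summarize collapses each group to a single row, while mutate keeps all rows and repeats the group total, which is what keeps indtotal constant across the smb rows.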