How to use tapply in dplyr and create a new column
I'm stuck with dplyr (again!) and trying to solve my problem without dying in the attempt.
The first lines of my df look like this:
df <- structure(list(fecha = c(1990, 1990, 1990, 1990, 1990, 1990,
1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990), cientifico = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Argentina sphyraena", class = "factor"),
dem_sect = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AB", "EP", "FE", "MF",
"PA"), class = "factor"), sector = c("EPb", "EPc", "EPc",
"EPb", "EPa", "EPa", "EPb", "EPc", "EPb", "EPb", "EPb", "EPb",
"EPb", "EPb", "EPa"), md_area = c(3010.44, 665.88, 665.88,
3010.44, 1273.65, 1273.65, 3010.44, 665.88, 3010.44, 3010.44,
3010.44, 3010.44, 3010.44, 3010.44, 1273.65), md_peso = c(1.42957605985037,
1.04499099099099, 1.04499099099099, 1.42957605985037, 1.24025925925926,
1.24025925925926, 1.42957605985037, 1.04499099099099, 1.42957605985037,
1.42957605985037, 1.42957605985037, 1.42957605985037, 1.42957605985037,
1.42957605985037, 1.24025925925926), dummy = c(4303.65295361596,
695.838601081081, 695.838601081081, 4303.65295361596, 1579.65620555556,
1579.65620555556, 4303.65295361596, 695.838601081081, 4303.65295361596,
4303.65295361596, 4303.65295361596, 4303.65295361596, 4303.65295361596,
4303.65295361596, 1579.65620555556)), row.names = c(NA, -15L
), class = "data.frame")
I'm trying to "translate" this: sumsect <- tapply(md_peso * md_area, as.factor(substr(names(sector), 1, 2)), sum)
into dplyr, but with no success, although I've tried many approaches. I added a column ("dem_sect"), which holds the result of as.factor(substr(names(sector), 1, 2)),
in an attempt to solve the problem, but I failed.
The desired output would be a data frame with a new column, "sumsect", holding the same value in every row (in this case 6579.148, the sum of md_peso * md_area over the sectors: 1579.6562 + 4303.6530 + 695.8386):
fecha cientifico dem_sect sector md_area md_peso dummy sumsect
1 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
2 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
3 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
4 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
5 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
6 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
7 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
8 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
9 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
10 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
11 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
12 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
13 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
14 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
15 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
Any hint will be more than welcome. Thanks in advance.
Comments (3)
You can just mutate and then summarise the unique values of dummy. If you're reliant on md_area and md_peso, you can use:
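A minimal sketch of this idea, assuming a compact stand-in for the question's df (the six-row data frame below is illustrative, not the original data, and the names res_dummy and res_prod are hypothetical):

```r
library(dplyr)

# Illustrative six-row stand-in for the question's df (not the original data)
df <- data.frame(
  dem_sect = "EP",
  sector   = c("EPb", "EPc", "EPc", "EPb", "EPa", "EPa"),
  md_area  = c(3010.44, 665.88, 665.88, 3010.44, 1273.65, 1273.65),
  md_peso  = c(1.42957605985037, 1.04499099099099, 1.04499099099099,
               1.42957605985037, 1.24025925925926, 1.24025925925926)
)
df$dummy <- df$md_peso * df$md_area

# Variant 1: sum the distinct values of the precomputed dummy column
res_dummy <- df %>% mutate(sumsect = sum(unique(dummy)))

# Variant 2: compute the product on the fly from md_area and md_peso
res_prod <- df %>% mutate(sumsect = sum(unique(md_area * md_peso)))
```

Both variants put the same total in every row. Note that unique() works on values, so two sectors that happened to share exactly the same product would be counted only once.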
Update: after seeing @Jahi Zamy's answer (+1), this is also possible without grouping. Grouping, however, gives you control over different groups in the real data set.
First answer:
You can do it this way with dplyr. The trick is to use group_by() and then ungroup(), and to sum the unique values. If you want to sum within specific groups, then instead of ungroup() use group_by() with the desired group:
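As a rough sketch of the group_by() plus unique() approach described in the second answer (not that answer's own code): the six-row df is an illustrative stand-in for the question's data, and sect_val and res are hypothetical names.

```r
library(dplyr)

# Illustrative six-row stand-in for the question's df (not the original data)
df <- data.frame(
  dem_sect = "EP",
  sector   = c("EPb", "EPc", "EPc", "EPb", "EPa", "EPa"),
  md_area  = c(3010.44, 665.88, 665.88, 3010.44, 1273.65, 1273.65),
  md_peso  = c(1.42957605985037, 1.04499099099099, 1.04499099099099,
               1.42957605985037, 1.24025925925926, 1.24025925925926)
)

res <- df %>%
  group_by(sector) %>%
  mutate(sect_val = first(md_peso * md_area)) %>%   # one value per sector
  group_by(dem_sect) %>%                            # replaces the sector grouping
  mutate(sumsect = sum(unique(sect_val))) %>%       # total per dem_sect
  ungroup()
```

Swapping group_by(dem_sect) for ungroup() would give a single total over the whole data frame. Here both give the same result, because the stand-in data contains only one dem_sect.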
You don't need tapply if you are going to work with dplyr.
If you might want sumsect to be calculated per cientifico, per fecha, or both, you can group by them as well. In your example there is only one fecha and one cientifico. If you want sumsect to differ for each level of those columns, don't forget to group by those columns too.
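To illustrate the grouping advice, here is a small sketch. It assumes a hypothetical two-year data frame, df2, built from the question's sector values, so that grouping by fecha and cientifico actually produces different totals; it is not the original data.

```r
library(dplyr)

# Hypothetical two-year stand-in (values borrowed from the question's sectors)
df2 <- data.frame(
  fecha      = c(1990, 1990, 1990, 1991, 1991),
  cientifico = "Argentina sphyraena",
  dem_sect   = "EP",
  sector     = c("EPa", "EPb", "EPb", "EPa", "EPc"),
  md_area    = c(1273.65, 3010.44, 3010.44, 1273.65, 665.88),
  md_peso    = c(1.24025925925926, 1.42957605985037, 1.42957605985037,
                 1.24025925925926, 1.04499099099099)
)

res <- df2 %>%
  group_by(fecha, cientifico, dem_sect) %>%
  mutate(sumsect = sum(unique(md_peso * md_area))) %>%  # total within each group
  ungroup()
```

Each fecha now carries its own sumsect, while rows that share a fecha, cientifico and dem_sect share the same value.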