How to tapply in dplyr and create a new column

Posted on 2025-02-01 11:39:50

I'm stuck with dplyr (again!) and trying to solve my problem without dying in the attempt.

The first lines of my df look like this:

df <- structure(list(fecha = c(1990, 1990, 1990, 1990, 1990, 1990, 
1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990), cientifico = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Argentina sphyraena", class = "factor"), 
    dem_sect = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AB", "EP", "FE", "MF", 
    "PA"), class = "factor"), sector = c("EPb", "EPc", "EPc", 
    "EPb", "EPa", "EPa", "EPb", "EPc", "EPb", "EPb", "EPb", "EPb", 
    "EPb", "EPb", "EPa"), md_area = c(3010.44, 665.88, 665.88, 
    3010.44, 1273.65, 1273.65, 3010.44, 665.88, 3010.44, 3010.44, 
    3010.44, 3010.44, 3010.44, 3010.44, 1273.65), md_peso = c(1.42957605985037, 
    1.04499099099099, 1.04499099099099, 1.42957605985037, 1.24025925925926, 
    1.24025925925926, 1.42957605985037, 1.04499099099099, 1.42957605985037, 
    1.42957605985037, 1.42957605985037, 1.42957605985037, 1.42957605985037, 
    1.42957605985037, 1.24025925925926), dummy = c(4303.65295361596, 
    695.838601081081, 695.838601081081, 4303.65295361596, 1579.65620555556, 
    1579.65620555556, 4303.65295361596, 695.838601081081, 4303.65295361596, 
    4303.65295361596, 4303.65295361596, 4303.65295361596, 4303.65295361596, 
    4303.65295361596, 1579.65620555556)), row.names = c(NA, -15L
), class = "data.frame")

I'm trying to "translate" this: sumsect <- tapply(md_peso * md_area, as.factor(substr(names(sector), 1, 2)), sum) into dplyr, but with no success, although I've tried many approaches. I added a column ("dem_sect"), which holds the result of as.factor(substr(names(sector), 1, 2)), in an attempt to solve the problem, but I failed.

The desired output would be a data frame with a new column, "sumsect", holding the same value in every row: in this case 6579.148, the sum of md_peso * md_area over the distinct sectors (1579.6562 + 4303.6530 + 695.8386).

    fecha  cientifico          dem_sect sector md_area md_peso  dummy  sumsect
1   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
2   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
3   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
4   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
5   1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148
6   1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148
7   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
8   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
9   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
10  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
11  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
12  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
13  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
14  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
15  1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148

Any hint will be more than welcome. Thanks in advance

末が日狂欢 2025-02-08 11:39:50

You can just mutate and sum the unique values of dummy:

df |> 
  mutate(sumsect = sum(unique(dummy)))

If you're relying on md_area and md_peso, you can use:

df |> 
  mutate(sumsect = sum(unique(md_area * md_peso)))
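
A caveat worth keeping in mind: sum(unique(...)) counts each distinct product once, so it only equals a per-sector sum while no two sectors share the same product. Below is a minimal sketch of a variant that keys on sector instead. The data frame toy and the helper column prod are made up for illustration and are not the asker's data; the sketch also assumes, as in the question, that md_area * md_peso is constant within a sector.

```r
library(dplyr)

# Hypothetical toy data: EPa and EPb happen to give the same product (10)
toy <- data.frame(
  sector  = c("EPa", "EPa", "EPb", "EPc"),
  md_area = c(10, 10, 5, 4),
  md_peso = c(1, 1, 2, 3)
)

# Deduplicating by value merges the two sectors whose products coincide
by_value <- toy %>%
  mutate(sumsect = sum(unique(md_area * md_peso)))

# Deduplicating by sector keeps one product per sector (its first row)
by_sector <- toy %>%
  mutate(
    prod    = md_area * md_peso,
    sumsect = sum(prod[!duplicated(sector)])
  ) %>%
  select(-prod)
```

With this toy data the first version gives 22 (10 + 12) and the second gives 32 (10 + 10 + 12). On the data in the question both give the same result, because the three sectors have three different products.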

丿*梦醉红颜 2025-02-08 11:39:50

Update: as @Jahi Zamy's answer (+1) shows, this is also possible without grouping, although grouping would let you handle different groups in the real data set separately:

df %>% 
  mutate(sumsect = sum(unique( md_peso * md_area)))

First answer:
You can do it this way with dplyr. The trick is to use group_by(), then ungroup(), and then sum the unique values. If you want the sum for specific groups, use group_by() on the desired group instead of ungroup():

df %>% 
  group_by(sector) %>% 
  mutate(y = md_peso * md_area) %>% 
  ungroup() %>% 
  mutate(sumsect = sum(unique(y)), .keep="unused")

   fecha cientifico          dem_sect sector md_area md_peso dummy sumsect
   <dbl> <fct>               <fct>    <chr>    <dbl>   <dbl> <dbl>   <dbl>
 1  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
 2  1990 Argentina sphyraena EP       EPc       666.    1.04  696.   6579.
 3  1990 Argentina sphyraena EP       EPc       666.    1.04  696.   6579.
 4  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
 5  1990 Argentina sphyraena EP       EPa      1274.    1.24 1580.   6579.
 6  1990 Argentina sphyraena EP       EPa      1274.    1.24 1580.   6579.
 7  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
 8  1990 Argentina sphyraena EP       EPc       666.    1.04  696.   6579.
 9  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
10  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
11  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
12  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
13  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
14  1990 Argentina sphyraena EP       EPb      3010.    1.43 4304.   6579.
15  1990 Argentina sphyraena EP       EPa      1274.    1.24 1580.   6579.
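
The per-group variant mentioned in this answer could look roughly like the sketch below. It groups by dem_sect, mirroring the grouping factor in the original tapply() call. The data frame toy is hypothetical, with two dem_sect levels so that the grouping has a visible effect.

```r
library(dplyr)

# Hypothetical toy data with two dem_sect groups
toy <- data.frame(
  dem_sect = c("EP", "EP", "EP", "MF", "MF"),
  sector   = c("EPa", "EPa", "EPb", "MFa", "MFb"),
  md_area  = c(10, 10, 5, 4, 2),
  md_peso  = c(1, 1, 3, 2, 5)
)

res <- toy %>%
  group_by(dem_sect) %>%                                # one sum per dem_sect
  mutate(sumsect = sum(unique(md_peso * md_area))) %>%  # distinct products within the group
  ungroup()
```

Here every EP row gets 25 (10 + 15) and every MF row gets 18 (8 + 10), so the new column is repeated within each group rather than across the whole data frame.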

带刺的爱情 2025-02-08 11:39:50

You don't need tapply if you work with dplyr.

library(tidyverse)
df %>% # target dataframe
  cbind( # we will join a value as a new column for every row
    df %>% # work with dataframe df
    group_by(sector) %>% # calculate by sector
    summarise(sumsect = unique(md_area * md_peso)) %>% # md_area * md_peso, one distinct value per sector
    ungroup() %>% # remove grouping
    summarise(sumsect = sum(sumsect)) # sum the 3 calculated values
  )

Output:

   fecha          cientifico dem_sect sector md_area  md_peso     dummy  sumsect
1   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
2   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
3   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
4   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
5   1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148
6   1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148
7   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
8   1990 Argentina sphyraena       EP    EPc  665.88 1.044991  695.8386 6579.148
9   1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
10  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
11  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
12  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
13  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
14  1990 Argentina sphyraena       EP    EPb 3010.44 1.429576 4303.6530 6579.148
15  1990 Argentina sphyraena       EP    EPa 1273.65 1.240259 1579.6562 6579.148

If you want to calculate sumsect separately for each cientifico, each fecha, or both, group by those columns as well. Your example has only one fecha and one cientifico, so no such grouping is needed there.
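
With more than one fecha, the summary has one row per group, so attaching it with cbind() no longer lines up. A join on the grouping column is one way around that. The sketch below is an assumption about how this might be done, not part of the answer above: toy and per_group are hypothetical names, and only fecha is used as the grouping column (cientifico could be added to distinct(), group_by() and by = in the same way).

```r
library(dplyr)

# Hypothetical toy data with two years
toy <- data.frame(
  fecha   = c(1990, 1990, 1990, 1991, 1991),
  sector  = c("EPa", "EPa", "EPb", "EPa", "EPb"),
  md_area = c(10, 10, 5, 4, 2),
  md_peso = c(1, 1, 3, 2, 5)
)

per_group <- toy %>%
  distinct(fecha, sector, md_area, md_peso) %>%  # one row per sector within each fecha
  group_by(fecha) %>%
  summarise(sumsect = sum(md_area * md_peso), .groups = "drop")

# Attach each group's sum to every row of that group
res <- toy %>%
  left_join(per_group, by = "fecha")
```

In this toy case the 1990 rows get 25 and the 1991 rows get 18.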
