Reduce the number of rows by summarising a column

Posted on 2025-02-03 11:22:10

I have a dataframe in the following structure:

df <- structure(list(SUB_DISTRICT_CODE = c(90101L, 90101L, 90101L, 
90101L, 90101L, 90101L, 90102L, 90102L, 90102L, 90102L, 90102L, 
90102L, 90103L, 90103L, 90103L, 90103L, 90103L, 90103L), SEX = c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L), AGR3 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L), TOTAL_per_GROUP = c(184L, 1245L, 456L, 
167L, 1216L, 567L, 91L, 463L, 150L, 96L, 476L, 217L, 118L, 618L, 
256L, 116L, 627L, 293L)), row.names = 21295:21312, class = "data.frame")

At the moment there are 6 entries for every SUB_DISTRICT_CODE (one per combination of SEX and AGR3). In the final dataframe there should only be 3 entries for every SUB_DISTRICT_CODE (one for each unique value of AGR3).
The column SEX should be dropped and the TOTAL_per_GROUP values summed within each AGR3 group. How can I do this in an easy way (using dplyr)?
Thanks

Answers (2)

他不在意 2025-02-10 11:22:10

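Try grouping by AGR3 and SUB_DISTRICT_CODE with group_by() and then summing TOTAL_per_GROUP with summarise(). SEX is dropped automatically, because summarise() keeps only the grouping columns and the new summary column: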

library(dplyr)

df %>% group_by(AGR3, SUB_DISTRICT_CODE) %>% 
  summarise(sum = sum(TOTAL_per_GROUP))

Output:

# AGR3 SUB_DISTRICT_CODE   sum
# <int>             <int> <int>
# 1     1             90101   351
# 2     1             90102   187
# 3     1             90103   234
# 4     2             90101  2461
# 5     2             90102   939
# 6     2             90103  1245
# 7     3             90101  1023
# 8     3             90102   367
# 9     3             90103   549
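A variant of the same pipeline, sketched under the assumption of dplyr 1.0.0 or later (needed for the `.groups` argument): grouping by SUB_DISTRICT_CODE first orders the rows by district, keeping the original column name avoids renaming it to `sum`, and `.groups = "drop"` returns an ungrouped tibble instead of one still grouped by the first key. The sums are the same as in the output above; only the row order and the column name differ. The name `res` is illustrative.

```r
library(dplyr)

# The question's data, assigned to df
df <- structure(list(SUB_DISTRICT_CODE = c(90101L, 90101L, 90101L,
  90101L, 90101L, 90101L, 90102L, 90102L, 90102L, 90102L, 90102L,
  90102L, 90103L, 90103L, 90103L, 90103L, 90103L, 90103L), SEX = c(1L,
  1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
  2L), AGR3 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
  1L, 2L, 3L, 1L, 2L, 3L), TOTAL_per_GROUP = c(184L, 1245L, 456L,
  167L, 1216L, 567L, 91L, 463L, 150L, 96L, 476L, 217L, 118L, 618L,
  256L, 116L, 627L, 293L)), row.names = 21295:21312, class = "data.frame")

# Group by district first, then AGR3, so the result is ordered by district.
# SEX is not a grouping column, so summarise() drops it.
# .groups = "drop" returns an ungrouped tibble and silences the
# "has grouped output" message.
res <- df %>%
  group_by(SUB_DISTRICT_CODE, AGR3) %>%
  summarise(TOTAL_per_GROUP = sum(TOTAL_per_GROUP), .groups = "drop")

res
```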
真心难拥有 2025-02-10 11:22:10
# Base R solution: aggregate() returns a data.frame
base_r_res <- aggregate(
  TOTAL_per_GROUP ~ AGR3 + SUB_DISTRICT_CODE,
  data = df,
  FUN = sum
)

# Print the result
base_r_res

# data.table solution
library(data.table)

# Sum TOTAL_per_GROUP within each AGR3/SUB_DISTRICT_CODE pair.
# as.data.table() works on a copy; setDT(df) would convert df itself in place.
dt_res <- as.data.table(df)[, .(TOTAL_per_GROUP = sum(TOTAL_per_GROUP)), by = .(AGR3, SUB_DISTRICT_CODE)]

# Print the result
dt_res
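To confirm that a result meets the three requirements in the question (three rows per SUB_DISTRICT_CODE, no SEX column, and no counts lost when summing over SEX), a quick base R check along these lines may help. It is a sketch that reuses the aggregate() call above; the name `res` is illustrative.

```r
# The question's data, assigned to df
df <- structure(list(SUB_DISTRICT_CODE = c(90101L, 90101L, 90101L,
  90101L, 90101L, 90101L, 90102L, 90102L, 90102L, 90102L, 90102L,
  90102L, 90103L, 90103L, 90103L, 90103L, 90103L, 90103L), SEX = c(1L,
  1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
  2L), AGR3 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
  1L, 2L, 3L, 1L, 2L, 3L), TOTAL_per_GROUP = c(184L, 1245L, 456L,
  167L, 1216L, 567L, 91L, 463L, 150L, 96L, 476L, 217L, 118L, 618L,
  256L, 116L, 627L, 293L)), row.names = 21295:21312, class = "data.frame")

# Base R aggregation, as in the answer above
res <- aggregate(TOTAL_per_GROUP ~ AGR3 + SUB_DISTRICT_CODE, data = df, FUN = sum)

# Requirement 1: exactly one row per AGR3 value (3 rows) for each district
stopifnot(all(table(res$SUB_DISTRICT_CODE) == 3))

# Requirement 2: the SEX column is gone
stopifnot(!"SEX" %in% names(res))

# Requirement 3: summing over SEX keeps the overall total unchanged
stopifnot(sum(res$TOTAL_per_GROUP) == sum(df$TOTAL_per_GROUP))
```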