Reduce the number of rows by summarizing a column

Posted 2025-02-03 11:22:10

I have a dataframe in the following structure:

structure(list(SUB_DISTRICT_CODE = c(90101L, 90101L, 90101L, 
90101L, 90101L, 90101L, 90102L, 90102L, 90102L, 90102L, 90102L, 
90102L, 90103L, 90103L, 90103L, 90103L, 90103L, 90103L), SEX = c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
2L), AGR3 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L), TOTAL_per_GROUP = c(184L, 1245L, 456L, 
167L, 1216L, 567L, 91L, 463L, 150L, 96L, 476L, 217L, 118L, 618L, 
256L, 116L, 627L, 293L)), row.names = 21295:21312, class = "data.frame")

At the moment there are 6 entries for every SUB_DISTRICT_CODE. In the final dataframe there should be only 3 entries for every SUB_DISTRICT_CODE (one for each unique value of AGR3).
The column SEX should be dropped and the TOTAL_per_GROUP values summed within each AGR3 group. How can I do this in an easy way (using dplyr)?
Thanks

他不在意 2025-02-10 11:22:10

Try using group_by() for AGR3 and SUB_DISTRICT_CODE and then summarise():

library(dplyr)
df %>%
  group_by(AGR3, SUB_DISTRICT_CODE) %>%
  summarise(sum = sum(TOTAL_per_GROUP), .groups = "drop")

Output:

#    AGR3 SUB_DISTRICT_CODE   sum
#   <int>             <int> <int>
# 1     1             90101   351
# 2     1             90102   187
# 3     1             90103   234
# 4     2             90101  2461
# 5     2             90102   939
# 6     2             90103  1245
# 7     3             90101  1023
# 8     3             90102   367
# 9     3             90103   549
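For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's data with rep() rather than pasting the dput() output (the original row names are left out, on the assumption that they do not matter here). It also differs from the snippet above in two ways that are my own choices, not part of the answer: the grouping order is swapped so the three AGR3 rows of each SUB_DISTRICT_CODE sit together, and the summed column keeps the name TOTAL_per_GROUP.

```r
library(dplyr)

# Rebuild the question's data (row names omitted; assumed irrelevant to the result)
df <- data.frame(
  SUB_DISTRICT_CODE = rep(c(90101L, 90102L, 90103L), each = 6),
  SEX = rep(rep(1:2, each = 3), times = 3),
  AGR3 = rep(1:3, times = 6),
  TOTAL_per_GROUP = c(184L, 1245L, 456L, 167L, 1216L, 567L,
                      91L, 463L, 150L, 96L, 476L, 217L,
                      118L, 618L, 256L, 116L, 627L, 293L)
)

# Group by sub-district first so each code's three AGR3 rows end up adjacent.
# SEX disappears because it is neither a grouping column nor a summary column.
res <- df %>%
  group_by(SUB_DISTRICT_CODE, AGR3) %>%
  summarise(TOTAL_per_GROUP = sum(TOTAL_per_GROUP), .groups = "drop")

res
```

The result should have 9 rows (3 per SUB_DISTRICT_CODE) and no SEX column. Setting .groups = "drop" returns an ungrouped tibble, so later steps are not affected by leftover grouping.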

真心难拥有 2025-02-10 11:22:10

# Base R solution: base_r_res => data.frame 
base_r_res <- aggregate(
  TOTAL_per_GROUP ~ AGR3 + SUB_DISTRICT_CODE,
  data = df, 
  FUN = sum
)

# Send result to console: data.frame => stdout(console)
base_r_res

# Data table solution: import library
library(data.table)

# Aggregate by group: dt_res => data.table
# (note: setDT() converts df to a data.table by reference)
dt_res <- setDT(df)[, .(TOTAL_per_GROUP = sum(TOTAL_per_GROUP)), by = .(AGR3, SUB_DISTRICT_CODE)]

# Send result to console: data.table => stdout(console)
dt_res
