将组子标题和小计行添加到 R 中的 data.frame 或表

发布于 2025-01-11 15:03:06 字数 5738 浏览 3 评论 0原文

目标

我希望在表格中添加副标题和小计/边距行。最终,我正在寻找如下所示的结构,我将使用 openxlsxwriteData 将其导出到 Excel。

201920202021
A
A110011157911
A210058031110
A311258971190
总计 A313128573211
B
B18069821098
B211069451080
B310571123867
总计 B296930503045
C
C184710871140
C211469661176
C31071915892
总计 C306429683208
总计 全部916488759464

我怀疑小标题和小计是完全不同的问题,但我在这里问这两个问题,以防有一个与每个问题相关的通用方法。

到目前为止可重现的代码

创建示例数据(长格式):

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

输出:

head(d)
#>   year sector subsector value
#> 1 2019      A        A1  1001
#> 2 2020      A        A1  1157
#> 3 2021      A        A1   911
#> 4 2019      A        A2  1005
#> 5 2020      A        A2   803
#> 6 2021      A        A2  1110

格式化宽并添加边距(总计)行:

library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]

d %>%
    group_by(year, sector, subsector) %>%
    summarise(sales = sum(value, na.rm = TRUE)) %>% 
    pivot_wider(names_from = year, values_from = sales) %>%
    janitor::adorn_totals(where = "row")

输出:

#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#>  sector subsector 2019 2020 2021
#>       A        A1 1001 1157  911
#>       A        A2 1005  803 1110
#>       A        A3 1125  897 1190
#>       B        B1  806  982 1098
#>       B        B2 1106  945 1080
#>       B        B3 1057 1123  867
#>       C        C1  847 1087 1140
#>       C        C2 1146  966 1176
#>       C        C3 1071  915  892
#>   Total         - 9164 8875 9464

reprex 包于 2022 年 3 月 2 日创建 (v2.0.1)

janitor 包的 adorn_totals() 函数非常适合为整个集合添加边距行或列。 Sam Firke 的回复此处暗示了使用 tidyr::gather 的解决方案,但我的数据位于不同的格式。我不想“收集”专栏。同一线程中的其他人显示了解决方案,但他们将所有总数放在表格的末尾。

我可以想象一个解决方案,我循环遍历部门因素并为每个部门组装和组合表格,但我怀疑我对此想得太多,并且有一个更简单的解决方案。

是否有针对此目标的现有解决方案,或者有效/普遍实现此目标的想法?

请注意:每个扇区的子扇区数量在实际数据中会有所不同(即,有些可能只有一个子扇区,其他可能有多个),并且没有将子扇区与扇区相关的命名约定(即,父部门不会成为子部门名称的一部分:而不是部门:“A”,子部门:“A1”,它可能是部门:“制造”,子部门:“汽车”)。

@akrun - 解决方案!

你的回答让我完成了 90% 的工作,你随后的评论引导我找到了剩下的解决方案。

gt 有一个函数 as_raw_html(),它使用 xml2::read_html()rvest::html_table() > 将 gt() 对象转换为 tibble,同时保留子标题。

library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  gt() %>% 
  gt::as_raw_html() %>% 
  xml2::read_html() %>% 
  rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#>    subsector `2019` `2020` `2021`
#>    <chr>     <chr>  <chr>  <chr> 
#>  1 A         A      A      A     
#>  2 A1        932    1117   800   
#>  3 A2        925    1078   1090  
#>  4 A3        816    1058   1146  
#>  5 Total     2673   3253   3036  
#>  6 B         B      B      B     
#>  7 B1        862    1181   947   
#>  8 B2        1083   812    912   
#>  9 B3        1079   1130   1097  
#> 10 Total     3024   3123   2956  
#> 11 C         C      C      C     
#> 12 C1        966    895    944   
#> 13 C2        970    1147   1166  
#> 14 C3        1043   1116   826   
#> 15 Total     2979   3158   2936

reprex 包 (v2.0.1)于 2022 年 3 月 2 日

创建子标题行在所有列中重复扇区名称;除此之外,它看起来不错。

有趣的是,rvest 还有一个 read_html 函数,甚至可能引用 xml2::read_html() 函数,但它在这种情况下不起作用。

Objective

I wish to add subheader and subtotal/margin rows within a table. Ultimately, I am looking for a structure shown below, which I will export to Excel with openxlsx and writeData.

201920202021
A
A110011157911
A210058031110
A311258971190
Total A313128573211
B
B18069821098
B211069451080
B310571123867
Total B296930503045
C
C184710871140
C211469661176
C31071915892
Total C306429683208
Total All916488759464

I suspect the subheaders and subtotals are completely different questions, but I am asking both here in case there is a common method related to each.

Reproducible Code So Far

Create the Sample Data (long format):

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

Output:

head(d)
#>   year sector subsector value
#> 1 2019      A        A1  1001
#> 2 2020      A        A1  1157
#> 3 2021      A        A1   911
#> 4 2019      A        A2  1005
#> 5 2020      A        A2   803
#> 6 2021      A        A2  1110

Format wide and add a margin (total) row:

library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]

d %>%
    group_by(year, sector, subsector) %>%
    summarise(sales = sum(value, na.rm = TRUE)) %>% 
    pivot_wider(names_from = year, values_from = sales) %>%
    janitor::adorn_totals(where = "row")

Output:

#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#>  sector subsector 2019 2020 2021
#>       A        A1 1001 1157  911
#>       A        A2 1005  803 1110
#>       A        A3 1125  897 1190
#>       B        B1  806  982 1098
#>       B        B2 1106  945 1080
#>       B        B3 1057 1123  867
#>       C        C1  847 1087 1140
#>       C        C2 1146  966 1176
#>       C        C3 1071  915  892
#>   Total         - 9164 8875 9464

Created on 2022-03-02 by the reprex package (v2.0.1)

The janitor package's adorn_totals() function works well for adding a margin row or column for the entire set. And Sam Firke's response here hints at a solution using tidyr::gather but my data is in a different format. I don't want to "gather" the columns. Others in the same thread show solutions but they place all the totals at the end of the table.

I can imagine a solution where I loop through the sector factors and assemble and combine tables for each sector, but I suspect I am overthinking this and there is a simpler solution.

Is there an existing solution for this objective, or ideas on accomplishing this efficiently/universally?

Please Note: the number of subsectors per sector will vary in the actual data (i.e., some may have only one subsector, others may have several), and there is no naming convention relating the subsector to the sector (i.e., the parent sector will not be part of the child subsectors name: rather than Sector: "A", Subsector: "A1", it might be Sector: "Manufacturing", Subsector: "Cars").

@akrun -- Solution!

Your answer got me 90% of the way there and your subsequent comments lead me to the remaining solution.

gt has a function as_raw_html() which, using xml2::read_html() and rvest::html_table() convert the gt() object to a tibble while keeping the subheaders.

library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  gt() %>% 
  gt::as_raw_html() %>% 
  xml2::read_html() %>% 
  rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#>    subsector `2019` `2020` `2021`
#>    <chr>     <chr>  <chr>  <chr> 
#>  1 A         A      A      A     
#>  2 A1        932    1117   800   
#>  3 A2        925    1078   1090  
#>  4 A3        816    1058   1146  
#>  5 Total     2673   3253   3036  
#>  6 B         B      B      B     
#>  7 B1        862    1181   947   
#>  8 B2        1083   812    912   
#>  9 B3        1079   1130   1097  
#> 10 Total     3024   3123   2956  
#> 11 C         C      C      C     
#> 12 C1        966    895    944   
#> 13 C2        970    1147   1166  
#> 14 C3        1043   1116   826   
#> 15 Total     2979   3158   2936

Created on 2022-03-02 by the reprex package (v2.0.1)

The subheader rows repeat the sector name in all columns; other than that, it looks good.

Interestingly, rvest also has a read_html function that might even reference the xml2::read_html() function, but it did not work in this context.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

御弟哥哥 2025-01-18 15:03:07

不要对整个摘要应用 adorn_totals,而是使用 group_modify,然后转换为 gt

library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
library(gt)
d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  
  gt()

-output

在此处输入图像描述


一个选项是还可以将列拆分为expss

library(expss)
library(openxlsx)
out <- d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  ungroup %>%
  split_columns(columns = 1) 
wb <- createWorkbook()
sh <- addWorksheet(wb, "Tables")
xl_write(out, wb, sh)
saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)

-输出

在此处输入图像描述

Instead of applying adorn_totals on the entire summary, use group_modify and then convert to gt

library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
library(gt)
d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  
  gt()

-output

enter image description here


An option is also to split the column with expss

library(expss)
library(openxlsx)
out <- d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  ungroup %>%
  split_columns(columns = 1) 
wb <- createWorkbook()
sh <- addWorksheet(wb, "Tables")
xl_write(out, wb, sh)
saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)

-output

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文