将组子标题和小计行添加到 R 中的 data.frame 或表

发布于 2025-01-11 15:03:06 字数 5738 浏览 3 评论 0原文

目标

我希望在表格中添加副标题和小计/边距行。最终，我正在寻找如下所示的结构，我将使用 openxlsx 和 writeData 将其导出到 Excel。

	2019	2020	2021
A
A1	1001	1157	911
A2	1005	803	1110
A3	1125	897	1190
总计 A	3131	2857	3211
B
B1	806	982	1098
B2	1106	945	1080
B3	1057	1123	867
总计 B	2969	3050	3045
C
C1	847	1087	1140
C2	1146	966	1176
C3	1071	915	892
总计 C	3064	2968	3208
总计全部	9164	8875	9464

我怀疑小标题和小计是完全不同的问题，但我在这里问这两个问题，以防有一个与每个问题相关的通用方法。

到目前为止可重现的代码

创建示例数据（长格式）：

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

输出：

head(d)
#>   year sector subsector value
#> 1 2019      A        A1  1001
#> 2 2020      A        A1  1157
#> 3 2021      A        A1   911
#> 4 2019      A        A2  1005
#> 5 2020      A        A2   803
#> 6 2021      A        A2  1110

格式化宽并添加边距（总计）行：

library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]

d %>%
    group_by(year, sector, subsector) %>%
    summarise(sales = sum(value, na.rm = TRUE)) %>% 
    pivot_wider(names_from = year, values_from = sales) %>%
    janitor::adorn_totals(where = "row")

输出：

#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#>  sector subsector 2019 2020 2021
#>       A        A1 1001 1157  911
#>       A        A2 1005  803 1110
#>       A        A3 1125  897 1190
#>       B        B1  806  982 1098
#>       B        B2 1106  945 1080
#>       B        B3 1057 1123  867
#>       C        C1  847 1087 1140
#>       C        C2 1146  966 1176
#>       C        C3 1071  915  892
#>   Total         - 9164 8875 9464

^{由 reprex 包于 2022 年 3 月 2 日创建 (v2.0.1)}

janitor 包的 adorn_totals() 函数非常适合为整个集合添加边距行或列。 Sam Firke 的回复此处暗示了使用 tidyr::gather 的解决方案，但我的数据位于不同的格式。我不想“收集”专栏。同一线程中的其他人显示了解决方案，但他们将所有总数放在表格的末尾。

我可以想象一个解决方案，我循环遍历部门因素并为每个部门组装和组合表格，但我怀疑我对此想得太多，并且有一个更简单的解决方案。

是否有针对此目标的现有解决方案，或者有效/普遍实现此目标的想法？

请注意：每个扇区的子扇区数量在实际数据中会有所不同（即，有些可能只有一个子扇区，其他可能有多个），并且没有将子扇区与扇区相关的命名约定（即，父部门不会成为子部门名称的一部分：而不是部门：“A”，子部门：“A1”，它可能是部门：“制造”，子部门：“汽车”）。

@akrun - 解决方案！

你的回答让我完成了 90% 的工作，你随后的评论引导我找到了剩下的解决方案。

gt 有一个函数 as_raw_html()，它使用 xml2::read_html() 和 rvest::html_table() > 将 gt() 对象转换为 tibble，同时保留子标题。

library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  gt() %>% 
  gt::as_raw_html() %>% 
  xml2::read_html() %>% 
  rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#>    subsector `2019` `2020` `2021`
#>    <chr>     <chr>  <chr>  <chr> 
#>  1 A         A      A      A     
#>  2 A1        932    1117   800   
#>  3 A2        925    1078   1090  
#>  4 A3        816    1058   1146  
#>  5 Total     2673   3253   3036  
#>  6 B         B      B      B     
#>  7 B1        862    1181   947   
#>  8 B2        1083   812    912   
#>  9 B3        1079   1130   1097  
#> 10 Total     3024   3123   2956  
#> 11 C         C      C      C     
#> 12 C1        966    895    944   
#> 13 C2        970    1147   1166  
#> 14 C3        1043   1116   826   
#> 15 Total     2979   3158   2936

^{由 reprex 包 (v2.0.1)}于 2022 年 3 月 2 日

创建子标题行在所有列中重复扇区名称；除此之外，它看起来不错。

有趣的是，rvest 还有一个 read_html 函数，甚至可能引用 xml2::read_html() 函数，但它在这种情况下不起作用。

原文

Objective

I wish to add subheader and subtotal/margin rows within a table. Ultimately, I am looking for a structure shown below, which I will export to Excel with openxlsx and writeData.

	2019	2020	2021
A
A1	1001	1157	911
A2	1005	803	1110
A3	1125	897	1190
Total A	3131	2857	3211
B
B1	806	982	1098
B2	1106	945	1080
B3	1057	1123	867
Total B	2969	3050	3045
C
C1	847	1087	1140
C2	1146	966	1176
C3	1071	915	892
Total C	3064	2968	3208
Total All	9164	8875	9464

I suspect the subheaders and subtotals are completely different questions, but I am asking both here in case there is a common method related to each.

Reproducible Code So Far

Create the Sample Data (long format):

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

Output:

head(d)
#>   year sector subsector value
#> 1 2019      A        A1  1001
#> 2 2020      A        A1  1157
#> 3 2021      A        A1   911
#> 4 2019      A        A2  1005
#> 5 2020      A        A2   803
#> 6 2021      A        A2  1110

Format wide and add a margin (total) row:

library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]

d %>%
    group_by(year, sector, subsector) %>%
    summarise(sales = sum(value, na.rm = TRUE)) %>% 
    pivot_wider(names_from = year, values_from = sales) %>%
    janitor::adorn_totals(where = "row")

Output:

#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#>  sector subsector 2019 2020 2021
#>       A        A1 1001 1157  911
#>       A        A2 1005  803 1110
#>       A        A3 1125  897 1190
#>       B        B1  806  982 1098
#>       B        B2 1106  945 1080
#>       B        B3 1057 1123  867
#>       C        C1  847 1087 1140
#>       C        C2 1146  966 1176
#>       C        C3 1071  915  892
#>   Total         - 9164 8875 9464

^{Created on 2022-03-02 by the reprex package (v2.0.1)}

The janitor package's adorn_totals() function works well for adding a margin row or column for the entire set. And Sam Firke's response here hints at a solution using tidyr::gather but my data is in a different format. I don't want to "gather" the columns. Others in the same thread show solutions but they place all the totals at the end of the table.

I can imagine a solution where I loop through the sector factors and assemble and combine tables for each sector, but I suspect I am overthinking this and there is a simpler solution.

Is there an existing solution for this objective, or ideas on accomplishing this efficiently/universally?

Please Note: the number of subsectors per sector will vary in the actual data (i.e., some may have only one subsector, others may have several), and there is no naming convention relating the subsector to the sector (i.e., the parent sector will not be part of the child subsectors name: rather than Sector: "A", Subsector: "A1", it might be Sector: "Manufacturing", Subsector: "Cars").

@akrun -- Solution!

Your answer got me 90% of the way there and your subsequent comments lead me to the remaining solution.

gt has a function as_raw_html() which, using xml2::read_html() and rvest::html_table() convert the gt() object to a tibble while keeping the subheaders.

library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)

d <- data.frame(
  year = rep(c(2019, 2020, 2021), times = 9),
  sector = rep(c("A","B","C"),each = 9),
  subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
  value = sample(800:1200, 27, replace = FALSE)
)

d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  gt() %>% 
  gt::as_raw_html() %>% 
  xml2::read_html() %>% 
  rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#>    subsector `2019` `2020` `2021`
#>    <chr>     <chr>  <chr>  <chr> 
#>  1 A         A      A      A     
#>  2 A1        932    1117   800   
#>  3 A2        925    1078   1090  
#>  4 A3        816    1058   1146  
#>  5 Total     2673   3253   3036  
#>  6 B         B      B      B     
#>  7 B1        862    1181   947   
#>  8 B2        1083   812    912   
#>  9 B3        1079   1130   1097  
#> 10 Total     3024   3123   2956  
#> 11 C         C      C      C     
#> 12 C1        966    895    944   
#> 13 C2        970    1147   1166  
#> 14 C3        1043   1116   826   
#> 15 Total     2979   3158   2936

^{Created on 2022-03-02 by the reprex package (v2.0.1)}

The subheader rows repeat the sector name in all columns; other than that, it looks good.

Interestingly, rvest also has a read_html function that might even reference the xml2::read_html() function, but it did not work in this context.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

御弟哥哥 2025-01-18 15:03:07

不要对整个摘要应用 adorn_totals，而是使用 group_modify，然后转换为 gt

library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
library(gt)
d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  
  gt()

-output

一个选项是还可以将列拆分为expss

library(expss)
library(openxlsx)
out <- d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  ungroup %>%
  split_columns(columns = 1) 
wb <- createWorkbook()
sh <- addWorksheet(wb, "Tables")
xl_write(out, wb, sh)
saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)

-输出

Instead of applying adorn_totals on the entire summary, use group_modify and then convert to gt

library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
library(gt)
d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  
  gt()

-output

An option is also to split the column with expss

library(expss)
library(openxlsx)
out <- d %>%
  group_by(year, sector, subsector) %>%
  summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>% 
  pivot_wider(names_from = year, values_from = sales) %>%
  group_by(sector) %>%
  group_modify(~ .x %>% adorn_totals(where = "row")) %>%
  ungroup %>%
  split_columns(columns = 1) 
wb <- createWorkbook()
sh <- addWorksheet(wb, "Tables")
xl_write(out, wb, sh)
saveWorkbook(wb, file.path(getwd(), "Documents/table1.xlsx"), overwrite = TRUE)

-output