将组子标题和小计行添加到 R 中的 data.frame 或表
目标
我希望在表格中添加副标题和小计/边距行。最终,我正在寻找如下所示的结构,我将使用 openxlsx
和 writeData
将其导出到 Excel。
2019 | 2020 | 2021 | |
---|---|---|---|
A | |||
A1 | 1001 | 1157 | 911 |
A2 | 1005 | 803 | 1110 |
A3 | 1125 | 897 | 1190 |
总计 A | 3131 | 2857 | 3211 |
B | |||
B1 | 806 | 982 | 1098 |
B2 | 1106 | 945 | 1080 |
B3 | 1057 | 1123 | 867 |
总计 B | 2969 | 3050 | 3045 |
C | |||
C1 | 847 | 1087 | 1140 |
C2 | 1146 | 966 | 1176 |
C3 | 1071 | 915 | 892 |
总计 C | 3064 | 2968 | 3208 |
总计 全部 | 9164 | 8875 | 9464 |
我怀疑小标题和小计是完全不同的问题,但我在这里问这两个问题,以防有一个与每个问题相关的通用方法。
到目前为止可重现的代码
创建示例数据(长格式):
d <- data.frame(
year = rep(c(2019, 2020, 2021), times = 9),
sector = rep(c("A","B","C"),each = 9),
subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
value = sample(800:1200, 27, replace = FALSE)
)
输出:
head(d)
#> year sector subsector value
#> 1 2019 A A1 1001
#> 2 2020 A A1 1157
#> 3 2021 A A1 911
#> 4 2019 A A2 1005
#> 5 2020 A A2 803
#> 6 2021 A A2 1110
格式化宽并添加边距(总计)行:
library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE)) %>%
pivot_wider(names_from = year, values_from = sales) %>%
janitor::adorn_totals(where = "row")
输出:
#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#> sector subsector 2019 2020 2021
#> A A1 1001 1157 911
#> A A2 1005 803 1110
#> A A3 1125 897 1190
#> B B1 806 982 1098
#> B B2 1106 945 1080
#> B B3 1057 1123 867
#> C C1 847 1087 1140
#> C C2 1146 966 1176
#> C C3 1071 915 892
#> Total - 9164 8875 9464
由 reprex 包于 2022 年 3 月 2 日创建 (v2.0.1)
janitor
包的 adorn_totals()
函数非常适合为整个集合添加边距行或列。 Sam Firke 的回复此处暗示了使用 tidyr::gather
的解决方案,但我的数据位于不同的格式。我不想“收集”专栏。同一线程中的其他人显示了解决方案,但他们将所有总数放在表格的末尾。
我可以想象一个解决方案,我循环遍历部门因素并为每个部门组装和组合表格,但我怀疑我对此想得太多,并且有一个更简单的解决方案。
是否有针对此目标的现有解决方案,或者有效/普遍实现此目标的想法?
请注意:每个扇区的子扇区数量在实际数据中会有所不同(即,有些可能只有一个子扇区,其他可能有多个),并且没有将子扇区与扇区相关的命名约定(即,父部门不会成为子部门名称的一部分:而不是部门:“A”,子部门:“A1”,它可能是部门:“制造”,子部门:“汽车”)。
@akrun - 解决方案!
你的回答让我完成了 90% 的工作,你随后的评论引导我找到了剩下的解决方案。
gt
有一个函数 as_raw_html()
,它使用 xml2::read_html()
和 rvest::html_table()
> 将 gt()
对象转换为 tibble,同时保留子标题。
library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)
d <- data.frame(
year = rep(c(2019, 2020, 2021), times = 9),
sector = rep(c("A","B","C"),each = 9),
subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
value = sample(800:1200, 27, replace = FALSE)
)
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%
gt() %>%
gt::as_raw_html() %>%
xml2::read_html() %>%
rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#> subsector `2019` `2020` `2021`
#> <chr> <chr> <chr> <chr>
#> 1 A A A A
#> 2 A1 932 1117 800
#> 3 A2 925 1078 1090
#> 4 A3 816 1058 1146
#> 5 Total 2673 3253 3036
#> 6 B B B B
#> 7 B1 862 1181 947
#> 8 B2 1083 812 912
#> 9 B3 1079 1130 1097
#> 10 Total 3024 3123 2956
#> 11 C C C C
#> 12 C1 966 895 944
#> 13 C2 970 1147 1166
#> 14 C3 1043 1116 826
#> 15 Total 2979 3158 2936
由 reprex 包 (v2.0.1)于 2022 年 3 月 2 日
创建子标题行在所有列中重复扇区名称;除此之外,它看起来不错。
有趣的是,rvest 还有一个 read_html 函数,甚至可能引用 xml2::read_html() 函数,但它在这种情况下不起作用。
Objective
I wish to add subheader and subtotal/margin rows within a table. Ultimately, I am looking for a structure shown below, which I will export to Excel with openxlsx
and writeData
.
2019 | 2020 | 2021 | |
---|---|---|---|
A | |||
A1 | 1001 | 1157 | 911 |
A2 | 1005 | 803 | 1110 |
A3 | 1125 | 897 | 1190 |
Total A | 3131 | 2857 | 3211 |
B | |||
B1 | 806 | 982 | 1098 |
B2 | 1106 | 945 | 1080 |
B3 | 1057 | 1123 | 867 |
Total B | 2969 | 3050 | 3045 |
C | |||
C1 | 847 | 1087 | 1140 |
C2 | 1146 | 966 | 1176 |
C3 | 1071 | 915 | 892 |
Total C | 3064 | 2968 | 3208 |
Total All | 9164 | 8875 | 9464 |
I suspect the subheaders and subtotals are completely different questions, but I am asking both here in case there is a common method related to each.
Reproducible Code So Far
Create the Sample Data (long format):
d <- data.frame(
year = rep(c(2019, 2020, 2021), times = 9),
sector = rep(c("A","B","C"),each = 9),
subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
value = sample(800:1200, 27, replace = FALSE)
)
Output:
head(d)
#> year sector subsector value
#> 1 2019 A A1 1001
#> 2 2020 A A1 1157
#> 3 2021 A A1 911
#> 4 2019 A A2 1005
#> 5 2020 A A2 803
#> 6 2021 A A2 1110
Format wide and add a margin (total) row:
library(janitor)
#[snip]warnings[/snip]
library(tidyverse)
#[snip]warnings[/snip]
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE)) %>%
pivot_wider(names_from = year, values_from = sales) %>%
janitor::adorn_totals(where = "row")
Output:
#> `summarise()` has grouped output by 'year', 'sector'. You can override using the `.groups` argument.
#> sector subsector 2019 2020 2021
#> A A1 1001 1157 911
#> A A2 1005 803 1110
#> A A3 1125 897 1190
#> B B1 806 982 1098
#> B B2 1106 945 1080
#> B B3 1057 1123 867
#> C C1 847 1087 1140
#> C C2 1146 966 1176
#> C C3 1071 915 892
#> Total - 9164 8875 9464
Created on 2022-03-02 by the reprex package (v2.0.1)
The janitor
package's adorn_totals()
function works well for adding a margin row or column for the entire set. And Sam Firke's response here hints at a solution using tidyr::gather
but my data is in a different format. I don't want to "gather" the columns. Others in the same thread show solutions but they place all the totals at the end of the table.
I can imagine a solution where I loop through the sector factors and assemble and combine tables for each sector, but I suspect I am overthinking this and there is a simpler solution.
Is there an existing solution for this objective, or ideas on accomplishing this efficiently/universally?
Please Note: the number of subsectors per sector will vary in the actual data (i.e., some may have only one subsector, others may have several), and there is no naming convention relating the subsector to the sector (i.e., the parent sector will not be part of the child subsectors name: rather than Sector: "A", Subsector: "A1", it might be Sector: "Manufacturing", Subsector: "Cars").
@akrun -- Solution!
Your answer got me 90% of the way there and your subsequent comments lead me to the remaining solution.
gt
has a function as_raw_html()
which, using xml2::read_html()
and rvest::html_table()
convert the gt()
object to a tibble while keeping the subheaders.
library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(xml2)
library(rvest)
d <- data.frame(
year = rep(c(2019, 2020, 2021), times = 9),
sector = rep(c("A","B","C"),each = 9),
subsector = paste0(rep(c("A","B","C"),each = 9), rep(c("1","2","3"), each = 3)),
value = sample(800:1200, 27, replace = FALSE)
)
d %>%
group_by(year, sector, subsector) %>%
summarise(sales = sum(value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = year, values_from = sales) %>%
group_by(sector) %>%
group_modify(~ .x %>% adorn_totals(where = "row")) %>%
gt() %>%
gt::as_raw_html() %>%
xml2::read_html() %>%
rvest::html_table()
#> [[1]]
#> # A tibble: 15 x 4
#> subsector `2019` `2020` `2021`
#> <chr> <chr> <chr> <chr>
#> 1 A A A A
#> 2 A1 932 1117 800
#> 3 A2 925 1078 1090
#> 4 A3 816 1058 1146
#> 5 Total 2673 3253 3036
#> 6 B B B B
#> 7 B1 862 1181 947
#> 8 B2 1083 812 912
#> 9 B3 1079 1130 1097
#> 10 Total 3024 3123 2956
#> 11 C C C C
#> 12 C1 966 895 944
#> 13 C2 970 1147 1166
#> 14 C3 1043 1116 826
#> 15 Total 2979 3158 2936
Created on 2022-03-02 by the reprex package (v2.0.1)
The subheader rows repeat the sector name in all columns; other than that, it looks good.
Interestingly, rvest
also has a read_html
function that might even reference the xml2::read_html()
function, but it did not work in this context.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
不要对整个摘要应用
adorn_totals
,而是使用group_modify
,然后转换为gt
-output
一个选项是还可以将列拆分为
expss
-输出
Instead of applying
adorn_totals
on the entire summary, usegroup_modify
and then convert togt
-output
An option is also to split the column with
expss
-output