Adding higher-level groupings to a summary statistics table
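A while ago I asked how to make a grouped summary table: How can I group columns of descriptive statistics in R?
I'd like to do something similar, but taken a few steps further, and I'm not sure how to proceed.
Here's what I have so far: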
```r
data %>%
  dplyr::filter_all(all_vars(!is.na(.))) %>%
  group_by(Type.Time, Type.Perc, Grp) %>%
  dplyr::summarise(mean.ms = sprintf("%.2f", mean(Time, na.rm = TRUE)),
                   se.ms = sprintf("%.2f", (sd(Time, na.rm = T))/sqrt(data %>% filter(Grp == 1) %>% nrow())),
                   mean.perc = sprintf("%.2f", mean(Percentage, na.rm = TRUE)),
                   se.perc = sprintf("%.2f", (sd(Percentage, na.rm = T))/sqrt(data %>% filter(Grp == 1) %>% nrow())),
  ) %>%
  gather(key, value, mean.ms:se.perc) %>%
  unite(Group, Grp, key) %>%
  spread(Group, value)
```
This gives me the information I want, but in the wrong format and with twice as many values as I need:
| Type.Time | Type.Perc | 1_mean.ms | 1_mean.perc | 1_se.ms | 1_se.perc | 2_mean.ms | 2_mean.perc | 2_se.ms | 2_se.perc|
|-----------|-----------|-----------|-------------|---------|-----------|-----------|-------------|---------|----------|
| TType2 | PType2 | 703 | 15 | 15 | 1.4 | 573 | 8 | 22 | 1.3 |
| TType2 | PType1 | 703 | 10 | 15 | 1.8 | 573 | 13 | 22 | 3.1 |
| TType1 | PType2 | 710 | 15 | 18 | 1.4 | 622 | 8 | 29 | 1.3 |
| TType1 | PType1 | 710 | 10 | 18 | 1.8 | 622 | 13 | 29 | 3.1 |
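I'd like the top-level grouping in the new table to be the 1 or 2 (i.e., Grp [Group]) that currently precedes 'mean'/'se' in the column names. Below that would be subgroups for Type1 and Type2, with the leading T and P split out as the rows (ms and % respectively). So my aim is to produce a table in this format: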
|      | Group1                    | Group2                    |
|      | Type1       | Type2       | Type1       | Type2       |
|      | M    | SE   | M    | SE   | M    | SE   | M    | SE   |
|------|------|------|------|------|------|------|------|------|
| ms   | [values calculated from 'Time' variable]              |
| %    | [values calculated from 'Percentage' variable]        |
I hope that makes sense!
Example data:
```r
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), Grp = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Type.Time = c("TType1",
"TType1", "TType2", "TType2", "TType1", "TType1", "TType2", "TType2",
"TType1", "TType1", "TType2", "TType2", "TType1", "TType1",
"TType2", "TType2"), Time = c(711, 711, 669, 669, 765, 765, 876, 876, 740,
740, 658, 658, 456, 456, 423, 423), Type.Perc = c("PType1",
"PType2", "PType1", "PType2", "PType1", "PType2",
"PType1", "PType2", "PType1", "PType2", "PType1",
"PType2", "PType1", "PType2", "PType1", "PType2"
), Percentage = c(8, 3, 9, 7, 19, 22, 30, 21, 10, 5, 10, 5, 8, 7,
13, 5)), row.names = c(NA, -16L), class = c("tbl_df",
"tbl", "data.frame"))
```
Comments (1)
One option for configuring such header groupings is the `kableExtra` package.

For the data preparation, I've made two main changes: only considering `Type.Time == Type.Perc` (to avoid the excess combinations shown in the question), and calculating the SE values per Type and Group (the example code mixes different groupings here, which I assume is not intended).

The values are then already in the right format, and we can simply define several header groupings with `add_header_above`. `kableExtra` also provides plenty of additional options for modifying the output format.
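
A minimal sketch of the approach described above might look like the following. It is not the answerer's original code. It assumes that `Type.Time == Type.Perc` means "the same type number", so the leading T and P are stripped before comparing, and that each SE is `sd / sqrt(n)` within its own Type and Group cell. The object names `summary_wide` and `display_tbl` and the `pct` label are illustrative only.

```r
library(dplyr)
library(tidyr)
library(kableExtra)

# Compact reconstruction of the question's example data
data <- tibble::tibble(
  ID         = rep(1:4, each = 4),
  Grp        = rep(1:2, each = 8),
  Type.Time  = rep(rep(c("TType1", "TType2"), each = 2), times = 4),
  Time       = rep(c(711, 669, 765, 876, 740, 658, 456, 423), each = 2),
  Type.Perc  = rep(c("PType1", "PType2"), times = 8),
  Percentage = c(8, 3, 9, 7, 19, 22, 30, 21, 10, 5, 10, 5, 8, 7, 13, 5)
)

summary_wide <- data %>%
  # Assumption: keep rows whose Time type and Percentage type match
  mutate(Type = sub("^T", "", Type.Time)) %>%
  filter(Type == sub("^P", "", Type.Perc)) %>%
  group_by(Grp, Type) %>%
  # SE uses the number of rows in the current Grp x Type cell
  summarise(
    ms_M   = mean(Time),
    ms_SE  = sd(Time) / sqrt(n()),
    pct_M  = mean(Percentage),
    pct_SE = sd(Percentage) / sqrt(n()),
    .groups = "drop"
  ) %>%
  # One row per measure (ms, pct), one column per Grp/Type/statistic
  pivot_longer(-c(Grp, Type), names_to = c("Measure", "stat"), names_sep = "_") %>%
  pivot_wider(names_from = c(Grp, Type, stat), values_from = value)

# Format for display: two decimals, and "%" as the row label
display_tbl <- summary_wide %>%
  mutate(
    Measure = if_else(Measure == "pct", "%", Measure),
    across(-Measure, ~ sprintf("%.2f", .x))
  )

kbl(display_tbl, format = "html",
    col.names = c(" ", rep(c("M", "SE"), times = 4)),
    align = c("l", rep("r", 8))) %>%
  kable_styling(full_width = FALSE) %>%
  # Headers are stacked bottom-up: Type row first, then Group row on top
  add_header_above(c(" " = 1, "Type1" = 2, "Type2" = 2, "Type1" = 2, "Type2" = 2)) %>%
  add_header_above(c(" " = 1, "Group1" = 4, "Group2" = 4))
```

Each call to `add_header_above` adds one header row above the existing ones, and the numbers in the named vector are the column spans, so they need to sum to the table's nine columns.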