library(dplyr)
# library(tidyr) # pivot_wider
out <- dat %>%
# set up for determining groups of columns
group_by(ID) %>%
mutate(
n = row_number(),
IDn = paste(n, ID, sep = "_")
) %>%
ungroup() %>%
# pivot so that each group of columns is now in a wider format
tidyr::pivot_wider(ID, names_from = "n", values_from = c("IDn", "Date", "Score")) %>%
# convert from the solitary 'ID' field to the duplicated 'IDn' fields,
# and fix the augmented ID _values_ to remove the 'n' component
select(-ID) %>%
mutate(across(starts_with("IDn"), ~ sub(".*_", "", .))) %>%
# order them based on the trailing number, natural sort after that
select(colnames(.)[ order(sub(".*_", "", colnames(.))) ])
out
# # A tibble: 10 x 9
# IDn_1 Date_1 Score_1 IDn_2 Date_2 Score_2 IDn_3 Date_3 Score_3
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 101 44656 10 101 44660 50 101 44676 65
# 2 102 44656 30 102 44678 66 NA NA NA
# 3 103 44658 40 103 44673 32 NA NA NA
# 4 104 44659 20 104 44682 65 NA NA NA
# 5 105 44661 35 NA NA NA NA NA NA
# 6 106 44661 45 NA NA NA NA NA NA
# 7 107 44682 51 NA NA NA NA NA NA
# 8 108 44683 25 NA NA NA NA NA NA
# 9 109 44684 39 NA NA NA NA NA NA
# 10 110 44685 43 NA NA NA NA NA NA
names(out) <- sub("n?_.*", "", names(out))
out
# # A tibble: 10 x 9
# ID Date Score ID Date Score ID Date Score
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 101 44656 10 101 44660 50 101 44676 65
# 2 102 44656 30 102 44678 66 NA NA NA
# 3 103 44658 40 103 44673 32 NA NA NA
# 4 104 44659 20 104 44682 65 NA NA NA
# 5 105 44661 35 NA NA NA NA NA NA
# 6 106 44661 45 NA NA NA NA NA NA
# 7 107 44682 51 NA NA NA NA NA NA
# 8 108 44683 25 NA NA NA NA NA NA
# 9 109 44684 39 NA NA NA NA NA NA
# 10 110 44685 43 NA NA NA NA NA NA
As already mentioned in comments, this "violates" the premise of tidy data to a certain extent, though it is less a problem if you are doing this at report-generation time (where wider layouts may be more aesthetically functional).
Try this:
library(dplyr)
# library(tidyr) # pivot_wider
out <- dat %>%
# set up for determining groups of columns
group_by(ID) %>%
mutate(
n = row_number(),
IDn = paste(n, ID, sep = "_")
) %>%
ungroup() %>%
# pivot so that each group of columns is now in a wider format
tidyr::pivot_wider(ID, names_from = "n", values_from = c("IDn", "Date", "Score")) %>%
# convert from the solitary 'ID' field to the duplicated 'IDn' fields,
# and fix the augmented ID _values_ to remove the 'n' component
select(-ID) %>%
mutate(across(starts_with("IDn"), ~ sub(".*_", "", .))) %>%
# order them based on the trailing number, natural sort after that
select(colnames(.)[ order(sub(".*_", "", colnames(.))) ])
out
# # A tibble: 10 x 9
# IDn_1 Date_1 Score_1 IDn_2 Date_2 Score_2 IDn_3 Date_3 Score_3
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 101 44656 10 101 44660 50 101 44676 65
# 2 102 44656 30 102 44678 66 NA NA NA
# 3 103 44658 40 103 44673 32 NA NA NA
# 4 104 44659 20 104 44682 65 NA NA NA
# 5 105 44661 35 NA NA NA NA NA NA
# 6 106 44661 45 NA NA NA NA NA NA
# 7 107 44682 51 NA NA NA NA NA NA
# 8 108 44683 25 NA NA NA NA NA NA
# 9 109 44684 39 NA NA NA NA NA NA
# 10 110 44685 43 NA NA NA NA NA NA
The results have two things "wrong":
The ID columns have an n in there, mostly because of how I chose to duplicate them across each group of columns.
All columns have _# appended. R in general prefers its column names to be unique. It can be coerced, but it makes any references to columns from that point on either ambiguous (if using column names) or requiring integer column indices (which can be fragile).
However, dplyr and most of the tidyverse in general more-strongly prefer against duplicate names, so much so that I couldn't find an elegant/generic select(..) to do it in one step.
If you must have the names as in your question, it can be done outside of dplyr with:
names(out) <- sub("n?_.*", "", names(out))
out
# # A tibble: 10 x 9
# ID Date Score ID Date Score ID Date Score
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 101 44656 10 101 44660 50 101 44676 65
# 2 102 44656 30 102 44678 66 NA NA NA
# 3 103 44658 40 103 44673 32 NA NA NA
# 4 104 44659 20 104 44682 65 NA NA NA
# 5 105 44661 35 NA NA NA NA NA NA
# 6 106 44661 45 NA NA NA NA NA NA
# 7 107 44682 51 NA NA NA NA NA NA
# 8 108 44683 25 NA NA NA NA NA NA
# 9 109 44684 39 NA NA NA NA NA NA
# 10 110 44685 43 NA NA NA NA NA NA
(Sure, it's possible to put that somehow in a dplyr pipe, but I don't know if that buys you much.)
发布评论
评论(1)
正如评论中已经提到的那样,这在一定程度上“违反”了整洁数据的前提,尽管如果您在报告生成时间进行此操作,这不是一个问题(其中更广泛的布局可能更加美观)。
尝试以下操作:
结果有两件事“错误”:
id
列中有一个n
,主要是因为我选择了如何通过每个组重复它们列。所有列都有
_#
附加。通常,R更喜欢其列名是唯一的。它可以被强制强制,但是从该点开始对列的任何引用(如果使用列名称)或需要整数列索引(可能是脆弱的)。然而,
dplyr
和大多数tidyverse
一般而言,更偏爱重复名称,以至于我找不到优雅/通用选择(..)
一个步骤。您必须像问题中的名称一样
如果 您很多。)
数据(来自OP评论)
As already mentioned in comments, this "violates" the premise of tidy data to a certain extent, though it is less a problem if you are doing this at report-generation time (where wider layouts may be more aesthetically functional).
Try this:
The results have two things "wrong":
The
ID
columns have ann
in there, mostly because of how I chose to duplicate them across each group of columns.All columns have
_#
appended. R in general prefers its column names to be unique. It can be coerced, but it makes any references to columns from that point on either ambiguous (if using column names) or requiring integer column indices (which can be fragile).However,
dplyr
and most of thetidyverse
in general more-strongly prefer against duplicate names, so much so that I couldn't find an elegant/genericselect(..)
to do it in one step.If you must have the names as in your question, it can be done outside of
dplyr
with:(Sure, it's possible to put that somehow in a dplyr pipe, but I don't know if that buys you much.)
Data (from the OP comments)