pivot_wider with two different variables in names_from

Posted 2025-01-10 10:18:02

I have a question about pivoting a data frame using the combinations (categories) of two variables.

I have the following data frame:

df <- data.frame(id = c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5),
                 genes = c(4,4,4,5,5,5,4,4,4,5,4,4,5,5,5,4,4,4,5,4,4,4,5,5,5),
                 proteins = c(1,2,3,1,2,3,1,2,3,1,1,2,1,2,3,1,2,3,2,1,2,3,1,2,3),
                 values = c(1,4,5,6,4,10,5,4,6,13,14,54,34,67,45,1,3,5,7,5,12,5,6,44,3))

The genes and proteins variables represent different combinations (categories) within repeated measures of the same person. For example, the first measurement of id 1 gave the combination of gene "4" and protein "1", the second measurement of the same id gave the combination of gene "4" and protein "2", and so on. In total there are 6 combinations of genes and proteins (i.e. 4 & 1, 4 & 2, 4 & 3, 5 & 1, 5 & 2 and 5 & 3), and, as you can see, some ids do not have all of them.

What I want is to pivot_wider() that data frame so that these 6 combinations become columns, grouped by id. That means each person will have only one row of data and 6 columns for these categories (4 & 1, 4 & 2, 4 & 3, 5 & 1, 5 & 2 and 5 & 3), with the "values" variable placed under each corresponding combination / category.

The output I would like to get is the following, where gen_pro4_1, gen_pro4_2 and so on are the combined categories of the genes and proteins columns:

     ID  gen_pro4_1   gen_pro4_2    gen_pro4_3    gen_pro5_1    gen_pro5_2     gen_pro5_3
1    1        1            4              5           6              4              10
2    2        5            4              6           13             NA             NA 
3    3        14           54             NA          34             67             45
4    4        1            3              5           NA             7              NA
5    5        5            12             5           6              44             3

Thank you very much for any help.

七禾 2025-01-17 10:18:02

Here is one way:

tidyr::pivot_wider(df, 
                   names_from = c(genes, proteins), 
                   values_from = values,
                   names_prefix = 'gen_pro')

#     id gen_pro4_1 gen_pro4_2 gen_pro4_3 gen_pro5_1 gen_pro5_2 gen_pro5_3
#  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#1     1          1          4          5          6          4         10
#2     2          5          4          6         13         NA         NA
#3     3         14         54         NA         34         67         45
#4     4          1          3          5         NA          7         NA
#5     5          5         12          5          6         44          3
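
As a rough sketch of what that call appears to do under the hood: when names_from is given two columns, their values are joined with names_sep (by default "_") and names_prefix is then prepended. The two-step version below makes the combined category explicit with tidyr::unite() before pivoting. The intermediate names df_united and wide are only illustrative and are not part of the original answer.

```r
library(tidyr)

# Data frame from the question
df <- data.frame(id = c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5),
                 genes = c(4,4,4,5,5,5,4,4,4,5,4,4,5,5,5,4,4,4,5,4,4,4,5,5,5),
                 proteins = c(1,2,3,1,2,3,1,2,3,1,1,2,1,2,3,1,2,3,2,1,2,3,1,2,3),
                 values = c(1,4,5,6,4,10,5,4,6,13,14,54,34,67,45,1,3,5,7,5,12,5,6,44,3))

# Step 1: paste genes and proteins into one category column, e.g. "4_1"
# (df_united is a hypothetical name used only for this sketch)
df_united <- unite(df, "gen_pro", genes, proteins, sep = "_")

# Step 2: spread the single category column into one column per combination;
# combinations missing for an id are filled with NA
wide <- pivot_wider(df_united,
                    names_from = gen_pro,
                    values_from = values,
                    names_prefix = "gen_pro")
```

This should give the same column names as the one-step call, and it can be handy when the combined category is also needed for something else, such as counting how many ids have each combination.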

送君千里 2025-01-17 10:18:02

This answer, although very similar to others (posted just a few seconds before), shows the use of names_glue to compose versatile name combinations using string interpolation.

library(tidyr)

df |>
  pivot_wider(id_cols = id,
              names_from = c(genes, proteins),
              names_glue = "gen_pro{genes}_{proteins}",
              values_from = values)

# A tibble: 5 × 7
     id gen_pro4_1 gen_pro4_2 gen_pro4_3 gen_pro5_1 gen_pro5_2 gen_pro5_3
  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
1     1          1          4          5          6          4         10
2     2          5          4          6         13         NA         NA
3     3         14         54         NA         34         67         45
4     4          1          3          5         NA          7         NA
5     5          5         12          5          6         44          3
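
To illustrate the flexibility mentioned above, here is a small sketch with a different glue template. The template "g{genes}_p{proteins}" and the result name wide_glue are made up for this example; any text can surround the {genes} and {proteins} placeholders, as long as the resulting column names stay unique.

```r
library(tidyr)

# Data frame from the question
df <- data.frame(id = c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5),
                 genes = c(4,4,4,5,5,5,4,4,4,5,4,4,5,5,5,4,4,4,5,4,4,4,5,5,5),
                 proteins = c(1,2,3,1,2,3,1,2,3,1,1,2,1,2,3,1,2,3,2,1,2,3,1,2,3),
                 values = c(1,4,5,6,4,10,5,4,6,13,14,54,34,67,45,1,3,5,7,5,12,5,6,44,3))

# Hypothetical naming scheme: label each part of the column name separately,
# giving names such as "g4_p1" instead of "gen_pro4_1"
wide_glue <- pivot_wider(df,
                         id_cols = id,
                         names_from = c(genes, proteins),
                         names_glue = "g{genes}_p{proteins}",
                         values_from = values)
```

Compared with names_prefix, which can only prepend a fixed string, names_glue lets each source column be placed and labelled independently within the new name.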
