Permute labels in a dataframe, but for pairs of observations

Posted 2025-01-09 04:24:20

I'm not sure the title is clear, but I want to shuffle a column in a dataframe. Shuffling every individual row is simple with sample(); what I want instead is to shuffle the labels for pairs of observations from the same sample.

For instance, I have the following dataframe df1:

> df1
sampleID groupID A B C D E F
     438       1 1 0 0 0 0 0
     438       1 0 0 0 0 1 1
     386       1 1 1 1 0 0 0
     386       1 0 0 0 1 0 0
     438       2 1 0 0 0 1 1
     438       2 0 1 1 0 0 0
     582       2 0 0 0 0 0 0
     582       2 1 0 0 0 1 0
     597       1 0 1 0 0 0 1
     597       1 0 0 0 0 0 0

I want to randomly shuffle the groupID labels per sample, not per observation, so that the result looks like:

> df2
sampleID groupID A B C D E F
     438       1 1 0 0 0 0 0
     438       1 0 0 0 0 1 1
     386       2 1 1 1 0 0 0
     386       2 0 0 0 1 0 0
     438       1 1 0 0 0 1 1
     438       1 0 1 1 0 0 0
     582       1 0 0 0 0 0 0
     582       1 1 0 0 0 1 0
     597       2 0 1 0 0 0 1
     597       2 0 0 0 0 0 0

Notice that in column 2 (groupID), sample 386 is now 2 (for both observations).

I have searched around but haven't found anything that works the way I want. What I have now just shuffles the second column. I tried to use dplyr as follows:

df2 <- df1 %>%
  group_by(sampleID) %>%
  mutate(groupID = sample(df1$groupID, size=2))

But of course that only takes all the group IDs and randomly selects 2.

Any tips or suggestions would be appreciated!

不交电费瞎发啥光 2025-01-16 04:24:20

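One technique would be to extract the unique combinations so you have one row per sampleID, then shuffle the labels and merge the shuffled labels back into the main table. Here's what that would look like:

library(dplyr)
df1 %>% 
  # one row per sampleID/groupID combination
  distinct(sampleID, groupID) %>% 
  # permute the labels across those rows
  mutate(shuffle_groupID = sample(groupID)) %>% 
  # attach the shuffled label to every observation
  inner_join(df1, by = c("sampleID", "groupID"))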
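
The pipeline above leaves both groupID and shuffle_groupID in the result. As a minimal sketch of the remaining step, the shuffled label can overwrite groupID so the output has the same columns as df1. The toy df1 below is hypothetical (only column A is kept, and every sampleID carries a single groupID, unlike sample 438 in the question), and set.seed() is there only to make the run repeatable.

```r
library(dplyr)

# Hypothetical toy data in the shape of df1 (assumption: one groupID per sampleID)
df1 <- data.frame(
  sampleID = c(438, 438, 386, 386, 582, 582, 597, 597),
  groupID  = c(1, 1, 1, 1, 2, 2, 2, 2),
  A        = c(1, 0, 1, 0, 0, 1, 0, 0)
)

set.seed(1)  # only for reproducibility of this sketch

# One row per sampleID, with the labels permuted across samples
key <- df1 %>%
  distinct(sampleID, groupID) %>%
  mutate(shuffle_groupID = sample(groupID))

# Join back, then overwrite groupID so all rows of a sample share the new label
df2 <- df1 %>%
  inner_join(key, by = c("sampleID", "groupID")) %>%
  mutate(groupID = shuffle_groupID) %>%
  select(-shuffle_groupID)
```

Because the labels are permuted rather than redrawn, the number of samples carrying each label stays the same; only which sample gets which label changes.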

旧人九事 2025-01-16 04:24:20

Using dplyr nest_by and tidyr unnest:

library(dplyr)
library(tidyr)  # unnest() comes from tidyr

df1 |>
    nest_by(sampleID, groupID) |>
    mutate(groupID = sample(groupID, n())) |>
    unnest(cols = c(data))

# A tibble: 10 x 3
# Groups:   sampleID, groupID [4]
   sampleID groupID     A
      <dbl>   <int> <dbl>
 1      386       1     1
 2      386       1     0
 3      438       1     0
 4      438       1     0
 5      438       1     0
 6      438       1     1
 7      582       2     0
 8      582       2     0
 9      597       1     1
10      597       1     0
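
One thing to watch with this pipeline: nest_by() returns a rowwise data frame, so inside mutate() n() is 1 and sample() only ever sees a single groupID value. With a single number x, sample(x, 1) draws from 1:x instead of permuting the column. A possible variant, sketched here on hypothetical toy data (only column A, one groupID per sampleID), drops the rowwise grouping first so the labels are permuted across samples:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the shape of df1 (assumption: one groupID per sampleID)
df1 <- data.frame(
  sampleID = c(438, 438, 386, 386, 582, 582, 597, 597),
  groupID  = c(1, 1, 1, 1, 2, 2, 2, 2),
  A        = c(1, 0, 1, 0, 0, 1, 0, 0)
)

set.seed(1)  # only for reproducibility of this sketch

df2 <- df1 |>
  nest_by(sampleID, groupID) |>         # one row per (sampleID, groupID), rowwise
  ungroup() |>                          # drop rowwise so sample() sees the whole column
  mutate(groupID = sample(groupID)) |>  # permute labels across the nested rows
  unnest(cols = c(data))                # expand back to one row per observation
```

The result comes back ordered by sampleID, since nest_by() sorts by its keys.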
