Permuting labels in a data frame, but for pairs of observations
I'm not sure the title is clear, but I want to shuffle a column in a data frame. Shuffling every individual row is simple to do with sample(); what I want instead is to shuffle the labels for pairs of observations that come from the same sample.
For instance, I have the following dataframe df1:
> df1
sampleID  groupID  A  B  C  D  E  F
438       1        1  0  0  0  0  0
438       1        0  0  0  0  1  1
386       1        1  1  1  0  0  0
386       1        0  0  0  1  0  0
438       2        1  0  0  0  1  1
438       2        0  1  1  0  0  0
582       2        0  0  0  0  0  0
582       2        1  0  0  0  1  0
597       1        0  1  0  0  0  1
597       1        0  0  0  0  0  0
I want to randomly shuffle the groupID labels per sample, not per observation, so that the result looks like:
> df2
sampleID  groupID  A  B  C  D  E  F
438       1        1  0  0  0  0  0
438       1        0  0  0  0  1  1
386       2        1  1  1  0  0  0
386       2        0  0  0  1  0  0
438       1        1  0  0  0  1  1
438       1        0  1  1  0  0  0
582       1        0  0  0  0  0  0
582       1        1  0  0  0  1  0
597       2        0  1  0  0  0  1
597       2        0  0  0  0  0  0
Notice that in column 2 (groupID), sample 386 is now 2 (for both observations).
I have searched around but haven't found anything that works the way I want. What I have now just shuffles the second column. I tried to use dplyr as follows:
df2 <- df1 %>%
  group_by(sampleID) %>%
  mutate(groupID = sample(df1$groupID, size=2))
But of course that only takes all the group IDs and randomly selects 2.
Any tips or suggestions would be appreciated!
Comments (2)
One technique is to extract the unique combinations so that you have one row per sampleID, then shuffle the labels and merge the shuffled values back into the main table. Here's what that would look like:
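The answer's original code did not survive on this page, so the following is only a minimal sketch of the approach it describes, not the answerer's exact code. It assumes that the unit to shuffle is each distinct (sampleID, groupID) pair, since sampleID 438 appears under both group labels in the example data. The helper column new_groupID and the set.seed() call are illustrative additions.

```r
library(dplyr)

# Example data copied from the question
df1 <- read.table(header = TRUE, text = "
sampleID groupID A B C D E F
438 1 1 0 0 0 0 0
438 1 0 0 0 0 1 1
386 1 1 1 1 0 0 0
386 1 0 0 0 1 0 0
438 2 1 0 0 0 1 1
438 2 0 1 1 0 0 0
582 2 0 0 0 0 0 0
582 2 1 0 0 0 1 0
597 1 0 1 0 0 0 1
597 1 0 0 0 0 0 0
")

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# One row per (sampleID, groupID) pair, with the labels permuted across pairs
shuffled <- df1 %>%
  distinct(sampleID, groupID) %>%
  mutate(new_groupID = sample(groupID))  # new_groupID is a hypothetical helper column

# Merge the permuted labels back; left_join keeps the row order of df1
df2 <- df1 %>%
  left_join(shuffled, by = c("sampleID", "groupID")) %>%
  mutate(groupID = new_groupID) %>%
  select(-new_groupID)

df2
```

Because the labels are permuted across pairs rather than redrawn, the overall count of each groupID stays the same, and both observations in a pair always receive the same new label.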
Using dplyr's nest_by and tidyr's unnest:
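The code for this answer is also missing from the page, so what follows is a hedged sketch of how nest_by and unnest could be combined for this task, not the answerer's original. It again assumes each (sampleID, groupID) pair is the unit to shuffle. The ungroup() step is needed because nest_by returns a row-wise data frame, in which sample() would only ever see one label at a time.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df1 <- read.table(header = TRUE, text = "
sampleID groupID A B C D E F
438 1 1 0 0 0 0 0
438 1 0 0 0 0 1 1
386 1 1 1 1 0 0 0
386 1 0 0 0 1 0 0
438 2 1 0 0 0 1 1
438 2 0 1 1 0 0 0
582 2 0 0 0 0 0 0
582 2 1 0 0 0 1 0
597 1 0 1 0 0 0 1
597 1 0 0 0 0 0 0
")

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

df2 <- df1 %>%
  nest_by(sampleID, groupID) %>%         # one row per pair; columns A..F go into list-column `data`
  ungroup() %>%                          # drop the row-wise grouping so sample() sees the whole column
  mutate(groupID = sample(groupID)) %>%  # permute the labels across pairs
  unnest(data)                           # expand back to one row per observation

df2
```

Note that nest_by sorts the result by its key columns, so the rows of df2 come back ordered by sampleID rather than in the original order of df1.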