How to randomize (or permute) a dataframe row-wise and column-wise?

Posted on 2024-11-16 21:54:04

I have a dataframe (df1) like this.

     f1   f2   f3   f4   f5
d1   1    0    1    1    1  
d2   1    0    0    1    0
d3   0    0    0    1    1
d4   0    1    0    0    1

The d1...d4 column holds the row names, and the f1...f5 row holds the column names.

When I run sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe, but not for each row or each column.

Is it possible to do the randomization row-wise or column-wise?

I want to randomize df1 column-wise, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may get a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):

     f1   f2   f3   f4   f5
d1   1    0    0    0    1  
d2   0    1    0    1    1
d3   1    0    0    1    1
d4   0    0    1    1    0

Likewise, I also want to randomize df1 row-wise, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries can differ). For example, a randomized df3 could look like this:

     f1   f2   f3   f4   f5
d1   0    1    1    1    1  <- two entries are different
d2   0    0    1    0    1  <- four entries are different
d3   1    0    0    0    1  <- two entries are different
d4   0    0    1    0    1  <- two entries are different

PS. Many thanks to Gavin Simpson, Joris Meys and Chase for their answers to my previous question on randomizing two columns.

Comments (9)

陈年往事 2024-11-23 21:54:04

Given the R data.frame:

> df1
  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0

Shuffle row-wise:

> df2 <- df1[sample(nrow(df1)),]
> df2
  a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0

By default, sample() randomly reorders the elements passed as its first argument, because the default sample size is the length of that argument; when given a single number n, it draws from 1:n. Since replace = FALSE is also the default, sampling is done without replacement, so every row index appears exactly once, which gives a row-wise shuffle.
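As a quick check of those defaults, here is a minimal base R sketch. The seed value is arbitrary and df1 is rebuilt from the example above only so that the snippet runs on its own.

```r
# Rebuild the example data frame from this answer.
df1 <- data.frame(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))

set.seed(42)  # arbitrary seed, only to make the shuffle reproducible

# sample(n) with a single positive integer n draws a permutation of 1:n.
idx <- sample(nrow(df1))

# Every row index appears exactly once, so no row is dropped or duplicated.
stopifnot(identical(sort(idx), seq_len(nrow(df1))))

df2 <- df1[idx, ]

# Reordering whole rows leaves the column totals untouched.
stopifnot(all(colSums(df2) == colSums(df1)))
```

Because whole rows move together, the values inside each row stay as they were; shuffling within rows or within columns needs a separate call to sample() per row or per column.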

Shuffle column-wise:

> df3 <- df1[,sample(ncol(df1))]
> df3
  c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0

你列表最软的妹 2024-11-23 21:54:04

Here is another way to shuffle the data.frame, using the dplyr package:

library(dplyr)

Row-wise:

df2 <- slice(df1, sample(1:n()))

or

df2 <- sample_frac(df1, 1L)

Column-wise:

df2 <- select(df1, one_of(sample(names(df1))))

薄荷梦 2024-11-23 21:54:04

Take a look at permatswap() in the vegan package. Here is an example that maintains both row and column totals, but you can relax that and fix only the row sums or only the column sums.

library(vegan)
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

This gives:

R> out$perm[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    1
[2,]    0    1    0    1    0
[3,]    0    0    0    1    1
[4,]    1    0    0    0    1
R> out$perm[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    1    1
[2,]    0    0    0    1    1
[3,]    1    0    0    1    0
[4,]    0    0    1    0    1

To explain the call:

out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
  1. times is the number of randomised matrices you want, here 99.
  2. burnin is the number of swaps made before we start taking random samples. This lets the matrix from which we sample become quite random before we start taking each of our randomised matrices.
  3. thin says to take a random draw only every thin swaps.
  4. mtype = "prab" says to treat the matrix as presence/absence, i.e. binary 0/1 data.

A couple of things to note: this doesn't guarantee that any given column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard the ones that don't match all your requirements.

Your requirement to have different numbers of changes per row isn't covered here either. Again, you could sample more matrices than you want and then discard the ones that don't meet this requirement.
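The "draw, then discard what doesn't fit" idea can be sketched in base R. Note that this swaps in plain per-column shuffling rather than permatswap(), so only the column totals are preserved. The helper name shuffle_columns_changed and its max_tries argument are hypothetical, made up for this illustration.

```r
# Shuffle each column independently and retry until every column differs
# from the original. shuffle_columns_changed() is a hypothetical helper.
shuffle_columns_changed <- function(m, max_tries = 1000) {
  # A column whose values are all identical can never change; fail early.
  constant <- apply(m, 2, function(x) length(unique(x)) == 1)
  if (any(constant)) stop("constant columns cannot be changed by shuffling")

  for (i in seq_len(max_tries)) {
    cand <- apply(m, 2, sample)        # permute the values within each column
    changed <- colSums(cand != m) > 0  # TRUE where a column differs from m
    if (all(changed)) {
      dimnames(cand) <- dimnames(m)    # restore the original row/column names
      return(cand)
    }
  }
  stop("no acceptable matrix found within max_tries")
}

# Same matrix as in the answer above.
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(1)  # arbitrary seed for reproducibility
res <- shuffle_columns_changed(mat)
```

The same rejection loop could wrap any other generator of candidate matrices, with the acceptance test adjusted to whatever requirement has to hold.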

放飞的风筝 2024-11-23 21:54:04

You can also use the randomizeMatrix function in the R package picante.

Example:

> library(picante)
> test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    1    1    0    1
[3,]    0    0    0    0
[4,]    1    0    1    0

> randomizeMatrix(test,null.model = "frequency",iterations = 1000)

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    0    0
[3,]    1    0    1    0
[4,]    1    0    1    0

> randomizeMatrix(test,null.model = "richness",iterations = 1000)

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    1    1    0    1
[3,]    0    0    0    0
[4,]    1    0    1    0

The option null.model = "frequency" maintains column sums, and "richness" maintains row sums. Although the function is mainly used for randomizing species presence/absence datasets in community ecology, it works well here.

This function has other null model options as well; see the picante documentation (page 36) for more details.

相对绾红妆 2024-11-23 21:54:04

Of course you can sample each row:

sapply(1:4, function(row) df1[row, ] <<- sample(df1[row, ]))

This shuffles the values within each row, so the number of 1s in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
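A variant without the global assignment (<<-) might look like the sketch below. It rebuilds df1 from the question so that it runs on its own; the seed is arbitrary and df3 is just a name chosen for the result.

```r
# Example data from the question.
df1 <- data.frame(
  f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
  row.names = c("d1", "d2", "d3", "d4")
)

set.seed(7)  # arbitrary seed for reproducibility

df3 <- df1
# apply() over rows returns each shuffled row as a *column*, hence t().
# Assigning into df3[] fills by position and keeps df1's row/column names.
df3[] <- t(apply(as.matrix(df1), 1, sample))

# The number of 1s in every row is unchanged.
stopifnot(all(rowSums(df3) == rowSums(df1)))
```

This does not guarantee that every row actually changes; a row can be shuffled back onto itself, so an extra check is needed if that matters.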

巨坚强 2024-11-23 21:54:04

If the goal is to randomly shuffle each column independently, some of the above answers don't work, since they shuffle the columns jointly (which preserves inter-column correlations). Others require installing a package. Yet a one-liner exists:

# lapply() returns a list, so convert it back to a data frame
df2 <- as.data.frame(lapply(df1, sample))
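On its own, lapply() returns a plain list and drops the row names. The sketch below shows one way to keep the data.frame class and row names, and checks that column totals survive. It rebuilds df1 from the question; the seed is arbitrary.

```r
# Example data from the question.
df1 <- data.frame(
  f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
  row.names = c("d1", "d2", "d3", "d4")
)

set.seed(3)  # arbitrary seed, only for reproducibility

shuffled <- lapply(df1, sample)  # plain list: one shuffled vector per column

df2 <- df1
df2[] <- shuffled  # fill a copy, keeping the data.frame class and row names

# Each column keeps its number of 1s; row totals are free to change.
stopifnot(is.data.frame(df2), all(colSums(df2) == colSums(df1)))
```

Since each column is shuffled with its own call to sample(), any association between columns is broken, which is the point this answer makes.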

勿忘心安 2024-11-23 21:54:04

You can also shuffle the rows of your data frame (here M) by indexing it with a random permutation of its row numbers:

nr <- dim(M)[1]
random_M <- M[sample.int(nr), ]

一场信仰旅途 2024-11-23 21:54:04

Here is a data.table option that uses .N with sample():

library(data.table)
setDT(df)
df[sample(.N)]
#>    a b c
#> 1: 0 1 0
#> 2: 1 1 0
#> 3: 1 0 0
#> 4: 0 0 0

Created on 2023-01-28 with reprex v2.0.2


Data:

df <- read.table(text = "  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0", header = TRUE)

傲世九天 2024-11-23 21:54:04

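Random samples and permutations in a dataframe: if the data is in matrix form, convert it to a data.frame, then use the sample function from the base package to draw shuffled row indexes, which can then be used to index the rows (e.g. df1[indexes, ]):

indexes <- sample(1:nrow(df1), size = 1 * nrow(df1))

See the help page for sample(), "Random Samples and Permutations".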
