Sampling from a contingency table

Posted on 2024-11-15 19:30:53

I have gotten as far as the code below in writing a function that samples from a contingency table with probability proportional to the cell frequencies.

It uses expand.grid and then table to get back to a table of the original size. This works as long as the sample is large enough that no category is completely missing; otherwise table returns a table with smaller dimensions than the original.

FunSample<- function(Full, n) {
  Frame <- expand.grid(lapply(dim(Full), seq))
  table(Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
Full<-array(c(1,2,3,4), dim=c(2,2,2))
FunSample(Full, 100) # OK
FunSample(Full, 1) # not OK, I want it to still have dim=c(2,2,2)!

I'm stuck. I suspect only a small tweak is needed to fix this, but I can't see it.

Comments (3)

書生途 2024-11-22 19:30:53

The cell counts of a crosstab follow a multinomial distribution, so you can draw them with rmultinom and reset the dimensions of the output. This should be considerably faster, and it leaves less code to maintain.

> X <- rmultinom(1, 500, Full)
> dim(X) <- dim(Full)
> X
, , 1

     [,1] [,2]
[1,]   18   92
[2,]   45   92

, , 2

     [,1] [,2]
[1,]   28   72
[2,]   49  104

> X2 <-rmultinom(1, 4, Full)
> dim(X2) <- dim(Full)
> X2
, , 1

     [,1] [,2]
[1,]    0    1
[2,]    0    0

, , 2

     [,1] [,2]
[1,]    0    1
[2,]    1    1
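
The two console steps above can be folded into one function. This is only a sketch: the name sample_table_multinom is made up for illustration, and carrying over dimnames is an extra convenience that the answer does not mention.

```r
# Hypothetical helper name; wraps the rmultinom() call shown above.
sample_table_multinom <- function(Full, n) {
  # rmultinom() normalises `prob` internally, so raw cell counts can be passed.
  counts <- rmultinom(1, size = n, prob = as.vector(Full))
  # rmultinom() returns a length(Full) x 1 matrix; restore shape and labels.
  array(counts, dim = dim(Full), dimnames = dimnames(Full))
}

Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
set.seed(1)  # only to make the draw reproducible
small <- sample_table_multinom(Full, 1)
dim(small)   # 2 2 2, even with a single draw
sum(small)   # 1
```

Because every cell gets a count (possibly zero), the result always has the shape of Full, whatever the sample size.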

追星践月 2024-11-22 19:30:53

If you don't want table() to "drop" missing combinations, you need to force the columns of Frame to be factors:

FunSample <- function(Full, n) {
  Frame <- as.data.frame( lapply( expand.grid(lapply(dim(Full), seq)), factor) )  
  table( Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}   

> dim( FunSample(Full, 1))
[1] 2 2 2
> dim( FunSample(Full, 100))
[1] 2 2 2
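
Why the factor conversion helps can be seen on a single vector. A minimal sketch, with x as a made-up example: table() builds its categories from the values it actually sees, unless the input is already a factor that carries its full set of levels.

```r
x <- c(2, 2, 2)  # the value 1 never occurs in this sample

# Plain numeric input: only the observed value gets a cell.
n_plain <- length(table(x))

# Factor input with explicit levels: level 1 is kept with a count of 0.
n_factor <- length(table(factor(x, levels = 1:2)))

n_plain   # 1
n_factor  # 2
```

In FunSample the columns are turned into factors before sampling, so every sampled subset still carries all the levels.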

無心 2024-11-22 19:30:53

You could use tabulate instead of table; it works on integer-valued vectors, as you have here. You could also get the output into an array by using array directly, just like when you created the original data.

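FunSample <- function(Full, n) {
  samp <- sample(seq_along(Full), n, prob = Full, replace = TRUE)
  # nbins keeps trailing empty cells; without it array() would recycle a short vector
  array(tabulate(samp, nbins = length(Full)), dim = dim(Full))
}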
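
One detail worth checking with tabulate(): by default it returns only max(bin) bins, so a small sample can yield a vector shorter than the table, and array() then silently recycles it. The sketch below uses a made-up single draw (cell 3 of 8) to show the effect and how nbins avoids it.

```r
samp <- 3L  # pretend a single draw landed in cell 3 of 8

short <- tabulate(samp)             # 0 0 1: only max(samp) bins
full  <- tabulate(samp, nbins = 8)  # 0 0 1 0 0 0 0 0

# array() recycles the short vector to fill 8 cells, inventing a second count.
bad  <- array(short, dim = c(2, 2, 2))
good <- array(full,  dim = c(2, 2, 2))

sum(bad)   # 2, although only one observation was drawn
sum(good)  # 1
```

Passing nbins = length(Full) inside the function keeps the count vector at full length for any sample size.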