Sampling from a contingency table
I've got as far as the code below in writing a function to sample from a contingency table, with probability proportional to the frequencies in the cells. It uses expand.grid and then table to get back to a table of the original size. This works fine as long as the sample size is large enough that no category is completely missing. Otherwise, the table command returns a table with smaller dimensions than the original one.
FunSample <- function(Full, n) {
  Frame <- expand.grid(lapply(dim(Full), seq))
  table(Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
FunSample(Full, 100) # OK
FunSample(Full, 1) # not OK, I want it to still have dim=c(2,2,2)!
My brain has stopped working, I know it has to be a small tweak to get it back on track!?
Comments (3)
A crosstab is also a multinomial distribution, so you can use rmultinom and reset the dimensions on the output. This should give a substantial performance boost and cut down on the code you need to maintain.
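The answer's own code did not survive on this page, so the following is only a minimal sketch of the rmultinom idea. The function name SampleMultinom is hypothetical, and the sketch assumes Full holds non-negative cell weights.

```r
# Hypothetical helper (name not from the original answer).
# Draws n observations with cell probabilities proportional to Full
# and returns the counts in an array shaped like Full.
SampleMultinom <- function(Full, n) {
  # rmultinom normalises prob internally; the result is a length(Full) x 1 matrix
  counts <- rmultinom(1, size = n, prob = as.vector(Full))
  # Restore the original dimensions (and dimnames, if any)
  array(counts, dim = dim(Full), dimnames = dimnames(Full))
}

Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
res <- SampleMultinom(Full, 1)
dim(res)  # 2 2 2, even for a single draw
sum(res)  # 1
```

Because the counts are generated for every cell at once, empty cells come back as zeros instead of being dropped.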
If you don't want table() to "drop" missing combinations, you need to force the columns of Frame to be factors.
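The code that accompanied this answer is missing here, so this is a sketch of one way to do it, not necessarily what the answerer wrote. The name FunSampleFactor is hypothetical. It converts each column of Frame to a factor before sampling, so the full set of levels is recorded while every index is still present.

```r
# Hypothetical variant of the asker's FunSample (name is an assumption).
FunSampleFactor <- function(Full, n) {
  Frame <- expand.grid(lapply(dim(Full), seq_len))
  # Convert columns to factors while all index values are still present,
  # so table() keeps levels that end up with zero counts
  Frame[] <- lapply(Frame, factor)
  rows <- sample(nrow(Frame), n, prob = as.vector(Full), replace = TRUE)
  table(Frame[rows, , drop = FALSE])
}

Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
res <- FunSampleFactor(Full, 1)
dim(res)  # 2 2 2
```

The result is still a table object with dimnames Var1, Var2, Var3, as in the original function.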
You could use tabulate instead of table; it works on integer-valued vectors, which is what you have here. You could also get the output into an array by using array directly, just like when you created the original data.
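As an illustration of the tabulate suggestion (the original code is not preserved on this page), here is a minimal sketch. The name FunSampleTabulate is hypothetical. It samples cell indices directly, so expand.grid is not needed at all.

```r
# Hypothetical helper (name is an assumption, not from the answer).
FunSampleTabulate <- function(Full, n) {
  # Sample linear cell indices 1..length(Full), weighted by the cell values
  cells <- sample.int(length(Full), n, replace = TRUE, prob = as.vector(Full))
  # nbins forces one count per cell, including cells that were never drawn
  counts <- tabulate(cells, nbins = length(Full))
  array(counts, dim = dim(Full))
}

Full <- array(c(1, 2, 3, 4), dim = c(2, 2, 2))
res <- FunSampleTabulate(Full, 1)
dim(res)  # 2 2 2
sum(res)  # 1
```

The nbins argument is what keeps the output the right length: without it, tabulate stops at the largest index that was actually sampled.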