How to use ddply to subsample data by group?
I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset.
I wrote a function to subset a data frame as follows:
samp <- function(dataf)
{
dataf[sample(1:dim(dataf)[1], size=40, replace=FALSE),]
}
Now I want to apply this function to each species in a larger data frame.
When I try something like
culled_data = ddply (larger_data, .(species), subset, samp)
I get this error:
Error in subset.data.frame(piece, ...) :
'subset' must evaluate to logical
Anyone got ideas on how to do this?
Answers (2)
It looks like it should work once you remove ", subset" from your call:
culled_data = ddply(larger_data, .(species), samp)
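To show what the corrected call does without needing plyr installed, here is a minimal base-R sketch (a swapped-in technique, not plyr itself): it splits the data by species, applies the question's samp to each piece, and stacks the results, which is what ddply(larger_data, .(species), samp) does. The toy data frame and its x/y columns are hypothetical stand-ins for larger_data.

```r
# Hypothetical toy data standing in for larger_data; every species has >= 40 rows
set.seed(1)
larger_data <- data.frame(
  species = rep(c("a", "b"), times = c(100, 60)),
  x = runif(160),
  y = runif(160)
)

# The question's function: 40 random rows, drawn without replacement
samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size = 40, replace = FALSE), ]
}

# Base-R stand-in for ddply(larger_data, .(species), samp):
# split by species, apply samp to each piece, then row-bind the pieces
pieces <- split(larger_data, larger_data$species)
culled_data <- do.call(rbind, lapply(pieces, samp))
print(table(culled_data$species))
```

Note that samp, as written, stops with an error for any species with fewer than 40 rows, because sample() cannot draw more values than it is given when replace = FALSE.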
Dirk's answer is of course correct, but I'm posting my own to add some explanation.
Why doesn't your call work?
First of all, your syntax is shorthand. It's equivalent to
culled_data = ddply(larger_data, .(species), function(dfrm) subset(dfrm, samp))
so you can clearly see that you are passing a function (see class(samp)) as the second argument of subset. You could use samp(dfrm) instead, but that won't work either, because samp returns a data.frame and subset needs a logical vector. So samp(dfrm) would only work if samp returned a logical index.
How to use subset in this case?
Make subset work by feeding it a logical vector:
I create a logical vector with 40 TRUE values and shuffle it. (By the way, this also works when a species has fewer than 40 cases; then all of its rows are returned.)
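The original code for this step is missing, so here is a minimal base-R sketch of the idea, not necessarily the answerer's exact code. It first shows that passing samp itself to subset() fails, then builds the logical index with sample(seq_along(species) <= 40), which holds min(40, n) TRUE values in random positions. The toy data (species "c" deliberately has only 25 rows) and the helper name keep are hypothetical. With plyr, the same expression would be passed as ddply(larger_data, .(species), subset, sample(seq_along(species) <= 40)); the sketch uses split() instead so it runs without plyr.

```r
# Hypothetical toy data: species "c" has fewer than 40 rows
set.seed(42)
larger_data <- data.frame(
  species = rep(c("a", "b", "c"), times = c(100, 60, 25)),
  x = runif(185),
  y = runif(185)
)

samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size = 40, replace = FALSE), ]
}

# Passing the function itself: subset() evaluates `samp`, gets a function
# instead of a logical vector, and stops with an error
piece <- larger_data[larger_data$species == "a", ]
msg <- tryCatch(subset(piece, samp), error = conditionMessage)

# Hypothetical helper: keep min(40, n) rows chosen at random.
# `species` is looked up inside the data frame by subset();
# seq_along(species) <= 40 gives up to 40 TRUE values, and sample() shuffles them
keep <- function(piece) {
  subset(piece, sample(seq_along(species) <= 40))
}

culled_data <- do.call(rbind, lapply(split(larger_data, larger_data$species), keep))
print(table(culled_data$species))
```

Because the index is a shuffled logical vector rather than a fixed-size draw, a species with fewer than 40 rows simply keeps all of them instead of raising an error.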