Error with a custom aggregate function for a cast() call in R reshape2

Posted on 2024-10-14 20:53:28

I want to use R to summarize numerical data in a table with non-unique row names into a result table with unique row names, with the values summarized by a custom function. The summarization logic is: use the mean of the values if the ratio of the maximum to the minimum value is < 1.5, else use the median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.

library(reshape2)

# example table with non-unique row-names
tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
# melt
tab.melt <- melt(tab, id=1)
# function to summarize with logic: mean if max/min < 1.5, else median
summarize <- function(x){ifelse(max(x)/min(x)<1.5, mean(x), median(x))}
# cast with summarized values
dcast(tab.melt, gene~variable, summarize)

The last line of code above raises the following error:

Error in vapply(indices, fun, .default) : 
  values must be type 'logical',
 but FUN(X[[1]]) result is type 'double'
In addition: Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In min(x) : no non-missing arguments to min; returning Inf

What am I doing wrong? Note that if the summarize function were to just return min(), or max(), there is no error, though there is the warning message about 'no non-missing arguments.' Thank you for any suggestion.

(The actual table I want to work with is a 200x10000 one.)

Comments (2)

皇甫轩 2024-10-21 20:53:28

Short answer: provide a value for fill as follows
acast(tab.melt, gene~variable, summarize, fill=0)

Long answer:
It appears your function gets wrapped as follows, before being passed to vapply in the vaggregate function (dcast calls cast which calls vaggregate which calls vapply):

fun <- function(i) {
    if (length(i) == 0) 
        return(.default)
    .fun(.value[i], ...)
}

To find out what .default should be, this code is executed

if (is.null(.default)) {
    .default <- .fun(.value[0])
}

That is, .value[0] is passed to the function. When x is numeric(0), min(x) returns Inf and max(x) returns -Inf, so a function that returns either one still yields a double. However, max(x)/min(x) is then NaN, the comparison NaN < 1.5 is NA, and ifelse() with an NA test returns a logical NA. So when vapply is executed

vapply(indices, fun, .default)

with a default value of type logical (which vapply uses as its template), the call fails as soon as the function returns a double.
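The type mismatch can be reproduced in base R without reshape2. The following is a minimal sketch: the groups list is a made-up stand-in for the per-cell values that vaggregate would pass along, and the names default, ratio, and res are illustrative only.

```r
# The question's summary function, unchanged
summarize <- function(x) { ifelse(max(x) / min(x) < 1.5, mean(x), median(x)) }

# What would be used as the default when fill is NULL:
# the function applied to a zero-length numeric vector
default <- suppressWarnings(summarize(numeric(0)))

ratio <- suppressWarnings(max(numeric(0)) / min(numeric(0)))
typeof(ratio)        # "double": NaN itself is numeric
typeof(ratio < 1.5)  # "logical": comparing NaN yields NA
typeof(default)      # "logical": ifelse() with an NA test returns logical NA

# vapply() treats its third argument as a type template,
# so a double result is rejected when the template is logical
groups <- list(a = c(1, 2, 3), b = c(4, 5, 6))  # hypothetical stand-in data
res <- tryCatch(
  vapply(groups, summarize, default),
  error = function(e) conditionMessage(e)
)
print(res)
```

Supplying a numeric fill (such as 0 or NaN) avoids the problem because the template is then a double.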

仄言 2024-10-21 20:53:28

dcast() tries to fill in missing combinations with a default value.

You can specify this value with the fill argument, but if fill=NULL,
the value returned by fun(0-length vector) (i.e., summarize(numeric(0)) here) is used as the default.

See ?dcast.

Here is a workaround:

 dcast(tab.melt, gene~variable, summarize, fill=NaN)
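
An alternative to passing fill is to make the summary function itself return a double for empty input. The sketch below is one possible way to do that, not something taken from the answers above: summarize_safe is a hypothetical name, and returning NA_real_ for an empty group is an assumption about what a missing combination should look like.

```r
# Hypothetical replacement for summarize(): always returns a double
summarize_safe <- function(x) {
  if (length(x) == 0) return(NA_real_)  # value for missing combinations
  if (max(x) / min(x) < 1.5) mean(x) else median(x)
}

typeof(summarize_safe(numeric(0)))  # "double"
summarize_safe(c(1, 1.2, 1.4))      # ratio 1.4 < 1.5, so the mean is used
summarize_safe(c(1, 2, 10))         # ratio 10 >= 1.5, so the median is used

# This part only runs when reshape2 is installed
if (requireNamespace("reshape2", quietly = TRUE)) {
  tab <- data.frame(gene = rep(letters[1:3], each = 3),
                    s1 = runif(9), s2 = runif(9))
  tab.melt <- reshape2::melt(tab, id = 1)
  print(reshape2::dcast(tab.melt, gene ~ variable, summarize_safe))
}
```

Using if/else instead of ifelse() also suits this case, since the condition is a single value and ifelse() is meant for vectorized tests.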