Error with a custom aggregate function in a cast() call in R reshape2
I want to use R to summarize numerical data in a table with non-unique row names into a result table with unique row names, with the values summarized by a custom function. The summarization logic is: use the mean of the values if the ratio of the maximum to the minimum value is < 1.5, else use the median. Because the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.
library(reshape2)

# example table with non-unique row names
tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))

# melt
tab.melt <- melt(tab, id=1)

# function to summarize with logic: mean if max/min < 1.5, else median
summarize <- function(x){ifelse(max(x)/min(x)<1.5, mean(x), median(x))}

# cast with summarized values
dcast(tab.melt, gene~variable, summarize)
The last line of code above results in an error:

Error in vapply(indices, fun, .default) :
  values must be type 'logical',
 but FUN(X[[1]]) result is type 'double'
In addition: Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In min(x) : no non-missing arguments to min; returning Inf
What am I doing wrong? Note that if the summarize function just returns min() or max(), there is no error, though the warning about 'no non-missing arguments' still appears. Thank you for any suggestions.
(The actual table I want to work with is a 200x10000 one.)
Comments (2)
Short answer: provide a value for fill, as follows:
acast(tab.melt, gene~variable, summarize, fill=0)
Long answer:
It appears your function gets wrapped before being passed to vapply in the vaggregate function (dcast calls cast, which calls vaggregate, which calls vapply).
To find out what .default should be, the function is first called on .value[0], i.e. a zero-length vector. When x is numeric(0), max(x) returns -Inf and min(x) returns Inf, so max(x)/min(x) is NaN. The comparison NaN < 1.5 is NA, and ifelse() with an NA test returns a logical NA.
So when vapply(indices, fun, .default) is executed with a default value of class logical (used as a template by vapply), it fails as soon as the function starts returning doubles. This is also why returning just min() or max() gives only the warnings: both return doubles for a zero-length input.
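The type mismatch can be reproduced in base R without reshape2. The following is a minimal sketch, not the package's internal code: the `groups` list is made-up data, and `default` merely stands in for the value that gets used as the vapply template.

```r
summarize <- function(x) ifelse(max(x) / min(x) < 1.5, mean(x), median(x))

# Zero-length input: max() is -Inf and min() is Inf (each with a warning),
# the ratio is NaN, NaN < 1.5 is NA, and ifelse() with an NA test
# returns a logical NA.
default <- suppressWarnings(summarize(numeric(0)))
print(typeof(default))  # "logical"

# Made-up groups for illustration only.
groups <- list(a = c(1, 1.2, 1.3), b = c(1, 5, 9))

# With a logical template, vapply() rejects the double results.
res <- tryCatch(
  vapply(groups, summarize, default),
  error = function(e) conditionMessage(e)
)
print(res)

# With a double template, the same call succeeds.
ok <- vapply(groups, summarize, numeric(1))
print(ok)
```

Passing fill=0 in the short answer has the same effect as the last call: it supplies a double as the template.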
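dcast() tries to fill missing combinations with a default value.
You can specify it with the fill argument, but if fill=NULL,
the value returned by the function applied to a zero-length vector (i.e., summarize(numeric(0)) here) is used as the default.
Please see ?dcast.
A workaround, then, is to make sure that default is a double, either by passing fill explicitly or by having the function handle zero-length input.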
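One possible way to do this is sketched below. It is not necessarily the code from the original answer. `summarize_safe` is a hypothetical name, the sketch assumes the values are positive and non-missing, and the dcast() part runs only if reshape2 is installed.

```r
# Type-stable variant: always returns a double, including for empty input.
summarize_safe <- function(x) {
  if (length(x) == 0) return(NA_real_)
  # Plain if/else is sufficient here because the condition is a single value.
  if (max(x) / min(x) < 1.5) mean(x) else median(x)
}

set.seed(1)
tab <- data.frame(gene = rep(letters[1:3], each = 3),
                  s1 = runif(9), s2 = runif(9))

if (requireNamespace("reshape2", quietly = TRUE)) {
  tab.melt <- reshape2::melt(tab, id.vars = "gene")
  # The default is now a double NA, so the vapply template matches.
  out <- reshape2::dcast(tab.melt, gene ~ variable, summarize_safe)
  print(out)
}
```

Because the function itself handles the zero-length case, the max()/min() warnings disappear as well, and no fill value has to be chosen.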