Is there a way to find Inf/-Inf values in R?

Posted on 2024-12-23 15:08:28

I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:

> RF <- randomForest(prePrior1, postPrior1[,6]
+                    ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE,  : 
  NA/NaN/Inf in foreign function call (arg 1)

So I tried to find any NAs using:

> df2 <- prePrior1[is.na(prePrior1)]
> df2 
character(0)
> df2 <- postPrior1[is.na(postPrior1[,6])]
> df2 
numeric(0)

which leads me to believe that Infs are the problem, as there don't seem to be any NAs.

Any suggestions for how to root out Infs?

Comments (5)

溺深海 2024-12-30 15:08:28

You're probably looking for is.finite, though I'm not 100% certain that the problem is Infs in your input data.

Be sure to read the help for is.finite carefully to see which combinations of missing, infinite, etc. values it picks out. Specifically:

> is.finite(c(1,NA,-Inf,NaN))
[1]  TRUE FALSE FALSE FALSE
> is.infinite(c(1,NA,-Inf,NaN))
[1] FALSE FALSE  TRUE FALSE

One of these things is not like the others. Not surprisingly, there's an is.nan function as well.
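
To apply the same idea to a whole data frame rather than a single vector, one option is to count the offending entries column by column. This is only a sketch on a small made-up data frame (`df` and its column names are illustrative stand-ins for prePrior1, not the asker's data). is.finite() and is.infinite() are applied per column via sapply() because they do not accept a data frame directly.

```r
# Hypothetical example data; `df` stands in for prePrior1.
df <- data.frame(a = c(1, Inf, 3),
                 b = c(NA, 2, NaN),
                 c = c(-Inf, 5, 6))

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so this counts
# every value that is not an ordinary number, per column.
bad_counts <- sapply(df, function(col) sum(!is.finite(col)))
print(bad_counts)

# is.infinite() is TRUE only for Inf and -Inf.
inf_counts <- sapply(df, function(col) sum(is.infinite(col)))
print(inf_counts)
```

Columns with a nonzero count are the ones worth inspecting before calling randomForest().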

私藏温柔 2024-12-30 15:08:28

randomForest's 'NA/NaN/Inf in foreign function call' error is often misleading, and really irritating:

  • you will get this if any of the variables passed is character
  • actual NaNs and Infs almost never happen in clean data
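
A quick way to check for the first cause is to look at the class of every column. The character(0) result in the question may itself be a hint that prePrior1 holds non-numeric data. The data frame below is a made-up stand-in, and the column names are hypothetical.

```r
# Hypothetical example data: column `x2` was read in as text.
df <- data.frame(x1 = c(1.5, 2.0, 3.2),
                 x2 = c("4", "five", "6"),
                 stringsAsFactors = FALSE)

# Class of every column; anything that is not numeric, integer or
# factor deserves a closer look before calling randomForest().
col_classes <- sapply(df, class)
print(col_classes)

# Names of the character columns.
char_cols <- names(df)[sapply(df, is.character)]
print(char_cols)

# Coercing to numeric shows which entries would turn into NA
# (as.numeric() would normally warn about "five").
coerced <- suppressWarnings(as.numeric(df$x2))
print(which(is.na(coerced)))
```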

My quick-and-dirty trick to narrow things down: do a binary search on your variable list, and use token parameters like ntree=2 to get an instant pass/fail on each subset of variables:

RF <- randomForest(prePrior1[m:n],ntree=2,...)
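
The bisection itself can be sketched as a small helper. This is an illustration rather than part of randomForest: `find_bad_column` and `fails` are hypothetical names, and the sketch assumes the failure can be pinned on a single column (with several bad columns it returns just one of them per run).

```r
# Bisect a set of column indices to find one column for which `fails`
# returns TRUE. `fails` is a hypothetical user-supplied function that
# takes a vector of column indices. With randomForest it could be
# something like (not run here):
#   function(cols) inherits(try(randomForest(prePrior1[cols],
#     postPrior1[, 6], ntree = 2), silent = TRUE), "try-error")
find_bad_column <- function(cols, fails) {
  if (!fails(cols)) return(NA_integer_)   # nothing wrong in this set
  while (length(cols) > 1) {
    half <- cols[seq_len(length(cols) %/% 2)]
    # Keep whichever half still fails; if the first half passes,
    # the culprit must be in the remainder.
    cols <- if (fails(half)) half else setdiff(cols, half)
  }
  cols
}

# Stand-in check for illustration: a subset "fails" if it contains
# a character column, so no randomForest call is needed to try this.
df <- data.frame(a = 1:3, b = 4:6, c = c("x", "y", "z"), d = 7:9,
                 stringsAsFactors = FALSE)
fails <- function(cols) any(sapply(df[cols], is.character))

bad <- find_bad_column(seq_along(df), fails)
print(names(df)[bad])
```

Each round halves the candidate set, so 300 variables need only about nine pass/fail runs.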
三生池水覆流年 2024-12-30 15:08:28

By analogy with is.na, you can use is.infinite to find infinite values.

泛滥成性 2024-12-30 15:08:28

Take a look at with, e.g.:

> with(df, df == Inf)
        foo   bar   baz   abc ...
[1,]  FALSE FALSE  TRUE FALSE ...
[2,]  FALSE  TRUE FALSE FALSE ...
...
泪冰清 2024-12-30 15:08:28

joran's answer is what you want, and it is informative. For more details about is.na() and is.infinite(), check out https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/is.na-methods.html
Besides, once you have the logical vector that says whether each element of the original vector is NA/Inf, you can use the which() function to get the indices, like this:

> v1 <- c(1, Inf, 2, NaN, Inf, 3, NaN, Inf)
> is.infinite(v1)
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
> which(is.infinite(v1))
[1] 2 5 8
> is.na(v1)
[1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
> which(is.na(v1))
[1] 4 7

The documentation for which() is here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/which.html
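
For two-dimensional data, which() also takes arr.ind = TRUE, which returns row and column positions instead of a single flat index. A minimal sketch on a made-up matrix (`m` and its column names are illustrative only):

```r
# Hypothetical example matrix with a few non-finite entries.
m <- matrix(c(1, Inf, 3,
              4, 5, NaN,
              -Inf, 8, 9),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("foo", "bar", "baz")))

# Row/column positions of every Inf or -Inf.
inf_pos <- which(is.infinite(m), arr.ind = TRUE)
print(inf_pos)

# Row/column positions of everything that is not a finite number
# (NA, NaN, Inf, -Inf).
bad_pos <- which(!is.finite(m), arr.ind = TRUE)
print(bad_pos)
```

For a data frame whose columns are all numeric, as.matrix(df) gives a matrix that this works on.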
