Preserve numeric precision in an R data frame?

Posted 2024-10-09 15:55:35


When I create a data frame from numeric vectors, R seems to truncate the values to less precision than my analysis requires:

data.frame(x=0.99999996)

returns 1 (*but see update 1)
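One way to tell whether the value was actually changed, or only printed with fewer digits, is to compare it with 1 directly instead of reading the console output. A minimal sketch (the names df, is_one, gap and still_lt are just for illustration):

```r
# Build the one-row data frame from the question
df <- data.frame(x = 0.99999996)

# Printing uses 7 significant digits by default, so the value is shown as 1 ...
print(df)

# ... but comparisons use the stored double, which is still below 1
is_one   <- df$x == 1     # FALSE
still_lt <- df$x < 1      # TRUE
gap      <- 1 - df$x      # roughly 4e-08

print(c(is_one = is_one, still_below_one = still_lt))
print(gap)
```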

I am stuck when fitting spline(x, y): two of the x values are set to 1 by rounding while the corresponding y values differ. I could hack around this, but I would prefer to use a standard solution if one is available.

Example

Here is an example data set

d <- data.frame(x = c(0.668732936336141, 0.95351462456867,
0.994620622127435, 0.999602102672081, 0.999987126195509, 0.999999955814133,
0.999999999999966), y = c(38.3026509783688, 11.5895099585560,
10.0443344234229, 9.86152339768516, 9.84461434575695, 9.81648333804257,
9.83306725758297))
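Before blaming rounding, it can help to check whether any of the stored x values are actually tied. This sketch re-creates d from the question and inspects the gaps between successive x values; with the values as posted, none are tied and the smallest gap is roughly 4e-8 (n_dupes and gaps are illustrative names).

```r
d <- data.frame(x = c(0.668732936336141, 0.95351462456867,
                      0.994620622127435, 0.999602102672081, 0.999987126195509,
                      0.999999955814133, 0.999999999999966),
                y = c(38.3026509783688, 11.5895099585560,
                      10.0443344234229, 9.86152339768516, 9.84461434575695,
                      9.81648333804257, 9.83306725758297))

# anyDuplicated() returns 0 when no stored x value is an exact duplicate
n_dupes <- anyDuplicated(d$x)

# Gaps between successive x values; all positive means strictly increasing
gaps <- diff(d$x)
print(n_dupes)
print(signif(gaps, 3))
```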

The following solution works, but I would prefer something that is less subjective:

plot(d$x, d$y, ylim=c(0,50))
lines(spline(d$x, d$y),col='grey') #bad fit
lines(spline(d[-c(4:6),]$x, d[-c(4:6),]$y),col='red') #reasonable fit
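A likely reason the grey curve misbehaves: spline() interpolates through every point, and the last few points are extremely close in x while y still moves, which is consistent with Update 2 below. Computing the finite-difference slope between neighbouring points makes this visible. A sketch reusing the data from the question (slopes and steepest are illustrative names):

```r
d <- data.frame(x = c(0.668732936336141, 0.95351462456867,
                      0.994620622127435, 0.999602102672081, 0.999987126195509,
                      0.999999955814133, 0.999999999999966),
                y = c(38.3026509783688, 11.5895099585560,
                      10.0443344234229, 9.86152339768516, 9.84461434575695,
                      9.81648333804257, 9.83306725758297))

# Slope of the straight line between each pair of neighbouring points
slopes <- diff(d$y) / diff(d$x)
print(signif(slopes, 3))

# Index of the steepest segment; here it is the one between points 6 and 7,
# where x differs by only about 4e-8 but y still changes
steepest <- which.max(abs(slopes))
print(steepest)
```

Most segments have slopes of a few dozen, while the last one is several hundred thousand, which an interpolating spline can only follow by oscillating.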

Update 1

*Since posting this question, I realized that this only prints as 1; the data frame still contains the original value, e.g.

> dput(data.frame(x=0.99999999996))

returns

structure(list(x = 0.99999999996), .Names = "x", row.names = c(NA, 
-1L), class = "data.frame")
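Because dput() shows the full value, the data frame itself is intact; only the default display rounds it. The digits argument of format() (or of print()) controls how many significant digits are shown. A small sketch, assuming R's default options(digits = 7) is in effect (default_shown and wide_shown are illustrative names):

```r
x  <- 0.99999999996
df <- data.frame(x = x)

# Default display: 7 significant digits, so the value is shown as 1
default_shown <- format(df$x)

# Asking for more significant digits reveals the stored value
wide_shown <- format(df$x, digits = 12)

print(default_shown)
print(wide_shown)

# print.data.frame() accepts the same digits argument
print(df, digits = 12)
```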

Update 2

After using dput to post this example data set, and some pointers from Dirk, I can see that the problem is not in the truncation of the x values but the limits of the numerical errors in the model that I have used to calculate y. This justifies dropping a few of the equivalent data points (as in the example red line).


给妤﹃绝世温柔 2024-10-16 15:55:35

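If you really want R to print its results with unreasonably high precision, use: options(digits=16).

Note that this does nothing for the accuracy of functions that use these results. It only changes how values appear when they are printed to the console. No rounding is applied to the values as they are stored or accessed, unless you enter more significant digits than the significand (mantissa) of a double can hold, which is about 15 to 17 decimal digits. The 'digits' option has no effect on the maximal precision of floating-point numbers.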
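A short sketch of the point above: changing options(digits) alters only the printed form, not the stored number. It saves and restores the option so the rest of the session is unaffected; shown_7, shown_16 and unchanged are illustrative names.

```r
x <- 0.99999996

old <- options(digits = 7)   # R's default, set explicitly; old holds the previous value
shown_7 <- format(x)         # displayed with 7 significant digits

options(digits = 16)
shown_16 <- format(x)        # displayed with up to 16 significant digits

options(old)                 # restore the previous setting

# The stored value never changed; only its printed form did
unchanged <- identical(x, 0.99999996)

print(c(shown_7, shown_16))
```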


七婞 2024-10-16 15:55:35


Please re-read R FAQ 7.31 and the reference cited therein, a really famous paper on what everybody should know about floating-point representation on computers (David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic").
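A standard illustration of the point FAQ 7.31 makes: decimal fractions such as 0.1 have no exact binary representation, so an exact equality test can fail where a tolerance-based comparison with all.equal() succeeds. A minimal sketch (a, b, exact, tol and diff_ab are illustrative names):

```r
a <- 0.1 + 0.2
b <- 0.3

exact   <- a == b                    # FALSE: the two doubles differ in the last bits
tol     <- isTRUE(all.equal(a, b))   # TRUE: equal within all.equal()'s default ~1.5e-8 tolerance
diff_ab <- a - b                     # a tiny positive difference, about 5.6e-17

print(c(exact = exact, within_tolerance = tol))
print(diff_ab)
```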

The closing quote from Kernighan and Plauger is also wonderful:

10.0 times 0.1 is hardly ever 1.0.
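Taken literally, the single product 10 * 0.1 happens to round to exactly 1 in IEEE double precision, but accumulating 0.1 ten times does not, which is closer to the spirit of the quote. A small sketch (product and total are illustrative names):

```r
product <- 10 * 0.1          # one rounding step; the result rounds to exactly 1

total <- 0                   # add 0.1 ten times, rounding after every addition
for (i in 1:10) total <- total + 0.1

print(product == 1)
print(total == 1)
print(sprintf("%.17g", total))   # shows the accumulated rounding error
```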

And besides the numerical precision issue, there is of course also the fact that R prints fewer significant digits (7 by default) than it stores internally:

> for (d in 4:8) print(0.99999996, digits=d)
[1] 1
[1] 1
[1] 1
[1] 1
[1] 0.99999996
>
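The loop above only changes how the number is displayed. Actually rounding the stored value is a separate operation, for example with signif(). A sketch contrasting the two (x_rounded is an illustrative name):

```r
x <- 0.99999996

# Display only: x itself is untouched by the digits argument
print(x, digits = 7)
print(x, digits = 8)

# Rounding the value: signif() returns a new number with 7 significant digits
x_rounded <- signif(x, 7)

print(x == 1)            # the original is still below 1
print(x_rounded == 1)    # the rounded copy really is 1
```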


About the author: 怀中猫帐中妖
