How do I work around a rounding error that causes an infinite loop in Perl's Statistics::Descriptive?

Posted on 2024-07-23 06:14:28

I'm using the Statistics::Descriptive library in Perl to calculate frequency distributions, and I'm running into a floating-point rounding problem.

I pass two values, 0.205 and 0.205 (derived from other numbers and formatted with sprintf), to the stats module and ask it to calculate the frequency distribution, but it gets stuck in an infinite loop.

Stepping through with a debugger I can see that it's doing:

    my $interval = $self->{sample_range}/$partitions;
    my $iter = $self->{min};
    while (($iter += $interval) <  $self->{max}) {
      $bins{$iter} = 0;
      push @k, $iter;  ##Keep the "keys" unstringified
    }

$self->sample_range (the range is max - min) is returning 2.77555756156289e-17 rather than the 0 I'd expect. This means that the loop ((min += range) < max) is, for all intents and purposes, infinite.

DB<8> print $self->{max};
0.205
DB<9> print $self->{min};
0.205
DB<10> print $self->{max}-$self->{min};
2.77555756156289e-17
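
A minimal sketch that reproduces the stall outside the module, assuming a Perl built with standard 64-bit IEEE doubles. The input values, the partition count of 4, and the safety cap are illustrative assumptions; only the shape of the loop is taken from the excerpt above.

```perl
use strict;
use warnings;

# Two doubles exactly one representable step apart.
# Both print as 0.205 at Perl's default precision.
my $min = 0.205;
my $max = $min + 2**-55;   # 2**-55 is the spacing of doubles in [0.125, 0.25)
my $partitions = 4;        # hypothetical partition count

my $interval = ($max - $min) / $partitions;
my $iter  = $min;
my $steps = 0;
my $cap   = 1000;          # safety cap so this demo terminates

# Same loop shape as the excerpt, with a cap added.
while (($iter += $interval) < $max) {
    last if ++$steps >= $cap;
}

printf "min=%s max=%s range=%s\n", $min, $max, $max - $min;
printf "iterations run: %d\n", $steps;
printf "iter still equals min: %s\n", ($iter == $min ? "yes" : "no");
```

With these inputs $interval is a quarter of the spacing between adjacent doubles near 0.205, so each addition rounds back to $min and only the cap ends the loop.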

So this looks like a rounding problem. I can't think how to fix this on my side though, and I'm not sure editing the library is a good idea. I'm looking for suggestions of a workaround or alternative.

Cheers,
Neil

Comments (3)

诗笺 2024-07-30 06:14:28

I am the Statistics::Descriptive maintainer. Due to the module's numeric nature, many rounding problems have been reported. I believe this particular one was fixed in a version I released recently, later than the one you were using, by computing the partitions with multiplication instead of +=.

Please use the most up-to-date version from CPAN; it should behave better.
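
To illustrate the idea of multiplying rather than accumulating with +=, here is a rough sketch. It is not the module's actual code: bin_edges is a hypothetical helper, and it assumes the partition boundaries are simply min plus a multiple of the interval.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::Descriptive.
sub bin_edges {
    my ($min, $max, $partitions) = @_;
    my $interval = ($max - $min) / $partitions;
    # The iteration count is fixed by $partitions, so this terminates even
    # when $interval is too small to change $min when added to it.
    my @edges = map { $min + $interval * $_ } 1 .. $partitions - 1;
    push @edges, $max;   # use the exact maximum as the last edge
    return @edges;
}

my @edges = bin_edges(0.205, 0.205 + 2**-55, 4);
print scalar(@edges), " edges computed\n";
```

Some edges may still coincide when the range is only one representable step wide, but the loop can no longer run forever.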

玩物 2024-07-30 06:14:28

Not exactly a rounding problem; you can see the more precise values with something like

printf("%.18g %.18g", $self->{max}, $self->{min});

It looks to me like there's a flaw in the module: it assumes the sample range can be divided into $partitions pieces, but because floating point doesn't have infinite precision, this isn't always possible. In your case, the min and max values are exactly adjacent representable values, so there can't be more than one partition. I don't know exactly what the module uses the partitions for, so I'm not sure what the impact may be.
Another possible problem in the module is that it uses numbers as hash keys, which implicitly stringifies them and so slightly rounds the values.
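
A small sketch of the hash-key effect, assuming IEEE doubles and Perl's default number-to-string precision. The variable names and values are illustrative, not taken from the module.

```perl
use strict;
use warnings;

my $min = 0.205;
my $max = $min + 2**-55;   # next representable double above $min (assumed IEEE doubles)

my %bins;
$bins{$min} = 'first';
$bins{$max} = 'second';    # stringifies to the same key, overwriting 'first'

printf "%.18g %.18g\n", $min, $max;
print "numerically distinct: ", ($min != $max ? "yes" : "no"), "\n";
print "hash keys: ", join(", ", keys %bins), "\n";
```

The two numbers differ, yet the hash ends up with a single key because both stringify the same way.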

You may have some success laundering your data through stringification before feeding it to the module:

$data = 0+"$data";

This will at least ensure that two numbers that (with the default printing precision) appear equal are actually equal.
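
A short sketch of that laundering step applied to a list of values, assuming IEEE doubles. The launder helper and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: round-trip a number through its default string form.
sub launder {
    my ($value) = @_;
    return 0 + "$value";
}

my $x = 0.205;
my $y = $x + 2**-55;   # prints as 0.205 but is not equal to $x

my @clean = map { launder($_) } ($x, $y);

print "before: ", ($x == $y ? "equal" : "different"), "\n";
print "after:  ", ($clean[0] == $clean[1] ? "equal" : "different"), "\n";
```

The trade-off is that any digits beyond the default printing precision are discarded, which is usually acceptable for values that were produced by sprintf in the first place.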

星軌x 2024-07-30 06:14:28

That shouldn't cause an infinite loop. What would cause that loop to be infinite would be if $self->{sample_range}/$partitions is 0.
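
A zero interval is one way to stall that loop, but a nonzero interval that is too small to change $iter when added has the same effect. Below is a rough sketch of a check a caller could run before asking for a frequency distribution, assuming IEEE doubles. interval_advances is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical caller-side check, not part of Statistics::Descriptive.
# Returns true only if adding one interval to $min actually moves it.
sub interval_advances {
    my ($min, $max, $partitions) = @_;
    my $interval = ($max - $min) / $partitions;
    return ($min + $interval) > $min ? 1 : 0;
}

my $min = 0.205;
my $max = $min + 2**-55;   # one representable step above $min
my $interval = ($max - $min) / 4;

print "interval is zero: ", ($interval == 0 ? "yes" : "no"), "\n";
print "advances near 0.205: ", (interval_advances($min, $max, 4) ? "yes" : "no"), "\n";
print "advances for 0..1:   ", (interval_advances(0, 1, 4) ? "yes" : "no"), "\n";
```

If the check fails, the caller can treat the data as having a single bin instead of calling the partitioning code.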
