How do I do high-precision calculations in D?

Posted on 2024-10-17 00:36:47

For some university work I have to approximate some numbers, such as Euler's number, using series. That means I have to add very small numbers, but I have problems with the precision: if a term is very small, it doesn't influence the result.

real s;  //sum of all previous terms
ulong k; //factorial

s += 1.0/ k;

After each step the term 1.0/k gets even smaller (k, the factorial, keeps growing), but after the 10th round the result no longer changes and is stuck at 2.71828.

桃扇骨 2024-10-24 00:36:47

Fixed-precision floating-point types, the ones natively supported by your CPU's floating-point unit (float, double, real), are not optimal for any calculation that needs many digits of precision, such as the example you've given.

The problem is that these floating-point types have a finite number of digits of precision (binary digits, actually), which limits the length of the numbers they can represent. The float type has a limit of approximately 7 decimal digits (e.g. 3.141593); the double type is limited to about 15 or 16 (e.g. 3.14159265358979); and the real type, which on x86 is the 80-bit extended format, offers only slightly more than double (about 18 or 19).
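These limits do not have to be memorized, because D exposes them as properties of each type. The following is a small sketch that prints them; the values for real depend on the platform, so no output is shown here.

```d
import std.stdio;

void main()
{
    // .dig      = decimal digits that are guaranteed to be representable
    // .mant_dig = number of binary digits in the significand
    // .epsilon  = gap between 1 and the next larger representable value
    writefln("float:  dig = %s, mant_dig = %s, epsilon = %g",
             float.dig, float.mant_dig, float.epsilon);
    writefln("double: dig = %s, mant_dig = %s, epsilon = %g",
             double.dig, double.mant_dig, double.epsilon);
    writefln("real:   dig = %s, mant_dig = %s, epsilon = %g",
             real.dig, real.mant_dig, real.epsilon);
}
```

A term smaller than roughly half of epsilon times the running sum cannot change that sum any more.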

Adding exceedingly small numbers to a floating-point value will therefore result in those digits being lost. Watch what happens when we add the following two float values together:

float a = 1.234567f, b = 0.0000000001234567f;
float c = a + b;

writefln("a = %f b = %g c = %f", a, b, c); // needs import std.stdio; %g keeps b from printing as 0.000000

Both a and b are valid float values, and each retains approximately 7 digits of precision in isolation. But when they are added, only the leading 7 digits are preserved, because the sum is shoved back into a float:

1.2345670001234567 => 1.234567|0001234567 => 1.234567
                              ^^^^^^^^^^^
                         sent to the bit bucket

So c ends up equal to a because the finer digits of precision from the addition of a and b get whacked off.
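The same effect can be observed on the series from the question. The sketch below is only an illustration: the helper name termsUntilAbsorbed is made up for this example, and it keeps the term as a floating-point value (dividing by k at each step) instead of storing the factorial in a ulong, which would overflow after 20!. It counts how many terms still change the sum in each native type and prints the sums with more digits than the default output shows.

```d
import std.stdio;

/// Hypothetical helper: sums 1/0! + 1/1! + 1/2! + ... in type T and stops
/// as soon as adding the next term leaves the sum unchanged.
/// Returns how many terms were added; the sum comes back through `sum`.
int termsUntilAbsorbed(T)(out T sum)
{
    T s = 0;
    T term = 1;              // 1/0!
    int count = 0;
    while (true)
    {
        T next = s + term;
        if (next == s)
            break;           // the term was too small to change the sum
        s = next;
        ++count;
        term /= count;       // turns 1/(count-1)! into 1/count!
    }
    sum = s;
    return count;
}

void main()
{
    float f;
    double d;
    real r;
    int nf = termsUntilAbsorbed(f);
    int nd = termsUntilAbsorbed(d);
    int nr = termsUntilAbsorbed(r);

    // Ask for 20 decimals explicitly so the differences become visible.
    writefln("float:  %s terms, sum = %.20f", nf, f);
    writefln("double: %s terms, sum = %.20f", nd, d);
    writefln("real:   %s terms, sum = %.20f", nr, r);
}
```

Each type stops at a different term count, but all of them stop eventually, which is the limit the native types cannot get past.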

Here's another explanation of the concept, probably much better than mine.

The answer to this problem is arbitrary-precision arithmetic. Unfortunately, support for arbitrary-precision arithmetic is not in CPU hardware; therefore, it's not (typically) in your programming language. However, there are many libraries that support arbitrary-precision floating-point types and the math you want to perform on them. See this question for some suggestions. You probably won't find any D-specific libraries for this purpose today, but there are plenty of C libraries (GMP, MPFR, and so on) that should be easy enough to use in isolation, and even more so if you can find D bindings for one of them.
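As one possible route that stays inside D, and as an assumption that goes beyond the original answer, the standard library's std.bigint module provides arbitrary-precision integers. It has no floating-point type, but a series such as the one for Euler's number can be summed in fixed-point form by scaling every term by a power of ten. The function name eScaled and the choice of five guard digits are illustrative only.

```d
import std.bigint;
import std.format : format;
import std.stdio;

/// Hypothetical helper: returns e * 10^digits, truncated to an integer,
/// by summing 1/k! in fixed-point arithmetic on BigInt.
BigInt eScaled(uint digits)
{
    enum guard = 5;                 // extra digits that soak up truncation error
    BigInt scale = BigInt(1);
    foreach (_; 0 .. digits + guard)
        scale *= 10;

    BigInt sum = BigInt(0);
    BigInt term = scale;            // 1/0!, scaled
    for (uint k = 1; term != 0; ++k)
    {
        sum += term;
        term /= k;                  // integer division truncates each term
    }

    BigInt guardScale = BigInt(1);
    foreach (_; 0 .. guard)
        guardScale *= 10;
    return sum / guardScale;        // drop the guard digits again
}

void main()
{
    // The first character is the integer part, the rest are decimals.
    string s = format("%d", eScaled(50));
    writeln(s[0 .. 1], ".", s[1 .. $]);
}
```

The loop ends on its own once the scaled term truncates to zero, so the number of terms adapts to the requested number of digits.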

长安忆 2024-10-24 00:36:47

If you need a solution that runs using the native types, you should be able to get reasonable results by always trying to add numbers of similar magnitude. One way to do this is to compute the first X terms of the series and then repeatedly replace the two smallest numbers with their sum:

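import std.algorithm : sort;

// Fn(i) yields the i-th term of the series and N is the number of terms;
// both are assumed to be defined elsewhere.
auto data = new real[N];
foreach (i, ref v; data) {
  v = Fn(i);
}

while (data.length > 1) {
  sort(data);           // std.algorithm's sort replaces the old built-in .sort property
  data[1] += data[0];   // fold the smallest value into the second smallest
  data = data[1 .. $];  // drop the element that was just merged
}

return data[0];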

(A min heap would make this a bit faster.)
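A min-heap variant might look like the following sketch, which uses BinaryHeap from std.container.binaryheap. The function name sumSmallestFirst is made up for this example, and the input is copied so that the caller's array is left untouched.

```d
import std.container.binaryheap : heapify;
import std.stdio;

/// Hypothetical helper: repeatedly replaces the two smallest values
/// with their sum until a single value is left.
real sumSmallestFirst(const(real)[] terms)
{
    if (terms.length == 0)
        return 0;

    // The predicate "a > b" turns the default max-heap into a min-heap,
    // so front is always the smallest remaining value.
    auto heap = heapify!"a > b"(terms.dup);
    while (heap.length > 1)
    {
        real a = heap.front;
        heap.removeFront();
        real b = heap.front;
        heap.removeFront();
        heap.insert(a + b);   // two removals always leave room for one insert
    }
    return heap.front;
}

void main()
{
    // First 25 terms of the series for Euler's number: 1/0! .. 1/24!
    real[] terms;
    real term = 1;
    foreach (k; 1 .. 26)
    {
        terms ~= term;
        term /= k;
    }
    writefln("%.18f", sumSmallestFirst(terms));
}
```

Each round costs a logarithmic number of steps instead of a full sort, so the whole summation takes on the order of N log N operations.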
