Best algorithm for avoiding loss of precision?

Posted 2024-07-12 06:16:33


A recent homework assignment I have received asks us to take expressions which could create a loss of precision when performed in the computer, and alter them so that this loss is avoided.

Unfortunately, the directions for doing this haven't been made very clear. From watching various examples being performed, I know that there are certain methods of doing this: using Taylor series, using conjugates if square roots are involved, or finding a common denominator when two fractions are being subtracted.

However, I'm having some trouble noticing exactly when loss of precision is going to occur. So far the only thing I know for certain is that loss of precision occurs when you subtract two numbers that are nearly equal: the leading digits cancel, and what remains is dominated by round-off error.

My question is what are some other common situations I should be looking for, and what are considered 'good' methods of approaching them?

For example, here is one problem:

f(x) = tan(x) − sin(x)  when x ~ 0

What are the best and the worst algorithms for evaluating this, out of these three choices:

(a) (1/cos(x) − 1) * sin(x)
(b) (x^3)/2
(c) tan(x) * sin(x)^2 / (cos(x) + 1)

I understand that when x is close to zero, tan(x) and sin(x) are nearly the same. I don't understand how or why any of these algorithms are better or worse for solving the problem.
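
As a concrete illustration (x = 0.001 is an arbitrary test point), a minimal single-precision C sketch of the cancellation when the expression is evaluated directly:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Evaluate f(x) = tan(x) - sin(x) directly in single precision and
       compare with a double-precision reference at the same point. */
    float  x      = 0.001f;
    float  direct = tanf(x) - sinf(x);                /* nearly equal operands cancel */
    double exact  = tan((double)x) - sin((double)x);  /* reference value, about 5e-10 */

    printf("direct (float): %.9g\n", direct);
    printf("reference     : %.9g\n", exact);
    /* tan(x) and sin(x) agree to about one part in two million here, so the
       float subtraction is left with at most a digit or two of the answer. */
    return 0;
}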


Comments (4)

不知所踪 2024-07-19 06:16:33


Another commonly used rule of thumb: when adding a long series of numbers, start adding from the numbers closest to zero and end with the biggest numbers.

Explaining why this is good is a bit tricky. When you add small numbers to a large number, there is a chance they will be completely discarded, because they are smaller than the lowest digit in the large number's current mantissa. Take for instance this situation:

a = 1,000,000;
do 100,000,000 times:
   a += 0.01;

If 0.01 is smaller than the lowest mantissa digit, the loop does nothing and the end result is a == 1,000,000.
But if you do it like this:

a = 0;
do 100,000,000 times:
   a += 0.01;
a += 1,000,000;

Then the small numbers slowly accumulate, and you're more likely to end up with something close to a == 2,000,000, which is the right answer.
This is of course an extreme example, but I hope you get the idea.
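
A minimal C sketch of this effect (the constants are illustrative; at a magnitude of 1e8 the spacing between adjacent floats is 8.0, so a lone += 0.01f is simply rounded away):

#include <stdio.h>

int main(void) {
    /* Large value first: every 0.01f increment is absorbed and lost. */
    float big_first = 100000000.0f;
    for (int i = 0; i < 1000000; i++)
        big_first += 0.01f;

    /* Small values first: they accumulate, then the large value is added once. */
    float small_first = 0.0f;
    for (int i = 0; i < 1000000; i++)
        small_first += 0.01f;
    small_first += 100000000.0f;

    printf("big first  : %.1f\n", big_first);   /* stays at 100000000.0 */
    printf("small first: %.1f\n", small_first); /* close to 100010000.0 */
    return 0;
}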

秋日私语 2024-07-19 06:16:33


I had to take a numerics class back when I was an undergrad, and it was thoroughly painful. Anyhow, IEEE 754 is the floating point standard typically implemented by modern CPUs. It's useful to understand the basics of it, as this gives you a lot of intuition about what not to do. The simplified explanation is that computers store floating point numbers in something like base-2 scientific notation, with a fixed number of digits (bits) for the exponent and for the mantissa. This means that the larger the absolute value of a number, the less precisely it can be represented. For 32-bit floats in IEEE 754, half of the possible bit patterns represent values between -1 and 1, even though numbers up to about 10^38 are representable with a 32-bit float. For values larger than 2^24 (approximately 16.7 million), a 32-bit float cannot represent all integers exactly.

What this means for you is that you generally want to avoid the following:

  1. Having intermediate values be large when the final answer is expected to be small.
  2. Adding/subtracting small numbers to/from large numbers. For example, if you wrote something like:

    for(float index = 17000000; index < 17000002; index++) {}

This loop would never terminate because 17,000,000 + 1 is rounded back down to 17,000,000.
If you had something like:

float foo = 10000000.0f - 10000000.0001f;

The value for foo would be 0, not -0.0001, due to rounding error.
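
Both effects are easy to reproduce in a few lines of C (single precision; constants chosen to match the examples above):

#include <stdio.h>

int main(void) {
    /* Above 2^24, adjacent 32-bit floats are 2 or more apart,
       so adding 1 can have no effect at all. */
    float big = 17000000.0f;
    printf("%.1f\n", big + 1.0f);   /* prints 17000000.0 */

    /* The small difference is rounded away before the subtraction happens:
       10000000.0001 is not representable as a float and becomes 10000000. */
    float foo = 10000000.0f - 10000000.0001f;
    printf("%g\n", foo);            /* prints 0, not -0.0001 */
    return 0;
}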

酒儿 2024-07-19 06:16:33


My question is what are some other common situations I should be looking for, and what are considered 'good' methods of approaching them?

There are several ways you can have severe or even catastrophic loss of precision.

The most important reason is that floating-point numbers have a limited number of digits, e.g. doubles have 53 bits of mantissa. That means if you have "useless" digits which are not part of the solution but must be stored, you lose precision.

For example (we are using decimal numbers for demonstration):

2.598765000000000000000000000100 −
2.598765000000000000000000000099

The interesting part is the 100 − 99 = 1 answer. Since 2.598765 is equal in both cases, it does not change the result, but it wastes 8 digits. Much worse, because the computer doesn't know that these digits are useless, it is forced to store them and to cram 21 zeroes after them, wasting 29 digits in all. Unfortunately there is no way to circumvent this for differences, but there are other cases, e.g. exp(x) − 1, a function that occurs very often in physics.

The exp function near 0 is almost linear, but it forces a 1 as the leading digit. So with 12 significant digits:

exp(0.001) − 1 = 1.00100050017 − 1 = 1.00050017e-3

Three digits are already gone.

If we instead use a function expm1(), based on the Taylor series

1 + x + x^2/2 + x^3/6 + ... − 1 = x + x^2/2 + x^3/6 + ... =: expm1(x)

we get

expm1(0.001) = 1.00050016667e-3

Much better.
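
A small C illustration of the difference, using the standard C99 expm1() from <math.h> (the test value 1e-9 is arbitrary):

#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 1e-9;
    /* Naive form: exp(x) is 1.000000001, and subtracting the leading 1
       cancels roughly half of the 16 significant decimal digits. */
    double naive = exp(x) - 1.0;
    /* expm1() computes exp(x) - 1 without ever forming the 1.000... value. */
    double good  = expm1(x);

    printf("exp(x) - 1: %.17g\n", naive);  /* trailing digits are garbage */
    printf("expm1(x)  : %.17g\n", good);   /* ~1.0000000005e-09, fully accurate */
    return 0;
}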

The second problem is functions with a very steep slope, like the tangent of x near pi/2. tan(11) (11 is close to 3.5·pi ≈ 10.996, where the tangent has a pole) has a slope of about 50,000, which means that any small deviation caused by earlier rounding errors will be amplified by a factor of 50,000! Or you have singularities, e.g. when the result approaches 0/0, which means it can take any value.
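
A quick C sketch of the amplification (the perturbation 1e-10 stands in for a small rounding error in the argument; the numbers are illustrative):

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Near x = 11 (close to 3.5*pi) the tangent is extremely steep, so a
       tiny perturbation of the argument shifts the result dramatically. */
    double x  = 11.0;
    double dx = 1e-10;
    double slope = (tan(x + dx) - tan(x)) / dx;

    printf("tan(11)     = %g\n", tan(x));   /* about -226 */
    printf("local slope = %g\n", slope);    /* on the order of 5e4 */
    return 0;
}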

In both cases you create a substitute function that simplifies the original one. It is of little use to merely list the different solution approaches, because without training you will simply not 'see' the problem in the first place.

A very good book to learn from and practice with: Forman S. Acton, Real Computing Made Real.

So要识趣 2024-07-19 06:16:33


Another thing to avoid is subtracting numbers that are nearly equal, as this can also lead to increased sensitivity to round-off error. For values near 0, cos(x) will be close to 1, so 1/cos(x) − 1 is one of those subtractions you'd like to avoid if possible; I would therefore say that (a) should be avoided.
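
A small numerical check of this (x = 0.001 is an arbitrary test point): forms (a) and (c) are algebraically identical, but (a) routes the computation through the cancelling subtraction 1/cos(x) − 1, while (c) never subtracts:

#include <math.h>
#include <stdio.h>

int main(void) {
    float x = 0.001f;

    /* (a) subtracts two nearly equal numbers, 1/cos(x) and 1. */
    float form_a = (1.0f / cosf(x) - 1.0f) * sinf(x);
    /* (c) computes the same quantity without any subtraction. */
    float form_c = tanf(x) * sinf(x) * sinf(x) / (cosf(x) + 1.0f);

    double xd  = (double)x;
    double ref = tan(xd) * sin(xd) * sin(xd) / (cos(xd) + 1.0);

    printf("(a): %.9g\n", form_a);  /* typically only the leading digit is right */
    printf("(c): %.9g\n", form_c);  /* close to full single precision */
    printf("ref: %.9g\n", ref);
    return 0;
}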
