C: floating-point rounding

Posted on 2024-08-31 06:20:08

I'm trying to understand how floating point numbers work.

I think I'd like to test out what I know / need to learn by evaluating the following: I would like to find the smallest x such that x + 1 = x, where x is a floating point number.

As I understand it, this happens when x is large enough that x + 1 is closer to x than to the next representable floating-point number above x. Intuitively, this is the case where the significand doesn't have enough digits. Would this number x then be the one where the significand is all 1's? But then I can't seem to figure out what the exponent would have to be. Obviously it would have to be big (relative to 10^0, anyway).

Comments (3)

我的黑色迷你裙 2024-09-07 06:20:08

You just need an expression for the value of the LS bit in the mantissa in terms of the exponent. When this is > 1 then you have met your condition. For a single precision float (24-bit significand, counting the implicit leading bit) the LS bit has a value of 2^-23 * 2^exp, so the condition is met when exp is > 23, i.e. 24. The smallest (normalized) number where this condition is satisfied is therefore 1.0 * 2^24 = 16777216.0f. At that value x + 1 lies exactly halfway between two representable floats, and the default round-to-nearest-even mode rounds it back to x.

I haven't checked this, so my maths may be off somewhere (e.g. by a factor of 2) and it's also possible that the FP unit does rounding beyond the 24th bit, so there may be a further factor of 2 needed to account for this, but you get the general idea...

酸甜透明夹心 2024-09-07 06:20:08

Start with 1.0, and keep doubling it until the test succeeds:

#include <stdio.h>
int main(void) {
    double x;
    for (x = 1.0; x + 1 != x; x *= 2) { }  /* double x until adding 1 has no effect */
    printf("%g + 1 = %g\n", x, x + 1);
}

花开浅夏 2024-09-07 06:20:08

I suggest that while trying to understand f-p numbers and f-p arithmetic you work in decimal with 5 digits in the significand and 2 in the exponent. (Or, if 5 and 2 don't suit you, 6 and 3 or any other small numbers you like.) The issues of:

  • the limited set of numbers which can be represented;
  • non-associativity and non-distributivity;
  • the problems which can arise when treating f-p numbers as real numbers;

are all much easier to figure out in decimal and the lessons you learn are entirely general. Once you've got this figured out, enhancing your knowledge with IEEE f-p arithmetic will be relatively straightforward. You'll also be able to figure out other f-p arithmetic systems with relative ease.
