Floating point arithmetic in C
I have recently been thinking about how floating point math works on computers, and it is hard for me to understand all the technical details behind the formulas. I need to understand the basics of addition, subtraction, multiplication, division and remainder. With these I will be able to build trig functions and formulas.
I can guess something about it, but it's a bit unclear. I know that a fixed point number can be made by splitting a 4-byte integer into a sign flag, a radix and a mantissa. With this we have a 1-bit flag, a 5-bit radix and a 10-bit mantissa. A 32-bit word is perfect for a floating point value :)
To add two floats, can I simply add the two mantissas and add the carry to the 5-bit radix? Is this a way to do floating point math (or fixed point math, to be honest), or am I completely wrong?
All the explanations I have seen use formulas, multiplications, etc., and they look very complex for something that, I guess, should be a bit simpler. I need an explanation aimed more at beginning programmers and less at mathematicians.
Comments (3)
See Anatomy of a floating point number
The radix depends on the representation. If you use radix r = 2 you can never change it; the number doesn't even store any data that tells you which radix it has. I think you are mistaken and you mean the exponent.
To add two floating point numbers you must make their exponents equal by shifting the mantissa. Shifting one bit right means exponent + 1, and shifting one bit left means exponent - 1. Once the numbers have the same exponent, you can add them.
Value(x) = mantissa * radix ^ exponent
After making the exponents equal to each other you can add or subtract the mantissas.
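The alignment step described above can be sketched in C. This is only a minimal illustration, not how any real FPU or library does it: the `toy_float` type, the 24-bit mantissa limit, and the names `toy_add` and `toy_value` are all made up for this example, and bits shifted out during alignment are simply dropped instead of being rounded.

```c
#include <stdint.h>

/* Hypothetical toy format (not IEEE 754): value = mant * 2^exp.
 * The mantissa is a signed integer whose magnitude stays below 2^24. */
#define TOY_MANT_BITS 24

typedef struct {
    int32_t mant; /* signed mantissa */
    int     exp;  /* power-of-two exponent */
} toy_float;

/* Add two toy floats by aligning exponents, then adding mantissas. */
static toy_float toy_add(toy_float a, toy_float b)
{
    const int32_t limit = (int32_t)1 << TOY_MANT_BITS;

    /* Make 'a' the operand with the larger exponent. */
    if (a.exp < b.exp) {
        toy_float t = a;
        a = b;
        b = t;
    }

    /* Shift the smaller operand's mantissa right until the exponents
     * match: one bit right means exponent + 1. Division is used instead
     * of >> so negative mantissas behave predictably (truncation). */
    int shift = a.exp - b.exp;
    if (shift > TOY_MANT_BITS)
        b.mant = 0; /* too small to affect the result */
    else
        b.mant /= ((int32_t)1 << shift);

    toy_float r;
    r.exp = a.exp;
    r.mant = a.mant + b.mant; /* exponents are equal, so just add */

    /* If the sum no longer fits in the mantissa, shift it right and
     * bump the exponent. This is the "carry" going into the exponent. */
    while (r.mant >= limit || r.mant <= -limit) {
        r.mant /= 2;
        r.exp += 1;
    }
    return r;
}

/* Value(x) = mantissa * radix ^ exponent, with radix 2. */
static double toy_value(toy_float x)
{
    double v = (double)x.mant;
    int e = x.exp;
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v *= 0.5;
    return v;
}
```

For example, with `a = {3 << 20, -20}` (3.0) and `b = {1 << 20, -21}` (0.5), `toy_value(toy_add(a, b))` gives 3.5. Note that the mantissas are kept in the high bits; with small unnormalized mantissas the right shift would throw away the whole operand.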
You also have to know whether the representation has an implicit bit. In a normalized number the most significant bit of the mantissa must be 1, so usually, as in the IEEE 754 standard, it is known to be there and is not stored, although it is still used in the operations.
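To make the implicit bit concrete, here is a small sketch that pulls a C `float` apart. It assumes `float` is an IEEE 754 single-precision value (1 sign bit, 8 exponent bits, 23 stored mantissa bits), which holds on most platforms but is not guaranteed by the C standard. The names `float_parts`, `decode_float` and `parts_value` are invented for this example, and infinities and NaNs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: value = (-1)^sign * mantissa * 2^exponent. */
typedef struct {
    unsigned sign;     /* 0 = positive, 1 = negative */
    int      exponent; /* unbiased, scaled so mantissa is an integer */
    uint32_t mantissa; /* 24 bits, implicit leading 1 restored */
} float_parts;

/* Split a float into its fields. Assumes IEEE 754 binary32 layout. */
static float_parts decode_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* copy the raw bit pattern safely */

    uint32_t biased = (bits >> 23) & 0xFFu; /* 8 stored exponent bits */
    uint32_t frac   = bits & 0x7FFFFFu;     /* 23 stored mantissa bits */

    float_parts p;
    p.sign = bits >> 31;
    if (biased == 0) {
        /* Zero or subnormal: there is no implicit bit here. */
        p.mantissa = frac;
        p.exponent = -126 - 23;
    } else {
        /* Normal number: put back the leading 1 that is not stored.
         * (biased == 255, i.e. infinity or NaN, is not handled.) */
        p.mantissa = frac | 0x800000u;
        p.exponent = (int)biased - 127 - 23;
    }
    return p;
}

/* Rebuild the value from the parts: mantissa * 2^exponent, with sign. */
static double parts_value(float_parts p)
{
    double v = (double)p.mantissa;
    int e = p.exponent;
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v *= 0.5;
    return p.sign ? -v : v;
}
```

For `6.5f` this yields a mantissa of `0xD00000` and an exponent of -21: the stored 23 bits are `0x500000`, and the top bit `0x800000` is the implicit one.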
I know this can be a bit confusing and I'm not the best teacher, so if you have any doubts, just ask.
Run, don't walk, to get Knuth's Seminumerical Algorithms, which contains wonderful intuition and algorithms for doing multiprecision and floating point arithmetic.