Underlying data structure of float in Python
I have a question regarding the underlying data structure of float (and its precision) in Python:
>>> b = 1.4 + 2.3
>>> b
3.6999999999999997
>>> c = 3.7
>>> c
3.7000000000000002
>>> print b, c
3.7 3.7
>>> b == c
False
It seems the values of b and c are machine dependent: they are the numbers closest to the target values, but not exactly the same numbers. I was surprised that we get the 'right' numbers with print, and someone told me that this is because print 'lies', while Python chose to tell us the truth, i.e. showing exactly what it has stored.
And my questions are:
1. How to lie? E.g. in a function that takes two values and returns whether they are the same, how can I make a best guess if the number of decimal places (precision) is unknown, as with b and c above? Is there a well-defined algorithm to do that? I was told that every language (C/C++) has this kind of issue when floating point calculation is involved, but how do they 'solve' it?
2. Why can't we just store the actual number instead of storing the closest number? Is it a limitation, or a trade-off for efficiency?
many thanks
John
Comments (6)
For the answer to your first question, take a look at the following (slightly condensed) code from Python's source:
So basically, repr(float) will return a string formatted with 17 digits of precision, and str(float) will return a string with 12 digits of precision. As you might have guessed, print uses str(), and entering the variable name in the interpreter uses repr(). With only 12 digits of precision, it looks like you get the "correct" answer, but that is just because what you expect and the actual value are the same up to 12 digits.
Here is a quick example of the difference:
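A minimal sketch of that difference, written for Python 3. It assumes the format specifiers '%.12g' and '%.17g' are fair stand-ins for the 12-digit and 17-digit formatting described above; on Python 2.7 and 3.x, str() and repr() themselves no longer behave this way.

```python
b = 1.4 + 2.3
c = 3.7

# 12 significant digits: the two values look identical
short_b = '%.12g' % b
short_c = '%.12g' % c

# 17 significant digits: enough to tell any two distinct doubles apart
long_b = '%.17g' % b
long_c = '%.17g' % c

print(short_b, short_c)   # 3.7 3.7
print(long_b, long_c)     # 3.6999999999999997 3.7000000000000002
print(b == c)             # False
```

The comparison b == c always looks at the stored values, never at either string, which is why it is False even though the 12-digit output matches.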
As for your second question, I suggest you read the following section of the Python tutorial: Floating Point Arithmetic: Issues and Limitations
It boils down to efficiency: storing base 10 decimals in base 2 takes less memory and allows quicker floating point operations than any other representation, but you do need to deal with the imprecision.
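As JBernardo pointed out in the comments, this behavior is different in Python 2.7 and above. The tutorial linked above describes the difference using 0.1 as an example: versions before Python 2.7 and 3.1 rounded the stored value to 17 significant digits and displayed 0.10000000000000001, whereas later versions display the shortest decimal string that rounds back to the same binary value, which is simply 0.1.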
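A small sketch of the 0.1 case on Python 3, offered as an illustration rather than as the tutorial's own code. The variable names are made up for this example; decimal.Decimal is used only to reveal the exact value held in the float.

```python
from decimal import Decimal

x = 0.1

shortest = repr(x)        # shortest string that converts back to the same float
seventeen = '%.17g' % x   # the 17-digit form that older versions displayed
exact = Decimal(x)        # exact decimal expansion of the stored binary value

print(shortest)    # 0.1
print(seventeen)   # 0.10000000000000001
print(exact)       # 0.1000000000000000055511151231257827021181583404541015625

# Both strings identify the same stored double
print(float(shortest) == float(seventeen) == x)   # True
```

The shorter display changes nothing about what is stored; it only picks the most compact string among those that map back to the same bits.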
You should read the infamous paper:
What every computer scientist should know about floating-point arithmetic
Click on the link that says "CACHED" to download the paper in PDF format.
You get a different result in your calculation because the numbers 1.4 and 2.3 are not represented exactly either. When adding them, you also accumulate their precision limitations.
All floating point numbers have a limited precision, and because of the way that floating point numbers are usually represented internally (using base 2 rather than base 10), the limitations apply to numbers that we humans perceive to be easy to represent exactly.
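To make the base 2 representation concrete, here is a sketch that splits a Python float into its sign, exponent and fraction fields. It assumes the platform stores floats as IEEE 754 64-bit doubles (true for virtually every CPython build, but still an assumption), and the helper names float_bits and rebuild are hypothetical.

```python
import struct

def float_bits(x):
    """Return (sign, biased exponent, fraction) of a 64-bit double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer
    (raw,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = raw >> 63                   # 1 bit
    exponent = (raw >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = raw & ((1 << 52) - 1)   # 52 bits, with an implicit leading 1
    return sign, exponent, fraction

def rebuild(sign, exponent, fraction):
    """Reconstruct a normal (not zero, subnormal, inf or NaN) double."""
    return (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** (exponent - 1023)

print(float_bits(1.0))          # (0, 1023, 0)
print(float_bits(1.4 + 2.3))
print(float_bits(3.7))
print(rebuild(*float_bits(3.7)) == 3.7)   # True
```

With only 52 fraction bits available, a value such as 3.7 has to be rounded to the nearest representable binary fraction, and 1.4 + 2.3 happens to land on a neighbouring bit pattern.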
The limited precision is rarely a problem for calculations, as the precision is still enough for most applications. When comparing floating point numbers on the other hand, the limited precision has to be considered.
This is usually done by subtracting the numbers, and checking if the difference is small enough compared to the numbers.
So, for example, if the absolute difference between the two numbers is smaller than a tiny fraction of their magnitude, then you could consider them equal. How many digits you want to consider depends on the precision of the floating point number, i.e. whether you are using single or double precision numbers, and what calculations you have done to reach the numbers. As the precision limits accumulate with each calculation, you might need to loosen the threshold (allow a larger difference) for when they are considered equal.
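One way to sketch such a relative comparison in Python. The function name nearly_equal and the default tolerance of 1e-9 are assumptions chosen for illustration, not values taken from the answer.

```python
def nearly_equal(a, b, rel_tol=1e-9):
    """True when |a - b| is small compared to the larger of |a| and |b|."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

b = 1.4 + 2.3
c = 3.7

print(b == c)                  # False
print(nearly_equal(b, c))      # True
print(nearly_equal(3.7, 3.8))  # False
```

A purely relative test like this never treats a tiny nonzero value as equal to 0.0, so comparisons against zero usually need an additional absolute tolerance.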
When displaying a floating point number, it's rounded according to its precision. If, for example, it's capable of representing 15 digits accurately, it could be rounded to 13 digits before being displayed.
Floating point numbers are intended for fast calculations. There are other data types, like Decimal, that can store decimal numbers exactly. Those are used, for example, for storing currency values.
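A short sketch with Python's decimal module, assuming the values arrive as strings so that no binary rounding happens before Decimal sees them. The variable names are illustrative.

```python
from decimal import Decimal

# Build from strings; Decimal(1.4) would copy the float's binary rounding error
b = Decimal('1.4') + Decimal('2.3')
c = Decimal('3.7')

print(b, c)     # 3.7 3.7
print(b == c)   # True

# Decimal is exact for decimal fractions only; 1/3 is still rounded
third = Decimal(1) / Decimal(3)
print(third * 3 == Decimal(1))   # False with the default 28-digit context
```

Decimal arithmetic is done in software and is noticeably slower than float arithmetic, which is the efficiency trade-off mentioned above.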
Floating point numbers are imprecise; it's a facet of the representation method. There's a lot of background information about precisely why this is; suffice it to say that it's an issue on pretty much any platform that provides floating point numbers.
The best way to deal with the imprecision is to compare within a tolerance (a confidence interval, so to speak). Comparing two calculated floats for exact equality can be problematic because the representations can be off by a tiny amount, so the way to deal with this is to subtract one from the other and make sure the difference is no more than a small quantity. Many libraries already have this sort of functionality built in for floats, but it's not particularly hard to implement yourself when in doubt.
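As one example of such built-in functionality, Python 3.5 and later ship math.isclose in the standard library. The sketch below uses its default relative tolerance; the abs_tol value of 1e-9 is an arbitrary choice for illustration.

```python
import math

b = 1.4 + 2.3
c = 3.7

exact_equal = (b == c)            # False
close = math.isclose(b, c)        # True, default rel_tol is 1e-09

# Near zero a relative tolerance is not enough, so an absolute one is added
near_zero_rel = math.isclose(1e-12, 0.0)                 # False
near_zero_abs = math.isclose(1e-12, 0.0, abs_tol=1e-9)   # True

print(exact_equal, close, near_zero_rel, near_zero_abs)
```

The right tolerances depend on the magnitudes involved and on how much error the preceding calculations may have accumulated.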
This lecture gives pretty good insight into how variables are stored in memory, and the professor includes an example that would give the unexpected results you are seeing.
http://www.youtube.com/watch?v=jTSvthW34GU
If you need to compare the numbers, cast them both to integers first; you will find that they are equal when you perform the test. Note that int() discards the fractional part, so this only shows that the integer parts match.
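A sketch of what an integer comparison does and does not tell you. The helper as_scaled_int is hypothetical, and the choice of 9 decimal places assumes you know how many places matter, which the question says is not always the case.

```python
def as_scaled_int(x, places):
    """Hypothetical helper: scale x by 10**places and round to the nearest int."""
    return int(round(x * 10 ** places))

b = 1.4 + 2.3
c = 3.7

# A plain cast truncates, so clearly different values also compare equal
truncated_equal = int(b) == int(c)        # True (3 == 3)
misleading = int(3.2) == int(3.9)         # True as well (3 == 3)

# Scaling before rounding keeps the decimal places that matter
scaled_equal = as_scaled_int(b, 9) == as_scaled_int(c, 9)          # True
scaled_differs = as_scaled_int(3.2, 9) != as_scaled_int(3.9, 9)    # True

print(truncated_equal, misleading, scaled_equal, scaled_differs)
```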
All numbers are stored in a limited number of bits, hence you cannot just store the actual number and have to live with storing the closest number (imagine the fraction 1/3: if you wanted to store it on paper using decimal digits, you would run out of the world's supply of trees). The alternative is a symbolic representation, as found for example in Mathematica, which just stores 1/3 as 1 and 3, but it's far away from the machine and makes computations slower and more complicated.
Take a look at some of the links people are posting here and read about floating point numbers... it's a little bit scary, though, and you won't trust machines anymore.
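Python's standard library has a small-scale version of this idea in fractions.Fraction, which stores a numerator and a denominator as integers. The sketch below is only an illustration of exact rational storage, not of how Mathematica works internally.

```python
from fractions import Fraction

third = Fraction(1, 3)
print(third.numerator, third.denominator)   # 1 3
print(third * 3 == 1)                       # True, no rounding involved

# Built from strings, the values from the question add up exactly
total = Fraction('1.4') + Fraction('2.3')
print(total)                                # 37/10
print(total == Fraction('3.7'))             # True

# Built from a float, Fraction faithfully keeps the float's rounding error
print(Fraction(3.7) == Fraction('3.7'))     # False
```

The cost is that numerators and denominators can grow without bound during a long computation, so this is slower and uses more memory than fixed-size floats.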