Underlying data structure of float in Python

Posted on 2024-11-24 21:42:01


I have a question about the underlying data structure of float (and its precision) in Python:

>>> b = 1.4 + 2.3
>>> b
3.6999999999999997

>>> c = 3.7
>>> c
3.7000000000000002

>>> print b, c
3.7  3.7

>>> b == c
False

It seems the values of b and c are machine dependent: they are the numbers closest to the target values, but not exactly the same numbers. I was surprised that we get the 'right' numbers with print, and someone told me that this is because print 'lies', while Python chose to tell us the truth, i.e. to show exactly what is stored.

My questions are:

1. How do I lie? For example, if a function takes two values and returns whether they are the same, how can I make a best guess when the number of decimal places (the precision) is unknown, as with b and c above? Is there a well-defined algorithm for that? I was told that every language (C/C++) has this kind of issue when floating point calculation is involved, but how do they 'solve' it?

2. Why can't we just store the actual number instead of the closest number? Is it a limitation, or a trade-off for efficiency?

Many thanks,
John


ぃ双果 2024-12-01 21:42:01


For the answer to your first question, take a look at the following (slightly condensed) code from Python's source:

#define PREC_REPR       17
#define PREC_STR        12

void PyFloat_AsString(char *buf, PyFloatObject *v) {
    format_float(buf, 100, v, PREC_STR);
}

void PyFloat_AsReprString(char *buf, PyFloatObject *v) {
    format_float(buf, 100, v, PREC_REPR);
}

So basically, repr(float) will return a string formatted with 17 digits of precision, and str(float) will return a string with 12 digits of precision. As you might have guessed, print uses str() and entering the variable name in the interpreter uses repr(). With only 12 digits of precision, it looks like you get the "correct" answer, but that is just because what you expect and the actual value are the same up to 12 digits.

Here is a quick example of the difference:

>>> str(.1234567890123)
'0.123456789012'
>>> repr(.1234567890123)
'0.12345678901230001'
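The two precisions can be reproduced without looking at the interpreter internals. Here is a minimal sketch, assuming Python 3, that uses the `%g` format with 12 and 17 significant digits as a stand-in for the old str() and repr() behavior. The helper names `format_like_old_str` and `format_like_old_repr` are made up for illustration; they are not CPython functions.

```python
def format_like_old_str(x):
    # 12 significant digits, mirroring PREC_STR in the snippet above
    return "%.12g" % x


def format_like_old_repr(x):
    # 17 significant digits, mirroring PREC_REPR in the snippet above
    return "%.17g" % x


b = 1.4 + 2.3
c = 3.7

# With 12 digits both values collapse to the same text.
print(format_like_old_str(b), format_like_old_str(c))    # 3.7 3.7

# With 17 digits the difference in the stored values becomes visible.
print(format_like_old_repr(b), format_like_old_repr(c))  # 3.6999999999999997 3.7000000000000002

# The stored values were never equal; only the 12-digit text was.
print(b == c)                                            # False
```

So print was not changing the numbers, it was only showing fewer digits of them.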

As for your second question, I suggest you read the following section of the Python tutorial: Floating Point Arithmetic: Issues and Limitations

It boils down to efficiency: storing base 10 decimals in base 2 takes less memory and gives quicker floating point operations than any other representation, but you do need to deal with the imprecision.
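To make the base 2 storage concrete, here is a small sketch that inspects how one float is laid out. It assumes Python 3 and that floats are IEEE 754 doubles (64 bits), which is the case for CPython on mainstream platforms; that assumption is mine, not something stated in the answer.

```python
import struct
import sys

x = 0.1

# A float occupies 8 bytes: sign, exponent and significand packed together.
raw = struct.pack(">d", x)
print(len(raw), raw.hex())        # 8 3fb999999999999a

# 53 bits of significand are available, whatever the magnitude of the number.
print(sys.float_info.mant_dig)    # 53

# The hexadecimal form shows the binary significand and exponent exactly.
print(x.hex())                    # 0x1.999999999999ap-4

# 0.5 is a power of two, so it is stored exactly; 0.1 is only approximated
# by a fraction whose denominator is a power of two.
print((0.5).as_integer_ratio())   # (1, 2)
print(x.as_integer_ratio())       # (3602879701896397, 36028797018963968)
```

The repeating `9999...` pattern is the binary counterpart of writing 1/3 as 0.3333... in decimal: the expansion never ends, so it has to be cut off somewhere.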

As JBernardo pointed out in the comments, this behavior is different in Python 2.7 and above. The following quote from the tutorial linked above describes the difference (using 0.1 as an example):

In versions prior to Python 2.7 and Python 3.1, Python rounded this
value to 17 significant digits, giving ‘0.10000000000000001’. In
current versions, Python displays a value based on the shortest
decimal fraction that rounds correctly back to the true binary value,
resulting simply in ‘0.1’.
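The quoted change is easy to check. A short sketch, assuming a current Python 3 interpreter; the variable names are only illustrative:

```python
x = 0.1

shortest = repr(x)         # shortest decimal string that round-trips
seventeen = "%.17g" % x    # the older 17 significant digit form

print(shortest)            # 0.1
print(seventeen)           # 0.10000000000000001

# Both strings parse back to the very same stored double, so the shorter
# display loses no information; it is just easier to read.
print(float(shortest) == x, float(seventeen) == x)   # True True
```

Note that the stored value did not change between versions, only the text chosen to display it.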


静待花开 2024-12-01 21:42:01

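You should read the infamous paper:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Click the "CACHED" link to download the paper as a PDF.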


稚然 2024-12-01 21:42:01


You get a different result in your calculation because the numbers 1.4 and 2.3 are not represented exactly either. When adding them, you also accumulate their precision limitations.
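One way to see this is to ask for the exact value behind each literal. A minimal sketch, assuming Python 3; it relies on the fact that `Decimal(float)` converts the stored binary value without rounding:

```python
from decimal import Decimal

# Print the exact decimal expansion of what is really stored for each literal.
for literal in (1.4, 2.3, 3.7):
    print(literal, "is stored as", Decimal(literal))

b = 1.4 + 2.3
c = 3.7

# None of the three literals is stored exactly, and the rounded sum of the
# first two lands on a different double than the rounded 3.7.
print(Decimal(1.4) == Decimal("1.4"))   # False
print(b < c)                            # True
print(c - b)                            # a tiny positive gap, far below 1e-15
```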

All floating point numbers have a limited precision, and because of the way that floating point numbers are usually represented internally (using base 2 rather than base 10), the limitations apply even to numbers that we humans perceive to be easy to represent exactly.

The limited precision is rarely a problem for calculations, as the precision is still enough for most applications. When comparing floating point numbers on the other hand, the limited precision has to be considered.

This is usually done by subtracting the numbers, and checking if the difference is small enough compared to the numbers.

So, for example, if:

abs(b - c) < abs(b) / 1000000000000

then you could consider them equal. How many digits you want to consider depends on the precision of the floating point number, i.e. whether you are using single or double precision numbers, and on what calculations you have done to reach the numbers. As the precision limits accumulate with each calculation, you might need to loosen the threshold (allow a larger difference) for when they are considered equal.
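The relative comparison above can be sketched as a small function. This assumes Python 3; `nearly_equal` is a hypothetical helper written for this example, and the 1e-12 tolerance simply mirrors the division by 1000000000000. The standard library function `math.isclose` (Python 3.5+) does the same kind of check.

```python
import math


def nearly_equal(x, y, rel_tol=1e-12):
    # Hypothetical helper: the allowed difference scales with the larger
    # magnitude, so the test behaves the same for big and small numbers.
    return abs(x - y) <= rel_tol * max(abs(x), abs(y))


b = 1.4 + 2.3
c = 3.7

print(b == c)                              # False
print(nearly_equal(b, c))                  # True
print(math.isclose(b, c, rel_tol=1e-12))   # True

# Values that really differ are still reported as different.
print(nearly_equal(1.0, 1.001))            # False
```

The tolerance is a judgment call: the longer the chain of calculations that produced the values, the larger it may need to be.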

When displaying a floating point number, it is rounded according to its precision. If, for example, it is capable of representing 15 digits accurately, it could be rounded to 13 digits before being displayed.

Floating point numbers are intended for fast calculations. There are other data types, like Decimal, that can store a decimal number exactly. Those are used, for example, for storing currency values.
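As an illustration of the Decimal alternative, here is a minimal sketch using Python's `decimal` module (Python 3 assumed). The values are built from strings on purpose: building them from float literals would carry the binary error along.

```python
from decimal import Decimal

price_a = Decimal("1.4")
price_b = Decimal("2.3")
total = price_a + price_b

print(total)                      # 3.7
print(total == Decimal("3.7"))    # True

# Trailing zeros are kept, which suits currency amounts.
print(Decimal("0.10") * 3)        # 0.30
```

Decimal is exact only for numbers with a finite decimal expansion; something like 1/3 is still rounded, just in base 10 instead of base 2.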


萌无敌 2024-12-01 21:42:01

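Floating point numbers are inexact; that is an aspect of the representation. There is plenty of background on exactly why; suffice it to say that this is an issue on almost every platform that offers floating point numbers.

The best way to deal with the inexactness is to use a tolerance. Comparing two computed floating point numbers for equality can be problematic, because the representations may be off by a tiny amount, so the way to handle it is to subtract one from the other and make sure the difference does not exceed some small amount. Many libraries already have this built in for floating point numbers, but when in doubt, it is not especially hard to implement yourself.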
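A sketch of that subtract-and-compare idea with a fixed (absolute) tolerance, assuming Python 3. The helper `within` and the 1e-9 tolerance are made up for illustration; `math.isclose` with its `abs_tol` parameter is one built-in version of the same idea.

```python
import math


def within(x, y, abs_tol=1e-9):
    # Hypothetical helper: equal if the difference is at most abs_tol.
    return abs(x - y) <= abs_tol


total = sum([0.1] * 10)    # ten additions of an inexact 0.1

print(total == 1.0)        # False
print(within(total, 1.0))  # True

# A purely relative test cannot succeed against zero, because the allowed
# difference shrinks along with the values, so abs_tol is needed there.
residual = total - 1.0
print(math.isclose(residual, 0.0))                 # False
print(math.isclose(residual, 0.0, abs_tol=1e-9))   # True
```

A fixed tolerance only makes sense when the rough magnitude of the values is known; 1e-9 is far too loose for values around 1e-12 and too strict for values around 1e12.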


一个人的夜不怕黑 2024-12-01 21:42:01

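This lecture gives a good overview of how variables are stored in memory, and the professor gives an example that produces the kind of unexpected result you are seeing.
http://www.youtube.com/watch?v=jTSvthW34GU
If you need to compare the numbers, convert both of them to integers first; if you then run the test, you will find that they are equal.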
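A word of caution on the integer approach, as a sketch in Python 3: a plain `int()` truncates, so it also makes clearly different values compare equal. Scaling before rounding keeps a chosen number of decimal places. The helper `to_hundredths` and the choice of two decimal places are assumptions made for this example.

```python
b = 1.4 + 2.3
c = 3.7

# int() drops the fractional part, so b and c do compare equal...
print(int(b) == int(c))        # True
# ...but so do values that are obviously different.
print(int(3.2) == int(3.9))    # True


def to_hundredths(x):
    # Hypothetical helper: scale to hundredths, then round to the nearest
    # integer, so two decimal places take part in the comparison.
    return round(x * 100)


print(to_hundredths(b) == to_hundredths(c))        # True
print(to_hundredths(3.2) == to_hundredths(3.9))    # False
```

Values that sit right on a rounding boundary can still fall on different sides of it, so this is a coarse tool rather than a general fix.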


寂寞笑我太脆弱 2024-12-01 21:42:01

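All numbers are stored in a finite number of bits, so you cannot just store the actual number and have to store the closest one instead (imagine the fraction 1/3: if you wanted to write it out on paper as a decimal number, you would use up all the trees in the world). The alternative is a symbolic representation, such as the one you find in Mathematica, which simply stores 1/3 as a 1 and a 3, but that is far from the machine and makes calculations slower and more complex.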
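Python's standard library has a small version of this idea in `fractions.Fraction`, which stores a numerator and a denominator. It is not Mathematica's symbolic engine, only a rough stand-in for the "store 1/3 as 1 and 3" approach. A sketch, assuming Python 3:

```python
from fractions import Fraction

third = Fraction(1, 3)
print(third.numerator, third.denominator)    # 1 3
print(third + third + third == 1)            # True

# Exact rational arithmetic has no rounding surprises...
print(0.1 + 0.2 == 0.3)                                        # False
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))    # True

# ...but the stored integers grow as calculations go on, which is where
# the extra memory and time go.
x = Fraction(1, 3)
for _ in range(5):
    x = x * x + Fraction(1, 7)
print(x.denominator)    # already a very large integer after five steps
```

A float stays at 8 bytes no matter how long the calculation runs, which is the trade-off described above.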

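Have a look at some of the links people have posted here and read up on floating point numbers... but it is a little scary, and you will not trust machines any more.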