Representing probability in C++
I'm trying to represent a simple set of 3 probabilities in C++. For example:
a = 0.1
b = 0.2
c = 0.7
(As far as I know probabilities must add up to 1)
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same goes for 0.8, which comes out as 0.80000001.
Is there a better way of representing numbers between 0.0 and 1.0 in C++?
Bear in mind that this relates to how the numbers are stored in memory, so that when it comes to doing tests on the values they are correct; I'm not concerned with how they are displayed or printed out.
Comments (11)
This has nothing to do with C++ and everything to do with how floating point numbers are represented in memory. You should never use the equality operator to compare floating point values; see here for better methods: http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
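For example, a simple tolerance-based comparison might look like the sketch below; the function name and epsilon values are illustrative choices, not taken from the linked article.

    #include <algorithm>
    #include <cmath>

    // Compare two doubles with an absolute floor for values near zero and a
    // relative tolerance otherwise; the thresholds here are arbitrary examples.
    bool nearlyEqual(double a, double b,
                     double relEps = 1e-9, double absEps = 1e-12) {
        double diff = std::fabs(a - b);
        if (diff <= absEps) return true;   // handles comparisons against 0
        return diff <= relEps * std::max(std::fabs(a), std::fabs(b));
    }

    // Usage: nearlyEqual(a + b + c, 1.0) rather than a + b + c == 1.0.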
Is it really a problem? If you just need more precision, use a double instead of a float. That should get you about 15 digits of precision, more than enough for most work.
Consider your source data. Is 0.7 really significantly more correct than 0.69999999?
If so, you could use a rational number library such as:
http://www.boost.org/doc/libs/1_40_0/libs/rational/index.html
If the problem is that probabilities add up to 1 by definition, then store them as a collection of numbers, omitting the last one. Infer the last value by subtracting the sum of the others from 1.
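A minimal sketch of that last idea, assuming the probabilities live in a small struct (the names here are made up for illustration):

    #include <numeric>
    #include <vector>

    // Store all probabilities except the last; derive the last one on demand so
    // the full set sums to 1 by construction and never needs to be tested.
    struct Probabilities {
        std::vector<double> allButLast;   // e.g. {0.1, 0.2}

        double last() const {
            double sum = std::accumulate(allButLast.begin(), allButLast.end(), 0.0);
            return 1.0 - sum;             // the inferred probability, e.g. ~0.7
        }
    };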
How much precision do you need? You might consider scaling the values and quantizing them in a fixed-point representation.
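As a rough sketch of what that could look like, here is a toy fixed-point representation with 16 fractional bits; the type and function names are illustrative, not from any particular library.

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    using Q16 = std::uint32_t;              // unsigned 16.16 fixed point
    constexpr int FRAC_BITS = 16;

    Q16 to_fixed(double p)  { return static_cast<Q16>(std::lround(p * (1 << FRAC_BITS))); }
    double to_double(Q16 q) { return static_cast<double>(q) / (1 << FRAC_BITS); }

    int main() {
        Q16 a = to_fixed(0.1), b = to_fixed(0.2), c = to_fixed(0.7);
        // Integer addition is exact; check whether the quantized values sum to
        // exactly 1.0 in this representation, and adjust one of them if not.
        std::printf("sum = %u, one = %d\n", a + b + c, 1 << FRAC_BITS);
        std::printf("c back to double = %f\n", to_double(c));
        return 0;
    }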
The tests you want to do with your numbers will be incorrect.
There is no exact floating point representation in a base-2 number system for a number like 0.1, because it is an infinitely repeating fraction in that base. Consider one third, which is exactly representable as 0.1 in a base-3 system, but is 0.333... in the base-10 system.
So any test you do with the number 0.1 in floating point is prone to be flawed.
A solution would be using rational numbers (Boost has a rational library), which will always be exact for, erm, rationals, or using a self-made base-10 system by multiplying the numbers by a power of ten.
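For instance, with Boost.Rational the three probabilities can be held and compared exactly; a minimal sketch (check the Boost documentation for the exact API of your version):

    #include <boost/rational.hpp>
    #include <iostream>

    int main() {
        boost::rational<int> a(1, 10);   // 0.1
        boost::rational<int> b(1, 5);    // 0.2
        boost::rational<int> c(7, 10);   // 0.7

        boost::rational<int> sum = a + b + c;
        std::cout << sum << '\n';                              // prints 1/1
        std::cout << (sum == 1) << '\n';                       // exact test is safe here
        std::cout << boost::rational_cast<double>(c) << '\n';  // convert only when needed
        return 0;
    }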
If you really need the precision, and are sticking with rational numbers, I suppose you could go with fixed-point arithmetic. I've not done this before so I can't recommend any libraries.
Alternatively, you can set a threshold when comparing fp numbers, but you'd have to err on one side or the other.
Note that excess precision is automatically truncated in each calculation, so you should take care when operating at many different orders of magnitude in your algorithm. A contrived example: performing a divide by 1e20 early in a chain of operations versus performing it at the end can give very different results. With the first ordering, a lot of the precision of the first two numbers is lost in the divide by 1e20; assuming that the final value you want is on the order of 1e20, the second ordering will give you more precision.
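A minimal, self-contained illustration of how the order of operations across very different magnitudes changes the result; the specific values and operations here are assumptions for illustration, not the original snippets.

    #include <cstdio>

    int main() {
        double a = 0.7, b = 0.1;
        double huge = 1e20;

        // Ordering 1: the small values are folded into the huge one first and
        // fall below the precision a double has at that magnitude.
        double r1 = (huge + a + b) - huge;    // 0 on typical IEEE-754 systems

        // Ordering 2: the small values are combined with each other first.
        double r2 = (huge - huge) + (a + b);  // approximately 0.8

        std::printf("r1 = %g, r2 = %g\n", r1, r2);
        return 0;
    }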
If you only need a few digits of precision then just use an integer. If you need better precision then you'll have to look to different libraries that provide guarantees on precision.
The issue here is that floating point numbers are stored in base 2. You cannot exactly represent a base-10 decimal with a base-2 floating point number.
Let's step back a second. What does .1 mean? Or .7? They mean 1×10^-1 and 7×10^-1. If you're using binary for your numbers, instead of base 10 as we normally do, .1 means 1×2^-1, or 1/2. .11 means 1×2^-1 + 1×2^-2, or 1/2 + 1/4, or 3/4.
Note how in this system the denominator is always a power of 2. A number whose denominator is not a power of 2 cannot be represented in a finite number of digits. For instance, .1 (in decimal) means 1/10, but in binary that is an infinite repeating fraction, 0.000110011... (with the 0011 pattern repeating forever). This is similar to how, in base 10, 1/3 is an infinite fraction, 0.3333...; base 10 can only exactly represent numbers whose denominator is a product of powers of 2 and 5. (As an aside, base 12 and base 60 are actually really convenient bases, since 12 is divisible by 2, 3, and 4, and 60 is divisible by 2, 3, 4, and 5; but for some reason we use decimal anyhow, and in computers we use binary.)
Since floating point numbers (or fixed point numbers) always have a finite number of digits, they cannot represent these infinite repeating fractions exactly. So, they either truncate or round the values to be as close as possible to the real value, but are not equal to the real value exactly. Once you start adding up these rounded values, you start getting more error. In decimal, if your representation of 1/3 is .333, then three copies of that will add up to .999, not 1.
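The same effect is easy to see directly in C++; a small self-contained example (not from the answer):

    #include <cstdio>

    int main() {
        // 0.1 has no exact binary representation, so the rounding error
        // accumulates as copies of it are added together.
        double sum = 0.0;
        for (int i = 0; i < 10; ++i) sum += 0.1;

        std::printf("%d\n", sum == 1.0);   // prints 0 on typical IEEE-754 systems
        std::printf("%.17g\n", sum);       // something like 0.99999999999999989
        return 0;
    }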
There are four possible solutions. If all you care about is exactly representing decimal fractions like .1 and .7 (as in, you don't care that 1/3 will have the same problem you mention), then you can represent your numbers as decimal, for instance using binary coded decimal, and manipulate those. This is a common solution in finance, where many operations are defined in terms of decimal. This has the downside that you will need to implement all of your own arithmetic operations yourself, without the benefits of the computer's FPU, or find a decimal arithmetic library. This also, as mentioned, does not help with fractions that can't be represented exactly in decimal.
Another solution is to use fractions to represent your numbers. If you use fractions, with bignums (arbitrarily large numbers) for your numerators and denominators, you can represent any rational number that will fit in the memory of your computer. Again, the downside is that arithmetic will be slower, and you'll need to implement arithmetic yourself or use an existing library. This will solve your problem for all rational numbers, but if you wind up with a probability that is computed based on π or √2, you will still have the same issues with not being able to represent them exactly, and need to also use one of the later solutions.
A third solution, if all you care about is getting your numbers to add up to exactly 1, is for an event with n possibilities to store only n-1 of those probabilities, and compute the probability of the last as 1 minus the sum of the rest.
And a fourth solution is to do what you always need to remember when working with floating point numbers (or any inexact numbers, such as fractions being used to represent irrational numbers), and never compare two numbers for equality. Again in base 10, if you add up 3 copies of 1/3, you will wind up with .999. When you want to compare that number to 1, you have to instead compare to see if it is close enough to 1; check that the absolute value of the difference, 1-.999, is less than a threshold, such as .01.
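Applied to the question's three probabilities, such a check could look like this minimal sketch (the function name and tolerance are illustrative):

    #include <cmath>

    // Test the sum against 1 with a tolerance instead of operator==.
    bool sumsToOne(double a, double b, double c, double eps = 1e-9) {
        return std::fabs((a + b + c) - 1.0) < eps;
    }

    // Usage: sumsToOne(0.1, 0.2, 0.7) rather than 0.1 + 0.2 + 0.7 == 1.0,
    // which may or may not hold depending on how the rounding works out.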
Binary machines have to round most decimal fractions (everything except .0, .5, .25, .75, and so on) because those values have no exact representation in binary floating point. This has nothing to do with the language C++. There is no real way around it except to deal with it from a numerical perspective within your code.
As for actually producing the probabilities you seek:
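One hypothetical way to produce such probabilities is to normalize raw weights so they sum to 1; this sketch is an illustration, not the answer's original code, and the names and values are made up:

    #include <array>
    #include <cstdio>
    #include <numeric>

    // Hypothetical helper: turn raw (unnormalized) weights into probabilities
    // that, up to rounding, sum to 1.
    std::array<double, 3> normalize(std::array<double, 3> w) {
        double total = std::accumulate(w.begin(), w.end(), 0.0);
        for (double& x : w) x /= total;
        return w;
    }

    int main() {
        auto p = normalize({1.0, 2.0, 7.0});   // approximately {0.1, 0.2, 0.7}
        std::printf("%g %g %g\n", p[0], p[1], p[2]);
        return 0;
    }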
I'm sorry to say there's not really an easy answer to your problem.
It falls into a field of study called "Numerical Analysis" that deals with these types of problems (which goes far beyond just making sure you don't check for equality between 2 floating point values). And by field of study, I mean there are a slew of books, journal articles, courses etc. dealing with it. There are people who do their PhD thesis on it.
All I can say is that I'm thankful I don't have to deal with these issues very much, because the problems and the solutions are often very non-intuitive.
What you might need to do to deal with representing the numbers and calculations you're working on is very dependent on exactly what operations you're doing, the order of those operations and the range of values that you expect to deal with in those operations.
Depending on the requirements of your application, any one of several solutions could be best:
1. You live with the inherent lack of precision and use floats or doubles. You cannot test either for equality, and this implies that you cannot test the sum of your probabilities for equality with 1.0.
2. As proposed before, you can use integers if you require a fixed precision. You represent 0.7 as 7, 0.1 as 1, 0.2 as 2, and they will add up perfectly to 10, i.e., 1.0 (see the sketch after this list). If you have to calculate with your probabilities, especially if you do division and multiplication, you need to round the results correctly. This will introduce imprecision again.
3. Represent your numbers as fractions with a pair of integers: (1,2) = 1/2 = 0.5. Precise, and more flexible than 2), but you don't want to calculate with those.
4. You can go all the way and use a library that implements rational numbers (e.g. GMP). Precise, with arbitrary precision; you can calculate with it, but it is slow.
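A minimal sketch of option 2, storing probabilities as integers out of a fixed scale; the names and the scale value are illustrative choices.

    #include <cstdio>

    constexpr long SCALE = 1000;   // three decimal digits of precision

    // Multiply two scaled probabilities, rounding to the nearest unit of 1/SCALE.
    long mul_prob(long p, long q) {
        return (p * q + SCALE / 2) / SCALE;
    }

    int main() {
        long a = 100, b = 200, c = 700;             // 0.100, 0.200, 0.700
        std::printf("%d\n", a + b + c == SCALE);    // exact equality works: prints 1
        long ab = mul_prob(a, b);                   // 0.1 * 0.2
        std::printf("%ld (%.3f)\n", ab, ab / static_cast<double>(SCALE));  // 20 (0.020)
        return 0;
    }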
Yeah, I'd scale the numbers (0-100), (0-1000), or whatever fixed size you need if you're worried about such things. It also makes for faster math computation in most cases. Back in the bad old days, we'd define entire cos/sine tables and other such bleh in integer form to reduce floating fuzz and increase computation speed.
I do find it a bit interesting that a "0.7" fuzzes like that on storage.