计算组合的数量
干杯,
我知道你可以用下面的公式得到组合的数量(不重复,顺序并不重要):
// Choose r from n n! / r!(n - r)!
但是,我不知道如何在 C++ 中实现这个,因为例如
n = 52 n! = 8,0658175170943878571660636856404e+67
数字变得太大了对于unsigned __int64
(或unsigned long long
)。是否有一些解决方法可以在没有任何第三方“bigint”库的情况下实现该公式?
Cheers,
I know you can get the amount of combinations with the following formula (without repetition and order is not important):
// Choose r from n n! / r!(n - r)!
However, I don't know how to implement this in C++, since for instance with
n = 52 n! = 8,0658175170943878571660636856404e+67
the number gets way too big even for unsigned __int64
(or unsigned long long
). Is there some workaround to implement the formula without any third-party "bigint" -libraries?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(11)
这是一个古老的算法,它是精确的,并且不会溢出,除非结果对于
long long
来说太大。该算法也出现在 Knuth 的“计算机编程艺术,第 3 版,第 2 卷:半数值算法”中“ 我认为。
更新:算法在线上溢出的可能性很小:
对于非常大的n。简单的上限是 sqrt(std::numeric_limits::max()),这意味着
n
大约小于 4,000,000,000。Here's an ancient algorithm which is exact and doesn't overflow unless the result is to big for a
long long
This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
UPDATE: There's a small possibility that the algorithm will overflow on the line:
for very large n. A naive upper bound is
sqrt(std::numeric_limits<long long>::max())
which means ann
less than rougly 4,000,000,000.来自Andreas的回答:
考虑 n == 67 和 k == 33。上述算法会因 64 位 unsigned long long 溢出。然而,正确答案可以用 64 位表示:14,226,520,737,620,288,370。上述算法对其溢出保持沉默,选择(67, 33)返回:
8,829,174,638,479,413
一个可信但不正确的答案。
然而,只要最终答案是可表示的,就可以稍微修改上述算法,使其永远不会溢出。
诀窍在于认识到每次迭代时,除法 r/d 都是精确的。暂时重写:
准确地说,这意味着如果将 r、n 和 d 展开为它们的质因数分解,那么可以轻松取消 d,并留下 n 的修改值,称为 t,然后进行计算r 的计算公式很简单:
一种快速而简单的方法是找到 r 和 d 的最大公约数,称之为 g:
现在我们可以对 d_temp 和 n 做同样的事情(找到最大公约数)。然而,由于我们先验地知道 r * n / d 是精确的,所以我们也知道 gcd(d_temp, n) == d_temp,因此我们不需要计算它。因此我们可以将 n 除以 d_temp:
清理:
现在您可以计算 select(67, 33) 而不会溢出。如果您尝试选择(68, 33),您将得到一个异常而不是错误的答案。
From Andreas' answer:
Consider n == 67 and k == 33. The above algorithm overflows with a 64 bit unsigned long long. And yet the correct answer is representable in 64 bits: 14,226,520,737,620,288,370. And the above algorithm is silent about its overflow, choose(67, 33) returns:
8,829,174,638,479,413
A believable but incorrect answer.
However the above algorithm can be slightly modified to never overflow as long as the final answer is representable.
The trick is in recognizing that at each iteration, the division r/d is exact. Temporarily rewriting:
For this to be exact, it means if you expanded r, n and d into their prime factorizations, then one could easily cancel out d, and be left with a modified value for n, call it t, and then the computation of r is simply:
A fast and easy way to do this is to find the greatest common divisor of r and d, call it g:
Now we can do the same thing with d_temp and n (find the greatest common divisor). However since we know a-priori that r * n / d is exact, then we also know that gcd(d_temp, n) == d_temp, and therefore we don't need to compute it. So we can divide n by d_temp:
Cleaning up:
Now you can compute choose(67, 33) without overflow. And if you try choose(68, 33), you'll get an exception instead of a wrong answer.
以下例程将使用递归定义和记忆来计算 n-choose-k。该例程极其快速且准确:
The following routine will compute the n-choose-k, using the recursive definition and memoization. The routine is extremely fast and accurate:
请记住
n! /(n-r)! = n * ( n - 1) * .. * (n - r + 1 )
所以它比 n! 小得多。所以解决方案是评估 n* ( n - 1 ) * ... * ( n - r + 1) 而不是先计算 n!然后划分它。
当然,这完全取决于 n 和 r 的相对大小 - 如果 r 与 n 相比相对较大,那么它仍然不适合。
Remember that
n! / ( n - r )! = n * ( n - 1) * .. * (n - r + 1 )
so it's way smaller than n!. So the solution is to evaluate n* ( n - 1 ) * ... * ( n - r + 1) instead of first calculating n! and then dividing it .
Of course it all depends on the relative magnitude of n and r - if r is relatively big compared to n, then it still won't fit.
好吧,我必须回答我自己的问题。我在阅读有关帕斯卡三角形的内容时,偶然注意到我们可以用它计算组合的数量:
Well, I have to answer to my own question. I was reading about Pascal's triangle and by accident noticed that we can calculate the amount of combinations with it:
稍微改进一下 Howard Hinnant 的答案(在这个问题中):
每个循环调用 gcd() 似乎有点慢。
我们可以将 gcd() 调用聚合到最后一个,同时充分利用 Knuth 的书“计算机编程的艺术,第 3 版,第 2 卷:半数值算法”中的标准算法:
请注意,递归深度通常为 2 (我真的没有看到一个情况变成3,组合减少相当不错。),即对于非溢出情况,调用choose() 3次。
如果您愿意,请将 uint64_t 替换为 unsigned long long。
Improves Howard Hinnant's answer (in this question) a little bit:
Calling gcd() per loop seems a bit slow.
We could aggregate the gcd() call into the last one, while making the most use of the standard algorithm from Knuth's book "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms":
Note that the recursive depth is normally 2 (I don't really see a case goes to 3, the combinatorial reducing is quite decent.), i.e. calling choose() for 3 times, for non-overflow cases.
Replace uint64_t with unsigned long long if you prefer it.
获得二项式系数的素因数分解可能是计算它的最有效的方法,特别是在乘法成本昂贵的情况下。这对于计算阶乘的相关问题来说确实如此(例如,请参见单击此处 )。
这是一个基于埃拉托斯特尼筛法的简单算法,用于计算素因数分解。这个想法基本上是在使用筛子找到质数时遍历它们,然后计算它们的倍数中有多少落在 [1, k] 和 [n-k+1,n] 范围内。筛子本质上是一个 O(n \log \log n) 算法,但没有进行乘法运算。一旦找到质因数分解,所需的实际乘法次数最多是 O\left(\frac{n \log \log n}{\log n}\right) 并且可能有比这更快的方法。
Getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive. This is certainly true of the related problem of calculating factorial (see Click here for example).
Here is a simple algorithm based on the Sieve of Eratosthenes that calculates the prime factorization. The idea is basically to go through the primes as you find them using the sieve, but then also to calculate how many of their multiples fall in the ranges [1, k] and [n-k+1,n]. The Sieve is essentially an O(n \log \log n) algorithm, but there is no multiplication done. The actual number of multiplications necessary once the prime factorization is found is at worst O\left(\frac{n \log \log n}{\log n}\right) and there are probably faster ways than that.
使用带有 long double 的肮脏技巧,可以获得与 Howard Hinnant 相同的精度(甚至可能更高):
这个想法是首先除以 k!然后乘以 n(n-1)...(n-k+1)。通过反转 for 循环的顺序可以避免通过 double 进行近似。
Using a dirty trick with a long double, it is possible to get the same accuracy as Howard Hinnant (and probably more):
The idea is to divide first by k! and then to multiply by n(n-1)...(n-k+1). The approximation through the double can be avoided by inverting the order of the for loop.
最短的方法之一:
One of SHORTEST way :
类似于埃拉托斯特尼筛法的方法。
虽然埃拉托斯特尼的筛子是多重湮灭,
这一次是多重半杀。
由于 n!/((nr)!r!) 始终是整数,因此首先取消分母,然后乘以其余部分。
即使对于非大整数,该算法也能很好地工作。
在自然数序列中,第k个数可以整除第(k的倍数)个数。这可以通过 k=2,3,4,... 连续完成,利用这个事实,首先取消分母,然后乘以余数。这确保了如果答案没有溢出,则在计算过程中也不会溢出。
入山算法
A method similar to the Sieve of Eratosthenes.
While the sieve of Eratosthenes is a multiple annihilation,
this one is a multiple half-kill.
Since n!/((n-r)!r!) is always an integer, first cancel the denominator and then multiply the rest.
This algorithm works well even for non-big integers.
In the sequence of natural numbers, the k-th number can divide the (multiple of k)-th number. This can be done continuously with k=2,3,4,... Taking advantage of this fact, first cancel the denominator and then multiply the remainder. This ensures that if the answer does not overflow, it will not overflow in the course of the calculation.
Iriyama’s algorithm
如果你想百分百确定只要最终结果在数值限制内就不会发生溢出,你可以逐行求帕斯卡三角求和:
但是,这种算法比乘法慢得多。因此,也许您可以使用乘法来生成所有您知道“安全”的情况,然后从那里使用加法。 (..或者你可以只使用 BigInt 库)。
If you want to be 100% sure that no overflows occur so long as the final result is within the numeric limit, you can sum up Pascal's Triangle row-by-row:
However, this algorithm is much slower than the multiplication one. So perhaps you could use multiplication to generate all the cases you know that are 'safe' and then use addition from there. (.. or you could just use a BigInt library).