All possible combinations. A faster way
I have a vector of numbers between 1 and 100 (this is not important), whose size can be anywhere between 3 and 1,000,000 values.
Can anyone help me get all unique* 3-value combinations from that vector?
*Unique
Example: I have the following values in the array: 1[0] 5[1] 7[2] 8[3] 7[4] (the [x] is the index).
In this case 1[0] 5[1] 7[2] and 1[0] 5[1] 7[4] are different, but 1[0] 5[1] 7[2] and 7[2] 1[0] 5[1] are the same (duplicate).
My algorithm is a little slow when I work with a lot of values (for example 1,000,000), so what I want is a faster way to do it.
    for (unsigned int x = 0; x < vect.size() - 2; x++) {
        for (unsigned int y = x + 1; y < vect.size() - 1; y++) {
            for (unsigned int z = y + 1; z < vect.size(); z++) {
                // do thing with vect[x], vect[y], vect[z]
            }
        }
    }
Comments (6)
In fact, it is very important that your values are between 1 and 100! With a vector of size 1,000,000 you have a lot of equal numbers, and you don't need to inspect all of them. What you can do is the following:
Note: the following code is just an outline! It may lack sufficient error checking and is only here to give you the idea, not to be copied and pasted.
Note 2: When I wrote the answer, I assumed the numbers to be in the range [0, 99]. Then I read that they are actually in [1, 100]. Obviously this is not a problem, and you can either subtract 1 from all the numbers or, even better, change all the 100s to 101s.
Then, you do something similar to what you did before:
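A minimal sketch of this first idea, assuming the values lie in [1, 100] and that equal values at different positions count as the same value. The names `for_each_value_triple`, `exists` and `kMax` are made up for illustration: a presence table is filled in one pass, and the three loops then run over the value range instead of over the vector.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical helper: calls visit(x, y, z) once for every x < y < z
// where x, y and z are distinct values that occur in vect.
// Assumes every element of vect lies in [1, 100].
template <typename Visit>
void for_each_value_triple(const std::vector<int>& vect, Visit visit) {
    constexpr int kMax = 100;             // largest allowed value
    std::array<bool, kMax + 1> exists{};  // index 0 is unused

    // O(n) preparation: record which values occur at all.
    for (int v : vect) {
        assert(v >= 1 && v <= kMax);
        exists[v] = true;
    }

    // Search: scan the whole value range and skip absent values.
    for (int x = 1; x <= kMax; ++x) {
        if (!exists[x]) continue;
        for (int y = x + 1; y <= kMax; ++y) {
            if (!exists[y]) continue;
            for (int z = y + 1; z <= kMax; ++z) {
                if (exists[z]) visit(x, y, z);
            }
        }
    }
}
```

For the example array 1 5 7 8 7, the distinct values are 1, 5, 7 and 8, so `visit` would be called four times.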
Another thing you can do is spend more time in preparation so that less time is spent generating the triplets. For example:
Then:
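One way this second approach could look, as a sketch only. The helper names `collect_unique` and `for_each_triple` are hypothetical. The preparation builds a vector of the distinct values by searching that vector for every input element, and the generation step then loops over the distinct values only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: collects each distinct value of vect once, in
// order of first appearance. Every element is compared against the
// values found so far, which is where the O(n*m) preparation comes from.
std::vector<int> collect_unique(const std::vector<int>& vect) {
    std::vector<int> unique_values;
    for (int v : vect) {
        if (std::find(unique_values.begin(), unique_values.end(), v) ==
            unique_values.end()) {
            unique_values.push_back(v);
        }
    }
    return unique_values;
}

// O(m^3) generation: the original triple loop, run over the m distinct
// values instead of the n input elements.
template <typename Visit>
void for_each_triple(const std::vector<int>& values, Visit visit) {
    const std::size_t m = values.size();
    for (std::size_t x = 0; x + 2 < m; ++x)
        for (std::size_t y = x + 1; y + 1 < m; ++y)
            for (std::size_t z = y + 1; z < m; ++z)
                visit(values[x], values[y], values[z]);
}
```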
Let us consider 100 to be a variable, so let's call it k, and call the number of distinct values actually present in the array m (which is smaller than or equal to k).
With the first method, you have O(n) preparation and O(m^2*k) operations to search for the values, which is quite fast.
In the second method, you have O(nm) preparation and O(m^3) for generation of the values. Given your values for n and m, the preparation takes too long.
You could actually merge the two methods to get the best of both worlds, so something like this:
Then:
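A sketch of how the merged version might look, again assuming values in [1, 100]. The names `collect_unique_with_table` and `visit_triples` are hypothetical. The presence table makes the "have I seen this value?" test O(1), so the list of distinct values is built in O(n), and the triple loop then costs O(m^3).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: O(n) preparation. The table answers "seen before?"
// in constant time, and unique_values keeps each distinct value once.
// Assumes every element of vect lies in [1, 100].
std::vector<int> collect_unique_with_table(const std::vector<int>& vect) {
    constexpr int kMax = 100;
    std::array<bool, kMax + 1> exists{};  // index 0 is unused
    std::vector<int> unique_values;
    for (int v : vect) {
        assert(v >= 1 && v <= kMax);
        if (!exists[v]) {
            exists[v] = true;
            unique_values.push_back(v);
        }
    }
    return unique_values;
}

// O(m^3) generation over the distinct values only.
template <typename Visit>
void visit_triples(const std::vector<int>& values, Visit visit) {
    const std::size_t m = values.size();
    for (std::size_t x = 0; x + 2 < m; ++x)
        for (std::size_t y = x + 1; y + 1 < m; ++y)
            for (std::size_t z = y + 1; z < m; ++z)
                visit(values[x], values[y], values[z]);
}
```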
This method has O(n) preparation and O(m^3) cost to find the unique triplets.
Edit: It turned out that for the OP, the same number in different locations counts as different values. If that is really the case, then I'm sorry, there is no faster solution. The reason is that the number of possible combinations is itself C(n, 3) (that's a combination), so even though you generate each one of them in O(1), it is still too big for you.
There's really nothing that can be done to speed up the loop body you have there. Consider that with a vector of 1M elements, you are making roughly n³/6 ≈ 1.7 × 10^17 loop iterations.
Producing all combinations like that is a cubic, O(n³), problem, which means that you won't be able to practically solve it when the input size becomes large enough. Your only option would be to leverage specific knowledge of your application (what you need the results for, and how exactly they will be used) to "work around" the issue if possible.
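To put a number on the loop count: the three nested loops visit every index triple x < y < z exactly once, which is C(n, 3) = n(n-1)(n-2)/6 iterations. The small helper below (a hypothetical name, not from the question) computes that count. For n = 1,000,000 it comes to 166,666,166,667,000,000, about 1.67 × 10^17. Assuming a rate of 10^9 iterations per second, that would be roughly five years of run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of index triples x < y < z in a vector of
// size n, i.e. C(n, 3) = n(n-1)(n-2)/6.
std::uint64_t count_index_triples(std::uint64_t n) {
    assert(n <= 2000000);  // keeps every intermediate product below 2^64
    if (n < 3) return 0;
    // n(n-1)/2 is C(n, 2), an integer; multiplying by (n-2) and dividing
    // by 3 then yields C(n, 3) without any rounding.
    return n * (n - 1) / 2 * (n - 2) / 3;
}
```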
Possibly you can sort your input, make it unique, and pick x[a], x[b] and x[c] when a < b < c. The sort will be O(n log n) and picking the combinations will be O(n³). Still, you will have fewer triplets to iterate over:
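A sketch of the sort-and-deduplicate idea using `std::sort` and `std::unique`. The function names `sorted_unique` and `for_each_sorted_triple` are made up for this example, and it assumes that equal values at different positions should be treated as one value.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the distinct values of x in ascending order.
// Takes x by value so the caller's vector is left untouched.
std::vector<int> sorted_unique(std::vector<int> x) {
    std::sort(x.begin(), x.end());                      // O(n log n)
    x.erase(std::unique(x.begin(), x.end()), x.end());  // drop adjacent duplicates
    return x;
}

// Picks x[a], x[b], x[c] for every a < b < c.
template <typename Visit>
void for_each_sorted_triple(const std::vector<int>& x, Visit visit) {
    for (std::size_t a = 0; a + 2 < x.size(); ++a)
        for (std::size_t b = a + 1; b + 1 < x.size(); ++b)
            for (std::size_t c = b + 1; c < x.size(); ++c)
                visit(x[a], x[b], x[c]);
}
```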
Depending on your actual data, you may be able to speed it up significantly by first making a vector that has at most three entries for each value and iterating over that instead.
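A sketch of that reduction, assuming values in [1, 100] and that occurrences of the same value are interchangeable (positions are not tracked). The name `cap_three_per_value` is hypothetical. A triple can use the same value at most three times, so a fourth copy adds nothing new. With 100 possible values the reduced vector has at most 300 entries, and the original triple loop over it runs at most C(300, 3) = 4,455,100 iterations.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical helper: keeps at most three occurrences of each value,
// in their original order. Assumes every element lies in [1, 100].
std::vector<int> cap_three_per_value(const std::vector<int>& vect) {
    std::array<int, 101> kept{};  // kept[v] = copies of v retained so far
    std::vector<int> reduced;
    for (int v : vect) {
        assert(v >= 1 && v <= 100);
        if (kept[v] < 3) {
            ++kept[v];
            reduced.push_back(v);
        }
    }
    return reduced;
}
```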
As r15habh pointed out, I think the fact that the values in the array are between 1-100 is in fact important.
Here's what you can do: make one pass through the array, reading values into a unique set. This by itself is O(n) time complexity. The set will have no more than 100 elements, which means O(1) space complexity.
Now, since you need to generate all 3-item combinations, you'll still need 3 nested loops, but instead of operating on the potentially huge array, you'll be operating on a set that has at most 100 elements.
Overall time complexity depends on your original data set. For a small data set, time complexity will be O(n^3). For a large data set, it will approach O(n).
If I understand your application correctly, then you can use a tuple instead, and store it in either a set or a hash table depending on your requirements. If the normal of the tri matters, then make sure that you shift the tri so that, let's say, the largest element is first; if the normal shouldn't matter, then just sort the tuple. A version using boost & integers:
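A sketch of the canonical-form idea. It uses `std::tuple` and `std::set` from the standard library rather than boost, and the names `Tri`, `canonical_rotation` and `canonical_sorted` are invented for illustration. The first function rotates the triple so that its largest element comes first, which keeps the cyclic order. The second simply sorts it.

```cpp
#include <algorithm>
#include <array>
#include <set>
#include <tuple>

using Tri = std::tuple<int, int, int>;

// Order-preserving canonical form: of the three cyclic rotations, pick
// the lexicographically largest, which puts the largest element first.
// (1, 5, 7) and (5, 7, 1) map to the same tuple; (1, 7, 5) does not.
Tri canonical_rotation(int a, int b, int c) {
    return std::max({Tri{a, b, c}, Tri{b, c, a}, Tri{c, a, b}});
}

// Order-insensitive canonical form: sort the three elements.
Tri canonical_sorted(int a, int b, int c) {
    std::array<int, 3> v{a, b, c};
    std::sort(v.begin(), v.end());
    return Tri{v[0], v[1], v[2]};
}
```

Inserting the canonical tuples into a `std::set<Tri>` then keeps one entry per unique triple, since `std::tuple` already provides the ordering the set needs.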