All possible combinations. A faster way

Posted on 2024-12-12 02:03:43

I have a vector of numbers between 1 and 100 (this is not important) whose size can be anywhere between 3 and 1,000,000 values.

Can anyone help me get all unique* combinations of 3 values from that vector?

*Unique

Example: I have the following values in the array: 1[0] 5[1] 7[2] 8[3] 7[4] (the [x] is the index).

In this case 1[0] 5[1] 7[2] and 1[0] 5[1] 7[4] are different, but 1[0] 5[1] 7[2] and 7[2] 1[0] 5[1] are the same (duplicates).

My algorithm is a little slow when I work with a lot of values (for example 1,000,000), so what I want is a faster way to do it.

for (unsigned int x = 0; x < vect.size() - 2; x++) {
    for (unsigned int y = x + 1; y < vect.size() - 1; y++) {
        for (unsigned int z = y + 1; z < vect.size(); z++) {
            // do thing with vect[x], vect[y], vect[z]
        }
    }
}

Comments (6)

枕花眠 2024-12-19 02:03:43

In fact, it matters a great deal that your values are between 1 and 100! With a vector of size 1,000,000, many of the numbers are equal, and you don't need to inspect all of them. What you can do is the following:

Note: the following code is just an outline! It may lack sufficient error checking and is only here to give you the idea, not for copy-paste!

Note 2: When I wrote the answer, I assumed the numbers to be in the range [0, 99]. Then I read that they are actually in [1, 100]. This is not a problem: you can either subtract 1 from every number or, even better, change every 100 in the code to 101.

bool exists[100] = {0};  // exists[i] means whether i exists in your vector

for (unsigned int i = 0, size = vect.size(); i < size; ++i)
    exists[vect[i]] = true;

Then do something similar to what you did before:

for(unsigned int x = 0; x < 98; x++)
  if (exists[x])
    for(unsigned int y = x+1; y < 99; y++)
      if (exists[y])
        for(unsigned int z = y+1; z < 100; z++)
          if (exists[z])
          {
            // {x, y, z} is an answer
          }

Another thing you can do is spend more time in preparation so that less time is spent generating the triplets. For example:

int nums[100];  // from 0 to count are the numbers you have
int count = 0;

for (unsigned int i = 0, size = vect.size(); i < size; ++i)
{
  bool exists = false;
  for (int j = 0; j < count; ++j)
    if (vect[i] == nums[j])
    {
      exists = true;
      break;
    }
  if (!exists)
    nums[count++] = vect[i];
}

Then

// int indices on purpose: with unsigned ones, count-2 wraps around when count < 2
for (int x = 0; x + 2 < count; x++)
  for (int y = x + 1; y + 1 < count; y++)
    for (int z = y + 1; z < count; z++)
    {
      // {nums[x], nums[y], nums[z]} is an answer
    }

Let us treat 100 as a variable k, let n be the size of the vector, and let m be the number of distinct values actually present in it (m is smaller than or equal to k).

With the first method, you have O(n) preparation and O(m^2*k) operations to enumerate the triplets, which is quite fast.

In the second method, you have O(nm) preparation and O(m^3) for generation of the triplets. Given your values of n and m, the preparation takes too long.

You can merge the two methods to get the best of both worlds, with something like this:

int nums[100];           // from 0 to count are the numbers you have
int count = 0;
bool exists[100] = {0};  // exists[i] means whether i exists in your vector

for (unsigned int i = 0, size = vect.size(); i < size; ++i)
{
  if (!exists[vect[i]])
    nums[count++] = vect[i];
  exists[vect[i]] = true;
}

Then:

// int indices on purpose: with unsigned ones, count-2 wraps around when count < 2
for (int x = 0; x + 2 < count; x++)
  for (int y = x + 1; y + 1 < count; y++)
    for (int z = y + 1; z < count; z++)
    {
      // {nums[x], nums[y], nums[z]} is an answer
    }

This method has O(n) preparation and O(m^3) cost to find the unique triplets.
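As a minimal, self-contained sketch of the merged method, assuming the values lie in [1, 100] as the question states (so the lookup table gets 101 slots) and that equal values at different indices count as the same value. The names `distinct_value_triplets` and `kMaxValue` are made up for illustration and are not part of the answer above.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumption: every input value lies in [1, kMaxValue].
constexpr int kMaxValue = 100;

// Returns every triplet of distinct values that occur in `vect`.
// Values appear in order of first occurrence. Hypothetical helper.
std::vector<std::array<int, 3>> distinct_value_triplets(const std::vector<int>& vect) {
    std::array<bool, kMaxValue + 1> exists{};  // exists[v]: v was already seen
    std::vector<int> nums;                     // distinct values, first-seen order

    // O(n) preparation: one pass, constant work per element.
    for (int v : vect) {
        if (!exists[v]) {
            exists[v] = true;
            nums.push_back(v);
        }
    }

    // O(m^3) generation over the m <= kMaxValue distinct values.
    // The "x + 2 < m" form avoids unsigned wrap-around when m < 2.
    std::vector<std::array<int, 3>> out;
    const std::size_t m = nums.size();
    for (std::size_t x = 0; x + 2 < m; ++x)
        for (std::size_t y = x + 1; y + 1 < m; ++y)
            for (std::size_t z = y + 1; z < m; ++z)
                out.push_back({nums[x], nums[y], nums[z]});
    return out;
}
```

For the question's example values 1, 5, 7, 8, 7 this yields the four triplets built from the distinct values 1, 5, 7 and 8.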

Edit: It turned out that for the OP, the same number in different locations counts as different values. If that is really the case, then I'm sorry, there is no faster solution. The reason is that the number of possible combinations is itself C(n, 3) = n(n-1)(n-2)/6, so even if you generate each of them in O(1), the total is still too big for you.
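To put a number on that, here is a small sketch that evaluates C(n, 3). The function name `choose3` is only illustrative.

```cpp
#include <cstdint>

// Number of ways to choose 3 positions out of n: C(n, 3) = n(n-1)(n-2)/6.
// Dividing in two steps keeps intermediate values smaller; n(n-1) is always
// even, and n(n-1)(n-2)/2 is always divisible by 3, so both divisions are exact.
// The result fits in 64 bits for the sizes discussed here (n up to 1,000,000).
constexpr std::uint64_t choose3(std::uint64_t n) {
    return n < 3 ? 0 : n * (n - 1) / 2 * (n - 2) / 3;
}
```

For n = 1,000,000 this gives 166,666,166,667,000,000, about 1.7 × 10^17 triplets. Even at a billion triplets per second, visiting all of them would take more than five years.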

一影成城 2024-12-19 02:03:43

There's really nothing that can be done to speed up the loop body you have there. Consider that with a vector of 1M elements, you are making roughly 1.7 × 10^17 loop iterations (C(n, 3) = n(n-1)(n-2)/6).

Producing all combinations like that costs O(n³), which means that you won't be able to solve it in practice once the input size becomes large enough. Your only option would be to leverage specific knowledge of your application (what you need the results for, and how exactly they will be used) to "work around" the issue if possible.
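One possible workaround of this kind, sketched under a strong assumption: the work done per triplet depends only on the three values, not on their positions. In that case you can count how often each value occurs and visit each combination of values once, together with the number of index triplets it stands for. The names `for_each_value_triplet` and `kMaxValue` are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr int kMaxValue = 100;  // assumption: values lie in [1, kMaxValue]

// Calls visit(a, b, c, ways) once per value combination a <= b <= c, where
// `ways` is how many index triplets x < y < z carry exactly those values.
template <typename Visit>
void for_each_value_triplet(const std::vector<int>& vect, Visit visit) {
    std::array<std::uint64_t, kMaxValue + 1> cnt{};  // cnt[v]: occurrences of v
    for (int v : vect) ++cnt[v];

    for (int a = 1; a <= kMaxValue; ++a) {
        if (cnt[a] == 0) continue;
        for (int b = a; b <= kMaxValue; ++b) {
            if (cnt[b] == 0) continue;
            for (int c = b; c <= kMaxValue; ++c) {
                if (cnt[c] == 0) continue;
                std::uint64_t ways;
                if (a == b && b == c)      // three positions of the same value
                    ways = cnt[a] * (cnt[a] - 1) * (cnt[a] - 2) / 6;
                else if (a == b)           // two positions of a, one of c
                    ways = cnt[a] * (cnt[a] - 1) / 2 * cnt[c];
                else if (b == c)           // one position of a, two of b
                    ways = cnt[a] * (cnt[b] * (cnt[b] - 1) / 2);
                else                       // one position of each value
                    ways = cnt[a] * cnt[b] * cnt[c];
                if (ways != 0) visit(a, b, c, ways);
            }
        }
    }
}
```

This does O(n) counting plus at most about k³/6 visits for k = 100, regardless of n. If the per-triplet work really needs the positions, this shortcut does not apply.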

浮生未歇 2024-12-19 02:03:43

Possibly you can sort your input, make it unique, and pick x[a], x[b] and x[c] with a < b < c. The sort is O(n log n) and picking the combinations is O(m³), where m is the number of distinct values left after deduplication. Picking is still cubic, but you will have fewer triplets to iterate over:

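#include <algorithm>  // std::sort, std::unique
#include <vector>

std::vector<int> x = original_vector;
std::sort(x.begin(), x.end());
x.erase(std::unique(x.begin(), x.end()), x.end());  // keep one copy of each value
const std::size_t n = x.size();
for (std::size_t a = 0; a + 2 < n; ++a)
  for (std::size_t b = a + 1; b + 1 < n; ++b)
    for (std::size_t c = b + 1; c < n; ++c)
      issue_triplet(x[a], x[b], x[c]);  // placeholder for your own handler
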
萌辣 2024-12-19 02:03:43

Depending on your actual data, you may be able to speed it up significantly by first making a vector that has at most three entries of each value and iterating over that instead.
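A sketch of that reduction, assuming values in [1, 100] and assuming it is acceptable to drop every occurrence of a value beyond the third. The names `cap_occurrences` and `kMaxValue` are illustrative only.

```cpp
#include <array>
#include <vector>

constexpr int kMaxValue = 100;  // assumption: values lie in [1, kMaxValue]

// Keeps at most `cap` occurrences of each value, preserving input order.
std::vector<int> cap_occurrences(const std::vector<int>& vect, int cap = 3) {
    std::array<int, kMaxValue + 1> kept{};  // kept[v]: copies of v kept so far
    std::vector<int> out;
    for (int v : vect) {
        if (kept[v] < cap) {
            ++kept[v];
            out.push_back(v);
        }
    }
    return out;
}
```

The reduced vector has at most 300 elements, so the triple loop runs at most C(300, 3) = 4,455,100 times. It covers every combination of values that can occur in a triplet, but not every combination of indices, so it only helps if triplets with the same values are interchangeable for your purpose.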

↘紸啶 2024-12-19 02:03:43

As r15habh pointed out, I think the fact that the values in the array are between 1 and 100 is in fact important.

Here's what you can do: make one pass through the array, reading the values into a set of unique values. This by itself is O(n) time. The set will have no more than 100 elements, which means O(1) space.

Now, since you need to generate all 3-item combinations, you'll still need 3 nested loops, but instead of operating on the potentially huge array, you'll be operating on a set that has at most 100 elements.

Overall time complexity is O(n + m³), where m ≤ 100 is the number of distinct values. For a small data set the m³ term dominates (at worst O(n³), since m ≤ n). For a large data set the single pass dominates and the cost approaches O(n).
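A short sketch of the set-based version, using `std::set` as one possible "unique set". The function name is made up, and the counter stands in for whatever work you do per triplet.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Counts the triplets of distinct values in `vect`.
std::size_t count_distinct_value_triplets(const std::vector<int>& vect) {
    // One pass over the input. Each insertion costs O(log m), and m <= 100
    // is bounded, so the pass is effectively linear in n.
    const std::set<int> unique(vect.begin(), vect.end());

    // Copy to a vector for index-based loops; std::set iterates in sorted order.
    const std::vector<int> vals(unique.begin(), unique.end());

    std::size_t triplets = 0;
    for (std::size_t a = 0; a + 2 < vals.size(); ++a)
        for (std::size_t b = a + 1; b + 1 < vals.size(); ++b)
            for (std::size_t c = b + 1; c < vals.size(); ++c)
                ++triplets;  // here vals[a] < vals[b] < vals[c]
    return triplets;
}
```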

我要还你自由 2024-12-19 02:03:43

If I understand your application correctly, you can use a tuple and store it in either a set or a hash table, depending on your requirements. If the normal of the tri matters, make sure you shift the tri so that, say, the largest element comes first; if the normal doesn't matter, just sort the tuple. A version using boost and integers:

#include <iostream>   // std::cout, std::endl
#include <set>
#include <algorithm>
#include "boost/tuple/tuple.hpp"
#include "boost/tuple/tuple_comparison.hpp"

int main()
{
    typedef boost::tuple< int, int, int > Tri;
    typedef std::set< Tri > TriSet;
    TriSet storage;
    // 1 duplicate
    int exampleData[4][3] = { { 1, 2, 3 }, { 2, 3, 6 }, { 5, 3, 2 }, { 2, 1, 3 } };
    for( unsigned int i = 0; i < sizeof( exampleData ) / sizeof( exampleData[0] ); ++i )    
    {
        std::sort( exampleData[i], exampleData[i] + ( sizeof( exampleData[i] ) / sizeof( exampleData[i][0] ) ) );
        if( !storage.insert( boost::make_tuple( exampleData[i][0], exampleData[i][1], exampleData[i][2] ) ).second )
            std::cout << "Duplicate!" << std::endl;
        else
            std::cout << "Not duplicate!" << std::endl;
    }
}
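
If boost is not available, the same idea can be sketched with `std::tuple` from the standard library (C++11 or later) in place of `boost::tuple`. This is a substitution, not the code above, and the helper names `canonical` and `insert_unique` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <set>
#include <tuple>

using Tri = std::tuple<int, int, int>;

// Sorts the three values so that any ordering of the same values maps to one key.
Tri canonical(std::array<int, 3> v) {
    std::sort(v.begin(), v.end());
    return Tri(v[0], v[1], v[2]);
}

// Returns true if the triplet was new, false if an equivalent one was already stored.
bool insert_unique(std::set<Tri>& storage, const std::array<int, 3>& v) {
    return storage.insert(canonical(v)).second;
}
```

`std::tuple` already provides lexicographic comparison, so it can serve as a `std::set` key without a separate comparison header.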