人群分割算法

发布于 2024-11-27 23:01:36 字数 380 浏览 0 评论 0原文

我有 50 个有序整数 (1,2,3,..,50),我寻找一种通用方法来将其切片“n”种方式(“n”是范围从 1 到 25 的截止点数量)维持元素的顺序。

例如,对于 n=1(一个截止点),有 49 种可能的分组选择([1,2-49]、[1-2,3-50]、[1-3,4-50]、... )。对于 n=2(两个截止点),分组替代方案如下:[1,2,3-50],[1,2-3,4-50],...

您能否推荐任何通用算法来有效地完成这项任务吗?

谢谢, 克里斯


感谢大家的反馈。我查看了您的所有评论,并且正在研究一个通用解决方案,该解决方案将返回所有数字的所有组合(例如,[1,2,3-50]、[1,2-3,4-50],...)的截止点。

再次感谢, 克里斯

I have a population of 50 ordered integers (1,2,3,..,50) and I look for a generic way to slice it "n" ways ("n" is the number of cutoff points ranging from 1 to 25) that maintains the order of the elements.

For example, for n=1 (one cutoff point) there are 49 possible grouping alternatives ([1,2-49], [1-2,3-50], [1-3,4-50],...). For n=2 (two cutoff points), the grouping alternatives are like: [1,2,3-50], [1,2-3,4-50],...

Could you recommend any general-purpose algorithm to complete this task in an efficient way?

Thanks,
Chris


Thanks everyone for your feedback. I reviewed all your comments and I am working on a generic solution that will return all combinations (e.g., [1,2,3-50], [1,2-3,4-50],...) for all numbers of cutoff points.

Thanks again,
Chris

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

岁月静好 2024-12-04 23:01:36

令序列长度为N,切片数量为n

当您注意到选择对 n 个切片进行切片相当于从 N - 1 个可能的分割点(分割点位于每个分割点之间)中选择 n - 1 时,这个问题就变得更容易了。序列中的两个数字)。因此存在(N - 1 选择n - 1)这样的切片。

要生成所有切片(到 n 个切片),您必须生成从 1 到 N - 1 的所有 n - 1 数字元素子集。

此问题的确切算法放在此处: 如何在java中从大小为n的集合中迭代生成k个元素子集?

Let sequence length be N, and number of slices n.

That problem becomes easier when you notice that, choosing a slicing to n slices is equivalent to choosing n - 1 from N - 1 possible split points (a split point is between every two numbers in the sequence). Hence there is (N - 1 choose n - 1) such slicings.

To generate all slicings (to n slices), you have to generate all n - 1 element subsets of numbers from 1 to N - 1.

The exact algorithm for this problem is placed here: How to iteratively generate k elements subsets from a set of size n in java?

ぺ禁宫浮华殁 2024-12-04 23:01:36

您是否需要截止值,或者您只是在计算它们。如果你只是想计算它们,那么很简单:

1 个截止值 = (n-1) 个选项

2 个截止值 = (n-1)*(n-2)/2 个选项

3 个截止值 = (n-1)(n-2)(n-3)/4 个选项,

您可以在此处看到模式

如果您确实需要截止,那么您必须实际执行循环,但由于 n 太小,埃米利奥是对的, 只是暴力破解它。

1截断

for(i=1,i<n;++i)
  cout << i;

2截断

for(i=1;<i<n;++i)
  for(j=i+1,j<n;++j)
    cout << i << " " << j;

再3截断

for(i=1;<i<n;++i)
  for(j=i+1,j<n;++j)
    for(k=j+1,k<n;++k)
      cout << i << " " << j << " " << k;

,就可以看到图案了

Do you need the cutoffs, or are you just counting them. If you're just going to count them, then it's simple:

1 cutoff = (n-1) options

2 cutoffs = (n-1)*(n-2)/2 options

3 cutoffs = (n-1)(n-2)(n-3)/4 options

you can see the patterns here

If you actually need the cutoffs, then you have to actually do the loops, but since n is so small, Emilio is right, just brute force it.

1 cutoff

for(i=1,i<n;++i)
  cout << i;

2 cutoffs

for(i=1;<i<n;++i)
  for(j=i+1,j<n;++j)
    cout << i << " " << j;

3 cutoffs

for(i=1;<i<n;++i)
  for(j=i+1,j<n;++j)
    for(k=j+1,k<n;++k)
      cout << i << " " << j << " " << k;

again, you can see the pattern

橘香 2024-12-04 23:01:36

所以你想以所有可能的方式从 49 个选择中选择 25 个分割点。有很多众所周知的算法这样做。

我想提请您注意这个问题的另一面。有 49!/(25!*(49-25)!) = 63 205 303 218 876 >= 2^45 ~= 10^13 个不同 组合。所以如果要存储的话,需要的内存量是32TB * sizeof(Combination)。我估计会突破1PB大关。

现在假设您想要动态处理生成的数据。让我们乐观假设您可以处理 100 万个组合每秒(这里我假设没有并行化)。所以这个任务需要 10^7 秒 = 2777 小时 = 115 天。

这个问题比乍一看更复杂。如果你想在合理的时间内在家解决这个问题,我的建议是改变策略或者等待量子计算机的进步。

So you want to select 25 split point from 49 choices in all possible ways. There are a lot of well known algorithms to do that.

I want to draw your attention to another side of this problem. There are 49!/(25!*(49-25)!) = 63 205 303 218 876 >= 2^45 ~= 10^13 different combinations. So if you want to store it, the required amount of memory is 32TB * sizeof(Combination). I guess that it will pass 1 PB mark.

Now lets assume that you want to process generated data on the fly. Lets make rather optimistic assumption that you can process 1 million combinations per second (here i assume that there is no parallelization). So this task will take 10^7 seconds = 2777 hours = 115 days.

This problem is more complicated than it seems at first glance. If you want to solve if at home in reasonable time, my suggestion is to change the strategy or wait for the advance of quantum computers.

月依秋水 2024-12-04 23:01:36

这将生成一个所有范围的数组,但我警告你,由于结果数量很大(50个元素,3个分割是49 * 48 * 47 = 110544),它会占用大量内存,我什至没有尝试编译它,所以可能有错误,但这是我使用的通用算法。

typedef std::vector<int>::iterator iterator_t;
typedef std::pair<iterator_t, iterator_t> range_t;
typedef std::vector<range_t> answer_t;
answer_t F(std::vector<int> integers, int slices) {
    answer_t prev;  //things to slice more
    answer_t results; //thin
    //initialize results for 0 slices
    results.push_back(answer(range(integers.begin(), integers.end()), 1));
    //while there's still more slicing to do
    while(slices--) {  
        //move "results" to the "things to slice" pile
        prev.clear(); 
        prev.swap(results); 
        //for each thing to slice
        for(int group=0; group<prev.size(); ++group) { 
            //for each range
            for(int crange=0; crange<prev[group].size(); ++crange) { 
                //for each place in that range
                for(int newsplit=0; newsplit<prev[group][crange].size(); ++newsplit) { 
                    //copy the "result"
                    answer_t cur = prev[group];
                    //slice it
                    range_t L = range(cur[crange].first, cur[crange].first+newsplit);
                    range_t R = range(cur[crange].first+newsplit), cur[crange].second);
                    answer_t::iterator loc = cur.erase(cur.begin()+crange);
                    cur.insert(loc, R);
                    cur.insert(loc, L);
                    //add it to the results
                    results.push_back(cur);
                }
            }
        }
    }
    return results;
}

This will generate an array of all the ranges, but I warn you, it'll take tons of memory, due to the large numbers of results (50 elements with 3 splits is 49*48*47=110544) I haven't even tried to compile it, so there's probably errors, but this is the general algorithm I'd use.

typedef std::vector<int>::iterator iterator_t;
typedef std::pair<iterator_t, iterator_t> range_t;
typedef std::vector<range_t> answer_t;
answer_t F(std::vector<int> integers, int slices) {
    answer_t prev;  //things to slice more
    answer_t results; //thin
    //initialize results for 0 slices
    results.push_back(answer(range(integers.begin(), integers.end()), 1));
    //while there's still more slicing to do
    while(slices--) {  
        //move "results" to the "things to slice" pile
        prev.clear(); 
        prev.swap(results); 
        //for each thing to slice
        for(int group=0; group<prev.size(); ++group) { 
            //for each range
            for(int crange=0; crange<prev[group].size(); ++crange) { 
                //for each place in that range
                for(int newsplit=0; newsplit<prev[group][crange].size(); ++newsplit) { 
                    //copy the "result"
                    answer_t cur = prev[group];
                    //slice it
                    range_t L = range(cur[crange].first, cur[crange].first+newsplit);
                    range_t R = range(cur[crange].first+newsplit), cur[crange].second);
                    answer_t::iterator loc = cur.erase(cur.begin()+crange);
                    cur.insert(loc, R);
                    cur.insert(loc, L);
                    //add it to the results
                    results.push_back(cur);
                }
            }
        }
    }
    return results;
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文