人群分割算法
我有 50 个有序整数 (1,2,3,..,50),我寻找一种通用方法来将其切片“n”种方式(“n”是范围从 1 到 25 的截止点数量)维持元素的顺序。
例如,对于 n=1(一个截止点),有 49 种可能的分组选择([1,2-49]、[1-2,3-50]、[1-3,4-50]、... )。对于 n=2(两个截止点),分组替代方案如下:[1,2,3-50],[1,2-3,4-50],...
您能否推荐任何通用算法来有效地完成这项任务吗?
谢谢, 克里斯
感谢大家的反馈。我查看了您的所有评论,并且正在研究一个通用解决方案,该解决方案将返回所有数字的所有组合(例如,[1,2,3-50]、[1,2-3,4-50],...)的截止点。
再次感谢, 克里斯
I have a population of 50 ordered integers (1,2,3,..,50) and I look for a generic way to slice it "n" ways ("n" is the number of cutoff points ranging from 1 to 25) that maintains the order of the elements.
For example, for n=1 (one cutoff point) there are 49 possible grouping alternatives ([1,2-49], [1-2,3-50], [1-3,4-50],...). For n=2 (two cutoff points), the grouping alternatives are like: [1,2,3-50], [1,2-3,4-50],...
Could you recommend any general-purpose algorithm to complete this task in an efficient way?
Thanks,
Chris
Thanks everyone for your feedback. I reviewed all your comments and I am working on a generic solution that will return all combinations (e.g., [1,2,3-50], [1,2-3,4-50],...) for all numbers of cutoff points.
Thanks again,
Chris
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
令序列长度为
N
,切片数量为n
。当您注意到选择对 n 个切片进行切片相当于从
N - 1
个可能的分割点(分割点位于每个分割点之间)中选择n - 1
时,这个问题就变得更容易了。序列中的两个数字)。因此存在(N - 1 选择n - 1)这样的切片。要生成所有切片(到 n 个切片),您必须生成从
1 到 N - 1
的所有n - 1
数字元素子集。此问题的确切算法放在此处: 如何在java中从大小为n的集合中迭代生成k个元素子集?
Let sequence length be
N
, and number of slicesn
.That problem becomes easier when you notice that, choosing a slicing to n slices is equivalent to choosing
n - 1
fromN - 1
possible split points (a split point is between every two numbers in the sequence). Hence there is (N - 1 choose n - 1) such slicings.To generate all slicings (to n slices), you have to generate all
n - 1
element subsets of numbers from1 to N - 1
.The exact algorithm for this problem is placed here: How to iteratively generate k elements subsets from a set of size n in java?
您是否需要截止值,或者您只是在计算它们。如果你只是想计算它们,那么很简单:
1 个截止值 = (n-1) 个选项
2 个截止值 = (n-1)*(n-2)/2 个选项
3 个截止值 = (n-1)(n-2)(n-3)/4 个选项,
您可以在此处看到模式
如果您确实需要截止,那么您必须实际执行循环,但由于 n 太小,埃米利奥是对的, 只是暴力破解它。
1截断
2截断
再3截断
,就可以看到图案了
Do you need the cutoffs, or are you just counting them. If you're just going to count them, then it's simple:
1 cutoff = (n-1) options
2 cutoffs = (n-1)*(n-2)/2 options
3 cutoffs = (n-1)(n-2)(n-3)/4 options
you can see the patterns here
If you actually need the cutoffs, then you have to actually do the loops, but since n is so small, Emilio is right, just brute force it.
1 cutoff
2 cutoffs
3 cutoffs
again, you can see the pattern
所以你想以所有可能的方式从 49 个选择中选择 25 个分割点。有很多众所周知的算法这样做。
我想提请您注意这个问题的另一面。有 49!/(25!*(49-25)!) = 63 205 303 218 876 >= 2^45 ~= 10^13 个不同 组合。所以如果要存储的话,需要的内存量是32TB * sizeof(Combination)。我估计会突破1PB大关。
现在假设您想要动态处理生成的数据。让我们乐观假设您可以处理 100 万个组合每秒(这里我假设没有并行化)。所以这个任务需要 10^7 秒 = 2777 小时 = 115 天。
这个问题比乍一看更复杂。如果你想在合理的时间内在家解决这个问题,我的建议是改变策略或者等待量子计算机的进步。
So you want to select 25 split point from 49 choices in all possible ways. There are a lot of well known algorithms to do that.
I want to draw your attention to another side of this problem. There are 49!/(25!*(49-25)!) = 63 205 303 218 876 >= 2^45 ~= 10^13 different combinations. So if you want to store it, the required amount of memory is 32TB * sizeof(Combination). I guess that it will pass 1 PB mark.
Now lets assume that you want to process generated data on the fly. Lets make rather optimistic assumption that you can process 1 million combinations per second (here i assume that there is no parallelization). So this task will take 10^7 seconds = 2777 hours = 115 days.
This problem is more complicated than it seems at first glance. If you want to solve if at home in reasonable time, my suggestion is to change the strategy or wait for the advance of quantum computers.
这将生成一个所有范围的数组,但我警告你,由于结果数量很大(50个元素,3个分割是49 * 48 * 47 = 110544),它会占用大量内存,我什至没有尝试编译它,所以可能有错误,但这是我使用的通用算法。
This will generate an array of all the ranges, but I warn you, it'll take tons of memory, due to the large numbers of results (50 elements with 3 splits is 49*48*47=110544) I haven't even tried to compile it, so there's probably errors, but this is the general algorithm I'd use.