选择集合的子集,使得子集满足全局约束
我们有一组项目 I = {i_1, i_2, ..., i_n}。这些项目中的每一项都有我们所说的p值,它是一些实数。我们想要选择 I 的一个子集,称为 I',大小为 m(对于某些 m,其中 1 <= m <= n),使得项目的 p 值的平均值I' 落在某个指定范围内,[p_l, p_u]。 (例如,我们可能需要 0.70 到 0.74 之间的平均 p 值。)此外,我们希望高效地做到这一点。
我们希望在 O(n) 时间内完成此操作,但任何多项式时间算法都足够好。我们当然不想只尝试大小为 m 的 I 的每个可能子集,然后检查它是否满足平均 p 值约束。
最后,我们将重复执行此操作,并且希望选择的子集在所有可能的此类子集中均匀随机分布。
有办法做到这一点吗?
We have a set of items I = {i_1, i_2, ..., i_n}. Each of these items has what we call a p value, which is some real number. We want to choose a subset of I, call it I', of size m (for some m with 1 <= m <= n) such that the average of the p values of the items in I' falls within some specified range, [p_l, p_u]. (For example, we might require an average p value between 0.70 and 0.74.) Moreover, we want to do this efficiently.
We hope to do this in O(n) time, but any polynomial time algorithm is good enough. We certainly do not want to just try every possible subset of I of size m and then check whether it satisfies the average p-value constraint.
Finally, we will be doing this repeatedly and we want the subsets chosen to be a uniformly random distribution over all the possible such subsets.
Is there a way of doing this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您有一个子集及其总和,并且按 |subset+1|/|subset| 缩放总和,则您添加的每个新元素都会对总和做出线性贡献。因此,这看起来与子集和问题(NP 完全)非常相似,其中的目标是找到总和为 0 的所有子集,尽管这里我们希望总和接近 0。例如,如果您有一个很大的子集设置一个元素大致在可接受的 p 范围内,如果您粗略地假设该元素无关紧要,突然之间它实际上是等效的......假设您有大量正和负 p 值,并且这个问题不是由对手制造的。如果是这种情况,您可以使用 http://en.wikipedia 中给出的两个近似解决方案之一。 org/wiki/Subset_sum_problem ,使等式“模糊”,然后将 0.7
当然,您从解决方案中获得的效率难以置信取决于 p 值的生成方式(其“分布”)。
If you have a subset and its sum, if you scale the sum by |subset+1|/|subset|, each new element you add contributes linearly to the sum. This thus seems extremely similar to the subset-sum problem (NP-complete), where the goal is to find all subset which sum to 0, albeit here we wish the sum to be close to 0. For example, if you have a large set where one element is roughly in the acceptable p-range, if you make the sketchy assumption that that element doesn't matter, suddenly it is practically equivalent... assuming you have a large number of positive and negative p-values, and the problem is not constructed by an adversary. If this is the case, you could use one of the two approximating solutions given at http://en.wikipedia.org/wiki/Subset_sum_problem , make the equality "fuzzy", and just tack on a "reserved" element with 0.7<p<0.74 to the set.
Of course the efficiency you can eek out of your solution is incredibly dependent on the way p-values are generated (the "distribution" thereof).