Clojure 中的最小“设置覆盖”解决方案
我一直在尝试将(很多)数据库索引建议提炼成一组适用于大多数数据库的索引。为此,我需要解决一个非常基本但 NP 完整的集合理论问题:最小集合覆盖问题。
这意味着给定一组集合,选择覆盖特定域 u
的集合子集 s
,但如果未给出 u
使其成为s
的并集。集合的最佳子集是达到某个最小值的子集,通常是集合的最小数量,但如果对集合进行加权,它也可能是总权重的最小值。
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})
我通过使用 clojure.math.combinatorics 实现了一个简单的版本,依靠它按集合数量增加的顺序返回子集。
(defn set-cover
([s]
(set-cover s (apply set/union s)))
([s u]
(->> s
(combo/subsets)
(filter (fn [s] (= u (apply set/union s))))
first)))
然而,由于 NP 性质和循环并集(甚至是优化的并集),这在较大的 s
上非常慢。对于我的用例,支持加权集的版本也更好。
研究优化版本时,大多数路线最终都进入了论文领域,遗憾的是我不够聪明。我在SO上发现了这个小的 python 实现
def setCover(setList,target=None):
if not setList: return None
if target is None: target = set.union(*setList)
bestCover = []
for i,values in enumerate(setList):
remaining = target - values
if remaining == target: continue
if not remaining: return [values]
subCover = setCover(setList[i+1:],remaining)
if not subCover: continue
if not bestCover or len(subCover)<len(bestCover)-1:
bestCover = [values] + subCover
return bestCover
它勾选了许多框:
- 工作递归地
- 比较部分结果,因为优化
- 似乎适合不同的最小定义:计数或重量
- 有额外的优化我可以理解
- 这可以在基本算法之外完成
- 按照较高的最低分数(大小、重量)对输入集进行排序
- 识别
u
中其他集合中未找到的唯一单例集合
- 这可以在基本算法之外完成
我一直在尝试将其作为 loop-recur
转换为 Clojure > 函数,但无法使其基本版本正常工作,因为两种语言之间存在细微的范式差距。
有没有人建议我如何在 Clojure 中解决这个问题,或者通过提示如何成功转换 python 算法,或者我可以使用哪些其他 Clojure(甚至 Java)库以及如何使用?
I've been trying ways to distill (a lot of) database index suggestions into a set of indexes that are applicable to most databases. To do that it turns out I need to solve a pretty basic but NP complete set theory problem: the minimum set cover problem.
This means that given a set of sets, select a subset of sets s
that covers a certain domain u
, but if u
isn't given make it the union of s
. The most optimal subset of sets is the one reaching a certain minimum, usually the minimal amount of sets, but it could also be a minimum in total weight if the sets are weighted.
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})
I implemented a naive version of this by using clojure.math.combinatorics
, relying on it returning subsets in order of increasing amounts of sets.
(defn set-cover
([s]
(set-cover s (apply set/union s)))
([s u]
(->> s
(combo/subsets)
(filter (fn [s] (= u (apply set/union s))))
first)))
However this is very slow on larger s
, because of the NP nature and the recurring unions (even optimized ones). For my use-case a version supporting weighted sets would also be preferable.
Looking into optimized versions most trails ended up in thesis-land, which I'm regrettably not smart enough for. I found this small python implementation on SO
def setCover(setList,target=None):
if not setList: return None
if target is None: target = set.union(*setList)
bestCover = []
for i,values in enumerate(setList):
remaining = target - values
if remaining == target: continue
if not remaining: return [values]
subCover = setCover(setList[i+1:],remaining)
if not subCover: continue
if not bestCover or len(subCover)<len(bestCover)-1:
bestCover = [values] + subCover
return bestCover
It ticks many boxes:
- work recursively
- compares partial results as optimization
- seems suitable for different minimum definitions: count or weight
- has additional optimizations I can grok
- which can be done outside of the basic algorithm
- sorting input sets on high minimum score (size, weight)
- identifying unique singleton sets in
u
not found in other sets
- which can be done outside of the basic algorithm
I have been trying to translate this into Clojure as a loop-recur
function, but couldn't get the basic version of it to work, since there are niggling paradigm gaps between the two languages.
Does anyone have suggestions how I could go about solving this problem in Clojure, either by tips how to convert the python algorithm successfully, or which other Clojure (or even Java) libraries I could use and how ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这是贪婪集合覆盖算法的 Clojure 版本,即在每次迭代时选择一个覆盖最多未覆盖元素的集合。它不是使用
loop
/recur
来构建完整的结果,而是使用lazy-seq
延迟返回每个结果元素:Here's a Clojure version of the greedy set cover algorithm i.e. selects a set which covers the most uncovered elements at each iteration. Rather than use
loop
/recur
to build the complete result, it lazily returns each result element usinglazy-seq
: