Clojure 中的最小“设置覆盖”解决方案

发布于 2025-01-17 01:50:07 字数 1986 浏览 3 评论 0原文

我一直在尝试将(很多)数据库索引建议提炼成一组适用于大多数数据库的索引。为此,我需要解决一个非常基本但 NP 完整的集合理论问题:最小集合覆盖问题

这意味着给定一组集合,选择覆盖特定域 u 的集合子集 s,但如果未给出 u使其成为s 的并集。集合的最佳子集是达到某个最小值的子集,通常是集合的最小数量,但如果对集合进行加权,它也可能是总权重的最小值。

(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})

我通过使用 clojure.math.combinatorics 实现了一个简单的版本,依靠它按集合数量增加的顺序返回子集。

(defn set-cover
  ([s]
   (set-cover s (apply set/union s)))
  ([s u]
   (->> s
        (combo/subsets)
        (filter (fn [s] (= u (apply set/union s))))
        first)))

然而,由于 NP 性质和循环并集(甚至是优化的并集),这在较大的 s 上非常慢。对于我的用例,支持加权集的版本也更好。

研究优化版本时,大多数路线最终都进入了论文领域,遗憾的是我不够聪明。我在SO上发现了这个小的 python 实现

def setCover(setList,target=None):
    if not setList: return None
    if target  is None: target  = set.union(*setList)
    bestCover = []
    for i,values in enumerate(setList):
        remaining = target - values
        if remaining == target: continue
        if not remaining: return [values]
        subCover = setCover(setList[i+1:],remaining)
        if not subCover: continue
        if not bestCover or len(subCover)<len(bestCover)-1:
            bestCover = [values] + subCover
    return bestCover

它勾选了许多框:

  • 工作递归地
  • 比较部分结果,因为优化
  • 似乎适合不同的最小定义:计数或重量
  • 有额外的优化我可以理解
    • 这可以在基本算法之外完成
      • 按照较高的最低分数(大小、重量)对输入集进行排序
      • 识别 u 中其他集合中未找到的唯一单例集合

我一直在尝试将其作为 loop-recur 转换为 Clojure > 函数,但无法使其基本版本正常工作,因为两种语言之间存在细微的范式差距。

有没有人建议我如何在 Clojure 中解决这个问题,或者通过提示如何成功转换 python 算法,或者我可以使用哪些其他 Clojure(甚至 Java)库以及如何使用?

I've been trying ways to distill (a lot of) database index suggestions into a set of indexes that are applicable to most databases. To do that it turns out I need to solve a pretty basic but NP complete set theory problem: the minimum set cover problem.

This means that given a set of sets, select a subset of sets s that covers a certain domain u, but if u isn't given make it the union of s. The most optimal subset of sets is the one reaching a certain minimum, usually the minimal amount of sets, but it could also be a minimum in total weight if the sets are weighted.

(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})

I implemented a naive version of this by using clojure.math.combinatorics, relying on it returning subsets in order of increasing amounts of sets.

(defn set-cover
  ([s]
   (set-cover s (apply set/union s)))
  ([s u]
   (->> s
        (combo/subsets)
        (filter (fn [s] (= u (apply set/union s))))
        first)))

However this is very slow on larger s, because of the NP nature and the recurring unions (even optimized ones). For my use-case a version supporting weighted sets would also be preferable.

Looking into optimized versions most trails ended up in thesis-land, which I'm regrettably not smart enough for. I found this small python implementation on SO

def setCover(setList,target=None):
    if not setList: return None
    if target  is None: target  = set.union(*setList)
    bestCover = []
    for i,values in enumerate(setList):
        remaining = target - values
        if remaining == target: continue
        if not remaining: return [values]
        subCover = setCover(setList[i+1:],remaining)
        if not subCover: continue
        if not bestCover or len(subCover)<len(bestCover)-1:
            bestCover = [values] + subCover
    return bestCover

It ticks many boxes:

  • work recursively
  • compares partial results as optimization
  • seems suitable for different minimum definitions: count or weight
  • has additional optimizations I can grok
    • which can be done outside of the basic algorithm
      • sorting input sets on high minimum score (size, weight)
      • identifying unique singleton sets in u not found in other sets

I have been trying to translate this into Clojure as a loop-recur function, but couldn't get the basic version of it to work, since there are niggling paradigm gaps between the two languages.

Does anyone have suggestions how I could go about solving this problem in Clojure, either by tips how to convert the python algorithm successfully, or which other Clojure (or even Java) libraries I could use and how ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

将军与妓 2025-01-24 01:50:07

这是贪婪集合覆盖算法的 Clojure 版本,即在每次迭代时选择一个覆盖最多未覆盖元素的集合。它不是使用loop/recur来构建完整的结果,而是使用lazy-seq延迟返回每个结果元素:

(defn greedy-set-cover
  ([xs]
   (greedy-set-cover xs (apply set/union xs)))
  ([xs us]
   (lazy-seq
     (when (seq us)
       (let [x (apply max-key #(count (set/intersection us %)) xs)
             xs' (disj xs x)
             us' (set/difference us x)]
         (cons x (greedy-set-cover xs' us')))))))
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})

(greedy-set-cover s)   ;; = (#{7 1 4} #{6 2 5} #{6 8} #{3})

Here's a Clojure version of the greedy set cover algorithm i.e. selects a set which covers the most uncovered elements at each iteration. Rather than use loop/recur to build the complete result, it lazily returns each result element using lazy-seq:

(defn greedy-set-cover
  ([xs]
   (greedy-set-cover xs (apply set/union xs)))
  ([xs us]
   (lazy-seq
     (when (seq us)
       (let [x (apply max-key #(count (set/intersection us %)) xs)
             xs' (disj xs x)
             us' (set/difference us x)]
         (cons x (greedy-set-cover xs' us')))))))
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})

(greedy-set-cover s)   ;; = (#{7 1 4} #{6 2 5} #{6 8} #{3})
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文