Variant of the set cover problem in R/C++

Posted 2024-11-24 22:53:41

Given a universe of elements U = {1, 2, 3,...,n} and a number of sets in this universe {S1, S2,...,Sm}, what is the smallest set we can create that contains at least one element of each of the m sets?

For example, given the following elements U = {1,2,3,4} and sets S = {{4,3,1},{3,1},{4}}, the following sets will cover at least one element from each set:
{1,4}
or
{3,4}
so the minimum size required here is 2. (No single element can work, because {3,1} and {4} have no element in common.)
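For inputs this small, the definition can be checked directly. The following is a minimal brute-force sketch in base R, using plain numeric vectors rather than the sets package; min_hitting_set_brute is a hypothetical name. It tries every subset of the elements in order of increasing size and returns the first one that shares an element with every set. The number of subsets grows like 2^n, so this only serves as a reference for tiny instances and does not address the scaling question.

```r
# Brute-force minimum hitting set: try subsets of the elements by increasing
# size and return the first that shares an element with every set.
# Runtime grows like 2^n, so this is only for very small examples.
min_hitting_set_brute <- function(sets) {
  universe <- sort(unique(unlist(sets)))
  hits_all <- function(h) all(vapply(sets, function(s) any(s %in% h), logical(1)))
  for (k in seq_along(universe)) {
    # all k-element index combinations, as a list of integer vectors
    for (idx in combn(length(universe), k, simplify = FALSE)) {
      h <- universe[idx]
      if (hits_all(h)) return(h)
    }
  }
  NULL  # reached only if some set is empty (or there are no sets at all)
}

sets <- list(c(4, 3, 1), c(3, 1), c(4))
min_hitting_set_brute(sets)  # [1] 1 4
```

On the example it returns {1, 4}, one of the two optimal sets listed above.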

Any thoughts on how this can be scaled up to solve the problem for m=100 or m=1000 sets? Or thoughts on how to code this up in R or C++?

The sample data from above, using R's sets package:

library(sets)
s1 <- set(4, 3, 1)
s2 <- set(3, 1)
s3 <- set(4)
s <- set(s1, s2, s3)

Cheers

满地尘埃落定 2024-12-01 22:53:41

This is the hitting set problem, which is basically set cover with the roles of elements and sets interchanged. Letting A = {4, 3, 1} and B = {3, 1} and C = {4}, the element-set containment relation is as follows (+ means the element belongs to the set, - means it does not):

  A B C
1 + + -
2 - - -
3 + + -
4 + - +

so you basically want to cover {A, B, C} using as few as possible of the sets 1 = {A, B}, 2 = {}, 3 = {A, B} and 4 = {A, C}.
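The containment table can be built mechanically, which matters once m is in the hundreds. A minimal sketch in base R, assuming the sets are stored as a named list of numeric vectors: rows of the resulting logical matrix are elements, columns are sets, and reading each row back as a list of set names gives the transposed set-cover instance.

```r
# Example data: the three sets from the question, named A, B and C
U <- 1:4
sets <- list(A = c(4, 3, 1), B = c(3, 1), C = c(4))

# Logical incidence matrix: incidence[i, j] is TRUE when element U[i] is in set j
incidence <- sapply(sets, function(s) U %in% s)
rownames(incidence) <- U
print(incidence)

# Transposed instance: for each element, the names of the sets it hits
transposed <- setNames(
  lapply(seq_len(nrow(incidence)), function(i) colnames(incidence)[incidence[i, ]]),
  U
)
str(transposed)
```

Element 2 comes out with an empty list of sets, matching the all-minus row of the table.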

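Probably the easiest way to solve nontrivial instances of set cover in practice is to find an integer programming package with an interface to R or C++. Your example would be rendered as the following integer program in LP format, where the binary variable xi is 1 exactly when element i is chosen and each constraint says that the corresponding set must contain at least one chosen element.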

Minimize
    obj: x1 + x2 + x3 + x4
Subject To
    A: x1 + x3 + x4 >= 1
    B: x1 + x3 >= 1
    C: x4 >= 1
Binary
    x1 x2 x3 x4
End
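For m = 100 or m = 1000 the LP file is best generated rather than written by hand. A minimal sketch in base R; write_hitting_set_lp is a hypothetical helper, not part of any package, and it assumes the elements are the integers 1..n and that every set is non-empty (an empty set would produce a constraint with no variables, i.e. an infeasible program).

```r
# Write the hitting-set integer program in (CPLEX) LP format:
# one binary variable x_i per element i in 1..n, one ">= 1" constraint per set.
# write_hitting_set_lp is a hypothetical helper, not from any package.
write_hitting_set_lp <- function(sets, n, file = stdout()) {
  # constraint names: use the list names, falling back to S1, S2, ...
  nm <- names(sets)
  if (is.null(nm)) nm <- rep("", length(sets))
  empty <- nm == ""
  nm[empty] <- paste0("S", seq_along(sets))[empty]

  all_vars <- paste0("x", seq_len(n))
  constraints <- vapply(seq_along(sets), function(j) {
    vars <- paste0("x", sort(unique(sets[[j]])))
    paste0("    ", nm[j], ": ", paste(vars, collapse = " + "), " >= 1")
  }, character(1))

  lines <- c("Minimize",
             paste0("    obj: ", paste(all_vars, collapse = " + ")),
             "Subject To",
             constraints,
             "Binary",
             paste0("    ", paste(all_vars, collapse = " ")),
             "End")
  writeLines(lines, file)
  invisible(lines)
}

sets <- list(A = c(4, 3, 1), B = c(3, 1), C = c(4))
write_hitting_set_lp(sets, n = 4)
```

Called on the example it prints the program above line for line. Passing file = "cover.lp" writes it to disk instead, where a solver that reads CPLEX LP format (for example GLPK's glpsol --lp cover.lp) can solve it.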

一个人的夜不怕黑 2024-12-01 22:53:41

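At first I misunderstood the complexity of the problem and came up with a function that finds a set covering all m sets: it starts from the union of the sets and drops elements one at a time as long as every set is still hit. I then realized that the result isn't necessarily the smallest such set: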

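cover <- function(sets, elements = NULL) {
  if (is.null(elements)) {
    # Build the union of all sets
    su <- integer()
    for (si in sets) su <- union(su, si)
  } else {
    su <- elements
  }

  s <- su
  # Try dropping each element in turn. Loop over the elements themselves,
  # not over positions in s: s shrinks as elements are dropped, so looping
  # over seq_along(s) would skip the element that slides into a freed slot.
  for (e in su) {
    # candidate set with element e removed
    sc <- setdiff(s, e)

    # keep the removal only if every set still has an element in sc
    ok <- TRUE
    for (si in sets) {
      if (!any(si %in% sc)) {
        ok <- FALSE
        break
      }
    }

    if (ok) {
      s <- sc
    }
  }

  # The resulting set: no single element can be removed from it
  s
}

sets <- list(s1 = c(1, 3, 4), s2 = c(1, 3), s3 = c(4))
cover(sets)  # [1] 3 4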

Then we can time it:

n <- 100  # number of elements
m <- 1000 # number of sets
# m random non-empty subsets of 1..n, each of a random size between 1 and n
sets <- lapply(seq_len(m), function(i) sample.int(n, sample.int(n, 1)))
system.time( s <- cover(sets) ) # 0.53 seconds

Not too bad, but the result is still not guaranteed to be the smallest.
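Here is a small made-up instance showing why pruning alone does not guarantee the minimum: the size of the result depends on the order in which elements are tried. prune_in_order is a hypothetical, stripped-down version of the same drop-one-element-at-a-time idea, written to be self-contained.

```r
# Hypothetical, self-contained version of the pruning idea: start from all
# elements and drop each one, in the given order, if the rest still hits
# every set. The result is irreducible, but not necessarily minimum.
prune_in_order <- function(sets, elements) {
  hits_all <- function(h) all(vapply(sets, function(s) any(s %in% h), logical(1)))
  s <- elements
  for (e in elements) {
    candidate <- setdiff(s, e)
    if (hits_all(candidate)) s <- candidate
  }
  s
}

# Made-up instance whose minimum hitting set has size 2, e.g. {1, 4}
sets <- list(c(1, 4), c(2, 4), c(3, 4), c(1, 2, 3))
prune_in_order(sets, c(1, 2, 3, 4))  # [1] 3 4    (size 2, optimal)
prune_in_order(sets, c(4, 1, 2, 3))  # [1] 1 2 3  (size 3: dropping 4 first is a trap)
```

Once 4 is dropped, each of 1, 2 and 3 becomes the only remaining hit for one of the sets, so none of them can be removed.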

The obvious exact approach: generate all permutations of the elements, pass each one to the cover function, and keep the smallest result. With n! permutations (100! for n = 100), this would take close to "forever".

Another approach is to try a limited number of random permutations; this way you get an approximation of the smallest set.

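ns <- 10 # number of random orderings to try
elements <- seq_len(n)
smin <- elements # the full universe always hits every (non-empty) set
for (i in seq_len(ns)) {
  s <- cover(sets, sample(elements))
  if (length(s) < length(smin)) {
    smin <- s
  }
}
length(smin) # size of the smallest cover found: an upper bound on the true minimum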
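A different technique worth comparing against the random restarts is the classic greedy heuristic for set cover, applied here in its hitting-set form: repeatedly pick the element that hits the most sets not yet hit. It is known to stay within a factor of about ln(m) + 1 of the optimum, though it does not give an exact answer. The sketch below is base R; greedy_hitting_set is a hypothetical name, and ties are broken in favour of the smallest element.

```r
# Greedy heuristic for hitting set (the standard greedy set-cover rule with
# elements and sets swapped). greedy_hitting_set is a hypothetical name.
greedy_hitting_set <- function(sets) {
  universe <- sort(unique(unlist(sets)))
  # m x n logical matrix: M[j, i] is TRUE when set j contains universe[i]
  M <- matrix(unlist(lapply(sets, function(s) universe %in% s)),
              nrow = length(sets), byrow = TRUE)
  unhit <- rep(TRUE, length(sets))
  chosen <- universe[0]
  while (any(unhit)) {
    # number of still-unhit sets each element would hit
    gains <- colSums(M[unhit, , drop = FALSE])
    if (length(gains) == 0 || max(gains) == 0) stop("an empty set can never be hit")
    best <- which.max(gains)  # first maximum, i.e. the smallest such element
    chosen <- c(chosen, universe[best])
    unhit <- unhit & !M[, best]
  }
  chosen
}

sets <- list(c(4, 3, 1), c(3, 1), c(4))
greedy_hitting_set(sets)  # [1] 1 4
```

Building the incidence matrix once keeps each greedy step to a single colSums call, which is cheap at n = 100, m = 1000. On the random data from the timing example, length(greedy_hitting_set(sets)) can be compared directly with length(smin).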
