查找方法测试矩阵(数学问题)解释的有效方法

发布于 2024-09-12 07:42:21 字数 743 浏览 7 评论 0原文

设置:

我有一个布尔矩阵,例如

1  0
1  1
0  1

,其中行被命名为 m1m2m3 (作为方法),列被命名为 t1 和 t2 (作为测试)。

定义:

解释是一组行(方法),这些行(方法)在任何列中至少有一个“1”(每个测试必须与至少一个方法匹配)。

在我们的示例中,解释集将是:

{
  {m2}, 
  {m1,m2},{m1,m3},{m2,m3},
  {m1,m2,m3}
}

问题:

我现在想要计算所有解释。

现在,我已经有了两种可以解决问题的实现,一种是自上而下搜索解释,另一种是自下而上搜索解释,但两者的计算时间都呈指数增长(通过将行数增加一倍而增加一倍)。

这是一个已知的(也许可以有效解决的)数学问题吗?

可以让事情变得更容易的是,最后,我只需要每个方法的解释中出现的次数。在我们的示例中,m1 出现 3 次,m2 出现 4 次,m3 出现 3 次。

我当前的算法在 26 行之前都可以正常工作。但更进一步,它变得非常慢。

感谢您的帮助!

Setup:

I have a boolean matrix e.g.

1  0
1  1
0  1

where rows are named m1, m2, m3 (as methods) and columns t1 and t2 (as tests).

Definition:

An explanation is a set of rows (methods) which in union have at least one "1" in any column (every test hast to be matched by at least one method).

in our example, the set of explanations would be:

{
  {m2}, 
  {m1,m2},{m1,m3},{m2,m3},
  {m1,m2,m3}
}

Problem:

I want now to calculate all explanations.

Now, I already have two implementations which can solve the problem, one searching explanations top-down, the other bottom-up, but both suffer in exponential growth in computing time (doubles by increasing the number of rows by one).

Is this a known (maybe solved efficiently) mathematical problem?

What could make things easier is that at the end, I only need the number of occurrences in explanations for every method. In our example this would be for m1 three occurrences, for m2 four occurrences and for m3 three occurrences.

My current algorithms work fine until let's say 26 rows. But further on, it is getting very slow.

Thank you for your help!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

情绪失控 2024-09-19 07:42:21

如果您可以接受近似概率并想要可扩展的东西,那么吉布斯抽样可能会起作用。基本思想非常简单:从所有行解释开始,然后重复以下操作以对一堆解释进行采样。

  1. 选择随机行。
  2. 抛一枚硬币。
  3. 如果硬币正面朝上,则将该行添加到说明中(如果已经存在,则不执行任何操作)。
  4. 如果硬币反面朝上,请尝试从解释中删除该行。如果结果不能解释,则将该行放回原处。

在极限情况下,包含给定行的样本部分会收敛到其真实值。在关键字“使用吉布斯采样的贝叶斯推理”下有一些实际的实现(您有一个统一的先验,并观察到对于每一列,与其关联的行的析取都是正确的)。不过,由于我不是这方面的专家,因此我无法就自己动手的危险向您提供建议。

If you can settle for approximate probabilities and want something scalable, Gibbs sampling might work. The basic idea is pretty simple: start with the all-rows explanation and repeat the following to sample a bunch of explanations.

  1. Choose a random row.
  2. Flip a coin.
  3. If the coin came up heads, add the row to the explanation (do nothing if it's already there).
  4. If the coin came up tails, attempt to remove the row from the explanation. If the result is not an explanation, put the row back.

In the limit, the fraction of the samples containing a given row converges to its true value. There are some practical implementations under the keywords "Bayesian inference using Gibbs sampling" (you have a uniform prior and observe that for each column, the disjunction of the rows incident with it is true). Since I'm not an expert in this stuff, though, I can't advise you as to the hazards of rolling your own.

笑脸一如从前 2024-09-19 07:42:21

我认为这很可能是一个指数问题。例如,如果其中一个方法在每一列中都有一个,则包含该方法的任何方法子集都是一个解释,因此如果有 M 个方法,则至少有 2^(M-1) 个解释;类似地,如果某对方法在任何列中一起有一个,则至少有 2^(M-2) 个解释。

这是一种方法,虽然仍然是指数级的,但我认为比枚举所有解释要快,特别是当有许多 1 的方法时。

令 T(A,B) 为 A(一组方法)的子集数,其中 B(一组列)的每一列中至少有一个 1。

如果B为空,则T(A,B)是A的子集数,即2^#A,其中A有#A个元素。
否则如果 A 为空,则 T(A,B) 为 0。
否则,如果 i 是 A 的元素(例如第一个元素),则

T(A,B) = T(A \ {i}, B \ m[i]) + T( A \ {i}, B)

(此处A \ {i} 是没有 i 的 A,B \ m[i] 是没有方法 i) 中任何列的 B,

T 可以非常简洁地编码为递归函数。

最后,c[j](方法 j 在解释中出现的次数)为

c[j] = T(A \ {j}, C \ m[j]),

其中 C 是所有列的集合。

I think this is likely to be an exponential problem. For example if one of the methods has a one in every column, then any subset of methods containing that method is an explanation, and so if there are M methods there are at least 2^(M-1) explanations; similarly if some pair of methods together have a one in any column, then there are at least 2^(M-2) explanations.

Here is method that, while still exponential, I think is faster than enumerating all the explanations, particularly when there are methods with many 1s.

Let T(A,B) be the number of subsets of A (a set of methods) which have at least one 1 in each column in B (a set of columns).

If B is empty, T(A,B) is the number of subsets of A, i.e. 2^#A, where A has #A elements.
Otherwise If A is empty, T(A,B) is 0.
Otherwise if i is an element of A (e.g. the first one),

T(A,B) = T(A \ {i}, B \ m[i]) + T( A \ {i}, B)

(here A \ {i} is A without i, B \ m[i] is B without any of the columns in method i)

T can be coded quite succinctly as a recursive function.

Finally c[j], the number of times method j occurs in an explanation, is

c[j] = T(A \ {j}, C \ m[j])

where C is the set of all columns.

﹏半生如梦愿梦如真 2024-09-19 07:42:21

我不知道是否存在指数数量的可能解释(这意味着您无法比指数更快地枚举它们)。

但是,您可以采用动态编程风格来实现此目的,以消除重复的工作:

  • 第一级是单个元素的列表:
  • 每个级别的所有方法循环的集合:
    • 循环此级别中的每个集合:
      • 如果这组是一个解释(将它们的测试放在一起给出所有1):
        • 将其放入结果列表
        • 创建该集合的所有可能的子集,这些子集只少一个方法,并将它们放入下一个“级别”
    • 删除下一级的所有重复项
  • 直到该级别为空

I do not know whether there are not an exponential number of possible explanations (which would mean that you cannot enumerate them faster than exponential).

However, you could approach this in a dynamic programming style in order to eliminate duplicate effort:

  • The first level is a list of a single element: the set of all methods
  • loop for each level:
    • loop for each set in this level:
      • if this set is an explanation (oring their tests together gives all 1s):
        • put it into the result list
        • create all possible subsets of this set that have exactly one method less and put them into the next "level"
    • remove all duplicates from the next level
  • until the level is empty
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文