查找方法测试矩阵(数学问题)解释的有效方法
设置:
我有一个布尔矩阵,例如
1 0 1 1 0 1
,其中行被命名为 m1、m2、m3 (作为方法),列被命名为 t1 和 t2 (作为测试)。
定义:
解释是一组行(方法),这些行(方法)在任何列中至少有一个“1”(每个测试必须与至少一个方法匹配)。
在我们的示例中,解释集将是:
{
{m2},
{m1,m2},{m1,m3},{m2,m3},
{m1,m2,m3}
}
问题:
我现在想要计算所有解释。
现在,我已经有了两种可以解决问题的实现,一种是自上而下搜索解释,另一种是自下而上搜索解释,但两者的计算时间都呈指数增长(通过将行数增加一倍而增加一倍)。
这是一个已知的(也许可以有效解决的)数学问题吗?
可以让事情变得更容易的是,最后,我只需要每个方法的解释中出现的次数。在我们的示例中,m1 出现 3 次,m2 出现 4 次,m3 出现 3 次。
我当前的算法在 26 行之前都可以正常工作。但更进一步,它变得非常慢。
感谢您的帮助!
Setup:
I have a boolean matrix e.g.
1 0 1 1 0 1
where rows are named m1, m2, m3 (as methods) and columns t1 and t2 (as tests).
Definition:
An explanation is a set of rows (methods) which in union have at least one "1" in any column (every test hast to be matched by at least one method).
in our example, the set of explanations would be:
{
{m2},
{m1,m2},{m1,m3},{m2,m3},
{m1,m2,m3}
}
Problem:
I want now to calculate all explanations.
Now, I already have two implementations which can solve the problem, one searching explanations top-down, the other bottom-up, but both suffer in exponential growth in computing time (doubles by increasing the number of rows by one).
Is this a known (maybe solved efficiently) mathematical problem?
What could make things easier is that at the end, I only need the number of occurrences in explanations for every method. In our example this would be for m1 three occurrences, for m2 four occurrences and for m3 three occurrences.
My current algorithms work fine until let's say 26 rows. But further on, it is getting very slow.
Thank you for your help!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果您可以接受近似概率并想要可扩展的东西,那么吉布斯抽样可能会起作用。基本思想非常简单:从所有行解释开始,然后重复以下操作以对一堆解释进行采样。
在极限情况下,包含给定行的样本部分会收敛到其真实值。在关键字“使用吉布斯采样的贝叶斯推理”下有一些实际的实现(您有一个统一的先验,并观察到对于每一列,与其关联的行的析取都是正确的)。不过,由于我不是这方面的专家,因此我无法就自己动手的危险向您提供建议。
If you can settle for approximate probabilities and want something scalable, Gibbs sampling might work. The basic idea is pretty simple: start with the all-rows explanation and repeat the following to sample a bunch of explanations.
In the limit, the fraction of the samples containing a given row converges to its true value. There are some practical implementations under the keywords "Bayesian inference using Gibbs sampling" (you have a uniform prior and observe that for each column, the disjunction of the rows incident with it is true). Since I'm not an expert in this stuff, though, I can't advise you as to the hazards of rolling your own.
我认为这很可能是一个指数问题。例如,如果其中一个方法在每一列中都有一个,则包含该方法的任何方法子集都是一个解释,因此如果有 M 个方法,则至少有 2^(M-1) 个解释;类似地,如果某对方法在任何列中一起有一个,则至少有 2^(M-2) 个解释。
这是一种方法,虽然仍然是指数级的,但我认为比枚举所有解释要快,特别是当有许多 1 的方法时。
令 T(A,B) 为 A(一组方法)的子集数,其中 B(一组列)的每一列中至少有一个 1。
如果B为空,则T(A,B)是A的子集数,即2^#A,其中A有#A个元素。
否则如果 A 为空,则 T(A,B) 为 0。
否则,如果 i 是 A 的元素(例如第一个元素),则
T(A,B) = T(A \ {i}, B \ m[i]) + T( A \ {i}, B)
(此处A \ {i} 是没有 i 的 A,B \ m[i] 是没有方法 i) 中任何列的 B,
T 可以非常简洁地编码为递归函数。
最后,c[j](方法 j 在解释中出现的次数)为
c[j] = T(A \ {j}, C \ m[j]),
其中 C 是所有列的集合。
I think this is likely to be an exponential problem. For example if one of the methods has a one in every column, then any subset of methods containing that method is an explanation, and so if there are M methods there are at least 2^(M-1) explanations; similarly if some pair of methods together have a one in any column, then there are at least 2^(M-2) explanations.
Here is method that, while still exponential, I think is faster than enumerating all the explanations, particularly when there are methods with many 1s.
Let T(A,B) be the number of subsets of A (a set of methods) which have at least one 1 in each column in B (a set of columns).
If B is empty, T(A,B) is the number of subsets of A, i.e. 2^#A, where A has #A elements.
Otherwise If A is empty, T(A,B) is 0.
Otherwise if i is an element of A (e.g. the first one),
T(A,B) = T(A \ {i}, B \ m[i]) + T( A \ {i}, B)
(here A \ {i} is A without i, B \ m[i] is B without any of the columns in method i)
T can be coded quite succinctly as a recursive function.
Finally c[j], the number of times method j occurs in an explanation, is
c[j] = T(A \ {j}, C \ m[j])
where C is the set of all columns.
我不知道是否存在指数数量的可能解释(这意味着您无法比指数更快地枚举它们)。
但是,您可以采用动态编程风格来实现此目的,以消除重复的工作:
或
将它们的测试放在一起给出所有1
):I do not know whether there are not an exponential number of possible explanations (which would mean that you cannot enumerate them faster than exponential).
However, you could approach this in a dynamic programming style in order to eliminate duplicate effort:
or
ing their tests together gives all1
s):