我被要求编写一个例程来决定产品配置器中可能的组合数量。
配置器非常简单。 尽管它具有比这更多的功能,但它可以建模为多个“单选组”(如 UI 控件),其中必须选择 n 个选项之一。
唯一可以使用的约束是规则,如果选择了一个选项,则不能选择另一个选项。
所以我想做的是在给定一组选项组和约束的情况下计算可以配置的不同产品的数量。
我使用包含-排除原则来解决这个问题。 然而据我所知,任何基于此方法的算法都应该在 O(2^n) 中运行,这是行不通的。 当然,有几种可能的优化可以提供不错的运行时间,但仍然容易构建最坏的情况。
我现在差不多就是这样。 有什么建议么?
更新
我意识到我还没有很好地解释规则如何适用。
有几个带有选项的组。 每组中必须选择一个且仅一个选项。 一组中可以有一个或多个选项。
只有一种类型的约束。 如果某个组中的选项A被选中,则其他组中的选项B就不能被选中。 可以有任意数量的约束,适用于选项组或选项本身的约束/规则的数量没有限制。
例如:
组 1:
x1 x2 x3 x4 x5
第 2 组:
y1 y2 y3
第 3 组:
z1 z2 z3 z4
约束:
x1 <-> y2 *
x1 <-> z4
y2 <-> z2
*
如果在组 1 中选择了选项 x1,则不能选择组 2 中的选项 y2。
使用包含-排除,我将计算组合数量:
Combinations = Cnorules - Cr[1]
- Cr [2] - Cr[3] + Cr[1,2] + Cr[1,3] + Cr[2,3] - Cr[1,2,3]
其中
C无规则 = 5 * 3 * 4
C r[a,b,c] = 违反规则 a、b 和 c 的组合数。
不幸的是,该方法需要 2^|规则| 计算。
I have been asked to program a routine to decide the number of possible combinations in a product configurator.
The configurator is really simple. Even though it has more features than this, it can be modeled as several "radio groups" (like the UI control) where one of n options has to be selected.
The only kind of constraints that can be used is rules that says if one option is selected another option can not be selected.
So what I want to do is to calculate the number of different products that can be configured, given a set of option groups and constraints.
I made a naive approach to solve this using the Inclusion-exclusion principle. However as far as I can see, any algorithm based on this method should run in O(2^n) which won't work. There are of course several possible optimizations that should give decent runtimes but there are still easily constructed worst case scenarios.
Thats pretty much where I am right now. Any suggestions?
Update
I realize I haven't explained how the rules applies well enough.
There are several groups with options. One and only one option must be selected in each group. There can be one or more of options in a group.
There is only one type of constraints. If option A in some group is selected, then option B in some other group can't be selected. There can be any number of constraints, there is no limit how many constraints/rules apply to a option group or an option itself.
So an example would be:
Group 1:
x1 x2 x3 x4 x5
Group 2:
y1 y2 y3
Group 3:
z1 z2 z3 z4
Constraints:
x1 <-> y2 *
x1 <-> z4
y2 <-> z2
*
If option x1 is selected in group1, then option y2 in group 2 can not be selected.
Using inclusion-exclusion I would calculate the number of combinations as
Combinations = Cno rules - Cr[1]
- Cr[2] - Cr[3] + Cr[1,2] + Cr[1,3] + Cr[2,3] - Cr[1,2,3]
Where
Cno rules = 5 * 3 * 4
Cr[a,b,c] = Number of combinations that violates rule a, b and c.
The method unfortunately requires 2^|rules| calculations.
发布评论
评论(11)
我认为扎克的思考方向是正确的。 查看组合数量的表达式,您会发现二阶项 Cr[i,j] 比 C[k] 项小得多。
想象一个立方体,其中每个轴都是一组选项。 立方体中的每个点代表一个特定的选项组合。 一阶 C[k] 校正排除立方体两个表面之间的一行选项。 仅当两条这样的线在立方体空间中的一个点(选项组合)相交时,才会发生二阶校正 C[i,j]。 因此,无论您拥有多少组,高阶修正总是越来越小。 如果您坚持
组合 = C(无规则) - Cr[1] - Cr[2] - Cr[3],
您最终会得到组合数量的下限。 现在您知道一阶校正的大小,并考虑上面立方体的观察结果,您甚至可以估计二阶校正的数量级。 这将取决于组的数量。 然后,您的算法可以决定是继续执行更高的订单还是停止。
I think Zac is thinking in the right direction. Looking at your expression for the number of combinations, you see that the second order terms Cr[i,j] are much smaller than the C[k] terms.
Imagine a cube where each axis is a group of options. Each point in the cube represents a particular combination of options. A first order C[k] correction excludes a line of options between two surfaces of the cube. A second order correction C[i,j] only happens when two such lines meet at one point (combination of options) in space in the cube. So regardless of the number of groups you have, the higher order corrections are always increasingly smaller. If you stick to
Combinations = C(no rules) - Cr[1] - Cr[2] - Cr[3]
you end up with a lower bound for the number of combinations. Now that you know the size of your first order correction, and thinking about the observation with the cube above, you can even estimate the order of magnitude of the second order correction. It will depend on the number of groups. Your algorithm can then decide whether it wants to continue with the higher orders or stop.
对 Daniel 的帖子的评论:
你的算法看起来不错,但我做不到说服自己它确实有效,所以我安装了 scala 并做了一些测试。 不幸的是我没有得到正确的结果。
例如,考虑这种情况:
我使用此配置配置了基本筛算法,并得到了以下结果(227 组合):
但是在 scala 中使用此代码:
我得到了答案 258 >。
我检查了筛法中的计算,它们似乎是正确的。 也许可以修复你的算法? 我真的无法指出哪里出了问题。
Comment to Daniel's post:
Your algorithm looks good but I couldn't convince myself it really worked, so I installed scala and made some tests. Unfortunaly I don't get correct results.
For example consider this case:
I configured my basic sieve algorithm with this configuration and got the following results (227 combinations):
However using this code in scala:
I got the answer 258.
I've checked the calculations in the sieve method and they seem to be right. Perhaps it is possible to fix your algorithm? I can't really put my finger on what is wrong.
你的问题是相当不可行的。
所以一般不要指望多项式算法; 这种算法的存在意味着 P=NP。
你可以做什么:
证明:
(p1 或不是 p2)和(p2 或不是 p3)
允许用户选择 p1、p2、p3 的值。 您可以轻松地形成约束条件,迫使该问题得以解决。 因此,可能的配置数量 = p1、p2、p3 使布尔公式成立的可能分配数量。
(p1 或 p2 或不是 p3)和(p2 或不是 p1 或 p4)
给出选项:
group p1 p1true p1false
group p2 p2false p2true
group p3 p3false p3true
group p4 p4false p4true
group Clause_1 c1a c1b c1c
group Clause_2 c2a c2b c2c
Clause_1 将控制第一个子句:(p1 或 p2 或不是 p3)。 准确地说,如果选择 p1true,则 c1a 将可检查;如果选择 p2true,则 c1b 将可检查;如果选择 p3false,则 c1c 将可检查。 所以约束是:
p1false <-> c1a
p2false <->; c1b
p3true <->; c1c
与clause_2相同,约束为
p2false <-> c2a
p1true <->; c2b
p4false <-> c2c
如果用户能够选择所有答案(因此配置数量 > 0),他将证明变量 p1,...,p4 的某些估值使 3SAT 实例为真。 相反,如果用户无法选择与假设一致的答案(可用配置的数量 = 0),则 3SAT 实例将无法令人满意。 所以知道答案是否是> 0 表示知道 3SAT 实例是否可解。
当然,这种减少是多项式时间的。 证明结束。
如果您对答案可能为 0 的事实不满意:如果您忽略此类配置器,它仍然是 NP 困难的。 (您可以向所有组添加一些“bogus”选项,如果未选择“bogus”,则允许指数级的多种选择。这解释起来比较复杂,所以我将根据要求执行此操作。)
Your problem is rather infeasible.
So in general don't count on a polynomial algorithm; existence of such algorithms would imply P=NP.
What you can do:
Proofs:
(p1 or not p2) and (p2 or not p3)
Allow the user to choose values of p1, p2, p3. You can easily form constraints that will force this to be solvable. So the number of possible configurations = number of possible assignments of p1, p2, p3 making the Boolean formula true.
(p1 or p2 or not p3) and (p2 or not p1 or p4)
Give options:
group p1 p1true p1false
group p2 p2false p2true
group p3 p3false p3true
group p4 p4false p4true
group clause_1 c1a c1b c1c
group clause_2 c2a c2b c2c
clause_1 will be controlling the first clause: (p1 or p2 or not p3). Precisely, c1a will be checkable if p1true was chosen, c1b will be checkable if p2true was chosen, c1c will be checkable if p3false was chosen. So the constraints are:
p1false <-> c1a
p2false <-> c1b
p3true <-> c1c
Same with clause_2, constaints are
p2false <-> c2a
p1true <-> c2b
p4false <-> c2c
If the user will be able to choose all answers (so the number of configurations is > 0), he'll prove that there is some valuation of variables p1, ..., p4 that make the 3SAT instance true. And conversely, if the user won't be able to choose answers consistent with assumptions (the number of available configurations = 0), the 3SAT instance will not be satisfable. So knowing if the answer is > 0 means knowing if 3SAT instance is solvable.
Of course this reduction is polynomial-time. End of proof.
If you aren't satisfied with the fact that answer might be 0: it is still NP-hard if you disregard such configurators. (You would add some "bogus" option to all groups and allow exponentially many choices if "bogus" was not chosen. This is more complex to explain, so I'll do it on request.)
上面 sdcvvc 的优秀答案中简要提到了这一点,但是蒙特卡洛近似是否足够好? 生成 N 个随机配置(其中选择 N 以使您的实验功效足够高:我不知道足够多的信息来帮助您),然后测试其中有多少与您的约束兼容。 将该比例推断到配置空间的其余部分。
This is mentioned briefly in sdcvvc's excellent answer above, but might a Monte Carlo approximation be good enough? Generate N random configurations (where N is chosen such that the power of your experiment is high enough: I don't know enough to help you here), then test how many of them are compatible with your constraints. Extrapolate that proportion to the rest of your configuration space.
如果您有
N
个选项组,每个选项组都有Xi
选项 (0<=i),那么
会给您您想要的答案。 换句话说,乘以每组选项的大小。
If you have
N
option groups, each withXi
options (0<=i<N
) thenGives you the answer you want. In other words, multiply the size of each group of options.
如果您有
n
个参数,每个参数具有Ci
个可能值和m
约束,则配置数量的上限如下(忽略的约束条件)。形式为
ci == x => 的单个约束 cj != y
不允许以下数量的配置。因此,配置的数量是通过从忽略约束的上限中减去不允许的配置来获得的。
这里,
xj
和yj
是第j
约束的两个参数索引。示例
更新
我认为这还不完全正确,因为它没有考虑约束的依赖性。
If you have
n
parameters withCi
possible values for each parameter andm
constraints, a upper bound for the number of configuartions is the following (ignoring the constraints).A single constraint of the form
ci == x => cj != y
disallows the following number of configurations.Hence the number of configuration is obtained by subtracting the disallowed configuartions from the upper bound ignoring the constraints.
Here
xj
andyj
are the both parameter indices from thej
-th constraint.Example
UPDATE
I think this is not yet completly correct because it does not account for dependencies of the constraints.
对于一般情况没有捷径。它并不像你想象的那么糟糕。 请参阅下面的“重新思考”。2^n 真的那么糟糕吗? 我们在这里谈论多少排除规则? 您实际上只需为每个配置器执行一次此操作,除非规则/选项集不断动态变化并需要动态重新计算。 如果真的有大量规则,那么我不会寻求精确的解决方案 - 只考虑 k 阶交集并说“组合数至少/最多......”。 可能还有其他筛选方法可以让您快速得出答案的范围。
另请记住:如果您只考虑实际需要的排除项,那么 2^n 只是一个上限,对于任何实际场景,您的实际计算次数可能会少得多。 也就是说,如果 C[1,2] 为零,则 C[1,2,...] 也为零。 考虑一下:对于每个约束,如果约束集共享任何共同选项,则将它们“聚集”在一起。 很明显,您的实际运行时间将由最大“团块”的大小定义(是的,可能与 n 一样大)。
重新思考:在大多数情况下,C[x,y] 将为零。 约束只能与涉及不同组的其他约束重叠。 换句话说,(x1<->y1)只能与(x1<->z1)或(y1<->z2)或其他东西重叠,而不能与(x1<->y2)重叠。 类似地,一组约束只能与一个新组重叠:(x1<->y1) 与 (y1<->z2) 的组合与 (x3<->) 没有交互;z2)(x 组已固定为 x1)。 您只需考虑包含/排除,其中添加到组合中的每个规则都会将以前未触及的组添加到组合中。 所以你实际上是 O(2G),其中 G 是组的数量(也可能是基于组大小的不同界限)。 更易于管理!
There are no shortcuts here for the general case.It's not as bad as you think. See "Rethinking" below.Is 2^n really so bad? How many exclusion rules are we talking about here? You really only have to do this once for each configurator, unless the set of rules/options is constantly changing on the fly and requiring dynamic recalculation. If there's really a huge number of rules, then I would not seek an exact solution -- only consider the kth-order intersections and say "number of combinations is at least/most ...". There might be other sieve methods that will let you quickly derive bounds on the answer.
Also keep in mind: if you only consider the exclusions you actually need, then 2^n is merely an upper bound, and your actual number of calculations may be significantly less for any actual scenarios. That is, if C[1,2] is zero, then so is C[1,2,...]. Consider this: for each constraint, "clump" sets of constraints together if they share any options in common. It's clear that your real running time is going to be defined by the size of the largest "clump" (which, yes, may be as big as n).
Rethinking: C[x,y] is going to be zero in most cases. A constraint can only overlap with other constraints involving a different group. In other words, (x1<->y1) can only overlap with (x1<->z1) or (y1<->z2) or something, not (x1<->y2). Similarly, a set of constraints can only overlap with a new group: the combination of (x1<->y1) with (y1<->z2) has no interaction with (x3<->z2) (the x group is already fixed at x1). You only have to consider inclusion/exclusion where each rule you add to the combination adds a previously-untouched group to the mix. So you are actually O(2G), where G is the number of groups (and also perhaps a different bound based on the size of the groups). Much more manageable!
好吧,我无法得到大约 2 ^ N,但我可以减少样本集。 为此,我们将计算“组合约束”。 组合约束是一种约束,其中如果选择了左侧的所有选项,则无法选择右侧的任何选项,但不能应用基于左侧选项的其他约束。
我们需要从一组约束计算一组所有可能的组合约束。 虽然没有必要,但如果右手组小于左手组,我们将通过交换左右手来“修复”现有约束。 这可能会减少一些组合约束,尽管可以使用更好的启发式进行交换。
我们还需要计算可以为每个组任意选择的选项的“最小集”。 该最小集是通过从可用选项列表中删除出现在组合约束左侧的任何选项来计算的。
接下来是一个算法,但我并没有证明它可以正确计算 CC。 我将证明,如果确实如此,那么它们可以用来计算可能组合的数量。
编写约束:
依次,对于每个约束:
x1 <-> y1,
x1 <-> y2 ...
x1 <-> yN
进入Set(x1) <-> 设置(y1 ... yN)
计算最小值通过获取所有选项并删除出现在固定约束左侧的选项来设置。
现在您可以使用下面的公式计算组合数。 我们将 CC 称为组合约束。 那么组合数为:
其中:
请注意,该表达式纯粹是相加的。 这意味着,要使该表达式产生预期结果,必须满足以下两个条件:
对于第一个证明,请注意不存在同一左手的两个不同的 CC。 如果两个 CC 具有相同的左手但不同的右手,则意味着存在必须应用于其中一个 CC 的附加约束,或者应用于另一个 CC 的无效约束。
由于不存在具有相同左手的两个CC,并且根据定义,最小集合不包含任何CC的左手,因此可以通过至少一个选项来区分任何两个CC,该选项为一个选择而另一个不选择。 因此,任何两个 CC 都不会产生相同的组合。
对于第二个证明,请注意,根据定义,CC 集包含左侧选项的所有有效组合。
假设有一个组合既没有出现在最小集合中,也没有出现在CC集合中。 如果该组合不包含任何左侧选项,则根据定义,它必须是最小集的组合。 因此,它必须包含左手的选项。
由于这组 CC 包含左手的所有有效组合,因此有一个 CC 具有相同的左手选项。 因此,该组合必须具有一个不包含在该 CC 的任何组合中的选项。 但该 CC 中唯一未包含的选项是出现在其他 CC 左侧的选项,以及必须通过约束从其中排除的选项。 既然情况都不是这样,那么这种组合就不可能存在。
对于第三个证明,我们首先考虑最小集。 最小集不包含任何组左侧的任何选项。 由于所有约束都在一个左手选项和一个右手选项之间,因此没有任何约束适用于最小集合。
现在让我们考虑一下 CC。 根据定义,CC 具有左手选项的有效组合。 任何与左手不兼容的选项都必须出现在右手中,并且右手中的任何选项都必须从最小集中删除。 由于最小集合上没有选项本身之间不兼容,因此 CC 上的任何组合都不会存在不满足的约束。
证明就结束了。
让我们通过评论中的示例来看看这是如何应用的:
让我们简要思考一下不在列表中的组合组:
现在,让我们看看每组中可能有哪些选项:
现在,让我们添加一些内容:
我将在这里添加一个进一步的想法。 尽管 5 条规则只有 6 个 CCC,远少于预期的 32 个项,但这些 CCC 是用 2^N 最差时间计算的,因为对于每条规则,您必须将其与迄今为止创建的所有 CCC 进行比较和组合。 您可以将它们视为二进制数,如果规则被组合,则设置一个位,如果不组合则不设置。
然而,不兼容的组合会立即被丢弃,因此对于组合的每个新规则,不会在已经被视为无效的组合上浪费时间。 此外,通过预先对规则进行排序,同一组中的连续规则可能会被丢弃,甚至无需测试与正确数据结构的不兼容性。
正如这个特定示例所示,平均时间可能比 2^N 好得多。
替代算法和考虑因素
有一些关于 2-SAT 和 3-SAT 的讨论。 在我看来,这是一个 2-SAT 问题,因为每个都约束一个 <-> 。 b 实际上是一个子句“!a || !b”。 因此,所有约束一起可以写为“(!x1 || !y2) && (!x1 || !z4) && (!y2 && !z3)”等。这意味着您可以“解决”它在某种意义上您可以找出每个选项是否有一个布尔值赋值可以将其变为真。 Aspall、Plass 和 Tarjan 对此有一个线性算法,并附有幻灯片 这里。
但是知道约束是否可以解决并不是我们所要求的。 问题是在保持 2-SAT 问题成立的同时,可以设置所有选项的数量。
现在,有一些有效的算法可以计算满足 2-SAT 问题的方法数量。 例如,本文提出了一种在 1.2561n 中运行的算法。 但即使这样也对我们没有帮助,因为我们需要知道什么解决方案才能计算出满足该解决方案的组合数量。
根据维基百科,本文有一个算法可以有效地枚举所有解决方案,这就是我们想。 但如果计数已经是指数级的,那么枚举也将是指数级的。 也许比 2n 更好,但仍然是指数级的。
如果我们枚举 2-SAT 问题的所有解决方案,则每组的组合数量由 1 乘以自由选项(未选择任何选项的每组中未出现在任何约束中的选项)的数量给出通过解决方案。
例如,采用之前的一组组和约束。 2-SAT 问题(包括互斥)是:
(!x1 || !y2) && (!x3 || !y2) && (!y1 || !z1) && (!y2 || !z2) && (!y2 || !z3) &&
(!x1 || !x3) && (!y1 || !y2) && (!z1 || !z2) && (!z1 || !z3) && (!z2 || !z3)
第一行是五个规则。 第二行是约束规则中出现的同一组中所有选项的互斥。
这个 2-SAT 问题的解是:
在前两个解中,没有没有选择选项的组,因此组合数为 1。第三个解没有为组 G3 选择选项,所以我们将 1 乘以0. 以“false false”开头的行中,没有为组G1选择任何选项,并且有一个自由选项:x2。 因此,我们将它们乘以 1,如果没有为 G2 或 G3 选择选项,则乘以 0(在这种情况下,自由选项的数量为 0)。
现在的问题是,我如何强制在每组中选择一个选项,同时仍然声称自己是 2-SAT。 如前所述,该问题有两个隐含的约束:对于每一组,必须选择一个且只能选择一个选项。 这两个约束可以写为:
x1 || x<子>2 || x3(对于 x 组,带有选项 x1 .. x3)
(!x1 || !x2) && (!x1 || !x3) && (!x2 || !x3)
对于任何重要情况,后者约束为 2-SAT,前者约束为 3-SAT。 碰巧,我不强制执行第一个约束,但计数会变为 0。计数算法应该如下所示:
因此,对于其中至少有一个无约束选项的每一组,选择是隐式且匿名的。 对于所有选项都是某个约束的一部分的每个组,如果没有选择任何选项,则该组的计数将变为 0,因此该解决方案的组合数也将变为 0。
这感觉像是从有效的 > 中欺骗了问题。 2-SAT 约束。 毕竟,如果这是可能的,那么只需枚举 2-SAT 部分的解决方案,然后丢弃所有不满足 3-SAT 部分的解决方案即可解决 3-SAT 问题。 唉,我可以识别出一个根本区别:
鉴于对子句的这种相当严格的约束,我们可以通过枚举 2-SAT 显式约束的解决方案来解决这个问题。
如果有人想进一步追求这一点,那就继续吧。 我对我建议的 2n 解决方案的改进感到满意。
Ok, I can't get around 2 ^ N, but I can reduce the sample set. To do this, we'll compute "Composed Constraints". A Composed Constraint is a constraint where, if the all options on the left side is selected, then none of the options on the right side can be selected, but no other constraint based on the left side options may apply.
We need to compute a set of all possible Composed Constraints from a set of constraints. While not necessary, we'll "fix" the existing constraints by swapping left and right hands if the group of the right hand is smaller than the group of the left. This may reduce some composed constraints, though better heuristics are possible for swapping.
We also need to compute a "minimum set" of options that can be arbitrarily chosen for each group. This minimum set is computed by removing from the list of available options any options appearing on the left hand of a composed constraint.
An algorithm follows, but I'm not proving it computes CCs correctly. I will prove that, if it does, then they can be used to compute the number of possible combinations.
Compose the constraints:
Sequentially, for each constraint:
x1 <-> y1
,x1 <-> y2
...x1 <-> yN
intoSet(x1) <-> Set(y1 ... yN)
Compute the Minimum Set, by taking all options and removing the ones appearing in the left hand of the fixed constraints.
Now you can compute the number of combinations with the formula below. Let's call CC a composed constraint. Then the number of combinations is:
Where:
Note that the expression is purely additive. This means that, for that expression to yield the expected result, the following two conditions must be true:
For the first proof, note that there are no two distinct CCs with the same left hand. If two CCs had the same left hand but different right hands, that would mean there was an addition constraint that must apply to one of the CCs, or an invalid constraint being applied to the other.
Since there are no two CCs with the same left hand, and the minimum set does not contain the left hand of any CC by definition, then any two CC can be distinguished by at least one option which is selected for one but not for the other. Therefore, no two CCs may yield the same combination.
For the second proof, note that the set of CCs contains, by definition, all valid combinations for options on the left hand.
Assume that there is one combination that does not appear in either the minimum set nor the set of CCs. If this combination does not contain any left-hand option, then it must be a combination from the minimum set, by definition. Therefore, it must contain options from the left hand.
Since the set of CCs contains all valid combinations for the left hand, then there is one CC with the same left-hand options. This combination must have, therefore, an option that is not included in any combination for that CC. But the only options not included in that CC are the ones appearing in the left hand for other CCs, and the ones that must be excluded from it by constraints. Since neither may be the case, then this combination cannot exist.
For the third proof, let's first consider the minimum set. The minimum set does not contain any option in the left hand of any group. Since all constraints are between one left and one right hand option, no constraint applies to the minimum set.
Now let's consider the CCs. A CC has a valid combination of left hand options by definition. Any option that is incompatible with that left hand must appear in the right hand, and any option from that right hand must be removed from the minimum set. Since no options on the minimum set where incompatible between themselves to begin with, there can be no unsatisfied constraint in any combination on a CC.
And that ends the proof.
Let's see how this applies with an example from the comments:
Let's ponder briefly on composed groups not in the list:
Now, let's see what options are possible in each set:
Now, let's add things up:
I'll add one further thought here. Even though there are only 6 CCCs for 5 rules, much less than the 32 terms expected otherwise, these CCCs are computed with 2^N worst time, since, for each rule, you must compare and combine it to all CCCs created so far. You may think of them as binary numbers, where a bit is set if the rule is being combined, and not set if not.
However, incompatible combinations are discard right away, so that for each new rule being combined, no time is lost on combinations already deemed invalid. Also, by sorting the rules before-hand, successive rules in the same group may be discarded without even testing for incompatibility, with the right data structures.
As this particular example shows, the average time can be much better than 2^N.
Alternative Algorithms and Considerations
There's some talk of 2-SAT and 3-SAT going around. It seems, to me, that this is a 2-SAT problem, in the sense that every constrain a <-> b is actually a clause "!a || !b". So all constrains together can just be written as "(!x1 || !y2) && (!x1 || !z4) && (!y2 && !z3)", etc. That means you can "solve" it in the sense that you can find out if there is a boolean assignment to each option that will turn this true. There is a linear algorithm for this by Aspall, Plass and Tarjan, with a slide presentation here.
But knowing whether the constrains can be solved or not is not what was asked. What was asked is the number of ways all options can be set while keeping the 2-SAT problem true.
Now, there are efficient algorithms for counting the number of of ways to satisfy a 2-SAT problem. For instance, this paper presents an algorithm that runs in 1.2561n. But even this won't help us, as we need to know what the solutions are to be able to compute the number of combinations that satisfy that solution.
According to Wikipedia, this paper has an algorithm which efficiently enumerate all solutions, which is what we want. But if counting is already exponential, so will be enumerating. Better than 2n, perhaps, but still exponential.
If we do enumerate all solutions to the 2-SAT problem, the number of combinations for each group is given by 1 multiplied by the number of free options, options that do not appear in any constraint, of each group for which no option was selected by the solution.
For example, taking the previous set of groups and constraints. The 2-SAT problem, including mutual exclusion, is:
(!x1 || !y2) && (!x3 || !y2) && (!y1 || !z1) && (!y2 || !z2) && (!y2 || !z3) &&
(!x1 || !x3) && (!y1 || !y2) && (!z1 || !z2) && (!z1 || !z3) && (!z2 || !z3)
The first line are the five rules. The second line is the mutual exclusion of all options in the same group that appear in the constraint rules.
The solutions to this 2-SAT problems are:
In the first two solutions, there are no groups without a selected option, so the number of combinations is 1. The third solution has no option selected for the group G3, so we multiply 1 by 0. In the lines that start with "false false", there are no option selected for group G1, and one free option: x2. So we multiply 1 by 1 for them, and by 0 if there are no option selected for G2 or G3 (in which case the number of free options is 0).
Now, there is the question of how do I enforce one option being chosen in each group and still claim to be 2-SAT. The problem, as stated, has two implicit constraints: for each group, there must be one, and only one, option selected. These two constrains can be written as:
x1 || x2 || x3 (for group x with options x1 .. x3)
(!x1 || !x2) && (!x1 || !x3) && (!x2 || !x3)
The later constrain being 2-SAT, the former being 3-SAT for any non-trivial case. As it happens, I do not enforce the first constraint, but the count then becomes 0. The counting algorithm should go like this:
So, for every group in which there is at least one constraint-less option, the selection is implicit and anonymous. For every group in which all options are part of some constraint, if no option was selected then the count for that group becomes 0, and, therefore, the number of combinations for that solution becomes 0 as well.
This feels like cheating the problem out of a valid > 2-SAT constraint. After all, if this was possible, then the 3-SAT problem could be solved just by enumerating the solutions to the 2-SAT part of it, and then discarding every one which does not satisfy the 3-SAT part of it. Alas, there is one fundamental difference I can identify:
Given this rather restrictive constraint on the clauses, we can solve this problem by enumerating solutions to the 2-SAT explicit constraints.
If anyone wants to pursue this further, go ahead. I'm satisfied with the improvement on the 2n solution I suggested.
编辑
该算法不正确。 我在另一篇文章中提出了一个替代答案,在最坏的情况下仍然是 2^N,但否则可能会给出更好的结果。
这在所选示例中有效,因为 y2 是 x1 的排除集的一部分,并且前两个约束基于 x1。 尽管如此,我现在看到了需要做什么。 它仍然接近 2^N,但有一些优化可以带来显着的收益。
为了修复这个算法,组合规则必须采用 set(ox) <-> 的形式。 设置(哦)。 为了组合它们,对于您组合的左侧 oX 的每个约束,您还可以使用您已经组合的每个规则对其进行其他组合,如果 oX 不是组合规则右侧的一部分,也不是组与组的左侧相同。
对于完全独立的约束,这是 2^N。 否则,您可以通过以下方式减少 N:
我认为修复此算法不值得。 它占用的内存相当大,而且它的顺序与我的替代答案非常相同,而我的替代答案要轻得多。
结束编辑
让我们扭转局面。 对于算法来说怎么样:
o1 <-> o2,
组(o1)< group(o2)
oX <-> o?
转换为单个规则oX <-> Set(o?)
让我们看看它的工作原理:
现在,也许这个算法是不正确的。 现在,我无法清晰地思考来证明它是正确的还是其他的——我已经太接近这个问题太久了。 但让我们检查一下这个例子:
如果我的算法是正确的,它似乎是多项式的。 再说一遍,现在我的思考还不够清晰,我需要考虑集合中操作的大O。
这是它的 Scala 实现:
EDIT
This algorithm is incorrect. I have presented an alternate answer in another post which is still 2^N in the worse case, but might give better results otherwise.
This one works in the example choosen because y2 is part of the exclusion set of x1, and the two first constrains are based on x1. Still, I now see what needs to be done. It's still close to 2^N, but there are optimizations which can lead to significant gains.
To fix this algorithm, the composed rules must be of the form set(ox) <-> set(oy). To compose them, for each constrain with left-hand oX you compose, you also make other compositions of it with each rule you already composed, if oX is not part of the right-hand side of the composed rule, nor it's group is the same as the left-hand side of the group.
For completely independent constrains, this is 2^N. Otherwise, you are reducing N by doing:
I don't think fixing this algorithm is worth it. It is rather memory-heavy, and it would have the very same order as my alternate answer, which is much lighter.
End Edit
Let's turn this around. How about this for an algorithm:
o1 <-> o2
,group(o1) < group(o2)
oX <-> o?
into a single ruleoX <-> Set(o?)
Let's see this at work:
Now, maybe this algorithm is incorrect. Right now, I can't think clearly enough to prove it correct or otherwise -- I have been too close to the problem for too long. But let's check against the example:
If my algorithm is correct, it seems to be polynomial. Again, right now I can't think clearly enough, and I'd need to consider the big-O of the manipulation in the sets.
Here is an Scala implementation of it:
这可能不是一个直接有用的答案,所以请随意忽略它......但是; 我自己目前没有使用类似的系统; 坦率地说,除了简单的例子,我不确定尝试计算有效组合的数量是否有用。 举个例子,我目前正在研究的模型有(例如)18000 个候选项目,分布在 80 多个选择中,有些是单选,有些是多选。 除了最小的模型之外,知道这个数字没有任何好处,因为您永远不会将其写为完整的真值表; 您几乎被迫按需运行规则(即删除不再有效的任何内容,根据需要自动选择任何内容,并检查是否没有违反规则)。 这不一定是问题; 我当前的代码在约 450 毫秒内处理此模型(作为 Web 服务),其中大部分实际上是处理输入/输出的 xml 所花费的时间。 如果输入/输出不是 xml,我认为大约是 150 毫秒,这很酷。
我很想说服我的雇主开源该引擎; 不过,那是另一天的战斗。
This might not be a directly useful answer, so feel free to ignore it... however; I'm currently working no a similar system myself; and frankly other than for trivial examples I'm not sure it is useful to try to calculate the number of valid combinations. As an example, the models I'm working on at the moment have (for example) 18000 candidate items spread over 80+ selections, some single-select / some multi-select. Beyond the smallest models, there is no benefit in knowing the number, as you would simply never write it as a full truth table; you are pretty-much forced to run the rules (i.e. remove anything that is no longer valid, auto-select anything as appropriate, and check that no rules are broken) on-demand. Which isn't necessarily a problem; my current code processes this model (as a web-service) in ~450ms, and most of that is actually the time spent processing xml for input/output. If the input/output wasn't xml, I think it would be ~150ms, which is cool.
I'd love to convince my employer to open-source the engine; that is a battle for another day, though.
难道不是 x^n,其中 n 是选项数,x 是每个选项的选项数吗?
Won't it just be x^n, where n is the number of options and x is the number of choices per option?