针对特定指令集优化合取范式表达式的算法?
我正在使用 Espresso 逻辑最小化器 生成一组布尔方程的最小化形式。然而,我不是为可编程阵列逻辑生成逻辑(这是 Espresso 通常使用的用途),而是希望在标准微处理器上实现这些逻辑。问题是 Espresso 产生连接范式的输出,这对于 PAL 来说是完美的,但对于 x86 或 PPC 来说不是最佳的。
例如,Espresso 完全忽略 XOR - 在下面的 Espresso 输出中,子表达式 (!B0&!B1&B2&!B3) | (!B0&!B1&!B2&B3)
相当于 (!B0&!B1&(B2^B3))
。这种替换确实增加了表达式的门深度/关键路径,但考虑到我正在查看具有足够数量项的表达式,以完全饱和周围任何 CPU 的执行资源,因此牺牲一些门深度以减少指令总数。我还想扩展它以了解如何使用 ANDC 或 NOR 等指令,这些指令在我感兴趣的某些处理器上可用。
我正在查看的 CNF 表达式示例:
O0 = (B0&!B1&!B2&B3) | (!B0&B1&B2&B3) | (!B0&!B1&B2&B3) | (B1&!B3) | (!B0 &!B2&!B3);
O1 = (B0&B1&!B2&B3) | (B0&!B1&B2&!B3) | (B0&B1&B2&!B3) | (!B0&!B1&B2&!B3) | (!B0&!B1&!B2&B3) | (!B0&!B1&B2&B3) | (!B0&!B2&!B3);
O2 = (B0&!B1&!B2&B3) | (B0&!B1&B2&!B3) | (B0&B1&B2&B3) | (!B0&B1&!B3) | (!B0&!B2&B3) | (!B0&!B1&B2&B3);
O3 = (!B0&B1&!B2&!B3) | (B0&B1&B2&B3) | (!B0&B1&B2&B3) | (B0&B1&B2&!B3) | (B0&!B1&!B2) | (!B0&!B1&B2&!B3) | (!B0&!B1&!B2&B3);
因此,要使其成为一个实际问题;按偏好顺序:
您知道浓缩咖啡的选项或扩展可以产生我想要的表达方式吗?
您是否知道任何用于布尔逻辑最小化的工具可以理解(或可以学习)各种门类型,而不仅仅是为 PAL 生成 CNF?
您是否知道从上述 CNF 表达式转换为使用其他类型门的表达式的算法?
如果您不知道它的算法,您是否知道或能想到执行此操作的任何有用的启发式方法?
(而且,以防万一你要建议它 - 测试表明 GCC 和 ICC(或者,我敢打赌,任何其他现有的 C 编译器)都不够智能,无法从 CNF 表达式中为我执行处理器特定的最小化- 这真的非常好,但是检查它们两个的 -O3 -S 的输出表明它们甚至无法捕获可以使用 XOR 的情况)。
I'm using Espresso logic minimizer to produce a minimized form of a set of boolean equations. However rather than generating logic for a programmable array logic (which is what Espresso is normally used for), I am looking to implement these on a standard microprocessor. The trouble is that Espresso produces output in conjunctive normal form, which is perfect for a PAL but non-optimal for an x86 or PPC.
For instance, Espresso ignores XOR entirely - in the below Espresso output, the subexpression (!B0&!B1&B2&!B3) | (!B0&!B1&!B2&B3)
is equivalent to (!B0&!B1&(B2^B3))
. This substitution does increase the gate depth / critical path of the expression, but given that I'm looking at expressions with a sufficient number of terms to completely saturate the execution resources of any CPU around it seems reasonable to trade off some gate depth for reducing the overall # of instructions. I'd also like to extend it to understand how to use instructions like ANDC or NOR which are available on some processors of interest to me.
Example of the CNF expressions I'm looking at:
O0 = (B0&!B1&!B2&B3) | (!B0&B1&B2&B3) | (!B0&!B1&B2&B3) | (B1&!B3) | (!B0 &!B2&!B3);
O1 = (B0&B1&!B2&B3) | (B0&!B1&B2&!B3) | (B0&B1&B2&!B3) | (!B0&!B1&B2&!B3) | (!B0&!B1&!B2&B3) | (!B0&!B1&B2&B3) | (!B0&!B2&!B3);
O2 = (B0&!B1&!B2&B3) | (B0&!B1&B2&!B3) | (B0&B1&B2&B3) | (!B0&B1&!B3) | (!B0&!B2&B3) | (!B0&!B1&B2&B3);
O3 = (!B0&B1&!B2&!B3) | (B0&B1&B2&B3) | (!B0&B1&B2&B3) | (B0&B1&B2&!B3) | (B0&!B1&!B2) | (!B0&!B1&B2&!B3) | (!B0&!B1&!B2&B3);
So, to make this an actual question; in order of preference:
Do you know of an option or extension of Espresso that will produce the kind of expressions I want?
Do you know of any tool for boolean logic minimization that understands (or can be taught) various gate types, rather than only producing CNF for PALs?
Do you know of an algorithm for converting from CNF expressions like the ones above to expressions using additional types of gates?
If you don't know of an algorithm for it, do you know of, or can think of, any useful heuristics in doing this?
(And, just in case you were going to suggest it - testing shows that GCC and ICC (or, I would bet, any other C compiler in existence) aren't smart enough to do the processor specific minimization for me from the CNF expressions - that would be really really nice, but examining the output of -O3 -S for both of them shows they can't even catch the cases where XOR can be used).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
布尔公式最小化最著名的算法是 Quine-McCluskey 算法,它产生最小的 DNF 公式,但计算量很大(必然是这样,因为问题是在 PTIME 之外,参见 布尔公式最小化的复杂性,2007)。有一个有文字的Java实现;基本概念对于 Prolog 至关重要,因此如果您有任何使用 Prolog 的经验,这个想法应该很容易实现。
后记有一篇 IEEE 付费文章,扩展 Quine-McCluskey对于异或逻辑综合,摘要:
在查找此方法之前,我一直在考虑如何扩展该方法:您应该能够通过使用与它们相对应的替代版本的分辨率来综合 XOR 的应用程序。例如,对于 CNF 中的析取子句
F
,它不包含原子A
或B
,来自子句 <代码>F |一个 | ~B 和F | 〜A | B
,那么你可以将它们替换为F |异或(A,B)。
The most well-known algorithm for Boolean formula minimisation is the Quine-McCluskey algorithm, which yields the smallest DNF formula, but is computational expensive (necessarily, since the problem is outside PTIME, cf. The complexity of Boolean formula minimization, 2007). There's a literate Java implementation; the basic concept is crucial to Prolog, so if you have any experience with Prolog, the idea should come easily enough.
Postscript There's an IEEE-paywalled article, Extending Quine-McCluskey for exclusive-or logic synthesis, abstract:
I had been thinking of how to extend the method before I looked this up: you should be able to synthesise applications of XOR by using an alternate version of resolution that corresponds to them. For example, for a disjunctive clause
F
in a CNF, which does not contain either of the atomsA
, orB
, from the clausesF | A | ~B
andF | ~A | B
, then you can replace them byF | XOR(A,B)
.1) 您是否考虑过使用超级优化来为您选择指令序列?
例如,GNU 超级优化器生成极短的指令序列,这些指令序列将计算您提供给它的函数的等效值。在您的情况下,您将提供一个实现感兴趣的布尔方程的函数。
这些工具通常通过从最小的开始枚举可能的计算空间来操作,并确定单个计算是否符合您的计算目标。他们不一定能为非常复杂的指令提供解决方案(更不用说最佳解决方案),但如果适量的指令能够成功,他们通常会找到一个解决方案
(不寻常!)并且非常有效的序列。 XOR/NOR/ANDC 很容易属于 GNU 超级优化器的范围。
2)您可以尝试使用提供了正确的代数等价集的代数简化器。我们使用了 DMS Software Reengineering Toolkit,这是一个程序转换引擎,它接受任意重写规则并理解交换律和结合律,以实现涉及 XOR 和 NOR 等各种运算符的布尔简化器。您需要规则应用程序、爬山算法(用最少数量的操作员爬山)和回溯算法。通过深度优先迭代加深搜索,如果表达式不太复杂,您可以找到最佳解决方案;通过分支定界搜索,您可以快速找到解决方案,然后尝试最小化其大小。您甚至有一个相对较好的色调测量:到目前为止尚未包含在计算中的操作数。
方程简化器的最大问题是它不会考虑特定指令集的寄存器限制或可用机会。
3) 您可以对可用的布尔指令集实施您自己的搜索(迭代深化、分支定界),并包含约束。 (这在某种程度上回到了超级优化者所做的事情)。我这样做是为了计算在 x86 上实现乘以常数的最小指令序列,最多占用 3 个寄存器并利用 3 操作数指令,例如(加载有效地址)LEA X,[Y+K* Z] 位于寄存器 X、Y、Z 上,常数 K=1、2、4、8、ADD X、Y、SUB X、Y、MOV 和 NEG 指令。如果您用任何合理的语言将其编码为递归程序,您可以
几百行代码之一。 (它产生了一些真正松散的序列)。
1) Have you considered using Superoptimization to choose instruction sequences for you?
As an example, the GNU superoptimizer manufactures extremely short instruction sequences that will compute the equivalent of a function you provide to it. In your case, you'd provide a function implementing the boolean equation of interest.
These tools often operate by enumerating the space of possible computations starting with the smallest, and deciding if an individual computation matches your computation goal. They can't necessarily provide solutions (let alone optimal ones) for very complex instructions, but if a modest number of instructions will succeed they often find a
(unusual!) and very efficient sequence. XOR/NOR/ANDC would easily be in scope of the GNU superoptimizer.
2) You might try using an algebraic simplifier provided with the right set of algebraic equivalences. We've used our DMS Software Reengineering Toolkit, a program transformation engine that accepts arbitrary rewrite rules and understands commutative and associative laws, to implement boolean simplifiers that involve various operators such as XOR and NOR. You need the rule applier, and a hill-climbing algorithm (climbing the hill with the least number of operators) and a backtracking algorithm. With a depth-first iterative deepening search you can find an optimal solution if the expression isn't too complex; with a branch and bound search search, you can find a solution quickly and then attempt to minimize its size. You even have a relatively good hueristic measure: operands so far not included in the computation.
The biggest problem with an equational simplifier is that it won't take into account the register constraints or opportunities available with your specific instruction set.
3) You can implement your own search (iterative deepening, branch and bound) over the sets of boolean instructions available to you, and include the constraints. (This is getting somewhat back to what the superoperoptimizers do). I've done this to compute minimal sequences of instructions to implement multiply-by-constant on x86, accounting for up to 3 registers and taking advantage of 3-operand instructions such as (load effective address) LEA X,[Y+K*Z] on registers X, Y, Z with constant K=1,2,4,8, ADD X,Y, SUB X,Y, MOV and NEG instructions. If you code this as a recursive program in any reasonable language, you can
code one in a few hundred lines. (It produces some truly squirrely sequences).