将 CNF 格式转换为 DIMACS 格式

发布于 2025-01-13 01:41:29 字数 381 浏览 0 评论 0原文

我和我的实验室伙伴正在编写代码，为我们的一门课程使用 Python 制作我们自己的 SAT 求解器。到目前为止，我们已经编写了将 SoP 转换为 CNF 的代码。现在我们陷入如何将 CNF 转换为 DIMACS 格式的问题。我们了解 DIMACS 格式在手动完成时的工作原理，但我们仍坚持编写从 CNF 到 DIMACS 的实际代码。到目前为止我们发现的所有输入文件都已经是 DIMACS 格式。

    from sympy.logic.boolalg import to_cnf
    from sympy.abc import A, B, C, D
    f = to_cnf(~(A | B) | D)
    g = to_cnf(~A&B&C | ~D&A)

原文

My lab partner and I are working on writing code to make our own SAT solver using Python for one of our courses. So far we have written this code to convert SoP to CNF. Now we are stuck as to how to convert the CNF to DIMACS format. We understand how the DIMACS format works when completing it by hand but we are stuck on writing the actually code to go from CNF to DIMACS. Everything we have found so far inputs files that are already in the DIMACS format.

    from sympy.logic.boolalg import to_cnf
    from sympy.abc import A, B, C, D
    f = to_cnf(~(A | B) | D)
    g = to_cnf(~A&B&C | ~D&A)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

ゃ懵逼小萝莉 2025-01-20 01:41:29

sympy boolalg 模块可让您为表达式构建抽象语法树 (AST)。在 CNF 形式中，您将拥有一个顶级 And 节点，其中包含一个或多个子节点；每个子节点都是一个带有一个或多个文字的 Or 节点；文字可以是符号的 Not 节点，也可以直接是符号。

从 DIMACS 端来看，顶级 And 是隐式的。您只需列出 Or 节点，如果某个符号位于 Not 中，则可以在该符号变量之前用 - 对其进行标记。您本质上只是为符号指定新名称并以新的文本形式写下来。（事实上，DIMACS 变量名称看起来像整数只是因为它很方便；它们没有整数语义/算术/等。）

要跟踪 DIMACS 变量和 sympy 符号之间的映射，这样的东西很有帮助：

class DimacsMapping:
    def __init__(self):
        self._symbol_to_variable = {}
        self._variable_to_symbol = {}
        self._total_variables = 0

    @property
    def total_variables(self):
        return self._total_variables

    def new_variable(self):
        self._total_variables += 1
        return self._total_variables

    def get_variable_for(self, symbol):
        result = self._symbol_to_variable.get(symbol)
        if result is None:
            result = self.new_variable()
            self._symbol_to_variable[symbol] = result
            self._variable_to_symbol[result] = symbol

        return result

    def get_symbol_for(self, variable):
        return self._variable_to_symbol[variable]

    def __str__(self) -> str:
        return str(self._variable_to_symbol)

您可以随时请求使用 new_variable 创建新的（新鲜的、从未使用过的）变量。 DIMACS 变量从 1 开始，而不是 0。（0 值用于指示非变量，主要用于标记子句结束。）

我们不想每次都分配新变量，但也要记住哪些变量被分配给一个符号。这维护了从符号到变量的映射，反之亦然。您将 sympy 符号传递给 get_variable_for ，然后返回该符号之前使用的变量，或者分配并返回一个新变量，并记录映射。

它跟踪反向映射，因此您可以在给定 get_symbol_for 中的变量的情况下恢复原始符号；这对于将 SAT 作业转回 sympy 作业非常有用。

接下来，我们需要一些东西来存储这个映射以及子句列表。您需要两者来发出有效的DIMACS，因为标题行包含变量计数（映射知道）和子句计数（子句列表知道）。这基本上是一个美化的元组，带有一个 __str__ ，它可以转换为格式良好的 DIMACS 文本：

class DimacsFormula:
    def __init__(self, mapping, clauses):
        self._mapping = mapping
        self._clauses = clauses

    @property
    def mapping(self):
        return self._mapping

    @property
    def clauses(self):
        return self._clauses

    def __str__(self):
        header = f"p cnf {self._mapping.total_variables} {len(self._clauses)}"
        body = "\n".join(
            " ".join([str(literal) for literal in clause] + ["0"])
            for clause in self._clauses
        )

        return "\n".join([header, body])

最后，我们只需遍历 sympy AST 来创建 DIMACS 子句：

from sympy.core.symbol import Symbol
from sympy.logic.boolalg import to_cnf, And, Or, Not

def to_dimacs_formula(sympy_cnf):
    dimacs_mapping = DimacsMapping()
    dimacs_clauses = []

    assert type(sympy_cnf) == And
    for sympy_clause in sympy_cnf.args:
        assert type(sympy_clause) == Or

        dimacs_clause = []
        for sympy_literal in sympy_clause.args:
            if type(sympy_literal) == Not:
                sympy_symbol, polarity = sympy_literal.args[0], -1
            elif type(sympy_literal) == Symbol:
                sympy_symbol, polarity = sympy_literal, 1
            else:
                raise AssertionError("invalid cnf")

            dimacs_variable = dimacs_mapping.get_variable_for(sympy_symbol)
            dimacs_literal = dimacs_variable * polarity
            dimacs_clause.append(dimacs_literal)

        dimacs_clauses.append(dimacs_clause)

    return DimacsFormula(dimacs_mapping, dimacs_clauses)

这只是沿树下降，直到它获取根符号以及它是否被否定（即处于表示负极性的 Not 状态）。一旦符号被映射到它的变量，我们就可以将其保留为正或取反以保持极性并将其附加到 DIMACS 子句中。

对所有 Or 节点执行此操作，我们就有了映射的 DIMACS 公式。

f = to_cnf(~(A | B) | D)
print(f)
print()

f_dimacs = to_dimacs_formula(f)
print(f_dimacs)
print()
print(f_dimacs.mapping)

(D | ~A) & (D | ~B)

p cnf 3 2
1 -2 0
1 -3 0

{1: D, 2: A, 3: B}

顺便说一句，您可能不想使用to_cnf来获取CNF来测试可满足性。一般来说，将布尔公式转换为等效的 CNF 可能会导致大小呈指数增长。

请注意，在上面的 f 示例中，变量 D 在公式中仅出现一次，但在 CNF 中出现了两次。如果它更复杂，比如 (C | D)，那么整个子公式都会被复制：

f = to_cnf(~(A | B) | (C | D))
print(f)

(C | D | ~A) & (C | D | ~B)

如果它更复杂，你可以看到如何最终得到副本的副本的副本。。等等。为了测试可满足性，我们不需要等价公式，而只需要一个可等满足的公式。

这是一个可能不等价的公式，但当且仅当原始公式是等价的时才是可满足的。它可以有新的子句和不同的变量。这种松弛反而给出了线性大小的平移。

为此，我们将分配一个代表该子公式真值的变量并使用它，而不是允许复制子公式。这称为 Tseitin 变换，我在这个答案。

举一个简单的例子，假设我们想使用变量 x 来表示(a ∧ b)。我们可以将其写为 x ≡ (a ∧ b)，这可以通过三个 CNF 子句来完成：(Øx ∨ a) ∧ (Øx ∨ b) ∧ (Øa ∨ Øb ∨ x)。现在，当且仅当 (a ∧ b) 为真时，x 才为真。

该顶级函数启动该过程，以便递归调用共享相同的映射和子句集。最终结果是代表整个公式的真值的单个变量。我们必须强制这是真的（否则 SAT 求解器将简单地选择公式的任何输入变量，遵循含义，并生成任何输出的评估公式）。

def to_dimacs_tseitin(sympy_formula):
    dimacs_mapping = DimacsMapping()
    dimacs_clauses = []

    # Convert the formula, with this final variable representing the outcome
    # of the entire formula. Since we are stating this formula should evaluate
    # to true, this variable is appended as a unit clause stating such.
    formula_literal = _to_dimacs_tseitin_literal(
        sympy_formula, dimacs_mapping, dimacs_clauses
    )
    dimacs_clauses.append([formula_literal])

    return DimacsFormula(dimacs_mapping, dimacs_clauses)

大部分翻译是添加特定于正在执行的操作的子句的代码。递归发生在我们需要一个代表子公式参数的输出的变量时。

def _to_dimacs_tseitin_literal(sympy_formula, dimacs_mapping, dimacs_clauses):
    # Base case, the formula is just a symbol.
    if type(sympy_formula) == Symbol:
        return dimacs_mapping.get_variable_for(sympy_formula)

    # Otherwise, it is some operation on one or more subformulas. First we
    # need to get a literal representing the outcome of each of those.
    args_literals = [
        _to_dimacs_tseitin_literal(arg, dimacs_mapping, dimacs_clauses)
        for arg in sympy_formula.args
    ]

    # As a special case, we won't bother wasting a new variable for `Not`.
    if type(sympy_formula) == Not:
        return -args_literals[0]

    # General case requires a new variable and new clauses.
    result = dimacs_mapping.new_variable()

    if type(sympy_formula) == And:
        for arg_literal in args_literals:
            dimacs_clauses.append([-result, arg_literal])
        dimacs_clauses.append(
            [result] + [-arg_literal for arg_literal in args_literals]
        )
    elif type(sympy_formula) == Or:
        for arg_literal in args_literals:
            dimacs_clauses.append([result, -arg_literal])
        dimacs_clauses.append(
            [-result] + [arg_literal for arg_literal in args_literals]
        )
    else:
        # TODO: Handle all the other sympy operation types.
        raise NotImplementedError()

    return result

现在，布尔公式不需要在 CNF 中即可成为 DIMACS：

f = ~(A | B) | (C | D)
print(f)
print()

f_dimacs = to_dimacs_tseitin(f)
print(f_dimacs)
print()
print(f_dimacs.mapping)

C | D | ~(A | B)

p cnf 6 8
5 -3 0
5 -4 0
-5 3 4 0
6 -1 0
6 -2 0
6 5 0
-6 1 2 -5 0
6 0

{1: C, 2: D, 3: A, 4: B}

即使在单位传播之后，在这个小示例中，所得公式也明显更大。然而，当转换“真实”公式时，线性缩放有很大帮助。此外，现代 SAT 求解器了解公式将以这种方式进行转换，并执行针对其量身定制的处理中操作。

The sympy boolalg module lets you build an abstract syntax tree (AST) for the expression. In CNF form you'll have a top-level And node, with one or more children; each child is an Or node with one or more literals; a literal is either a Not node of a symbol, or just a symbol directly.

From the DIMACS side, the top-level And is implicit. You are just listing the Or nodes, and if a symbol was in a Not you mark that with a - before the symbol's variable. You are essentially merely assigning new names for the symbols and writing it down in a new text form. (The fact that DIMACS variable names look like integers is just because it's convenient; they do not have integer semantics/arithmetic/etc.)

To track mapping between DIMACS variables and sympy symbols, something like this is helpful:

class DimacsMapping:
    def __init__(self):
        self._symbol_to_variable = {}
        self._variable_to_symbol = {}
        self._total_variables = 0

    @property
    def total_variables(self):
        return self._total_variables

    def new_variable(self):
        self._total_variables += 1
        return self._total_variables

    def get_variable_for(self, symbol):
        result = self._symbol_to_variable.get(symbol)
        if result is None:
            result = self.new_variable()
            self._symbol_to_variable[symbol] = result
            self._variable_to_symbol[result] = symbol

        return result

    def get_symbol_for(self, variable):
        return self._variable_to_symbol[variable]

    def __str__(self) -> str:
        return str(self._variable_to_symbol)

You can always ask for a new (fresh, never been used) variable with new_variable. DIMACS variables start from 1, not 0. (The 0 value is used to indicate not-a-variable, primarily for marking the end-of-clause.)

We don't want to just allocate new variables every time, but also remember which variables were assigned to a symbol. This maintains a mapping from symbol to variable and vice versa. You hand a sympy symbol to get_variable_for and either the previously used variable for that symbol is returned, or a new variable is allocated and returned, with the mapping noted.

It tracks the reverse mapping, so you can recover the original symbol given a variable in get_symbol_for; this is useful for turning a SAT assignment back into sympy assignments.

Next, we need something to store this mapping along with the clause list. You need both to emit valid DIMACS, since the header line contains both the variable count (which the mapping knows) and the clause count (which the clause list knows). This is basically a glorified tuple, with a __str__ that does the conversion to well-formed DIMACS text:

class DimacsFormula:
    def __init__(self, mapping, clauses):
        self._mapping = mapping
        self._clauses = clauses

    @property
    def mapping(self):
        return self._mapping

    @property
    def clauses(self):
        return self._clauses

    def __str__(self):
        header = f"p cnf {self._mapping.total_variables} {len(self._clauses)}"
        body = "\n".join(
            " ".join([str(literal) for literal in clause] + ["0"])
            for clause in self._clauses
        )

        return "\n".join([header, body])

Last, we just walk over the sympy AST to create DIMACS clauses:

from sympy.core.symbol import Symbol
from sympy.logic.boolalg import to_cnf, And, Or, Not

def to_dimacs_formula(sympy_cnf):
    dimacs_mapping = DimacsMapping()
    dimacs_clauses = []

    assert type(sympy_cnf) == And
    for sympy_clause in sympy_cnf.args:
        assert type(sympy_clause) == Or

        dimacs_clause = []
        for sympy_literal in sympy_clause.args:
            if type(sympy_literal) == Not:
                sympy_symbol, polarity = sympy_literal.args[0], -1
            elif type(sympy_literal) == Symbol:
                sympy_symbol, polarity = sympy_literal, 1
            else:
                raise AssertionError("invalid cnf")

            dimacs_variable = dimacs_mapping.get_variable_for(sympy_symbol)
            dimacs_literal = dimacs_variable * polarity
            dimacs_clause.append(dimacs_literal)

        dimacs_clauses.append(dimacs_clause)

    return DimacsFormula(dimacs_mapping, dimacs_clauses)

This just descends down the tree, until it gets the root symbol and whether or not it was negated (i.e., was in a Not indicating negative polarity). Once the symbol is mapped to its variable, we can leave it positive or negate it to maintain polarity and append it to the DIMACS clause.

Do this for all Or nodes and we have a mapped DIMACS formula.

f = to_cnf(~(A | B) | D)
print(f)
print()

f_dimacs = to_dimacs_formula(f)
print(f_dimacs)
print()
print(f_dimacs.mapping)

(D | ~A) & (D | ~B)

p cnf 3 2
1 -2 0
1 -3 0

{1: D, 2: A, 3: B}

As an aside, you probably don't want to use to_cnf to get a CNF for purposes of testing satisfiability. In general, converting a boolean formula to an equivalent CNF can result in exponential size increase.

Note in the above example for f, variable D only appeared once in the formula yet appeared twice in the CNF. If it had been more complicated, like (C | D), then that entire subformula gets copied:

f = to_cnf(~(A | B) | (C | D))
print(f)

(C | D | ~A) & (C | D | ~B)

If it was even more complicated, you can see how you end up with copies of copies of copies... and so on. For the purposes of testing satisfiability, we do not need an equivalent formula but merely an equisatisfiable one.

This is a formula that may not be equivalent, but is satisfiable if and only if the original was. It can have new clauses and different variables. This relaxation gives a linear sized translation instead.

To do this, rather than allow subformulas to be copied we will allocate a variable that represents the truth value of that subformula and use that instead. This is called a Tseitin transformation, and I go into more detail in this answer.

As a simple example, let's say we want to use a variable x to represent (a ∧ b). We would write this as x ≡ (a ∧ b), which can be done with three CNF clauses: (¬x ∨ a) ∧ (¬x ∨ b) ∧ (¬a ∨ ¬b ∨ x). Now x is true if and only if (a ∧ b) is.

This top-level function kicks off the process, so that the recursive calls share the same mapping and clause set. The final outcome is a single variable representing the truth value of the entire formula. We must force this to be true (else a SAT solver will simply choose any input variables to the formula, follow the implications, and produce an evaluated formula of any output).

def to_dimacs_tseitin(sympy_formula):
    dimacs_mapping = DimacsMapping()
    dimacs_clauses = []

    # Convert the formula, with this final variable representing the outcome
    # of the entire formula. Since we are stating this formula should evaluate
    # to true, this variable is appended as a unit clause stating such.
    formula_literal = _to_dimacs_tseitin_literal(
        sympy_formula, dimacs_mapping, dimacs_clauses
    )
    dimacs_clauses.append([formula_literal])

    return DimacsFormula(dimacs_mapping, dimacs_clauses)

The bulk of the translation is the code that adds clauses specific to the operation being performed. The recursion happens at the point where we demand a single variable that represents the output of the subformula arguments.

def _to_dimacs_tseitin_literal(sympy_formula, dimacs_mapping, dimacs_clauses):
    # Base case, the formula is just a symbol.
    if type(sympy_formula) == Symbol:
        return dimacs_mapping.get_variable_for(sympy_formula)

    # Otherwise, it is some operation on one or more subformulas. First we
    # need to get a literal representing the outcome of each of those.
    args_literals = [
        _to_dimacs_tseitin_literal(arg, dimacs_mapping, dimacs_clauses)
        for arg in sympy_formula.args
    ]

    # As a special case, we won't bother wasting a new variable for `Not`.
    if type(sympy_formula) == Not:
        return -args_literals[0]

    # General case requires a new variable and new clauses.
    result = dimacs_mapping.new_variable()

    if type(sympy_formula) == And:
        for arg_literal in args_literals:
            dimacs_clauses.append([-result, arg_literal])
        dimacs_clauses.append(
            [result] + [-arg_literal for arg_literal in args_literals]
        )
    elif type(sympy_formula) == Or:
        for arg_literal in args_literals:
            dimacs_clauses.append([result, -arg_literal])
        dimacs_clauses.append(
            [-result] + [arg_literal for arg_literal in args_literals]
        )
    else:
        # TODO: Handle all the other sympy operation types.
        raise NotImplementedError()

    return result

Now boolean formulas do not need to be in CNF to become DIMACS:

f = ~(A | B) | (C | D)
print(f)
print()

f_dimacs = to_dimacs_tseitin(f)
print(f_dimacs)
print()
print(f_dimacs.mapping)

C | D | ~(A | B)

p cnf 6 8
5 -3 0
5 -4 0
-5 3 4 0
6 -1 0
6 -2 0
6 5 0
-6 1 2 -5 0
6 0

{1: C, 2: D, 3: A, 4: B}

Even after unit propagation the resulting formula is clearly larger in this small example. However, the linear scaling helps substantially when 'real' formulas are converted. Furthermore, modern SAT solvers understand that formulas will be translated this way and perform in-processing tailored toward it.

回复收藏 0 原文

~没有更多了~