Z3可以用来推理子串吗？

发布于 2024-12-01 00:43:36 字数 1697 浏览 5 评论 0原文

我正在尝试使用 Z3 来推理子字符串，并且遇到了一些不直观的行为。当询问“xy”是否出现在“xy”中时，Z3 返回“sat”，但当询问“x”是否在“x”中或“x”是否在“xy”中时，它返回“未知”。

我注释了以下代码来说明此行为：

(set-logic AUFLIA)
(declare-sort Char 0)

;characters to build strings are _x_ and _y_
(declare-fun _x_ () Char)
(declare-fun _y_ () Char)
(assert (distinct _x_ _y_))

;string literals
(declare-fun findMeX () (Array Int Char))  
(declare-fun findMeXY () (Array Int Char))  
(declare-fun x () (Array Int Char))
(declare-fun xy () (Array Int Char))
(declare-fun length ( (Array Int Char) ) Int )

;set findMeX = 'x'
(assert (= (select findMeX 0) _x_))
(assert (= (length findMeX) 1))

;set findMeXY = 'xy'
(assert (= (select findMeXY 0) _x_))
(assert (= (select findMeXY 1) _y_))
(assert (= (length findMeXY) 2))

;set x = 'x'
(assert (= (select x 0) _x_))
(assert (= (length x) 1))

;set xy = 'xy'
(assert (= (select xy 0) _x_))
(assert (= (select xy 1) _y_))
(assert (= (length xy) 2))

现在问题已解决，我们尝试查找子字符串：

;search for findMeX='x' in x='x' 

(push 1)
(assert 
  (exists 
    ((offset Int)) 
    (and 
      (<= offset (- (length x) (length findMeX))) 
      (>= offset 0) 
      (forall 
        ((index Int)) 
        (=> 
          (and 
            (< index (length findMeX)) 
            (>= index 0)) 
          (= 
            (select x (+ index offset)) 
            (select findMeX index)))))))

(check-sat) ;'sat' expected, 'unknown' returned
(pop 1)

如果我们改为在 xy 中搜索 findMeXY，正如预期的那样，求解器返回“sat”。看起来既然求解器可以处理“xy”的查询，它应该能够处理“x”的查询。此外，如果在 'xy='xy' 中搜索 findMeX='x'，则会返回“unknown”。

任何人都可以建议一个解释，或者可能是在 SMT 求解器中表达此问题的替代模型吗？

原文

I am trying to use Z3 to reason about substrings, and have come across some non-intuitive behavior. Z3 returns 'sat' when asked to determine if 'xy' appears within 'xy', but it returns 'unknown' when asked if 'x' is in 'x', or 'x' is in 'xy'.

I've commented the following code to illustrate this behavior:

(set-logic AUFLIA)
(declare-sort Char 0)

;characters to build strings are _x_ and _y_
(declare-fun _x_ () Char)
(declare-fun _y_ () Char)
(assert (distinct _x_ _y_))

;string literals
(declare-fun findMeX () (Array Int Char))  
(declare-fun findMeXY () (Array Int Char))  
(declare-fun x () (Array Int Char))
(declare-fun xy () (Array Int Char))
(declare-fun length ( (Array Int Char) ) Int )

;set findMeX = 'x'
(assert (= (select findMeX 0) _x_))
(assert (= (length findMeX) 1))

;set findMeXY = 'xy'
(assert (= (select findMeXY 0) _x_))
(assert (= (select findMeXY 1) _y_))
(assert (= (length findMeXY) 2))

;set x = 'x'
(assert (= (select x 0) _x_))
(assert (= (length x) 1))

;set xy = 'xy'
(assert (= (select xy 0) _x_))
(assert (= (select xy 1) _y_))
(assert (= (length xy) 2))

Now that the problem is set up, we try to find the substrings:

;search for findMeX='x' in x='x' 

(push 1)
(assert 
  (exists 
    ((offset Int)) 
    (and 
      (<= offset (- (length x) (length findMeX))) 
      (>= offset 0) 
      (forall 
        ((index Int)) 
        (=> 
          (and 
            (< index (length findMeX)) 
            (>= index 0)) 
          (= 
            (select x (+ index offset)) 
            (select findMeX index)))))))

(check-sat) ;'sat' expected, 'unknown' returned
(pop 1)

If we instead search for findMeXY in xy, the solver returns 'sat', as expected. It would seem that since the solver can handle this query for 'xy', it should be able to handle it for 'x'. Further, if searching for findMeX='x' in 'xy='xy', it returns 'unknown'.

Can anyone suggest an explanation, or perhaps an alternate model for expressing this problem within a SMT solver?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一直在等你来 2024-12-08 00:43:36

观察到的行为的简短答案是：Z3 返回“未知”，因为您的断言包含量词。

Z3 包含许多处理量词的过程和启发式方法。
Z3 使用一种称为基于模型的量词实例化 (MBQI) 的技术来构建模型来满足像您这样的查询。
第一步是该过程包括基于满足无量词断言的解释创建候选解释。
不幸的是，在当前的Z3中，这一步与阵列理论的交互并不顺利。
基本问题是该模块无法修改由数组理论构建的解释。

一个合理的问题是：为什么当我们删除推送/弹出命令时它会起作用？
它之所以有效，是因为当不使用增量求解命令（例如push和pop命令）时，Z3使用更积极的简化（预处理）步骤。

我看到您的问题有两种可能的解决方法。

您可以避免使用量词，并继续使用数组理论。这在您的示例中是可行的，因为您知道所有“字符串”的长度。因此，您可以扩展量词。
尽管这种方法可能看起来很尴尬，但它在实践中以及许多验证和测试工具中都有使用。
你可以避免数组理论。您将 string 声明为未解释的排序，就像您对 Char 所做的那样。然后，声明一个函数 char-of ，该函数应该返回字符串的第 i 个字符。
您可以将此操作公理化。例如，如果两个字符串具有相同的长度并包含相同的字符，您可能会说它们相等：


(assert (forall ((s1 String) (s2 String))
                (=> (and 
                     (= (length s1) (length s2))
                     (forall ((i Int))
                             (=> (and (<= 0 i) (< i (length s1)))
                                 (= (char-of s1 i) (char-of s2 i)))))
                    (= s1 s2))))

以下链接包含使用第二种方法编码的脚本：
http://rise4fun.com/Z3/yD3

第二种方法更具吸引力，并且可以让您证明有关字符串的更复杂的属性。
然而，很容易写出令人满意的量化公式，而Z3将无法建立模型。
Z3 指南描述了 MBQI 模块的主要功能和限制。
它包含可能由 Z3 处理的可判定片段。
顺便说一句，请注意，如果你有量词，放弃数组理论不会是一个大问题。该指南展示了如何使用量词和函数对数组进行编码。

您可以在以下论文中找到有关 MBQI 工作原理的更多信息：

The short answer for the observed behavior is: Z3 returns ‘unknown’ because your assertions contain quantifiers.

Z3 contains many procedures and heuristics for handling quantifiers.
Z3 uses a technique called Model-Based Quantifier Instantiation (MBQI) for building models for satisfying queries like yours.
The first step is this procedure consists of creating a candidate interpretation based on an interpretation that satisfies the quantifier free assertions.
Unfortunately, in the current Z3, this step does not interact smoothly with the array theory.
The basic problem is that the interpretation built by the array theory cannot be modified by this module.

A fair question is: why does it work when we remove the push/pop commands?
It works because Z3 uses more aggressive simplification (preprocessing) steps when incremental solving commands (such as push and pop commands) are not used.

I see two possible workarounds for your problem.

You can avoid quantifiers, and keep using array theory. This is feasible in your example, since you know the length of all “strings”. Thus, you can expand the quantifier.
Although, this approach may seem awkward, it is used in practice and in many verification and testing tools.
You can avoid array theory. You declare string as an uninterpreted sort, like you did for Char. Then, you declare a function char-of that is supposed to return the i-th character of a string.
You can axiomatize this operation. For example, you may say that two strings are equal if they have the same length and contain the same characters:


(assert (forall ((s1 String) (s2 String))
                (=> (and 
                     (= (length s1) (length s2))
                     (forall ((i Int))
                             (=> (and (<= 0 i) (< i (length s1)))
                                 (= (char-of s1 i) (char-of s2 i)))))
                    (= s1 s2))))

The following link contains your script encoded using the second approach:
http://rise4fun.com/Z3/yD3

The second approach is more attractive, and will allow you to prove more complicated properties about strings.
However, it is very easy to write satisfiable quantified formulas that Z3 will not be able to build a model.
The Z3 Guide describes the main capabilities and limitations of the MBQI module.
It contains may decidable fragments that can be handled by Z3.
BTW, note that dropping array theory will not be a big problem if you have quantifiers. The guide shows how to encode arrays using quantifiers and functions.

You can find more information about how MBQI works in the following papers:

回复收藏 0 原文

~没有更多了~