定理掠夺者(Z3,Vampire,带有TPTP语法)中的归纳证明

发布于 2025-01-22 13:24:19 字数 1648 浏览 6 评论 0 原文

我正在使用TPTP语法测试某些定理抛弃(例如Z3,Alt-Ergo,Vampire等)的感应能力。令我惊讶的是,他们都没有能够证明以下简单的猜想:

tff(t1, type, (fun: $int > $int )).

tff(ax1, axiom, ( 
    ! [A: $int] : (
        $less(A, 1) => (fun(A) = 123)
    )
)).

tff(ax2, axiom, ( 
    ! [A: $int] : (
        $greatereq(A, 1) => (fun(A) = fun($difference(A, 1))) 
    )
)).

tff(conj1, conjecture, ! [A: $int] : ($greatereq(A, 1) => (fun(A) = 123))).

% END OF SYSTEM OUTPUT
% RESULT: SOT_EWCr1V - Z3---4.8.9.0 says Timeout - CPU = 60.09 WC = 35.47 
% OUTPUT: SOT_EWCr1V - Z3---4.8.9.0 says None - CPU = 0 WC = 0 

可以通过归纳来轻松证明这种猜想,但是对于我已经对此进行了测试的绝大多数定理掠夺似乎并非如此。显然,如果我将域仅限制为一个元素而不是整个整数集,则ATP成功,因为它只需要对有限的数字进行检查:

tff(t1, type, (fun: $int > $int )).

tff(ax1, axiom, ( 
    ! [A: $int] : (
        $less(A, 1) => (fun(A) = 123)
    )
)).

tff(ax2, axiom, ( 
    ! [A: $int] : (
        $greatereq(A, 1) => (fun(A) = fun($difference(A, 1))) 
    )
)).

tff(conj1, conjecture, ! [A: $int] : ((A = 5) => (fun(A) = 123))).

% END OF SYSTEM OUTPUT
% RESULT: SOT_j9liHr - Z3---4.8.9.0 says Theorem - CPU = 0.00 WC = 0.08 
% OUTPUT: SOT_j9liHr - Z3---4.8.9.0 says Proof - CPU = 0.00 WC = 0.09 

这是否是自动定理抛弃的一般限制?是否有任何工具在诱导方面表现良好?

ps:您可以使用以下在线工具进行尝试: https:// tptp.org/cgi-bin/systemontptp

ps2:可以在这里找到TPTP语法手册: https://www.tptp.org/tptp/tptp/tr/tptptr.shtml

I am testing the inductive capabilities of some theorem provers (such as Z3, Alt-Ergo, Vampire etc.) using the TPTP syntax. To my surprise, none of them managed to demonstrate the following simple conjecture:

tff(t1, type, (fun: $int > $int )).

tff(ax1, axiom, ( 
    ! [A: $int] : (
        $less(A, 1) => (fun(A) = 123)
    )
)).

tff(ax2, axiom, ( 
    ! [A: $int] : (
        $greatereq(A, 1) => (fun(A) = fun($difference(A, 1))) 
    )
)).

tff(conj1, conjecture, ! [A: $int] : ($greatereq(A, 1) => (fun(A) = 123))).

% END OF SYSTEM OUTPUT
% RESULT: SOT_EWCr1V - Z3---4.8.9.0 says Timeout - CPU = 60.09 WC = 35.47 
% OUTPUT: SOT_EWCr1V - Z3---4.8.9.0 says None - CPU = 0 WC = 0 

This conjecture can be easily proven by induction, however it does not seem to be the case for the vast majority of theorem provers I have tested this on. Obviously, if I restrict the domain to only one element instead of the whole set of integers, the ATP succeeds because it only needs to check against a limited set of numbers:

tff(t1, type, (fun: $int > $int )).

tff(ax1, axiom, ( 
    ! [A: $int] : (
        $less(A, 1) => (fun(A) = 123)
    )
)).

tff(ax2, axiom, ( 
    ! [A: $int] : (
        $greatereq(A, 1) => (fun(A) = fun($difference(A, 1))) 
    )
)).

tff(conj1, conjecture, ! [A: $int] : ((A = 5) => (fun(A) = 123))).

% END OF SYSTEM OUTPUT
% RESULT: SOT_j9liHr - Z3---4.8.9.0 says Theorem - CPU = 0.00 WC = 0.08 
% OUTPUT: SOT_j9liHr - Z3---4.8.9.0 says Proof - CPU = 0.00 WC = 0.09 

Is this a general limitation of automatic theorem provers? Is there any tool that performs well with induction?

PS: You can try it out using the following online tool: https://tptp.org/cgi-bin/SystemOnTPTP

PS2: The TPTP syntax manual can be found here: https://www.tptp.org/TPTP/TR/TPTPTR.shtml

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

凌乱心跳 2025-01-29 13:24:19

正如上面的评论所指出的那样,吸血鬼 dos 支持诱导。但是,就像其他定理掠夺者一样,要使吸血鬼做您想做的事情有时有些棘手。在这种情况下,要使它使用归纳,您必须使用

--mode portfolio --schedule induction

这些选项设置的选项运行,吸血鬼几乎没有时间(我的机器上的0.04秒)找到上述

证明运行抛弃时的选项,因此,如果您想尝试以上内容,则必须从或从源构建。

As pointed out in a comment above, Vampire does support induction. However, as is common with other theorem provers, to get Vampire to do what you want is sometimes a little tricky. In this case, to get it to use induction, you have to run with options

--mode portfolio --schedule induction

With those options set, Vampire finds a proof of the above in next to no time (0.04 seconds on my machine)

The TPTP website does not let you set specific options when running provers, so if you want to try out the above you will have to grab a Vampire release from here or build from source.

素食主义者 2025-01-29 13:24:19

这是预期的。 SMT求解器不会在开箱即用。您可以通过证明基本案例,然后提出诱导 - 假设来“哄骗”他们进行归纳,并使用量词证明它们。但这通常是傻瓜的差事。 SMT求解器根本不是做归纳证明的正确选择。在此问题上,关于Stackoverflow的一些相关讨论:

和许多其他序列。

话虽如此,新的在Smtlib中定义了fun-rec 构造允许递归定义,其证明自然是通过归纳来完成的。因此,我希望社区将朝着这个方向发展。随着时间的推移添加归纳功能。例如,请参见:

有关如何在SMT求解器中正确执行此操作的论文。据我所知,CVC5已结合了其中一些想法,但是在这一点上期望开箱即用的电源证明是天真的。 (请参阅 https://github.com/cvc5/cvc5/cvc5/cvc5/sissues/1796 )))

因此,长话短说:不,SMT求解器不会归纳。您可以哄骗他们去做,最近有一些工作增加了进一步的功能,但是不太可能的按钮体验不可能。如果您的目标是推理有关递归定义的原因,因此您依靠归纳,最好的选择是使用诸如Isabelle,Coq,Hol,Holight,holight,acll2,Lean等的半自动定理供您使用。有强大的设施需要归纳。此外,它们将SMT求解器作为“战术”结合在一起,因此从这个意义上讲,您可以获得两全其美。 (即,使用手动策略将您的证明剖析到足够简单的子目标,处理归纳等,然后将其余的将其运送到SMT-Solver。)

This is expected. SMT solvers don't do induction out-of-the-box. You can "coax" them to do induction by proving the base case, then posing the induction-hypothesis and have them prove it using quantifiers; but that's usually a fool's errand. SMT solvers are simply not the right choice for doing inductive proofs. Here're some relevant prior discussions on stackoverflow regarding this matter:

And many others.

Having said that, the new define-fun-rec construct in SMTLib allows for recursive definitions, whose proofs are naturally done by induction. So, I expect the community will be going towards that direction; adding inductive capabilities over time. For instance, see:

for a paper on how to do this properly in an SMT solver. As far as I know CVC5 has incorporated some of these ideas, but expecting out-of-the-box inductive proofs would be naive at this point. (See https://github.com/cvc5/cvc5/issues/1796)

So, long story short: No, SMT solvers don't do induction. You can coax them to do it, and there's recent work that add further capabilities, but a push-button experience is unlikely. If your goal is to reason about recursive definitions and hence you rely on induction, your best bet is to use a semi-automated theorem prover such as Isabelle, Coq, HOL, HOL-Light, ACLL2, Lean, etc.. all of which have strong facilities to do induction. Furthermore they incorporate SMT solvers as "tactics" so you get the best of both worlds in that sense. (i.e., use a manual strategy to dissect your proof to simple enough subgoals, handle induction etc, and ship the rest off to an SMT-solver.)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文