“查找给定二进制文件中的所有代码相当于停止问题。”真的吗?

发布于 2024-10-22 00:29:29 字数 283 浏览 3 评论 0原文

刚刚阅读有关模拟器的高度投票问题和声明

事实证明,找到所有 给定二进制文件中的代码是等效的 停止问题。

真的很让我印象深刻。

这肯定不是真的吗?这不就是一个很大的依赖图吗?

如果您能进一步了解此声明,我将不胜感激。

Was just reading the highly voted question regarding Emulators and the statement

It's been proven that finding all the
code in a given binary is equivalent
to the Halting problem.

Really stuck out at me.

Surely that can't be true? Isn't it just a large dependency graph?

Would be really grateful for some further insight into this statement.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

一念一轮回 2024-10-29 00:29:29

我不同意拉斯曼的观点。

暂停问题表示不存在可以接受任何程序并决定该程序是否执行halt指令的程序P。我引用一下维基百科吧:

艾伦·图灵在 1936 年证明,解决所有可能的程序输入对的停止问题的通用算法是不存在的。我们说停机问题在图灵机上是不可判定的。

另一方面,我们并不是试图制作这样的程序/算法,而是试图找到这个/这些特定程序中的所有代码。如果我们对程序进行逆向工程,发现它立即调用 exit()(非常乐观的示例情况),我们已经证明它调用 halt,虽然这是不可能的?!

如果我们试图构建一个可以运行任何程序的模拟器,那么我们会失败,那么您可以(轻松)将其减少到停止问题。但通常你正在为像 Game Boy 这样的东西构建一个模拟器,它支持有限数量的游戏卡带(程序),因此这是可能的。

I disagree with larsman.

The halting problem says that no program P exists that can take any program and decide whether that program executes the halt instruction. Let me quote wikipedia:

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. We say that the halting problem is undecidable over Turing machines.

On the other hand we're not trying to make such program/algorithm, but we're trying to find all the code in this/these specific program(s). If we reverse-engineer the program and see that it immediately calls exit() (very optimistic example situation) we have proven that it will call halt, while it was impossible?!

If we we're trying to build an emulator that can run any program we would fail since then you can (easily) reduce that to the Halting problem. But usually you are building an emulator for something like a Game Boy which supports a finite amount of game cartridges (programs) and thus it is possible.

黑凤梨 2024-10-29 00:29:29

我相信这意味着“查找所有执行过的代码”,即查找覆盖范围,可能与动态生成的代码相结合。这确实可以简化为停机问题。

假设您有一个完美的覆盖工具,可以找到程序中可能执行过的每一段代码(因此其余的都是死代码)。给定一个程序 P,该工具还能够决定扩展程序 (P ;halt) 是否执行过 halt 指令,或者halt 部分是否是死代码。因此,它将解决停机问题,我们知道这是不可判定的。

I believe what is meant is "finding all code that is ever executed", i.e. finding coverage, possibly in combination with dynamically generated code. That can indeed be reduced to the halting problem.

Say that you have a perfect coverage tool that will find every piece of code in a program that may ever be executed (so the rest is dead code). Given a program P, this tool would also be able to decide whether the extended program (P ; halt) ever executes the halt instruction, or whether the halt part is dead code. So, it would solve the halting problem, which we know is undecidable.

层林尽染 2024-10-29 00:29:29

有限状态机的停止问题是可以解决的(尽管它可能需要很多生命周期......宇宙),并且您可能模拟的任何机器都是有限状态机。只需运行程序,步骤数受可能状态数的限制;如果超出该数字而不停止,则程序将永远不会停止,因为它必须处于循环中。

实际上,查找所有代码是一个容易得多的问题,除非代码可以使用计算的 goto。您无需运行代码,只需在每个可能的分支点处精确执行一次所有分支即可。

The halting problem for finite state machines is solvable (although it might take many lifetimes.....of the universe), and any machine you might be emulating is a finite state machine. Just run the program, and the number of steps is bounded by the number of possible states; if that number is exceeded without halting then the program will never halt, since it must be in a loop.

Practically speaking, finding all code is a much easier problem unless the code can use computed gotos. Rather than running the code, you simply take all branches exactly once at each possible branch point.

西瑶 2024-10-29 00:29:29

我同意拉斯曼的观点,并且我相信这个论点可以变得精确。首先,我必须同意,在这种情况下,“查找给定二进制文件中的所有代码”意味着拥有一个可计算函数来确定输入二进制文件中的哪些字节对应于所执行的指令。

编辑:如果有人不清楚为什么这里会出现问题,请考虑混淆的机器代码。此类代码的反汇编是一个不小的问题。如果在多字节指令的中间开始反汇编,则会得到错误的反汇编结果。这不仅破坏了相关指令,而且通常还破坏了除此之外的一些指令。等调查一下。 (例如,谷歌“混淆和反汇编”)。

使这一精确化的策略草图:

首先,定义一个理论计算机/机器模型,其中程序以多位二进制指令进行编码,很像“真实”计算机上的机器代码,但变得精确(并且使得它是图灵性的)完全的)。假设二进制对程序和所有输入进行编码。这一切都必须精确,但假设该语言具有(二进制编码的)暂停指令,并且当且仅当程序执行“暂停”指令时,程序才会暂停。

首先,我们假设机器无法更改程序代码,但有一个可以工作的内存。 (在有趣的情况下假设无限)。

那么任何给定的位要么位于程序执行期间运行的指令的开头,要么不位于该指令的开头。 (例如,它可能位于指令中间,或者在数据或“垃圾代码”中——即永远不会运行的代码。请注意,我并没有声称它们是互斥的,例如,跳转指令可以有一个位于特定指令中间的目标,从而创建另一个“在第一次通过时”看起来不像执行指令的指令。)。

将 ins(i, j) 定义为如果二进制 i 具有从位位置 j 开始且在由 i 编码的程序的运行中执行的指令,则返回 1 的函数。否则定义 ins(i,j) 为 0。假设 ins(i,j) 是可计算的。

令halt_candidate(i) 为以下程序:

for j = 1 to length(i)
  if(i(j..j+len('halt')-1) == 'halt')
    if(ins(i,j) == 1)
      return 1
return 0

由于我们不允许上面的自修改代码,因此程序停止的唯一方法是在某个位置j 执行一条“停止”指令。根据设计,halt_candidate(i) 计算暂停函数。

这提供了证明的一个方向的非常粗略的草图。即,如果我们假设我们可以测试程序 i 是否在位置 j 处具有针对所有 j 的指令,那么我们可以编写暂停函数。

对于另一个方向,必须在形式机内编码形式机的模拟器。然后,创建一个程序加上输入 i 和 j 来模拟程序 i,并且当执行位位置 j 处的指令时,它返回 1 并停止。当执行任何其他指令时,它会继续运行,并且如果模拟曾经模拟 i 中的“停止”功能,则模拟器会跳转到无限循环。那么ins(i,j)等价于halt(emulator(i,j)),因此暂停问题意味着查找代码问题。

当然,我们必须假设有一台理论上的计算机,才能将其等同于著名的无法解决的停机问题。否则,对于“真正的”计算机来说,停机问题是可以解决的,但却是棘手的。

对于允许自修改代码的系统,参数更复杂,但我预计不会有太大不同。

编辑:我相信自修改情况的证明是在上面的静态代码加数据系统中实现系统加自修改代码的模拟器。然后,使用允许程序 i 和数据 x 的自修改代码来停止,与在上面的静态代码加数据系统中使用包含模拟器、i 和 x 作为代码加数据的二进制文件停止相同。

I agree with Larsman, and I believe the argument can be made precise. First, I have to agree that "finding all the code in a given binary" means, in this context, having a single computable function that determines which bytes within an input binary correspond to instructions that are executed.

EDIT: If anyone is not clear on why there is a problem here, think about obfuscated machine code. Disassembly of such code is a non-trivial problem. If you begin disassembly in the middle of a multi-byte instruction, you get the wrong disassembly. This breaks not only the instruction in question, but typically a few instructions beyond that. etc. look into it. (for example, google "obfuscation and disassembly").

Sketch of strategy to make this precise:

First, define a theoretical computer / machine model in which a program is encoded in multi-bit binary instructions, much like machine code on "real" computers, but made precise (and such that it is turing complete). Assume that the binary encodes the program and all input. This must all be made precise, but assume that the language has a (binary encoded) halt instruction, and that a program halts if and only if it executes a 'halt' instruction.

First, let us assume that the machine is not able to alter the program code, but has a memory in which to work. (assumed infinite in interesting cases).

Then any given bit will either be at the start of an instruction that is run during execution of the program, or it will not. (e.g. it may be in the middle of an instruction, or in data or "junk code" -- that is, code that will never run. Note that I have not claimed these are mutually exclusive, as, for example, a jump instruction can have a target that is in the middle of a particular instruction, thereby creating another instruction that, "on first pass" did not look like an executed instruction.).

Define ins(i, j) to be the function that returns 1 if the binary i has an instruction beginning at bit-position j, that executes in a run of the program encoded by i. Define ins(i,j) to be 0 otherwise. suppose ins(i,j) is computable.

Let halt_candidate(i) be the following program:

for j = 1 to length(i)
  if(i(j..j+len('halt')-1) == 'halt')
    if(ins(i,j) == 1)
      return 1
return 0

Since we disallow self-modifying code above, they only way for a program to halt is for there to be a 'halt' instruction at some position j that gets executed. By design, halt_candidate(i) computes the halting function.

This provides a very rough sketch of one direction of the proof. i.e. if we assume we can test whether program i has an instruction at location j for all j, then we can code the halting function.

For the other direction, one must encode an emulator of the formal machine, within the formal machine. Then, create a program plus inputs i and j which emulates program i and when an instruction at bit position j is executed, it returns 1 and halts. When any other instruction is executed it continues to run, and if the simulation ever simulates a 'halt' function in i, the emulator jumps to an infinite loop. Then ins(i,j) is equivalent to halt(emulator(i,j)), and so the halting problem implies the find code problem.

Of course one must assume a theoretical computer for this to be equivalent to the famously unsolvable halting problem. Otherwise, for a "real" computer, the halting problem is solvable but intractable.

For a system that allows self-modifying code the argument is more complex, but I expect not that different.

EDIT: I believe a proof in the self-modifying case would be to implement an emulator of the system-plus-self-modifying-code in the static code plus data system above. Then halting with self-modifying code allowed for program i with data x, is the same as halting in the static code plus data system above with the binary containing the emulator, i, and x as code plus data.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文