Writing a VM - well-formed bytecode?

Posted on 2024-09-01 00:45:05

I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO so hopefully no one will make fun :)

I wrote a really quick'n'dirty VM that reads lines of (my own) ASM and does stuff. Right now, I only have 3 instructions: add, jmp, end. All is well, and it's actually pretty cool being able to feed it lines (doing something like write_line(&prog[1], "jmp", regA, regB, 0);) and then run the program:

while (machine.cp <= BOUNDS && !DONE)
{
    /* run_line() executes one instruction and is expected to update machine.cp */
    run_line(&prog[machine.cp]);
}

I'm using an opcode lookup table (which may not be efficient but it's elegant) in C and everything seems to be working OK.
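
For readers who have not seen the pattern, here is a minimal sketch of what an opcode lookup table can look like in C. The question does not show its actual structures, so everything here is an assumption made for illustration: the `Instr` and `Machine` layouts, the four-register file, the opcode numbering, and the name `dispatch`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the question does not show its real structures. */
enum { OP_ADD, OP_JMP, OP_END, OP_COUNT };
enum { NUM_REGS = 4 };

typedef struct { uint8_t op, a, b, c; } Instr;

typedef struct {
    uint32_t reg[NUM_REGS];
    size_t   cp;    /* index of the next instruction */
    int      done;  /* set by the end instruction */
} Machine;

typedef void (*Handler)(Machine *, const Instr *);

/* Operands are assumed valid here; checking them is a separate concern. */
static void op_add(Machine *m, const Instr *i) {
    m->reg[i->a] = m->reg[i->b] + m->reg[i->c];
    m->cp++;
}
static void op_jmp(Machine *m, const Instr *i) { m->cp = i->a; }
static void op_end(Machine *m, const Instr *i) { (void)i; m->done = 1; }

/* The lookup table: the opcode value is the index of its handler. */
static const Handler handlers[OP_COUNT] = {
    [OP_ADD] = op_add,
    [OP_JMP] = op_jmp,
    [OP_END] = op_end,
};

/* Runs one instruction. Returns 0 on success, -1 for an opcode
   that lies outside the table. */
int dispatch(Machine *m, const Instr *i) {
    assert(m != NULL && i != NULL);
    if (i->op >= OP_COUNT) return -1;
    handlers[i->op](m, i);
    return 0;
}
```

The only validation in this sketch is the range check on the opcode, which is what keeps a corrupted byte from indexing past the end of the table. Register operands are deliberately left unchecked here.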

My question is more of a "best practices" question but I do think there's a correct answer to it. I'm making the VM able to read binary files (storing bytes in unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?

I only ask because of what would happen if someone edited a binary file and screwed stuff up (deleted arbitrary parts of it, etc.). Clearly, the program would be buggy and probably not functional. Is this even the VM's problem? I'm sure that people much smarter than me have figured out solutions to these problems; I'm just curious what they are!

Comments (6)

孤云独去闲 2024-09-08 00:45:05

Is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?

You get to decide.

Best practice is to have the VM do a single check before execution, at a cost proportional to the size of the program, that is sophisticated enough to guarantee that nothing wonky can happen during execution. Then, during actual execution of the bytecode, you run with no checks.
However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often keep some checks at run time (example: array bounds).
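
As a rough illustration of the check-before-running idea for a machine as small as the one in the question, the sketch below walks the program once and rejects anything the interpreter could not execute safely. The encoding is an assumption, not something the question specifies: four bytes per instruction (opcode followed by three operand bytes), four registers, and jump targets given as instruction indices. The names `verify` and `VerifyResult` are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding: 4 bytes per instruction, opcode first. */
enum { OP_ADD, OP_JMP, OP_END };
enum { NUM_REGS = 4, INSTR_SIZE = 4 };

typedef enum {
    VERIFY_OK,
    VERIFY_BAD_LENGTH,     /* empty, or not a whole number of instructions */
    VERIFY_BAD_OPCODE,
    VERIFY_BAD_REGISTER,
    VERIFY_BAD_TARGET,     /* jump outside the program */
    VERIFY_FALLS_OFF_END   /* last instruction would run past the end */
} VerifyResult;

/* One pass over the program, cost linear in its length. */
VerifyResult verify(const unsigned char *code, size_t len) {
    assert(code != NULL);
    if (len == 0 || len % INSTR_SIZE != 0) return VERIFY_BAD_LENGTH;

    size_t count = len / INSTR_SIZE;
    for (size_t i = 0; i < count; i++) {
        const unsigned char *in = code + i * INSTR_SIZE;
        switch (in[0]) {
        case OP_ADD:
            if (in[1] >= NUM_REGS || in[2] >= NUM_REGS || in[3] >= NUM_REGS)
                return VERIFY_BAD_REGISTER;
            break;
        case OP_JMP:
            if (in[1] >= count) return VERIFY_BAD_TARGET;
            break;
        case OP_END:
            break;
        default:
            return VERIFY_BAD_OPCODE;
        }
    }

    /* An add in the last slot would advance past the end of the code. */
    unsigned char last = code[len - INSTR_SIZE];
    if (last != OP_JMP && last != OP_END) return VERIFY_FALLS_OFF_END;
    return VERIFY_OK;
}
```

With these three instructions, a program that passes this check can never reach an unknown opcode, an out-of-range register, or a code pointer outside the program, so the run loop could skip those tests. It can still loop forever, and a richer instruction set (computed jumps, memory access, a stack) would need the more sophisticated analysis the answer alludes to.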

For a hobby project, I'd keep things simple and have the VM check sanity every time you execute an instruction. The overhead for most instructions won't be too great.
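
A minimal sketch of that simpler approach follows, with every instruction checked as it is executed. As before, the four-byte encoding, the register count, the `Machine` layout, and the names `step`, `run`, and `VmStatus` are assumptions for illustration rather than the asker's actual design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: 4 bytes per instruction, opcode first. */
enum { OP_ADD, OP_JMP, OP_END };
enum { NUM_REGS = 4, INSTR_SIZE = 4 };

typedef enum { VM_OK, VM_HALT, VM_BAD_PC, VM_BAD_OPCODE, VM_BAD_OPERAND } VmStatus;

typedef struct {
    const unsigned char *code;
    size_t   count;           /* number of instructions in code */
    size_t   cp;              /* index of the next instruction */
    uint32_t reg[NUM_REGS];
} Machine;

/* Executes one instruction, validating everything it touches first. */
VmStatus step(Machine *m) {
    assert(m != NULL && m->code != NULL);
    if (m->cp >= m->count) return VM_BAD_PC;

    const unsigned char *in = m->code + m->cp * INSTR_SIZE;
    switch (in[0]) {
    case OP_ADD:
        if (in[1] >= NUM_REGS || in[2] >= NUM_REGS || in[3] >= NUM_REGS)
            return VM_BAD_OPERAND;
        m->reg[in[1]] = m->reg[in[2]] + m->reg[in[3]];
        m->cp++;
        return VM_OK;
    case OP_JMP:
        if (in[1] >= m->count) return VM_BAD_OPERAND;
        m->cp = in[1];
        return VM_OK;
    case OP_END:
        return VM_HALT;
    default:
        return VM_BAD_OPCODE;
    }
}

/* Runs until the program halts, faults, or max_steps instructions have
   executed. Returns VM_OK if the step budget ran out first. */
VmStatus run(Machine *m, size_t max_steps) {
    VmStatus s = VM_OK;
    while (max_steps-- > 0 && (s = step(m)) == VM_OK) {
        /* keep stepping */
    }
    return s;
}
```

Each check is a handful of comparisons on values that are already loaded, which is why the per-instruction overhead stays small. The step budget is an extra safeguard added in this sketch so that a program stuck in a jump loop cannot hang the caller.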

沉默的熊 2024-09-08 00:45:05

The same issue arises in Java, and as I recall, in that case the VM does have to do some checks to make sure the bytecode is well formed. In that situation, it's actually a serious issue because of the potential for security problems: if someone can alter a Java bytecode file to contain something that the compiler would never output (such as accessing a private variable from another class), it could expose sensitive data held in the application's memory, or let the application access a website that it shouldn't be allowed to reach. Java's virtual machine includes a bytecode verifier to make sure, to the extent possible, that these sorts of things don't happen.

Now, in your case, unless your homemade language takes off and becomes popular, the security aspect is something you don't have to worry about so much; after all, who's going to be hacking your programs, other than you? Still, I would say it's a good idea to make sure that your VM at least has a reasonable failure strategy for when the bytecode is invalid. At a minimum, if it encounters something it doesn't understand and can't process, it should detect that and fail with an error message, which will make debugging easier on your part.
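
One possible shape for such an error message is sketched below: it names the kind of fault and says where in the program it happened. The error codes, the message wording, and the function names `vm_error_name` and `vm_format_error` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fault codes for a small VM. */
typedef enum { VM_OK, VM_BAD_OPCODE, VM_BAD_OPERAND, VM_BAD_PC } VmError;

const char *vm_error_name(VmError e) {
    switch (e) {
    case VM_OK:          return "ok";
    case VM_BAD_OPCODE:  return "unknown opcode";
    case VM_BAD_OPERAND: return "operand out of range";
    case VM_BAD_PC:      return "code pointer out of range";
    }
    return "unrecognized error";
}

/* Writes a one-line diagnostic into buf and returns what snprintf
   returns: the length the full message needs, excluding the
   terminating NUL. */
int vm_format_error(char *buf, size_t size, VmError e,
                    size_t index, unsigned opcode) {
    assert(buf != NULL || size == 0);
    return snprintf(buf, size,
                    "vm error: %s (instruction %zu, opcode 0x%02x)",
                    vm_error_name(e), index, opcode);
}
```

A run loop would call `vm_format_error` when an instruction is rejected and print the buffer to stderr before stopping. Including the instruction index is what makes a hand-edited or truncated file quick to diagnose.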

情释 2024-09-08 00:45:05

Virtual machines that interpret bytecode generally have some way of validating their input; for example, Java will throw a VerifyError if the class file is in an inconsistent state.

However, it sounds like you're implementing a processor, and since processors tend to be lower-level, there are fewer ways to get things into a detectably invalid state; giving it an undefined opcode is one obvious way. Real processors will signal that the process attempted to execute an illegal instruction, and the OS will deal with it (Linux kills the process with SIGILL, for example).

林空鹿饮溪 2024-09-08 00:45:05

If you're concerned about someone having edited the binary file, then there is only one answer to your question: the VM must do the check. It's the only way you have a chance to detect the tampering. The compiler just creates the binary. It has no way of detecting downstream tampering.

谜兔 2024-09-08 00:45:05

It makes sense to have the compiler do as much sanity checking as possible (since it only has to do it once), but there are always going to be issues that can't be detected by static analysis, like [cough] stack overflow, array range errors, and the like.
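
Stack overflow is a good example of a check that has to live in the VM at run time, because stack depth generally depends on the data a program sees. The VM in the question is register-based and has no stack, so the sketch below is purely hypothetical: a fixed-size operand stack whose push and pop refuse to go out of bounds. The names `Stack`, `stack_push`, `stack_pop`, and the capacity `STACK_MAX` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_MAX = 256 };  /* arbitrary capacity chosen for the sketch */

typedef struct {
    uint32_t data[STACK_MAX];
    size_t   top;          /* number of values currently on the stack */
} Stack;

/* Returns 0 on success, -1 if the stack is already full (overflow). */
int stack_push(Stack *s, uint32_t value) {
    assert(s != NULL);
    if (s->top >= STACK_MAX) return -1;
    s->data[s->top++] = value;
    return 0;
}

/* Returns 0 on success, -1 if the stack is empty (underflow). */
int stack_pop(Stack *s, uint32_t *out) {
    assert(s != NULL && out != NULL);
    if (s->top == 0) return -1;
    *out = s->data[--s->top];
    return 0;
}
```

The interpreter would turn a -1 from either function into a VM-level fault instead of letting the write land outside `data`.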

深者入戏 2024-09-08 00:45:05

I'd say it's legitimate for your VM to let the emulated processor catch fire, as long as the VM implementation itself doesn't crash. As the VM implementor, you get to set the rules. But if you want virtual hardware companies to virtually buy your virtual chip, you'll have to do something a little more forgiving of errors: good options might be to raise an exception (harder to implement) or reset the processor (much easier). Or maybe you just define every opcode to be valid, except that some are "undocumented" - they do something unspecified, other than crashing your implementation. Rationale: if (!) your VM implementation is to run several instances of the guest simultaneously, it would be very bad if one guest were able to cause others to fail.
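
The "reset the processor" option can be sketched as follows, under the same assumed four-byte encoding used for illustration (the `Guest` layout and the names `guest_reset` and `guest_step` are hypothetical). Each guest owns its state, and any fault resets only that guest, so a guest running garbage cannot disturb another one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding: 4 bytes per instruction, opcode first. */
enum { OP_ADD, OP_JMP, OP_END };
enum { NUM_REGS = 4, INSTR_SIZE = 4 };

typedef struct {
    const unsigned char *code;
    size_t   count;           /* number of instructions in code */
    size_t   cp;              /* index of the next instruction */
    uint32_t reg[NUM_REGS];
    unsigned resets;          /* how many times this guest has faulted */
    int      halted;
} Guest;

/* Fault policy for this sketch: clear the registers and restart at 0. */
static void guest_reset(Guest *g) {
    memset(g->reg, 0, sizeof g->reg);
    g->cp = 0;
    g->resets++;
}

/* Executes one instruction for one guest. Invalid input resets that
   guest only; the host and the other guests are never affected. */
void guest_step(Guest *g) {
    assert(g != NULL && g->code != NULL);
    if (g->halted) return;
    if (g->cp >= g->count) { guest_reset(g); return; }

    const unsigned char *in = g->code + g->cp * INSTR_SIZE;
    switch (in[0]) {
    case OP_ADD:
        if (in[1] >= NUM_REGS || in[2] >= NUM_REGS || in[3] >= NUM_REGS) {
            guest_reset(g);
            return;
        }
        g->reg[in[1]] = g->reg[in[2]] + g->reg[in[3]];
        g->cp++;
        return;
    case OP_JMP:
        if (in[1] >= g->count) { guest_reset(g); return; }
        g->cp = in[1];
        return;
    case OP_END:
        g->halted = 1;
        return;
    default:
        guest_reset(g);       /* undefined opcode */
        return;
    }
}
```

A host would call `guest_step` on each guest in turn. A guest whose program is corrupt simply keeps resetting, much like a real chip stuck in a reset loop, and the `resets` counter gives the host something to act on if it wants to give up on that guest.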
