Are a compiler and a language specification/grammar the same?

Posted on 2024-10-07 19:57:27

Can a compiler be written from which you cannot reverse-engineer the grammar and meaning of the input language?

That is, can you always recover the specification of the language from the compiler?

Let's say I want to compile from ?? to some language, but I do not want people who read the compiler to be able to read and understand ??.

I personally have a feeling that compilers and language specifications are isomorphic, but I'm interested, from an academic point of view, in whether this is wrong.

Comments (5)

森林迷了鹿 2024-10-14 19:57:27

Assuming you're saying they only have access to the binary:

Short answer: Not if a person cares enough.

Long answer: It is always possible, if a person were so inclined and had lots of free time, to rip the compiler down to the byte level and map it completely. From there, you could figure out the logic trees and reconstruct the language.

It would be painful, but this falls under the same category as "can I ever make an algorithm that prevents a dedicated user from cracking the CD-key verification?"

Now, if you never actually gave the compiler to anyone (imagine some sort of proxy system?), it might be reasonable to say that a user would need a very, very long time to brute-force the language specification, if they could ever generate inputs that exercise it completely.
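To make the proxy scenario concrete, here is a minimal black-box probing sketch. The hidden language (balanced parentheses), the alphabet, and the names `hidden_compiler_accepts` and `probe` are all hypothetical. The prober only sees accept/reject answers, so the number of queries grows as |alphabet|^n with input length, and no finite sample proves that the whole grammar has been found.

```python
from itertools import product

# Hypothetical stand-in for a compiler reachable only through a proxy.
# It secretly accepts balanced parentheses; the prober never sees this code.
def hidden_compiler_accepts(source: str) -> bool:
    depth = 0
    for ch in source:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False  # any other character is a "syntax error"
    return depth == 0

def probe(oracle, alphabet, max_len):
    """Query every string up to max_len and record which ones are accepted."""
    accepted = []
    queries = 0
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            queries += 1
            if oracle(s):
                accepted.append(s)
    return accepted, queries

accepted, queries = probe(hidden_compiler_accepts, "()x", 4)
print(queries)   # 121 (= 1 + 3 + 9 + 27 + 81)
print(accepted)  # ['', '()', '(())', '()()']
```

Even for this tiny language, 121 queries only reveal four accepted strings; generalizing them into a grammar is guesswork that more queries can refute but never fully confirm.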

If you're implying that they have access to the source code:

No. You can obfuscate it, but the compiler still has to construct the same logical trees, no matter how difficult to read.
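A hand-written recursive-descent parser shows why the source gives the grammar away: each parsing function corresponds to one grammar rule, and renaming or scrambling the functions does not change that call structure. The toy grammar below is a hypothetical example, not the asker's language.

```python
# Toy grammar (hypothetical):
#   expr := term ('+' term)*
#   term := DIGITS | '(' expr ')'
# Each nested function mirrors one production, so reading the parser
# yields the grammar back.

def parse(src: str):
    pos = 0

    def peek() -> str:
        return src[pos] if pos < len(src) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {pos}")
        pos += 1

    def expr():
        # expr := term ('+' term)*
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    def term():
        # term := DIGITS | '(' expr ')'
        nonlocal pos
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        start = pos
        while peek().isdigit():
            pos += 1
        if start == pos:
            raise SyntaxError(f"expected a number at {pos}")
        return int(src[start:pos])

    tree = expr()
    if pos != len(src):
        raise SyntaxError(f"unexpected {peek()!r} at {pos}")
    return tree

print(parse("1+(2+3)"))  # ('+', 1, ('+', 2, 3))
```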

There might be some esoteric way to do this: if you provided the language tree separately in some sort of encrypted binary form, didn't supply the compiler's source, and your users weren't bored NSA types.

忘你却要生生世世 2024-10-14 19:57:27

I think the compiler always reveals the specification of the language that it compiles (I'm aware that this is super hand-wavy).

However, there is probably no algorithm to do so (i.e. the problem is undecidable) because, for example, that algorithm would need to find out which programs the compiler halts on.
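One way to make the undecidability argument concrete is a reduction from the halting problem, sketched below. The construction and the name `make_compiler` are hypothetical illustrations; the general result is a special case of Rice's theorem, which says every nontrivial property of the language a program accepts is undecidable.

```python
from typing import Callable

# Given an arbitrary program p, build a "compiler" that accepts the single
# input "x" only after running p to completion.
def make_compiler(p: Callable[[], None]) -> Callable[[str], bool]:
    def compiler(source: str) -> bool:
        p()  # never returns if p loops forever
        return source == "x"
    return compiler

# The accepted language is {"x"} if p halts and empty otherwise, so an
# algorithm that reads any compiler and outputs its language would also
# decide whether p halts, which is impossible in general.

def halts_quickly() -> None:
    sum(range(10))

c = make_compiler(halts_quickly)
print(c("x"), c("y"))  # True False
```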

吲‖鸣 2024-10-14 19:57:27

No, they are not the same. But a compiler inevitably understands the input language's grammar, and (hopefully) follows the language specification very precisely. Therefore, understanding the compiler means understanding those.

Of course it's possible to obfuscate the compiler source code so heavily that no one will bother to read it and extract the grammar and the language's rules. Of course that hurts developers too (good luck maintaining that crap!).

Also, reading the compiler's source would be my last option if I wanted to know something about the language (not how it's implemented, but how it's defined on a more abstract level). I'd head to the spec or some other authoritative source (official docs etc.), as that would be much easier even if the compiler's code is very understandable.

女皇必胜 2024-10-14 19:57:27

My gut feeling is that you could determine the semantic behavior of the compiler by inspecting its output, but you couldn't obtain the actual syntax without documentation or access to the compiler source. If you have the source, this becomes trivial, so I'm assuming you don't have the source of the compiler, just access to it as a tool.

许你一世情深 2024-10-14 19:57:27

In general, if there is information about semantics in the code (and any interpreter or compiler always defines an operational semantics), then it is always possible to extract that information. The only question is how complex such reverse engineering is. So you need an obfuscated language and an obfuscated compiler.

Take a look at the Malbolge "decompiler" for example.
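To illustrate "an obfuscated language and an obfuscated compiler", here is a hypothetical toy stack machine (it is not Malbolge) loosely inspired by Malbolge's position-dependent instruction decoding: the same character means different things at different positions. The interpreter body still is the operational semantics, so anyone holding it can undo the obfuscation mechanically, which is what `decompile` does. All names and opcodes here are assumptions for illustration.

```python
# Hypothetical toy language: an instruction's meaning is
# OPS[(char code + position) % 4]. Each branch in run() is one
# operational-semantics rule.
OPS = ["push1", "add", "dup", "print"]

def decode(ch: str, pos: int) -> str:
    return OPS[(ord(ch) + pos) % len(OPS)]

def run(program: str) -> list[int]:
    stack: list[int] = []
    out: list[int] = []
    for pos, ch in enumerate(program):
        op = decode(ch, pos)
        if op == "push1":
            stack.append(1)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "print":
            out.append(stack[-1])
    return out

def decompile(program: str) -> list[str]:
    # Reading decode() is enough to reverse the obfuscation.
    return [decode(ch, pos) for pos, ch in enumerate(program)]

print(decompile("decd"))  # ['push1', 'dup', 'add', 'print']
print(run("decd"))        # [2]
```

Note that `'d'` decodes to `push1` at position 0 but to `print` at position 3; the obfuscation raises the cost of reading programs, not of reading the interpreter.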
