How to manually create an executable .exe PE file?
All texts on how to create a compiler stop after explaining lexers and parsers. They don't explain how to create the machine code. I want to understand the end-to-end process.
From what I understand so far, the Windows .exe file format is called Portable Executable (PE). I have read about its headers but have yet to find a resource that explains them simply.
My next issue is that I can't find any resource explaining how machine code is stored in the file. Is it like 32-bit fixed-length instructions stored one after another in the .text section?
Is there anywhere that at least explains how to create an .exe file that does nothing (it contains just a no-op instruction)? My next step would then be linking to DLL files to print to the console.
Nice question! I don't have much expertise on this specific question, but this is how I would start:
A PE or ELF file does not contain pure machine code; it also contains header information and other metadata. Read more: Writing custom data to executable files in Windows and Linux
I assume you are looking for how an ELF/PE file holds the machine code; you can get that from this question (using objdump): How do you extract only the contents of an ELF section
Now, if you want to know how the content is generated in the first place, i.e. how the machine code is generated, then that's the job of the compiler's code generator.
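As a taste of what a code generator does, here is a toy Python sketch (not how real compilers allocate registers) that turns an expression tree of 32-bit integer additions and subtractions into raw x86 machine code, keeping intermediate values on the stack. The classes `Num`, `BinOp` and the functions `gen`, `compile_function` are hypothetical names for illustration; it assumes the result should be returned in EAX and the bytes end with `ret`, as a plain x86 function would.

```python
import struct
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int            # signed 32-bit constant


@dataclass
class BinOp:
    op: str               # "+" or "-"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# Encodings of "op eax, ecx" (opcode + ModRM byte 0xC8: mod=11, reg=ecx, rm=eax).
OPCODES = {
    "+": b"\x01\xC8",     # add eax, ecx
    "-": b"\x29\xC8",     # sub eax, ecx
}


def gen(expr: Expr, out: bytearray) -> None:
    """Append x86-32 code that pushes the value of `expr` onto the stack."""
    if isinstance(expr, Num):
        out += b"\x68" + struct.pack("<i", expr.value)  # push imm32
        return
    if expr.op not in OPCODES:
        raise ValueError(f"unsupported operator {expr.op!r}")
    gen(expr.left, out)
    gen(expr.right, out)
    out += b"\x59"                # pop ecx  ; right operand
    out += b"\x58"                # pop eax  ; left operand
    out += OPCODES[expr.op]       # eax = eax op ecx
    out += b"\x50"                # push eax ; intermediate result


def compile_function(expr: Expr) -> bytes:
    """Return a function body that computes `expr` and returns it in EAX."""
    out = bytearray()
    gen(expr, out)
    out += b"\x58"                # pop eax  ; final result
    out += b"\xC3"                # ret
    return bytes(out)
```

For example, `compile_function(BinOp("+", Num(1), Num(2)))` yields 17 bytes: two `push imm32`, `pop ecx`, `pop eax`, `add eax, ecx`, `push eax`, `pop eax`, `ret`. A real back end would keep values in registers instead of bouncing them through the stack, but the output is the same kind of thing: bytes destined for a code section.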
Try out a resource editor such as ResourceEditor to understand the exe, or simply ildasm (for .NET assemblies).
PS: These are mostly Unix solutions, but I am sure PE does something fundamentally similar.
I think the best way to approach it is to first analyze how existing PE/ELF files work, basically reverse engineering. A Unix machine is a good place to start. Then do your magic :)
Not the same, but a similar question here.
Update:
I generated an object dump from a sample C program. I assume that's what you are targeting, right? You need to know how to generate this file (a.out)?
https://gist.github.com/1329947
Take a look at this image of the lifetime of a C program.
Source
Now, just to be clear: are you looking to implement the final step, i.e. converting object code into executable code?
As with many of his articles, I'd say Matt Pietrek's piece about PE internals remains the best introduction to the subject more than a decade after it was written.
I've used "Wotsit's File Format" for years... all the way back to the days of MS-DOS :-) and back to when it was just a collection of text files, downloadable from most BBS systems, called "The Game programmers file type encyclopaedia".
It's now owned by the people who run Gamedev.Net, and is probably one of the best-kept secrets on the internet.
You'll find the EXE format on this page: http://www.wotsit.org/list.asp?fc=5
Enjoy.
UPDATE June 2020 - The link above now seems to be dead. I've found the "EXE" page listed on this web archive page of the wotsit site: https://web.archive.org/web/20121019145432/http://www.wotsit.org/list.asp?al=E
UPDATE 2 - I'm keeping the edit as it was when I added the update earlier. Thanks to those who wanted to edit it, but I'm rejecting those edits for a good reason:
1) Wotsit.org may come back online at some point in the future. If you actually try visiting the URL, you'll find that it's not gone: it still responds, just with an error message. This tells me that someone is keeping the domain alive for whatever reason.
2) The archive links do seem a bit jittery: some work, some don't, and sometimes they seem to work, then stop working after a refresh, then work again. I remember from when wotsit was still online that they had some very strange download/link-detection code, which probably caused archive.org to get some very weird results. I do remember them taking this stance because of the huge number of third-party sites trying to cash in on their success by pretending to be affiliates and then linking directly to wotsit from ad-infested sites.
Until the wotsit domain is removed from the internet entirely and not even its DNS responds, this is the best way to maintain the link; only then would it be time to wrap everything up into single archive links.
Not surprisingly, the best sites for information about writing PE format files are all about creating viruses.
A search of VX Heavens for "PE" gives a whole bunch of tutorials on modifying PE files.
Some information about making PE files as small as possible: Tiny PE.
If you just want to try a few simple things out, the minimalist way to play with code generation is to output MS-DOS .COM files, which have no header or metadata. Sadly, you'd be restricted to 16-bit code. This format is still somewhat popular for demos.
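To show how little a .COM file needs, here is a small Python sketch that emits a complete program printing a string through DOS `INT 21h` function 09h and then exiting through function 4Ch. The bytes are hand-assembled 16-bit code; `build_hello_com`, `ORIGIN` and `CODE_LEN` are hypothetical names. It assumes the result is run under DOS or an emulator such as DOSBox, since 64-bit Windows cannot run .COM programs natively.

```python
ORIGIN = 0x100    # DOS loads a .COM image at offset 100h of its segment
CODE_LEN = 12     # size of the hand-assembled code below, in bytes


def build_hello_com(message: str) -> bytes:
    """Return the bytes of a .COM program that prints `message` and exits."""
    if "$" in message:
        raise ValueError("DOS function 09h stops at '$', so the message cannot contain it")
    msg_addr = ORIGIN + CODE_LEN          # the string is placed right after the code
    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h   ; DOS: print '$'-terminated string
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg   ; DS:DX -> string
        0xCD, 0x21,                            # int 21h
        0xB8, 0x00, 0x4C,                      # mov ax, 4C00h ; DOS: exit with code 0
        0xCD, 0x21,                            # int 21h
    ])
    assert len(code) == CODE_LEN
    return code + message.encode("ascii") + b"$"
```

Writing `build_hello_com("Hello, world!\r\n")` to `hello.com` gives a runnable program: the file is the memory image, with no header at all, which is exactly why the string address has to be computed relative to 100h by hand.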
As for the instruction format, x86 instructions are variable-length: anywhere from 1 byte (e.g. nop, 0x90) up to 15 bytes. RISC CPUs usually have fixed-length instructions, e.g. 32 bits on MIPS.
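To see that x86 code in a .text section is not a sequence of fixed 32-bit words, here is a toy Python splitter. It only knows a handful of one-byte opcodes that take no ModRM byte (32-bit mode) and rejects everything else; real decoders must also handle prefixes, ModRM/SIB bytes, displacements and multi-byte opcodes. `IMM_BYTES` and `split_instructions` are hypothetical names for illustration.

```python
# Number of immediate/offset bytes following some one-byte opcodes that
# take no ModRM byte (32-bit mode). Anything else is rejected.
IMM_BYTES = {
    0x90: 0,  # nop
    0xC3: 0,  # ret
    0xCC: 0,  # int3
    0x6A: 1,  # push imm8
    0x68: 4,  # push imm32
    0xE8: 4,  # call rel32
    0xE9: 4,  # jmp rel32
}
IMM_BYTES.update({op: 0 for op in range(0x50, 0x60)})  # push r32 / pop r32
IMM_BYTES.update({op: 4 for op in range(0xB8, 0xC0)})  # mov r32, imm32


def split_instructions(code: bytes) -> list[bytes]:
    """Split a byte string into instructions using the toy table above."""
    result, pos = [], 0
    while pos < len(code):
        opcode = code[pos]
        if opcode not in IMM_BYTES:
            raise ValueError(f"opcode {opcode:#04x} at offset {pos} is not in this toy table")
        length = 1 + IMM_BYTES[opcode]
        if pos + length > len(code):
            raise ValueError(f"instruction at offset {pos} is truncated")
        result.append(code[pos:pos + length])
        pos += length
    return result
```

For example, the nine bytes `55 B8 2A 00 00 00 90 5D C3` (push ebp; mov eax, 42; nop; pop ebp; ret) split into five instructions of lengths 1, 5, 1, 1 and 1.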
For Linux, one may read and run the examples from "Programming from the Ground Up" by Jonathan Bartlett:
http://www.cs.princeton.edu/courses/archive/spr08/cos217/reading/ProgrammingGroundUp-1-0-lettersize.pdf
Of course, one may prefer to hack on Windows programs instead, but the former perhaps gives a better understanding of what really goes on.
The executable file format depends on the OS. For Windows it is PE32 (32-bit) or PE32+ (64-bit).
What the final executable looks like depends on the ABI (application binary interface) of the OS. The ABI determines how the OS loader should load the exe and how it should relocate it, whether it is a DLL or a plain executable, etc.
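To make "relocating" concrete for PE: if an image cannot be loaded at its preferred ImageBase, the loader walks the base relocation table (the .reloc data) and adds the load-address difference to every absolute address listed there. Below is a minimal Python sketch of that step, assuming the image has already been mapped so that an index into the `bytearray` equals an RVA. It only handles the HIGHLOW (32-bit, PE32) and DIR64 (64-bit, PE32+) entry types plus the ABSOLUTE padding entry; `apply_base_relocations` is a hypothetical name, not a Windows API.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3    # 32-bit absolute address (PE32)
IMAGE_REL_BASED_DIR64 = 10     # 64-bit absolute address (PE32+)


def apply_base_relocations(image: bytearray, reloc_data: bytes, delta: int) -> None:
    """Patch a mapped image (indexed by RVA) that was loaded `delta` bytes away from ImageBase."""
    pos = 0
    while pos + 8 <= len(reloc_data):
        # Each block: PageRVA, SizeOfBlock (including this 8-byte header), then 16-bit entries.
        page_rva, block_size = struct.unpack_from("<II", reloc_data, pos)
        if block_size < 8:
            break
        for entry_pos in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", reloc_data, entry_pos)
            kind, offset = entry >> 12, entry & 0x0FFF   # 4-bit type, 12-bit page offset
            rva = page_rva + offset
            if kind == IMAGE_REL_BASED_HIGHLOW:
                (value,) = struct.unpack_from("<I", image, rva)
                struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
            elif kind == IMAGE_REL_BASED_DIR64:
                (value,) = struct.unpack_from("<Q", image, rva)
                struct.pack_into("<Q", image, rva, (value + delta) & 0xFFFFFFFFFFFFFFFF)
            elif kind != IMAGE_REL_BASED_ABSOLUTE:
                raise ValueError(f"relocation type {kind} not handled in this sketch")
        pos += block_size
```

An image built without a relocation table can only be loaded at its preferred base, which is one reason a from-scratch executable usually marks itself as having its relocations stripped.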
Every object file (executable, DLL or driver) contains parts called sections. This is where all of our code, data, jump tables, etc. are located.
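A minimal Python sketch of reading the section table of a PE image (.exe or .dll), roughly the information `dumpbin /headers` or `objdump -h` display. It assumes a well-formed image with an MZ header; COFF .obj files start directly with the COFF header and are not handled. `Section`, `read_sections` and `section_bytes` are hypothetical names.

```python
import struct
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    virtual_size: int     # size once loaded into memory
    rva: int              # address relative to ImageBase
    raw_size: int         # size in the file (padded to FileAlignment)
    raw_offset: int       # file offset of the section's bytes
    characteristics: int  # IMAGE_SCN_* flags (code, data, read, write, execute...)


def read_sections(image: bytes) -> list[Section]:
    """Parse the section table of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)      # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4
    (count,) = struct.unpack_from("<H", image, coff + 2)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)  # SizeOfOptionalHeader
    table = coff + 20 + opt_size   # section headers follow the optional header

    sections = []
    for i in range(count):
        (name, vsize, rva, raw_size, raw_offset,
         _relocs, _lines, _nrelocs, _nlines, flags) = struct.unpack_from(
            "<8s6I2HI", image, table + 40 * i)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                vsize, rva, raw_size, raw_offset, flags))
    return sections


def section_bytes(image: bytes, section: Section) -> bytes:
    """Return a section's raw bytes, trimmed to VirtualSize when that is smaller."""
    size = section.raw_size
    if 0 < section.virtual_size < size:
        size = section.virtual_size
    return image[section.raw_offset:section.raw_offset + size]
```

Reading any .exe with `open(path, "rb").read()` and passing it to `read_sections` should list entries such as `.text`, `.rdata` and `.data`; `section_bytes` on `.text` gives the raw machine code, back to back, with no per-instruction framing.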
Now, to create an object file, which is what a compiler does, you must create not just the executable machine code but also the headers, symbol table, relocation records, import/export tables, etc.
How you generate the machine code itself depends entirely on how optimized you want your code to be. But to actually run the code on a PC, you have to create a file with all of the headers and related data (check MSDN for the precise PE32+ format) and then put all of the executable machine code (which your compiler generated) into one of the sections (code usually resides in a section called .text). If the file conforms to the PE32+ format, you have successfully created an executable on Windows.
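A minimal Python sketch of that last step, under stated assumptions: it writes a 32-bit PE32 (i386) console image rather than PE32+, with exactly one .text section, no imports and no relocations. The field values (ImageBase 0x400000, file/section alignment 0x200/0x1000, subsystem version 4.0) are common choices, not requirements I can vouch for on every Windows version; some older Windows releases may refuse an image that imports nothing. `build_minimal_pe`, `align` and the constants are hypothetical names. The intended payload is `nop; xor eax, eax; ret`; returning from the entry point should end the process with EAX as its exit code.

```python
import struct

FILE_ALIGN = 0x200       # section alignment inside the file
SECT_ALIGN = 0x1000      # section alignment once mapped into memory
IMAGE_BASE = 0x400000    # preferred load address, a common default for 32-bit .exe files
OPT_HEADER_SIZE = 224    # PE32 optional header with 16 data directories


def align(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary."""
    return (value + boundary - 1) // boundary * boundary


def build_minimal_pe(code: bytes) -> bytes:
    """Return a PE32 (i386) console image whose only section, .text, holds `code`."""
    # DOS header: only e_magic ("MZ") and e_lfanew (at offset 0x3C) are filled in.
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, len(dos))  # "PE\0\0" follows immediately

    coff = struct.pack(           # COFF file header (20 bytes)
        "<HHIIIHH",
        0x014C,           # Machine: IMAGE_FILE_MACHINE_I386
        1,                # NumberOfSections
        0, 0, 0,          # TimeDateStamp, PointerToSymbolTable, NumberOfSymbols
        OPT_HEADER_SIZE,  # SizeOfOptionalHeader
        0x0103,           # RELOCS_STRIPPED | EXECUTABLE_IMAGE | 32BIT_MACHINE
    )

    headers_size = align(len(dos) + 4 + len(coff) + OPT_HEADER_SIZE + 40, FILE_ALIGN)
    text_rva = SECT_ALIGN                      # first page after the headers
    text_raw_size = align(len(code), FILE_ALIGN)
    image_size = align(text_rva + len(code), SECT_ALIGN)

    opt = struct.pack(
        "<HBB9I6H4I2H6I",
        0x10B, 0, 0,              # Magic (PE32), linker major/minor version
        text_raw_size, 0, 0,      # SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData
        text_rva,                 # AddressOfEntryPoint: first byte of .text
        text_rva, 0,              # BaseOfCode, BaseOfData
        IMAGE_BASE, SECT_ALIGN, FILE_ALIGN,
        4, 0, 0, 0, 4, 0,         # OS, image and subsystem versions (4.0 / 0.0 / 4.0)
        0,                        # Win32VersionValue (reserved)
        image_size, headers_size,
        0,                        # CheckSum
        3,                        # Subsystem: IMAGE_SUBSYSTEM_WINDOWS_CUI (console)
        0,                        # DllCharacteristics
        0x100000, 0x1000,         # SizeOfStackReserve, SizeOfStackCommit
        0x100000, 0x1000,         # SizeOfHeapReserve, SizeOfHeapCommit
        0,                        # LoaderFlags
        16,                       # NumberOfRvaAndSizes
    ) + bytes(16 * 8)             # every data directory empty: no imports, no relocations
    assert len(opt) == OPT_HEADER_SIZE

    text_header = struct.pack(    # section header (40 bytes)
        "<8s6I2HI",
        b".text",
        len(code),                # VirtualSize
        text_rva,                 # VirtualAddress
        text_raw_size,            # SizeOfRawData
        headers_size,             # PointerToRawData: right after the padded headers
        0, 0, 0, 0,               # no COFF relocations or line numbers
        0x60000020,               # CNT_CODE | MEM_EXECUTE | MEM_READ
    )

    headers = bytes(dos) + b"PE\0\0" + coff + opt + text_header
    return headers.ljust(headers_size, b"\0") + code.ljust(text_raw_size, b"\0")
```

Writing `build_minimal_pe(b"\x90\x31\xC0\xC3")` to `tiny.exe` should give a 1024-byte program that exits immediately with code 0. Printing to the console is the next step and needs an import table (for example, for kernel32's `WriteFile` and `GetStdHandle`), which this sketch deliberately leaves out.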