How can I always produce a byte-for-byte identical .exe when rebuilding a C# application?
I'll give you a little bit of background first as to why I'm asking this question:
I am currently working in a strictly regulated industry, and as such our code is looked over quite carefully by official test houses. These test houses expect to be able to build the code and generate an .exe or .dll which is EXACTLY the same each and every time (without changing any code, obviously!). They check the MD5 and SHA1 hashes of the executables that they create to ensure this.
Up until this point I have predominantly been coding in C++, where (after a few project setting tweaks) I managed to get the projects to rebuild consistently to the same MD5/SHA1. I am now using C# in a project and am having great difficulty getting the MD5s to match after a rebuild. I am aware that there are "Time-Stamps" in the PE header of the file, and they have been cleared to 0. I am also aware that there is a GUID for the .exe, which again has been cleared to 00 00 00... etc. However, the files still don't match.
I'm using CFF Explorer to view and edit the PE header to remove the time and date stamps. After running a binary comparison tool, only 2 blocks of bytes differ between the .exe files (both very small).
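For readers who want to reproduce the comparison step without a GUI tool, here is a minimal Python sketch that lists the byte ranges where two builds differ. The function name `differing_ranges` is made up for this illustration, and it assumes both files fit comfortably in memory.

```python
def differing_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) offsets where the two buffers differ."""
    ranges = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differing bytes begins
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, n))  # a run that reaches the common end
    if len(a) != len(b):
        # Everything past the shorter file counts as one differing range.
        ranges.append((n, max(len(a), len(b))))
    return ranges
```

Reading both builds with `open(path, "rb").read()` and passing them to this function gives the offsets to inspect in a hex viewer.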
One of the inconsistent blocks appears just before some binary data which, in ASCII, spells out the path of the *Project*\obj\Release\xxx.pdb file.
EDIT: This is now known to be the GUID of the *.pdb file, however I still don't know if I can modify it without causing any errors!?
The other block appears in the middle of what look like function names, i.e. (a typical section) AssemblyName.GetName.Version.get_Version.System.IO.Ports.SerialPort.Parity.Byte.<PrivateImplementationDetails>{
then the different code block:
4A134ACE-D6A0-461B-A47C-3A4232D90816
followed by:
"}.ValueType.__StaticArrayInitTypeSize=7.$$method0x60000ab-1.RuntimeFieldHandle.InitializeArray`... etc.
Any ideas or suggestions would be most welcome!
Comments (6)
Update: Roslyn seems to have a /feature:deterministic compiler flag for reproducible builds, although it's not 100% working yet.
You should be able to get rid of the debug GUID by disabling PDB generation. If not, setting the GUID to zeroes is fine - only debuggers look at that section (you won't be able to debug the assembly anymore, but it should still run fine).
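As a rough illustration of zeroing the debug GUID, the Python sketch below assumes the debug data is a CodeView "RSDS" record (4-byte signature, 16-byte GUID, 4-byte age, then a null-terminated PDB path). It finds the record by scanning for the signature rather than walking the PE debug directory, which is a shortcut and not a robust parser. The name `zero_codeview_guid` is hypothetical.

```python
def zero_codeview_guid(data: bytes) -> bytes:
    """Zero the 16-byte GUID in every RSDS record that names a .pdb file."""
    buf = bytearray(data)
    pos = buf.find(b"RSDS")
    while pos != -1:
        path_start = pos + 24                  # signature + GUID + age
        path_end = buf.find(b"\0", path_start)
        # Heuristic guard: only patch when a .pdb path follows the record.
        if path_end != -1 and buf[path_start:path_end].lower().endswith(b".pdb"):
            buf[pos + 4:pos + 20] = bytes(16)  # overwrite GUID, same length
        pos = buf.find(b"RSDS", pos + 4)
    return bytes(buf)
```

The file length and every other byte stay unchanged, so offsets elsewhere in the image remain valid.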
The PrivateImplementationDetails are a bit more difficult - these are internal helper classes generated by the compiler for certain language constructs (array initializers, switch statements using strings, etc.). Because they are only used internally, the class name doesn't really matter, so you could just assign a running number to them.
I would do this by going through the #Strings metadata stream and replacing all strings of the form "<PrivateImplementationDetails>{GUID}" with "<PrivateImplementationDetails>{running number, padded to same length as a GUID}".
The #Strings metadata stream is simply the list of strings used by the metadata, encoded in UTF-8 and separated by \0; so finding and replacing the names should be easy once you know where the #Strings stream is inside the executable file.
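The replacement described above could be sketched in Python as follows. It operates on a byte buffer holding the #Strings heap (or the whole file) and rewrites each GUID to a zero-padded running number of the same length, so no offsets shift. The function name `renumber_private_impl` is invented for this example, and the sketch does not address side effects such as invalidating a strong-name signature.

```python
import re

# Matches <PrivateImplementationDetails>{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
GUID_NAME = re.compile(
    rb"<PrivateImplementationDetails>\{"
    rb"([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})\}"
)

def renumber_private_impl(heap: bytes) -> bytes:
    """Replace each GUID with a running number padded to the GUID's length."""
    seen = {}

    def repl(match):
        guid = match.group(1)
        if guid not in seen:
            seen[guid] = len(seen) + 1         # first GUID -> 1, next -> 2, ...
        padded = str(seen[guid]).zfill(len(guid)).encode("ascii")
        return b"<PrivateImplementationDetails>{" + padded + b"}"

    return GUID_NAME.sub(repl, heap)
```

Because the padded number has exactly as many characters as the GUID, the heap keeps its size and the indexes that the metadata tables hold into it stay correct.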
Unfortunately the "metadata stream headers" containing this information are quite buried inside the file format. You'll have to start at the NT Optional Header, find the pointer to the CLI Runtime Header, resolve it to a file position using the PE section table (it's an RVA, but you need a position inside the file), then go to the metadata root and read the stream headers.
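The walk described above might look roughly like this in Python. The field offsets are my reading of the PE/COFF and ECMA-335 layouts (CLI header as data directory 14, 40-byte section headers, "BSJB" metadata signature), so treat this as a sketch to check against the specifications rather than a verified parser. `find_strings_stream` and `rva_to_offset` are names chosen for this example.

```python
import struct

def rva_to_offset(data, sections_off, num_sections, rva):
    """Translate an RVA to a file offset using the PE section table."""
    for i in range(num_sections):
        base = sections_off + 40 * i           # each section header is 40 bytes
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, base + 8)
        if vaddr <= rva < vaddr + max(vsize, raw_size):
            return rva - vaddr + raw_ptr
    raise ValueError("RVA is not inside any section")

def find_strings_stream(data):
    """Return (file_offset, size) of the #Strings metadata stream."""
    pe_off, = struct.unpack_from("<I", data, 0x3C)       # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    num_sections, = struct.unpack_from("<H", data, pe_off + 6)
    opt_size, = struct.unpack_from("<H", data, pe_off + 20)
    opt_off = pe_off + 24                                # NT optional header
    magic, = struct.unpack_from("<H", data, opt_off)
    dirs_off = opt_off + (96 if magic == 0x10B else 112) # PE32 vs PE32+
    cli_rva, _ = struct.unpack_from("<II", data, dirs_off + 8 * 14)
    if cli_rva == 0:
        raise ValueError("no CLI header; not a managed assembly")
    sections_off = opt_off + opt_size
    cli_off = rva_to_offset(data, sections_off, num_sections, cli_rva)
    md_rva, _ = struct.unpack_from("<II", data, cli_off + 8)
    md_off = rva_to_offset(data, sections_off, num_sections, md_rva)
    if data[md_off:md_off + 4] != b"BSJB":
        raise ValueError("metadata root signature not found")
    ver_len, = struct.unpack_from("<I", data, md_off + 12)
    pos = md_off + 16 + ver_len                          # skip version string
    num_streams, = struct.unpack_from("<H", data, pos + 2)
    pos += 4
    for _ in range(num_streams):
        offset, size = struct.unpack_from("<II", data, pos)
        name_end = data.index(b"\0", pos + 8)
        if data[pos + 8:name_end] == b"#Strings":
            return md_off + offset, size                 # offset is root-relative
        # Name plus terminator is padded to a 4-byte boundary.
        pos += 8 + ((name_end - (pos + 8) + 1 + 3) & ~3)
    raise ValueError("#Strings stream not found")
```

The returned offset and size delimit the slice of the file in which the names can be searched and patched.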
I'm not sure about this, but just a thought: are you using any anonymous types for which the compiler might generate names behind the scenes, which might be different each time the compiler runs? Just a possibility which occurred to me. Probably one for Jon Skeet ;-)
Update: You could perhaps also use Reflector addins for comparison and disassembly.
Regarding the PDB GUID problem, if you specify that a PDB shouldn't be generated at compilation for Release builds, does the binary still contain the PDB's file system GUID?
To disable PDB generation:
If you're building from the console, use /debug- to get the same result.
Take a look at the answers to this question, especially the external link provided in the third one.
EDIT:
I actually wanted to link to this article.
You said that after a few project tweaks you were able to get C++ apps to compile repeatably to the same SHA1/MD5 values. I'm in the same boat as you in being in an industry with a third party test lab that needs to rebuild exactly the same executables repeatably.
In researching how to make this happen in VS2005, I came across your post here. Could you share the project tweaks you did to make the C++ apps build to the same SHA1/MD5 values consistently? It would be of great help to myself and perhaps any others that share this requirement.
Use ildasm.exe to fully disassemble both programs and compare the IL. Then you can "clean" the code using text-based methods and (predictably) recompile it again.
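One way to do the text-based comparison, assuming both assemblies have already been disassembled to .il text with ildasm.exe, is to mask the GUIDs before diffing. The Python sketch below is only an illustration: the helper names `normalize_il` and `il_diff` are invented here, and real ildasm output may contain other build-specific lines that need additional rules.

```python
import difflib
import re

# Any GUID, with or without surrounding braces.
GUID = re.compile(r"\{?[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}?")

def normalize_il(text: str):
    """Mask GUIDs and strip trailing whitespace so only real changes remain."""
    return [GUID.sub("{GUID}", line).rstrip() for line in text.splitlines()]

def il_diff(il_a: str, il_b: str):
    """Return unified-diff lines between two normalized IL listings."""
    return list(difflib.unified_diff(normalize_il(il_a), normalize_il(il_b), lineterm=""))
```

An empty result from `il_diff` suggests that the two builds differ only in generated GUIDs.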