Can anyone define the Windows PE Checksum Algorithm?
I would like to implement this in C#
I have looked here:
http://www.codeproject.com/KB/cpp/PEChecksum.aspx
And am aware of the ImageHlp.dll MapFileAndCheckSum function.
However, for various reasons, I would like to implement this myself.
The best I have found is here:
http://forum.sysinternals.com/optional-header-checksum-calculation_topic24214.html
But, I don't understand the explanation. Can anyone clarify how the checksum is calculated?
Thanks!
Update
From the code example, I do not understand what the following means, or how to translate it into C#:
    sum -= sum < low 16 bits of CheckSum in file // 16-bit borrow
    sum -= low 16 bits of CheckSum in file
    sum -= sum < high 16 bits of CheckSum in file
    sum -= high 16 bits of CheckSum in file
Update #2
Thanks. I also came across some Python code that does something similar here:
    def generate_checksum(self):
        # This will make sure that the data representing the PE image
        # is updated with any changes that might have been made by
        # assigning values to header fields as those are not automatically
        # updated upon assignment.
        #
        self.__data__ = self.write()

        # Get the offset to the CheckSum field in the OptionalHeader
        #
        checksum_offset = self.OPTIONAL_HEADER.__file_offset__ + 0x40 # 64

        checksum = 0

        # Verify the data is dword-aligned. Add padding if needed
        #
        remainder = len(self.__data__) % 4
        data = self.__data__ + ( '\0' * ((4-remainder) * ( remainder != 0 )) )

        for i in range( len( data ) / 4 ):

            # Skip the checksum field
            #
            if i == checksum_offset / 4:
                continue

            dword = struct.unpack('I', data[ i*4 : i*4+4 ])[0]
            checksum = (checksum & 0xffffffff) + dword + (checksum>>32)
            if checksum > 2**32:
                checksum = (checksum & 0xffffffff) + (checksum >> 32)

        checksum = (checksum & 0xffff) + (checksum >> 16)
        checksum = (checksum) + (checksum >> 16)
        checksum = checksum & 0xffff

        # The length is the one of the original data, not the padded one
        #
        return checksum + len(self.__data__)
However, it's still not working for me - here is my conversion of this code:
    using System;
    using System.IO;

    namespace CheckSumTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                var data = File.ReadAllBytes(@"c:\Windows\notepad.exe");

                var PEStart = BitConverter.ToInt32(data, 0x3c);
                var PECoffStart = PEStart + 4;
                var PEOptionalStart = PECoffStart + 20;
                var PECheckSum = PEOptionalStart + 64;
                var checkSumInFile = BitConverter.ToInt32(data, PECheckSum);
                Console.WriteLine(string.Format("{0:x}", checkSumInFile));

                long checksum = 0;

                var remainder = data.Length % 4;
                if (remainder > 0)
                {
                    Array.Resize(ref data, data.Length + (4 - remainder));
                }

                var top = Math.Pow(2, 32);

                for (int i = 0; i < data.Length / 4; i++)
                {
                    if (i == PECheckSum / 4)
                    {
                        continue;
                    }
                    var dword = BitConverter.ToInt32(data, i * 4);
                    checksum = (checksum & 0xffffffff) + dword + (checksum >> 32);
                    if (checksum > top)
                    {
                        checksum = (checksum & 0xffffffff) + (checksum >> 32);
                    }
                }

                checksum = (checksum & 0xffff) + (checksum >> 16);
                checksum = (checksum) + (checksum >> 16);
                checksum = checksum & 0xffff;

                checksum += (uint)data.Length;
                Console.WriteLine(string.Format("{0:x}", checksum));

                Console.ReadKey();
            }
        }
    }
Can anyone tell me where I'm being stupid?
Ok, finally got it working ok... my problem was that I was using ints not uints!!!
So, this code works (assuming data is 4-byte aligned, otherwise you'll have to pad it out a little) - and PECheckSum is the position of the CheckSum value within the PE (which is clearly not used when calculating the checksum!!!!)
Note that the following is C# code.
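The code listing that belonged to this answer is missing here. What follows is only a minimal sketch of what the description implies, not the original author's listing. The names PeChecksumSketch, Compute, checksumOffset and fileLength are made up for illustration. It assumes data is already padded to a multiple of 4 bytes, that the CheckSum field sits on a 4-byte boundary, and that BitConverter runs on a little-endian host. The point it illustrates is reading each dword with ToUInt32 rather than ToInt32:

```csharp
using System;

public static class PeChecksumSketch
{
    // Sums the file as unsigned 32-bit values, skipping the CheckSum field.
    // Assumption: data.Length is a multiple of 4 and checksumOffset is 4-byte aligned.
    public static uint Compute(byte[] data, int checksumOffset, uint fileLength)
    {
        ulong sum = 0;
        for (int i = 0; i < data.Length / 4; i++)
        {
            if (i == checksumOffset / 4)
                continue; // the stored checksum is not part of the sum

            uint dword = BitConverter.ToUInt32(data, i * 4); // unsigned read, no sign extension
            sum = (sum & 0xffffffff) + dword + (sum >> 32);
            if (sum > 0xffffffff)
                sum = (sum & 0xffffffff) + (sum >> 32); // wrap the carry back into the low bits
        }

        // Fold the 32-bit sum down to 16 bits.
        sum = (sum & 0xffff) + (sum >> 16);
        sum += sum >> 16;
        sum &= 0xffff;

        // fileLength is the size of the file before any padding was added.
        return (uint)sum + fileLength;
    }
}
```

With the variables from the question this would be called as PeChecksumSketch.Compute(data, PECheckSum, (uint)originalLength), where originalLength is the file size taken before Array.Resize pads the buffer.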
The code in the forum post is not strictly the same as what was noted during the actual disassembly of the Windows PE code. The CodeProject article you reference gives the "fold 32-bit value into 16 bits" as:
Which you could translate into C# as:
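Neither listing from this answer is reproduced here. As a hedged illustration of the fold step only (the class and method names are invented for this sketch, and it is not the article's code), folding a 32-bit sum into 16 bits could look like this:

```csharp
public static class ChecksumFold
{
    // Folds the high 16 bits of a 32-bit sum into the low 16 bits.
    public static uint Fold32To16(uint sum)
    {
        uint high = sum >> 16;  // take the high-order word
        sum &= 0xFFFF;          // keep only the low-order word
        sum += high;            // fold high into low; may carry into bit 16

        high = sum >> 16;       // carry produced by the first fold (0 or 1)
        sum += high;            // fold that carry back in
        return sum & 0xFFFF;    // mask to 16 bits
    }
}
```

For example, 0x12345678 folds to 0x1234 + 0x5678 = 0x68AC, and 0xFFFF0001 folds to 0x0001 because the first addition carries out of 16 bits.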
No one really answered the original question of "Can anyone define the Windows PE Checksum Algorithm?" so I'm going to define it as simply as possible. A lot of the examples given so far are optimized for unsigned 32-bit integers (aka DWORDs), but if you just want to understand the algorithm itself at its most fundamental, it is simply this:
1. Using an unsigned 16-bit integer (aka a WORD) to store the checksum, add up all of the WORDs of the data except for the 4 bytes of the PE optional header checksum. If the file is not WORD-aligned, then the missing last byte is treated as 0x00.
2. Convert the checksum from a WORD to a DWORD and add the size of the file.
The PE checksum algorithm above is effectively the same as the original MS-DOS checksum algorithm. The only differences are the location to skip and the final step: instead of XORing with 0xFFFF at the end, the size of the file is added.
From my WinPEFile class for PHP, the above algorithm looks like:
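The PHP listing is not reproduced here. As a rough C# sketch of the same word-based description (not the WinPEFile code; the names are invented), the two steps might be written as below. It assumes checksumOffset is even, and it assumes that a carry out of 16 bits is wrapped back into the sum, which the description above does not spell out but which matches the carry handling in the other answers:

```csharp
public static class WordChecksumSketch
{
    // Step 1: sum little-endian WORDs, skipping the 4 bytes of the CheckSum field.
    // Step 2: widen to 32 bits and add the file size.
    public static uint Compute(byte[] data, int checksumOffset)
    {
        uint sum = 0;
        for (int i = 0; i < data.Length; i += 2)
        {
            if (i == checksumOffset || i == checksumOffset + 2)
                continue; // both WORDs of the CheckSum field are excluded

            uint lo = data[i];
            uint hi = (i + 1 < data.Length) ? data[i + 1] : 0u; // odd length: missing byte is 0x00
            sum += lo | (hi << 8);

            if (sum > 0xFFFF)
                sum = (sum & 0xFFFF) + 1; // assumption: wrap the 16-bit carry around
        }
        return sum + (uint)data.Length;
    }
}
```

Because the running value never exceeds 16 bits between iterations, no separate fold step is needed at the end.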
The Java code from emmanuel below may not work. In my case it hangs and does not complete. I believe this is due to the heavy use of IO in the code, in particular the data.read() calls. One solution is to swap these for an array, with the RandomAccessFile reading the file into one or more byte arrays, either fully or incrementally.
I attempted this, but the calculation was too slow because of the conditional that checks the checksum offset in order to skip the checksum header bytes. I would imagine that the OP's C# solution has a similar problem.
The code below removes this as well.
However, I still thought the code was too verbose and clunky, so I swapped out the raf for a channel and rewrote the culprit bytes to zeros to eliminate the conditional. This code could probably still do with a cache-style buffered read.
The CheckSum field is 32 bits long and is calculated as follows.
1. Add all dwords (32 bit pieces) of the entire file to a sum
Add all dwords of the entire file, including all headers and all of the contents but not the CheckSum field itself, to a dword. If the dword overflows, add the overflowed bit back to the first bit (2^0) of the dword.
If the file is not entirely divisible into dwords (4 byte pieces) see 2.
The best way I know to realize this is by using the GNU C compiler's integer overflow builtin function __builtin_uadd_overflow.
In the original ChkSum function documented by Jeffrey Walton, the sum was calculated by performing an add (%esi),%eax, where esi contains the base address of the file and eax is 0, and then adding the rest of the file with adc.
The first add adds the first dword, ignoring any carry flag. The next dwords are added by the adc instruction, which does the same thing as add but also adds any carry flag that was set before executing the instruction. The last adc $0x0,%eax adds only the last carry flag, if it was set, because it must not be discarded.
Please keep in mind that the dword of the CheckSum field itself should not be added.
2. Add the remainder to the sum if there is one
If the file is not entirely divisible into dwords, add the remainder as a zero-padded dword. For example: say your file is 15 bytes long and looks like this
    0E 1F BA 0E | 00 B4 09 CD | 21 B8 01 4C | CD 21 54
You need to add the remainder as 0x005421CD to the sum. My system is a little-endian system. I do not know whether the checksum would change because of the byte order on big-endian systems, or whether you would just simulate this behaviour.
I do this by rounding up the buffer_size to the next byte count divisible by 4 without remainder, or put differently: the next whole dword count represented in bytes. Then I allocate with calloc because it initializes the memory block with all zeros.
3. Add the lower word (16 bit piece) and the higher word of the sum together.
    sum=(sum&0xffff)+(sum>>16);
4. Add the new higher word once again
    sum+=(sum>>16);
5. Only keep the lower word
    sum&=0xffff;
6. Add the number of bytes in the file to the sum
    return(sum+size);
This is how I wrote it. It is not C#, but C. off_t size is the number of bytes in the file. uint32_t *base is a pointer to the file loaded into memory. The block of memory should be padded with zeros at the end to the next bytecount divisible by 4.
If you want you can see the code in action and read a little bit more here.
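Since the question asks for C#, steps 1 and 2 above can be sketched as follows. This is an illustration under assumptions, not a port of the C code from this answer: the class, method and parameter names are invented, the dwords are assembled by hand so that the result does not depend on the host byte order, and skipDwordIndex is the index of the dword holding the CheckSum field (pass -1 when there is none, as in the 15-byte example):

```csharp
using System;

public static class DwordSumSketch
{
    // Copies the file into a zero-filled buffer whose length is the next multiple of 4,
    // which plays the role of calloc in the C description above.
    public static byte[] PadToDwords(byte[] file)
    {
        byte[] padded = new byte[(file.Length + 3) / 4 * 4]; // new arrays start zeroed
        Array.Copy(file, padded, file.Length);
        return padded;
    }

    // Reads the little-endian dword at the given dword index.
    public static uint DwordAt(byte[] padded, int index)
    {
        int o = index * 4;
        return (uint)padded[o]
             | ((uint)padded[o + 1] << 8)
             | ((uint)padded[o + 2] << 16)
             | ((uint)padded[o + 3] << 24);
    }

    // Adds every dword except the skipped one, feeding each carry back into bit 0,
    // which mimics the add/adc sequence.
    public static uint SumWithCarry(byte[] padded, int skipDwordIndex)
    {
        ulong sum = 0;
        for (int i = 0; i < padded.Length / 4; i++)
        {
            if (i == skipDwordIndex)
                continue;
            sum += DwordAt(padded, i);
            sum = (sum & 0xffffffff) + (sum >> 32); // end-around carry
        }
        return (uint)sum;
    }
}
```

For the 15-byte example, PadToDwords returns 16 bytes and DwordAt(padded, 3) yields 0x005421CD, the zero-padded remainder described in step 2. Steps 3 to 6 then apply to the value returned by SumWithCarry, using the unpadded file size.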
The Java example is not entirely correct. The following Java implementation corresponds with the result of Microsoft's original implementation from Imagehlp.MapFileAndCheckSumA.
It's important that the input bytes are masked with inputByte & 0xff and that the resulting long is masked again when it's used in the addition term with currentWord & 0xffffffffL (note the L). In this case, Java is a bit inconvenient.
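The Java listing is not reproduced here. A rough C# analogue of the masking concern, offered as an illustration rather than a translation of that code: C# bytes are already unsigned, but a signed 32-bit value still sign-extends when it is widened to 64 bits, which is the same class of problem as in the question's use of ToInt32 with a long accumulator. The class and method names are invented for this sketch:

```csharp
using System;

public static class SignExtensionSketch
{
    // Widening a signed 32-bit value to 64 bits copies the sign bit upward,
    // so a dword such as 0xFFFFFFFF becomes -1.
    public static long WidenSigned(byte[] data, int offset)
    {
        return BitConverter.ToInt32(data, offset);
    }

    // Masking with a long constant keeps only the low 32 bits,
    // which gives the same value as reading the dword as unsigned.
    public static long WidenMasked(byte[] data, int offset)
    {
        return BitConverter.ToInt32(data, offset) & 0xffffffffL;
    }
}
```

Adding the sign-extended value to a 64-bit running sum subtracts instead of adds whenever the top bit of the dword is set, so either the mask or an unsigned read is needed.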
I was trying to solve the same issue in Java. Here is Mark's solution translated into Java, using a RandomAccessFile instead of a byte array as input:
If you need a short unsafe version... (it does not need double or long integers, and it does not need array alignment inside the algorithm)
I would like to clarify the situation with the PE checksum and de-mystify the whole story.
The checksum algorithm is just like this:
There is no 'folding' whatsoever. This is simple finite addition taking any overflows into account. In terms of x86 CPU instructions, this is the 'adc' (add with carry) mnemonic instead of the 'add' mnemonic.
In plain C, without any compiler intrinsics, this is:
I have split the entire loop into two parts (before and after the CheckSum structure element, which has to be excluded) to make the CPU's branch prediction unit happy (otherwise, it would have to check for the CheckSum offset in every loop iteration).
Facing an overflow (i.e. CPU carry bit is 1) just means: 'If you compute x+y and the result is less than the previous sum (here: carry_check), then the carry bit is 1'.
The proof that this implementation is correct is simple: use 'user32.dll' from your Windows installation and compute the checksum (that is what I did for testing).
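The C listing from this answer is missing here. As a hedged C# sketch of the two ideas described above, namely detecting the carry by comparing against the previous sum and splitting the loop around the excluded element: the names are invented, the input is assumed to be the file already converted to an array of dwords, and skipIndex is assumed to be a valid index into that array. Only the carry_check name comes from the text:

```csharp
public static class CarrySketch
{
    // Adds value to sum; if the 32-bit addition wrapped around, the carry is added back in.
    public static uint AddWithCarry(uint sum, uint value)
    {
        uint carryCheck = sum;          // the previous sum, as in the description
        unchecked { sum += value; }     // wraps modulo 2^32 on overflow
        if (sum < carryCheck)           // smaller than before means the carry bit was 1
            unchecked { sum += 1; }
        return sum;
    }

    // Sums the dwords before and after skipIndex in two loops,
    // so no loop iteration has to test for the excluded CheckSum element.
    public static uint Sum(uint[] dwords, int skipIndex)
    {
        uint sum = 0;
        for (int i = 0; i < skipIndex; i++)
            sum = AddWithCarry(sum, dwords[i]);
        for (int i = skipIndex + 1; i < dwords.Length; i++)
            sum = AddWithCarry(sum, dwords[i]);
        return sum;
    }
}
```

For instance, AddWithCarry(0xFFFFFFFF, 2) wraps to 1, which is less than the previous sum, so the carry is added and the result is 2.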
Some additional remarks: