I am saving some objects, defined by my own classes, to a file (saving the stream data).
That all works fine, but I would like to store the CRC checksum of the file inside the file itself.
Then, whenever my application attempts to open a file, it can read the internally stored CRC value and check it against the actual file. If the CRC of the file matches the stored value, I can process the file normally; otherwise I display an error message saying the file is not valid.
I need some advice on how to do this, though. I thought I could do something like this:
- Save the file from my application.
- Calculate the CRC of the saved file.
- Edit the saved file to store the CRC value.
- Whenever a file is opened, check that its CRC matches the internal CRC value.
The problem is that as soon as a single byte of data is altered in the file (here, by writing the CRC into it), the CRC checksum comes out completely different, as expected.
Comments (4)
I'd generally prefer the approach where the CRC is excluded from the checking. But if that's not possible for some reason, there is a workaround:
You need to reserve 8 bytes: 4 for the CRC and 4 for compensation data. First fill the reserved bytes with a dummy value (say `0x00`). Then calculate the CRC and write it into the first 4 bytes, and finally change the other 4 bytes so that the CRC of the file stays the same. For details on how to perform this calculation, see: Reversing CRC32
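A minimal sketch of that workaround in Python, assuming the standard CRC-32 exposed by `zlib.crc32` and a hypothetical layout in which the 8 reserved bytes sit at the very start of the file (CRC field first, compensation field second). The names `forge_patch`, `embed_crc` and `verify` are invented for this illustration. Instead of the table-based reversal the linked article describes, it uses the fact that CRC-32 is affine over GF(2) and solves for the 4 compensation bytes with Gaussian elimination, which costs 33 CRC passes over the data but is easy to check:

```python
import zlib

def forge_patch(data: bytearray, offset: int, target: int) -> None:
    """Overwrite data[offset:offset+4] so that zlib.crc32(data) == target."""
    def crc_with(patch: int) -> int:
        data[offset:offset + 4] = patch.to_bytes(4, "little")
        return zlib.crc32(data)

    base = crc_with(0)
    # CRC-32 is affine over GF(2): flipping one input bit XORs a fixed
    # pattern into the result, whatever the other bits are.
    basis = {}  # highest set bit of an effect -> (effect, patch bits causing it)
    for bit in range(32):
        effect, mask = crc_with(1 << bit) ^ base, 1 << bit
        while effect:
            top = effect.bit_length() - 1
            if top not in basis:
                basis[top] = (effect, mask)
                break
            known_effect, known_mask = basis[top]
            effect ^= known_effect
            mask ^= known_mask
    # Express the required change as a combination of the basis effects.
    want, patch = target ^ base, 0
    while want:
        known_effect, known_mask = basis[want.bit_length() - 1]
        want ^= known_effect
        patch ^= known_mask
    crc_with(patch)  # leaves the solved patch bytes in place

def embed_crc(payload: bytes) -> bytes:
    # Assumed layout: [4-byte CRC][4-byte compensation][payload]
    data = bytearray(8) + bytearray(payload)   # reserved bytes start as 0x00
    crc = zlib.crc32(data)                     # CRC with the dummy values
    data[0:4] = crc.to_bytes(4, "little")      # store it in the first field
    forge_patch(data, 4, crc)                  # compensate so the CRC is unchanged
    return bytes(data)

def verify(data: bytes) -> bool:
    # The CRC of the whole file, stored field included, must equal the field.
    return len(data) >= 8 and zlib.crc32(data) == int.from_bytes(data[:4], "little")
```

With this layout a reader needs no special handling: it computes the CRC of the entire file and compares it with the first 4 bytes.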
I actually used this in one of my projects:
I was designing a file format based on zip. The first file in the archive is stored uncompressed and serves as the header file. This also means it is stored at a fixed offset in the file. So far this is pretty standard, and similar to, for example, ePub.
Then I decided to include a SHA-1 hash in the header, to give each file a unique content-based ID and for integrity checking. Since the header, and thus the SHA-1 hash, is at a known offset in the file, masking it when hashing is trivial. So I put in a dummy hash and create the zip file, then hash the file and fill in the real hash.
But now there is a problem: zip stores the CRC of every contained file, and not only in one place, which would be easy to mask when SHA-1 hashing, but also in a second place with a variable offset near the end of the file. So I decided to go with CRC faking: I get my strong hash, and zip gets its valid CRC32.
And since I was already faking the CRC for the final file, I decided faking it for the original header file wouldn't hurt either. Thus all files in this format now start with a header file that has the CRC `0xD1CE0DD5`.
Simply put, you need to exclude the bytes used to store the checksum from the checksum calculation.
Write the checksum as the last thing in the file, calculated over the contents of the file apart from the checksum itself. When you come to read the file, calculate the checksum over the contents that precede the stored checksum and compare the two. Alternatively, you could write the checksum as the first bytes of the file using random access. It works as long as you know where it is.
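As a rough illustration of the "checksum last" variant, here is a short Python sketch. It assumes CRC-32 from `zlib` and a 4-byte little-endian trailer; the function names and the trailer format are choices made for this example, not something the answer specifies:

```python
import struct
import zlib

def write_with_crc(path: str, payload: bytes) -> None:
    """Write the payload followed by a 4-byte CRC-32 of the payload."""
    with open(path, "wb") as f:
        f.write(payload)
        f.write(struct.pack("<I", zlib.crc32(payload)))

def read_with_crc(path: str) -> bytes:
    """Return the payload, or raise ValueError if the trailer does not match."""
    with open(path, "rb") as f:
        blob = f.read()
    if len(blob) < 4:
        raise ValueError("file too short to contain a checksum")
    payload = blob[:-4]                        # everything before the trailer
    (stored,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: file is not valid")
    return payload
```

The application would call `read_with_crc` when opening a file and show its error message when `ValueError` is raised.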
Store the CRC as part of the file itself, but don't include its bytes in the CRC calculation. If you have some sort of fixed header, zero out the CRC field before passing the data to the CRC function. If not, just append the CRC to the end of the file and pass everything but the last 4 bytes into the CRC function.
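The fixed-header case might look roughly like this in Python. The header layout (a 4-byte `DEMO` signature followed by a 4-byte CRC field) is entirely hypothetical and only there to make the example concrete:

```python
import zlib

MAGIC = b"DEMO"          # hypothetical 4-byte file signature
CRC_OFFSET = len(MAGIC)  # the CRC field sits right after the signature
CRC_SIZE = 4

def pack(payload: bytes) -> bytes:
    """Build header + payload, with the CRC computed over a zeroed CRC field."""
    blob = bytearray(MAGIC + bytes(CRC_SIZE) + payload)
    crc = zlib.crc32(blob)
    blob[CRC_OFFSET:CRC_OFFSET + CRC_SIZE] = crc.to_bytes(CRC_SIZE, "little")
    return bytes(blob)

def is_valid(blob: bytes) -> bool:
    """Zero the CRC field in a scratch copy, recompute, and compare."""
    if len(blob) < CRC_OFFSET + CRC_SIZE or not blob.startswith(MAGIC):
        return False
    stored = int.from_bytes(blob[CRC_OFFSET:CRC_OFFSET + CRC_SIZE], "little")
    scratch = bytearray(blob)
    scratch[CRC_OFFSET:CRC_OFFSET + CRC_SIZE] = bytes(CRC_SIZE)
    return zlib.crc32(scratch) == stored
```

Zeroing the field rather than skipping it keeps the CRC call a single pass over one contiguous buffer.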
Alternatively, if the files are stored on an NTFS drive and you don't need to transfer them to another computer, you can use NTFS Alternate Data Streams to store the CRCs. Basically, you open the file with the ADS name separated from the filename by a colon (like `C:\file.txt:CRC`). Windows handles the difference internally, so you can use plain TFileStream functions to manipulate them. Alternate data streams are stored separately from the standard file stream, so opening or modifying just `C:\file.txt` won't affect it. So, the code would look like this:
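A minimal sketch of that idea follows, in Python rather than the TFileStream code the answer refers to, so treat it as an illustration and not the answer's own code. The stream name `CRC` comes from the path in the answer; storing the value as 8 hex digits of text, and the helper names, are assumptions made here. On a filesystem without stream support, the `open` call may fail or create an ordinary file whose name contains a colon:

```python
import zlib

def ads_path(path: str, stream: str = "CRC") -> str:
    # On NTFS, "name:stream" addresses an alternate data stream of "name".
    return f"{path}:{stream}"

def store_crc(path: str) -> int:
    """Compute the CRC-32 of the main stream and save it in the side stream."""
    with open(path, "rb") as f:
        crc = zlib.crc32(f.read())
    with open(ads_path(path), "w") as s:
        s.write(f"{crc:08X}")
    return crc

def check_crc(path: str) -> bool:
    """Return True only if a stored CRC exists and matches the main stream."""
    try:
        with open(ads_path(path)) as s:
            stored = int(s.read().strip(), 16)
    except (OSError, ValueError):
        return False  # no side stream, or its contents are not hex digits
    with open(path, "rb") as f:
        return zlib.crc32(f.read()) == stored
```

Because the CRC lives outside the main stream, the whole file can be hashed with no bytes masked out.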
If you need to find all of the alternate data streams attached to a file (there can be more than one), you can iterate over them using BackupRead (http://msdn.microsoft.com/en-us/library/windows/desktop/aa362509%28v=vs.85%29.aspx). Internet Explorer uses ADSs to support the "This file has been downloaded from the Internet. Are you sure you want to open it?" prompt.
I would recommend storing the checksum in another file, maybe a .ini file. Or, for a really weird idea, you could incorporate the checksum into the filename.
For example: MyFile_checksum_digits_here.dat
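The filename variant could be sketched like this in Python, assuming CRC-32 written as 8 uppercase hex digits after the last underscore. That naming scheme and the helper names are made up for the example:

```python
import re
import zlib
from pathlib import Path

def name_with_crc(stem: str, payload: bytes, suffix: str = ".dat") -> str:
    """Build a name such as MyFile_1A2B3C4D.dat from the payload's CRC-32."""
    return f"{stem}_{zlib.crc32(payload):08X}{suffix}"

def matches_name(path: Path) -> bool:
    """Check the file contents against the CRC embedded in its name."""
    m = re.fullmatch(r".+_([0-9A-F]{8})", path.stem)
    return bool(m) and zlib.crc32(path.read_bytes()) == int(m.group(1), 16)
```

The obvious drawback is that renaming the file breaks the check, which is one reason to prefer storing the checksum inside the file.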