What is the simplest way to check whether a file's contents have changed in C#? SHA, CRC32, MD5, or something else?

Posted 2025-01-13 08:06:49

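I want to check whether the contents of a file have changed. My plan is to add a hash as the last line of the file.

Later, I can read the file, hash it (everything except the last line), and compare the result with the last line of the file (the initial hash).

I can't use the last-modified date/time. I need to use a hash, or some kind of encoding, stored inside the file. I'm writing the application in C#. What is the most reasonable and simplest way to do this? I don't know which of the following suits me best: SHA-1/2/3, CRC16/32/64, or MD5. I don't need the method to be fast or secure.

Thank you!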

烟织青萝梦 2025-01-20 08:06:49

It seems to me that you're going to have a chicken-and-egg problem if you store the hash inside the file. You won't know the hash until you hash the file. But when you hash the file and then add that value to the end of the file, the hash will change. So clearly you need to hash the file without including the stored hash itself. You already said this, but I'm repeating it to clarify my next points.

The trick is that hash/sum algorithms give you the sum of the entire file (or byte stream, or whatever). They don't tend to give you a "running total", as it were. That means you'll need to separate the hash from the rest of the content before testing whether the content has changed. That is, unless you write a custom hashing tool yourself.

This is of course possible with any hashing algorithm, but the fact that you are asking this question leads me to believe that you probably won't want the hassle of writing a custom (e.g.) SHA256 tool specifically designed to stop when it reaches the stored hash.
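For what it's worth, .NET's built-in `IncrementalHash` class accepts data in chunks, so stopping short of the stored hash may not need a fully custom tool. The following is only a minimal sketch, assuming you already know how many bytes precede the stored hash; `HashPrefix` and its `count` parameter are hypothetical names, not part of any library, and the code is written as C# top-level statements.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: SHA-256 of only the first `count` bytes of a stream.
// The data is read in chunks, so the file never has to fit in memory.
static byte[] HashPrefix(Stream input, long count)
{
    using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
    byte[] buffer = new byte[81920];
    long remaining = count;
    while (remaining > 0)
    {
        int want = (int)Math.Min(buffer.Length, remaining);
        int read = input.Read(buffer, 0, want);
        if (read == 0) break;               // stream ended before `count` bytes
        hasher.AppendData(buffer, 0, read); // fold this chunk into the hash
        remaining -= read;
    }
    return hasher.GetHashAndReset();
}
```

With a `FileStream` as `input`, `count` would be the file length minus the length of the trailing hash line.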

To my eye, you have three choices:

1. Store the hash separately from your file - or at the minimum write a temporary file which does not contain the hash, and hash that. This would allow you to use a hashing tool already built into C# without any modification or fancy trickery. I know this does not exactly match your requirements as listed, but it's an option that you might consider.
2. You don't mention the size of the file, but if it is sufficiently small, you could simply slurp it up into memory minus the bytes of the hash, hash your in-memory data using a built-in tool, and then compare. This would again allow you to use built-in tools.
3. Use a custom hashing tool that purposely drops out when it reaches the end of the "interesting" data. If that's the case, I would unquestionably recommend a non-secure hashing method like CRC, simply because it will be so much easier to understand and modify the code yourself (it is much simpler code after all). You already mention that you don't need it to be secure, so this would meet your requirements.
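The second choice (read the file into memory, hash everything except the stored hash, then compare) could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: the `#sha256:` marker and the `HashOf`, `Seal` and `Verify` names are made up for this example, the file is assumed to be text that fits in memory, and the code is written as C# top-level statements.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical marker that introduces the trailing hash line.
const string Marker = "#sha256:";

// Hex-encoded SHA-256 of the UTF-8 bytes of `content`.
string HashOf(string content)
{
    using var sha = SHA256.Create();
    byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(content));
    return BitConverter.ToString(digest).Replace("-", "");
}

// Returns `content` with "<Marker><hash>" appended as its last line.
string Seal(string content)
{
    if (!content.EndsWith("\n", StringComparison.Ordinal)) content += "\n";
    return content + Marker + HashOf(content);
}

// Splits off the last line and compares it with the hash of everything before it.
bool Verify(string sealedText)
{
    string text = sealedText.TrimEnd('\r', '\n'); // tolerate a trailing newline
    int i = text.LastIndexOf('\n');
    if (i < 0) return false;                      // no separate hash line found
    string body = text.Substring(0, i + 1);
    string lastLine = text.Substring(i + 1);
    return lastLine == Marker + HashOf(body);
}
```

In use, something like `File.WriteAllText(path, Seal(File.ReadAllText(path)))` would stamp a file and `Verify(File.ReadAllText(path))` would check it later. Both sides have to read the text the same way, since the hash is computed over the re-encoded UTF-8 bytes rather than the raw bytes on disk.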

If you decide to go with option #3, then I would suggest heading over to Rosetta Code and searching for a CRC algorithm in C#. From there you can read your file, leave out the bytes of the hash, and send the remainder through your hashing algorithm. The algorithm listed there processes all bytes at once, but it would be trivial to turn the accumulator into a parameter so that you could send data in chunks. This would allow you to work on an arbitrarily large file in situ.
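To illustrate what "turning the accumulator into a parameter" might look like, here is a small sketch of the standard reflected CRC-32 (polynomial 0xEDB88320) computed bit by bit. It is not the Rosetta Code listing; `Crc32Update` and `Crc32OfPrefix` are hypothetical names, and the code is written as C# top-level statements.

```csharp
using System;
using System.IO;

// Folds `count` bytes of `buffer` into a running CRC-32 state.
// The caller owns the state, so data can arrive in chunks of any size.
static uint Crc32Update(uint state, byte[] buffer, int count)
{
    for (int i = 0; i < count; i++)
    {
        state ^= buffer[i];
        for (int bit = 0; bit < 8; bit++)
            state = (state & 1) != 0 ? (state >> 1) ^ 0xEDB88320u : state >> 1;
    }
    return state;
}

// CRC-32 of the first `length` bytes of a stream, stopping before the stored hash.
static uint Crc32OfPrefix(Stream input, long length)
{
    uint state = 0xFFFFFFFFu;            // conventional initial value
    byte[] buffer = new byte[4096];
    long remaining = length;
    while (remaining > 0)
    {
        int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
        if (read == 0) break;            // stream ended early
        state = Crc32Update(state, buffer, read);
        remaining -= read;
    }
    return state ^ 0xFFFFFFFFu;          // conventional final XOR
}
```

A table-driven version would be faster, but since speed is not a requirement here the bitwise loop keeps the code short. The well-known check value for the ASCII string "123456789" is 0xCBF43926, which is a handy sanity test.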

[EDIT] FWIW, I have already gone down a similar path. In my case I wrote a custom tool which allows us to incrementally copy extremely large files over the WAN, files so big that we had problems getting them to copy safely. Proper use of the tool is to remote into the source server, pre-run a CRC32 check, and save the sums at arbitrary intervals. Then one copies the CRC32 sums to the client side and starts copying the file. Should the copy get stopped in the middle, or possibly corrupted somehow, one can simply supply the name of the local partial, the remote source, the file containing the CRC32 sums, and finally a target. The program will start copying from the local partial, and will only start copying from the remote when a partial CRC32 sum mismatch is found. Our problem was that simply resuming from the end of the bytes already copied did not always work, which was frustrating since the copy takes so long. My teammates and I joked several times that we might try USB drives and homing pigeons...
