Checksum for a multi-page TIFF document
I want to calculate the checksum of a large TIFF file that might not fit in memory. Will I get a reliable value if I instead calculate the checksum of every page and then calculate the checksum of the array of page checksums? Or will I run into a mathematical problem that I am not seeing, so that the only correct way is in fact to work with the whole file?
Thanks!
Comments (1)
I don't know if I understood the question correctly, but with most checksum algorithms you only have to load a small part of the message into memory at a time. Because of that, operating on a stream instead of an in-memory buffer is possible and has been done before.
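To illustrate the streaming idea, here is a minimal sketch using Python's standard `hashlib`. The function name `stream_digest`, the SHA-256 default and the 1 MiB chunk size are my own choices for the example, not something from the question.

```python
import hashlib
import io


def stream_digest(stream, algorithm="sha256", chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk, so only chunk_size bytes are held at once."""
    h = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        h.update(chunk)
    return h.hexdigest()


# Stand-in for a large file; with a real file you would pass open(path, "rb").
data = b"bytes standing in for a large TIFF " * 1000
streamed = stream_digest(io.BytesIO(data), chunk_size=4096)
print(streamed == hashlib.sha256(data).hexdigest())
```

The chunked result is the same as hashing the whole buffer in one call, because `update` just feeds the next bytes of the same message.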
Edit:
I only know that you have to be careful with Adler-32 when checksumming short messages: you would not be covering the whole hash space, so false positives are more likely (and yes, the array of checksums would probably be a short message).
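The Adler-32 point can be checked with a little arithmetic. The sketch below assumes, purely for illustration, a file of four pages with a 4-byte checksum each, so the array of checksums is 16 bytes long. Adler-32 keeps two 16-bit sums modulo 65521; `adler32_parts` is a hypothetical helper that splits the value returned by `zlib.adler32` into those halves.

```python
import zlib


def adler32_parts(data):
    """Split an Adler-32 value into its low half (A) and high half (B)."""
    value = zlib.adler32(data)
    return value & 0xFFFF, value >> 16


n = 16  # assumed length: 4 page checksums of 4 bytes each
worst_case = bytes([255]) * n  # the input that makes both sums as large as possible
a, b = adler32_parts(worst_case)

# Upper bounds for any n-byte input, as long as neither sum reaches 65521:
a_bound = 1 + 255 * n
b_bound = n + 255 * n * (n + 1) // 2
print(a, a_bound, b, b_bound)
```

For a 16-byte input neither half can get anywhere near 65520, so only a small fraction of the 32-bit output space is reachable.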
With crypto hashes I honestly don't know. My intuition is that md5(msg1 + msg2 + ...) is as reliable as md5(md5(msg1) + md5(msg2) + ...), but we'll have to wait for someone smarter than me to give a definitive answer :)
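The two constructions can be compared side by side. This is only a sketch: `flat_digest` and `two_level_digest` are names made up for the example, and it uses SHA-256 from `hashlib` where the text above says md5 (pass `algorithm="md5"` if your Python build provides it). It does not settle the reliability question; it only shows how the two values behave.

```python
import hashlib


def flat_digest(pages, algorithm="sha256"):
    """hash(msg1 + msg2 + ...): feed every page into one running hash."""
    h = hashlib.new(algorithm)
    for page in pages:
        h.update(page)
    return h.hexdigest()


def two_level_digest(pages, algorithm="sha256"):
    """hash(hash(msg1) + hash(msg2) + ...): hash each page, then hash the digests."""
    outer = hashlib.new(algorithm)
    for page in pages:
        outer.update(hashlib.new(algorithm, page).digest())
    return outer.hexdigest()


pages = [b"page one", b"page two", b"page three"]
print(flat_digest(pages) == two_level_digest(pages))

# Same bytes, different page boundaries:
print(flat_digest([b"ab", b"c"]) == flat_digest([b"a", b"bc"]))
print(two_level_digest([b"ab", b"c"]) == two_level_digest([b"a", b"bc"]))
```

Both values are deterministic, but they are not interchangeable: the two-level value also depends on where the page boundaries fall, so it can only be compared against another value computed the same way. Neither function needs more than one page in memory at a time if `pages` is a generator.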