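For a peer-to-peer application that can resume file transfers, is checking the file size/modified date enough to detect changes before resuming a file?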

Posted on 2024-11-26 11:46:03

I'm working on a networked application that has a peer-to-peer file transfer component (think instant messenger), and I'd like to make it able to resume file transfers gracefully.

If a file transfer is in progress and one user drops out, the recipient still knows how much of the file they have successfully received, and therefore where to resume the transfer from. However, if the file has changed in the meantime, how can this be detected? For this question, I'm less concerned with corruption introduced by the network than with corruption caused by the source file being altered.

My initial approach was to have the sender hash the file before sending it, so the recipient has a hash to check the finished file against. However, this only detects corruption at the very end, unless the file is also re-hashed on every resume. This could be alleviated by treating the file as a series of chunks and hashing each chunk separately. The bigger problem with hashing, though, is that it can take a really, really long time, which is a bad user experience when someone just wants to send something immediately (e.g. the file to be sent is a Linux ISO on a slow network share).
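To make the per-chunk idea concrete, here is a minimal sketch. The chunk size, the choice of SHA-256, and the function name `chunk_digests` are all assumptions for illustration, not something fixed by the design described above.

```python
import hashlib

# Assumed chunk size (1 MiB); pick whatever suits the transfer protocol.
CHUNK_SIZE = 1024 * 1024

def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Return one SHA-256 hex digest per fixed-size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests
```

With per-chunk digests, the recipient can verify each chunk as it arrives instead of waiting for the whole file, although the sender still has to read every chunk at some point to hash it.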

I was thinking about switching to simply checking the file size and modified date each time a transfer begins or is resumed. This is clearly not foolproof, but unless I'm missing something (and please correct me if I am), almost every tool an end user would use to alter a file is well-behaved and will at the very least update the modified date, and even if it doesn't, the change in size should catch 99% of cases. Does this seem like an acceptable compromise? Bad idea?
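A minimal sketch of that size-plus-modified-date check, assuming the sender records a fingerprint when the transfer starts and compares it again on resume. The names `file_fingerprint` and `source_unchanged` are hypothetical.

```python
import os

def file_fingerprint(path):
    """Return (size in bytes, modification time in nanoseconds) for the file."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def source_unchanged(path, saved_fingerprint):
    """True if the file's size and mtime still match the saved fingerprint."""
    return file_fingerprint(path) == tuple(saved_fingerprint)
```

This costs a single `stat` call per resume and reads no file data, which is why it is so much cheaper than hashing. It will not notice an edit that preserves both the size and the timestamp.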

How do the established protocols handle this?

Comments (1)

忘东忘西忘不掉你 2024-12-03 11:46:03

The quick answer to your question is that it will work in most cases, unless files are modified often.

Instead of cryptographic hashes, use checksums (CRC32, for example). They are much cheaper to compute when all you need to know is whether a file has been modified.

If a connection breaks, the recipient only needs to send the checksums of the chunks it has already received back to the source, which can then work out whether any of those chunks have been modified in the meantime. The source can then decide which chunks to resend, and send the missing ones.
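The resume step could be sketched roughly as follows, using `zlib.crc32` from the Python standard library. The chunk size and the names `chunk_crcs` and `chunks_to_send` are assumptions for illustration. Note that CRC32 catches accidental changes but is not designed to resist deliberate tampering.

```python
import zlib

# Assumed chunk size (1 MiB); both peers must agree on it.
CHUNK_SIZE = 1024 * 1024

def chunk_crcs(path, chunk_size=CHUNK_SIZE):
    """Return the CRC32 of each fixed-size chunk of the file."""
    crcs = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            crcs.append(zlib.crc32(chunk))
    return crcs

def chunks_to_send(path, receiver_crcs, chunk_size=CHUNK_SIZE):
    """Indices of chunks the sender must (re)send on resume.

    A chunk is sent if the receiver does not have it yet, or if its
    checksum no longer matches the sender's current copy of the file.
    """
    current = chunk_crcs(path, chunk_size)
    return [
        i for i, crc in enumerate(current)
        if i >= len(receiver_crcs) or receiver_crcs[i] != crc
    ]
```

Here the receiver reports the checksums it holds and the sender recomputes its own, so the sender still reads the whole file once per resume. Restricting the recomputation to the chunks the receiver already has would avoid that.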

In terms of user experience, chunks plus checksums are a better trade-off than hashing the whole file.
