Updating my program using a diff-based patching approach
Currently, my program updates itself by downloading the latest .tar.gz file containing the source code and extracting it over the directory where the program lives. There are two update "modes": one for users running the Python source, and one for users running the program as a Windows exe.
Over time, my program's download size has grown with each release because of new images, libraries, documentation and code. However, sometimes only the code changes from one release to the next, so the user ends up re-downloading all the images, documentation, etc. over and over when there are only small code changes.
I was thinking that a more efficient approach would be to use a patch/diff based system where the program incrementally updates itself from one version to another by only downloading small change sets.
However, how should I do this? If the user is running version 0.38 and 0.42 is available, do they download 0.38->0.39, 0.39->0.40, 0.40->0.41 and 0.41->0.42? How would I handle differences in binary files (images, in my case)?
I'd also have to maintain a repository containing all the patches, which isn't too bad: I'd just generate the diffs with each new release. But I guess this would be harder to do for executables than for pure Python code?
Any input is appreciated. Many thanks.
Comments (2)
I suggest that rather than reinventing your own update management system, you look at open source options such as Google Updater (which was open sourced over a year ago as Omaha). I imagine the Windows focus is OK, since you specifically mention Windows, but if you also need Mac support, similar functionality is offered by Update Engine. For Linux, you probably want to work with the specific distribution's package management system rather than an add-on one.
As you'll see in the Omaha overview, the focus is not specifically on determining and applying "deltas" rather than full updates, but on automating the process for the user's convenience (and for security, when updates address potential security issues). As for the differences, I would suggest behaving similarly to version control systems like Subversion (indeed, you can no doubt reuse much of svn's code): only text files are diffed, while binary files are handled all-or-nothing. For most binary file formats there's just too little gain, if any, in trying to send less than the whole new file when it has changed at all. For images in particular, and compressed files of all kinds more generally, a tiny change in the underlying content typically produces huge changes in the resulting file.
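To make the all-or-nothing, file-by-file idea concrete, here is a minimal sketch in Python. It is not how Omaha or svn work internally; it only assumes that the old and new releases are available as directory trees, and the function names `build_manifest` and `plan_update` are made up for illustration. Each release gets a manifest of SHA-256 hashes, and the updater fetches only the files whose hash differs.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's relative path (POSIX style) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def plan_update(old_manifest, new_manifest):
    """Return (files to download whole, files to delete) for an upgrade."""
    # A file is fetched if it is new or if its content hash changed.
    to_fetch = sorted(
        name for name, digest in new_manifest.items()
        if old_manifest.get(name) != digest
    )
    # A file is deleted if the new release no longer ships it.
    to_delete = sorted(name for name in old_manifest if name not in new_manifest)
    return to_fetch, to_delete
```

With this scheme the server only needs to publish one manifest per release; unchanged images and documentation are never downloaded again.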
If you think some or all of your binary files might actually benefit from differences and incremental patches, rather than all-or-nothing file-by-file replacement, I would suggest you first experiment with a specialized utility such as jojodiff to verify that. If it is indeed the case (perhaps only for some files, while others might as well be replaced entirely), you might package the patch part of it with your updater and run it as a subprocess from Python.
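Running a bundled patch tool as a subprocess could look roughly like the sketch below. The command prefix is passed in by the caller, and the argument order (old file, patch file, output file) is an assumption rather than the documented interface of jojodiff or any other tool, so check the tool's own usage text. The helper name `apply_binary_patch` and the hash check are also illustrative additions.

```python
import hashlib
import subprocess
from pathlib import Path


def apply_binary_patch(tool_argv, old_file, patch_file, out_file, expected_sha256):
    """Run an external patch tool, then verify the file it produced.

    tool_argv is the command prefix for the bundled patch executable.
    The old/patch/output argument order is an ASSUMPTION; adapt it to
    the tool you actually ship.
    """
    subprocess.run(
        [*tool_argv, str(old_file), str(patch_file), str(out_file)],
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    actual = hashlib.sha256(Path(out_file).read_bytes()).hexdigest()
    if actual != expected_sha256:
        # Discard a bad result so the caller can fall back to a full download.
        Path(out_file).unlink()
        raise ValueError(f"patched file has unexpected hash: {out_file}")
    return Path(out_file)
```

Verifying the hash of the patched file lets the updater fall back to downloading the whole file if the local copy was modified or the patch did not apply cleanly.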
As for maintaining deltas on your server, a mixed approach should work: you'd try to keep all the updates (a quadratic number of them: A → A+1, A → A+2, A+1 → A+2, etc.) but "cut off" each branch, in favor of total replacement, when the advantage of doing things incrementally becomes too small to warrant the storage cost on your server and the processing time at the client. Of course, there's nothing but heuristics (i.e., try, experiment and see) for determining the threshold for "too small" ;-).
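One way to express the "cut off" heuristic is a simple size ratio. The sketch below is only an illustration: the 0.5 threshold is an arbitrary starting point to tune by experiment, and the function name and the byte counts in the usage example are invented.

```python
def choose_deltas(delta_sizes, full_sizes, max_ratio=0.5):
    """Pick the deltas worth hosting.

    delta_sizes: {(from_version, to_version): delta size in bytes}
    full_sizes:  {to_version: full download size in bytes}
    A delta is kept only if it is at most max_ratio of the full download;
    every other upgrade path falls back to total replacement.
    """
    return {
        pair: size
        for pair, size in delta_sizes.items()
        if size <= max_ratio * full_sizes[pair[1]]
    }


# Hypothetical sizes: the 0.38 -> 0.42 delta is nearly as big as the
# full 0.42 download, so it is dropped.
kept = choose_deltas(
    {("0.41", "0.42"): 40_000, ("0.38", "0.42"): 900_000},
    {"0.42": 1_000_000},
)
```

Users on a version with no kept delta simply download the full release, which is what the program already does today.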
Your update manager can know which version of the app is currently installed and which version is the most recent, and apply only the relevant patches.
Suppose the user runs 0.38 and 0.42 is currently available. The update for 0.42 contains patches for 0.39, 0.40, 0.41 and 0.42 (and probably further back in the history). The update manager downloads the 0.42 update, knows it is at 0.38 and applies all the relevant patches in order. If it currently runs 0.41, it applies only the latest patch, and so on.
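Selecting the relevant patches might be sketched as follows. This assumes version strings are dot-separated integers such as "0.38", and that each patch is labeled with the version it upgrades to; the helper names are hypothetical.

```python
def parse_version(text):
    """Turn '0.38' into (0, 38) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def patches_to_apply(current, available):
    """Return the patch versions newer than `current`, oldest first."""
    cur = parse_version(current)
    newer = (v for v in available if parse_version(v) > cur)
    return sorted(newer, key=parse_version)


bundle = ["0.42", "0.39", "0.41", "0.40", "0.38", "0.37"]
print(patches_to_apply("0.38", bundle))  # ['0.39', '0.40', '0.41', '0.42']
print(patches_to_apply("0.41", bundle))  # ['0.42']
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as "0.9" sorting after "0.10".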