Utility or library for finding the best-matching binary file
I would like to be able to compare a binary file X to a directory of other binary files and find which other file is most similar to X. The nature of the data is such that identical chunks will exist between files, but possibly shifted in location. The files are all 1 MB in size, and there are about 200 of them. I would like to have something quick enough to analyze these in a few minutes or less on a modern desktop computer.
I've googled a bit and found a few different binary diff utilities, but none of them seem appropriate for my application.
For example, there is bsdiff, which looks like it creates a patch file that is optimized for size, and vbindiff, which just displays the differences graphically. Neither really helps me figure out whether one file is more similar to X than another file is.
If there is not a tool that I can use directly for this purpose, is there a good library someone could recommend for writing my own utility? Python would be preferable, but I'm flexible.
Comments (1)
Here's a simple Perl script which more or less tries to do exactly that.
Edit: Also have a look at the following Stack Overflow thread.
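Since Python was the preferred language, here is a minimal sketch of one possible approach. It is not the linked Perl script's method, just an illustration: each file is reduced to the set of its overlapping k-byte windows ("shingles"), and two files are scored by the Jaccard similarity of those sets. Because the sets ignore offsets, identical chunks still match when they have moved. The function names (`shingles`, `jaccard`, `similarity`, `rank_directory`) and the window size `k=16` are assumptions chosen for this example, not part of any existing library.

```python
import os


def shingles(data: bytes, k: int = 16) -> set:
    """Return the set of all overlapping k-byte windows in data.

    k=16 is an arbitrary choice for this sketch; smaller values match
    shorter shared runs but also produce more accidental matches.
    """
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(x: bytes, y: bytes, k: int = 16) -> float:
    """Position-independent similarity of two byte strings, in [0, 1]."""
    return jaccard(shingles(x, k), shingles(y, k))


def rank_directory(target_path: str, directory: str, k: int = 16) -> list:
    """Score every regular file in directory against the target file.

    Returns a list of (score, filename) tuples, most similar first.
    The target itself is skipped if it lives in the same directory.
    """
    with open(target_path, "rb") as f:
        target = shingles(f.read(), k)

    scores = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.samefile(path, target_path):
            continue
        # Only one candidate's shingle set is held in memory at a time.
        with open(path, "rb") as f:
            scores.append((jaccard(target, shingles(f.read(), k)), name))

    scores.sort(reverse=True)
    return scores
```

Calling `rank_directory("X.bin", "candidates/")` (both paths hypothetical) returns the candidates ordered by score, so the first entry is the closest match. A 1 MB file yields roughly a million windows, so each set is memory-hungry; if that turns out to be too slow or too large, keeping only a sample of the windows is a common way to trade accuracy for speed.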