Using Base64 encoding as a mechanism to detect changes
Is it possible to gauge how much an object has changed by measuring the changes in its Base64 encoding?
Suppose I send a document attachment to several users, each of whom makes changes and emails it back to me. Can I use the string distance between the original Base64 and each received Base64 to detect which version has the most changes? Would that be a valid metric?
If not, are there other metrics that could quantify the deltas?
Comments (4)
That depends entirely on the type of document you have encoded. If it is a plain text file, then sure, the differences between the Base64 encodings are probably on a par with the actual changes. However, in some file formats a change to the contents effectively produces a completely different binary file. A ZIP file is one example.
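To illustrate the point about compressed formats, here is a minimal sketch. It assumes a zlib (DEFLATE) stream as a stand-in for a real ZIP archive and uses `difflib.SequenceMatcher.ratio()` as one possible string-similarity measure; the sample text and the helper name `b64_similarity` are made up for illustration.

```python
import base64
import difflib
import zlib


def b64_similarity(a: bytes, b: bytes) -> float:
    """Similarity (0.0 to 1.0) between the Base64 encodings of two byte strings."""
    ea = base64.b64encode(a).decode("ascii")
    eb = base64.b64encode(b).decode("ascii")
    # autojunk=False disables difflib's "popular element" heuristic,
    # which would otherwise distort results on long strings.
    return difflib.SequenceMatcher(None, ea, eb, autojunk=False).ratio()


# Hypothetical document: 100 short lines of text.
original = "".join(f"line {i}: the quick brown fox\n" for i in range(100)).encode()

# A small, same-length edit to a single line.
revised = original.replace(b"line 3:", b"LINE 3:")

# Compare the Base64 of the plain text with the Base64 of the compressed bytes.
plain_sim = b64_similarity(original, revised)
compressed_sim = b64_similarity(zlib.compress(original), zlib.compress(revised))

print(f"plain text similarity:  {plain_sim:.4f}")
print(f"compressed similarity:  {compressed_sim:.4f}")
```

The edit is identical in both cases, yet the compressed pair scores lower than the plain-text pair, so the distance reflects the container format at least as much as the edit itself.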
You should do the same thing that diff does, then compute your metric on the result, for example on the size of the diff file.
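A diff-based metric along those lines could be sketched as follows. This assumes the attachments are plain text that can be split into lines, and it uses Python's `difflib` instead of the `diff` command; the function names `changed_line_count` and `rank_by_change` and the sample data are hypothetical.

```python
import difflib


def changed_line_count(original: str, revised: str) -> int:
    """Count the lines that differ between two texts, diff-style."""
    a = original.splitlines()
    b = revised.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced block counts as the larger of its two sides;
            # pure inserts and deletes count their own lines.
            changed += max(i2 - i1, j2 - j1)
    return changed


def rank_by_change(original: str, versions: dict) -> list:
    """Return (name, changed_lines) pairs, most-changed version first."""
    scores = {name: changed_line_count(original, text) for name, text in versions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


original = "alpha\nbeta\ngamma\n"
versions = {
    "alice": "alpha\nBETA\ngamma\n",            # one line modified
    "bob": "alpha\nbeta\ngamma\ndelta\neps\n",  # two lines appended
    "carol": "x\ny\nz\n",                       # everything rewritten
}

for name, score in rank_by_change(original, versions):
    print(name, score)
```

Counting changed lines is only one choice of metric; counting changed characters or the byte size of a unified diff would follow the same pattern.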
In theory, yes, if you do a smart diff (detecting insertions, deletions, and modifications).
In practice, no, unless the documents are strictly plain text. Binary formats can't be meaningfully diffed.
Base64 packs groups of three 8-bit values into four 6-bit values. If you change a single bit in one 8-bit value, you affect only one of the 6-bit values. If you change two bits in the same byte, there is about a 5/12 chance that the second bit lands in a different 6-bit value than the first. So if you are counting changed bits, the two representations are entirely equivalent; otherwise, you will introduce noise that depends on the metric you use.
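The packing argument can be checked with a short sketch. The sample bytes and the helper names `differing_positions` and `straddle_probability` are made up for illustration. The enumeration treats the two flipped bit positions as chosen independently within one byte, which is the assumption that yields 5/12; requiring two distinct positions gives 10/21 instead.

```python
import base64
from fractions import Fraction
from itertools import product


def differing_positions(a: str, b: str) -> list[int]:
    """Indices at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def encode(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


data = bytes(range(12))  # 12 bytes -> 16 Base64 characters, no padding

# Flip the lowest bit of byte 4: exactly one Base64 character changes.
one_bit = bytearray(data)
one_bit[4] ^= 0b00000001
print(differing_positions(encode(data), encode(bytes(one_bit))))

# Flip two adjacent bits of byte 4 that straddle a 6-bit boundary:
# two Base64 characters change.
two_bits = bytearray(data)
two_bits[4] ^= 0b00011000
print(differing_positions(encode(data), encode(bytes(two_bits))))


def straddle_probability(distinct: bool) -> Fraction:
    """Chance that two bit positions in the same byte fall in different
    6-bit values, averaged over the three byte slots of a Base64 group."""
    hits = total = 0
    for byte_index in range(3):
        for i, j in product(range(8), repeat=2):
            if distinct and i == j:
                continue
            total += 1
            # Bit offsets within the 24-bit group; // 6 gives the 6-bit slot.
            if (8 * byte_index + i) // 6 != (8 * byte_index + j) // 6:
                hits += 1
    return Fraction(hits, total)


print(straddle_probability(distinct=False))
print(straddle_probability(distinct=True))
```

This only covers in-place bit changes. Inserting or deleting bytes shifts the 3-byte grouping unless the length change is a multiple of three, which is one source of the noise mentioned above.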