How to store text diffs in a database?
I have already decided on using the Horde Text_Diff engine in a LAMP stack for calculating diffs and rendering them. My question is this:
What would be a good way of actually storing the incrementals in a database? I've never had to design this kind of database application before, and it appears that most engines want a fully serialized copy of the entire original and changed text in order to render the differences.
If that's the case, then how can I store the data of the diff in a database without storing the entire new document?
(NOTE: For this particular purpose, the flow will always be current version -> proposed diff -> new current version, meaning that I'm trying to store an actual diff instead of a reverse diff.)
Comments (2)
For Wiki applications, consider storing:
StoredEdition[X] = diff(Edition[X+1], Edition[X])
where Edition[0] is the oldest, e.g. in a table "articles_revisions", with each row having a timestamp and referring to an articleID. Sorry, at this moment I don't have a suggestion for tools to reconstitute text from serial diffs or reverse diffs.
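A minimal sketch of this reverse-diff layout, assuming the newest edition is kept in full and each stored row turns Edition[X+1] back into Edition[X]. It uses Python's standard difflib as a stand-in for a real diff engine; the helper names diff, patch and edition, and the JSON delta format, are hypothetical and only for illustration.

```python
import difflib
import json

def diff(src: str, dst: str) -> str:
    """Delta that turns src into dst, holding only the changed lines."""
    a, b = src.splitlines(keepends=True), dst.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return json.dumps([[i1, i2, b[j1:j2]]
                       for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                       if tag != "equal"])

def patch(src: str, delta: str) -> str:
    """Apply a delta produced by diff() to src."""
    lines = src.splitlines(keepends=True)
    # Work bottom-up so the line indices of earlier hunks stay valid.
    for i1, i2, new_lines in reversed(json.loads(delta)):
        lines[i1:i2] = new_lines
    return "".join(lines)

editions = [
    "first draft\n",
    "first draft\nsecond line\n",
    "final draft\nsecond line\n",
]
latest = editions[-1]  # only the newest edition is kept in full

# StoredEdition[X] = diff(Edition[X+1], Edition[X])
stored = [diff(editions[x + 1], editions[x]) for x in range(len(editions) - 1)]

def edition(k: int) -> str:
    """Rebuild Edition[k] by walking back from the newest edition."""
    text = latest
    for x in range(len(stored) - 1, k - 1, -1):
        text = patch(text, stored[x])
    return text
```

With this layout the current text is always available without applying anything, and older editions cost one patch per step back.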
I think you should be able to work with the diff and patch utilities. diff produces the difference between two texts (or files) in the form of the changes only, and patch applies those changes to the original. The resulting patch can then be stored in the database. You still need the original text and all patches up to the latest revision. For PHP, the xdiff extension can be used to create diffs of texts and files.
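As a minimal sketch of that idea (keep only the changes, then re-apply them to the original), the following uses Python's standard difflib purely as a stand-in for the patch utility or PHP's xdiff. The names make_delta and apply_delta and the JSON delta format are assumptions made for illustration, not part of any of those tools.

```python
import difflib
import json

def make_delta(old: str, new: str) -> str:
    """Return a JSON delta holding only the changed lines, not the whole new text."""
    a = old.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    ops = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old lines i1..i2 with these new lines (empty list = deletion).
            ops.append([i1, i2, b[j1:j2]])
    return json.dumps(ops)

def apply_delta(old: str, delta: str) -> str:
    """Rebuild the new text from the old text plus a delta from make_delta()."""
    lines = old.splitlines(keepends=True)
    # Apply from the bottom up so earlier line indices stay valid.
    for i1, i2, new_lines in reversed(json.loads(delta)):
        lines[i1:i2] = new_lines
    return "".join(lines)

old_text = "a\nb\nc\n"
new_text = "a\nB\nc\nd\n"
delta = make_delta(old_text, new_text)   # this string is what would go into the database
rebuilt = apply_delta(old_text, delta)
```

The delta carries line ranges plus the replacement lines only, so unchanged lines of the document are never duplicated in storage.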
Storing diffs in the database
To store the diffs in the database, you need to preserve the order of the diffs, the contents of the diffs, and the original text.
I assume you are already storing the original text. The diffs can then be stored in a diffs table that holds, next to the text content of each diff, a reference to the original text and an auto-increment key to preserve the order. You then need to insert one diff after the other in the correct order and you should be fine.
To recreate the current version, query the original version and all of its diffs in order. Then apply one diff after the other until you reach the version you want.
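The table layout and replay loop described above could look roughly like this. It is a sketch only: SQLite stands in for MySQL, difflib stands in for the diff engine, and the table names, column names and helper functions (make_delta, apply_delta, current_version, save_revision) are all hypothetical.

```python
import difflib
import json
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id            INTEGER PRIMARY KEY,
    original_text TEXT NOT NULL
);
CREATE TABLE diffs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order = application order
    document_id INTEGER NOT NULL REFERENCES documents(id),
    delta       TEXT NOT NULL
);
"""

def make_delta(old: str, new: str) -> str:
    """Hypothetical delta format: JSON list of [start, end, replacement_lines]."""
    a, b = old.splitlines(keepends=True), new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return json.dumps([[i1, i2, b[j1:j2]]
                       for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                       if tag != "equal"])

def apply_delta(old: str, delta: str) -> str:
    lines = old.splitlines(keepends=True)
    for i1, i2, new_lines in reversed(json.loads(delta)):  # bottom-up keeps indices valid
        lines[i1:i2] = new_lines
    return "".join(lines)

def current_version(conn: sqlite3.Connection, document_id: int) -> str:
    """Query the original text, then apply every stored diff in insertion order."""
    (text,) = conn.execute(
        "SELECT original_text FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT delta FROM diffs WHERE document_id = ? ORDER BY id", (document_id,)
    )
    for (delta,) in rows:
        text = apply_delta(text, delta)
    return text

def save_revision(conn: sqlite3.Connection, document_id: int, new_text: str) -> None:
    """Store only the delta between the current version and new_text."""
    delta = make_delta(current_version(conn, document_id), new_text)
    conn.execute(
        "INSERT INTO diffs (document_id, delta) VALUES (?, ?)", (document_id, delta)
    )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO documents (id, original_text) VALUES (1, ?)", ("line one\nline two\n",)
)
save_revision(conn, 1, "line one\nline 2\n")
save_revision(conn, 1, "line one\nline 2\nline three\n")
```

Note that every read replays the whole chain of diffs, so the cost of current_version grows with the number of revisions.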
Alternatively, you can create another table that holds the full result of specific revisions, so that you don't have to replay a long chain of diffs over and over again. But this makes the data in the database redundant.
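One way such a snapshot table might work is sketched below, again with SQLite standing in for MySQL. The snapshots table, its columns and the rebuild function are hypothetical. The function that applies a diff is passed in as a parameter, and the demo uses a toy delta format (each delta is just text to append) so the block stays short.

```python
import sqlite3
from typing import Callable, Tuple

SCHEMA = """
CREATE TABLE diffs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL,
    delta       TEXT NOT NULL
);
CREATE TABLE snapshots (
    diff_id     INTEGER PRIMARY KEY,  -- full text as of this diff
    document_id INTEGER NOT NULL,
    full_text   TEXT NOT NULL
);
"""

def rebuild(conn: sqlite3.Connection, document_id: int, original: str,
            apply_delta: Callable[[str, str], str]) -> Tuple[str, int]:
    """Rebuild the latest text, starting from the newest snapshot if one exists.

    Returns the text and the number of diffs that had to be applied.
    """
    row = conn.execute(
        "SELECT diff_id, full_text FROM snapshots "
        "WHERE document_id = ? ORDER BY diff_id DESC LIMIT 1",
        (document_id,),
    ).fetchone()
    last_id, text = row if row else (0, original)
    applied = 0
    rows = conn.execute(
        "SELECT delta FROM diffs WHERE document_id = ? AND id > ? ORDER BY id",
        (document_id, last_id),
    )
    for (delta,) in rows:
        text = apply_delta(text, delta)
        applied += 1
    return text, applied

# Demo with a toy delta format: applying a delta appends it to the text.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for d in ["b", "c", "d"]:
    conn.execute("INSERT INTO diffs (document_id, delta) VALUES (1, ?)", (d,))
conn.execute(
    "INSERT INTO snapshots (diff_id, document_id, full_text) VALUES (2, 1, 'abc')"
)
text, applied = rebuild(conn, 1, "a", lambda t, d: t + d)
```

Here only the one diff stored after the snapshot is replayed. How often to take a snapshot is a trade-off between redundant storage and replay time.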