Version control for prose
It seems that someone must have done this already, but I cannot find the end product I'm looking for.
Using a version control system for prose is laborious: because the tools work line by line, you need a newline character at the end of each sentence, and even in the middle of long sentences. Looking at the git source, it seems that by changing a few routines that check for '\n', it should be possible to have git (or any other version control system) match either '\n' or the pattern '\\.\s'. It is, however, a task that needs to be done meticulously, or I can see things breaking pretty badly.
Does anyone know of someone who has already done this? Or are there any other alternatives?
Thanks!
Comments (1)
Any version control system should be able to handle prose. The question is how efficiently it can do so.
The git diff command uses something like diff -u to display the differences between two versions of a file. If the file consists of text with very long lines (i.e., many characters between '\n' characters), then it might have some difficulty displaying the differences meaningfully; it might show two 5000-character lines with only a single character change.

But that doesn't necessarily imply that that's how git stores the files. I'm not intimately familiar with git's internal storage format, but my understanding is that it does reasonably well with binary files, which could have many megabytes of data with no '\n' characters.

Note that some older version control systems (SCCS, RCS) probably do store differences between versions on a line-by-line basis. But even for such systems, at worst you'd be storing a full copy of each version plus some overhead. The system should still be able to work properly.
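To make the display problem concrete, here is a rough sketch in Python (not how git works internally) that compares a line-based diff of one long paragraph with a diff of the same paragraph split into sentences first. The helper names split_sentences and changed_lines are made up for this illustration, and the split rule is a simplified take on the '\\.\s' idea from the question; it will mis-split on abbreviations such as "e.g. this".

```python
import difflib
import re


def split_sentences(text):
    """Split prose after each period that is followed by whitespace.

    Hypothetical helper: a simplified version of the period-plus-whitespace
    rule from the question. The lookbehind keeps the period with its sentence.
    """
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


def changed_lines(old_lines, new_lines):
    """Return only the removed/added lines of a unified diff (no context)."""
    diff = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
    # The first two entries are the '---' / '+++' file headers; hunk headers
    # start with '@@' and are filtered out by the prefix check.
    return [d for d in diff[2:] if d.startswith(("+", "-"))]


old = "The cat sat on the mat. It was a sunny day. Nobody else was home."
new = "The cat sat on the mat. It was a rainy day. Nobody else was home."

# Whole paragraph as one line: the entire paragraph is reported as changed.
print(changed_lines([old], [new]))

# One sentence per line: only the edited sentence is reported.
print(changed_lines(split_sentences(old), split_sentences(new)))
```

With the paragraph stored as a single line, the diff repeats the whole paragraph twice; with one sentence per line, only the sentence that actually changed shows up. That is the motivation for the newline-per-sentence habit, without touching the version control system itself.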
Note that git diff --word-diff should at least partially work around the problem of comparing versions.
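For a feel of what a word-level comparison looks like, here is a minimal Python sketch that marks removed and added words in the [-old-]{+new+} style that git's plain word-diff output uses. It is only an approximation built on difflib, not git's own algorithm: the function name word_diff is invented here, words are simply split on whitespace, and the original spacing and line breaks are not preserved.

```python
import difflib


def word_diff(old, new):
    """Return a word-level diff using [-removed-] and {+added+} markers.

    Hypothetical helper for illustration; words are whitespace-separated
    tokens, so punctuation stays attached to the neighbouring word.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        piece = ""
        if tag in ("delete", "replace"):
            piece += "[-" + " ".join(a[i1:i2]) + "-]"
        if tag in ("insert", "replace"):
            piece += "{+" + " ".join(b[j1:j2]) + "+}"
        out.append(piece)
    return " ".join(out)


print(word_diff("It was a sunny day.", "It was a rainy day."))
# prints: It was a [-sunny-]{+rainy+} day.
```

Because the comparison works on words rather than lines, a one-word edit in a 5000-character line shows up as one marked word instead of two full copies of the line.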