Git merge within a line
Preamble
I'm using git as a version control system for a paper that my lab is writing, in LaTeX. There are several people collaborating.
I'm running into git being stubborn about how it merges. Let's say two people have each made a single-word change to the same line, and then attempt to merge. Though git diff --word-diff seems capable of SHOWING the difference between the branches word by word, git merge seems unable to perform the merge word by word, and instead requires a manual merge.
With a LaTeX document this is particularly annoying, as the common habit when writing LaTeX is to write a full paragraph per line and let your text editor handle word wrapping for display. We are working around this for now by adding a newline after each sentence, so that git can at least merge changes to different sentences within a paragraph. But it still gets confused by multiple changes within a sentence, and of course the text no longer wraps nicely.
The Question
Is there a way to git merge two files "word by word" rather than "line by line"?
Comments (3)
Here's a solution in the same vein as sehe's, with a few changes which hopefully will address your comments:
As in sehe's solution, create (or append to) .gitattributes. Now implement the clean and smudge filters:
I've created a test file with the following contents; note the one-line paragraph.
After we commit it to the local repo, we can see the raw contents.
So the rule of the clean filter is: whenever it finds a string of text that ends with ., ?, ! or '' (that's the LaTeX way to close double quotes) followed by a space, it adds %NL% and a newline character. But it ignores lines that start with \ (LaTeX commands) or that contain a comment anywhere (so that comments cannot become part of the main text).
The smudge filter removes %NL% and the newline.
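The filter scripts themselves are not reproduced in this post. As a rough illustration only, the rules described above could be sketched in Python like this. It assumes that "contains a comment" can be approximated by "contains a % character anywhere" (so lines with an escaped \% are skipped too), and the clean/smudge command-line arguments are my own convention, not something from the answer.

```python
#!/usr/bin/env python3
"""Sketch of a sentence-per-line clean/smudge filter pair for LaTeX sources."""
import re
import sys

# Marker named in the answer. It starts a LaTeX comment, so the cleaned
# file should still compile.
MARKER = "%NL%"

# A sentence end: ., ?, ! or '' (closing double quote), followed by one space.
SENTENCE_END = re.compile(r"(?:\.|\?|!|'') ")


def clean(text: str) -> str:
    """Working tree -> repository: break each paragraph after every sentence."""
    out = []
    for line in text.splitlines(keepends=True):
        # Skip LaTeX command lines and any line containing '%'.
        # Assumption: every '%' is treated as a comment, including \%.
        # Already-cleaned lines contain the marker, so clean() is idempotent.
        if line.startswith("\\") or "%" in line:
            out.append(line)
        else:
            out.append(SENTENCE_END.sub(lambda m: m.group(0) + MARKER + "\n", line))
    return "".join(out)


def smudge(text: str) -> str:
    """Repository -> working tree: rejoin the sentences into one line."""
    return text.replace(MARKER + "\n", "")


if __name__ == "__main__" and sys.argv[1:] in (["clean"], ["smudge"]):
    # Git pipes the file contents through stdin and stdout.
    func = clean if sys.argv[1] == "clean" else smudge
    sys.stdout.write(func(sys.stdin.read()))
```

Keeping the space in front of the marker matters: a LaTeX comment swallows the line break after it, so the space before %NL% is what still separates the two sentences when the clean form is compiled.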
Diffing and merging are done on the 'clean' files, so changes to paragraphs are merged sentence by sentence. This is the desired behavior.
The nice thing is that the LaTeX file should compile in either the clean or the smudged state, so there is some hope that collaborators won't need to do anything. Finally, you could put the git config commands in a shell script that is part of the repo, so a collaborator would just have to run it in the root of the repo to get configured.
That last little bit is a hack because when this script is first run, the branch is already checked out (in the clean form) and it doesn't get smudged automatically.
You can add this script and the .gitattributes file to the repo, then new users just need to clone, then run the script in the root of the repo.
I think this script even runs on Git for Windows if it is run in Git Bash.
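The setup script is also missing from this post. A minimal sketch of the configuration step might look like the following, written in Python rather than as the shell script the answer describes. The filter name sentences, the tools/sentences.py path and the --apply flag are all hypothetical; only the filter.<name>.clean and filter.<name>.smudge config keys and the filter= attribute are standard git. It does not attempt the re-checkout hack mentioned above.

```python
#!/usr/bin/env python3
"""Sketch: register a clean/smudge filter pair in the local repo config."""
import subprocess
import sys


def gitattributes_line(pattern: str, name: str) -> str:
    """Line to put in .gitattributes so matching files use the filter."""
    return f"{pattern} filter={name}"


def filter_config_commands(name: str, clean_cmd: str, smudge_cmd: str) -> list:
    """The git config invocations that define the filter driver."""
    return [
        ["git", "config", f"filter.{name}.clean", clean_cmd],
        ["git", "config", f"filter.{name}.smudge", smudge_cmd],
    ]


def configure(name: str, clean_cmd: str, smudge_cmd: str, repo_root: str = ".") -> None:
    """Run the git config commands inside the given repository."""
    for argv in filter_config_commands(name, clean_cmd, smudge_cmd):
        subprocess.run(argv, cwd=repo_root, check=True)


if __name__ == "__main__" and sys.argv[1:] == ["--apply"]:
    # Hypothetical filter name and script location; adjust to your repo.
    configure(
        "sentences",
        "python3 tools/sentences.py clean",
        "python3 tools/sentences.py smudge",
    )
```

With these hypothetical names, the matching .gitattributes entry would be the string returned by gitattributes_line("*.tex", "sentences").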
Drawbacks:
You could try this:
Instead of swapping out the merge engine (hard), you can do some kind of 'normalization' (canonicalization, if you will). I don't speak LaTeX, but let me illustrate as follows:
Say you have input like
test.raw
You want it to diff/merge word by word. Add the following .gitattributes file
Then
A minimalist implementation of the filters would be
/home/username/bin/wordbyword.clean
/home/username/bin/wordbyword.smudge
After committing the file, inspect the raw contents of the committed blob with git show
After changing the contents of test.raw to
The output of git diff --patch-with-stat will probably be what you wanted:
You can see how this would work magically for merges, resulting in word-by-word diffing and merging. Q.E.D.
(I hope you like my creative use of .gitattributes. If not, I enjoyed making this little exercise.)
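The two filter scripts did not survive in this copy of the answer. Purely as a sketch of the idea, a word-by-word pair could look like this in Python. The %WB% marker and the clean/smudge arguments are my own assumptions, not sehe's implementation. The clean step stores one word per line, so git's line-based diff and merge effectively work on words.

```python
#!/usr/bin/env python3
"""Sketch of a word-per-line clean/smudge filter pair."""
import re
import sys

MARKER = "%WB%"  # hypothetical word-break marker, not from the original answer

# A space that is not already followed by the marker and a newline.
# The lookahead keeps clean() idempotent when it sees cleaned input.
UNMARKED_SPACE = re.compile(r" (?!" + re.escape(MARKER) + r"\n)")


def clean(text: str) -> str:
    """Working tree -> repository: put every word on its own line."""
    return UNMARKED_SPACE.sub(" " + MARKER + "\n", text)


def smudge(text: str) -> str:
    """Repository -> working tree: restore the original long lines."""
    return text.replace(" " + MARKER + "\n", " ")


if __name__ == "__main__" and sys.argv[1:] in (["clean"], ["smudge"]):
    # Git pipes the file contents through stdin and stdout.
    func = clean if sys.argv[1] == "clean" else smudge
    sys.stdout.write(func(sys.stdin.read()))
```

For example, clean("the quick fox\n") gives three lines (the, quick, fox), the first two ending in the marker, and smudge turns them back into the single original line.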
I believe the git merge algorithm is quite simple (even though you can make it work harder with the "patience" strategy option). Its work item will remain the line.
But the general idea is to delegate any fine-grained detection & resolution mechanism to a third-party tool that you can set up with git config (the merge.tool setting). If some words within a long line differ, that external tool (KDiff3, DiffMerge, ...) will be able to pick up that change and present it to you.