Git merge within a line

Posted on 2024-10-30 18:51:33

Preamble

I'm using git as a version control system for a paper that my lab is writing, in LaTeX. There are several people collaborating.

I'm running into git being stubborn about how it merges. Let's say two people have made single-word changes to a line, and then attempt to merge them. Though git diff --word-diff seems capable of SHOWING the difference between the branches word-by-word, git merge seems unable to perform the merge word-by-word, and instead requires a manual merge.

With a LaTeX document this is particularly annoying, as the common habit when writing LaTeX is to write a full paragraph per line and just let your text editor handle word wrapping when displaying it. We are working around this for now by putting each sentence on its own line, so that git can at least merge changes to different sentences within a paragraph. But it still gets confused by multiple changes within a sentence, and of course the text no longer wraps nicely.

The Question

Is there a way to git merge two files "word by word" rather than "line by line"?


战皆罪 2024-11-06 18:51:33

Here's a solution in the same vein as sehe's, with a few changes which hopefully will address your comments:

  • This solution merges by sentence rather than by word, as you had been doing by hand. The difference is that the user now sees a single line per paragraph, while git sees paragraphs broken into sentences. This seems more logical: adding or removing a sentence may be compatible with other changes in the same paragraph, but when two commits edit the same sentence a manual merge is probably what you want. It also has the benefit that the "clean" snapshots stay somewhat human readable (and LaTeX-compilable!).
  • The filters are one-line commands, which should make it easier to port this to collaborators.

As in sehe's solution, create (or append to) a .gitattributes file:

    *.tex filter=sentencebreak

Now to implement the clean and smudge filters:

    git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
    git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""

I've created a test file with the following contents, notice the one-line paragraph.

    \chapter{Tumbling Tumbleweeds. Intro}
    A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski.  At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. This Lebowski, he called himself the Dude. Now, Dude, that's a name no one would self-apply where I come from.  But then, there was a lot about the Dude that didn't make a whole lot of sense to me.  And a lot about where he lived, like- wise.  But then again, maybe that's why I found the place s'durned innarestin'.

    This line has two sentences. But it also ends with a comment. % here

After we commit it to the local repo, we can see the raw contents.

    $ git show HEAD:test.tex

    \chapter{Tumbling Tumbleweeds. Intro}
    A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski. %NL%
     At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. %NL%
    This Lebowski, he called himself the Dude. %NL%
    Now, Dude, that's a name no one would self-apply where I come from. %NL%
     But then, there was a lot about the Dude that didn't make a whole lot of sense to me. %NL%
     And a lot about where he lived, like- wise. %NL%
     But then again, maybe that's why I found the place s'durned innarestin'.

    This line has two sentences. But it also ends with a comment. % here

So the rules of the clean filter are whenever it finds a string of text that ends with . or ? or ! or '' (that's the latex way to do double quotes) then a space, it will add %NL% and a newline character. But it ignores lines that start with \ (latex commands) or contain a comment anywhere (so that comments cannot become part of the main text).

The smudge filter removes %NL% and the newline.

Diffing and merging are done on the 'clean' files, so changes to paragraphs are merged sentence by sentence. This is the desired behavior.

The nice thing is that the latex file should compile in either the clean or smudged state, so there is some hope for collaborators to not need to do anything. Finally, you could put the git config commands in a shell script that is part of the repo so a collaborator would just have to run it in the root of the repo to get configured.

    #!/bin/bash

    git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
    git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""

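    # Smudge every .tex file that is already checked out (skipping .git).
    # Quoting and -print0 keep file names with spaces intact.
    find . -name .git -prune -o -type f -iname "*.tex" -print0 |
    while IFS= read -r -d '' f
    do
        perl -pe "s/%NL%\n//gm" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done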

That last little bit is a hack because when this script is first run, the branch is already checked out (in the clean form) and it doesn't get smudged automatically.

You can add this script and the .gitattributes file to the repo, then new users just need to clone, then run the script in the root of the repo.

I think this script even runs on windows git if done in git bash.

Drawbacks:

  • This doesn't handle lines with comments smartly, it just ignores them.
  • %NL% is kind of ugly
  • The filters may screw up some equations (not sure about this).
¢蛋碎的人ぎ生 2024-11-06 18:51:33

You could try this:

Instead of swapping out the merge engine (hard), you can do some kind of 'normalization' (canonicalization, if you will). I don't speak LaTeX, but let me illustrate as follows:

Say you have input like test.raw

curve ball well received {misfit} whatever
proprietary format extinction {benefit}.

You want it to diff/merge word-by-word. Add the following .gitattributes file

*.raw     filter=wordbyword

Then

git config --global filter.wordbyword.clean /home/username/bin/wordbyword.clean
git config --global filter.wordbyword.smudge /home/username/bin/wordbyword.smudge

A minimalist implementation of the filters would be

/home/username/bin/wordbyword.clean

#!/usr/bin/perl
use strict;
use warnings;

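while (<>)
{
    # Emit each word (with its trailing whitespace) on its own line.
    print "$_\n" foreach (m/(.*?\s+)/go);
    # Mark where the original line ended so the smudge filter can restore it.
    print '#@#DELIM#@#' . "\n";
}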

/home/username/bin/wordbyword.smudge

#!/usr/bin/perl
use strict;
use warnings;

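while (<>)
{
    # The delimiter becomes a newline again; every other line is printed
    # without the newline that the clean filter added.
    chomp; '#@#DELIM#@#' eq $_ and print "\n" or print;
}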

After committing the file, inspect the raw contents of the committed blob with `git show HEAD:test.raw`:

curve 
ball 
well 
received 
{misfit} 
whatever

#@#DELIM#@#
proprietary 
format 
extinction 
{benefit}.

#@#DELIM#@#

After changing the contents of test.raw to

curve ball welled repreived {misfit} whatever
proprietary extinction format {benefit}.

The output of git diff --patch-with-stat will probably be what you wanted:

 test.raw |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/test.raw b/test.raw
index b0b0b88..ed8c393 100644
--- a/test.raw
+++ b/test.raw
@@ -1,14 +1,14 @@
 curve 
 ball 
-well 
-received 
+welled 
+repreived 
 {misfit} 
 whatever

 #@#DELIM#@#
 proprietary 
-format 
 extinction 
+format 
 {benefit}.

 #@#DELIM#@#

You can see how this would work magically for merges, resulting in word-by-word diffing and merging. Q.E.D.

(I hope you like my creative use of .gitattributes. If not, I enjoyed making this little exercise)

南渊 2024-11-06 18:51:33

I believe the git merge algorithm is quite simple (even though you can make it work harder with the "patience" diff algorithm, e.g. git merge -X patience).
Its unit of work will remain the line.

But the general idea is to delegate any fine-grained detection/resolution mechanism to a third-party tool, which you can set up with git config merge.tool and run with git mergetool.
If some words within a long line differ, that external tool (KDiff3, DiffMerge, ...) will be able to pick up that change and present it to you.
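A minimal configuration sketch for that approach, assuming KDiff3 is the tool of choice and is already installed:

```shell
# Assumes KDiff3 is installed and on PATH; any tool listed by
# `git mergetool --tool-help` could be named instead.
git config merge.tool kdiff3

# After `git merge` stops on a conflict, open each conflicted file in the tool.
git mergetool
```

This does not make the merge itself word-aware; it only makes resolving the resulting line-level conflicts less painful.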
