Extracting the common subset of two patch files

Posted 2025-01-02 23:21:13

I have three source trees, A, B, and C. A is the original tree. B and C are modifications of A, created by two different developers.

I have taken diffs of A with B and of A with C.

But many of the changes in the two diff files are common to both, meaning that A is not the latest point of divergence for B and C. Instead, A was modified up to a point where it became D (which I don't have), and B and C were then each modified on top of D.

My question is: what can I do with the two diff files (besides manual labor) to extract their maximum common subset, so that I can apply that subset as a patch to A to get D?

EDIT 1: Illustration:

A ---> D ---> B
        \---> C

EDIT 2: I have looked at patchutils tools but didn't find one that does what I need. I have also looked at this question but the method mentioned there doesn't give correct output.


萌无敌 2025-01-09 23:21:13

I don't know how you would do this by hand, but you could take a look at the three-way merge concept: http://en.wikipedia.org/wiki/Merge_(revision_control)#Three-way_merge

Some good version control systems use three-way merge as their merge algorithm (Mercurial, for example). You can also look for a standalone three-way merge tool here: https://stackoverflow.com/questions/460198/best-free-3-way-merge-tool-for-windows
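
As a rough illustration of what a three-way merge does, here is a small sketch using diff3 from GNU diffutils on three throwaway files (the file contents are made up). Note that merging B and C against A produces a file carrying both developers' changes, which is not the same thing as the intermediate tree D.

```shell
tmp=$(mktemp -d)
printf '%s\n' one two three four five six seven > "$tmp/A"
# B changes the first line, C changes the last line.
printf '%s\n' ONE two three four five six seven > "$tmp/B"
printf '%s\n' one two three four five six SEVEN > "$tmp/C"
# diff3 -m MINE OLDER YOURS prints MINE and YOURS merged relative to OLDER.
merged=$(diff3 -m "$tmp/B" "$tmp/A" "$tmp/C")
printf '%s\n' "$merged"
rm -rf "$tmp"
```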


逆流 2025-01-09 23:21:13


Alright guys, I came up with the following solution (without using git, mercurial etc). (DISCLAIMER: may have typos, might require changes to work on your side)

The underlying method/algorithm is as follows:

Split both diff files into smaller components
Compare the components of one diff file with those of the other and select those that are identical in both
Join the selected components to create a new diff file with correct formatting

Each of my diff files has file-level diffs and each file-level diff has one or more hunks. If by components I mean file-level components then the extraction can be done with patchutils tools "splitdiff" and "combinediff" as follows:

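$ # Step 1
$ mkdir AB_components; cp AB.diff AB_components; cd AB_components
$ splitdiff -ad AB.diff
$ cd ..
$ mkdir AC_components; cp AC.diff AC_components; cd AC_components
$ splitdiff -ad AC.diff
$ cd ..
$
$ # Step 2
$ mkdir AD_components;
$ for f in `diff -rs AB_components AC_components | grep 'are identical$' | cut -d' ' -f2 | cut -d'/' -f2`; do cp AB_components/$f AD_components; done
$
$ # Step 3
$ cd AD_components; touch AD.diff
$ for f in `ls ._*`; do combinediff AD.diff $f > tmpfile; mv tmpfile AD.diff; done
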
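
To see what the Step 2 pipeline extracts, here is a small sketch on two throwaway directories. The component file names (common.patch, other.patch) are made up for illustration and are not what splitdiff actually produces; setting LC_ALL=C is an added assumption so that the message text matches the grep pattern.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/AB_components" "$tmp/AC_components"
printf 'same change\n' > "$tmp/AB_components/common.patch"
printf 'same change\n' > "$tmp/AC_components/common.patch"
printf 'only in B\n'   > "$tmp/AB_components/other.patch"
printf 'only in C\n'   > "$tmp/AC_components/other.patch"
# LC_ALL=C keeps diff's "are identical" message untranslated.
# Field 2 of "Files X and Y are identical" is X; the second cut drops the directory.
common=$(cd "$tmp" && LC_ALL=C diff -rs AB_components AC_components \
    | grep 'are identical$' | cut -d' ' -f2 | cut -d'/' -f2)
printf '%s\n' "$common"
rm -rf "$tmp"
```

Only the component that is byte-for-byte identical on both sides survives, which is why the timestamp clean-up mentioned in the notes matters.
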
However, if by components I mean individual hunks, then splitdiff is not enough. I found a tool here that splits a file into individual hunks (I had to make a slight change to that script to make it work on my machine; specifically, I had to comment out the "require 'file.rb'" line).
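
The link to that tool did not survive in this copy. As a hypothetical stand-in (not the script the answer used), a hunk splitter can be sketched in awk. The function name split_hunks and the output naming PREFIX.N.patch are assumptions, chosen so that each output file starts with the two header lines followed by exactly one hunk.

```shell
# split_hunks FILE PREFIX: FILE is a unified diff for one source file;
# writes PREFIX.1.patch, PREFIX.2.patch, ... (header pair + one hunk each).
split_hunks() {
    awk -v prefix="$2" '
        n == 0 && /^--- /    { old = $0; next }   # header lines come before the first hunk
        n == 0 && /^\+\+\+ / { new = $0; next }
        /^@@ / {                                  # every @@ line opens a new output file
            if (n > 0) close(out)
            n++
            out = prefix "." n ".patch"
            print old > out
            print new > out
        }
        n > 0 { print > out }
    ' "$1"
}
```

It would be called once per file-level component, for example `split_hunks AB_components/some_file.patch AB_hunks/some_file` (both paths are placeholders).
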
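For step 2 I had to run a double for-loop to find 'identical' hunks:
$ for f in `ls AB_components.mod/*`; do for g in `ls AC_components.mod/*`; do diff -s $f $g | grep 'are identical$'; done; done > identical_hunks
$ for f in `cat identical_hunks | cut -d' ' -f2`; do cp AB_components/`basename $f` AD_components; done
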
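
The double loop runs diff once for every pair of hunk files, which gets slow when there are many hunks. One possible alternative, sketched here under the assumption that file names contain no whitespace, is to match files by checksum and confirm each match with cmp. The function name common_files is made up for this example.

```shell
# common_files DIR1 DIR2: print "path1 path2" for each pair of files with identical content.
# Assumes file names without whitespace, like the loops above.
common_files() {
    sums1=$(mktemp); sums2=$(mktemp)
    # cksum prints "<crc> <bytes> <path>"; glue crc and size into one key per file
    cksum "$1"/* | awk '{ print $1 "-" $2, $3 }' | LC_ALL=C sort -k1,1 > "$sums1"
    cksum "$2"/* | awk '{ print $1 "-" $2, $3 }' | LC_ALL=C sort -k1,1 > "$sums2"
    # join pairs files with equal keys; cmp -s then rules out checksum collisions
    LC_ALL=C join "$sums1" "$sums2" | while read -r key f g; do
        cmp -s "$f" "$g" && printf '%s %s\n' "$f" "$g"
    done
    rm -f "$sums1" "$sums2"
}
```

Running `common_files AB_components.mod AC_components.mod > identical_pairs` would give a list of matching pairs; note that the first path is in field 1 here, not field 2 as in identical_hunks.
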
For the combining I had to follow a two step process:

Step 3 part 1: I first combined hunks belonging to same file(s) to create a diff for each file
Step 3 part 2: I used combinediff to join those diff files to create one final diff file

For step 3 part 1, I created the following shell script (let's call it combinehunks.sh):
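#!/bin/bash
# Usage: combinehunks.sh PREFIX
# Joins the single-hunk files PREFIX.*.patch into one per-file diff on stdout.
filename=$1
echo 'diff header line:'
# Take the two-line ---/+++ header from the first hunk file (ls -v sorts numerically, so .2 comes before .10)
firstpatchfile=`ls -1v $filename.*.patch | head -1`
head -2 $firstpatchfile
# Append every hunk, skipping the two header lines each hunk file starts with
files=`ls -1v $filename.*.patch`
for f in $files; do tail -n +3 $f; done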

and I used it as follows:
$ mkdir AD_filelevel_components; cd AD_filelevel_components
$ for f in `ls ../AD_components/* | rev | cut -d'.' -f3- | rev | sort | uniq`; do ../combinehunks.sh $f > `basename $f`.patch; done
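
The rev | cut | rev part of that loop strips the last two dot-separated fields from each hunk file name, leaving one prefix per source file. A small sketch, assuming hunk files are named <source-file>.<hunk-number>.patch (the names below are invented):

```shell
# Reverse each name, drop the first two dot-separated fields
# (which are "patch" and the hunk number once reversed), reverse back.
prefixes=$(printf '%s\n' foo.c.1.patch foo.c.2.patch lib.util.h.7.patch \
    | rev | cut -d'.' -f3- | rev | sort | uniq)
printf '%s\n' "$prefixes"
```

This prints foo.c and lib.util.h, each once, so combinehunks.sh runs once per source file.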

Step 3 part 2 is the same as step 3 in the file-level case, except that it uses the AD_filelevel_components directory instead of AD_components.

Caveats/Notes:

I had to remove the timestamps from the --- and +++ header lines before starting this work (timestamps often differ and would needlessly keep otherwise identical diff components from matching).

I also removed the "Only in ..." lines from the diff files before the procedure.

For hunk-level work, I had to change the @@ lines before the comparison. Basically I removed the second portion of each line, i.e., changed @@ -nnn,nn +mmm,mm @@ to @@ -nnn,nn @@. Note the use of AB_components.mod versus AB_components above. This is only for the comparison: hunks that go into the final diff must keep their correct @@ lines, otherwise combinediff will report errors.

By 'diff file' and 'patch file' I mean the same thing. Throughout this work I used the unified diff format exclusively, i.e., diff -u.
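
The answer does not show how the first two clean-ups were done. One possible way, as a sketch: it assumes GNU-style headers where a tab separates the file name from the timestamp, and the function name clean_diff is made up.

```shell
# clean_diff: read `diff -ru` output on stdin, write a normalized copy to stdout.
# Caveat: a removed source line that itself starts with "-- " and contains a tab
# looks like a header to this sed script and would be truncated as well.
clean_diff() {
    tab=$(printf '\t')
    sed -e '/^Only in /d' \
        -e "s/^\(--- [^${tab}]*\)${tab}.*\$/\1/" \
        -e "s/^\(+++ [^${tab}]*\)${tab}.*\$/\1/"
}
```

Usage would be along the lines of `clean_diff < AB.diff > AB.clean.diff`, and the same for AC.diff, before splitting.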


AB_components.mod was created like this:
$ cp -r AB_components{,.mod}
$ cd AB_components.mod
$ for f in `ls`; do sed -i -e 's/@@ \(.*\) \(.*\) @@$/@@ \1 @@/g' $f; done
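
To illustrate why this normalization helps: the same change can sit at different new-side line numbers in the two diffs when B or C adds lines earlier in the file. A sketch with invented line numbers, reusing the sed expression above:

```shell
normalize() { sed -e 's/@@ \(.*\) \(.*\) @@$/@@ \1 @@/g'; }
# Same old-side range, different new-side offsets in the A-to-B and A-to-C diffs.
from_ab=$(printf '%s\n' '@@ -40,6 +40,7 @@' | normalize)
from_ac=$(printf '%s\n' '@@ -40,6 +52,7 @@' | normalize)
printf '%s\n%s\n' "$from_ab" "$from_ac"
```

Both headers reduce to `@@ -40,6 @@`, so the two hunk files compare as identical if their bodies match.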

EDIT 1: I had to take the following additional step to work around a bug in the Ruby script (mentioned in my comment below); the commands drop a trailing 'diff ' line from the affected hunk files:
$ cd ..; cp -r AB_components{,.mod2}; cd AB_components.mod2
$ for f in `ls`; do echo $f:`tail -1 $f`; done | grep ':diff ' | cut -d':' -f1 > ../bad_files
$ for f in `cat ../bad_files`; do head -n -1 ../AB_components/$f > $f; done
