How do I perform a diff that ignores all comments?

Posted 2024-12-05 16:02:06 · 273 words · 1 view · 0 comments

I have a large codebase that was forked from the original project, and I'm trying to track down all the differences from the original. Many of the file edits consist of commented-out debugging code and other miscellaneous comments. The GUI diff/merge tool Meld under Ubuntu can ignore comments, but only single-line comments.

Is there any other convenient way of finding only the non-comment diffs, using either a GUI tool or Linux command-line tools? In case it makes a difference, the code is a mixture of PHP and JavaScript, so I'm primarily interested in ignoring //, /* */ and #.


Comments (7)

记忆里有你的影子 2024-12-12 16:02:06

For a visual diff, you can try Meld or DiffMerge.

DiffMerge

Its rulesets and options provide for customized behavior.

GNU diffutils

From the command line, you can use diff's --ignore-matching-lines=RE option (short form -I RE), for example:

diff -d -I '^#' -I '^ #' file1 file2

Note that this only works if the regex matches every changed line in the hunk, in both files (every inserted and every deleted line); otherwise diff still shows the whole hunk.

Use single quotes to protect the pattern from shell expansion, and escape any regex-reserved characters (e.g. brackets) that should match literally.

The diffutils manual says:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.
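
The hunk rule quoted above can be seen with two tiny pairs of files. This is a minimal sketch, assuming GNU diff; the file names and contents are made up for illustration. In the first pair only a comment line changes, so the hunk is suppressed. In the second pair a code line changes next to the comment, so diff prints the whole hunk, comment lines included.

```shell
#!/bin/sh
# Hypothetical demo files; assumes GNU diff, whose -I option is used here.
work=$(mktemp -d)

# Case 1: the only changed line is a comment, so every changed line
# in the hunk matches '^#' and the hunk is ignored.
printf '%s\n' '# note one' 'alpha' 'beta' > "$work/a1"
printf '%s\n' '# note two' 'alpha' 'beta' > "$work/b1"
only_comment=$(diff -I '^#' "$work/a1" "$work/b1")

# Case 2: a code line changes right next to the comment, so the hunk
# contains a non-matching line and diff prints all of it.
printf '%s\n' '# note one' 'alpha' 'beta' > "$work/a2"
printf '%s\n' '# note two' 'ALPHA' 'beta' > "$work/b2"
mixed=$(diff -I '^#' "$work/a2" "$work/b2")

printf 'case 1: [%s]\n' "$only_comment"
printf 'case 2:\n%s\n' "$mixed"
rm -rf "$work"
```

So -I works best when comment edits are separated from code edits by at least one unchanged line.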

This behavior is also well explained by armel here.


书信已泛黄 2024-12-12 16:02:06

You can filter both files through stripcmt first, which removes C and C++ comments. To remove # comments, use sed 's/#.*//'.
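
As a rough illustration of the filter-then-diff idea, here is a sketch that uses plain sed in place of stripcmt. The strip_comments helper and the sample files are assumptions made for this example, not part of any tool. The stripping is deliberately naive: it does not understand string literals (a URL containing // would be cut), and it drops whole lines for multi-line /* */ comments.

```shell
#!/bin/sh
# Naive stand-in for stripcmt (hypothetical helper, not that tool).
# It ignores string literals, so '#' or '//' inside a string is cut too.
strip_comments() {
    sed -e 's,/\*.*\*/,,g' \
        -e '/\/\*/,/\*\//d' \
        -e 's,//.*,,' \
        -e 's/#.*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

work=$(mktemp -d)
cat > "$work/orig.php" <<'EOF'
<?php
/* original
   header */
$total = 1; // running total
echo $total;
EOF
cat > "$work/fork.php" <<'EOF'
<?php
# fork: debugging below
// var_dump($total);
$total = 2; /* changed */
echo $total;
EOF

# Strip both sides to temporary files, then diff only what is left.
strip_comments "$work/orig.php" > "$work/orig.stripped"
strip_comments "$work/fork.php" > "$work/fork.stripped"
code_diff=$(diff "$work/orig.stripped" "$work/fork.stripped")
printf '%s\n' "$code_diff"
rm -rf "$work"
```

Only the change to $total survives; the reported line numbers refer to the stripped files, not the originals.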

Of course, you will lose some context by removing comments first, but on the other hand differences in comments will no longer cause any problems. I think I would do it as follows (described for a single file; automate as required):

  1. If the latest version of the original code base is A and the
    latest version of the copied code base is B, call the versions with
    comments removed A' and B' (e.g. save them to temporary files while processing).
  2. Find a common origin version and strip its comments to get O' (alternatively, just reuse B' for this).
  3. Perform a 3-way merge of O', A' and B' and save the result as C'. KDiff3 is an excellent tool for this.
  4. Now you have the code changes you want merged. However, C' has no comments, so to get back into "normal" mode, do a new 3-way merge with A' as the base and A and C' as the two sides. This picks up the changes between A' and C' (the code changes you want) into the normal code base, with comments based on version A.

Drawing the version tree on paper before you start is highly recommended, to get a clear picture of which versions you want to work on. But don't be limited by what the tree shows: you can merge any version in any direction, as long as you figure out which versions to use.
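
The four steps can be sketched on the command line with diff3 from GNU diffutils standing in for KDiff3 (a substitution made for this example). The file contents and the strip helper are hypothetical; strip only drops whole-line // and # comments. The code changes are kept several lines apart so that both merges are conflict-free.

```shell
#!/bin/sh
# Sketch of steps 1-4 using diff3 -m (MINE OLDER YOURS) instead of KDiff3.
# strip() is a hypothetical helper that removes whole-line // and # comments.
strip() { grep -v -e '^[[:space:]]*//' -e '^[[:space:]]*#' "$1"; }

work=$(mktemp -d)
# O: common origin. A: latest original. B: latest fork.
printf '%s\n' '<?php' '$a = 1;' '$b = 2;' '$c = 3;' '$d = 4;' '$e = 5;' \
    '$f = 6;' > "$work/O"
printf '%s\n' '<?php' '// upstream header comment' '$a = 10;' '$b = 2;' \
    '$c = 3;' '$d = 4;' '$e = 5;' '$f = 6;' > "$work/A"
printf '%s\n' '<?php' '# fork debugging note' '$a = 1;' '$b = 2;' \
    '$c = 3;' '$d = 4;' '$e = 5;' '$f = 60;' > "$work/B"

# Steps 1-2: comment-free versions A', B' and O'.
strip "$work/A" > "$work/A.s"
strip "$work/B" > "$work/B.s"
strip "$work/O" > "$work/O.s"

# Step 3: C' = 3-way merge of A' and B' with O' as the base.
diff3 -m "$work/A.s" "$work/O.s" "$work/B.s" > "$work/C.s"

# Step 4: merge again with A' as the base, A and C' as the two sides.
# This carries the code changes over onto the commented file A.
diff3 -m "$work/A" "$work/A.s" "$work/C.s" > "$work/merged"

merged=$(cat "$work/merged")
printf '%s\n' "$merged"
rm -rf "$work"
```

The result keeps the comments from A and picks up the fork's code change, while the fork's own comment is dropped. With real files, either diff3 call may report conflicts that need resolving by hand.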

东风软 2024-12-12 16:02:06
diff file1 file2 | grep -v '^[<>] #'


Far from perfect but it will give an idea of the differences

一紙繁鸢 2024-12-12 16:02:06

See our Smart Differencer line of tools, which compare computer language source files using the language structure rather than the layout as a guide. In particular, this means they ignore comments and whitespace when comparing code.

There is a SmartDifferencer for PHP.

歌枕肩 2024-12-12 16:02:06

GNU diff supports ignoring lines which match a regular expression:

diff --ignore-matching-lines='^#' file1 file2

and for folders (here with -b and -B added, to also ignore changes in the amount of whitespace and blank lines):

diff -bBqr --ignore-matching-lines='^#' folder1/ folder2/

This ignores changes to lines that begin with a #.

平定天下 2024-12-12 16:02:06

I tried: diff file1 file2 and diff -d -I ^#.\* file1 file2
and the result was the same in both cases - included comments;

however, diff -u file1 file2 | grep -v '^ \|^.#\|^.$' gives
what I need: real diffs only, no comments, no empty lines. ;)
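
To see what that grep filter removes, here is a small sketch with made-up files. It uses grep -E with the same three alternatives (the \| form in the command above is a GNU extension to basic regular expressions). The pattern drops unified-diff context lines (leading space), changed lines whose first character after the +/- marker is #, and changed lines that are otherwise empty.

```shell
#!/bin/sh
# Hypothetical sample files for illustration.
work=$(mktemp -d)
printf '%s\n' '# comment A' 'value = 1' '' 'keep' > "$work/a.txt"
printf '%s\n' '# comment B' 'value = 2' '' 'keep' '' > "$work/b.txt"

# '^ '  : context lines of the unified diff
# '^.#' : added/removed lines that start with '#'
# '^.$' : added/removed blank lines (only the +/- marker is left)
filtered=$(cd "$work" && diff -u a.txt b.txt | grep -Ev '^ |^.#|^.$')

printf '%s\n' "$filtered"
rm -rf "$work"
```

The ---, +++ and @@ header lines are kept. Indented comments are not filtered, because ^.# only matches a # directly after the marker, and // or /* */ comments would need further alternatives.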

恋你朝朝暮暮 2024-12-12 16:02:06

Try:

diff -I REGEXP -I REGEXP2 file1 file2

See: Regular expression at Wikipedia

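Below are examples of regular expressions that make diff ignore a preprocessor directive (or # comment) and both standard comment types. Note that -I matches one line at a time, so a block comment is only matched when it opens and closes on the same line.

For example:

^[[:space:]]*#
^[[:space:]]*/\*.*\*/[[:space:]]*$
^[[:space:]]*//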
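
As a hedged illustration of combining several -I options for PHP/JavaScript-style comments, the sketch below uses hypothetical file contents and assumes GNU diff. Each pattern is applied to a single line, so only one-line /* ... */ comments are covered, and a comment change is only hidden when it sits in a hunk of its own.

```shell
#!/bin/sh
# Hypothetical sample files; assumes GNU diff.
work=$(mktemp -d)
cat > "$work/old.php" <<'EOF'
<?php
// old comment
$x = 1;
# old note
$y = 2;
/* old block */
$w = 0;
$z = 3;
EOF
cat > "$work/new.php" <<'EOF'
<?php
// new comment
$x = 1;
# new note
$y = 2;
/* new block */
$w = 0;
$z = 30;
EOF

# One -I per comment style: '#', '//', and one-line '/* ... */'.
code_only=$(diff \
    -I '^[[:space:]]*#' \
    -I '^[[:space:]]*//' \
    -I '^[[:space:]]*/\*.*\*/[[:space:]]*$' \
    "$work/old.php" "$work/new.php")

printf '%s\n' "$code_only"
rm -rf "$work"
```

Here the three comment edits are each separated from other changes by an unchanged line, so only the change to $z is reported.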