I have a large codebase that was forked from the original project, and I'm trying to track down all the differences from the original. A lot of the file edits consist of commented-out debugging code and other miscellaneous comments. The GUI diff/merge tool Meld under Ubuntu can ignore comments, but only single-line comments.
Is there any other convenient way of finding only the non-comment diffs, either using a GUI tool or Linux command-line tools? In case it makes a difference, the code is a mixture of PHP and JavaScript, so I'm primarily interested in ignoring //, /* */ and #.
Comments (7)
For a visual diff, you can try Meld or DiffMerge.
DiffMerge
Its rulesets and options provide for customized behavior.
GNU diffutils
From the command-line perspective, you can use the --ignore-matching-lines=RE option of diff. Please note that the regex has to match the corresponding line in both files, and it has to match every changed line in the hunk in order to work; otherwise diff will still show the difference.
Use single quotes to protect the pattern from shell expansion, and escape regex-reserved characters (e.g. brackets).
This behavior is described in the diffutils manual and is also well explained by armel here.
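A minimal sketch of the hunk rule described above, assuming GNU diff; the file names (old.php, new.php, mixed.php) and their contents are made up for illustration. A change that touches only comment lines is suppressed, while a hunk that also contains a changed code line is printed in full, comment lines included.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical files showing how GNU diff -I treats hunks.
tmp=$(mktemp -d)

# old.php and new.php differ only in a '#' comment line.
printf '%s\n' '<?php' '# old note' 'echo 1;' > "$tmp/old.php"
printf '%s\n' '<?php' '# new note' 'echo 1;' > "$tmp/new.php"

# Every changed line matches the regex, so the hunk is ignored.
# (diff exits non-zero when it reports differences, hence "|| true".)
comment_only=$(diff -I '^#' "$tmp/old.php" "$tmp/new.php" || true)
echo "comment-only change: '${comment_only}'"

# mixed.php changes the comment AND the adjacent code line.
printf '%s\n' '<?php' '# new note' 'echo 2;' > "$tmp/mixed.php"

# One changed line in the hunk does not match the regex, so the whole
# hunk is printed, including the comment lines.
mixed=$(diff -I '^#' "$tmp/old.php" "$tmp/mixed.php" || true)
echo "$mixed"

rm -rf "$tmp"
```

The practical consequence is that -I works best when comment edits are isolated from code edits; comment changes that sit next to real changes will still show up.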
See also:
Alternatively, check other diff apps, for example:
You can filter both files through stripcmt first, which will remove C and C++ comments. For removing # comments, sed 's/#.*//' will do.
Of course you will lose some context when removing comments first, but on the other hand differences in comments will no longer cause any problems. I think I would do it like the following (described for a single file; automate as required):
1. Say the latest version of the original code base is A and the latest of the copied code base is B. Let's call the versions with comments removed A' and B' (e.g. save those to temporary files while processing).
2. Take the version the copy was made from, with comments removed, as O' (alternatively just re-use B' for this).
3. Do a 3-way merge of O', A' and B' and save the result to C'. KDiff3 is an excellent tool for this.
4. C' is without comments, so to get back into "normal" mode, do a new 3-way merge with A' as base and A and C' as the two sides. This will pick up the changes between A' and C' (which are the code changes you want) into the normal code base, with comments based on version A.
Drawing version trees on paper before you start is highly recommended, to get a clear picture of which versions you want to work on. But don't be limited by what the tree is showing; you can merge any version in any direction once you figure out which versions to use.
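The comment-stripping step of this approach can be sketched as follows. This is only an illustration under stated assumptions: strip_comments is a hypothetical helper written with sed (it is not stripcmt), it deletes only lines that consist entirely of a //, # or one-line /* */ comment, and the file names and contents are made up. Trailing comments and multi-line /* */ blocks are deliberately left alone.

```shell
#!/usr/bin/env bash
# Sketch only: strip whole-line comments from both versions, then diff.
# strip_comments is a hypothetical helper, not the stripcmt tool.
strip_comments() {
  sed -e '/^[[:space:]]*\/\//d' \
      -e '/^[[:space:]]*#/d' \
      -e '/^[[:space:]]*\/\*.*\*\/[[:space:]]*$/d' "$1"
}

tmp=$(mktemp -d)

# A.php stands in for the original, B.php for the fork (made-up content).
printf '%s\n' '<?php' '// original comment' '$total = 1;' 'echo $total;' > "$tmp/A.php"
printf '%s\n' '<?php' '# debugging' '/* print_r($total); */' '$total = 2;' 'echo $total;' > "$tmp/B.php"

strip_comments "$tmp/A.php" > "$tmp/A.stripped.php"
strip_comments "$tmp/B.php" > "$tmp/B.stripped.php"

# Only the real code change remains in the diff.
code_diff=$(diff "$tmp/A.stripped.php" "$tmp/B.stripped.php" || true)
echo "$code_diff"

rm -rf "$tmp"
```

Deleting only whole-line comments is a conservative choice: a blanket s/#.*// would also cut into strings or URLs that happen to contain a # character.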
Far from perfect but it will give an idea of the differences
See our Smart Differencer line of tools, which compare computer language source files using the language structure rather than the layout as a guide. In particular, this means they ignore comments and whitespace when comparing code.
There is a SmartDifferencer for PHP.
GNU diff supports ignoring lines which match a regular expression, both for single files and for folders. With a pattern such as ^#, this would ignore all lines which start with a # at the line beginning.
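As a rough sketch of the folder case, assuming GNU diff: the -r option recurses into subdirectories and can be combined with -I. The directory layout (original/, fork/) and file contents below are invented for the example and are not the answer's original commands.

```shell
#!/usr/bin/env bash
# Sketch only: recursive comparison that ignores '#' comment lines.
tmp=$(mktemp -d)
mkdir -p "$tmp/original/lib" "$tmp/fork/lib"

# lib/util.php differs only in a '#' comment; main.php differs in code.
printf '%s\n' '<?php' '# v1' 'function f() {}' > "$tmp/original/lib/util.php"
printf '%s\n' '<?php' '# v2, forked' 'function f() {}' > "$tmp/fork/lib/util.php"
printf '%s\n' '<?php' 'echo "a";' > "$tmp/original/main.php"
printf '%s\n' '<?php' 'echo "b";' > "$tmp/fork/main.php"

# -r walks both trees; -I '^#' drops hunks made up solely of '#' lines.
changed=$(cd "$tmp" && diff -r -I '^#' original fork || true)
echo "$changed"

rm -rf "$tmp"
```

Only main.php should appear in the output, because the change in lib/util.php consists entirely of lines matching the pattern.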
I tried:
diff file1 file2
and
diff -d -I ^#.\* file1 file2
and the result was the same in both cases: comments were included.
However,
diff -u file1 file2 | grep -v '^ \|^.#\|^.$'
gives what I need: real diffs only, no comments, no empty lines. ;)
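To see what that filter keeps, here is a small sketch with made-up file contents, assuming GNU diff and GNU grep (the \| alternation in a basic regex is a GNU extension). The grep pattern drops context lines (leading space), added or removed lines whose content starts with #, and added or removed empty lines.

```shell
#!/usr/bin/env bash
# Sketch only: post-filter a unified diff to hide comment and blank lines.
tmp=$(mktemp -d)

printf '%s\n' 'a=1' '# old comment' 'b=2' > "$tmp/file1"
printf '%s\n' 'a=1' '# new comment' '' 'b=3' > "$tmp/file2"

# '^ '   context lines
# '^.#'  +/- lines whose content begins with '#'
# '^.$'  +/- lines that are otherwise empty
filtered=$(diff -u "$tmp/file1" "$tmp/file2" | grep -v '^ \|^.#\|^.$' || true)
echo "$filtered"

rm -rf "$tmp"
```

The ---, +++ and @@ header lines survive the filter, so files that differ only in comments still produce some output. Because this is a plain line filter, it only recognizes comments that start in the first column.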
Try:
See: Regular expression at Wikipedia
Below are examples of regular expressions that would cause a diff to ignore a preprocessor directive and both standard comment block types.
For example:
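The answer's original patterns are not preserved here, so the following is only a sketch of the idea, assuming GNU diff: several -I options can be given, and a changed line is ignorable if it matches any of them. The three patterns (lines starting with #, lines starting with //, and one-line /* */ comments) and the sample files are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: combine several -I patterns (hypothetical sample files).
tmp=$(mktemp -d)

printf '%s\n' '#include <a.h>' '// old' '/* old */' 'int x = 1;' > "$tmp/old.c"
printf '%s\n' '#include <b.h>' '// new' '/* new */' 'int x = 1;' > "$tmp/new.c"

# Each changed line matches one of the three patterns, so nothing is shown.
ignored=$(diff -I '^[[:space:]]*#' \
               -I '^[[:space:]]*//' \
               -I '^[[:space:]]*/\*.*\*/[[:space:]]*$' \
               "$tmp/old.c" "$tmp/new.c" || true)
echo "differences shown: '${ignored}'"

rm -rf "$tmp"
```

Since -I matches line by line, the inner lines of a multi-line /* */ block are not covered by these patterns; for PHP, note that the first pattern also swallows any line starting with #, not just preprocessor-style directives.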