How do you detect code duplication during development?

Posted on 2024-07-07 19:39:20


Comments (13)

木緿 2024-07-14 19:39:20

Simian detects duplicate code in C++ projects.

Update: It also works with Java, C#, C, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic and Groovy source code, and even plain text files.

征﹌骨岁月お 2024-07-14 19:39:20

I've used PMD's Copy/Paste Detector (CPD) and integrated it into CruiseControl using the wrapper script below (make sure the PMD jar is on the classpath).

Our check runs nightly. If you want to limit the output to files from the current change set, you will probably need some custom programming. One idea: check all files, then list only the duplicates that involve at least one changed file. You have to check all files because a change could copy code from an unchanged file. This should be doable by parsing the XML output. Don't forget to post that script when it's done ;)

For starters the "text" output should be fine, but you will want to display the results in a user-friendly way. For that I use a Perl script that generates HTML files from CPD's "xml" output. They are made accessible by publishing them to the Tomcat instance where CruiseControl's reporting JSP resides. The developers can view them there and see the results of their dirty hacking :)

It runs quite fast: less than 2 seconds on 150 KLOC of code (empty lines and comments not counted in that number).

duplicatecheck.xml:

<project name="duplicatecheck" default="cpd">

<property name="files.dir" value="dir containing your sources"/>
<property name="output.dir" value="dir containing results for publishing"/>

<target name="cpd">
    <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"/>
    <cpd minimumTokenCount="100" 
         language="cpp" 
         outputFile="${output.dir}/duplicates.txt"
         ignoreLiterals="false"
         ignoreIdentifiers="false"
         format="text">
        <fileset dir="${files.dir}/">
            <include name="**/*.h"/>
            <include name="**/*.cpp"/>
                <!-- exclude third-party stuff -->
            <exclude name="boost/"/>
            <exclude name="cppunit/"/>
        </fileset>
    </cpd>
</target>

</project>
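The change-set filtering idea above could be sketched roughly like this in Python. It assumes the Ant task is switched to format="xml" and that the report uses the usual CPD layout (a `pmd-cpd` root containing `duplication` elements with `lines` and `tokens` attributes and `file` children carrying a `path`); check the output of your own PMD version, since newer releases add an XML namespace, which the sketch strips. The function name and the sample report are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}duplication'."""
    return tag.rsplit("}", 1)[-1]


def duplications_touching(xml_text, changed_files):
    """Return (lines, tokens, paths) for every duplication that involves
    at least one file from changed_files (hypothetical helper)."""
    changed = set(changed_files)
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "duplication":
            continue
        # Collect the path of every <file> child of this duplication.
        paths = [child.get("path") for child in elem
                 if local_name(child.tag) == "file"]
        if changed.intersection(paths):
            hits.append((int(elem.get("lines")), int(elem.get("tokens")), paths))
    return hits


# Hand-written sample in the assumed report layout, not real CPD output.
SAMPLE = """<pmd-cpd>
  <duplication lines="12" tokens="110">
    <file line="40" path="src/a.cpp"/>
    <file line="88" path="src/b.cpp"/>
    <codefragment>int x = 0;</codefragment>
  </duplication>
  <duplication lines="20" tokens="150">
    <file line="5" path="src/c.cpp"/>
    <file line="9" path="src/d.cpp"/>
  </duplication>
</pmd-cpd>"""

if __name__ == "__main__":
    for lines, tokens, paths in duplications_touching(SAMPLE, ["src/b.cpp"]):
        print(f"{lines} lines / {tokens} tokens duplicated in: {', '.join(paths)}")
```

Paths are compared as plain strings, so the change-set list has to use the same form (absolute or relative) that CPD writes into the report.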

聚集的泪 2024-07-14 19:39:20

duplo appears to be a C implementation of the algorithm used in Duploc. It is simple to compile and install, and while the options are limited, it seems to work more or less out of the box.

抱猫软卧 2024-07-14 19:39:20

These Debian packages seem to do something along these lines:

P.S. There ought to be a debtags tag for all tools related to finding [near] duplication. (But what would it be called?)

雨落星ぅ辰 2024-07-14 19:39:20

Look at the PMD project.

I've never used it, but have always wanted to.

Well, you can run a clone detector on your source code base every night.

Many clone detectors work by comparing source lines, and can only find exact duplicate code.

CCFinder, mentioned in another answer, works by comparing language tokens, so it isn't sensitive to whitespace changes. It can detect clones that are variants of the original code if there are only single-token changes (e.g., a variable X changed to Y in the clone).

Ideally you want the above, plus the ability to find clones where the variations are allowed to be relatively arbitrary, e.g., replacing a variable by an expression, a statement by a block, etc.

Our CloneDR clone detector does this for Java, C#, C++, COBOL, VB.net, VB6, Fortran and a variety of other languages. It can be seen at:
http://www.semdesigns.com/Products/Clone/index.html

As well as being able to handle multiple languages, the CloneDR engine can handle a variety of input encodings, including ASCII, ISO-8859-1, UTF8, UTF16, EBCDIC, a number of Microsoft encodings, and (Japanese) Shift-JIS.

The site has several example reports from clone detection runs, including one for C++.

EDIT Feb 2014: Now handles all of C++14.
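To make the token-comparison idea concrete, here is a toy Python sketch. It is not how CCFinder or CloneDR are implemented; the regex tokenizer, the tiny keyword list and the `normalize` function are all assumptions for illustration. Identifiers and numbers are replaced by placeholders, so whitespace changes and renamed variables no longer matter, while a structural change (a variable replaced by an expression) still defeats the comparison.

```python
import re

# Crude tokenizer: identifiers/keywords, numbers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")
# Deliberately tiny keyword list, just enough for the example.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "double", "void"}


def normalize(source):
    """Turn source text into a token list with identifiers and numbers abstracted away."""
    out = []
    for tok in TOKEN.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


original = "for (int i = 0; i < n; i++) { total += x[i]; }"
renamed = "for (int k = 0;  k < count; k++) {\n    sum += values[k];\n}"
restructured = "for (int i = 0; i < n; i++) { total += x[i] * 2; }"

if __name__ == "__main__":
    print(normalize(original) == normalize(renamed))       # renamed variables, new layout
    print(normalize(original) == normalize(restructured))  # extra expression added
```

Note that mapping every identifier to the same placeholder is more permissive than checking for a consistent renaming, and catching the restructured variant needs something beyond flat token comparison.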

合久必婚 2024-07-14 19:39:20

CCFinderX is a free (for in-house use) cloned code detector that supports multiple programming languages (Java, C, C++, COBOL, VB, C#).

黑凤梨 2024-07-14 19:39:20

Same (http://sourceforge.net/projects/same/) is extremely plain, but it works on text lines instead of tokens, which is useful if you're using a language that isn't supported by one of the fancier clone finders.
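For a feel of what a line-based approach involves, here is a minimal Python sketch. It is not Same's actual algorithm, just one simple way to do it: strip each line, skip blank ones, and record every window of `window` consecutive lines, reporting the windows that occur more than once. The function name and the sample inputs are made up.

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=3):
    """files maps a name to its text. Returns a dict mapping each repeated
    block (a tuple of stripped lines) to the (name, start_line) places it occurs."""
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line_number, stripped_text) for non-blank lines only.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for start in range(len(lines) - window + 1):
            block = tuple(content for _, content in lines[start:start + window])
            seen[block].append((name, lines[start][0]))
    return {block: places for block, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    sample = {
        "a.txt": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
        "b.txt": "a = 0\nx = 1\n  y = 2\n\nz = x + y\n",
    }
    for block, places in find_duplicate_blocks(sample).items():
        print(places, "->", " | ".join(block))
```

Because it only compares text, this works for any language, but a renamed variable is enough to hide a clone, and a long duplicate shows up as several overlapping windows unless you merge them.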

我一向站在原地 2024-07-14 19:39:20

There is also Simian, which supports Java, C#, C++, C, Objective-C, JavaScript...

It's supported by Hudson (like CPD).

Unless you're an open source project, you must pay for Simian.

ゝ杯具 2024-07-14 19:39:20

ConQAT is a great tool that supports C++ code analysis. It can find duplicates while ignoring whitespace, and it has extremely handy GUI and console interfaces.
Because of its flexibility it is not easy to set up. I've found this blog post very useful for setting up a C++ project.

驱逐舰岛风号 2024-07-14 19:39:20

You can use our SourceMeter tool for detecting code duplication. It is a command-line tool (very similar to a compiler), so you can easily integrate it into continuous integration tools such as CruiseControl, which you mentioned, or Jenkins.

困倦 2024-07-14 19:39:20

Finding "identical" code snippets is relatively easy; there are existing tools that already do this (see other answers).

Sometimes it's a good thing, sometimes it's not. It can bog down development if done at too fine a "level": you try to refactor so much code that you lose sight of your goal (and probably bust your milestones and schedules).

What is harder is finding multiple functions/methods that do the same thing but with different (but similar) inputs and/or algorithms, and without proper documentation.

If you have two or more different methods that do the same thing, and a programmer fixes one instance but forgets (or does not know) to fix the others, you increase the risk to your software.

对岸观火 2024-07-14 19:39:20

TeamCity has a powerful code duplication engine for .NET and Java that can effortlessly run as part of your build system.
