Code duplication refactoring tool for VB
I need a very specific tool for VB (or multi-language). I thought I would ask if one already exists, before I start making one myself (probably, in python).
What I need:
- The tool must crawl a path, recursively or not, searching for files whose extension is in a given list, such as .bas, .frm, .xxx
- Then, it has to parse those files, searching for functions, routines, etc.
- And finally, it must output what it found.
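Before building a full tool, the three steps above can be prototyped in a few lines of Python. This is a minimal sketch under stated assumptions: the extension list (`.bas`, `.frm`, `.cls`) is a guess, and the regular expressions only recognize plain VB6-style `Sub`/`Function`/`Property` headers and their matching `End` lines. Line continuations (` _`), `Declare` statements and VB.net syntax are not handled, so a real tool would need a proper VB parser. The names `crawl`, `find_source_files` and `extract_procedures` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: which extensions to scan; the question lists .bas, .frm, .xxx.
DEFAULT_EXTENSIONS = (".bas", ".frm", ".cls")

# Rough VB6-style procedure header: optional scope and Static keywords, then
# Sub / Function / Property Get|Let|Set followed by the procedure name.
HEADER_RE = re.compile(
    r"^\s*(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?"
    r"(Sub|Function|Property\s+(?:Get|Let|Set))\s+(\w+)",
    re.IGNORECASE,
)
# Matching end line: End Sub / End Function / End Property.
END_RE = re.compile(r"^\s*End\s+(?:Sub|Function|Property)\b", re.IGNORECASE)


def find_source_files(root, extensions=DEFAULT_EXTENSIONS, recursive=True):
    """Yield files under root whose suffix is in extensions (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    pattern = "**/*" if recursive else "*"
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path


def extract_procedures(text):
    """Return (name, body) pairs; body is the text between header and End line."""
    procedures = []
    name, body = None, []
    for line in text.splitlines():
        if name is None:
            match = HEADER_RE.match(line)
            if match:
                name, body = match.group(2), []
        elif END_RE.match(line):
            procedures.append((name, "\n".join(body)))
            name = None
        else:
            body.append(line)
    return procedures


def crawl(root, extensions=DEFAULT_EXTENSIONS, recursive=True):
    """Return (file, name, body) triples for every procedure found under root."""
    results = []
    for path in find_source_files(root, extensions, recursive):
        # VB6 sources are usually ANSI; latin-1 decodes any byte sequence.
        text = path.read_text(encoding="latin-1")
        for name, body in extract_procedures(text):
            results.append((str(path), name, body))
    return results
```

Calling `crawl("C:/project")` (a hypothetical path) returns one `(file, name, body)` triple per procedure, which is the raw material for the comparison step.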
I based this on the idea of "reducing code redundancy", in a scenario where bad programmers make a lot of functions that do the same thing, sometimes with the same name, sometimes not. There are 4 cases:
- Case 1: Same name, Same content.
- Case 2: Same name, Diff content.
- Case 3: Diff name, Same content.
- Case 4: Diff name, Diff Content.
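Given `(file, name, body)` triples, however they were extracted, sorting procedures into these four cases comes down to comparing names and bodies. In this sketch, "same content" is assumed to mean identical text after dropping blank lines and `'` comment lines, stripping indentation and lowercasing. That is deliberately crude: it also lowercases string literals, and any real edit makes two bodies "different". `classify_pairs` and `normalize_body` are hypothetical names. Case 4 is the normal state of affairs, so it is not collected.

```python
from itertools import combinations


def normalize_body(body):
    """Crude normalization: drop blank and ' comment lines, strip indentation,
    lowercase (VB keywords and names are case-insensitive; note this also
    lowercases string literals)."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("'"):
            kept.append(stripped.lower())
    return "\n".join(kept)


def classify_pairs(procedures):
    """Sort every pair of (file, name, body) triples into the question's cases.

    case1: same name, same content   (Error)
    case2: same name, diff content   (Warning)
    case3: diff name, same content   (Error)
    Case 4 (diff name, diff content) is normal and is not collected.
    Comparing all pairs is O(n^2), which is fine for a sketch.
    """
    report = {"case1": [], "case2": [], "case3": []}
    prepared = [(f, n, n.lower(), normalize_body(b)) for f, n, b in procedures]
    for (f1, n1, key1, body1), (f2, n2, key2, body2) in combinations(prepared, 2):
        same_name = key1 == key2  # VB identifiers are case-insensitive
        same_content = body1 == body2
        pair = ((f1, n1), (f2, n2))
        if same_name and same_content:
            report["case1"].append(pair)
        elif same_name:
            report["case2"].append(pair)
        elif same_content:
            report["case3"].append(pair)
    return report
```

For example, two copies of `Func` with identical bodies in different files land in `case1`, which corresponds to the first "Errors" line of the report format below. For large code bases, grouping by name and by normalized body with dictionaries avoids the quadratic pair loop.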
So, the output should be something like this:
===========================================================================
RESULT
===========================================================================
Errors:
---------------------------------------------------------------------------
==Name, ==Content --> 3: (Func(), Foo(), Bar()) In files (f,f2,f3)
!=Name, ==Content --> 2: (Func() + Func1(), Bar() + Bar1()) In Files (f4)
---------------------------------------------------------------------------
Warnings:
==Name, !=Content --> 1 (Foobar()) In Files (f19)
---------------------------------------------------------------------------
This is to give you an idea of what I need.
So, the question is: is there any tool that accomplishes something similar to this?
P.S.: Yes, we should write good code in the first place, but, you know, stuff happens.
What you want is a "clone detector". These tools find copy-and-pasted code across a large set of designated files. Clones are not just of functions; they can be code blocks, data declarations, etc.
There are a variety of detectors out there, and you should know how they work before you attempt to build one of your own.
Some simply match lines for exact equivalence. While these demonstrate the basic idea, they detect poorly because they don't take into account the fact that cloned code often has variations; what people really do is clone-and-edit when making copies.
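A minimal sketch of such a line-based detector, assuming the sources are already loaded into a dict mapping file name to text (`line_clones` and its `window` parameter are hypothetical): it strips whitespace, slides a window of consecutive non-blank lines over each file, and reports every window that occurs more than once. In the demo, the re-indented copy is found but the copy with renamed variables is missed, which is exactly the weakness described above.

```python
from collections import defaultdict


def line_clones(files, window=3):
    """Naive line-based clone detector.

    files: dict mapping file name -> source text.
    Returns {run_of_lines: [(file, first_line_number), ...]} for every run of
    `window` consecutive non-blank, whitespace-stripped lines that occurs
    more than once.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep 1-based line numbers of non-blank lines for reporting.
        lines = [(no, line.strip()) for no, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for start in range(len(lines) - window + 1):
            run = tuple(content for _, content in lines[start:start + window])
            seen[run].append((name, lines[start][0]))
    return {run: where for run, where in seen.items() if len(where) > 1}


if __name__ == "__main__":
    files = {
        "a.bas": "x = 1\ny = 2\nz = x + y\n",
        "b.bas": "    x = 1\n    y = 2\n    z = x + y\n",  # re-indented copy: found
        "c.bas": "a = 1\nb = 2\nc = a + b\n",              # renamed copy: missed
    }
    for run, where in line_clones(files).items():
        print(where)  # → [('a.bas', 1), ('b.bas', 1)]
```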
Some match sequences of language tokens, e.g., identifiers, keywords, literals, punctuation. These are at least relatively tolerant of whitespace changes, and they can find clones in which single tokens have been substituted for single tokens. However, because they don't understand language structure (blocks, statements, function bodies), they often match sequences that cross such structural boundaries (e.g., "} {" is often considered a clone by these tools), so they produce a rather high rate of false positives: reported "clones" that are not clones at all. Some of these attempt to limit the matches to key program structures, such as complete functions, as you have kind of suggested.
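The core trick of token-based detectors is to compare token streams after abstracting away identifiers and literals, so that whitespace, comments and consistent renaming no longer matter. The sketch below shows only that normalization step for VB-like code; the tiny keyword set and the regular-expression tokenizer are simplifying assumptions, and `normalized_tokens` and `same_shape` are hypothetical names. A real token-based tool would then search for long repeated runs of these normalized tokens across files, which is where the structure-blind matches mentioned above come from.

```python
import re

# Assumption: a deliberately tiny subset of VB keywords, enough for the demo.
KEYWORDS = {"dim", "as", "integer", "long", "string", "if", "then", "else",
            "end", "for", "to", "next", "sub", "function", "and", "or", "not"}

# String literal ("" escapes a quote), number, word, or single punctuation char.
TOKEN_RE = re.compile(r'"(?:[^"]|"")*"|\d+(?:\.\d+)?|\w+|[^\s\w]')


def normalized_tokens(source):
    """Tokenize VB-like source: identifiers -> ID, literals -> LIT, keywords
    lowercased, ' comments dropped. Whitespace never produces tokens."""
    out = []
    for line in source.splitlines():
        for tok in TOKEN_RE.findall(line):
            if tok == "'":
                break  # rest of the line is a comment
            if tok[0] == '"' or tok[0].isdigit():
                out.append("LIT")
            elif tok.lower() in KEYWORDS:
                out.append(tok.lower())
            elif tok[0].isalpha() or tok[0] == "_":
                out.append("ID")
            else:
                out.append(tok)  # operators and punctuation are kept as-is
    return out


def same_shape(src_a, src_b):
    """True when two fragments differ only in names, literals, layout, comments."""
    return normalized_tokens(src_a) == normalized_tokens(src_b)
```

With this normalization, `total = price * 2 ' double it` and `sum = cost * 2` produce the same token stream, so the renamed copy that defeats a line matcher is caught.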
More sophisticated detectors match program structures.
Our CloneDR (I'm the original author) is a detector that uses compiler-quality parsing to build abstract syntax trees, which capture the precise structure of the code. It does this for many languages (including VB6 and VBScript), locating clones as arbitrary functions, blocks, statements or declarations, with parameters showing how the clones vary. CloneDR can find clones in spite of formatting changes, changes in comment locations or content, and even variations where complex constructs (multiple statements or expressions) have been used as alternatives to simple ones (e.g., a single statement or a literal). While it tends to have a high detection rate (it usually finds 10-20% removable redundancy!), its false-positive rate tends to be considerably lower than that of token-based detectors. You can see sample reports for a variety of different languages at the link above.
See "Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach", which explicitly discusses different approaches and their benefits, and compares a large number of detectors, including CloneDR.
EDIT October 2010: ... When I first wrote this response, I assumed the OP was interested in VB.net, which CloneDR didn't handle. We've since added VB.net, VB6 and VBScript capability to CloneDR. (Parsing VB.net in its modern form is a lot messier than one might imagine for a "simple" (!) language like Visual Basic.)