Finding duplicated source code
I'm analyzing some legacy code: about 80,000 lines of old PL/SQL. At first glance there is quite a lot of duplication in the source that needs to be removed. Rather than running diffs by hand and looking at each file, there must be some tool or command-line kung fu out there to detect duplicated lines of source code.
My goal is to make an educated guess about the minimal size of a rewrite and about how much actual knowledge is captured in this program. I wrote a basic static code analyzer to count the control statements (IF, ELSE, FOR, etc.) and functions in each file.
But duplicated code still needs to be removed from my statistics.
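For a first rough number, a sliding-window comparison of normalized lines may already be enough. The following is a minimal sketch, not a replacement for a real clone detector: the function names, the window size of 5, and the normalization rules (drop `--` comments, collapse whitespace, upper-case everything) are all assumptions chosen for illustration. It will mis-handle `--` inside string literals and ignores `/* */` comments.

```python
import re
from collections import defaultdict


def normalize(line):
    """Drop a trailing `--` comment, collapse whitespace, upper-case the rest."""
    line = line.split("--", 1)[0]  # naive: also cuts `--` inside string literals
    return re.sub(r"\s+", " ", line).strip().upper()


def find_duplicate_blocks(files, window=5):
    """Return {block: [(filename, first_line_number), ...]} for repeated blocks.

    `files` maps a file name to its source text. A block is a run of
    `window` consecutive non-blank normalized lines.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Remember original line numbers, then skip lines that normalize to nothing.
        lines = [(i, normalize(raw)) for i, raw in enumerate(text.splitlines(), 1)]
        lines = [(i, norm) for i, norm in lines if norm]
        for start in range(len(lines) - window + 1):
            block = tuple(norm for _, norm in lines[start:start + window])
            seen[block].append((name, lines[start][0]))
    # Keep only blocks that occur at more than one location.
    return {block: locs for block, locs in seen.items() if len(locs) > 1}
```

Feeding it a dict built from the source files and counting the lines covered by the reported locations gives an estimate of how much to subtract from the statistics. Overlapping windows report the same long duplicate several times, so the locations need merging before counting.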
Comments (4)
Have you looked at Simian (Similarity Analyser)? I just checked and it is no longer free, but it is available for a 15-day evaluation period.
I have used it in practice and it works well.
Sonar has duplication detection and claims to support PL/SQL, though I've never used it for that.
You would need to beg/borrow/steal/write a PL/SQL parser and compare the resulting abstract syntax trees. With the size of the code base you have, that might be worthwhile. There would be other uses for the parser once you're done.
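Short of a real parser, a cheaper approximation is to tokenize each statement and replace identifiers and literals with placeholders, so that copies differing only in names or constants get the same fingerprint. The sketch below is that token-level stand-in, not an AST comparison: the keyword set is a small illustrative subset of PL/SQL, the tokenizer is simplified (for example, it does not accept `$` or `#` in identifiers), and `fingerprint` and `group_similar` are hypothetical names.

```python
import re
from collections import defaultdict

# Small illustrative subset of PL/SQL keywords (assumption, not the full list).
KEYWORDS = {
    "IF", "THEN", "ELSE", "ELSIF", "END", "FOR", "LOOP", "WHILE", "BEGIN",
    "RETURN", "IN", "IS", "AS", "AND", "OR", "NOT", "NULL",
    "SELECT", "FROM", "WHERE", "INTO",
}

# Order matters: comments, string literals, numbers, words, operators, anything else.
TOKEN = re.compile(
    r"--[^\n]*|/\*.*?\*/|'(?:[^']|'')*'|\d+(?:\.\d+)?|[A-Za-z_]\w*"
    r"|:=|<>|!=|<=|>=|\|\||\S",
    re.DOTALL,
)


def fingerprint(statement):
    """Reduce a statement to keywords, operators, and ID/NUM/STR placeholders."""
    out = []
    for match in TOKEN.finditer(statement):
        tok = match.group()
        if tok.startswith("--") or tok.startswith("/*"):
            continue  # comments do not affect the fingerprint
        if tok[0] == "'":
            out.append("STR")
        elif tok[0].isdigit():
            out.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            upper = tok.upper()
            out.append(upper if upper in KEYWORDS else "ID")
        else:
            out.append(tok)
    return " ".join(out)


def group_similar(statements):
    """Group statements that share a fingerprint; return groups of two or more."""
    groups = defaultdict(list)
    for stmt in statements:
        groups[fingerprint(stmt)].append(stmt)
    return [group for group in groups.values() if len(group) > 1]
```

With this, `IF v_total > 100 THEN v_rate := 0.5; END IF;` and `if amount > 250 then discount := 0.1; end if;` land in the same group. A proper parser would go further, for instance by matching code whose statements were reordered or whose expressions were restructured.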
How about this:
http://sourceforge.net/projects/sddforeclipse/
It is open source and is said to be used by commercial software. By the way, it is an Eclipse plugin.