Keeping track of utility classes
I've recently become more and more frustrated with a problem I see emerging in my project's code base.
I'm working on a large-scale Java project that has over 1M lines of code. The interfaces and class structure are designed very well and the engineers writing the code are very proficient. The problem is that, in an attempt to make the code cleaner, people write utility classes whenever they need to reuse some functionality; as a result, more and more utility methods crop up over time as the project grows. However, when the next engineer comes across the need for the same functionality, he has no way of knowing that someone has already implemented a utility class (or method) somewhere in the code, so he implements another copy of the functionality in a different class. The result is a lot of code duplication and too many utility classes with overlapping functionality.
Are there any tools or any design principles which we as a team can implement in order to prevent the duplication and low visibility of the utility classes?
Example: engineer A has 3 places where he needs to transform XML to a String, so he writes a utility class called XMLUtil and puts a static toString(Document) method in it. Engineer B has several places where he serializes Documents into various formats, including String, so he writes a utility class called SerializationUtil with a static method called serialize(Document) which returns a String.
Note that this is more than just code duplication, as it is quite possible that the 2 implementations in the above example are different (say, one uses the Transformer API and the other uses Xerces2-J), so this can be seen as a "best practices" problem as well.
Update: I guess I'd better describe the current environment we develop in.
We use Hudson for CI, Clover for code coverage and Checkstyle for static code analysis.
We use agile development including daily talks and (perhaps insufficient) code reviews.
We define all our utility classes in a .util package, which due to its size now has 13 sub-packages and about 60 classes under the root (.util) package. We also use 3rd party libraries such as most of the Apache Commons jars and some of the jars that make up Guava.
I'm positive that we could reduce the number of utilities by half if we put someone on the task of refactoring that entire package. I was wondering whether there are any tools that can make that operation less costly, and whether there are any methodologies that can delay the problem from recurring for as long as possible.
A good solution to this problem is to start adding more object-orientation. To use your example:
The solution is to stop using primitive types or types provided by the JVM (String, Integer, java.util.Date, org.w3c.dom.Document) and wrap them in your own project-specific classes. Then your XmlDocument class can provide a convenient toString method and other utility methods. Your own ProjectFooDate can contain the parsing and formatting methods that would otherwise end up in various DateUtils classes, etc.
This way, the IDE will prompt you with your utility methods whenever you try to do something with an object.
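To make the wrapping idea concrete, here is a minimal sketch, assuming a hypothetical XmlDocument class that wraps org.w3c.dom.Document and does its string serialization with the JDK's built-in javax.xml.transform (Transformer) API. The parse, unwrap and main members are illustrative additions, not something the answer above prescribes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical project-specific wrapper: XML helpers live on the type itself. */
public final class XmlDocument {
    private final Document document;

    public XmlDocument(Document document) {
        this.document = document;
    }

    /** Parses an XML string into a wrapped DOM document. */
    public static XmlDocument parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return new XmlDocument(doc);
    }

    /** Exposes the underlying DOM for code that still needs it. */
    public Document unwrap() {
        return document;
    }

    /** The one place in the project where a document is turned into a String. */
    @Override
    public String toString() {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize XML document", e);
        }
    }

    public static void main(String[] args) throws Exception {
        XmlDocument doc = XmlDocument.parse("<config><name>demo</name></config>");
        System.out.println(doc); // IDE completion on "doc." now shows the XML helpers
    }
}
```

Because toString() is defined once on the wrapper, the choice between the Transformer API and Xerces2-J from the question is made in a single place instead of in each utility class.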
Your problem is a very common one. And a real problem too, because there is no good solution.
We are in the same situation here, or I'd say worse, with 13 million lines of code, staff turnover and more than 800 developers working on the code. We often discuss the very same problem that you describe.
The first idea, which your developers have already used, is to refactor common code into some utility classes. Our problem with that solution is that, even with pair programming, mentoring and discussion, we are simply too many for it to be effective. In fact we grow in subteams, with people sharing knowledge within their subteam, but the knowledge doesn't travel between subteams. Maybe we are wrong, but I think that even pair programming and talks can't help in this case.
We also have an architecture team. This team is responsible for dealing with design and architecture concerns and for making the common utilities that we might need. In fact, this team produces something we could call a corporate framework. Yes, it is a framework, and sometimes it works well. This team is also responsible for pushing best practices and raising awareness of what should or should not be done, and of what is or is not available.
Good core Java API design is one of the reasons for Java's success. Good third-party open source libraries count a lot too. Even a small, well-crafted API can offer a really useful abstraction and help reduce code size a lot. But you know, making a framework and a public API is not at all the same thing as just coding a utility class in 2 hours. It has a really high cost. A utility class costs 2 hours for the initial coding, maybe 2 days with debugging and unit tests. When you start sharing common code on big projects/teams, you are really making an API. You must then ensure perfect documentation and really readable, maintainable code. When you release a new version of this code, you must stay backward compatible. You have to promote it company-wide (or at least team-wide). From 2 days for your small utility class, you grow to 10 days, 20 days or even 50 days for a full-fledged API.
And your API design may not be so great. It is not that your engineers are not bright; indeed they are. But are you willing to let them work 50 days on a small utility class that just helps parse numbers in a consistent way for the UI? Are you willing to let them redesign the whole thing when you start using a mobile UI with totally different needs? Also, have you noticed how the brightest engineers in the world make APIs that never become popular or that fade slowly? You see, the first web project we made used only internal frameworks or no framework at all. We then added PHP/JSP/ASP. Then in Java we added Struts. Now JSF is the standard. And we are thinking about using Spring Web Flow, Vaadin or Lift...
All I want to say is that there is no good solution; the overhead grows exponentially with code size and team size. Sharing a big codebase restricts your agility and responsiveness. Any change must be made carefully, you must think of all potential integration problems, and everybody must be trained on the new specifics and features.
But the main productivity gain in a software company does not come from saving 10 or even 50 lines of code when parsing XML. Generic code to do this will grow to a thousand lines of code anyway and recreate a complex API that ends up layered with utility classes. When the guy makes a utility class for parsing XML, it is a good abstraction: he gives a name to a dozen or even a hundred lines of specialized code. This code is useful because it is specialized. The common API lets you work on streams, URLs, strings, whatever, and it has a factory so you can choose your parser implementation. The utility class is good because it works only with this parser and with strings, and because you need only one line of code to call it. But of course, this utility code is of limited use. It works well for this mobile application, or for loading XML configuration. And that's why the developer added the utility class for it in the first place.
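As an illustration of the kind of narrow utility described above, here is a sketch of a hypothetical ConfigXml.load(String) helper (the class and method names are made up). It assumes a flat configuration format: it only accepts a String, always uses the JDK's default DOM parser, and reads only the direct child elements of the root, which is exactly what keeps it short and one line to call.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical narrow utility: flat XML config string -> element name to text value. */
public final class ConfigXml {
    private ConfigXml() {}

    /**
     * Deliberately specialized: String input only, the JDK's default DOM parser only,
     * and only the direct child elements of the root element are read.
     */
    public static Map<String, String> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> values = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments; keep element children only.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid config XML", e);
        }
    }

    public static void main(String[] args) {
        // One line at the call site, which is the whole point of the utility.
        Map<String, String> cfg = ConfigXml.load("<config><host>localhost</host><port>8080</port></config>");
        System.out.println(cfg);
    }
}
```

Supporting streams, URLs or a pluggable parser would make it reusable in more places, but it would also start turning it back into the general-purpose API the answer warns about.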
In conclusion, instead of trying to consolidate the code for the whole codebase, what I would consider is splitting code responsibility as the teams grow.
What you are really managing is complexity. In the end, if you make one monolithic codebase, a very generic and advanced one, you increase the time it takes newcomers to ramp up, you increase the risk that developers will not use your common code at all, and you slow everybody down because any change has a far greater chance of breaking existing functionality.
There are several agile/XP practices you can use to address this, e.g.:
Then create, document and test one or several utility library projects which can be referenced. I recommend using Maven to manage dependencies/versions.
You might consider suggesting that all utility classes be placed in a well-organized package structure like com.yourcompany.util. If people are willing to name sub-packages and classes well, then at least when they need to find a utility, they know where to look. I don't think there is any silver-bullet answer here, though. Communication is important. Maybe if a developer sends a simple email to the rest of the development staff when they write a new utility, that will be enough to get it on people's radar. Or keep a shared wiki page where people can list and document them.
This problem is eased by combining IDE "code-completion" features with languages that support type extensions (e.g. C# and F#). If Java had such a feature, a programmer could easily explore all the extension methods on a class from within the IDE.
Of course, Java doesn't have type extensions. But you could use grep to search your project for "all static public methods which take SomeClass as the first argument" to gain similar insight into what utility methods have already been written for a given class.
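If grep feels too blunt, the same question can be asked with reflection. This is a sketch under stated assumptions: the candidate utility classes are passed in explicitly (a real tool would need to enumerate the classes in your .util package, which this does not do), a parameter only counts if its type is exactly the target class, and XMLUtil and SerializationUtil are hypothetical stand-ins for the two classes from the question.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Document;

public final class UtilityFinder {

    // Hypothetical stand-ins for the two overlapping utilities from the question.
    static final class XMLUtil {
        public static String toString(Document doc) { return ""; }
    }

    static final class SerializationUtil {
        public static String serialize(Document doc) { return ""; }
        public static byte[] serialize(Document doc, String format) { return new byte[0]; }
    }

    /** Returns "Class.method/arity" for each public static method whose first parameter is exactly target. */
    static List<String> findHelpersFor(Class<?> target, List<Class<?>> candidates) {
        List<String> found = new ArrayList<>();
        for (Class<?> owner : candidates) {
            for (Method m : owner.getDeclaredMethods()) {
                int mod = m.getModifiers();
                Class<?>[] params = m.getParameterTypes();
                if (Modifier.isPublic(mod) && Modifier.isStatic(mod)
                        && params.length > 0 && params[0].equals(target)) {
                    found.add(owner.getSimpleName() + "." + m.getName() + "/" + params.length);
                }
            }
        }
        // getDeclaredMethods() has no guaranteed order, so sort for stable output.
        found.sort(null);
        return found;
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = List.of(XMLUtil.class, SerializationUtil.class);
        System.out.println(findHelpersFor(Document.class, candidates));
    }
}
```

Run against the question's example, this would list both XMLUtil.toString and SerializationUtil.serialize side by side, which is exactly the overlap the question is worried about.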
It's pretty hard to build a tool that recognizes "same functionality". (In theory this is in fact impossible, and where you can do it in practice you likely need a theorem prover.)
But what often happens is that people clone code that is close to what they want and then customize it. That kind of code you can find using a clone detector.
Our CloneDR is a tool for detecting exact and near-miss cloned code based on parameterized syntax trees. It matches parsed versions of the code, so it isn't confused by layout, changed comments, revised variable names, or, in many cases, inserted or deleted statements. There are versions for many languages (C++, COBOL, C#, Java, JavaScript, PHP, ...) and you can see examples of clone detection runs at the provided link. It typically finds 10-20% duplicated code, and if you abstract that code into library methods on a religious basis, your code base can actually shrink (that has happened at one organization using CloneDR).
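For a feel of what clone detection does at its crudest, here is a much simpler approach than CloneDR's parameterized syntax trees: normalize whitespace, drop blank lines and // comments, then report any window of N consecutive lines that appears more than once. It is an illustrative sketch only; unlike a syntax-tree matcher it misses clones with renamed variables or inserted statements, and its comment stripping will also mangle string literals that contain //. The file names and contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class LineWindowCloneFinder {

    /** Crude normalization: strip // comments, collapse whitespace, drop blank lines. */
    static List<String> normalize(String source) {
        List<String> lines = new ArrayList<>();
        for (String raw : source.split("\n")) {
            String line = raw.replaceAll("//.*$", "").replaceAll("\\s+", " ").trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    /**
     * Maps each window of `window` consecutive normalized lines to "file:index" locations,
     * where index counts normalized lines, and keeps only windows seen at least twice.
     */
    static Map<String, List<String>> findClones(Map<String, String> files, int window) {
        Map<String, List<String>> seen = new HashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            List<String> lines = normalize(file.getValue());
            for (int i = 0; i + window <= lines.size(); i++) {
                String key = String.join("\n", lines.subList(i, i + window));
                seen.computeIfAbsent(key, k -> new ArrayList<>()).add(file.getKey() + ":" + i);
            }
        }
        seen.values().removeIf(locations -> locations.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
                "XMLUtil.java", "int a = 1;\nint b = a + 1;   // add\nreturn b;",
                "SerializationUtil.java", "int a = 1;\n  int b = a + 1;\nreturn b;");
        findClones(files, 3).forEach((code, where) -> System.out.println(where + "\n" + code));
    }
}
```

Matching on parsed syntax trees instead of text lines is what lets a tool like CloneDR catch the near-miss clones that this version cannot.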
If you are looking for a solution that can help you manage this inevitable problem, then I can suggest a tool:
It's even a free product for most people.
The even better part: it has built-in duplicate-code detection across all your code.
More stuff to read up on: