Keeping track of utility classes

Posted 2024-11-01 06:22:53

I've recently become more and more frustrated with a problem I see emerging in my project's code base.

I'm working on a large-scale Java project with more than 1M lines of code. The interfaces and class structure are designed very well, and the engineers writing the code are very proficient. The problem is that, in an attempt to make the code cleaner, people write utility classes whenever they need to reuse some functionality, so over time, as the project grows, more and more utility methods crop up. However, when the next engineer needs the same functionality, he has no way of knowing that someone has already implemented a utility class (or method) somewhere in the code, so he implements another copy of the functionality in a different class. The result is a lot of code duplication and too many utility classes with overlapping functionality.

Are there any tools or design principles that we as a team can adopt to prevent the duplication and low visibility of the utility classes?

Example: engineer A has 3 places where he needs to transform XML to a String, so he writes a utility class called XMLUtil and places a static toString(Document) method in it. Engineer B has several places where he serializes Documents into various formats, including String, so he writes a utility class called SerializationUtil with a static method called serialize(Document) that returns a String.

Note that this is more than just code duplication, because the two implementations in the above example may well differ (say one uses the Transformer API and the other uses Xerces2-J), so this can be seen as a "best-practices" problem as well.

Update: I guess I'd better describe the environment we currently develop in.
We use Hudson for CI, Clover for code coverage and Checkstyle for static code analysis.
We use agile development, including daily talks and (perhaps insufficient) code reviews.
We define all our utility classes in a .util package, which due to its size now has 13 sub-packages and about 60 classes under the root (.util) package. We also use third-party libraries such as most of the Apache Commons jars and some of the jars that make up Guava.

I'm positive that we could cut the number of utilities in half if we put someone on the task of refactoring that entire package. I was wondering whether there are any tools that can make that operation less costly, and any methodologies that can delay the problem from recurring for as long as possible.

昨迟人 2024-11-08 06:22:54

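A good solution to this problem is to start adding more object orientation. To use your example:

Example: engineer A has 3 places where he needs to transform XML to a String, so he writes a utility class called XMLUtil and places a static toString(Document) method in it.

The solution is to stop using primitive types or the types provided by the JVM (String, Integer, java.util.Date, org.w3c.dom.Document) and to wrap them in your own project-specific classes. Your XmlDocument class can then provide a convenient toString method and other utility methods. Your own ProjectFooDate can contain the parsing and formatting methods that would otherwise end up in various DateUtils classes, and so on.

This way, the IDE will prompt you with your utility methods whenever you try to do something with an object.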
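As a rough illustration of the wrapper idea, here is a minimal sketch in Java. The class name XmlDocument comes from the answer; the parse and unwrap methods, the exception handling and the choice of the JDK Transformer API for serialization are assumptions made for this example, not something the answer prescribes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical project-specific wrapper around org.w3c.dom.Document. */
final class XmlDocument {
    private final Document document;

    XmlDocument(Document document) {
        this.document = document;
    }

    /** Parses XML text; checked exceptions are wrapped to keep call sites short (an assumption). */
    static XmlDocument parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new XmlDocument(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** The one discoverable place where a document is turned into a String. */
    @Override
    public String toString() {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    /** Escape hatch for code that still needs the raw DOM object. */
    Document unwrap() {
        return document;
    }
}
```

With a class like this, typing a dot after an XmlDocument variable lists toString and any other helper in code completion, so the next engineer finds the existing implementation instead of writing a second one.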

凉墨 2024-11-08 06:22:54

Your problem is a very common one, and a real one too, because there is no good solution.

We are in the same situation here, or I'd say a worse one, with 13 million lines of code, staff turnover and more than 800 developers working on the code. We often discuss the very problem you describe.

The first idea, which your developers have already used, is to refactor common code into utility classes. Our problem with that solution, even with pair programming, mentoring and discussion, is that there are simply too many of us for it to be effective. In practice we grow in subteams: people share knowledge within their subteam, but the knowledge doesn't travel between subteams. Maybe we are wrong, but I think that even pair programming and talks can't help in this case.

We also have an architecture team. This team is responsible for dealing with design and architecture concerns and for building the common utilities we might need. In effect it produces something we could call a corporate framework. Yes, it is a framework, and sometimes it works well. The team is also responsible for pushing best practices and for raising awareness of what should or should not be done and of what is or is not available.

Good core Java API design is one of the reasons for Java's success. Good third-party open-source libraries count for a lot too. Even a small, well-crafted API can offer a really useful abstraction and help reduce code size a lot. But making a framework and a public API is not at all the same thing as coding a utility class in 2 hours. It has a really high cost. A utility class costs 2 hours for the initial coding, maybe 2 days with debugging and unit tests. When you start sharing common code across big projects and teams, you are really making an API. You must then ensure perfect documentation and really readable, maintainable code. When you release a new version of this code, you must stay backward compatible. You have to promote it company-wide (or at least team-wide). From 2 days for your small utility class, you grow to 10 days, 20 days or even 50 days for a full-fledged API.

And your API design may not be that great. It is not that your engineers are not bright; indeed they are. But are you willing to let them work 50 days on a small utility class that just helps parse numbers in a consistent way for the UI? Are you willing to let them redesign the whole thing when you start using a mobile UI with totally different needs? Also, have you noticed how even the brightest engineers in the world make APIs that never become popular or slowly fade away? The first web projects we made used only internal frameworks or no framework at all. We then added PHP/JSP/ASP. Then in Java we added Struts. Now JSF is the standard. And we are thinking about using Spring Web Flow, Vaadin or Lift...

All I want to say is that there is no good solution: the overhead grows exponentially with code size and team size. Sharing a big codebase restricts your agility and responsiveness. Any change must be made carefully, you must think of all potential integration problems, and everybody must be trained on the new specifics and features.

But the main productivity gain in a software company does not come from saving 10 or even 50 lines of code when parsing XML. Generic code to do this will grow to a thousand lines anyway and recreate a complex API that will itself be wrapped in utility classes. When someone writes a utility class for parsing XML, it is a good abstraction. He gives a name to a dozen or even a hundred lines of specialized code. This code is useful because it is specialized. The common API lets you work on streams, URLs, strings, whatever. It has a factory so you can choose your parser implementation. The utility class is good because it works only with this parser and with strings, and because you need one line of code to call it. But of course, this utility code is of limited use. It works well for this mobile application, or for loading XML configuration. And that's why the developer added the utility class for it in the first place.

In conclusion, instead of trying to consolidate the code for the whole codebase, I would consider splitting code responsibility as the teams grow:

transform your big team that works on one big project into small teams that work on several subprojects;
ensure that the interfaces are good, to minimize integration problems, but let each team own its code;
inside these teams and their codebases, ensure you follow best practices: no duplicate code, good abstractions. Use existing, proven APIs from the community. Use pair programming, strong API documentation, wikis... But you should really let different teams make their own choices and build their own code, even if this means duplicate code across teams or different design decisions. If the design decisions are different, it may be because the needs are different.

What you are really managing is complexity. In the end, if you make one monolithic codebase, a very generic and advanced one, you increase the time it takes newcomers to ramp up, you increase the risk that developers will not use your common code at all, and you slow everybody down because any change has a far greater chance of breaking existing functionality.

我做我的改变 2024-11-08 06:22:54

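There are several agile/XP practices you can use to address this, for example:

talking with each other (e.g. during the daily stand-up meeting)
pair programming / code reviews

Then create, document and test one or several utility library projects that can be referenced. I recommend using Maven to manage dependencies and versions.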

战皆罪 2024-11-08 06:22:54

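You might consider suggesting that all utility classes be placed in a well-organized package structure such as com.yourcompany.util.. If people are willing to name sub-packages and classes well, then at least they know where to look when they need a utility. I don't think there is any silver-bullet answer here, though. Communication is important. Maybe if a developer sends a simple email to the rest of the development staff whenever they write a new utility, that will be enough to get it on people's radar. Or use a shared wiki page where people can list and document the utilities.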

云柯 2024-11-08 06:22:54

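team communication (shout "hey, does someone have a Document toString?")
keep utility classes to an absolute minimum and restrict them to a single namespace
always think: how can I do this with an object? In your example, I would extend the Document class and add those toString and serialize methods to it.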

绝對不後悔。 2024-11-08 06:22:54

This problem is eased by combining IDE "code-completion" features with languages that support type extensions (e.g. C# and F#). If Java had such a feature, a programmer could easily explore all the extension methods on a class within the IDE, like:

Document doc = ...
doc.to //list pops up with toXmlString, toJsonString, all the "to" series extension methods

Of course, Java doesn't have type extensions. But you could use grep to search your project for "all public static methods that take SomeClass as the first argument" to gain similar insight into which utility methods have already been written for a given class.
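The same query can also be approximated with reflection instead of grep. The sketch below is only an illustration under stated assumptions: UtilityMethodFinder is a hypothetical helper, the two utility classes are stand-ins named after the question's example with placeholder bodies, and the candidate classes are passed in explicitly rather than discovered by scanning the classpath.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.w3c.dom.Document;

/** Hypothetical helper: lists public static methods whose first parameter accepts a given type. */
final class UtilityMethodFinder {
    private UtilityMethodFinder() {}

    static List<String> findFor(Class<?> target, Class<?>... utilityClasses) {
        List<String> hits = new ArrayList<>();
        for (Class<?> utility : utilityClasses) {
            for (Method m : utility.getDeclaredMethods()) {
                int mods = m.getModifiers();
                Class<?>[] params = m.getParameterTypes();
                boolean publicStatic = Modifier.isPublic(mods) && Modifier.isStatic(mods);
                // The first parameter must be the target type or one of its supertypes.
                if (publicStatic && params.length > 0 && params[0].isAssignableFrom(target)) {
                    hits.add(utility.getSimpleName() + "." + m.getName());
                }
            }
        }
        Collections.sort(hits); // reflection order is unspecified, so sort for stable output
        return hits;
    }
}

/** Stand-in for engineer A's utility class; the body is a placeholder. */
final class XMLUtil {
    public static String toString(Document doc) { return String.valueOf(doc); }
}

/** Stand-in for engineer B's utility class; the bodies are placeholders. */
final class SerializationUtil {
    public static String serialize(Document doc) { return String.valueOf(doc); }
    public static String serialize(String text) { return text; } // not reported for Document
}
```

Calling UtilityMethodFinder.findFor(Document.class, XMLUtil.class, SerializationUtil.class) reports both the toString and the serialize helper that take a Document, which is exactly the overlap the question describes.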

最后的乘客 2024-11-08 06:22:54

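It is pretty hard to build a tool that recognizes "same functionality". (In theory this is in fact impossible, and where you can do it in practice you probably need a theorem prover.)

But what often happens is that people clone code that is close to what they want and then customize it. You can find that kind of code with a clone detector.

Our CloneDR is a tool for detecting exact and near-miss cloned code, based on parameterized syntax trees. It matches parsed versions of the code, so it isn't confused by layout, changed comments, revised variable names, or in many cases inserted or deleted statements. There are versions for many languages (C++, COBOL, C#, Java, JavaScript, PHP, ...), and you can see examples of clone detection runs at the provided link. It typically finds 10-20% duplicated code, and if you religiously abstract that code into library methods, your code base can actually shrink (this has happened at one organization using CloneDR).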
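To give a feel for why a clone detector is not thrown off by layout, comments or renamed variables, here is a deliberately simplified sketch. It is not how CloneDR works: CloneDR compares parameterized syntax trees, whereas this toy version only normalizes a token stream. The class name CloneSketch, the partial keyword list and the regular expressions are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy token-based clone check; a stand-in for a real syntax-tree-based detector. */
final class CloneSketch {
    // Partial keyword list, just enough for the example snippets (an assumption).
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "for", "while", "if", "else", "return", "new", "static", "void"));
    // Line and block comments; string literals containing "//" are not handled.
    private static final Pattern COMMENT =
            Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_]\\w*|\\d+|\\S");

    private CloneSketch() {}

    /** Drops comments and whitespace, then replaces identifiers and numbers by placeholders. */
    static List<String> normalize(String code) {
        String withoutComments = COMMENT.matcher(code).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(withoutComments);
        while (m.find()) {
            String t = m.group();
            if (t.matches("\\d+")) {
                tokens.add("NUM");
            } else if (t.matches("[A-Za-z_]\\w*")) {
                tokens.add(KEYWORDS.contains(t) ? t : "ID");
            } else {
                tokens.add(t); // punctuation and operators are kept as-is
            }
        }
        return tokens;
    }

    /** Two snippets count as clones here when their normalized token streams are equal. */
    static boolean looksLikeClone(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Two loops that differ only in variable names, spacing and comments normalize to the same token stream and are flagged, while a structurally different loop is not. Unlike a tree-based tool, this sketch cannot cope with inserted or deleted statements, and it ignores whether the renaming is consistent.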
