Is there any valid reason for code duplication?
I'm currently reviewing a very old C++ project and see lots of code duplication there.
For example, there is a class with 5 MFC message handlers, each holding 10 identical lines of code. Or there is a 5-line snippet for a very specific string transformation scattered here and there. Reducing code duplication is not a problem in these cases at all.
But I have a strange feeling that I might be misunderstanding something and that there was originally a reason for this duplication.
What could be a valid reason for duplicating code?
A good read about this is Large-Scale C++ Software Design by John Lakos.
He has many good points about code duplication and where it might help or hinder a project.
The most important question to ask when deciding whether to remove duplication or to duplicate code is:
If this method changes in the future, do I want the behaviour of the duplicated method to change as well, or does it need to stay the way it is?
After all, methods contain (business) logic, and sometimes you'll want to change the logic for every caller, sometimes not. Depends on the circumstances.
In the end, it's all about maintenance, not about pretty source.
Laziness, that's the only reason I can think of.
On a more serious note, the only valid reason I can think of is changes at the very end of the product cycle. These tend to undergo a lot more scrutiny, and the smallest change tends to have the highest rate of success. In that limited circumstance it is easier to get a change that duplicates code approved than one that refactors existing code to make a smaller change.
Still leaves a bad taste in my mouth.
When I first started programming, I wrote an app where I had a bunch of similar functionality which I wrapped up in a neat little 20-30 line function ... I was very proud of myself for writing such an elegant piece of code.
Shortly after, the client changed the process in very specific cases, then again, then again, then again, and again, and again .... (many, many more times). My elegant code turned into a very difficult, hackish, buggy and high-maintenance mess.
A year later, when I was asked to do something very similar, I deliberately decided to ignore DRY. I put together the basic process and generated all the duplicate code. The duplicate code was documented, and I saved the template used to generate it. When the client asked for a specific conditional change (like, if x == y^z + b then 1+2 == 3.42), it was a piece of cake. It was unbelievably easy to maintain and change.
In retrospect, I probably could have solved many of these problems with function pointers and predicates, but using the knowledge I had at the time, I still believe in this specific case, this was the best decision.
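As a rough illustration of the "function pointers and predicates" idea mentioned above, here is a minimal C++ sketch. The `Order`, `Rule` and `compute_fee` names and the fee logic are hypothetical and do not come from the original application; the point is only that each client-specific special case becomes a piece of data (a predicate plus an action) instead of another branch inside one shared function.

```cpp
#include <functional>
#include <vector>

// Hypothetical domain type, used only for illustration.
struct Order {
    double amount;
    bool rush;
};

// One rule = a predicate saying when it applies + an action computing the fee.
struct Rule {
    std::function<bool(const Order&)> applies;
    std::function<double(const Order&)> fee;
};

// The shared process stays fixed; special cases are passed in as data.
// The first rule whose predicate matches wins; otherwise the default is used.
double compute_fee(const Order& order, const std::vector<Rule>& rules,
                   double default_fee) {
    for (const Rule& rule : rules) {
        if (rule.applies(order)) {
            return rule.fee(order);
        }
    }
    return default_fee;
}
```

A caller would build a `std::vector<Rule>` from lambdas, one per special case, and pass it to `compute_fee`. Whether this is easier to maintain than generated duplicates depends on how irregular the special cases turn out to be.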
Besides inexperience, here are some reasons why duplicated code might show up:
No time to properly refactor
Most of us work in the real world, where real constraints force us to move on quickly to the next real problem instead of thinking about the niceness of the code. So we copy and paste and move on. For me, if I later see that the code has been duplicated several more times, that is the sign that I have to spend more time on it and converge all instances into one.
Generalization of the code not possible/not 'pretty' due to language constraints
Let's say that deep inside a function you have several statements that differ greatly from one instance of the otherwise duplicated code to the next. For example: I have a function that draws a 2D array of thumbnails for a video, and the calculation of each thumbnail's position is embedded in it. To do hit-testing (calculating the thumbnail index from the click position) I use the same code, but without the painting.
You are not sure that there will be generalization at all
Duplicate code at first, and later observe how it will evolve. Since we are writing software, we can allow 'as late as possible' modifications to the software, since everything is 'soft' and changeable.
I'll add more if I remember something else.
Added later...
Loop unrolling
In the time before compilers were as smart as Einstein and Hawking combined, you had to unroll loops or inline code by hand to make it faster. Loop unrolling duplicates your code and probably makes it faster by a few percent, if the compiler didn't do it for you anyway.
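To make the loop-unrolling point concrete, here is a small C++ sketch; the function names are made up for illustration. The unrolled version duplicates the loop body four times on purpose. With a modern optimizing compiler the two versions will often compile to similar machine code, so this is an illustration of the historical technique rather than a recommendation.

```cpp
#include <cstddef>
#include <vector>

// Straightforward version: one addition per loop iteration.
long long sum_simple(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Manually unrolled by four: the loop body is duplicated to reduce the
// number of loop-condition checks and jumps per element.
long long sum_unrolled(const std::vector<int>& values) {
    long long total = 0;
    const std::size_t n = values.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += values[i];
        total += values[i + 1];
        total += values[i + 2];
        total += values[i + 3];
    }
    // Handle the remaining 0-3 elements.
    for (; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

Both functions return the same result for any input; only the shape of the loop differs.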
You might want to do so to make sure that future changes in one part will not unintentionally change the other part. For example, consider two functions, Do_A_Policy() and Do_B_Policy(), whose bodies are currently identical.
You could prevent this "code duplication" by moving the common body into a shared function, first_policy(), that both of them call.
However, there is a risk that some other programmer will want to change Do_A_Policy()
and will do so by changing first_policy(). That causes the side effect of changing Do_B_Policy() as well, a side effect the programmer may not be aware of.
So this kind of "code duplication" can serve as a safety mechanism against this kind of future change in the program.
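The scenario can be sketched in C++ as follows. Only the names Do_A_Policy(), Do_B_Policy() and first_policy() come from the answer; the function bodies and the `_shared` variants are hypothetical and exist purely to show the two designs side by side.

```cpp
#include <string>

// Design 1: duplicated on purpose. The two policies happen to be
// identical today, but each can be edited without touching the other.
std::string Do_A_Policy() {
    std::string steps;
    steps += "validate;";
    steps += "apply;";
    return steps;
}

std::string Do_B_Policy() {
    std::string steps;
    steps += "validate;";
    steps += "apply;";
    return steps;
}

// Design 2: deduplicated. Both policies share one helper, so an edit
// made in first_policy() for A's sake silently changes B as well.
std::string first_policy() {
    return "validate;apply;";
}

std::string Do_A_Policy_shared() { return first_policy(); }
std::string Do_B_Policy_shared() { return first_policy(); }
```

In the second design, someone asked to change policy A has to notice that first_policy() has two callers before editing it; in the first design that coupling does not exist.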
Sometimes methods and classes have nothing in common domain-wise, but look a lot alike implementation-wise. In these cases it's often better to duplicate the code, because future changes will more often than not make these implementations diverge into things that are no longer the same.
The valid reason I can think of: the code would get a lot more complex if you avoided the duplication. Basically, this happens when you do almost the same thing in several methods - but just not quite the same. Of course, you can then refactor and add special parameters, including pointers to the different members that have to be modified. But the new, refactored method may get too complicated.
Example (pseudocode):
You could refactor out the method call (setXXX) somehow - but depending on the language it could be difficult (especially with inheritance). It is code duplication since most of the body is the same for each property, but it can be hard to refactor out the common parts.
In short - if the refactored method is many times more complicated, I'd go with code duplication although it is "evil" (and will stay evil).
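A minimal C++ sketch of the trade-off described above, using hypothetical names (`Range`, `Mode`, `updateStart`, `updateStop`, `updateProperty`) that are not taken from the answer. The duplicated functions differ only in which setter they call; the deduplicated version needs a pointer to the data member and a pointer to the member function to express the same thing.

```cpp
enum class Mode { Absolute, Delta };

struct Range {
    int start = 0;
    int stop = 0;
    void setStart(int v) { start = v; }
    void setStop(int v) { stop = v; }
};

// Duplicated version: each function is easy to read on its own.
void updateStart(Range& r, Mode mode, int value) {
    switch (mode) {
        case Mode::Absolute: r.setStart(value); break;
        case Mode::Delta:    r.setStart(r.start + value); break;
    }
}

void updateStop(Range& r, Mode mode, int value) {
    switch (mode) {
        case Mode::Absolute: r.setStop(value); break;
        case Mode::Delta:    r.setStop(r.stop + value); break;
    }
}

// Deduplicated version: one function, but every caller must now pass a
// pointer to the data member and a pointer to the matching setter.
void updateProperty(Range& r, int Range::*field,
                    void (Range::*setter)(int), Mode mode, int value) {
    switch (mode) {
        case Mode::Absolute: (r.*setter)(value); break;
        case Mode::Delta:    (r.*setter)(r.*field + value); break;
    }
}
```

A call then reads `updateProperty(r, &Range::stop, &Range::setStop, Mode::Delta, 3)` instead of `updateStop(r, Mode::Delta, 3)`. With only two properties the duplicated pair is arguably clearer; with many properties, or with virtual setters in a class hierarchy, the balance may shift either way.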
The only "valid" thing I can see this arising from is when those lines of code were different, then converged to the same thing through subsequent edits. I've had this happen to me before, but none too frequently.
This is, of course, when it's time to factor out this common segment of code into new functionality.
That said, I can't think of any reasonable way to justify duplicate code. Look at why it's bad.
It's bad because a change in one place requires a change in multiple places. That costs more time and adds a chance of bugs. By factoring it out, you maintain the code in a single, working location. After all, when you write a program you don't write it twice, so why would a function be any different?
For that kind of code duplication (lots of lines duplicated lots of times), I'd say:
Probably the first solution, though, from what I've generally seen :-(
Best solution I've seen against that: have your developers start by maintaining some old application when they are hired -- that'll teach them that this kind of thing is not good... And they will understand why, which is the most important part.
Splitting code into several functions, re-using code the right way, and all that often comes with experience -- or you have not hired the right people ;-)
A long time ago when I used to do graphics programming you would, in some special cases, use duplicate code this way to avoid the low level JMP statements generated in the code (it would improve performance by avoiding the jump to the label/function). It was a way to optimize and do a pseudo "inlining".
However, in this case, I don't think that's why they were doing it, heh.
If different tasks are similar by accident, repeating the same actions in two places is not necessarily duplication. If the actions in one place change, is it probable they should change in other places as well? Then this is duplication you should avoid or refactor away.
Also, sometimes - even when logic is duplicated - the cost of reducing duplication is too high. This can happen especially when it's not just code duplication: for example, if you have a record of data with certain fields that repeats itself in different places (DB table definition, C++ class, text-based input), the usual way to reduce this duplication is with code generation. This adds complexity to your solution. Almost always, this complexity pays off, but sometimes it doesn't - it's your tradeoff to make.
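One lightweight form of code generation that stays inside C++ is the X-macro technique, sketched below. This is a substitute illustration, not the approach the answer necessarily had in mind (an external generator would work differently), and the `Person` record, its fields and the SQL text are all hypothetical. The field list is written once and expanded twice: once into a struct and once into a table definition.

```cpp
#include <sstream>
#include <string>

// Single source of truth for the record's fields: X(type, name, sql_type).
#define PERSON_FIELDS(X)            \
    X(int,         id,   "INTEGER") \
    X(std::string, name, "TEXT")    \
    X(int,         age,  "INTEGER")

// Expansion 1: the C++ struct, one value-initialized member per field.
struct Person {
#define X(type, name, sql) type name{};
    PERSON_FIELDS(X)
#undef X
};

// Expansion 2: the matching table definition, built from the same list.
std::string create_table_sql() {
    std::ostringstream out;
    out << "CREATE TABLE person (";
    const char* sep = "";
#define X(type, name, sql) out << sep << #name << " " << sql; sep = ", ";
    PERSON_FIELDS(X)
#undef X
    out << ")";
    return out.str();
}
```

Adding a field now means editing one line, but the price is exactly the added complexity the answer warns about: the struct is no longer readable at a glance, and tooling such as debuggers and IDEs may cope less well with macro-generated members.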
I don't know of many good reasons for code duplication, but rather than jumping in feet first to refactoring, it's probably better to only refactor those bits of the code that you actually change, rather than altering a large codebase that you don't yet fully understand.
Sounds like the original author was inexperienced and/or hard pressed for time. Most experienced programmers bunch together things that are reused because later there will be less maintenance - a form of laziness.
The only thing you should check is whether there are any side effects; if the copied code accesses some global data, a bit of refactoring may be needed.
Edit: back in the day, when compilers were crappy and optimizers even crappier, it could happen that one had to resort to such a trick in order to get around a bug in the compiler. Maybe it's something like that? How old is old?
On large projects (those with a code base as large as a GB) it's quite possible to lose track of an existing API. This is typically due to insufficient documentation, or to the programmer's inability to locate the original code; hence the duplicate code.
Boils down to laziness, or poor review practice.
EDIT:
One additional possibility is that there may have been additional code in those methods which was removed along the way.
Have you looked at the revision history on the file?
All the answers look right, but I think there is another possibility.
Maybe there are performance considerations, as the things you describe remind me of "inlining code". It's often faster to inline functions than to call them.
Maybe the code you look at has been preprocessed first?
I have no problems with duplicated code when it is produced by a source code generator.
Something that we found that forced us to duplicate code was our pixel manipulation code. We work with VERY large images and the function call overhead was eating up on the order of 30% of our per-pixel time.
Duplicating the pixel manipulation code gave us 20% faster image traversal at the cost of code complexity.
This is obviously a very rare case, and in the end it bloated our source significantly (a 300 line function is now 1200 lines).
There is no good reason for code duplication.
See the Refactor Mercilessly design pattern.
The original programmer was either in a hurry to meet a deadline or lazy. Feel free to refactor and improve the code.
In my humble opinion there's no place for code duplication. Have a look, for example, at this Wikipedia article
or, let's refer to Larry Wall's quote:
It is pretty clear that code duplication has nothing to do with "laziness".
haha;)
Since there is the "Strategy Pattern", there is no valid reason for duplicate code. Not a single line of code must be duplicated; everything else is an epic fail.