How do you refactor rapidly evolving code?
I have some research code that's a real rat's nest, with code duplication everywhere, and it clearly needs to be refactored. However, the code base keeps evolving as I come up with new variations on the theme and fit them into it. The reason I've put off refactoring so long is that I feel like the minute I spend a few days coming up with good abstractions, seeing which design patterns fit where, and so on, I'll want to try out some new unforeseen idea that makes my abstractions completely inadequate. In other words, because of the rate at which the code is evolving, I really have no idea where the abstraction lines belong, even though there is no shortage of (approximate) duplication and the general messiness of the code makes adding stuff to it a real pain. What are some general best practices for coping with this kind of situation?
Comments (9)
Don't spend so long refactoring!
When you're about to make a change in a piece of code, consider refactoring it first to make the change easier.
After making the change, refactor again to clean up the damage done by that change.
In both cases, make the refactorings small and do them quickly, and move on.
You don't have to keep your code pristine at all times, but remember that it's easier to go fast if you have well-factored code to work in (and if you have good unit tests, of course).
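As a minimal sketch of that rhythm in Python (the experiment and every name here are hypothetical, not taken from the question): suppose `run_experiment` used to hard-code z-score normalization and a new idea calls for min-max scaling. One small refactoring turns the normalization into a parameter, after which the new variant is a one-line change at the call site.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

# Hypothetical type for "something that rescales a series of numbers".
Normalizer = Callable[[Sequence[float]], List[float]]


def zscore(xs: Sequence[float]) -> List[float]:
    """The normalization that used to be written inline in run_experiment."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def minmax(xs: Sequence[float]) -> List[float]:
    """The new variant, added only after the refactoring made room for it."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def run_experiment(data: Sequence[float], normalize: Normalizer = zscore) -> float:
    """Return the spread of the normalized data.

    The default keeps the old behaviour, so existing callers are unaffected.
    """
    scaled = normalize(data)
    return max(scaled) - min(scaled)
```

Calling `run_experiment(data)` behaves as before, while `run_experiment(data, normalize=minmax)` tries the new idea without copying the function.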
Test Driven Development:
Red, Green, Refactor. Rinse, repeat.
Since it's one of the steps in every single cycle, you'll notice that's a LOT of usually minor refactoring taking place. That's the way it should be.
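One cycle might look roughly like the following Python sketch. The function `moving_average` and its tests are invented for illustration; the comments mark which step of the cycle each part would belong to.

```python
# RED: write the tests first. Run before moving_average exists (or while it
# is wrong) and they fail.
def test_averages_each_window():
    assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]


def test_rejects_non_positive_window():
    try:
        moving_average([1, 2, 3], 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for window=0")


# GREEN: write just enough code to make the tests pass.
# REFACTOR: with the tests green, tidy up (here an explicit loop was
# replaced by a comprehension) and rerun the tests to confirm nothing broke.
def moving_average(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

The tests are plain functions, so a runner such as pytest can collect them, or they can simply be called by hand after each small change.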
Your situation is pretty familiar to me. When doing investigative coding, you often have no idea what the "right" abstraction will be, and as you say, it can change with every new idea.
Other posters have suggested:
However, for investigative research code there is another strategy: the prototype. This seems to be what you are currently doing: coding as quickly as possible to prove a concept. There's nothing wrong with that, but a prototype should always be throw-away. Tweak it until you have all the necessary input and knowledge, then throw away the code and start over with TDD, continuous refactoring, and all your other "doing things right" strategies.
Don't keep any of the code. Don't copy-paste anything. Don't refer back to it. Just start over with your new knowledge.
Clean up the code a little bit at a time. Whenever you touch a class, try to leave it cleaner than it was before you touched it ("the boy scout rule"). Refactoring is best done in very small steps, but very often.
Things like renaming a variable or splitting a method take only seconds or minutes. Large refactorings, such as splitting or joining classes, may take an hour or two (and you do them in small steps, so that all tests pass at least every five minutes; otherwise you have entered Refactoring Hell and should revert to the last known working state). If it takes you days or weeks to refactor something, then it is no longer "refactoring"; it's more like rewriting.
An article about this topic:
http://blog.objectmentor.com/articles/2007/07/20/whats-your-unit-of-measure
At the very least, put it in a distributed SCM like Git. That way, when you break something while refactoring, you can step back through history to find the commit prior to the change, and you can work on changes and commit them in branches without interfering with others' work.
Git's branch merging is great for things like this, and you'll easily know if two people made incompatible changes in parallel, without having to worry about the rest of the code.
For the above reasons, I would also create a separate branch in the repository just for refactoring, and keep it regularly updated. This way, not only will others not interfere with your progress, but they can keep an eye on it and see changes that will eventually hit the main branch, so they can pre-emptively code around those changes.
If you already know where there is duplication, you don't need several days to refactor it away.
Sometimes a rewrite is the only choice. This seems to be the case.
The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by language syntax. It supports Java, C#, COBOL, C++, PHP and many other languages.
When it shows a parameterized abstraction of a set of found clones, it is essentially proposing that you refactor the code with that abstraction implemented (as a method, a function, a class, ...).
So running the CloneDR gets you a list of potential abstractions to add to your code, and replacing the clone instances with calls to the abstraction refactors your code, thus cleaning it up (somewhat).
Even more remarkably, when it shows the parameter bindings used at each clone site to invoke the abstraction, it often exposes a bungled clone instance, easily recognized when the bound parameters are conceptually inconsistent. If a parameter is bound to variables named YYYY-MM-DD, and one of them is YY-MM-DD, the "it's a 4-digit year" parameter type looks violated, and in this case there's a broken Y2K remediation. So examining the clone bindings often finds bugs.
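To make "parameterized abstraction" concrete, here is a hand-written Python illustration. It is not CloneDR output, and all function and field names are made up: two near-miss clones differ only in the dictionary key they read, so the abstraction takes that key as its one parameter, and each clone site becomes a call with its own binding.

```python
# Two near-miss clones, as they might appear in copy-pasted research code.
def mean_height(rows):
    total = 0.0
    for row in rows:
        total += row["height"]
    return total / len(rows)


def mean_weight(rows):
    total = 0.0
    for row in rows:
        total += row["weight"]
    return total / len(rows)


# The parameterized abstraction: the only thing that varies is the key.
def mean_of(rows, key):
    """Average the values stored under `key` across all rows."""
    return sum(row[key] for row in rows) / len(rows)


# The clone sites rewritten as calls, with bindings key="height" and
# key="weight". Listing the bindings side by side is what makes an odd
# one out (the analogue of YY-MM-DD among YYYY-MM-DD) easy to spot.
def summarize(rows):
    return {"height": mean_of(rows, "height"), "weight": mean_of(rows, "weight")}
```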
This is a very common problem in scientific computing. Some of the most effective ideas for reducing the size and complexity of code require leveraging assumptions, and science demands that you constantly change those assumptions.
All you can do is try to refactor your code as you go, and try not to write yourself into any corners. Also work with good people who understand the value of not making a mess.