So we have this huge (is 11000 lines huge?) mainmodule.cpp source file in our project and every time I have to touch it I cringe.
As this file is so central and large, it keeps accumulating more and more code and I can't think of a good way to make it actually start to shrink.
The file is used and actively changed in several (> 10) maintenance versions of our product and so it is really hard to refactor it. If I were to "simply" split it up, say for a start, into 3 files, then merging back changes from maintenance versions will become a nightmare. And also if you split up a file with such a long and rich history, tracking and checking old changes in the SCC history suddenly becomes a lot harder.
The file basically contains the "main class" (main internal work dispatching and coordination) of our program, so every time a feature is added, it also affects this file and every time it grows. :-(
What would you do in this situation? Any ideas on how to move new features to a separate source file without messing up the SCC workflow?
(Note on the tools: We use C++ with Visual Studio; we use AccuRev as SCC, but I think the type of SCC doesn't really matter here; we use Araxis Merge to do the actual comparison and merging of files.)
Wow, sounds great. I think explaining to your boss that you need a lot of time to refactor the beast is worth a try. If he doesn't agree, quitting is an option.
Anyway, what I suggest is basically throwing out all the implementation and regrouping it into new modules, let's call those "global services". The "main module" would only forward to those services and ANY new code you write will use them instead of the "main module". This should be feasible in a reasonable amount of time (because it's mostly copy and paste), you don't break existing code and you can do it one maintenance version at a time. And if you still have any time left, you can spend it refactoring all old depending modules to also use the global services.
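A minimal sketch of the forwarding idea, under assumptions: the names `LoggingService`, `MainModule` and `logMessage` are hypothetical and only stand in for whatever the real main class exposes. The implementation moves into a small service, and the old entry point shrinks to a one-line forwarder so existing callers keep compiling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "global service": holds implementation that used to live
// inside mainmodule.cpp.
class LoggingService {
public:
    void log(const std::string& message) { lines_.push_back(message); }
    std::size_t lineCount() const { return lines_.size(); }

private:
    std::vector<std::string> lines_;
};

// The old main class keeps its public interface but only forwards.
class MainModule {
public:
    explicit MainModule(LoggingService& logging) : logging_(logging) {}

    // Existing callers still call this; new code should talk to
    // LoggingService directly instead.
    void logMessage(const std::string& message) { logging_.log(message); }

private:
    LoggingService& logging_;
};
```

Because the forwarder has the same signature as before, each maintenance version can be switched over separately.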
My sympathies - in my previous job I encountered a similar situation with a file that was several times larger than the one you have to deal with. Solution was:
The classes you build in step 3. iterations will likely grow to absorb more code that is appropriate to their newly-clear function.
I could also add:
0: buy Michael Feathers' book on working with legacy code
Unfortunately this type of work is all too common, but my experience is that there is great value in being able to make working but horrid code incrementally less horrid while keeping it working.
Consider ways to rewrite the entire application in a more sensible way. Maybe rewrite a small section of it as a prototype to see if your idea is feasible.
If you've identified a workable solution, refactor the application accordingly.
If all attempts to produce a more rational architecture fail, then at least you know the solution is probably in redefining the program's functionality.
My 0.05 eurocents:
Re-design the whole mess, split it into subsystems taking into account the technical and business requirements (=many parallel maintenance tracks with potentially different codebase for each, there is obviously a need for high modifiability, etc.).
When splitting into subsystems, analyze the places which have most changed and separate those from the unchanging parts. This should show you the trouble-spots. Separate the most changing parts to their own modules (e.g. dll) in such a way that the module API can be kept intact and you don't need to break BC all the time. This way you can deploy different versions of the module for different maintenance branches, if needed, while having the core unchanged.
The redesign will likely need to be a separate project, trying to do it to a moving target will not work.
As for the source code history, my opinion: forget it for the new code. But keep the history somewhere so you can check it, if needed. I bet you won't need it that much after the beginning.
You most likely need to get management buy-in for this project. You can perhaps argue with faster development time, fewer bugs, easier maintenance and less overall chaos. Something along the lines of "Proactively enable the future-proofness and maintenance viability of our critical software assets" :)
This is how I'd start to tackle the problem at least.
Start by adding comments to it, with reference to where functions are called and whether you can move things around. This can get things moving. You really need to assess how brittle the code base is. Then move common bits of functionality together. Small changes at a time.
Another book you may find interesting/helpful is Refactoring.
Something I find useful to do (and I'm doing it now although not at the scale you face), is to extract methods as classes (method object refactoring). The methods that differ across your different versions will become different classes which can be injected into a common base to provide the different behaviour you need.
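As a rough illustration of the method-object idea, here is a sketch under stated assumptions: `FeeCalculator`, `FlatFee`, `PercentFee` and `MainClass::total` are invented names, not anything from the poster's code. A method that differs between versions becomes a small class, and the common base receives the variant it should use.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical method object: what used to be a member function of the
// main class becomes a small class with one job.
class FeeCalculator {
public:
    virtual ~FeeCalculator() = default;
    virtual int compute(int amount) const = 0;
};

// Behaviour of one (hypothetical) maintenance version.
class FlatFee : public FeeCalculator {
public:
    int compute(int) const override { return 5; }
};

// Behaviour of another version.
class PercentFee : public FeeCalculator {
public:
    int compute(int amount) const override { return amount / 10; }
};

// The common base no longer contains the version-specific logic;
// it receives it from outside.
class MainClass {
public:
    explicit MainClass(std::unique_ptr<FeeCalculator> fee)
        : fee_(std::move(fee)) {}

    int total(int amount) const { return amount + fee_->compute(amount); }

private:
    std::unique_ptr<FeeCalculator> fee_;
};
```

Each version then differs only in which calculator it constructs, which keeps the shared file identical across branches.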
I found this sentence to be the most interesting part of your post:
> The file is used and actively changed in several (> 10) maintenance versions of our product and so it is really hard to refactor it
First, I would recommend that you use a source control system that supports branching for developing these 10+ maintenance versions.
Second, I would create ten branches (one for each of your maintenance versions).
I can feel you cringing already! But either your source control isn't working for your situation because of a lack of features, or it's not being used correctly.
Now to the branch you work on - refactor it as you see fit, safe in the knowledge that you'll not upset the other nine branches of your product.
I would be a bit concerned that you have so much in your main() function.
In any project I write, I would use main() only to perform initialization of core objects, like a simulation or application object; these classes are where the real work should go on.
I would also initialize an application logging object in main for use globally throughout the program.
Finally, in main I also add leak detection code in preprocessor blocks that ensure it's only enabled in DEBUG builds. This is all I would add to main(). Main() should be short!
You say that
> The file basically contains the "main class" (main internal work dispatching and coordination) of our program
It sounds like these two tasks could be split into two separate objects - a co-ordinator and a work dispatcher.
When you split these up, you may mess up your "SCC workflow", but it sounds like adhering stringently to your SCC workflow is causing software maintenance problems. Ditch it now, and don't look back, because as soon as you fix it, you'll begin to sleep easy.
If you're not able to make the decision, fight tooth and nail with your manager for it - your application needs to be refactored - and badly by the sounds of it! Don't take no for an answer!
As you've described it, the main issue is diffing pre-split vs post-split, merging in bug fixes, etc. Tool around it. It won't take that long to hardcode a script in Perl, Ruby, etc. to rip out most of the noise from diffing pre-split against a concatenation of post-split. Do whatever's easiest in terms of handling noise:
You could even make it so whenever there's a checkin, the concatenation runs and you've got something prepared to diff against the single-file versions.
"The file basically contains the "main class" (main internal work dispatching and coordination) of our program, so every time a feature is added, it also affects this file and every time it grows."
If that big SWITCH (which I think there is) becomes the main maintenance problem, you could refactor it to use dictionary and the Command pattern and remove all switch logic from the existing code to the loader, which populates that map, i.e.:
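A minimal sketch of what such a map-based dispatcher might look like, assuming requests are identified by a string and using `std::function` as a lightweight stand-in for a full Command class hierarchy. `Dispatcher`, `loadCommands` and the two sample commands are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Maps a request name to the code that handles it. This takes the place
// of a big switch statement inside the main class.
class Dispatcher {
public:
    using Command = std::function<int(int)>;

    // Called by the loader to register one command.
    void add(const std::string& name, Command command) {
        commands_[name] = std::move(command);
    }

    // Returns false when no command is registered under that name.
    bool dispatch(const std::string& name, int argument, int& result) const {
        auto it = commands_.find(name);
        if (it == commands_.end()) return false;
        result = it->second(argument);
        return true;
    }

private:
    std::map<std::string, Command> commands_;
};

// Hypothetical loader: the only place that changes when a feature is added.
inline void loadCommands(Dispatcher& dispatcher) {
    dispatcher.add("double", [](int x) { return 2 * x; });
    dispatcher.add("negate", [](int x) { return -x; });
}
```

With this shape, adding a feature means adding one registration line (ideally in its own file) rather than another case in the central file.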
I think the easiest way to track the history of source when splitting a file would be something like this:
I think what I would do in this situation is bite the bullet and:
Tracking old changes to the file is simply solved by your first check-in comment saying something like "split from mainmodule.cpp". If you need to go back to something recent, most people will remember the change; if it's 2 years from now, the comment will tell them where to look. Of course, how valuable will it be to go back more than 2 years to look at who changed the code and why?
Merging will not be such a big nightmare as it will be when you'll get 30000 LOC file in the future. So:
If you can't just stop coding during refactoring process, you could leave this big file as is for a while at least without adding more code to it: since it contains one "main class" you could inherit from it and keep inherited class(es) with overloaded functions in several new small and well designed files.
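A sketch of the inheritance route, with the loud assumption that the relevant member functions of the main class are (or can be made) virtual, so that "overloaded" here effectively means overridden. `MainClass`, `MainClassWithExport` and `handle` are made-up names.

```cpp
#include <cassert>
#include <string>

// Stand-in for the existing class in mainmodule.cpp. It is left alone,
// apart from (assumption) handle() being virtual.
class MainClass {
public:
    virtual ~MainClass() = default;
    virtual std::string handle(const std::string& request) {
        return "legacy:" + request;
    }
};

// Lives in a new, small file. New feature code goes here instead of
// into mainmodule.cpp.
class MainClassWithExport : public MainClass {
public:
    std::string handle(const std::string& request) override {
        if (request == "export") return "export done";  // new feature
        return MainClass::handle(request);               // old behaviour
    }
};
```

The big file stops growing, although the program has to be changed in one place to instantiate the derived class.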
Find some code in the file which is relatively stable (not changing fast, and doesn't vary much between branches) and could stand as an independent unit. Move this into its own file, and for that matter into its own class, in all branches. Because it's stable, this won't cause (many) "awkward" merges that have to be applied to a different file from the one they were originally made on, when you merge the change from one branch to another. Repeat.
Find some code in the file which basically only applies to a small number of branches, and could stand alone. Doesn't matter whether it's changing fast or not, because of the small number of branches. Move this into its own classes and files. Repeat.
So, we've got rid of the code that's the same everywhere, and the code that's specific to certain branches.
This leaves you with a nucleus of badly-managed code - it's needed everywhere, but it's different in every branch (and/or it changes constantly so that some branches are running behind others), and yet it's in a single file that you're unsuccessfully trying to merge between branches. Stop doing that. Branch the file permanently, perhaps by renaming it in each branch. It's not "main" any more, it's "main for configuration X". OK, so you lose the ability to apply the same change to multiple branches by merging, but this is in any case the core of code where merging doesn't work very well. If you're having to manually manage the merges anyway to deal with conflicts, then it's no loss to manually apply them independently on each branch.
I think you're wrong to say that the kind of SCC doesn't matter, because for example git's merging abilities are probably better than the merge tool you're using. So the core problem, "merging is difficult" occurs at different times for different SCCs. However, you're unlikely to be able to change SCCs, so the issue is probably irrelevant.
It sounds to me like you're facing a number of code smells here. First of all the main class appears to violate the open/closed principle. It also sounds like it is handling too many responsibilities. Due to this I would assume the code to be more brittle than it needs to be.
While I can understand your concerns regarding traceability following a refactoring, I would expect that this class is rather hard to maintain and enhance and that any changes you do make are likely to cause side effects. I would assume that the cost of these outweighs the cost of refactoring the class.
In any case, since the code smells will only get worse with time, at least at some point the cost of these will outweigh the cost of refactoring. From your description I would assume that you're past the tipping point.
Refactoring this should be done in small steps. If possible add automated tests to verify current behavior before refactoring anything. Then pick out small areas of isolated functionality and extract these as types in order to delegate the responsibility.
In any case, it sounds like a major project, so good luck :)
The only solution I have ever imagined for such problems follows. The actual gain of the described method is that the evolution is progressive. No revolutions here, otherwise you'll be in trouble very fast.
Insert a new cpp class above the original main class. For now, it would basically redirect all calls to the current main class, but aim at making the API of this new class as clear and succinct as possible.
Once this has been done, you get the possibility to add new functionalities in new classes.
As for existing functionalities, you have to progressively move them into new classes as they become stable enough. You will lose SCC help for this piece of code, but there is not much that can be done about that. Just pick the right timing.
I know this is not perfect, though I hope it can help, and the process must be adapted to your needs!
Additional information
Note that Git is an SCC that can follow pieces of code from one file to another. I have heard good things about it, so it could help while you are progressively moving your work.
Git is constructed around the notion of blobs which, if I understand correctly, represent pieces of code files. Move these pieces around in different files and Git will find them, even if you modify them. Apart from the video from Linus Torvalds mentioned in comments below, I have not been able to find something clear about this.
Confucius say: "first step to getting out of hole is to stop digging hole."
Let me guess: Ten clients with divergent feature sets and a sales manager that promotes "customization"? I've worked on products like that before. We had essentially the same problem.
You recognize that having an enormous file is trouble, but even more trouble is ten versions that you have to keep "current". That's multiple maintenance. SCC can make that easier, but it can't make it right.
Before you try to break the file into parts, you need to bring the ten branches back in sync with each other so that you can see and shape all the code at once. You can do this one branch at a time, testing both branches against the same main code file. To enforce the custom behavior, you can use #ifdef and friends, but it's better as much as possible to use ordinary if/else against defined constants. This way, your compiler will verify all types and most probably eliminate "dead" object code anyway. (You may want to turn off the warning about dead code, though.)
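To make the `#ifdef` versus plain `if/else` point concrete, here is a small sketch. The macro `CUSTOMER_A_ROUNDING`, the constant `kCustomerARounding` and the rounding rules are invented for illustration; in a real build the macro would presumably be set per branch by the build configuration.

```cpp
#include <cassert>

// Hypothetical per-customer switch. Defaults to "off" when the build
// does not define it.
#ifndef CUSTOMER_A_ROUNDING
#define CUSTOMER_A_ROUNDING 0
#endif

const bool kCustomerARounding = (CUSTOMER_A_ROUNDING != 0);

// Converts an amount given in tenths of a cent to whole cents.
int priceInCents(int tenthsOfCent) {
    // Both branches are always parsed and type-checked, whichever
    // customer is being built; an optimizer can drop the unused one.
    if (kCustomerARounding) {
        return (tenthsOfCent + 9) / 10;  // customer A: always round up
    } else {
        return (tenthsOfCent + 5) / 10;  // everyone else: round to nearest
    }
}
```

Had the two branches been wrapped in `#ifdef`/`#else`, a typo in the inactive branch would go unnoticed until someone built that configuration.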
Once there's only one version of that file shared implicitly by all branches, then it's rather easier to begin traditional refactoring methods.
The #ifdefs are primarily better for sections where the affected code only makes sense in the context of other per-branch customizations. One may argue that these also present an opportunity for the same branch-merging scheme, but don't go hog-wild. One colossal project at a time, please.
In the short run, the file will appear to grow. This is OK. What you're doing is bringing things together that need to be together. Afterwards, you'll begin to see areas that are clearly the same regardless of version; these can be left alone or refactored at will. Other areas will clearly differ depending on the version. You have a number of options in this case. One method is to delegate the differences to per-version strategy objects. Another is to derive client versions from a common abstract class. But none of these transformations are possible as long as you have ten "tips" of development in different branches.
I don't know if this solves your problem, but what I guess you want to do is migrate the content of the file to smaller files independent of each other (summed up).
What I also get is that you have about 10 different versions of the software floating around and you need to support them all without messing things up.
First of all, there is just no way that this is easy and will solve itself in a few minutes of brainstorming. The functions linked in your file are all vital to your application, and simply cutting them off and migrating them to other files won't solve your problem.
I think you only have these options:
Don't migrate and stay with what you have. Possibly quit your job and start working on serious software with good design in addition. Extreme programming is not always the best solution if you are working on a long time project with enough funds to survive a crash or two.
Work out a layout of how you would love your file to look once it's split up. Create the necessary files and integrate them in your application. Rename the functions or overload them to take an additional parameter (maybe just a simple boolean?).
Once you have to work on your code, migrate the functions you need to work on to the new file and map the function calls of the old functions to the new functions.
You should still have your main-file this way, and still be able to see the changes that were made to it, once it comes to a specific function you know exactly when it was outsourced and so on.
Try to convince your co-workers with some good cake that workflow is overrated and that you need to rewrite some parts of the application in order to do serious business.
Exactly this problem is handled in one of the chapters of the book "Working Effectively with Legacy Code" (http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052).
I think you would be best off creating a set of command classes that map to the API points of the mainmodule.cpp.
Once they are in place, you will need to refactor the existing code base to access these API points via the command classes, once that's done, you are free to refactor each command's implementation into a new class structure.
Of course, with a single class of 11 KLOC the code in there is probably highly coupled and brittle, but creating individual command classes will help much more than any other proxy/facade strategy.
I don't envy the task, but as time goes on this problem will only get worse if it's not tackled.
Update
I'd suggest that the Command pattern is preferable to a Facade.
Maintaining/organizing a lot of different Command classes is preferable to a (relatively) monolithic Facade. A single Facade mapped onto an 11 KLOC file will probably need to be broken up into a few different groups itself.
Why bother trying to figure out these facade groups? With the Command pattern you will be able to group and organise these small classes organically, so you have a lot more flexibility.
Of course, both options are better than the single, growing 11 KLOC file.
One important piece of advice: Do not mix refactoring and bugfixes. What you want is a version of your program that is identical to the previous version, except that the source code is different.
One way could be to start splitting out the smallest function/part into its own file and then include it with a header (thus turning main.cpp into a list of #includes, which sounds like a code smell in itself, though I'm not a C++ guru; at least it's now split into files).
You could then try to switch all maintenance releases over to the "new" main.cpp or whatever your structure is. Again: No other changes or Bugfixes because tracking those is confusing as hell.
Another thing: As much as you may desire making one big pass at refactoring the whole thing in one go, you might bite off more than you can chew. Maybe just pick one or two "parts", get them into all the releases, then add some more value for your customer (after all, Refactoring does not add direct value so it is a cost that has to be justified) and then pick another one or two parts.
Obviously that requires some discipline in the team to actually use the split files and not just add new stuff to the main.cpp all the time, but again, trying to do one massive refactor may not be the best course of action.
Rofl, this reminds me of my old job. It seems that, before I joined, everything was inside one huge file (also C++). Then they split it up (at completely random points, using includes) into about three (still huge) files. The quality of this software was, as you might expect, horrible. The project totaled about 40k LOC (containing almost no comments but LOTS of duplicate code).
In the end I did a complete rewrite of the project. I started by redoing the worst part of the project from scratch. Of course I had in mind a possible (small) interface between this new part and the rest. Then I inserted this part into the old project. I didn't refactor the old code to create the necessary interface, but just replaced it. Then I took small steps from there, rewriting the old code.
I have to say that this took about half a year and there was no development of the old code base beside bugfixes during that time.
edit:
The size stayed at about 40k LOC, but the new application contained many more features and presumably fewer bugs in its initial version than the 8-year-old software. One reason for the rewrite was also that we needed the new features, and introducing them inside the old code was nearly impossible.
The software was for an embedded system, a label printer.
Another point that I should add is that in theory the project was C++. But it wasn't OO at all, it could have been C. The new version was object oriented.
OK so for the most part rewriting API of production code is a bad idea as a start. Two things need to happen.
One, you need to actually have your team decide to do a code freeze on current production version of this file.
Two, you need to take this production version and create a branch that manages the builds using preprocessing directives to split up the big file. Splitting the compilation using JUST preprocessor directives (#ifdefs, #includes, #endifs) is easier than recoding the API. It's definitely easier for your SLAs and ongoing support.
Here you could simply cut out functions that relate to a particular subsystem within the class and put them in a file say mainloop_foostuff.cpp and include it in mainloop.cpp at the right location.
OR
A more time consuming but robust way would be to devise an internal dependencies structure with double-indirection in how things get included. This will allow you to split things up and still take care of co-dependencies. Note that this approach requires positional coding and therefore should be coupled with appropriate comments.
This approach would include components that get used based on which variant you are compiling.
The basic structure is that your mainclass.cpp will include a new file called MainClassComponents.cpp after a block of statements like the following:
The primary structure of the MainClassComponents.cpp file would be there to work out dependencies within the sub components like this:
And now for each component you create a component_xx.cpp file.
Of course I am using numbers, but you should use something more logical based on your code.
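A hypothetical single-file sketch of how such a layout might fit together. The macro names (`USE_COMPONENT_01` and so on), the dependency of component 03 on component 01, and the function bodies are all assumptions made for illustration; in the real layout each component body would sit in its own component_xx.cpp and be pulled in with `#include`.

```cpp
#include <cassert>
#include <string>

// --- mainclass.cpp: the variant being built selects its components. ---
#define USE_COMPONENT_03

// --- MainClassComponents.cpp: resolve dependencies between components. ---
// Assumption for this sketch: component 03 needs component 01.
#if defined(USE_COMPONENT_03) && !defined(USE_COMPONENT_01)
#define USE_COMPONENT_01
#endif

// In the real layout each block below would be a single line such as
//   #include "component_01.cpp"
// The bodies are inlined here so that the sketch compiles as one file.
#ifdef USE_COMPONENT_01
std::string component01() { return "01"; }
#endif

#ifdef USE_COMPONENT_02
std::string component02() { return "02"; }
#endif

#ifdef USE_COMPONENT_03
// Order matters (positional coding): component 03 calls component 01,
// so it has to come after it.
std::string component03() { return component01() + "+03"; }
#endif

// Reports which components ended up in this build.
std::string builtComponents() {
    std::string result;
#ifdef USE_COMPONENT_01
    result += "[01]";
#endif
#ifdef USE_COMPONENT_02
    result += "[02]";
#endif
#ifdef USE_COMPONENT_03
    result += "[03]";
#endif
    return result;
}
```

Here the variant asked only for component 03, and the dependency block pulled in component 01 on its behalf.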
Using preprocessor allows you to split things up without having to worry about API changes which is a nightmare in production.
Once you have production settled you can then actually work on redesign.
Well I understand your pain :) I've been in a few such projects as well and it's not pretty. There is no easy answer for this.
One approach that may work for you is to start adding safeguards in all functions, that is, checking arguments and pre/post-conditions in methods, then eventually adding unit tests, all in order to capture the current functionality of the sources. Once you have this you are better equipped to refactor the code, because you will have asserts and errors popping up to alert you if you have forgotten something.
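For instance, a guard of this kind could look like the following sketch. `takeJob` and its queue are hypothetical; the point is only that the body stays as it was and the checks are added around it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical legacy function: removes the job at `index` from the queue
// and returns its id. The body is unchanged; only the checks are new.
int takeJob(std::vector<int>& queue, std::size_t index) {
    // Precondition: the caller must pass a valid index.
    assert(index < queue.size());
    const std::size_t sizeBefore = queue.size();
    (void)sizeBefore;  // avoids an unused-variable warning when NDEBUG is set

    const int id = queue[index];
    queue.erase(queue.begin() + static_cast<std::ptrdiff_t>(index));

    // Postcondition: exactly one element was removed.
    assert(queue.size() + 1 == sizeBefore);
    return id;
}
```

Note that `assert` is compiled out when `NDEBUG` is defined, so these checks only fire in debug builds.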
Sometimes, though, refactoring may just bring more pain than benefit. Then it may be better to leave the original project in a pseudo-maintenance state, start from scratch, and incrementally add the functionality from the beast.
You should not be concerned with reducing the file size, but rather with reducing the class size. It comes down to almost the same thing, but makes you look at the problem from a different angle (as @Brian Rasmussen suggests, your class seems to have too many responsibilities).
What you have is a classic example of a known design antipattern called the blob. Take some time to read the article I point to here, and maybe you will find something useful. Besides, if this project is as big as it looks, you should consider some design to prevent it from growing into code that you can't control.
This isn't an answer to the big problem, but a theoretical solution to a specific piece of it:
Figure out where you want to split the big file into subfiles. Put comments in some special format at each of those points.
Write a fairly trivial script that will break the file apart into subfiles at those points. (Perhaps the special comments have embedded filenames that the script can use as instructions for how to split it.) It should preserve the comments as part of the splitting.
Run the script. Delete the original file.
When you need to merge from a branch, first recreate the big file by concatenating the pieces back together, do the merge, and then re-split it.
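The split and re-join steps above can be sketched as follows. This is an illustration under assumptions, not a finished tool: the marker format `//@@ file: <name>` is invented, the functions work on in-memory strings rather than real files, and the input is assumed to start with a marker line and to end every line with `\n`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical marker format: a line "//@@ file: <name>" starts a new piece.
const std::string kMarker = "//@@ file: ";

using Pieces = std::vector<std::pair<std::string, std::string>>;  // (name, text)

// Splits the big file's text into named pieces. The marker line stays at
// the top of each piece so that joining restores the original text.
Pieces splitFile(const std::string& text) {
    Pieces pieces;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, kMarker.size(), kMarker) == 0) {
            pieces.emplace_back(line.substr(kMarker.size()), "");
        }
        if (!pieces.empty()) {
            pieces.back().second += line + "\n";
        }
    }
    return pieces;
}

// Concatenates the pieces in order, recreating the big file for a merge.
std::string joinPieces(const Pieces& pieces) {
    std::string text;
    for (const auto& piece : pieces) text += piece.second;
    return text;
}
```

Keeping the marker comments inside the pieces is what makes the round trip lossless, so the re-joined file can be merged against the single-file versions on other branches.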
Also, if you want to preserve the SCC file history, I expect the best way to do that is to tell your source control system that the individual piece files are copies of the original. Then it will preserve the history of the sections that were kept in that file, although of course it will also record that large parts were "deleted".
One way to split it without too much danger would be to take a historic look at all the line changes. Are there certain functions that are more stable than others? Hot spots of change if you will.
If a line hasn't been changed in a few years you can probably move it to another file without too much worry. I'd take a look at the source annotated with the last revision that touched a given line and see if there are any functions you could pull out.