A few generic principles apply:
Divide and conquer. Split the file into smaller, logical libraries and function groupings. You will learn more about the library this way, and make it easier to understand and test incrementally.
Remove duplication. Look for repeated functions and concepts, and replace them with standard library functions, or centralized functions within the library.
Add consistency. Make parameter order and naming uniform across functions.
Add unit tests. This is the most important part of refactoring a library. Use JUnit (or a similar framework for your language, such as PHPUnit for PHP), and add tests that verify both that the functions are correct and that their behavior has not changed.
Add docs. Document your understanding of the consistent, improved library as you write your tests.
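A minimal sketch of the "add unit tests" step, assuming no test framework is installed yet: it pins the current behavior of one function so that later refactoring can be checked against it. legacy_slugify() and check_cases() are hypothetical names used only for illustration; with PHPUnit the same table would become a data provider.

```php
<?php
// Hypothetical legacy function standing in for one taken from the library.
function legacy_slugify($title)
{
    $title = strtolower(trim($title));
    $title = preg_replace('/[^a-z0-9]+/', '-', $title);
    return trim($title, '-');
}

/**
 * Run a table of input => expected pairs against a callable and
 * return a human-readable description of every mismatch.
 */
function check_cases(callable $fn, array $cases)
{
    $failures = [];
    foreach ($cases as $input => $expected) {
        $actual = $fn($input);
        if ($actual !== $expected) {
            $failures[] = sprintf(
                '%s: expected %s, got %s',
                var_export($input, true),
                var_export($expected, true),
                var_export($actual, true)
            );
        }
    }
    return $failures;
}

// Expected values are recorded from the current behavior BEFORE any refactoring.
$cases = [
    'Hello World'   => 'hello-world',
    '  PHP & You! ' => 'php-you',
    ''              => '',
];

$failures = check_cases('legacy_slugify', $cases);
echo $failures ? implode("\n", $failures) . "\n" : "all cases pass\n";
```

Re-running the same table after each change shows quickly whether a refactoring altered observable behavior.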
If the code is badly written, it is likely that it has a lot of cloning. Finding and getting rid of the clones would then likely make it a lot more maintainable as well as reducing its size.
You can find a variety of clone detectors, these specifically for PHP:
ranked in least-to-most capability order (IMHO with my strong personal self-interest in CloneDR) in terms of qualitatively different ability to detect interesting clones.
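To make the idea of a clone concrete, here is a rough sketch that groups function bodies which are identical once comments and whitespace are ignored. It is not how CloneDR or any other real clone detector works (those also find near-miss and parameterized clones); find_exact_clones() and the sample bodies are hypothetical.

```php
<?php
/**
 * Group functions whose bodies are identical after comments are dropped
 * and whitespace is normalized.
 * Input: function name => source text of its body.
 * Output: list of name groups that contain two or more functions.
 */
function find_exact_clones(array $bodies)
{
    $groups = [];
    foreach ($bodies as $name => $body) {
        $normalized = '';
        // The tokenizer needs an opening tag to treat the text as PHP.
        foreach (token_get_all('<?php ' . $body) as $token) {
            if (is_array($token)) {
                $skip = [T_OPEN_TAG, T_COMMENT, T_DOC_COMMENT, T_WHITESPACE];
                if (in_array($token[0], $skip, true)) {
                    continue;
                }
                $normalized .= $token[1] . ' ';
            } else {
                $normalized .= $token . ' ';
            }
        }
        $groups[md5($normalized)][] = $name;
    }
    // Keep only the groups that actually contain duplicates.
    return array_values(array_filter($groups, function ($names) {
        return count($names) > 1;
    }));
}

// Hypothetical sample: two bodies differ only in layout and a comment.
$bodies = [
    'add_tax'  => '{ return $price * 1.2; }',
    'with_vat' => "{\n    // same logic\n    return \$price*1.2;\n}",
    'discount' => '{ return $price * 0.9; }',
];

$clones = find_exact_clones($bodies);
print_r($clones);
```

Each reported group is a candidate for being replaced by one shared function.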
If the code is badly written, a lot of it might be dead. It would be worthwhile to find out which parts execute in practice and which do not. A test coverage tool can give you good insight into this, even in the absence of tests (you simply exercise your program by hand). Whatever the test coverage tool says executes obviously isn't dead. What doesn't execute might be worth further investigation to see if you can remove it. A test coverage tool is also useful for telling you how much of the code is exercised by your unit tests, as suggested by another answer. Finally, a test coverage tool can help you find where some of the functionality lives: exercise the functionality from the outside, and whatever code the tool says was executed is probably relevant.
Our PHP Test Coverage Tool can collect test coverage data.
If it's an open source library, ask the developers. First, it's very likely someone already has (attempted) a restructured version. And very occasionally the big bloated version of something was actually auto-generated from a more modular version.
I actually do that sometimes for one of my applications, which is strictly pluginized and allows a simple cat */*.php > monolithic.php, which eases distribution and handling. So ask if that might be the case there.
If you really want to restructure it, then use the time-proven incremental extension structure. Split the class library into multiple files by segmenting the original class. Split every ~2000 lines, and name the first part library0.php:
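    // library0.php
    class library0 {
        var $var1, $var2, $var3, $var4;
        function method1() { /* original body */ }
        function method2() { /* original body */ }
        function method3() { /* original body */ }
        function method4() { /* original body */ }
        function method5() { /* original body */ }
    }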
The next part simply continues from there and holds the next few methods:
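    // library1.php (file name assumed to follow the library0.php pattern)
    require_once 'library0.php';   // the parent class must be loaded first
    class library1 extends library0 {
        function method6() { /* original body */ }
        function method7() { /* original body */ }
        function method8() { /* original body */ }
        // ...
    }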
Do so until you have separated them all. Give the last file the real name, library.php; declaring class library extends library52 { there should do it. That's so ridiculously simplistic that a regex script should be able to do it.
Now obviously, there are no memory savings here, and splitting it up like that buys you nothing in terms of structure. With 20000 lines, however, it's difficult to get a quick overview and a sensible grouping right the first time. So start with an arbitrary restructuring in lieu of an obvious plan. From there you could sort the methods, put the least useful code into the last file, and use the lighter base classes whenever they suffice. You'll need a dependency chart to see if this is workable, though; otherwise errors might blow up at runtime.
(I haven't tried this approach with a huge project like that. But arbitrarily splitting something into three parts, and then reshuffling it for sensibility did work out. That one time.)
I assume you are planning to break the library up into thematically relevant classes. Definitely consider using autoloading. It's the best thing since sliced bread, and makes inter-dependencies easy to handle.
Document the code using phpDoc compatible comments from the start.
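As an illustration of the autoloading advice, here is a small sketch built on spl_autoload_register(). It assumes one class per file, with the file named after the class; register_library_autoloader(), the temporary directory, and the LibraryMailer class are hypothetical. The phpDoc-style comment shows the documentation format mentioned above.

```php
<?php
/**
 * Register an autoloader that looks for classes under $baseDir.
 *
 * @param string $baseDir Directory holding one file per class.
 * @return void
 */
function register_library_autoloader($baseDir)
{
    spl_autoload_register(function ($class) use ($baseDir) {
        // Map namespace separators to directory separators.
        $file = rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $class) . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

// Demo: write a one-class file into a temporary directory,
// then let the autoloader find it on first use.
$dir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . '/LibraryMailer.php',
    '<?php class LibraryMailer { function send() { return "sent"; } }'
);

register_library_autoloader($dir);
$mailer = new LibraryMailer();   // no explicit require: the autoloader loads the file
echo $mailer->send(), "\n";
```

With this in place, splitting a class out into its own file needs no matching require statement at every call site.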
Calling Side Approach
If you know that use of the library is limited to a particular class, module, or project, it can be easier to approach the problem from the calling side. The point of starting there is that there are often very few calls into the library, and the fewer the calls, the less code in the library is (potentially) actually used. You can then do the following to clean up and refactor the code.
Write the Calling Side Tests
Write a test that mimics the calls that are done against the library.
Bury the Dead Code
If there is a lot of dead code this will be a huge win. Trace the actual calls into the library and remove everything else. Run the test and verify.
Refactor What's Left
Since you have the tests, it should be much easier to refactor (or even replace) the code in the library. You can then apply the standard refactoring rules (de-duplication, simplification, consolidation, etc.).
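One way to trace the actual calls, sketched under the assumption that callers reach the library through a single object: wrap that object in a recorder that forwards every call and remembers the method name. BigLibrary and CallRecorder are hypothetical names. Method names are compared exactly as written by the caller, so calls made with different letter case would need normalizing.

```php
<?php
// Hypothetical stand-in for the large library class.
class BigLibrary
{
    function shout($s)      { return strtoupper($s); }
    function escape($s)     { return htmlspecialchars($s); }
    function legacyExport() { return 'never called in this demo'; }
}

/** Forwards every call to the wrapped object and remembers the method name. */
class CallRecorder
{
    private $target;
    private $called = [];

    function __construct($target)
    {
        $this->target = $target;
    }

    function __call($name, $args)
    {
        $this->called[$name] = true;
        return call_user_func_array([$this->target, $name], $args);
    }

    /** Public methods of the target that were never invoked through the recorder. */
    function unusedMethods()
    {
        $all = get_class_methods($this->target);
        return array_values(array_diff($all, array_keys($this->called)));
    }
}

// Exercise the library the way the calling code does, then list what was never touched.
$lib = new CallRecorder(new BigLibrary());
$lib->shout('hello');
$lib->escape('<b>');
print_r($lib->unusedMethods());
```

The methods listed are only candidates for removal; they may still be reached internally by other library methods, so check before deleting.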
Apart from what was already stated, I suggest having a look at Martin Fowler's Catalog of Refactorings, which is based on his book. The page also lists a large number of additional sources that are useful for understanding how refactoring should be approached. A more detailed catalog listing can be found at sourcemaking. Note that not all of these techniques and patterns can be applied to PHP code.
There are also a lot of useful tools to assist you with refactoring (and in general) at http://phpqatools.org. Use these to analyze your code and find things like dead or duplicated code, high cyclomatic complexity, frequently executed code, and so on. Not only will this give you a better overview of your code, but it will also tell you which portions are critical (and better left untouched in the beginning) and which could be candidates for refactoring.
Whatever you do, do write unit tests. You have to make sure you are not breaking code when refactoring. If the library is not unit-tested yet, add a test before you change any code. If you find you cannot write a test for a portion of code you want to change, check whether a smaller refactoring somewhere else would make it testable. If not, do not attempt the refactoring until you can.
1. Write tests for the library such that all lines of the code are covered (i.e. 100% coverage).
2. Use TDD. Start from the higher-level modules and refactor (top-to-bottom approach).
3. Run the tests mentioned in step 1 and verify against the results of step 2.
I understand that 100% coverage (as mentioned in step 1) does not necessarily mean that all the features have been covered, but at least we are making sure that the output of the new system will be the same as the output of the current system.
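The "same output before and after" check can be sketched as a direct comparison of the old and the refactored implementation over a set of inputs. old_total(), new_total(), and compare_outputs() are hypothetical; in practice the inputs would come from the tests written in step 1.

```php
<?php
// Hypothetical "before" version: a hand-rolled loop.
function old_total(array $prices)
{
    $sum = 0;
    for ($i = 0; $i < count($prices); $i++) {
        $sum = $sum + $prices[$i];
    }
    return $sum;
}

// Hypothetical "after" version: the same job done by a standard library function.
function new_total(array $prices)
{
    return array_sum($prices);
}

/** Return every input for which the two implementations disagree. */
function compare_outputs(callable $old, callable $new, array $inputs)
{
    $mismatches = [];
    foreach ($inputs as $input) {
        if ($old($input) !== $new($input)) {
            $mismatches[] = $input;
        }
    }
    return $mismatches;
}

$inputs = [[], [1, 2, 3], [0.5, 0.25], [-4, 4]];
$mismatches = compare_outputs('old_total', 'new_total', $inputs);
echo count($mismatches) === 0 ? "outputs match\n" : "outputs differ\n";
```

The strict comparison (!==) also catches changes in return type, such as an integer becoming a string.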
First of all, consider using a different IDE - Eclipse is notoriously terrible in terms of performance. Komodo is way faster. So is PhpStorm.
In terms of making the refactoring easier, I'd first try to identify the high-level picture - what functions are there? Are there classes? Can you put those classes into separate files just to start with?
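If your problem includes the headache of manually placing the functions in different files, then maybe the strategy below can help:
Get your library file into a PHP variable.
Eliminate the tags.
Separate all the functions.
Now the bodies of all the functions and their names are in an array. Create separate files for each of the functions in the folder 'functions':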
    foreach ($code_array as $function) {
        // Each entry is assumed to start with the function name, e.g. "foo($a) { ... }".
        $funcTemp = explode('(', $function);   // the text before the first "(" is the name
        $function_name = trim($funcTemp[0]);
        $function_text = '<?php function ' . $function;
        file_put_contents('functions/' . $function_name . '.php', $function_text);
    }
Now all the functions of your library are in separate files in a common folder, with each file named after its function. You can easily look up your functions in the folder view and apply your strategies to manage them.
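Separating the functions is the fiddly step, so here is one possible sketch that uses PHP's tokenizer rather than string splitting. It assumes a file of plain named functions, each with a body; class methods would be picked up as well, and abstract methods without a body are not handled. split_functions() is a hypothetical helper, and unlike $code_array above it returns name => full source text including the function keyword.

```php
<?php
/**
 * Extract every named function from PHP source text.
 * Returns an array of function name => source text of that function.
 */
function split_functions(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $functions = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The name is the first non-whitespace token after "function".
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $count || !is_array($tokens[$j]) || $tokens[$j][0] !== T_STRING) {
            continue; // anonymous function: no name to file it under
        }
        $name = $tokens[$j][1];

        // Copy tokens until the brace that closes the function body.
        $depth = 0;
        $text = '';
        for ($k = $i; $k < $count; $k++) {
            $tok = $tokens[$k];
            $text .= is_array($tok) ? $tok[1] : $tok;
            $opens = $tok === '{' || (is_array($tok)
                && ($tok[0] === T_CURLY_OPEN || $tok[0] === T_DOLLAR_OPEN_CURLY_BRACES));
            if ($opens) {
                $depth++;
            } elseif ($tok === '}') {
                $depth--;
                if ($depth === 0) {
                    break;
                }
            }
        }
        $functions[$name] = $text;
        $i = $k; // resume scanning after this function
    }
    return $functions;
}

// Hypothetical two-function library used as sample input.
$source = <<<'PHP'
<?php
function greet($name) {
    if ($name) { return "Hello {$name}"; }
    return 'Hello';
}
function answer() { return 42; }
PHP;

$functions = split_functions($source);
print_r(array_keys($functions));
```

Working on tokens means that braces inside strings and comments do not confuse the matching, which is the usual failure of a regex-based split.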
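You can also implement a __call() method to keep using the same calling format:
    function __call($name, $params)
    {
        // Load the file that defines the requested function on first use.
        include_once 'functions/' . $name . '.php';
        // Pass the arguments through individually and return the result.
        return call_user_func_array($name, $params);
    }
Hope it helps :)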
A general rule of thumb is to remove repeated code. Also make sure to have useful documentation. If you're using Java, Javadoc is very useful, and suitable equivalents are available for other languages.
A good book that answers your question with a lot of examples and details is: Working Effectively with Legacy Code, by Michael Feathers.
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Refactoring depends on your goals and the type of solution. This book, Clean Code, will help you understand the basic concepts of good code.