Seeking clarification on structuring code to reduce cyclomatic complexity
Recently our company has started measuring the cyclomatic complexity (CC) of the functions in our code on a weekly basis, and reporting which functions have improved or worsened. So we have started paying a lot more attention to the CC of functions.
I've read that CC can be informally calculated as 1 + the number of decision points in a function (e.g. if statements, for loops, select cases, etc.), or alternatively as the number of paths through a function...
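To see how the two informal definitions line up, here is a rough sketch of McCabe's graph formula, M = E - N + 2P (E edges, N nodes, P connected components, so P = 1 for a single function). The control-flow graphs are hand-drawn for fragment 1 of this question; the node labels are my own illustrative choice, not what any particular metrics tool builds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// M = E - N + 2P. Nodes are inferred from the edge list, so this assumes
// every node has at least one edge (true for any function with 2+ nodes).
static int Cyclomatic(IEnumerable<(string From, string To)> edges, int components = 1)
{
    var edgeList = edges.ToList();
    int nodeCount = edgeList.SelectMany(e => new[] { e.From, e.To }).Distinct().Count();
    return edgeList.Count - nodeCount + 2 * components;
}

// for (int i = 0; i < 3; i++) Console.WriteLine("Hello");
var loop = new List<(string From, string To)>
{
    ("init", "test"),   // int i = 0
    ("test", "body"),   // i < 3 is true
    ("body", "step"),   // Console.WriteLine, then i++
    ("step", "test"),   // back edge to the loop test
    ("test", "exit"),   // i < 3 is false
};

// Three straight-line Console.WriteLine("Hello") calls.
var straight = new List<(string From, string To)>
{
    ("hello1", "hello2"),
    ("hello2", "hello3"),
};

// One decision (i < 3) + 1 for the loop; no decisions for straight-line code.
Console.WriteLine($"loop: {Cyclomatic(loop)}");
Console.WriteLine($"straight: {Cyclomatic(straight)}");
```

Both definitions give the same answer here: the loop has one decision point, so 1 + 1 = 2, and the graph formula also yields 5 - 5 + 2 = 2.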
I understand that the easiest way of reducing CC is to use the Extract Method refactoring repeatedly...
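A minimal sketch of what Extract Method does to the numbers, using a hypothetical fee calculation invented for illustration (FeeBefore, FeeAfter and BaseFee are not from the question). The CC reported per method goes down, but the decisions themselves are only moved, not removed.

```csharp
using System;

// Before: one method with three decision points -> CC 4.
static decimal FeeBefore(decimal weightKg, bool express, bool international)
{
    decimal fee = 5m;
    if (weightKg > 10m) fee += 3m;
    if (express) fee *= 2m;
    if (international) fee += 15m;
    return fee;
}

// After Extract Method: this method has two decision points (CC 3) and the
// extracted helper has one (CC 2). All three decisions still exist.
static decimal FeeAfter(decimal weightKg, bool express, bool international)
{
    decimal fee = BaseFee(weightKg);
    if (express) fee *= 2m;
    if (international) fee += 15m;
    return fee;
}

// The ?: operator is the decision point that moved out of FeeBefore.
static decimal BaseFee(decimal weightKg) => weightKg > 10m ? 8m : 5m;

// Print both results side by side for a few inputs.
foreach (var w in new[] { 2m, 12m })
    foreach (var e in new[] { false, true })
        foreach (var i in new[] { false, true })
            Console.WriteLine($"{w} {e} {i}: {FeeBefore(w, e, i)} vs {FeeAfter(w, e, i)}");
```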
There are some things I am unsure about, e.g. what is the CC of the following code fragments?
1)
for (int i = 0; i < 3; i++)
    Console.WriteLine("Hello");
And
Console.WriteLine("Hello");
Console.WriteLine("Hello");
Console.WriteLine("Hello");
They both do the same thing, but does the first version have a higher CC because of the for statement?
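A small sketch confirming that the two versions of fragment 1 behave identically. To make them comparable, I changed them to collect their output in a list instead of writing to the console; the CC figures in the comments assume the usual "1 + decision points" counting.

```csharp
using System;
using System.Collections.Generic;

static List<string> LoopVersion()
{
    var output = new List<string>();
    for (int i = 0; i < 3; i++)   // one decision point (i < 3): CC = 2
        output.Add("Hello");
    return output;
}

static List<string> UnrolledVersion()
{
    var output = new List<string>();   // no decision points: CC = 1
    output.Add("Hello");
    output.Add("Hello");
    output.Add("Hello");
    return output;
}

Console.WriteLine(string.Join(", ", LoopVersion()));
Console.WriteLine(string.Join(", ", UnrolledVersion()));
```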
2)
if (condition1)
    if (condition2)
        if (condition3)
            Console.WriteLine("wibble");
And
if (condition1 && condition2 && condition3)
    Console.WriteLine("wibble");
Assuming the language does short-circuit evaluation, as C# does, these two code fragments have the same effect... but is the CC of the first fragment higher because it has three decision points/if statements?
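A quick truth-table check that the two fragments of example 2 really are equivalent. Printing "wibble" is replaced by returning true so the results can be compared. As for CC: counting only if statements gives the nested form 3 decision points (CC 4) and the && form 1 (CC 2), but many tools also count each && as a decision point, which gives the && form CC 4 as well.

```csharp
using System;

// Fragment 2a: three nested if statements.
static bool Nested(bool c1, bool c2, bool c3)
{
    if (c1)
        if (c2)
            if (c3)
                return true;   // stands in for Console.WriteLine("wibble")
    return false;
}

// Fragment 2b: one if with a short-circuiting && condition.
static bool Combined(bool c1, bool c2, bool c3)
{
    if (c1 && c2 && c3)
        return true;
    return false;
}

// Try all 8 combinations of the three conditions.
var bools = new[] { false, true };
int mismatches = 0;
foreach (var a in bools)
    foreach (var b in bools)
        foreach (var c in bools)
            if (Nested(a, b, c) != Combined(a, b, c))
                mismatches++;

Console.WriteLine($"combinations that differ: {mismatches}");
```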
3)
if (condition1)
{
    Console.WriteLine("one");
    if (condition2)
        Console.WriteLine("one and two");
}
And
if (condition3)
    Console.WriteLine("fizz");
if (condition4)
    Console.WriteLine("buzz");
These two code fragments do different things, but do they have the same CC? Or does the nested if statement in the first fragment have a higher CC? i.e. nested if statements are mentally more complex to understand, but is that reflected in the CC?
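A sketch that applies M = E - N + 2 to hand-drawn control-flow graphs of the two fragments in example 3 (node labels are my own). Both graphs have 5 nodes and 6 edges, so both come out at CC 3, the same as 1 + two if statements. Nesting by itself is not reflected in the number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// M = E - N + 2P with P = 1; nodes are inferred from the edge list.
static int Cyclomatic(List<(string From, string To)> edges)
{
    int nodes = edges.SelectMany(e => new[] { e.From, e.To }).Distinct().Count();
    return edges.Count - nodes + 2;
}

// Fragment 3a: nested if.
var nested = new List<(string From, string To)>
{
    ("if1", "one"),      // condition1 true
    ("if1", "exit"),     // condition1 false
    ("one", "if2"),
    ("if2", "oneTwo"),   // condition2 true
    ("if2", "exit"),     // condition2 false
    ("oneTwo", "exit"),
};

// Fragment 3b: two sequential ifs.
var sequential = new List<(string From, string To)>
{
    ("if3", "fizz"),     // condition3 true
    ("if3", "if4"),      // condition3 false
    ("fizz", "if4"),
    ("if4", "buzz"),     // condition4 true
    ("if4", "exit"),     // condition4 false
    ("buzz", "exit"),
};

Console.WriteLine($"nested: {Cyclomatic(nested)}, sequential: {Cyclomatic(sequential)}");
```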
Answers (8)
... if your company is measuring CC in a specific way, then you need to become familiar with that method (hopefully they are using tools to do this). There are different ways to calculate CC for different situations (case statements, Boolean operators, etc.), but you should get the same kind of information from the metric no matter what convention you use.
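To illustrate how the Boolean-operator convention changes the number, here is a sketch with two hand-drawn graphs for the same statement, `if (condition1 && condition2 && condition3) Console.WriteLine("wibble");`. Which convention a given tool uses is an assumption you would need to check against that tool's documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// M = E - N + 2P with P = 1; nodes are inferred from the edge list.
static int Cyclomatic(List<(string From, string To)> edges)
{
    int nodes = edges.SelectMany(e => new[] { e.From, e.To }).Distinct().Count();
    return edges.Count - nodes + 2;
}

// Convention A: the whole condition is treated as one decision node.
var wholeCondition = new List<(string From, string To)>
{
    ("cond", "wibble"), ("cond", "exit"), ("wibble", "exit"),
};

// Convention B: each short-circuit operand is its own branch, which is
// how the && chain is actually evaluated at run time.
var perOperand = new List<(string From, string To)>
{
    ("c1", "c2"), ("c1", "exit"),
    ("c2", "c3"), ("c2", "exit"),
    ("c3", "wibble"), ("c3", "exit"),
    ("wibble", "exit"),
};

Console.WriteLine($"single decision: {Cyclomatic(wholeCondition)}");
Console.WriteLine($"per operand: {Cyclomatic(perOperand)}");
```

Under convention A the statement scores 2; under convention B it scores 4, the same as the three nested ifs.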
The bigger problem is what others have mentioned, that your company seems to be focusing more on CC than on the code behind it. In general, sure, below 5 is great, below 10 is good, below 20 is okay, 21 to 50 should be a warning sign, and above 50 should be a big warning sign, but those are guides, not absolute rules. You should probably examine the code in a procedure that has a CC above 50 to ensure it isn't just a huge heap of code, but maybe there is a specific reason why the procedure is written that way, and it's not feasible (for any number of reasons) to refactor it.
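If you want to tag functions with these rough bands, a sketch might look like this. The text leaves 20 itself unassigned ("below 20" vs "21 to 50"); treating 20 as "okay" is my assumption. As the answer says, these are guides, not absolute rules.

```csharp
using System;

// Rough CC bands from the guideline above (needs C# 9 relational patterns).
static string Band(int cc) => cc switch
{
    < 5 => "great",
    < 10 => "good",
    <= 20 => "okay",             // assumption: 20 counts as "okay"
    <= 50 => "warning sign",
    _ => "big warning sign",
};

foreach (var cc in new[] { 3, 8, 15, 35, 120 })
    Console.WriteLine($"CC {cc}: {Band(cc)}");
```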
If you use tools to refactor your code to reduce CC, make sure you understand what the tools are doing, and that they're not simply shifting one problem to another place. Ultimately, you want your code to have few defects, to work properly, and to be relatively easy to maintain. If that code also has a low CC, good for it. If your code meets these criteria and has a CC above 10, maybe it's time to sit down with whatever management you can and defend your code (and perhaps get them to examine their policy).
After browsing through the Wikipedia entry and Thomas J. McCabe's original paper, it seems that the items you mentioned above are known problems with the metric.
However, most metrics do have pros and cons. I suppose in a large enough program the CC value could point to possibly complex parts of your code, but a higher CC does not necessarily mean the code is complex.
Like all software metrics, CC is not perfect. Used on a big enough code base, it can give you an idea of where might be a problematic zone.
There are two things to keep in mind here:
I like the idea of a weekly analysis. In quality control, trend analysis is a very effective tool for identifying problems while they are being created. This is so much better than having to wait until they get so big that they become obvious (see SPC for some details).
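A minimal sketch of the weekly comparison the question describes: diff two snapshots of per-function CC and report which functions improved or worsened. The function names and values are hypothetical placeholders, not output from any real tool, and functions that appear in only one snapshot are simply skipped here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-function CC from two weekly runs.
var lastWeek = new Dictionary<string, int>
{
    ["Billing.Calculate"] = 14,
    ["Orders.Validate"] = 6,
    ["Reports.Render"] = 22,
};
var thisWeek = new Dictionary<string, int>
{
    ["Billing.Calculate"] = 11,
    ["Orders.Validate"] = 9,
    ["Reports.Render"] = 22,
    ["Orders.Import"] = 4,   // new this week, so nothing to compare against
};

// Keep only functions present in both weeks whose CC changed, worst first.
var changes = thisWeek.Keys
    .Where(fn => lastWeek.ContainsKey(fn))
    .Select(fn => (Name: fn, Delta: thisWeek[fn] - lastWeek[fn]))
    .Where(c => c.Delta != 0)
    .OrderByDescending(c => c.Delta)
    .ToList();

foreach (var (name, delta) in changes)
    Console.WriteLine($"{name}: {(delta > 0 ? "worsened" : "improved")} by {Math.Abs(delta)}");
```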
CC is not a panacea for measuring quality. Clearly a repeated statement is not "better" than a loop, even if a loop has a bigger CC. The reason the loop has a bigger CC is that sometimes it might get executed and sometimes it might not, which leads to two different "cases" which should both be tested. In your case the loop will always be executed three times because you use a constant, but CC is not clever enough to detect this.
Same with the chained ifs in example 2: this structure allows you to have a statement that would be executed when condition1 and condition2 are true, regardless of condition3. This is a special case that is not possible in the version using &&. So the if-chain has a bigger potential for special cases, even if you don't use this in your code.
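A sketch of the special case this answer describes. The "one and two" statement is my own hypothetical addition to fragment 2a; it runs whenever condition1 and condition2 hold, whatever condition3 is, and a single `if (condition1 && condition2 && condition3)` has no place to put it.

```csharp
using System;
using System.Collections.Generic;

static List<string> Nested(bool condition1, bool condition2, bool condition3)
{
    var output = new List<string>();
    if (condition1)
        if (condition2)
        {
            output.Add("one and two");   // hypothetical extra statement
            if (condition3)
                output.Add("wibble");
        }
    return output;
}

// Show the output for every combination of the three conditions.
var bools = new[] { false, true };
foreach (var c1 in bools)
    foreach (var c2 in bools)
        foreach (var c3 in bools)
            Console.WriteLine($"{c1,-5} {c2,-5} {c3,-5} -> [{string.Join(", ", Nested(c1, c2, c3))}]");
```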
This is the danger of applying any metric blindly. The CC metric certainly has a lot of merit, but as with any other technique for improving code, it can't be evaluated divorced from context. Point your management at Capers Jones's discussion of the Lines of Code measurement (wish I could find a link for you). He points out that if Lines of Code is a good measure of productivity, then assembler language developers are the most productive developers on earth. Of course they're no more productive than other developers; it just takes them a lot more code to accomplish what higher-level languages do with less source code. I mention this, as I say, so you can show your managers how dumb it is to blindly apply metrics without intelligent review of what the metric is telling you.
I would suggest that, if they're not already doing so, your management would be wise to use the CC measure as a way of spotting potential hot spots in the code that should be reviewed further. Blindly aiming for a lower CC without any reference to code maintainability or other measures of good coding is just foolish.
Cyclomatic complexity is analogous to temperature. Both are measurements, and in most cases both are meaningless without context. If I said the temperature outside was 72 degrees, that doesn’t mean much; but if I added the fact that I was at the North Pole, the number 72 becomes significant. If someone told me a method has a cyclomatic complexity of 10, I can’t determine whether that is good or bad without its context.
When I code review an existing application, I find cyclomatic complexity a useful “starting point” metric. The first thing I check for is methods with a CC > 10. These “>10” methods are not necessarily bad; they just give me a starting point for reviewing the code.
General rules when considering a CC number:
maintainability
[Off topic] If you favor readability over a good score in the metrics (was it J. Spolsky who said "what's measured gets done"? I suppose that means metrics are abused more often than not), it is often better to use a well-named boolean to replace your complex conditional statement.
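The answer's original before/after code did not survive, so the following is my own sketch of the idea, with a hypothetical free-shipping rule and parameter names invented for illustration. Naming the parts of the condition helps readability, but the same && and || operators are still there, so a tool that counts them would report the same CC for both versions.

```csharp
using System;

// Before: the reader has to decode the whole condition inline.
static bool QualifiesBefore(decimal total, bool isFirstOrder, bool hasCoupon, string country)
{
    if ((total >= 50m || isFirstOrder) && !hasCoupon && country == "NL")
        return true;
    return false;
}

// After: each part of the condition gets a descriptive name.
static bool QualifiesAfter(decimal total, bool isFirstOrder, bool hasCoupon, string country)
{
    bool isLargeOrFirstOrder = total >= 50m || isFirstOrder;
    bool notAlreadyDiscounted = !hasCoupon;
    bool shipsDomestically = country == "NL";
    bool qualifiesForFreeShipping = isLargeOrFirstOrder && notAlreadyDiscounted && shipsDomestically;

    if (qualifiesForFreeShipping)
        return true;
    return false;
}

Console.WriteLine(QualifiesAfter(60m, false, false, "NL"));
```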
I'm no expert at this subject, but I thought I would give my two cents. And maybe that's all this is worth.
Cyclomatic Complexity seems to be just a particular automated shortcut for finding potentially (but not definitely) problematic code snippets. But isn't the real problem to be solved one of testing? How many test cases does the code require? If CC is higher, but the number of test cases is the same and the code is cleaner, don't worry about CC.
1.) There is no decision point there. There is one and only one path through the program there, only one possible result with either of the two versions. The first is more concise and better, Cyclomatic Complexity be damned.
1 test case for both
2.) In both cases, you either write "wibble" or you don't.
2 test cases for both
3.) The first one could result in nothing, "one", or "one" and "one and two": 3 paths. The second one could result in nothing, either of the two, or both of them: 4 paths.
3 test cases for the first
4 test cases for the second
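A sketch that checks these counts by running both fragments of example 3 over every combination of their two conditions and counting the distinct outputs. Both fragments contain two if statements, so both have CC 3 under decision-point counting, yet they produce 3 and 4 distinct behaviours respectively. The fragments are rewritten to return their output as a string, which is my change for the sake of comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fragment 3a, returning what it would print ("|" separates lines).
static string FirstFragment(bool condition1, bool condition2)
{
    var lines = new List<string>();
    if (condition1)
    {
        lines.Add("one");
        if (condition2)
            lines.Add("one and two");
    }
    return string.Join("|", lines);
}

// Fragment 3b, returning what it would print.
static string SecondFragment(bool condition3, bool condition4)
{
    var lines = new List<string>();
    if (condition3)
        lines.Add("fizz");
    if (condition4)
        lines.Add("buzz");
    return string.Join("|", lines);
}

var bools = new[] { false, true };
var firstOutcomes = (from a in bools from b in bools select FirstFragment(a, b)).Distinct().ToList();
var secondOutcomes = (from a in bools from b in bools select SecondFragment(a, b)).Distinct().ToList();

// First: nothing, "one", "one" + "one and two". Second: nothing, fizz, buzz, both.
Console.WriteLine($"first: {firstOutcomes.Count} distinct outcomes");
Console.WriteLine($"second: {secondOutcomes.Count} distinct outcomes");
```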