Backing up code metrics with case studies
I'm principally interested in case studies on code metrics, relating code readability to defect reduction, that justify taking seriously cyclomatic complexity or some similar metric. Wikipedia has this example:
A number of studies have investigated cyclomatic complexity's correlation to the number of defects contained in a module. Most such studies find a strong positive correlation between cyclomatic complexity and defects: modules that have the highest complexity tend to also contain the most defects. For example, a 2008 study by metric-monitoring software supplier Enerjy analyzed classes of open-source Java applications and divided them into two sets based on how commonly faults were found in them. They found strong correlation between cyclomatic complexity and their faultiness, with classes with a combined complexity of 11 having a probability of being fault-prone of just 0.28, rising to 0.98 for classes with a complexity of 74.
This is good, but I'm hoping to know if there are more studies (or perhaps similar studies for other metrics, such as SLOC).
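(For context, McCabe's metric is essentially one plus the number of decision points in a routine. Below is a minimal, illustrative counter for Python functions; it is a sketch of the idea, not any particular tool's implementation.)

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1  # a branch-free function has CC 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # each extra and/or operand adds one branch
            complexity += len(node.values) - 1
    return complexity

print(cyclomatic_complexity(
    "def sign(x):\n"
    "    if x < 0:\n"
    "        return -1\n"
    "    elif x > 0:\n"
    "        return 1\n"
    "    return 0\n"
))  # -> 3
```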
I also found an article at IBM that promotes monitoring CC values, but it lacks case-study support showing ROI figures. Then there is a Coding Horror article on "arrow code" which cites a summary of a case study, but offers neither the case study itself nor the actual numbers that justified the conclusion:
Studies show a correlation between a program's cyclomatic complexity and its error frequency. A low cyclomatic complexity contributes to a program's understandability and indicates it is amenable to modification at lower risk than a more complex program. A module's cyclomatic complexity is also a strong indicator of its testability.
Certainly cyclomatic complexity (CC) will help spot arrow code, but I still need case studies that show ROI values. For example, "Organization X enforced a maximum CC of 10 on methods/functions and reduced defects by 20% in the following development iteration."
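To make the arrow-code point concrete, here is a hypothetical Python before/after. The nested version is the arrow shape a CC threshold would flag; the guard-clause version is the usual flattening. Note that this refactoring leaves CC itself unchanged, since the decision count is the same; the metric's job is just to surface such methods for review.

```python
def dispatch(order):
    """Stand-in for the real shipping call (hypothetical)."""
    print("dispatched", order)

# "Arrow code": each condition pushes the happy path one level deeper.
def ship_order(order):
    if order is not None:
        if order["paid"]:
            if order["items"]:
                if not order["shipped"]:
                    dispatch(order)

# Flattened with guard clauses: same decisions, no nesting.
def ship_order_flat(order):
    if order is None:
        return
    if not order["paid"]:
        return
    if not order["items"]:
        return
    if order["shipped"]:
        return
    dispatch(order)

ship_order_flat({"paid": True, "items": ["book"], "shipped": False})
```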
Without that kind of data, it is difficult to get management to care. Can anyone point me to a few hard studies? Even just one would help...
3 Answers
Why is ROI so hard?
Here's why.
Individual programmer productivity varies by at least one and sometimes two orders of magnitude.
http://forums.construx.com/blogs/stevemcc/archive/2008/03/27/productivity-variations-among-software-developers-and-teams-the-origin-of-quot-10x-quot.aspx
Individual variability trumps any other effect you might be looking for. You can't do a "head-to-head", "apples-to-apples" comparison. When you compare two similar teams using different techniques (i.e., different complexity thresholds) you find that individual performance differences simply dominate the data and almost everything is noise.
If management doesn't care about quality, you have big problems. ROI numbers aren't going to influence management to change the environment.
You have to run your own experiments on your own code in your own organization.
Gather cyclomatic complexity, defect rates, problem tickets, crashes: anything you can. Try to correlate complexity with the other measures of badness (a minimal sketch of that step follows below). An argumentative manager can always win by pointing out the individual differences among members of teams.
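As a sketch of that correlation step, assuming you can export per-module complexity and defect counts from your metrics tool and ticket system (the numbers below are placeholders, not real measurements):

```python
from scipy.stats import spearmanr

# Parallel lists, one entry per module; values are illustrative only.
cc      = [3, 5, 8, 12, 15, 21, 30, 42]   # cyclomatic complexity
defects = [0, 1, 0,  2,  3,  4,  6,  9]   # defect tickets filed

# Spearman's rank correlation suits small samples and monotonic trends.
rho, p = spearmanr(cc, defects)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```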
Use real data in your real organization. That's the best you can do. And it's not "some study" or "some whitepaper"; it's your actual organization.
Here are some for object-oriented metrics:
In this case, you get a bit more from the original article than from Wikipedia's summary. The technical paper on how the data was gathered and analyzed shows a 95% confidence level in the conclusions.
You're right that this doesn't give ROI information directly. At least for this study, that would be fairly difficult -- for example, they used open-source projects for their training data, and actual costs for open-source projects are usually difficult to even estimate, much less measure. At the same time, they did use what I'd consider at least a reasonable proxy for true ROI data: they searched through the source control system for each of their "training" projects looking for check-ins that appeared to be related to fixing bugs, defects, etc. They then used a naive Bayes algorithm to find correlation between the metrics they used, and the problems that had been identified in the code. While undoubtedly open to at least some improvement, it appears to me that these results should mean at least something.
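A rough sketch of that pipeline, under the assumption that you have already labeled each class fault-prone or not from its bug-fix check-ins; the features and labels below are made up, and this is my reading of the approach, not the study's code:

```python
from sklearn.naive_bayes import GaussianNB

# One row of metrics per class: [cyclomatic complexity, SLOC] (hypothetical).
X = [[4, 120], [11, 300], [25, 800], [74, 2100], [6, 150], [40, 950]]
# 1 = fault-prone (bug-fix check-ins touched it often), 0 = not.
y = [0, 0, 1, 1, 0, 1]

model = GaussianNB().fit(X, y)
# Probability a class with CC 11 and 300 SLOC is [not fault-prone, fault-prone].
print(model.predict_proba([[11, 300]]))
```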
It's also worth noting that the same people who did the study keep a running index on a large number of open-source projects. If you wanted more in the way of solid data, you could check their index against the source control logs for some of those projects, and probably use their data to come up with more direct ROI-type results. One note, however: their index is based on quite a few source code metrics, not just cyclomatic complexity, so I'm not sure exactly how much it would tell you exclusively about CC as opposed to the other metrics they look at.