Should LOC counts include tests and comments?
While LOC (lines of code) is a problematic measure of code complexity, it is the most popular one, and when used very carefully it can provide a rough estimate of at least the relative complexity of code bases (e.g. if one program is 10KLOC and another is 100KLOC, written in the same language by teams of roughly the same competence, the second program is almost certainly much more complex).
When counting lines of code, do you prefer to include comments? What about tests?
I've seen various approaches to this. Tools like cloc and sloccount let you either include or exclude comments. Other people consider comments part of the code and its complexity.
The same dilemma exists for unit tests, which can sometimes reach the size of the tested code itself, or even exceed it.
I've seen approaches all over the spectrum, from counting only "operational" non-comment, non-blank lines, to "XXX lines of tested, commented code", which is more like running "wc -l" on all code files in the project.
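To make the two ends of that spectrum concrete, here is a minimal sketch that computes both numbers for one source string. It assumes only whole-line comments that start with a single prefix (`#` by default); block comments and comment markers inside strings are deliberately ignored, so treat it as an illustration rather than a replacement for cloc or sloccount.

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Count physical lines (like `wc -l`) and non-blank, non-comment lines.

    Simplification: only whole-line comments starting with `comment_prefix`
    are recognized; block comments and markers inside strings are not.
    """
    physical = source.count("\n")  # wc -l counts newline characters
    operational = 0
    for line in source.splitlines():
        stripped = line.strip()
        # skip blank lines and whole-line comments
        if stripped and not stripped.startswith(comment_prefix):
            operational += 1
    return {"physical": physical, "operational": operational}
```

For example, `count_lines("# header\nx = 1\n\n# note\ny = x + 1\n")` returns `{"physical": 5, "operational": 2}`; pass `comment_prefix="//"` for C-family sources.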
What is your personal preference, and why?
A wise man once told me 'you get what you measure' when it comes to managing programmers.
If you rate them on their LOC output, amazingly, you tend to get a lot of lines of code.
If you rate them on the number of bugs they close out, amazingly, you get a lot of bugs fixed.
If you rate them on features added, you get a lot of features.
If you rate them on cyclomatic complexity, you get ridiculously simple functions.
Since one of the major problems with code bases these days is how quickly they grow and how hard they are to change once they've grown, I tend to shy away from using LOC as a metric at all, because it drives the wrong fundamental behavior.
That said, if you have to use it, count without comments and tests, and require a consistent coding style.
But if you really want a measure of 'code size', just tar.gz the code base. It tends to serve as a better rough estimate of 'content' than counting lines, which is susceptible to different programming styles.
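A minimal sketch of the tar.gz idea, using only Python's standard `tarfile` and `io` modules: it packs a directory into an in-memory gzip-compressed tarball and reports the compressed size in bytes. The function name `targz_size` is hypothetical, and the number includes tar header overhead, so it is only meaningful when comparing code bases measured the same way.

```python
import io
import tarfile


def targz_size(root: str) -> int:
    """Return the size in bytes of directory `root` packed as a .tar.gz in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # arcname="." keeps absolute paths out of the archive headers
        archive.add(root, arcname=".")
    # closing the TarFile flushes the gzip stream but leaves `buffer` open
    return len(buffer.getvalue())
```

Highly repetitive code (for instance, the same line pasted thousands of times) compresses to far fewer bytes than its line count suggests, which is the point of the suggestion.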
Tests and comments have to be maintained too. If you're going to use LOC as a metric (and I'm just going to assume that I can't talk you out of it), you should give all three (lines of real code, comments, tests).
The most important (and hopefully obvious) thing is that you be consistent. Don't report one project with just the lines of real code and another with all three combined. Find or create a tool that will automate this process for you and generate a report.
This way you can be sure it will
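One way such a tool could produce all three numbers is sketched below. It works on an in-memory mapping of path to source so it is easy to test; the test-file convention in `is_test_file` (`test_*.py`, `*_test.py`, or anything under a `tests/` directory) is a hypothetical rule you would replace with your project's own, and only whole-line `#` comments are recognized.

```python
from __future__ import annotations

from pathlib import PurePath


def is_test_file(path: str) -> bool:
    # Hypothetical convention: test_*.py, *_test.py, or anything under tests/
    p = PurePath(path)
    return (p.name.startswith("test_") or p.stem.endswith("_test")
            or "tests" in p.parts[:-1])


def loc_report(files: dict[str, str], comment_prefix: str = "#") -> dict[str, int]:
    """Split non-blank lines into real code, comments, and test lines."""
    report = {"code": 0, "comments": 0, "tests": 0}
    for path, source in files.items():
        test_file = is_test_file(path)
        for line in source.splitlines():
            stripped = line.strip()
            if not stripped:
                continue  # blank lines go into no bucket
            if test_file:
                report["tests"] += 1  # every non-blank line of a test file
            elif stripped.startswith(comment_prefix):
                report["comments"] += 1
            else:
                report["code"] += 1
    return report
```

Running the same function over every project, with the same conventions, gives the consistency the answer asks for.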
I personally don't feel that the LOC metric on its own is as useful as some of the other code metrics.
NDepend will give you the LOC metric, but will also give you many others, such as cyclomatic complexity. Rather than list them all, here's the link to the list.
There is also a free CodeMetric add-in for Reflector.
I'm not going to directly answer your question, for a simple reason: I hate the lines of code metric. No matter what you're trying to measure, it's very hard to do worse than LOC; pretty much any other metric you care to think of is going to be better.
In particular, you seem to want to measure the complexity of your code. Overall cyclomatic complexity (also called McCabe's complexity) is a much better metric for this.
Routines with a high cyclomatic complexity are the routines you want to focus your attention on. It's these routines that are difficult to test, rotten to the core with bugs and hard to maintain.
There are many tools that measure this sort of complexity. A quick Google search for your favourite language will find dozens of tools that do it.
Lines of Code means exactly that: no comments or empty lines are counted. And for it to be comparable with other source code (regardless of whether the metric itself is helpful), you need at least similar coding styles:
The second version does exactly the same, but has one LOC fewer. When you have a lot of nested loops, this can add up quite a bit. That is why metrics like function points were invented.
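As a hypothetical illustration of how brace placement alone moves the count, the sketch below measures two behaviorally identical C-style nested loops held as strings. `non_blank_lines` counts every non-empty line; `non_brace_lines` additionally skips lines that hold only `{` or `}`, which removes the style difference in this particular case.

```python
# Two hypothetical, behaviorally identical C-style loops in different brace styles.
BRACES_ON_OWN_LINE = """\
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < m; j++)
    {
        sum += a[i][j];
    }
}
"""

BRACES_ON_SAME_LINE = """\
for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        sum += a[i][j];
    }
}
"""


def non_blank_lines(source: str) -> int:
    """Physical LOC: every line that is not empty or whitespace-only."""
    return sum(1 for line in source.splitlines() if line.strip())


def non_brace_lines(source: str) -> int:
    """Like non_blank_lines, but skips lines holding only '{' or '}'."""
    return sum(
        1 for line in source.splitlines()
        if line.strip() and line.strip() not in {"{", "}"}
    )


for name, src in [("own line", BRACES_ON_OWN_LINE), ("same line", BRACES_ON_SAME_LINE)]:
    print(name, non_blank_lines(src), non_brace_lines(src))
# prints:
# own line 7 3
# same line 5 3
```

Each extra nesting level widens the gap by one line in this style comparison, which is the "adds up with nested loops" effect described above.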
Depends on what you are using the LOC for.
As a complexity measure - not so much. Maybe the 100KLOC are mostly code generated from a simple table, and the 10KLOC has 5KLOC of regexps.
However, I see every line of code associated with a running cost. You pay for every line as long as the program lives: it needs to be read when maintained, it might contain an error that needs to be fixed, it increases compile time, get-from-source-control and backup times, before you change or remove it you may need to find out if anyone relies on it etc. The average cost may be nanopennies per line and day, but it's stuff that adds up.
KLOC can be a first-shot indicator of how much infrastructure a project needs. In that case, I would include comments and tests - even though the running cost of a comment line is much lower than that of one of the regexps in the second project.
[edit] [someone with a similar opinion about code size]
We only use a lines of code metric for one thing - a function should contain few enough lines of code to be read without scrolling the screen. Functions bigger than that are usually hard to read, even if they have a very low cyclomatic complexity. For this use we do count whitespace and comments.
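A minimal sketch of that check for Python source, using the `lineno` and `end_lineno` attributes the standard `ast` module provides (Python 3.8+). The 40-line screen height is a hypothetical default. Because it measures the span from the `def` line to the last statement, blank lines and comments inside the body are counted, as described above; decorators and comments after the final statement are not.

```python
from __future__ import annotations

import ast

SCREEN_LINES = 40  # hypothetical: lines visible without scrolling


def too_long_functions(source: str, limit: int = SCREEN_LINES) -> list[tuple[str, int]]:
    """Return (name, length) for every function whose source span exceeds `limit`."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # span from the `def` line through the last line of the body
            length = node.end_lineno - node.lineno + 1
            if length > limit:
                results.append((node.name, length))
    return results
```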
It can also be nice to see how many lines of code you've removed during a refactor - here you only want to count actual lines of code, plus whitespace that doesn't aid readability and comments that aren't useful (the last of which can't be identified automatically).
Finally, a disclaimer - use metrics intelligently. A good use of metrics is to help answer the question 'which part of the code would benefit most from refactoring?' or 'how urgent is a code review for the latest check-in?' - a 1000-line function with a cyclomatic complexity of 50 is a flashing neon sign saying 'refactor me now'. A bad use of metrics is 'how productive is programmer X?' or 'how complicated is my software?'.
Excerpt from the article "How do you count your number of Lines Of Code (LOC)?", about the tool NDepend, which counts the logical number of lines of code in .NET programs.
How do you count your number of Lines Of Code (LOC)?
Do you count method signature declarations? Do you count lines containing only a bracket? Do you count several lines when a single method call is written across several lines because of a high number of parameters? Do you count 'namespace' and 'using namespace' declarations? Do you count interface and abstract method declarations? Do you count field assignments when fields are declared? Do you count blank lines?
Depending on the coding style of each developer, and on the language chosen (C#, VB.NET…), the measured LOC can differ significantly.
Apparently, measuring LOC by parsing source files is a complex subject. Thanks to some astute people, there is a simple way to measure exactly what is called the logical LOC. The logical LOC has two significant advantages over the physical LOC (the LOC inferred from parsing source files):
In the .NET world, the logical LOC can be computed from the PDB files, which the debugger uses to link the IL code with the source code. NDepend computes the logical LOC for a method this way: it is equal to the number of sequence points found for the method in the PDB file. A sequence point marks a spot in the IL code that corresponds to a specific location in the original source. More info about sequence points here. Note that sequence points corresponding to the C# braces '{' and '}' are not taken into account.
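Python has a loosely similar debug line table: each code object maps bytecode offsets to source lines, and `dis.findlinestarts` exposes it. The sketch below is only an analogy to NDepend's approach, not a port of it: it counts distinct source lines that carry bytecode, so comments and blank lines never count. The `def` line is excluded because newer CPython versions attach a setup instruction to it, and an expression spanning several lines may be counted as several lines.

```python
import dis
import types


def logical_loc(code: types.CodeType) -> int:
    """Count distinct source lines that have bytecode attached (def line excluded)."""
    lines = {
        lineno
        for _, lineno in dis.findlinestarts(code)
        # newer CPython versions may report None for instructions with no source line
        if lineno is not None and lineno != code.co_firstlineno
    }
    return len(lines)
```

Summing `logical_loc` over the functions of a class would give a type-level figure, mirroring the aggregation described next; nested functions have their own code objects and are not included in their parent's count.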
Obviously, the LOC for a type is the sum of its methods' LOC, the LOC for a namespace is the sum of its types' LOC, the LOC for an assembly is the sum of its namespaces' LOC, and the LOC for an application is the sum of its assemblies' LOC. Here are some observations: