
unit-testing code-coverage code-metrics

What is a reasonable code coverage % for unit tests (and why)?

Posted on 2024-07-04 11:31:08


Comments (30)

说不完的你爱 2024-07-11 11:31:09

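Take a look at Crap4j. It is a slightly more sophisticated approach than straight code coverage: it combines code coverage measurements with complexity measurements, and then shows you which complex code is currently untested.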
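
For reference, the CRAP score that Crap4j reports is usually cited as comp^2 * (1 - cov)^3 + comp, where comp is a method's cyclomatic complexity and cov is its coverage as a fraction. A minimal Python sketch under that assumption; the method names and numbers are made up for illustration, and this is not Crap4j's own code:

```python
def crap_score(complexity: int, coverage_percent: float) -> float:
    """CRAP score as commonly cited: comp^2 * (1 - cov)^3 + comp."""
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


# Hypothetical methods: (name, cyclomatic complexity, coverage in percent)
methods = [
    ("parse_order", 12, 0.0),
    ("format_name", 2, 0.0),
    ("apply_discount", 12, 100.0),
]

# Highest score first: complex and untested code floats to the top.
ranked = sorted(methods, key=lambda m: crap_score(m[1], m[2]), reverse=True)
for name, complexity, coverage in ranked:
    print(name, crap_score(complexity, coverage))
```

Note how the simple untested method scores lower than the complex fully tested one: the score cannot drop below the complexity itself, so tests alone never make very complex code look cheap.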


神经大条 2024-07-11 11:31:09

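Generally speaking, from the several engineering-excellence best-practice papers I have read, 80% coverage of new code by unit tests is the point that yields the best return. Going above that CC% yields fewer defects for the amount of effort exerted. This is a best practice used by many major corporations.

Unfortunately, most of these results are internal to companies, so there is no public literature I can point you to.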


情域 2024-07-11 11:31:09

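I don't think there can be such a black-and-white rule.
Code should be reviewed, with particular attention to the critical details.
However, if it hasn't been tested, it has a bug!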


千纸鹤带着心事 2024-07-11 11:31:09

It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.


多像笑话 2024-07-11 11:31:09

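In my opinion, the answer is "it depends on how much time you have." I try to achieve 100%, but I don't make a fuss if I don't get there in the time I have.

When I write unit tests, I wear a different hat from the one I wear when developing production code. I think about what the tested code claims to do and which situations could break it.

I usually follow these criteria or rules:

Unit tests should be a form of documentation of my code's expected behavior, i.e. the expected output for a given input, and the exceptions it may throw that clients may want to catch. (What should the users of my code know?)
Unit tests should help me discover the what-if conditions that I may not yet have thought of. (How do I make my code stable and robust?)

If these two rules don't produce 100% coverage, so be it. But once I have the time, I analyze the uncovered blocks and lines and determine whether there are still test cases without unit tests, or whether the code needs to be refactored to eliminate unnecessary code.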


冷︶言冷语的世界 2024-07-11 11:31:09

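From another angle on coverage: well-written code with clear control flow is the easiest to cover, the easiest to read, and usually the least buggy. IMHO, you get the best results by writing code with clarity and coverability in mind, and by writing the unit tests in parallel with the code.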


策马西风 2024-07-11 11:31:09

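Code coverage is great, but only as long as the benefits you get from it outweigh the cost/effort of achieving it.

We worked to a standard of 80% for some time, but we have just decided to abandon it and instead be more focused in our testing, concentrating on the complex business logic and so on.

We made this decision because of the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had reached the point where the benefit we were getting from code coverage was less than the effort we had to put in to achieve it.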


三人与歌 2024-07-11 11:31:09

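Going by the Testivus posting, I think the context for the answer should be the second programmer.

That said, from a practical point of view, we need a parameter/goal to strive for.

I believe this can be "tested" in an agile process by analyzing the code we have for the architecture and the functionality (user stories), and then coming up with a number. Based on my experience in the telecom area, I would say 60% is a good value to check.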


◇流星雨 2024-07-11 11:31:09

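Until a few days ago our target was >80%, but after we started using a lot of generated code we stopped caring about the percentage and instead let the reviewer make the call on the coverage required.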


扎心 2024-07-11 11:31:09

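I think the best symptom of correct code coverage is that the number of concrete problems unit tests help to fix reasonably corresponds to the size of the unit test code you created.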


吲‖鸣 2024-07-11 11:31:09

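This has to depend on which phase of your application development lifecycle you are in.

If you have been developing for a while, already have a lot of implemented code, and are only now realizing that you need to think about code coverage, then you have to check your current coverage (if it exists) and use that baseline to set milestones for each sprint (or an average rise over a period of sprints). That means taking on code debt while continuing to deliver end-user value (at least in my experience, end users don't care in the least whether you have increased test coverage if they don't see new features).

Depending on your domain, it is not unreasonable to shoot for 95%, but I'd have to say that on average you are going to be looking at 85% to 90%.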


北陌 2024-07-11 11:31:09

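Depending on the criticality of the code, anywhere from 75% to 85% is a good rule of thumb.
Shipping code should definitely be tested more thoroughly than in-house utilities, etc.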


笑，眼淚并存 2024-07-11 11:31:09

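85% would be a good starting point for check-in criteria.

I would probably choose a variety of higher bars for shipping criteria, depending on the criticality of the subsystems/components being tested.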


过气美图社 2024-07-11 11:31:09

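I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.

That aside, the answer depends on your methodology, language, and testing and coverage tools. When doing TDD in Ruby or Python it is not hard to maintain 100% coverage, and it is well worth doing so. It is much easier to manage 100% coverage than 90-something percent coverage. That is, it is much easier to fill coverage gaps as they appear (and when doing TDD well, coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps you haven't gotten around to, and to miss coverage regressions because of the constant background of uncovered code.

The answer also depends on the history of your project. I have only found the above to be practical in projects managed that way from the start. I have greatly improved the coverage of large legacy projects, and it has been worth doing so, but I have never found it practical to go back and fill every coverage gap, because old untested code is not understood well enough to do so correctly and quickly.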


爱本泡沫多脆弱 2024-07-11 11:31:09

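Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overerated). Your goal should therefore not be to achieve 100% code coverage, but rather to ensure that you test all relevant scenarios of your application.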


韶华倾负 2024-07-11 11:31:09

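I think what matters most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend on your analysis of the reason.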


早茶月光 2024-07-11 11:31:09

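If you have been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I have always worked with 80%, even when new to testing.

This number should only include code written in the project (excluding frameworks, plugins, etc.), and maybe even exclude certain classes composed entirely of calls to outside code. Such calls should be mocked/stubbed.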
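
A minimal sketch of what mocking such a call can look like in Python, using the standard library's unittest.mock. The function greeting_for and the client's fetch_profile method are hypothetical names invented for illustration:

```python
from unittest.mock import Mock


def greeting_for(user_id, client):
    """Project code under test: formats data fetched through an external client."""
    profile = client.fetch_profile(user_id)
    return f"Hello, {profile['name']}!"


def test_greeting_for():
    # Stand-in for the outside code, so only our own logic is exercised.
    client = Mock()
    client.fetch_profile.return_value = {"name": "Ada"}

    assert greeting_for(42, client) == "Hello, Ada!"
    client.fetch_profile.assert_called_once_with(42)
```

The coverage number then reflects greeting_for itself, while the external client contributes nothing to it.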


秋叶绚丽 2024-07-11 11:31:09

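Short answer: 60-80%

Long answer:
I think it depends entirely on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage, based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.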


牵强ㄟ 2024-07-11 11:31:09

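When I think my code isn't unit tested enough and I'm not sure what to test next, I use coverage to help me decide.

If I increase coverage with a unit test, I know this unit test is worth something.

This goes for code that is not covered, 50% covered, or 97% covered.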


沉鱼一梦 2024-07-11 11:31:09

I use cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up-to-date. At the minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie in the Ant build failure property to this task. If the build fails because of lack of coverage, you know someone's added code but hasn't tested it. Example:

<cobertura-check linerate="0"
                 branchrate="0"
                 totallinerate="70"
                 totalbranchrate="90"
                 failureproperty="build.failed" />
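
The ratchet idea itself is independent of Cobertura and Ant. A rough Python sketch of the rule, assuming a one-point margin below current coverage; the function names and the margin are illustrative and not part of any tool:

```python
def ratchet(threshold: float, current: float, margin: float = 1.0) -> float:
    """New minimum: just below current coverage, but never lower than before."""
    return max(threshold, current - margin)


def build_passes(threshold: float, current: float) -> bool:
    """The build fails as soon as coverage drops under the stored minimum."""
    return current >= threshold


threshold = 70.0
threshold = ratchet(threshold, current=78.5)  # coverage improved, so the bar rises
print(threshold, build_passes(threshold, current=76.0))
```

Storing the raised threshold back into the build configuration after each successful build is what keeps the values up to date.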


仅此而已 2024-07-11 11:31:09

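My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.

My current practice in Python is to divide my .py modules into two folders, app1/ and app2/, and when running unit tests to calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.

When/if I find that these numbers differ from the standard, I investigate and alter the design of the code so that the coverage conforms to the standard.

This does mean that I can recommend achieving 100% line coverage of library code.

I also occasionally review app2/ to see whether I could test any code there, and if I can, I move it into app1/.

Now I'm not too worried about the aggregate coverage, because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.

With Python, I should be able to devise a smoke test which could automatically run my app while measuring coverage, and hopefully get an aggregate of 100% when combining the smoke test with the unit test figures.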
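
One way the per-folder check might be automated, sketched in Python. It assumes you can already get covered and total line counts per file out of your coverage tool; the file names and numbers below are invented:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def folder_coverage(per_file):
    """Aggregate {path: (covered_lines, total_lines)} into percent per top-level folder."""
    totals = defaultdict(lambda: [0, 0])
    for path, (covered, total) in per_file.items():
        folder = PurePosixPath(path).parts[0]
        totals[folder][0] += covered
        totals[folder][1] += total
    return {f: 100.0 * c / t for f, (c, t) in totals.items() if t}


def violations(per_file, expected):
    """Return the folders whose actual coverage differs from the expected value."""
    actual = folder_coverage(per_file)
    return {f: actual.get(f) for f, want in expected.items() if actual.get(f) != want}


# Hypothetical report: app1/ should be fully covered, app2/ not at all.
report = {
    "app1/pricing.py": (40, 40),
    "app1/rules.py": (58, 60),
    "app2/gui.py": (0, 120),
}
print(violations(report, {"app1": 100.0, "app2": 0.0}))
```

A non-empty result is the cue to investigate: either add the missing test or move the untestable code into app2/.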


莫多说 2024-07-11 11:31:08

Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.

I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.

When to set code coverage requirements

First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal is correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: It will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.

Code coverage is an objective measurement: Once you see your coverage report, there is no ambiguity about whether standards have been met. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.

Some specific cases where having an empirical standard could add value:

To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality who may not be involved in the day-to-day development of the software (managers, technical leads, etc.) Saying "we're going to write all the tests we really need" is not convincing: They either need to trust entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so.) Providing measurable standards and explaining how they reasonably approximate actual goals is better.
To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity for what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.
To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.

Which metrics to use

Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.

I'll use two common metrics as examples of when you might use them to set standards:

Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested?
- This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.
Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested?
- This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.
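
To make the difference concrete, here is a small Python sketch (the function is invented for illustration). A single test with is_member=True executes every statement, so statement coverage is 100%, yet only one of the two outcomes of the `if` is exercised, and the untested outcome hides a bug:

```python
def price_after_discount(price_cents, is_member):
    discount = None
    if is_member:
        discount = 100
    # Raises TypeError when the `if` body was skipped and discount is still None.
    return price_cents - discount


def test_member_price():
    # Executes every statement above, but never the is_member=False outcome.
    assert price_after_discount(500, True) == 400
```

A branch coverage report would flag the missing False outcome; a statement coverage report would show nothing left to do.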

There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage are similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter).

What percentage to require

Finally, back to the original question: If you set code coverage standards, what should that number be?

Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.

Some numbers that one might choose:

100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.) Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested.
- Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.
99% (or 95%, other numbers in the high nineties.) Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.
80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.

I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)

Other notes

The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.


愛上了 2024-07-11 11:31:08

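My favorite code coverage is 100% with an asterisk. The asterisk is there because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines that "count", I am done.

The underlying process is:

I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
I run the code coverage tools.
I examine any lines or paths not covered, and any that I consider unimportant or unreachable (due to defensive programming) I mark as not counting.
I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.

This way, if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important: the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.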
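
With coverage.py, for example, the default marker for lines that should not count is a `# pragma: no cover` comment. A small sketch with a function invented for illustration:

```python
def sign(n: int) -> str:
    if n > 0:
        return "positive"
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    # Defensive guard: unreachable for ints, so it is excluded from the report.
    raise AssertionError("unreachable for ints")  # pragma: no cover


def test_sign():
    assert sign(3) == "positive"
    assert sign(-3) == "negative"
    assert sign(0) == "zero"
```

With the guard excluded, these three assertions are enough for the function to report 100%, and any later drop below 100% points at code that was meant to count.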


仅此而已 2024-07-11 11:31:08

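Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features that I came up with myself and that were not discussed during the meetings).

I don't care if I have code that is not covered by tests, but I do care if I refactor my code and end up with different behavior. Therefore, 100% functionality coverage is my only target.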


苦行僧 2024-07-11 11:31:08

This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):

http://www.artima.com/forums/flat.jsp?forum=106&thread=204677

Testivus On Test Coverage
Early one morning, a programmer asked
the great master:
“I am ready to write some unit tests. What code coverage should I aim
for?”
The great master replied:
“Don’t worry about coverage, just write some good tests.”
The programmer smiled, bowed, and
left.
...
Later that day, a second programmer
asked the same question.
The great master pointed at a pot of
boiling water and said:
“How many grains of rice should I put in that pot?”
The programmer, looking puzzled,
replied:
“How can I possibly tell you? It depends on how many people you need to
feed, how hungry they are, what other
food you are serving, how much rice
you have available, and so on.”
“Exactly,” said the great master.
The second programmer smiled, bowed,
and left.
...
Toward the end of the day, a third
programmer came and asked the same
question about code coverage.
“Eighty percent and no less!” replied the master in a stern voice,
pounding his fist on the table.
The third programmer smiled, bowed,
and left.
...
After this last reply, a young
apprentice approached the great
master:
“Great master, today I overheard you answer the same question about
code coverage with three different
answers. Why?”
The great master stood up from his
chair:
“Come get some fresh tea with me and let’s talk about it.”
After they filled their cups with
smoking hot green tea, the great
master began to answer:
“The first programmer is new and just getting started with testing.
Right now he has a lot of code and no
tests. He has a long way to go;
focusing on code coverage at this time
would be depressing and quite useless.
He’s better off just getting used to
writing and running some tests. He can
worry about coverage later.”
“The second programmer, on the other hand, is quite experienced both
at programming and testing. When I
replied by asking her how many grains
of rice I should put in a pot, I
helped her realize that the amount of
testing necessary depends on a number
of factors, and she knows those
factors better than I do – it’s her
code after all. There is no single,
simple, answer, and she’s smart enough
to handle the truth and work with
that.”
“I see,” said the young apprentice,
“but if there is no single simple
answer, then why did you answer the
third programmer ‘Eighty percent and
no less’?”
The great master laughed so hard and
loud that his belly, evidence that he
drank more than just green tea,
flopped up and down.
“The third programmer wants only simple answers – even when there are
no simple answers … and then does not
follow them anyway.”
The young apprentice and the grizzled
great master finished drinking their
tea in contemplative silence.


丢了幸福的猪 2024-07-11 11:31:08

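Many shops don't value tests, so if you are above zero there is at least some appreciation of their worth. Arguably, then, non-zero isn't bad, as many are still at zero.

In the .NET world people often quote 80% as reasonable. But they say this at the solution level. I prefer to measure at the project level: 30% might be fine for a UI project if you have Selenium etc. or manual tests, 20% might be fine for a data-layer project, but 95%+ might be quite achievable for the business-rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.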
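
As a quick arithmetic check of how per-project figures roll up into a solution-level number, here is a Python sketch. The project sizes are invented; only the 30%, 20% and 95% figures come from the text:

```python
# Hypothetical (covered_lines, total_lines) for each project in one solution.
projects = {
    "ui": (900, 3000),
    "data_layer": (400, 2000),
    "business_rules": (4750, 5000),
}


def percent(covered: int, total: int) -> float:
    return 100.0 * covered / total


per_project = {name: percent(c, t) for name, (c, t) in projects.items()}

# The solution-level figure is weighted by project size, not a plain average.
overall = percent(
    sum(c for c, _ in projects.values()),
    sum(t for _, t in projects.values()),
)
print(per_project, overall)
```

With these sizes the solution-level figure comes out at 60.5%, even though the business rules sit at 95%, which is why the single number hides where the coverage actually is.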

I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.

Bottom line: apply the 80:20 rule, and let your app's bug count guide you.


预谋 2024-07-11 11:31:08

For a well-designed system, where unit tests have driven the development from the start, I would say 85% is quite a low number. Small classes designed to be testable should not be hard to cover better than that.

It's easy to dismiss this question with something like:

Covered lines do not equal tested logic and one should not read too much into the percentage.

True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful, when used correctly. Having said that, I have not seen all systems and I'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look so different and the scope of the available test framework can vary.

Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing the shortest feedback loop is quite flexible, covering everything from class tests to inter process signalling. Testing a deliverable sub-product typically takes 5 minutes and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.

When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:

Low Water Mark (LWM), the lowest number of uncovered lines ever seen in the system under test
High Water Mark (HWM), the highest code coverage percentage ever seen for the system under test

New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how I say should and not must (explained below).
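
A minimal Python sketch of such a gate, assuming the covered and total line counts come from whatever coverage tool is in use; the class and function names are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class WaterMarks:
    lwm_uncovered: int   # lowest number of uncovered lines seen so far
    hwm_percent: float   # highest coverage percentage seen so far


def try_commit(marks: WaterMarks, covered: int, total: int) -> bool:
    """Accept the commit only if neither water mark gets worse, then update both."""
    uncovered = total - covered
    percent = 100.0 * covered / total
    if uncovered > marks.lwm_uncovered or percent < marks.hwm_percent:
        return False
    marks.lwm_uncovered = uncovered
    marks.hwm_percent = percent
    return True


marks = WaterMarks(lwm_uncovered=200, hwm_percent=80.0)  # 800 of 1000 lines covered
print(try_commit(marks, covered=850, total=1050))  # 50 new lines, all covered
print(try_commit(marks, covered=850, total=1060))  # 10 more lines, none covered
print(try_commit(marks, covered=750, total=950))   # 100 well-tested lines deleted
```

Only the first commit is accepted. The second raises the uncovered-line count above the LWM, and the third lowers the percentage below the HWM even though nothing untested was added.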

But doesn't this mean that it will be impossible to clean away old well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience is that these metrics are quite useful. They have the following two implications.

Testable code is promoted.
When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.
Test coverage for legacy code is increasing over time.
When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.

And again, if the feedback loop is too long it might be completely impractical to set up something like this in the integration process.

I would also like to mention two more general benefits of the code coverage metric.

Code coverage analysis is part of the dynamic code analysis (as opposed to the static one, i.e. Lint). Problems found during the dynamic code analysis (by tools such as the purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is the hardest to cover in a test case is usually the abnormal cases in the system, but if you want the system to fail gracefully (i.e. error trace instead of crash) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.
People take pride in keeping 100% for new code, and people discuss testing problems with a similar passion as other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case, etc.

And a negative, for completeness.

In a large project with many developers involved, not everyone is going to be a test genius. Some people tend to use the code coverage metric as proof that the code is tested, and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above, a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.


家住魔仙堡 2024-07-11 11:31:08

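I'd like to share another anecdote about test coverage.

We have a huge project where, as I noted over Twitter, with 700 unit tests we only have 20% code coverage.

Scott Hanselman replied with words of wisdom:

Is it the RIGHT 20%? Is it the 20% that represents the code your users hit the most? You might add 50 more tests and only add 2%.

Again, it goes back to my Testivus on Code Coverage answer. How much rice should you put in the pot? It depends.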


趁微风不噪 2024-07-11 11:31:08

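Code coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).

You could get 100% by hitting all the lines once. However, you could still miss testing a particular sequence (logical path) in which those lines are hit.
You could fall short of 100% but still have tested all of your 80%/frequently used code paths. Having tests that exercise every "throw ExceptionTypeX" or similar defensive-programming guard you've put in is a "nice to have", not a "must have".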
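
A small Python sketch of the first point, using an invented function. The two tests together hit every line, and every outcome of both `if` statements, yet the one path that takes both `if` bodies is never run:

```python
def scale(value, reset, normalise):
    divisor = 2
    if reset:
        divisor = 0
    if normalise:
        value = value // divisor
    return value


def test_reset_only():
    assert scale(10, True, False) == 10


def test_normalise_only():
    assert scale(10, False, True) == 5
```

Calling scale(10, True, True) raises ZeroDivisionError, which no line-based report would have hinted at.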

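So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code, you should get 90%+ coverage as a bonus. Use code coverage to highlight chunks of code you have missed (this shouldn't happen if you TDD, though, since you write code only to make a test pass; no code can exist without its partner test).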


萌梦深 2024-07-11 11:31:08

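If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it is a matter of what you have time for. As a result, I recommend focusing less on a specific percentage and more on the critical areas. If your code is well written (or at least a reasonable facsimile thereof), there should be several key points where APIs are exposed to other code.

Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in your code, your documentation, or your test cases. All of which are good to vet out.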
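
In Python, one lightweight way to keep test cases and documentation matched is the standard doctest module, which runs the examples written in a docstring. A sketch with an invented API function; check_documented_examples is a hypothetical helper built on doctest's parser and runner:

```python
import doctest


def clamp(value, low, high):
    """Limit value to the closed range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


def check_documented_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted
```

If the code and the documented results ever drift apart, the failed count becomes non-zero, and you know one of the two needs fixing.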

Good luck!



About the author

时光是把杀猪刀
