How much testing is enough?
I recently spent about 70% of the time it took to code a feature writing integration tests. At one point, I was thinking “Damn, all this hard work testing it, I know I don’t have bugs here, why do I work so hard on this? Let’s just skimp on the tests and finish it already…”
Five minutes later a test fails. Detailed inspection shows it’s an important, unknown bug in a 3rd party library we’re using.
So … where do you draw your line between what to test and what to take on faith? Do you test everything, or only the code where you expect most of the bugs?
Good question!
Firstly - it sounds like your extensive integration testing paid off :)
From my personal experience:
I like to enforce strict unit testing and have a thorough (as thorough as possible) integration test plan designed.
If I am working on a code base that has poor test coverage, then I prefer to design a set of integration tests that test specific/known functionality. I then introduce tests (unit/integration) as I progress further with the code base.
How much is enough? Tough question - I don't think there can ever be enough!
"Too much of everything is just enough."
I don't follow strict TDD practices. I try to write enough unit tests to cover all code paths and exercise any edge cases I think are important. Basically I try to anticipate what might go wrong. I also try to match the amount of test code I write to how brittle or important I think the code under test is.
I am strict in one area: if a bug is found, I first write a test that exercises the bug and fails, make the code changes, and verify that the test passes.
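A minimal sketch of that bug-first workflow, in Python. The function `parse_price` and the bug itself (prices with a thousands separator were rejected) are hypothetical, made up only to illustrate the order of steps: write the failing test, change the code, watch the test pass, and keep the test.

```python
def parse_price(text: str) -> float:
    """Parse a price string such as "$1,234.50" into a float.

    Hypothetical example. The earlier (buggy) version passed the text to
    float() without removing the thousands separator, so "1,234.50" raised
    ValueError. The .replace(",", "") call below is the fix.
    """
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)


def test_parse_price_handles_thousands_separator():
    # Written first, while the bug was still present, and seen to fail.
    # After the fix it stays in the suite as a regression test.
    assert parse_price("$1,234.50") == 1234.50


def test_parse_price_plain_number_still_works():
    # Guards the behaviour that already worked before the fix.
    assert parse_price("19.99") == 19.99
```

The test functions use plain `assert`, so they can be collected by a runner such as pytest or simply called by hand.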
Gerald Weinberg's classic book "The Psychology of Computer Programming" has lots of good stories about testing. One I especially like is in Chapter 4, "Programming as a Social Activity": "Bill" asks a co-worker to review his code and they find seventeen bugs in only thirteen statements. Code reviews provide additional eyes to help find bugs; the more eyes you use, the better chance you have of finding ever-so-subtle bugs. As Linus's Law puts it, "Given enough eyeballs, all bugs are shallow." Your tests are basically robotic eyes that will look over your code as many times as you want, at any hour of the day or night, and let you know if everything is still kosher.
How many tests are enough depends on whether you are developing from scratch or maintaining an existing system.
When starting from scratch, you don't want to spend all your time writing tests and end up failing to deliver because only the 10% of the features you were able to code are exhaustively tested. There will be some amount of prioritization to do. One example is private methods. Since private methods must be used by code that is visible in some form (public/package/protected), private methods can be considered covered by the tests for the more-visible methods. Where the private code has important or obscure behaviors or edge cases, that is where you need to include some white-box tests.
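To make the point about private methods concrete, here is a small Python sketch. The `Inventory` class and its `_normalize` helper are hypothetical; Python only marks a method private by naming convention, but the idea carries over to Java-style visibility. The first test exercises the helper only through public methods, and the second is the kind of white-box test you might add for an obscure edge case.

```python
class Inventory:
    """Hypothetical class with one private helper."""

    def __init__(self):
        self._stock = {}

    def add(self, name: str, quantity: int) -> None:
        key = self._normalize(name)
        self._stock[key] = self._stock.get(key, 0) + quantity

    def count(self, name: str) -> int:
        return self._stock.get(self._normalize(name), 0)

    def _normalize(self, name: str) -> str:
        # Private helper: collapses runs of whitespace and ignores case.
        return " ".join(name.split()).lower()


def test_public_methods_cover_the_private_helper():
    # Black-box test: only public methods are called, yet _normalize runs.
    inventory = Inventory()
    inventory.add("Red  Widget", 2)
    inventory.add("red widget", 3)
    assert inventory.count("RED WIDGET") == 5


def test_normalize_handles_tabs_and_newlines():
    # White-box test for an edge case that is awkward to reach indirectly.
    assert Inventory()._normalize("\tRed\n Widget ") == "red widget"
```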
Tests should help you make sure you 1) understand the requirements, 2) adhere to good design practices by coding for testability, and 3) know when previously existing code stops working. If you can't describe a test for some feature, I would be willing to bet that you don't understand the feature well enough to code it cleanly. Writing code that can be unit tested forces you to do things like pass in important collaborators, such as database connections or instance factories, as arguments instead of giving in to the temptation of letting the class do way too much by itself and turn into a 'God' object. Letting your tests be your canary means that you are free to write more code. When a previously passing test fails, it means one of two things: either the code no longer does what was expected, or the requirements for the feature have changed and the test simply needs to be updated to fit the new requirements.
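Passing the database connection in as an argument might look something like the sketch below. Everything here is an assumption for illustration: `UserReport`, `FakeConnection`, and the `fetch_all` method are invented names, not a real database API. The point is only that a class which receives its connection can be tested with a stand-in.

```python
class FakeConnection:
    """Stand-in for a real database connection (hypothetical interface)."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = []  # records what the code under test asked for

    def fetch_all(self, query: str):
        self.queries.append(query)
        return list(self._rows)


class UserReport:
    def __init__(self, connection):
        # The connection is passed in rather than created here, so a test
        # can supply a fake and the class stays small and focused.
        self._connection = connection

    def active_user_names(self):
        rows = self._connection.fetch_all("SELECT name, active FROM users")
        return sorted(name for name, active in rows if active)


def test_active_user_names_uses_injected_connection():
    fake = FakeConnection([("zoe", True), ("bob", False), ("amy", True)])
    report = UserReport(fake)
    assert report.active_user_names() == ["amy", "zoe"]
    assert fake.queries == ["SELECT name, active FROM users"]
```

In production code the same constructor would receive a real connection object that offers the same method.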
When working with existing code, you should be able to show that all the known scenarios are covered, so that when the next change request or bug fix comes along, you will be free to dig into whatever module you see fit without the nagging worry, "what if I break something?", which leads to spending more time testing even small fixes than it took to actually change the code.
So, we can't give you a hard and fast number of tests, but you should shoot for a level of coverage that increases your confidence in your ability to keep making changes or adding features. Beyond that, you've probably reached the point of diminishing returns.
If you or your team have been tracking metrics, you can see how many bugs each round of testing finds as the software life cycle progresses. If you've defined an acceptable threshold, the point where the time spent testing no longer justifies the number of bugs found, then THAT is where you should stop.
You will probably never find 100% of your bugs.
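One way such a threshold could be checked, as a rough Python sketch. The metric (bugs found per hour of testing in the latest cycle) and the function name `should_stop_testing` are my assumptions; a team would substitute whatever numbers it actually tracks.

```python
def should_stop_testing(cycles, min_bugs_per_hour):
    """Return True when the latest test cycle fell below the agreed threshold.

    cycles: list of (hours_spent, bugs_found) tuples, oldest first.
    min_bugs_per_hour: the threshold the team agreed on (assumed metric).
    """
    if not cycles:
        raise ValueError("need at least one test cycle")
    hours_spent, bugs_found = cycles[-1]
    if hours_spent <= 0:
        raise ValueError("hours_spent must be positive")
    return bugs_found / hours_spent < min_bugs_per_hour


# Illustrative numbers only: three cycles of 10 hours each.
history = [(10, 8), (10, 3), (10, 1)]
stop_now = should_stop_testing(history, min_bugs_per_hour=0.2)
```

With these made-up figures the last cycle found 0.1 bugs per hour, which is under the 0.2 threshold, so the check says to stop.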
I spend a lot of time on unit tests, but very little on integration tests. Unit tests allow me to build out a feature in a structured way. You also end up with some nice documentation and regression tests that can be run on every build.
Integration tests are a different matter. They are difficult to maintain and by definition integrate a lot of different pieces of functionality, often with infrastructure that is difficult to work with.
As with everything in life it is limited by time and resources and relative to its importance. Ideally you would test everything that you reasonably think could break. Of course you can be wrong in your estimate, but overtesting to ensure that your assumptions are right depends on how significant a bug would be vs. the need to move on to the next feature/release/project.
Note: My answer primarily addresses integration testing. TDD is very different. It was covered on SO before, and there you stop testing when you have no more functionality to add. TDD is about design, not bug discovery.
I prefer to unit test as much as possible. One of the greatest side-effects (other than increasing the quality of your code and helping keep some bugs away) is that, in my opinion, high unit test expectations require one to change the way they write code for the better. At least, that's how it worked out for me.
My classes are more cohesive, easier to read, and much more flexible because they're designed to be functional and testable.
That said, I default to a unit test coverage requirement of 90% (line and branch), measured using JUnit and Cobertura (for Java). When I feel that this requirement cannot be met due to the nature of a specific class (or bugs in Cobertura), I make exceptions.
Unit tests start with coverage, and really work for you when you've used them to test boundary conditions realistically. For advice on how to implement that goal, the other answers all have it right.
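A small Python sketch of the difference between covering code and testing its boundaries. The `ticket_price` function and its age bands are hypothetical. The first test already executes every return statement, yet only the second would catch an off-by-one such as writing `age < 12` instead of `age <= 12`.

```python
def ticket_price(age: int) -> float:
    """Hypothetical pricing rule: child up to 12, adult 13 to 64, senior 65+."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 12:
        return 5.0
    if age < 65:
        return 10.0
    return 7.0


def test_one_value_per_band():
    # Executes every return statement, so coverage looks good.
    assert ticket_price(8) == 5.0
    assert ticket_price(30) == 10.0
    assert ticket_price(70) == 7.0


def test_boundaries():
    # Checks the values on either side of each band edge.
    cases = {0: 5.0, 12: 5.0, 13: 10.0, 64: 10.0, 65: 7.0}
    for age, expected in cases.items():
        assert ticket_price(age) == expected, age
    try:
        ticket_price(-1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a negative age")
```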
This article gives some very interesting insights on the effectiveness of user testing with different numbers of users. It suggests that you can find about two thirds of your errors with only three users testing the application, and as much as 85% of your errors with just five users.
Unit testing is harder to put a discrete value on. One suggestion to keep in mind is that unit testing can help to organize your thoughts on how to develop the code you're testing. Once you've written the requirements for a piece of code and have a way to check it reliably, you can write it more quickly and reliably.
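The user-testing figures quoted above line up with a simple model that is often used for this: if each tester independently finds a fixed share L of the problems, then n testers find 1 - (1 - L)^n of them. I am assuming the article relies on something like this; the default of L = 0.31 below is the value commonly quoted for that model, not a number taken from this thread.

```python
def share_of_problems_found(users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of problems found by `users` independent testers.

    per_user_rate is the share a single tester finds (assumed value).
    """
    if users < 0:
        raise ValueError("users cannot be negative")
    if not 0.0 <= per_user_rate <= 1.0:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_user_rate) ** users


three_users = share_of_problems_found(3)  # roughly 0.67, about two thirds
five_users = share_of_problems_found(5)   # roughly 0.84, close to 85%
```

The curve flattens quickly, which is the diminishing-returns argument in numeric form: each extra tester mostly rediscovers problems that earlier testers already found.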
I test Everything. I hate it, but it's an important part of my work.
I worked in QA for 1.5 years before becoming a developer.
You can never test everything (during training I was told that testing all the permutations of a single text box would take longer than the age of the known universe).
As a developer it's not your responsibility to know or state the priorities of what is important to test and what not to test. Testing and quality of the final product is a responsibility, but only the client can meaningfully state the priorities of features, unless they have explicitly given this responsibility to you. If there isn't a QA team and you don't know, ask the project manager to find out and prioritise.
Testing is a risk reduction exercise, and the client/user will know what is important and what isn't. Using test-first development from Extreme Programming will be helpful, so you have a good test base and can regression test after a change.
It's important to note that, as if by natural selection, code can become "immune" to tests. Code Complete says that when fixing a defect you should write a test case for it and look for similar defects. It's also a good idea to write test cases for those similar defects.
In my opinion, it's important to be pragmatic when it comes to testing. Focus your testing efforts on the things that are most likely to fail, and/or the things that matter most if they do fail (i.e. take probability and consequence into consideration).
Think, instead of blindly following one metric such as code coverage.
Stop when you are comfortable with the test suite and your code. Go back and add more tests when (if?) things start failing.
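Weighing probability against consequence can be as simple as the Python sketch below. The `Area` class, the 1 to 5 consequence scale, and the example numbers are all assumptions for illustration; the estimates are rough guesses by design and are meant to support thinking, not replace it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    name: str
    failure_probability: float  # rough estimate between 0.0 and 1.0
    consequence: int            # rough cost of failure, 1 (minor) to 5 (severe)

    @property
    def risk(self) -> float:
        # Simple product of likelihood and impact.
        return self.failure_probability * self.consequence


def prioritize(areas):
    """Return the areas ordered from highest to lowest risk."""
    return sorted(areas, key=lambda area: area.risk, reverse=True)


# Illustrative guesses only.
areas = [
    Area("payment processing", 0.2, 5),
    Area("report formatting", 0.5, 1),
    Area("third-party import", 0.6, 4),
]
testing_order = [area.name for area in prioritize(areas)]
```

Here the third-party import comes first, then payment processing, then report formatting, so the testing effort would go in that order.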
When you're no longer afraid to make medium to major changes in your code, then chances are you've got enough tests.