Check out Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you what complex code isn't currently tested.
Generally speaking, from the several engineering excellence best practices papers that I have read, 80% coverage for new code in unit tests is the point that yields the best return. Going above that coverage percentage prevents fewer defects for the amount of effort exerted. This is a best practice used by many major corporations.
Unfortunately, most of these results are internal to the companies, so there is no public literature I can point you to.
I don't think there can be such a black-and-white rule. Code should be reviewed, with particular attention to the critical details. However, if it hasn't been tested, it has a bug!
In my opinion, the answer is "It depends on how much time you have". I try to achieve 100% but I don't make a fuss if I don't get it with the time I have.
When I write unit tests, I wear a different hat compared to the hat I wear when developing production code. I think about what the tested code claims to do and what situations could possibly break it.
I usually follow these criteria or rules:
The unit test should be a form of documentation of the expected behavior of my code, i.e. the expected output given a certain input and the exceptions it may throw that clients may want to catch (what should the users of my code know?).
The unit test should help me discover the what-if conditions that I may not yet have thought of (how do I make my code stable and robust?).
If these two rules don't produce 100% coverage, then so be it. But once I have the time, I analyze the uncovered blocks and lines and determine whether there are still test cases without unit tests or whether the code needs to be refactored to eliminate the unnecessary code.
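To make the first rule concrete, here is a minimal sketch of a test serving as documentation; the parse_rate function and its behaviour are hypothetical, invented purely for illustration:

```python
import pytest

from myproject.rates import parse_rate  # hypothetical function under test


def test_parse_rate_returns_fraction():
    # Documents the expected output for a given input.
    assert parse_rate("80%") == 0.8


def test_parse_rate_rejects_garbage():
    # Documents the exception that callers may want to catch.
    with pytest.raises(ValueError):
        parse_rate("not-a-number")
```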
Viewing coverage from another perspective: well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy. By writing code with clarity and coverability in mind, and by writing the unit tests in parallel with the code, you get the best results IMHO.
Code coverage is great but only as long as the benefits that you get from it outweigh the cost/effort of achieving it.
We had been working to a standard of 80% for some time; however, we have just made the decision to abandon this and instead be more focused in our testing, concentrating on the complex business logic, etc.
This decision was taken due to the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had got to the point where the benefit we were getting from our code coverage was less than the effort we had to put in to achieve it.
From the Testivus posting, I think the relevant context for the answer is the second programmer.
Having said this, from a practical point of view we need parameters/goals to strive for.
I consider that this can be "tested" in an Agile process by analyzing the code we have, the architecture, and the functionality (user stories), and then coming up with a number. Based on my experience in the telecom area, I would say that 60% is a good value to check.
We were targeting >80% until a few days back, but after we started using a lot of generated code we stopped caring about the percentage; instead, we have the reviewer make a call on the coverage required.
I think the best sign of the right amount of code coverage is that the number of concrete problems the unit tests help to fix corresponds reasonably to the size of the unit test code you created.
This has to be dependent on what phase of your application development lifecycle you are in.
If you've been at development for a while, already have a lot of implemented code, and are just now realizing that you need to think about code coverage, then you have to check your current coverage (if it exists) and use that baseline to set milestones for each sprint (or an average rise over a period of sprints). That means taking on code debt while continuing to deliver end-user value (at least in my experience, the end user doesn't care one bit whether you've increased test coverage if they don't see new features).
Depending on your domain it's not unreasonable to shoot for 95%, but I'd have to say that on average you're going to be looking at 85% to 90%.
Depending on the criticality of the code, anywhere from 75%-85% is a good rule of thumb. Shipping code should definitely be tested more thoroughly than in-house utilities, etc.
I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.
That aside, the answer depends on your methodology, language and testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to and miss coverage regressions due to your constant background of uncovered code.
The answer also depends on the history of your project. I've only found the above to be practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing so, but I've never found it practical to go back and fill every coverage gap, because old untested code is not well understood enough to do so correctly and quickly.
Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overrated). Your goal should therefore not be to achieve 100% code coverage but rather to ensure that you test all relevant scenarios of your application.
I think that what may matter most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend upon your analysis of the reason.
If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.
This number should only include code written in the project (excluding frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of calls to outside code. That sort of call should be mocked/stubbed.
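As a hedged illustration of what such stubbing might look like in Python (the payments module and its gateway are hypothetical, purely for the example):

```python
from unittest.mock import patch

import payments  # hypothetical project module wrapping an external payment gateway


def test_charge_customer_delegates_to_gateway():
    # Stub the outside call so that only code written in the project is exercised.
    with patch("payments.gateway.charge", return_value="ok") as fake_charge:
        assert payments.charge_customer("cust-42", 1000) == "ok"
        fake_charge.assert_called_once_with("cust-42", 1000)
```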
Short answer: 60-80%
Long answer: I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.
I use Cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up to date. At a minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie the Ant build-failure property to this task. If the build fails because of lack of coverage, you know someone has added code but hasn't tested it. Example:
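A sketch of what that Ant configuration might look like (the threshold values are illustrative, and the exact attributes may vary with your Cobertura version):

```xml
<!-- Fail the build when coverage drops below the recorded floor (values are illustrative). -->
<cobertura-check datafile="cobertura.ser"
                 totallinerate="80"
                 totalbranchrate="75"
                 haltonfailure="false"
                 failureproperty="coverage.failed"/>

<fail if="coverage.failed"
      message="Coverage fell below the agreed minimum: code was added without tests."/>
```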
My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.
My current practice in Python is to divide my .py modules into two folders, app1/ and app2/, and when running unit tests to calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.
When/if I find that these numbers differ from the standard, I investigate and alter the design of the code so that coverage conforms to the standard.
This does mean that I can recommend achieving 100% line coverage of library code.
I also occasionally review app2/ to see if I could possibly test any code there, and if I can, I move it into app1/.
Now I'm not too worried about the aggregate coverage because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.
With Python, I should be able to devise a smoke test that automatically runs my app while measuring coverage, and hopefully reach an aggregate of 100% when combining the smoke test with the unittest figures.
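A minimal sketch of how that visual check could be automated with coverage.py (it assumes the tests live under tests/ and the folders are named as above):

```python
import unittest

import coverage

# Measure both folders via "source" so files that are never imported still appear (as 0%).
cov = coverage.Coverage(source=["app1", "app2"])
cov.start()
suite = unittest.TestLoader().discover("tests")
unittest.TextTestRunner().run(suite)
cov.stop()

# report() returns the total covered percentage for the included files.
app1_total = cov.report(include=["app1/*"])
app2_total = cov.report(include=["app2/*"])

assert app1_total == 100.0, "testable code (app1/) should be fully covered"
assert app2_total == 0.0, "untestable code (app2/) should not be reached by unit tests"
```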
Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.
I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.
When to set code coverage requirements
First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence into your process. What do I mean by "empirical confidence"? Well, the real goal is correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but it is still a subjective standard: it will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.
Code coverage is an objective measurement: once you see your coverage report, there is no ambiguity about whether the standard has been met. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of the immeasurable qualities we care about.
Some specific cases where having an empirical standard could add value:
To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality but may not be involved in the day-to-day development of the software (managers, technical leads, etc.). Saying "we're going to write all the tests we really need" is not convincing: they either need to trust you entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so). Providing measurable standards and explaining how they reasonably approximate the actual goals is better.
To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity for what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.
To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.
Which metrics to use
Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.
I'll use two common metrics as examples of when you might use them to set standards:
Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested?
This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.
Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested?
This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.
There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage are similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter).
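To make the statement-versus-branch distinction concrete, here is a small hypothetical illustration in which a single test yields 100% statement coverage but leaves one branch unevaluated:

```python
def apply_discount(price, is_member):
    discount = 0
    if is_member:
        discount = 10
    return price - discount


def test_member_gets_discount():
    # Executes every statement above: 100% statement coverage.
    assert apply_discount(100, True) == 90


def test_non_member_pays_full_price():
    # Without this test, the implicit "else" side of the if is never taken,
    # so branch coverage stays below 100% even though statement coverage is full.
    assert apply_discount(100, False) == 100
```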
What percentage to require
Finally, back to the original question: If you set code coverage standards, what should that number be?
Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.
Some numbers that one might choose:
100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.) Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested.
Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.
99% (or 95%, other numbers in the high nineties.) Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.
80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.
I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)
Other notes
The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.
My favorite code coverage is 100% with an asterisk. The asterisk comes because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines which "count", I am done.
The underlying process is:
I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
I run the code coverage tools
I examine any lines or paths not covered; any that I consider unimportant or unreachable (due to defensive programming) I mark as not counting.
I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.
This way if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important - the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.
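How lines are marked as "don't count" depends on the tool; with Python's coverage.py, for example, the default exclusion comment looks like this (the function itself is just a hypothetical example):

```python
def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # pragma: no cover - defensive branch we have chosen not to count
        return ""
```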
Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came up with myself, which were not discussed during the meetings).
I don't care if I have code that is not covered by tests, but I would care if I refactored my code and ended up with different behaviour. Therefore, 100% functionality coverage is my only target.
Early one morning, a programmer asked the great master:
“I am ready to write some unit tests. What code coverage should I aim for?”
The great master replied:
“Don’t worry about coverage, just write some good tests.”
The programmer smiled, bowed, and left.
...
Later that day, a second programmer asked the same question.
The great master pointed at a pot of boiling water and said:
“How many grains of rice should I put in that pot?”
The programmer, looking puzzled, replied:
“How can I possibly tell you? It depends on how many people you need to feed, how hungry they are, what other food you are serving, how much rice you have available, and so on.”
“Exactly,” said the great master.
The second programmer smiled, bowed, and left.
...
Toward the end of the day, a third programmer came and asked the same question about code coverage.
“Eighty percent and no less!” replied the master in a stern voice, pounding his fist on the table.
The third programmer smiled, bowed, and left.
...
After this last reply, a young apprentice approached the great master:
“Great master, today I overheard you answer the same question about code coverage with three different answers. Why?”
The great master stood up from his chair:
“Come get some fresh tea with me and let’s talk about it.”
After they filled their cups with smoking hot green tea, the great master began to answer:
“The first programmer is new and just getting started with testing. Right now he has a lot of code and no tests. He has a long way to go; focusing on code coverage at this time would be depressing and quite useless. He’s better off just getting used to writing and running some tests. He can worry about coverage later.”
“The second programmer, on the other hand, is quite experienced at both programming and testing. When I replied by asking her how many grains of rice I should put in a pot, I helped her realize that the amount of testing necessary depends on a number of factors, and she knows those factors better than I do – it’s her code after all. There is no single, simple answer, and she’s smart enough to handle the truth and work with that.”
“I see,” said the young apprentice, “but if there is no single simple answer, then why did you answer the third programmer ‘Eighty percent and no less’?”
The great master laughed so hard and loud that his belly, evidence that he drank more than just green tea, flopped up and down.
“The third programmer wants only simple answers – even when there are no simple answers … and then does not follow them anyway.”
The young apprentice and the grizzled great master finished drinking their tea in contemplative silence.
Many shops don't value tests, so if you are above zero at least there is some appreciation of their worth - so arguably non-zero isn't bad, as many are still at zero.
In the .Net world people often quote 80% as reasonable. But they say this at the solution level. I prefer to measure at the project level: 30% might be fine for a UI project if you've got Selenium or manual tests, 20% might be fine for the data layer project, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.
I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.
Bottom line: Apply the 80:20 rule, and let your app's bug count guide you.
For a well-designed system, where unit tests have driven the development from the start, I would say 85% is quite a low number. Small classes designed to be testable should not be hard to cover better than that.
It's easy to dismiss this question with something like:
Covered lines do not equal tested logic and one should not read too much into the percentage.
True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful when used correctly. Having said that, I have not seen all systems, and I'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look very different, and the scope of the available test framework can vary.
Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing, the shortest feedback loop is quite flexible, covering everything from class tests to inter-process signalling. Testing a deliverable sub-product typically takes 5 minutes, and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.
When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:
Low Water Mark (LWM), the lowest number of uncovered lines ever seen in the system under test
High Water Mark (HWM), the highest code coverage percentage ever seen for the system under test
New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how I say should and not must (explained below).
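As a rough sketch of how such a gate might be wired into the integration step (the marks file and the way the current numbers are obtained are assumptions, not part of any particular tool):

```python
import json


def check_coverage_gate(uncovered_lines: int, coverage_pct: float,
                        marks_file: str = "coverage_marks.json") -> bool:
    """Reject the commit if coverage regressed; ratchet the marks if it improved."""
    try:
        with open(marks_file) as f:
            marks = json.load(f)
    except FileNotFoundError:
        marks = {"lwm_uncovered_lines": uncovered_lines, "hwm_coverage_pct": coverage_pct}

    if uncovered_lines > marks["lwm_uncovered_lines"]:
        return False  # more uncovered lines than ever before: new code is not covered
    if coverage_pct < marks["hwm_coverage_pct"]:
        return False  # overall coverage decreased

    # Ratchet: record the new best values so the bar only ever moves up.
    marks["lwm_uncovered_lines"] = min(marks["lwm_uncovered_lines"], uncovered_lines)
    marks["hwm_coverage_pct"] = max(marks["hwm_coverage_pct"], coverage_pct)
    with open(marks_file, "w") as f:
        json.dump(marks, f)
    return True
```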
But doesn't this mean that it will be impossible to clean away old, well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience is that these metrics are quite useful. They have the following two implications.
Testable code is promoted. When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.
Test coverage for legacy code is increasing over time. When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.
And again, if the feedback loop is too long it might be completely unpractical to setup something like this in the integration process.
I would also like to mention two more general benefits of the code coverage metric.
Code coverage analysis is part of dynamic code analysis (as opposed to static analysis, i.e. Lint). Problems found during dynamic code analysis (by tools such as the Purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is hardest to cover in a test case usually handles the abnormal cases in the system, but if you want the system to fail gracefully (i.e. produce an error trace instead of crashing) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.
People take pride in keeping 100% coverage for new code, and people discuss testing problems with a similar passion to other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case? And so on.
And a negative, for completeness.
In a large project with many developers involved, not everyone is going to be a test genius, for sure. Some people tend to use the code coverage metric as proof that the code is tested, and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above, a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.
Code Coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).
You could get 100% by hitting all the lines once. However, you could still miss testing a particular sequence (logical path) in which those lines are hit.
You could fall short of 100% but still have tested all of your 80%/frequently used code paths. Having tests for every 'throw ExceptionTypeX' or similar defensive programming guard you've put in is a 'nice to have', not a 'must have'.
So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code you should get 90%+ coverage as a bonus. Use code coverage to highlight chunks of code you have missed (it shouldn't happen if you TDD, though, since you write code only to make a test pass; no code can exist without its partner test).
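To illustrate the point about missed logical paths, a hypothetical sketch in which every line (and even every branch) is executed yet one path is never tested:

```python
def normalize(order):
    if "discount" in order:
        order["total"] -= order["discount"]
    if "tax" in order:
        order["total"] += order["tax"]
    return order


def test_discount_only():
    assert normalize({"total": 100, "discount": 10})["total"] == 90


def test_tax_only():
    assert normalize({"total": 100, "tax": 5})["total"] == 105

# Together these two tests execute every line and take every branch,
# yet the path where a discount and a tax apply to the same order is never exercised.
```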
If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage, and focusing more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof) there should be several key points where APIs are exposed to other code.
Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in either your code, documentation, or test cases. All of which are good to vet out.
Good luck!
It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.
85% would be a good starting place for check-in criteria.
I'd probably choose a variety of higher bars for shipping criteria - depending on the criticality of the subsystems/components being tested.
When I think my code isn't unit tested enough, and I'm not sure what to test next, I use coverage to help me decide what to test next.
If I increase coverage with a unit test, I know that unit test is worth something.
This goes for code that is not covered, 50% covered, or 97% covered.
This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):
http://www.artima.com/forums/flat.jsp?forum=106&thread=204677
I have another anecdote on test coverage I'd like to share.
We have a huge project wherein, over Twitter, I noted that, with 700 unit tests, we only have 20% code coverage.
Scott Hanselman replied with words of wisdom:
Again, it goes back to my Testivus on Code Coverage answer. How much rice should you put in the pot? It depends.