Unit testing - benefits of unit tests when contracts change?
Recently I had an interesting discussion with a colleague about unit tests. We were discussing the point at which maintaining unit tests becomes less productive: when your contracts change.
Perhaps someone can enlighten me on how to approach this problem. Let me elaborate:
So let's say there is a class that does some nifty calculations. The contract says that it should calculate a number, or return -1 when it fails for some reason.
I have contract tests that test that. And in all my other tests I stub this nifty calculator thingy.
So now I change the contract: whenever it cannot calculate, it will throw a CannotCalculateException.
My contract tests will fail, and I will fix them accordingly. But all my mocked/stubbed objects will still use the old contract rules. These tests will succeed, while they should not!
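To make the scenario concrete, here is a minimal sketch of how a stale stub can keep a test green. Only CannotCalculateException comes from the description above; the interface, the doubling formula, the ReportService collaborator and all other names are assumptions made up for illustration.

```java
// Single-file sketch; every name except CannotCalculateException is hypothetical.
class StaleStubDemo {

    static class CannotCalculateException extends RuntimeException {
        CannotCalculateException(String message) { super(message); }
    }

    interface NiftyCalculator {
        // New contract: returns the result, or throws CannotCalculateException on failure.
        int calculate(int input);
    }

    // Production implementation, already updated to the new contract.
    static class RealNiftyCalculator implements NiftyCalculator {
        @Override
        public int calculate(int input) {
            if (input < 0) {
                throw new CannotCalculateException("negative input: " + input);
            }
            return input * 2; // made-up calculation
        }
    }

    // Stale hand-written stub: still signals failure with -1, as the old contract did.
    static class StaleFailingStub implements NiftyCalculator {
        @Override
        public int calculate(int input) {
            return -1;
        }
    }

    // A collaborator that was written against the old contract and never updated.
    static class ReportService {
        private final NiftyCalculator calculator;

        ReportService(NiftyCalculator calculator) { this.calculator = calculator; }

        String describe(int input) {
            int result = calculator.calculate(input);
            return result == -1 ? "n/a" : String.valueOf(result);
        }
    }

    public static void main(String[] args) {
        // Against the stale stub the old failure path still appears to work.
        System.out.println(new ReportService(new StaleFailingStub()).describe(-5));

        // Against the real calculator the same call escapes with the new exception.
        try {
            new ReportService(new RealNiftyCalculator()).describe(-5);
        } catch (CannotCalculateException e) {
            System.out.println("unhandled in ReportService: " + e.getMessage());
        }
    }
}
```

A test of ReportService that uses StaleFailingStub keeps passing, while the same call wired to RealNiftyCalculator fails. This is the gap described above.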
The question that arises is how much faith can be placed in unit tests when such changes are made... The unit tests succeed, but bugs will occur when testing the application. The tests using this calculator will need to be fixed, which costs time, and the calculator may be stubbed/mocked in a lot of places...
What do you think about this case? I never thought about it thoroughly. In my opinion, these changes to unit tests would be acceptable. If I did not use unit tests, I would also see such bugs arise in the test phase (found by testers). Yet I am not confident enough to say which will cost more time (or less).
Any thoughts?
The first issue you raise is the so-called "fragile test" problem. You make a change to your application, and hundreds of tests break because of that change. When this happens, you have a design problem. Your tests have been designed to be fragile. They have not been sufficiently decoupled from the production code. The solution is (as it is in all software problems like this) to find an abstraction that decouples the tests from the production code in such a way that the volatility of the production code is hidden from the tests.
Some simple things that cause this kind of fragility are:
Test design is an important issue that is often neglected by TDD beginners. This often results in fragile tests, which then leads the novices to reject TDD as "unproductive".
The second issue you raised was false positives. You have used so many mocks that none of your tests actually test the integrated system. While testing independent units is a good thing, it is also important to test partial and whole integrations of the system. TDD is not just about unit tests.
Tests should be arranged as follows:
It's better to have to fix unit tests that fail due to intentional code changes than to have no tests to catch the bugs that are eventually introduced by these changes.
When your codebase has good unit test coverage, you may run into many unit test failures that are not due to bugs in the code but to intentional changes to the contracts or to code refactoring.
However, that unit test coverage will also give you the confidence to refactor the code and implement any contract changes. Some tests will fail and will need to be fixed, but other tests will eventually fail due to bugs that you introduced with these changes.
Unit tests surely cannot catch all bugs, even in the ideal case of 100% code / functionality coverage. I think that is not to be expected.
If the tested contract changes, I (the developer) should use my brain to update all code (including test code!) accordingly. If I fail to update some mocks, which therefore still produce the old behaviour, that is my fault, not the fault of the unit tests.
It is similar to the case where I fix a bug and write a unit test for it, but fail to think through (and test) all similar cases, some of which later turn out to be buggy as well.
So yes, unit tests need maintenance just as much as the production code itself. Without maintenance, they decay and rot.
I have similar experiences with unit tests: when you change the contract of one class, you often need to change loads of other tests as well (which will actually pass in many cases, which makes it even more difficult). That is why I always use higher-level tests as well:
Please note that even if you have 100% unit test coverage, you are not even guaranteed that your application starts! That is why you need higher-level tests. There are so many different layers of tests because the lower the level at which you test something, the cheaper it usually is (in terms of development, maintaining test infrastructure, and execution time).
As a side note: because of the problem you mentioned, using unit tests teaches you to keep your components as decoupled as possible and their contracts as small as possible, which is definitely a good practice!
One of the rules for unit test code (and all other code used for testing) is to treat it the same way as production code - no more, no less - just the same.
My understanding of this is that (besides keeping it relevant, refactored, working etc., like production code) it should be looked at in the same way from the investment/cost perspective as well.
Probably your testing strategy should include something to address the problem you have described in the initial post: something along the lines of specifying what test code (including stubs/mocks) should be reviewed (executed, inspected, modified, fixed etc.) when a designer changes a function/method in production code. Therefore the cost of any production code change must include the cost of doing this. If it does not, the test code will become a "third-class citizen", and the designers' confidence in the unit test suite, as well as its relevance, will decrease... Obviously, the ROI is in the timing of bug discovery and fixing.
One principle that I rely on here is removing duplication. I generally don't have many different fakes or mocks implementing this contract (I use more fakes than mocks partly for this reason). When I change the contract, it is natural to inspect every implementation of that contract, whether production code or test. It bugs me when I find I'm making this kind of change (perhaps my abstractions should have been better thought out), but if the test code is too onerous to change for the scale of the contract change, then I have to ask myself whether the tests are also due some refactoring.
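As a rough sketch of this idea, a single shared fake can be checked by the same contract test as the production class. The contract check itself is my assumption about one way to do this, not something stated above, and all names other than CannotCalculateException are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Single-file sketch; every name except CannotCalculateException is hypothetical.
class SharedFakeSketch {

    static class CannotCalculateException extends RuntimeException {
        CannotCalculateException(String message) { super(message); }
    }

    interface NiftyCalculator {
        // Contract: returns the result, or throws CannotCalculateException on failure.
        int calculate(int input);
    }

    // Production implementation (the doubling is a made-up calculation).
    static class RealNiftyCalculator implements NiftyCalculator {
        @Override
        public int calculate(int input) {
            if (input < 0) {
                throw new CannotCalculateException("negative input: " + input);
            }
            return input * 2;
        }
    }

    // The one fake that all other tests share, instead of many ad-hoc stubs.
    static class FakeNiftyCalculator implements NiftyCalculator {
        @Override
        public int calculate(int input) {
            if (input < 0) {
                throw new CannotCalculateException("fake: cannot calculate " + input);
            }
            return 42; // canned value; callers should not depend on the exact number
        }
    }

    // Contract check for the failure rule, written once and run against every implementation.
    static boolean honoursFailureContract(NiftyCalculator calculator) {
        try {
            calculator.calculate(-1);
            return false; // returned a value where the contract demands an exception
        } catch (CannotCalculateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<NiftyCalculator> implementations =
                Arrays.asList(new RealNiftyCalculator(), new FakeNiftyCalculator());
        for (NiftyCalculator calculator : implementations) {
            System.out.println(calculator.getClass().getSimpleName()
                    + " honours failure contract: " + honoursFailureContract(calculator));
        }
    }
}
```

With only one fake, a contract change means updating two implementations and one contract check. A fake that still returned -1 would fail honoursFailureContract instead of silently passing.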
I look at it this way: when your contract changes, you should treat it like a new contract. Therefore, you should create a whole new set of unit tests for this "new" contract. The fact that you have an existing set of test cases is beside the point.
I second Uncle Bob's opinion that the problem is in the design. I would additionally go back one step and check the design of your contracts.
In short
Instead of saying "return -1 for x==0" or "throw CannotCalculateException for x==y", underspecify niftyCalcuatorThingy(x,y) with the precondition x!=y && x!=0 in appropriate situations (see below). Thus your stubs may behave arbitrarily for these cases, your unit tests must reflect that, and you have maximal modularity, i.e. the liberty to arbitrarily change the behavior of your system under test for all underspecified cases - without the need to change contracts or tests.
Underspecification where appropriate
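A minimal sketch of such a precondition-style contract follows, under assumptions of my own: the formula, the stub's canned value, and the idea that the caller checks the precondition before calling are illustrative and not part of the original design. Only the method name and the precondition x!=y && x!=0 are taken from the text.

```java
// Single-file sketch; the formula and all class names are hypothetical.
class PreconditionSketch {

    interface NiftyCalculator {
        /**
         * Precondition: x != y && x != 0.
         * Behaviour outside the precondition is deliberately left unspecified.
         */
        int niftyCalcuatorThingy(int x, int y);
    }

    // One possible implementation; it divides by x and by (x - y),
    // so it is only meaningful inside the precondition.
    static class RealCalculator implements NiftyCalculator {
        @Override
        public int niftyCalcuatorThingy(int x, int y) {
            return 100 / x + 100 / (x - y);
        }
    }

    // A stub only has to be plausible inside the precondition; elsewhere anything goes.
    static class StubCalculator implements NiftyCalculator {
        @Override
        public int niftyCalcuatorThingy(int x, int y) {
            return 7;
        }
    }

    // Assumption: the caller establishes the precondition itself, so it never
    // depends on a particular failure signal (-1 or an exception).
    static String describe(NiftyCalculator calculator, int x, int y) {
        if (x == y || x == 0) {
            return "n/a";
        }
        return String.valueOf(calculator.niftyCalcuatorThingy(x, y));
    }

    public static void main(String[] args) {
        System.out.println(describe(new RealCalculator(), 2, 1)); // inside the precondition
        System.out.println(describe(new RealCalculator(), 0, 1)); // outside: calculator is not called
        System.out.println(describe(new StubCalculator(), 3, 3)); // outside: stub is not called either
    }
}
```

Because no test exercises the calculator outside its precondition, switching the out-of-contract behaviour from -1 to an exception would not invalidate the stub or the tests of describe.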
You can differentiate your statement "-1 when it fails for some reason" according to the following criteria: Is the scenario
If and only if 1) to 3) hold, specify the scenario in the contract (e.g. that EmptyStackException is thrown when calling pop() on an empty stack).
Without 1), the implementation cannot guarantee a specific behavior in the exceptional case. For instance, Object.equals() does not specify any behavior when the condition of reflexivity, symmetry, transitivity & consistency is not met.
Without 2), SingleResponsibilityPrinciple is not met, modularity is broken and users/readers of the code get confused. For instance, Graph transform(Graph original) should not specify that MissingResourceException might be thrown because deep down, some cloning via serialization is done.
Without 3), the caller cannot make use of the specified behavior (certain return value/exception). For instance, if the JVM throws an UnknownError.
Pros and Cons
If you do specify cases where 1), 2) or 3) does not hold, you get some difficulties:
The downside of underspecification is that testing robustness, i.e. the implementation's ability to react appropriately to abnormal conditions, becomes harder.
As a compromise, I like to use the following contract schema where possible: