Unit testing - what are the benefits of unit tests when contracts change?

Posted on 2024-09-04 05:33:24

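Recently I had an interesting discussion with a colleague about unit tests. We were discussing how maintaining unit tests becomes less productive when your contracts change.

Perhaps someone can tell me how to approach this problem. Let me elaborate:

So let's say there is a class that does some nifty calculations. The contract says that it should calculate a number, or return -1 when it fails for some reason.

I have contract tests that test this. In all my other tests I stub this nifty calculator.

So now I change the contract: whenever it cannot calculate, it throws a CannotCalculateException.

My contract tests will fail, and I will fix them accordingly. But all my mocked/stubbed objects will still use the old contract rules. Those tests will succeed, although they should not!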
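
The situation described above can be sketched in a few lines of Java. This is only an illustration under assumptions: Calculator, Report, and the "n/a" output are hypothetical names that do not appear in the question, and CannotCalculateException is assumed here to be unchecked. The first stub still encodes the old contract, so a check against the caller keeps passing; the second stub follows the new contract and reveals that the caller never handles the exception.

```java
class StaleStubDemo {
    public static void main(String[] args) {
        // Stub written under the OLD contract: failure is signalled by -1.
        Calculator staleStub = input -> -1;
        // The caller still looks fine, so a test built on this stub keeps passing.
        System.out.println(new Report(staleStub).line(0));

        // Stub that follows the NEW contract: failure is signalled by an exception.
        Calculator currentStub = input -> { throw new CannotCalculateException(); };
        try {
            new Report(currentStub).line(0);
        } catch (CannotCalculateException e) {
            System.out.println("Report does not handle the new contract");
        }
    }
}

// Assumption: unchecked. The question does not say whether it is checked.
class CannotCalculateException extends RuntimeException {}

// Hypothetical contract of the nifty calculator.
interface Calculator {
    int calculate(int input);
}

// Hypothetical caller that was written against the old "-1 means failure" rule.
class Report {
    private final Calculator calculator;

    Report(Calculator calculator) {
        this.calculator = calculator;
    }

    String line(int input) {
        int result = calculator.calculate(input);
        return result == -1 ? "n/a" : "result=" + result;
    }
}
```

Nothing in the compiler or in the test runner ties the stale stub to the contract, which is why the mismatch stays invisible until the real calculator is wired in.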

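The question that arises is how much faith one can place in unit tests when changes like this happen... The unit tests succeed, but bugs will appear when testing the application. The tests that use this calculator will need to be fixed, which costs time, and the calculator may even be stubbed/mocked many times over...

How do you see this case? I had never thought about it thoroughly. In my opinion, such changes to unit tests are acceptable. If I did not use unit tests, I would also see such bugs surface during the test phase (found by testers). Yet I am not confident enough to say which will cost more (or less) time.

Any thoughts?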

赠我空喜 2024-09-11 05:33:24

The first issue you raise is the so-called "fragile test" problem. You make a change to your application, and hundreds of tests break because of that change. When this happens, you have a design problem. Your tests have been designed to be fragile. They have not been sufficiently decoupled from the production code. The solution is (as it is in all software problems like this) to find an abstraction that decouples the tests from the production code in such a way that the volatility of the production code is hidden from the tests.

Some simple things that cause this kind of fragility are:

Testing for strings that are displayed. Such strings are volatile because their grammar or spelling may change at the whim of an analyst.
Testing for discrete values (e.g. 3) that should be encoded behind an abstraction (e.g. FULL_TIME).
Calling the same API from many tests. You should wrap the API call in a test function so that when the API changes you can make the change in one place.
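
The last two points can be sketched as follows. This is a minimal illustration, not the answerer's code: Payroll, computePay, payFor, PART_TIME, and the pay rates are hypothetical; only the constant FULL_TIME with the value 3 is taken from the list above. Every check goes through one helper, so a change to the production signature is absorbed in a single place, and the checks name FULL_TIME instead of repeating the literal 3.

```java
class WrappedApiCallDemo {
    // The only place in the test code that knows how the production API is called.
    // If computePay gains a parameter or moves to another class, only this changes.
    static int payFor(int employmentType, int hours) {
        return new Payroll().computePay(employmentType, hours);
    }

    static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("check failed");
        }
    }

    public static void main(String[] args) {
        // The checks refer to the abstraction, never to the raw value 3.
        check(payFor(Payroll.FULL_TIME, 40) == 4000);
        check(payFor(Payroll.PART_TIME, 10) == 500);
        System.out.println("all checks passed");
    }
}

// Hypothetical production class, used only for illustration.
class Payroll {
    static final int FULL_TIME = 3;
    static final int PART_TIME = 1; // hypothetical second value

    int computePay(int employmentType, int hours) {
        int hourlyRate = (employmentType == FULL_TIME) ? 100 : 50; // made-up rates
        return hourlyRate * hours;
    }
}
```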

Test design is an important issue that is often neglected by TDD beginners. This often results in fragile tests, which then leads the novices to reject TDD as "unproductive".

The second issue you raised was false positives. You have used so many mocks that none of your tests actually test the integrated system. While testing independent units is a good thing, it is also important to test partial and whole integrations of the system. TDD is not just about unit tests.

Tests should be arranged as follows:

Unit tests provide close to 100% code coverage. They test independent units. They are written by programmers using the programming language of the system.
Component tests cover ~50% of the system. They are written by business analysts and QA. They are written in a language like FitNesse, Selenium, Cucumber, etc. They test whole components, not individual units. They test primarily happy path cases and some highly visible unhappy path cases.
Integration tests cover ~20% of the system. They test small assemblies of components as opposed to the whole system. Also written in FitNesse/Selenium/Cucumber etc. Written by architects.
System tests cover ~10% of the system. They test the whole system integrated together. Again they are written in FitNesse/Selenium/Cucumber etc. Written by architects.
Exploratory manual tests. (See James Bach) These tests are manual but not scripted. They employ human ingenuity and creativity.

忆沫 2024-09-11 05:33:24

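Fixing unit tests that fail because of deliberate code changes is better than having no tests to catch the bugs that those changes eventually introduce.

When your codebase has good unit test coverage, you may run into many unit test failures that are not due to bugs in the code but to deliberate contract changes or refactoring.

However, that unit test coverage will also give you the confidence to refactor the code and implement any contract changes. Some tests will fail and need to be fixed, but other tests will eventually fail because of bugs that these changes introduced.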

稀香 2024-09-11 05:33:24

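Even in the ideal case of 100% code/functional coverage, unit tests surely cannot catch all bugs. I do not think that should be expected.

If the tested contract changes, I (the developer) should use my brain to update all code (including test code!) accordingly. If I fail to update some mocks, which therefore still produce the old behaviour, that is my fault and not the fault of the unit tests.

It is similar to the case where I fix a bug and write a unit test for it, but fail to think through (and test) all similar cases, some of which later turn out to be buggy as well.

So yes, unit tests need maintenance just as much as the production code itself. Without maintenance, they decay and rot.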

贪了杯 2024-09-11 05:33:24

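I have similar experience with unit tests: when you change the contract of one class, you often need to change loads of other tests as well (which will actually pass in many cases, and that makes it even harder). That is why I always use higher-level tests too:

Acceptance tests - test a couple of classes or more. These tests are usually aligned with the user stories that need to be implemented, so you test that the user story "works". They do not need to connect to a database or other external systems, but they may.
Integration tests - mainly check external system connectivity and the like.
Full end-to-end tests - test the whole system.

Please note that even with 100% unit test coverage, you are not even guaranteed that your application starts! That is why you need higher-level tests. There are so many different layers of tests because the lower the level at which you test something, the cheaper it usually is (in terms of development, maintaining the test infrastructure, and execution time).

As a side note: because of the problem you mentioned, using unit tests teaches you to keep your components as decoupled as possible and their contracts as small as possible, which is definitely a good practice!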

赠我空喜 2024-09-11 05:33:24

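One of the rules for unit test code (and all other code used for testing) is to treat it the same way as production code: no more, no less, just the same.

My understanding of this is that (besides keeping it relevant, refactored, working, etc. like production code) it should also be looked at from the investment/cost perspective.

Probably your testing strategy should include something to address the problem you described in the initial post: something that specifies which test code (including stubs/mocks) should be reviewed (executed, inspected, modified, fixed, etc.) when a designer changes a function/method in production code. The cost of any production code change must therefore include the cost of doing this. If it does not, the test code will become a "third-class citizen", and the designers' confidence in the unit test suite, as well as its relevance, will decrease... Obviously, the ROI lies in the timing of bug discovery and fixing.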

狼亦尘 2024-09-11 05:33:24

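One principle that I rely on here is removing duplication. I generally do not have many different fakes or mocks implementing this contract (partly because I use more fakes than mocks). When I change the contract, it is natural to inspect every implementation of that contract, whether production code or test. It bugs me when I find myself making this kind of change, and perhaps my abstractions should have been better thought out, but if the test code is too onerous to change for the scale of the contract change, then I have to ask myself whether it is also due for some refactoring.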
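
One way to picture this is a single shared fake plus one contract check that runs against every implementation, production or test double. The sketch below is an assumption-laden illustration and not the answerer's code: Calculator, RealCalculator, FakeCalculator, and honoursFailureContract are hypothetical names, CannotCalculateException is assumed to be unchecked, and the input 0 is assumed to be the case that cannot be calculated.

```java
class SharedFakeDemo {
    // One contract check, applied to every implementation of the contract.
    static boolean honoursFailureContract(Calculator calculator) {
        try {
            calculator.calculate(0); // 0 is the assumed "cannot calculate" input
            return false;            // returning a value (such as -1) breaks the new contract
        } catch (CannotCalculateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Calculator[] implementations = { new RealCalculator(), new FakeCalculator() };
        for (Calculator c : implementations) {
            System.out.println(c.getClass().getSimpleName()
                    + " honours the contract: " + honoursFailureContract(c));
        }
    }
}

// Assumption: unchecked.
class CannotCalculateException extends RuntimeException {}

interface Calculator {
    int calculate(int input);
}

// Hypothetical production implementation.
class RealCalculator implements Calculator {
    public int calculate(int input) {
        if (input == 0) {
            throw new CannotCalculateException();
        }
        return 100 / input; // made-up calculation
    }
}

// The single fake shared by all other tests, instead of one ad hoc stub per test.
class FakeCalculator implements Calculator {
    public int calculate(int input) {
        if (input == 0) {
            throw new CannotCalculateException();
        }
        return 42; // canned answer
    }
}
```

With only one fake to maintain, a contract change means updating one test double, and the shared check reports it if that update is forgotten.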

若水微香 2024-09-11 05:33:24

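The way I look at it, when your contract changes, you should treat it as a new contract. Therefore, you should create a whole new set of unit tests for this "new" contract. The fact that you have an existing set of test cases is beside the point.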

过气美图社 2024-09-11 05:33:24

I second uncle Bob's opinion that the problem is in the design. I would additionally go back one step and check the design of your contracts.

In short

instead of saying "return -1 for x==0" or "throw CannotCalculateException for x==y", underspecify niftyCalcuatorThingy(x,y) with the precondition x!=y && x!=0 in appropriate situations (see below). Thus your stubs may behave arbitrarily for these cases, your unit tests must reflect that, and you have maximal modularity, i.e. the liberty to arbitrarily change the behavior of your system under test for all underspecified cases - without the need to change contracts or tests.

Underspecification where appropriate

You can differentiate your statement "-1 when it fails for some reason" according to the following criteria: Is the scenario

1) an exceptional behavior that the implementation can check?
2) within the method's domain/responsibility?
3) an exception that the caller (or someone earlier in the call stack) can recover from/handle in some other way?

If and only if 1) to 3) hold, specify the scenario in the contract (e.g. that EmptyStackException is thrown when calling pop() on an empty stack).

Without 1), the implementation cannot guarantee a specific behavior in the exceptional case. For instance, Object.equals() does not specify any behavior when the condition of reflexivity, symmetry, transitivity & consistency is not met.

Without 2), SingleResponsibilityPrinciple is not met, modularity is broken and users/readers of the code get confused. For instance, Graph transform(Graph original) should not specify that MissingResourceException might be thrown because deep down, some cloning via serialization is done.

Without 3), the caller cannot make use of the specified behavior (certain return value/exception). For instance, if the JVM throws an UnknownError.

Pros and Cons

If you do specify cases where 1), 2) or 3) does not hold, you get some difficulties:

a main purpose of a (design by) contract is modularity. This is best achievable if you really separate the responsibilities: When the precondition (the responsibility of the caller) is not met, not specifying the behavior of the implementation leads to maximal modularity - as your example shows.
you don't have any liberty to change the behavior in the future, not even to a more general functionality of the method which throws exceptions in fewer cases
exceptional behaviors can become quite complex, so the contracts covering them become complex, error prone and hard to understand. For instance: is every situation covered? Which behavior is correct if multiple exceptional preconditions hold?

The downside of underspecification is that (testing) robustness, i.e. the implementation's ability to react appropriately to abnormal conditions, is harder.

As a compromise, I like to use the following contract schema where possible:

<(Semi-)formal PRE- and POST-condition, including exceptional
behavior where 1) to 3) hold>
If PRE is not met, the current implementation throws the RTE A, B or
C.
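
Applied to the calculator from the question, the schema might look like the Javadoc below. This is a sketch under assumptions, not the answerer's code: NiftyCalculator, the formula, and the choice of IllegalArgumentException as the runtime exception are all hypothetical; only the precondition x != y && x != 0 comes from the text above.

```java
class ContractSchemaDemo {
    public static void main(String[] args) {
        // Within PRE the result is specified, so callers and tests may rely on it.
        System.out.println(NiftyCalculator.calculate(6, 2));

        // Outside PRE nothing is promised; this only shows what happens today.
        try {
            NiftyCalculator.calculate(3, 3);
        } catch (IllegalArgumentException e) {
            System.out.println("current behavior outside PRE: " + e.getMessage());
        }
    }
}

class NiftyCalculator {
    /**
     * PRE:  x != y && x != 0
     * POST: returns 100 / x + 100 / (x - y), using integer division
     *       (hypothetical formula, chosen only so that PRE is meaningful).
     *
     * If PRE is not met, the current implementation throws
     * IllegalArgumentException. This note is informational and not part of
     * the contract: stubs and future versions may behave differently.
     */
    static int calculate(int x, int y) {
        if (x == y || x == 0) {
            throw new IllegalArgumentException("precondition x != y && x != 0 violated");
        }
        return 100 / x + 100 / (x - y);
    }
}
```

Unit tests written against this contract would assert results only for inputs that satisfy PRE, which is what leaves room to change the behavior outside PRE later without touching contracts or tests.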
