Does it make sense to write logical tests with JBehave?

Posted 2024-12-05 05:16:56

I recently came across JBehave and I think we should use it. I called in our team's tester, and he agrees.

With that as a starting point, I asked the tester to write stories for a test application (Uncle Bob's Bowling Game Kata). At the end of the day we would try to map his tests against the bowling game.

I was expecting a test like this:

Given a bowling game
When player rolls 5
And player rolls 4
Then total pins knocked down is 9

Instead, the tester came back with 'logical tests'; in other words, he was not that specific. In his terms, though, this was a valid test:

Given a bowling game
When player does a regular throw
Then score should be calculated appropriately

My problem with this is ambiguity: what is a 'regular throw'? What is 'appropriately'? What does it mean when one of those steps fails?

However, the tester says that a human does understand, and that what I was looking for were 'physical tests', which are more cumbersome to write.

I could probably map 'regular' to rolling 4 twice (no spare, no strike), but it feels like I am again doing a translation I don't want to make.

So I wonder, how do you approach this? How do you write your JBehave tests? And do you have any experience with tests written by someone else that you then have to map to your code?
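To make that translation concrete, here is a minimal sketch of what the step bindings could look like. `BowlingGame` and `BowlingSteps` are hypothetical names, the game only sums pins, and the JBehave annotations appear as comments so the sketch compiles without JBehave on the classpath. The mapping of 'regular' to two rolls of 4, and of 'appropriately' to a total of 8, is purely an assumption the binding has to invent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal game: it only sums pins, with no spare or strike bonus.
class BowlingGame {
    private final List<Integer> rolls = new ArrayList<>();

    void roll(int pins) {
        rolls.add(pins);
    }

    int totalPins() {
        int total = 0;
        for (int pins : rolls) {
            total += pins;
        }
        return total;
    }
}

// Step methods. With JBehave they would carry the annotations shown in the comments.
class BowlingSteps {
    private BowlingGame game;

    // @Given("a bowling game")
    void aBowlingGame() {
        game = new BowlingGame();
    }

    // @When("player rolls $pins")
    void playerRolls(int pins) {
        game.roll(pins);
    }

    // @Then("total pins knocked down is $total")
    void totalPinsKnockedDownIs(int total) {
        if (game.totalPins() != total) {
            throw new AssertionError("expected " + total + " but was " + game.totalPins());
        }
    }

    // @When("player does a regular throw")
    // The step text carries no data, so the binding invents it:
    // "regular" is assumed here to mean two rolls of 4.
    void playerDoesARegularThrow() {
        game.roll(4);
        game.roll(4);
    }

    // @Then("score should be calculated appropriately")
    // "Appropriately" has to be hard-coded to match the invented rolls.
    void scoreShouldBeCalculatedAppropriately() {
        totalPinsKnockedDownIs(8);
    }
}
```

The concrete scenario passes its data through the step text, while the vague one hides both the input and the expected value inside the Java code, which is exactly the translation I would like to avoid.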

若相惜即相离 2024-12-12 05:16:56

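His test is valid, but it requires a certain amount of domain knowledge, which no framework has. Automated tests should be explicit; think of them as examples. Writing them costs more than writing "logical tests", but it pays off in the long run because they can be replayed at will, run very fast, and give immediate feedback.

You should write the first tests together with him to get things moving in the right direction. Perhaps you could give him your tests and ask him to increase coverage by adding new ones.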

风渺 2024-12-12 05:16:56

The amount of explicitness needed in acceptance criteria depends on the level of trust between the development team and the business stakeholders.

In your example, the business is assuming that the developers/testers understand enough about bowling to determine the correct outcome.

But imagine a more complex domain, like finance. For that, it would probably be better to have more explicit examples to ensure a good understanding of the requirement.

Alternatively, let's say you have a scenario:

Given I try to sign up with an invalid email address
Then I should not be registered

For this, a developer/tester probably has better knowledge of what constitutes a valid or invalid email address than the business stakeholder does. You would still want to test against a variety of addresses, but those can be specified within the step definitions rather than exposed at the scenario level.
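As a rough sketch of keeping the address variety inside the step definitions: `SignUpService`, its deliberately simplistic pattern, and the list of addresses below are all hypothetical, and the JBehave annotations are shown as comments so the code compiles without the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical system under test: registers only plausible-looking addresses.
class SignUpService {
    // Deliberately simplistic pattern, for illustration only.
    private static final Pattern EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    boolean signUp(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }
}

class SignUpSteps {
    // The variety of invalid addresses lives here, not in the scenario text.
    static final List<String> INVALID_EMAILS = List.of(
            "",
            "plainaddress",
            "@no-local-part.com",
            "user@",
            "user@domain",
            "two@@example.com",
            "user name@example.com");

    private final SignUpService service = new SignUpService();
    private final List<String> wronglyRegistered = new ArrayList<>();

    // @Given("I try to sign up with an invalid email address")
    void iTryToSignUpWithAnInvalidEmailAddress() {
        wronglyRegistered.clear();
        for (String email : INVALID_EMAILS) {
            if (service.signUp(email)) {
                wronglyRegistered.add(email);
            }
        }
    }

    // @Then("I should not be registered")
    void iShouldNotBeRegistered() {
        if (!wronglyRegistered.isEmpty()) {
            throw new AssertionError("registered despite invalid email: " + wronglyRegistered);
        }
    }
}
```

The scenario stays readable for the business stakeholder, and a failure message still names the exact address that slipped through.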

长途伴 2024-12-12 05:16:56

I hate vague words such as "appropriately" in expected values. "Appropriately" is just one example of a word that is toxic for testing; if it is not eliminated, this approach can spread and effectively kill testing in general. It might be enough for a human tester, but such "test cases" are acceptable only in a first exploratory "smoke test".

If testing is to be reproducible, systematic and automatable, every test case must be specific. (Not even "should"; would the softness of "would" be allowed next? Instead I use the present tense "shall be" or, even better, a strict "is", as a claim to confirm or refute.) This rule is absolute once automation is involved.

What your tester wrote is a "test area" or "scenario template" rather than a real test case, because it can produce so many different test results.
Your scenario, by contrast, was specific: it is a real test case. It can be automated, which is good: you can hand it to a machine and evaluate it as often as you need, with the bonus of automated reports from a Continuous Integration server.

But the "empty test scenario template" has some value too: it is an empty skeleton ready to be filled with data. I like to call this "DDT", "Data Driven Testing".

Imagine a web form to be tested, with validations on its 10 inputs, cross-validations, and a submit button. There can be 10 test cases for every single input, for example:

empty;
with one character, but still too short;
too long for the server, but allowed within the form for copy-paste and further edits;
with invalid characters...

The approach I recommend is to prepare a set of to-pass data, or even to generate it (from a DB or randomly): whatever you can predict will pass the test, the "happy scenario". Keep that data aside as a data template, use it to fill in the form, and then break a single value to create a test case that must fail. Do this, say, 10 times for each of the 10 inputs (100 test cases even before trying the cross-rules). Then, after the server has refused the form 100 times, fill in the form with the undistorted to-pass data so that it is finally accepted. (An accepted submit changes the state of the server application, so it has to come last in order to run all 101 cases against the same application state.)
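A minimal sketch of deriving the to-fail cases from the to-pass template, scaled down to 3 fields and 4 broken values. The field names, the broken values, and the `serverAccepts` rule are all hypothetical stand-ins for the real form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FormCases {
    // Hypothetical to-pass template: every value is expected to be accepted.
    static Map<String, String> toPassTemplate() {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("name", "Alice");
        form.put("city", "Prague");
        form.put("zip", "12345");
        return form;
    }

    // Hypothetical broken values: empty, too short, too long, invalid character.
    static final List<String> BROKEN_VALUES =
            List.of("", "x", "x".repeat(300), "bad!char");

    // Hypothetical server rule: 2 to 100 letters or digits per field.
    static boolean serverAccepts(Map<String, String> form) {
        for (String value : form.values()) {
            if (!value.matches("[A-Za-z0-9]{2,100}")) {
                return false;
            }
        }
        return true;
    }

    // One failing case per (field, broken value): only a single value is distorted.
    static List<Map<String, String>> toFailCases() {
        List<Map<String, String>> cases = new ArrayList<>();
        for (String field : toPassTemplate().keySet()) {
            for (String broken : BROKEN_VALUES) {
                Map<String, String> form = toPassTemplate();
                form.put(field, broken);
                cases.add(form);
            }
        }
        return cases;
    }
}
```

A runner would submit every form from `toFailCases()` first and the untouched `toPassTemplate()` last, for the state-change reason given above.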

To test this way, you need two things:

the empty scenario template,
and a table of 100 rows of data:
- 10 columns of input data, with only one value changed from row to row as you go down the table (ever heard of Gray code?),
- possibly the derivation history in a row description: which row it was derived from, and via which manipulated value,
- an 11th column (or columns) for the expected result: expected pass/fail status, expected error/validation message, and a reference to the requirements for test-coverage tracking (ever seen FitNesse?),
- and possibly a column for the actual result detected when the test is run, to track the history of each single-row test case (hence the CI server mentioned above).

To combine the "empty scenario skeleton" on one side with the "data table that drives the test" on the other, some mechanism is needed, and your data has to be imported. You could prepare the rows in Excel, which could in theory be imported too, but for an easier life I recommend CSV, properties, XML, or any other textual format readable by both machines and humans.
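One possible shape for such a mechanism, as a sketch only: a small CSV table with a description column, two input columns and an expected-result column, fed row by row into a single scenario template. The column layout, the `submit` rule, and the naive comma splitting (no quoted commas) are assumptions made for illustration; JBehave itself offers parametrised scenarios with an Examples table, which may be the more natural fit in a real project.

```java
import java.util.ArrayList;
import java.util.List;

class CsvDrivenScenario {
    // Hypothetical table: description, two inputs, expected result.
    static final String CSV = String.join("\n",
            "description,name,zip,expected",
            "name empty,,12345,FAIL",
            "name too short,A,12345,FAIL",
            "zip has letters,Alice,12a45,FAIL",
            "unmodified to-pass data,Alice,12345,PASS");

    // Hypothetical validation rule standing in for the real form submit.
    static String submit(String name, String zip) {
        boolean ok = name.length() >= 2 && zip.matches("\\d{5}");
        return ok ? "PASS" : "FAIL";
    }

    // The "scenario template": run every data row, collect expected/actual mismatches.
    static List<String> run(String csv) {
        List<String> mismatches = new ArrayList<>();
        String[] lines = csv.split("\n");
        for (int i = 1; i < lines.length; i++) {      // skip the header row
            String[] cells = lines[i].split(",", -1); // -1 keeps empty cells
            String actual = submit(cells[1], cells[2]);
            if (!actual.equals(cells[3])) {
                mismatches.add(cells[0] + ": expected " + cells[3] + ", got " + actual);
            }
        }
        return mismatches;
    }
}
```

Each mismatch names its row description, so a failing report points at one concrete data row instead of at a vague "appropriately".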
