Does it make sense to write logical tests with JBehave?
I've encountered JBehave recently and I think we should use it. So I called in our team's tester, and he also thinks we should use it.
With that as a starting point, I asked the tester to write stories for a test application (Uncle Bob's Bowling Game Kata). At the end of the day we would try to map his tests against the bowling game.
I was expecting a test like this:
Given a bowling game
When player rolls 5
And player rolls 4
Then total pins knocked down is 9
Instead, the tester came with 'logical tests', in other words he was not being that specific. But, in his terms this was a valid test.
Given a bowling game
When player does a regular throw
Then score should be calculated appropriately
My problem with this is ambiguity: what is a 'regular throw'? What is 'appropriately'? What will it mean when one of those steps fails?
However, the tester says that a human does understand it, and that what I was looking for were 'physical tests', which are more cumbersome to write.
I could probably map 'regular' to rolling 4 twice (still no spare and no strike), but it feels like I am again doing a translation I don't want to make.
So I wonder, how do you approach this? How do you write your JBehave tests? And do you have any experience when it is not you who writes these tests, and you have to map them to your code?
Comments (5)
His test is valid, but it requires a certain knowledge of the domain, which no framework will have. Automated tests should be explicit; think of them as examples. Writing them costs more than writing "logical tests", but this pays off in the long run, since they can be replayed at will, run very quickly, and give immediate feedback.
You should have paired with him when writing the first tests, to point him in the right direction. Perhaps you could give him your test and ask him to increase the coverage by adding new tests.
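As a rough illustration of what "explicit" buys you, here is a minimal sketch of step methods for the concrete scenario from the question. The class and method names (`BowlingGame`, `BowlingSteps`, and so on) are hypothetical and not part of the kata or of JBehave. To keep the sketch runnable without JBehave on the classpath, the annotations are shown only as comments; with JBehave they would be the `@Given`, `@When` and `@Then` annotations from `org.jbehave.core.annotations`, with `$pins` and `$expected` as step parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the kata's game object; it only tracks
// what this one scenario needs (no frames, spares or strikes).
class BowlingGame {
    private final List<Integer> rolls = new ArrayList<>();

    public void roll(int pins) {
        if (pins < 0 || pins > 10) {
            throw new IllegalArgumentException("pins must be between 0 and 10: " + pins);
        }
        rolls.add(pins);
    }

    public int totalPinsKnockedDown() {
        int total = 0;
        for (int pins : rolls) {
            total += pins;
        }
        return total;
    }
}

// Step class: each method corresponds to one line of the scenario text.
class BowlingSteps {
    private BowlingGame game;

    // With JBehave: @Given("a bowling game")
    public void givenABowlingGame() {
        game = new BowlingGame();
    }

    // With JBehave: @When("player rolls $pins")
    // "And player rolls 4" reuses this step, since And inherits the previous step type.
    public void whenPlayerRolls(int pins) {
        game.roll(pins);
    }

    // With JBehave: @Then("total pins knocked down is $expected")
    public void thenTotalPinsKnockedDownIs(int expected) {
        int actual = game.totalPinsKnockedDown();
        if (actual != expected) {
            throw new AssertionError("expected " + expected + " pins but was " + actual);
        }
    }
}
```

Every step here maps to code without interpretation. A step such as "player does a regular throw" would force whoever writes the step class to decide what "regular" means, which is exactly the translation the question wants to avoid.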
The amount of explicitness needed in acceptance criteria depends on the level of trust between the development team and the business stakeholders.
In your example, the business is assuming that the developers/testers understand enough about bowling to determine the correct outcome.
But imagine a more complex domain, like finance. For that, it would probably be better to have more explicit examples to ensure a good understanding of the requirement.
Alternatively, let's say you have a scenario:
For this, a developer/tester probably has better knowledge of what constitutes a valid or invalid email address than the business stakeholder does. You would still want to test against a variety of addresses, but that can be specified within the step definitions, rather than exposing it at the scenario level.
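A minimal sketch of that idea, under stated assumptions: the step wording, the `EmailSteps` class, the sample addresses and the deliberately simplistic `isValid` check are all hypothetical, and stand in for whatever the real system under test does. The JBehave annotation is shown only as a comment so the sketch compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class EmailSteps {
    // Hypothetical, intentionally simple validator standing in for the real system.
    private static final Pattern SIMPLE_EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // The variety of addresses lives in the step definition,
    // not in the scenario text the business stakeholder reads.
    static final List<String> INVALID_ADDRESSES = Arrays.asList(
            "", "no-at-sign", "two@@example.com", "missing@tld", "space in@example.com");

    static final List<String> VALID_ADDRESSES = Arrays.asList(
            "alice@example.com", "bob.smith@mail.example.org");

    public static boolean isValid(String address) {
        return SIMPLE_EMAIL.matcher(address).matches();
    }

    // With JBehave (hypothetical wording): @Then("every invalid email address is rejected")
    public void thenEveryInvalidAddressIsRejected() {
        for (String address : INVALID_ADDRESSES) {
            if (isValid(address)) {
                throw new AssertionError("accepted invalid address: '" + address + "'");
            }
        }
    }

    // With JBehave (hypothetical wording): @Then("every valid email address is accepted")
    public void thenEveryValidAddressIsAccepted() {
        for (String address : VALID_ADDRESSES) {
            if (!isValid(address)) {
                throw new AssertionError("rejected valid address: '" + address + "'");
            }
        }
    }
}
```

The scenario stays short and readable, while the step still runs against concrete values, so a failure reports exactly which address was handled wrongly.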
I hate vague words such as "appropriately" in the "expected values". "Appropriately" is just one example of a "toxic word" for testing, and if it is not eliminated, this "approach" can spread and effectively kill the testing in general. It might "be enough" for a human tester, but such "test cases" are acceptable only in first attempts at an exploratory "smoke test".
To be reproducible, systematic and automatable, every test case must be specific. (Not just "should": why allow the softness of "would"? Instead I use "shall be" or, even better, the strict present tense "is", as a claim to confirm or refute.) And this rule is absolute once it comes to automation.
What your tester made was a "test area", a "scenario template", rather than a real test case, because so many possible test results can be produced from it...
You were specific in your scenario: that was a very specific, real "test case". Your test case can be automated, which is nice: you can delegate it to a machine and evaluate it as often as you need, automatically (with the bonus of an automated report from a Continuous Integration server).
But the "empty test scenario template"? It has some value too: it is a "scenario template", an empty skeleton prepared to be filled with data. So I love to name these situations "DDT": "Data Driven Testing".
Imagine a web-form to be tested, with validations on its 10 inputs, with cross-validations... And the submit button. There can be 10 test-cases for every single input:
The approach I recommend is to prepare a set of to-pass data, or even to generate it (from a DB or even randomly): whatever you can predict shall pass the test, the "happy scenario". Keep the data aside as a data template, use it to initialize and fill in the form, and then break a single value: create test cases "to fail". Do it, for example, 10 times for every single input, for each of the 10 inputs (100 test cases even before the cross-validation rules are attempted)... and then, after the server has refused the form 100 times, fill in the form with the to-pass data, without distorting it, so the form can finally be accepted. (An accepted submit changes the state of the server app, so it needs to go last, in order to test all 101 cases against the same app state.)
To do your test this way, you need two things:
To combine the "empty scenario skeleton" on one side with the "data table to drive the test" on the other side, some mechanism is indeed needed, and your data need to be imported. You could prepare the rows in Excel, which could theoretically be imported too, but for an easier life I recommend CSV, properties, XML, or any other textual format that is both machine- and human-readable.
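The procedure described above could be sketched roughly as follows. Everything in it is an assumption made for illustration: the three form fields, the `accepts` rules standing in for the server-side validation, and the inline CSV rows (one `field,brokenValue` pair per line) that would normally live in a separate file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FormDataDrivenTest {
    // Hypothetical server-side validation: true when the form is accepted.
    static boolean accepts(Map<String, String> form) {
        String name = form.get("name");
        String age = form.get("age");
        String email = form.get("email");
        return name != null && !name.trim().isEmpty()
                && age != null && age.matches("\\d{1,3}")
                && email != null && email.matches("[^@\\s]+@[^@\\s]+");
    }

    // The to-pass data template (the "happy scenario").
    static Map<String, String> validTemplate() {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("name", "Alice");
        form.put("age", "42");
        form.put("email", "alice@example.com");
        return form;
    }

    // Data table driving the "to fail" cases: field,brokenValue per row.
    static final String BROKEN_VALUES_CSV =
            "name,\n"
            + "name,   \n"
            + "age,-1\n"
            + "age,forty\n"
            + "age,1000\n"
            + "email,alice\n"
            + "email,a b@example.com\n";

    // Runs every broken case first and the unmodified template last;
    // returns a description of each case that behaved unexpectedly.
    static List<String> run() {
        List<String> failures = new ArrayList<>();
        for (String row : BROKEN_VALUES_CSV.split("\n")) {
            String[] columns = row.split(",", 2); // limit 2 keeps an empty broken value
            Map<String, String> form = validTemplate();
            form.put(columns[0], columns[1]);     // break exactly one field
            if (accepts(form)) {
                failures.add("accepted " + columns[0] + "='" + columns[1] + "'");
            }
        }
        if (!accepts(validTemplate())) {
            failures.add("rejected the unmodified to-pass template");
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = run();
        System.out.println(failures.isEmpty() ? "all cases behaved as expected" : failures.toString());
    }
}
```

The scenario skeleton is the loop body, and the rows are the data; adding a test case means adding a row, not writing a new scenario.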
His 'logical test' has the same information content as the phrase 'test regular bowling score' in a test plan or TODO list. But it is considerably longer, and therefore worse.
Using JBehave at all only makes sense if the test team is responsible for writing tests that carry more information than that. Otherwise, it would be more efficient to take the TODO list and code it up in JUnit.
And I love words like "appropriately" in the "expected values". You should use Cucumber or other such wrappers as generic documentation. If you're using them to cover and specify all possible scenarios, you're probably wasting a lot of your time scrolling through hundreds of feature files.