How do you test a program that requires complex input data?
We have a suite of converters that take complex data and transform it. Mostly the input is EDI and the output is XML, or vice versa, although there are other formats.
There are many inter-dependencies in the data. What methods or software are available that can generate complex input data like this?
Right now we use two methods: (1) a suite of sample files that we've built up over the years, mostly from filed bugs and samples in documentation, and (2) generating pseudo-random test data. But the former only covers a fraction of the cases, and the latter has lots of compromises and only tests a subset of the fields.
Before going further down the path of implementing (reinventing?) a complex table-driven data generator, what options have you found successful?
2 Answers
Well, the answer is in your question. Short of implementing a complex table-driven data generator, you're doing things right with (1) and (2).
(1) covers the rule of "1 bug verified, 1 new test case".
And as long as the structure of the pseudo-random test data from (2) corresponds to real-life situations, it is fine.
(2) can always be improved, and it will improve mainly over time, as you think of new edge cases. The problem with random data in tests is that it can only be made so random before computing the expected output from the random input becomes so difficult that you basically have to rewrite the tested algorithm inside the test case.
So (2) will always match only a fraction of the cases. If one day it matches all of them, it will in fact be a new version of your algorithm.
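One common way around that oracle problem, where it applies, is to check a property instead of an exact expected output, e.g. that converting EDI to XML and back yields the original message, so no expected output needs to be computed at all. A minimal sketch: the generator and the two converters below are toy stand-ins I made up for illustration, not your real code.

```python
import random
import re
import unittest


def make_random_edi(rng: random.Random) -> str:
    # Hypothetical generator: builds a tiny, structurally valid-looking
    # EDI interchange from random fields; a real one would follow the spec.
    ref = f"{rng.randrange(10**9):09d}"
    qty = rng.randrange(1, 1000)
    return f"UNB+UNOA:1+SENDER+RECEIVER+{ref}'QTY+21:{qty}'UNZ+1+{ref}'"


def edi_to_xml(edi: str) -> str:
    # Toy stand-in for the real converter: one <seg> element per EDI segment.
    segments = edi.rstrip("'").split("'")
    return "<msg>" + "".join(f"<seg>{s}</seg>" for s in segments) + "</msg>"


def xml_to_edi(xml: str) -> str:
    # Toy inverse converter.
    return "".join(s + "'" for s in re.findall(r"<seg>(.*?)</seg>", xml))


class RoundTripTest(unittest.TestCase):
    def test_edi_xml_round_trip(self):
        rng = random.Random(42)  # fixed seed keeps failures reproducible
        for _ in range(100):
            edi = make_random_edi(rng)
            # The property under test: there-and-back preserves the message,
            # so no hand-computed expected output is needed.
            self.assertEqual(xml_to_edi(edi_to_xml(edi)), edi)


if __name__ == "__main__":
    unittest.main()
```

The trade-off is that a round-trip check only catches information loss, not a converter that is wrong in both directions in a consistent way, so it complements rather than replaces the sample-file tests.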
I'd advise against using random data, as it can make it difficult, if not impossible, to reproduce a reported error (I know you said 'pseudo-random', I'm just not sure what exactly you mean by that).
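If randomness stays in the picture, one mitigation is to derive every random choice from a single seed and report that seed on failure, so the exact failing input can always be regenerated. A minimal sketch with illustrative names:

```python
import os
import random


def random_order_record(rng: random.Random) -> dict:
    # Illustrative record; a real generator would cover the full field set.
    return {
        "order_id": rng.randrange(10**8),
        "quantity": rng.randrange(1, 500),
        "currency": rng.choice(["USD", "EUR", "GBP"]),
    }


def run_randomized_case():
    # Allow the seed to be pinned via an environment variable (an assumed
    # convention here) so a reported failure can be replayed exactly.
    seed = int(os.environ.get("TEST_SEED", random.randrange(2**32)))
    rng = random.Random(seed)
    record = random_order_record(rng)
    try:
        assert record["quantity"] > 0  # stand-in for the real conversion check
    except AssertionError:
        print(f"failing seed: {seed}")  # rerun with TEST_SEED={seed}
        raise
```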
Operating over entire files of data would likely be considered functional or integration testing. I would suggest taking your set of files with known bugs and translating them into unit tests, or at least doing so for any future bugs you come across. You can then extend these unit tests to cover other erroneous conditions for which you don't have any 'sample data'. This will likely be easier than coming up with a whole new data file every time you think of a condition/rule violation you want to check for, as sketched below.
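One way to wire the existing sample files into such a regression suite, sketched with pytest; the tests/regression layout, the bug-numbered file names, and the convert() entry point are all assumptions for illustration:

```python
from pathlib import Path

import pytest

from converter import convert  # hypothetical module holding the converter under test

# One input file and one expected output per known bug, e.g.
# tests/regression/bug_1234.edi next to tests/regression/bug_1234.expected.xml.
SAMPLES = sorted(Path("tests/regression").glob("bug_*.edi"))


@pytest.mark.parametrize("edi_path", SAMPLES, ids=lambda p: p.stem)
def test_known_bug_stays_fixed(edi_path: Path):
    expected_path = edi_path.with_suffix(".expected.xml")
    actual = convert(edi_path.read_text())
    assert actual == expected_path.read_text()
```

With this shape, fixing a new bug is just dropping two files into the directory; no test code changes are needed.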
Make sure the parsing of the data format is kept separate from the interpretation of the data in that format. This will make unit testing as described above much easier.
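A minimal sketch of that separation, with made-up segment names: the parser only turns raw EDI text into a neutral structure, and the interpretation rules work on that structure, so each rule can be unit tested with a hand-built segment instead of a whole sample file.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    tag: str
    elements: list[str]


def parse_edi(text: str) -> list[Segment]:
    # Parsing only: no business rules live here.
    segments = []
    for raw in filter(None, text.split("'")):
        tag, *elements = raw.split("+")
        segments.append(Segment(tag, elements))
    return segments


def check_quantity_positive(segments: list[Segment]) -> list[str]:
    # Interpretation only: operates on the parsed structure, not raw text.
    errors = []
    for seg in segments:
        if seg.tag == "QTY" and int(seg.elements[0].split(":")[-1]) <= 0:
            errors.append("QTY must be positive")
    return errors


# Unit testing the rule needs no file at all:
assert check_quantity_positive([Segment("QTY", ["21:0"])]) == ["QTY must be positive"]
```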
If you do need generated data to drive your testing further, you may want to consider getting a machine-readable description of the file format and writing a test data generator that analyzes the format and generates valid/invalid files from it. This would also allow your test data to evolve as the file formats do.
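A sketch of what such a description-driven generator could look like, assuming the format can be described as a table of fields with constraints; the schema below is made up for illustration. Valid records honor every constraint; invalid ones deliberately violate exactly one, so the converter's error handling can be exercised too.

```python
import random

SCHEMA = [
    # (field name, min length, max length, allowed characters)
    ("sender_id", 3, 10, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    ("ref_number", 9, 9, "0123456789"),
    ("currency", 3, 3, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
]


def generate_record(rng: random.Random, violate: str | None = None) -> dict:
    """Build a record from SCHEMA; if `violate` names a field, break its
    length constraint so exactly one thing is wrong with the record."""
    record = {}
    for name, lo, hi, alphabet in SCHEMA:
        length = rng.randint(lo, hi)
        if name == violate:
            length = hi + 1  # one deliberate constraint violation
        record[name] = "".join(rng.choice(alphabet) for _ in range(length))
    return record


rng = random.Random(7)
valid = generate_record(rng)
invalid = generate_record(rng, violate="ref_number")
```

Because the generator reads the description rather than hard-coding the rules, updating the schema table when the format changes updates the test data with it.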