I'm surprised no one in this topic or in the one Jason Baker linked to mentioned Monte Carlo Testing. That's the only time I've extensively used randomized test inputs. However, it was very important to make the test reproducible, by having a constant seed for the random number generator for each test case.
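A minimal sketch of that setup in Ruby, assuming the property under test is that a sort routine returns ordered output (`sorted?` is a stand-in for whatever invariant your code should uphold):

```ruby
# Monte Carlo-style test: many random inputs, but a constant seed,
# so every run exercises exactly the same sequence of inputs.
def sorted?(list)
  list.each_cons(2).all? { |x, y| x <= y }  # stand-in for the property under test
end

srand(12_345)  # constant seed: any failure is reproducible
1_000.times do
  input = Array.new(rand(1..20)) { rand(-1_000..1_000) }
  result = input.sort
  raise "sort failed for #{input.inspect}" unless sorted?(result)
end
```

Because the seed is fixed, a failing input shows up on every run until the bug is fixed, which is what makes this usable as a regression test.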
This is an answer to your second point:

(2) I use testing as a form of documentation for the code. If I have hard-coded fixture values, it's hard to reveal what a particular test is trying to demonstrate.
I agree. Ideally spec examples should be understandable by themselves. Using fixtures is problematic, because it splits the pre-conditions of the example from its expected results.
Because of this, many RSpec users have stopped using fixtures altogether. Instead, construct the needed objects in the spec example itself.
describe Item, "#most_expensive" do
  it 'should return the most expensive item' do
    items = [
      Item.create!(:price => 100),
      Item.create!(:price => 50)
    ]
    Item.most_expensive.price.should == 100
  end
end
If you end up with lots of boilerplate code for object creation, you should take a look at some of the many test object factory libraries, such as factory_girl, Machinist, or FixtureReplacement.
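To illustrate what those libraries buy you, here is a hand-rolled sketch of the factory pattern in plain Ruby. The real libraries add nicer syntax, sequences, and ActiveRecord integration; the `Item` struct and attribute names here are made up for the example:

```ruby
# Minimal test-object factory: defaults live in one place,
# and each test overrides only the attribute it cares about.
Item = Struct.new(:price, :name)

ITEM_DEFAULTS = { price: 100, name: "widget" }.freeze

def build_item(overrides = {})
  attrs = ITEM_DEFAULTS.merge(overrides)
  Item.new(attrs[:price], attrs[:name])
end

cheap = build_item(price: 5)  # the test mentions only the attribute that matters
```

The point is that a spec example stays readable: the one value it asserts on is the one value it sets.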
We thought about this a lot on a recent project of mine. In the end, we settled on two points:
Repeatability of test cases is of paramount importance. If you must write a random test, be prepared to document it extensively, because if/when it fails, you will need to know exactly why.
Using randomness as a crutch for code coverage means you either don't have good coverage or you don't understand the domain enough to know what constitutes representative test cases. Figure out which is true and fix it accordingly.
In sum, randomness can often be more trouble than it's worth. Consider carefully whether you're going to be using it correctly before you pull the trigger. We ultimately decided that random test cases were a bad idea in general and to be used sparingly, if at all.
Lots of good information has already been posted, but see also: Fuzz Testing. Word on the street is that Microsoft uses this approach on a lot of their projects.
My experience with testing is mostly with simple programs written in C/Python/Java, so I'm not sure if this is entirely applicable, but whenever I have a program that can accept any sort of user input, I always include a test with random input data, or at least input data generated by the computer in an unpredictable way, because you can never make assumptions about what users will enter. Or, well, you can, but if you do then some hacker who doesn't make that assumption may well find a bug that you totally overlooked. Machine-generated input is the best (only?) way I know of to keep human bias completely out of the testing procedures. Of course, in order to reproduce a failed test you have to do something like saving the test input to a file or printing it out (if it's text) before running the test.
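Saving the offending input can be as simple as dumping it to a file when a check fails. A sketch, where `parse_command` is a hypothetical function under test and the dump path is just an example:

```ruby
require "json"
require "tmpdir"

# Hypothetical function under test: rejects blank input.
def parse_command(s)
  raise ArgumentError, "empty command" if s.strip.empty?
  s.split
end

# Run one randomized check; on failure, persist the input for replay.
def check_random_input(input, dump_path = File.join(Dir.tmpdir, "failed_input.json"))
  parse_command(input)
  true
rescue StandardError => e
  File.write(dump_path, JSON.generate("input" => input, "error" => e.message))
  false
end
```

The saved file is then enough to turn the machine-generated failure into a deterministic regression test.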
Random testing is a bad practice as long as you don't have a solution for the oracle problem, i.e., determining the expected outcome of your software given its input.
If you solved the oracle problem, you can get one step further than simple random input generation. You can choose input distributions such that specific parts of your software get exercised more than with simple random.
You then switch from random testing to statistical testing.
if (a > 0)
    // Do Foo
else if (b < 0)
    // Do Bar
else
    // Do Foobar
If you select a and b uniformly at random over the int range, you exercise Foo 50% of the time, Bar 25% of the time, and Foobar 25% of the time. It is likely that you will find more bugs in Foo than in Bar or Foobar.

If you instead select a such that it is negative 66.66% of the time, Bar and Foobar get exercised more than with the first distribution. Indeed, each of the three branches gets exercised 33.33% of the time.
Of course, if your observed outcome is different than your expected outcome, you have to log everything that can be useful to reproduce the bug.
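A sketch of that reweighting in Ruby: `branch` mirrors the if/else above, and `biased_a` draws a positive value only one third of the time instead of roughly half:

```ruby
# Same control flow as the snippet above.
def branch(a, b)
  if a > 0
    :foo
  elsif b < 0
    :bar
  else
    :foobar
  end
end

# Biased generator: a > 0 with probability 1/3 instead of ~1/2.
def biased_a
  rand(3).zero? ? rand(1..1_000) : -rand(0..1_000)
end

srand(7)  # seeded, so observed counts are reproducible
counts = Hash.new(0)
30_000.times { counts[branch(biased_a, rand(-1_000..1_000))] += 1 }
# counts[:foo], counts[:bar], counts[:foobar] come out roughly equal
```

This is the switch from random to statistical testing: the distribution is chosen deliberately so coverage matches where you want the effort spent.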
Use of random test data is an excellent practice -- hard-coded test data only tests the cases you explicitly thought of, whereas random data flushes out your implicit assumptions that might be wrong.
I highly recommend using Factory Girl and ffaker for this. (Never use Rails fixtures for anything under any circumstances.)
The effectiveness of such testing largely depends on the quality of the random number generator you use and on the correctness of the code that translates the RNG's output into test data.
If the RNG never produces values that push your code into some edge case, that case will not be covered. And if the code that translates the RNG's output into input for the code under test is itself defective, then even with a good generator you may still miss the edge cases. How will you test for that?
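One mitigation is not to rely on the RNG for the boundaries at all: always append known edge values to the random ones. A sketch, where the 32-bit limits are just an example of the boundaries your domain might care about:

```ruby
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

# Boundary values are always included; randomness covers the middle.
EDGE_CASES = [0, 1, -1, INT32_MAX, INT32_MIN].freeze

def test_inputs(random_count, seed = 42)
  rng = Random.new(seed)  # seeded, so the "random" part is reproducible too
  Array.new(random_count) { rng.rand(INT32_MIN..INT32_MAX) } + EDGE_CASES
end

inputs = test_inputs(100)  # 100 reproducible random values plus 5 fixed ones
```

That way the edge cases are covered by construction rather than by luck.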
The problem with randomness in test cases is that the output is, well, random.
The idea behind tests (especially regression tests) is to check that nothing is broken.
If you find something that is broken, you need to include that test every time from then on, otherwise you won't have a consistent set of tests. Also, if you run a random test that works, then you need to include that test, because it's possible that you may later break the code so that the test fails.
In other words, if you have a test which uses random data generated on the fly, I think this is a bad idea. If however, you use a set of random data, WHICH YOU THEN STORE AND REUSE, this may be a good idea. This could take the form of a set of seeds for a random number generator.
This storing of the generated data allows you to find the 'correct' response to this data.
So, I would recommend using random data to explore your system, but using defined data in your tests (which may originally have been randomly generated data).
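That "store and reuse" idea can be as small as a checked-in list of seeds: each seed regenerates its data set on demand, and a seed that ever produced a failure gets appended to the list so it is replayed on every run from then on. A sketch with made-up seed values:

```ruby
# Seeds that once exposed a bug; checked into the repo alongside the tests.
REGRESSION_SEEDS = [42, 1_337, 20_090_615].freeze

def data_for(seed)
  rng = Random.new(seed)
  Array.new(50) { rng.rand(10_000) }  # same seed -> same 50 values, every run
end

REGRESSION_SEEDS.each do |seed|
  data = data_for(seed)
  sorted = data.sort  # stand-in for the code under test
  raise "regression for seed #{seed}" unless sorted.each_cons(2).all? { |x, y| x <= y }
end
```

Storing seeds rather than the data itself keeps the repository small while still making every run deterministic.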
Like everything in software engineering, it depends.
The biggest argument people use against it is that it breaks the determinism of test cases. However, that's not really a problem, as long as your test cases fail deterministically. The problem is when your tests become flaky due to random data.
In practice there are several good uses for random data:

Resolving data conflicts, e.g. a fixture factory that automatically satisfies a unique constraint by filling a field with a randomly generated UUID.

More maintainable test cases thanks to reduced boilerplate: when you want to test x, focus on just x and not its dependencies.

Fuzz testing, where you want random combinations of data and even noise.
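The first of those points, sketched in Ruby (`User` and the email format are made up; `SecureRandom.uuid` does the de-duplication):

```ruby
require "securerandom"

User = Struct.new(:email)

# Factory default: every user gets a fresh, collision-free email,
# so a unique constraint on email never trips across fixtures.
def build_user(email: "user-#{SecureRandom.uuid}@example.com")
  User.new(email)
end

a = build_user
b = build_user                               # distinct emails, no bookkeeping
c = build_user(email: "fixed@example.com")   # override when the test cares
```

Note that this use of randomness is deliberately one that no assertion depends on, so it cannot make the test flaky.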
I would suggest having a look at Machinist:
Machinist will generate data for you, but it is repeatable, so each test-run has the same random data.
You could do something similar by seeding the random number generator consistently.
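A sketch of that consistent-seeding approach: derive the seed from the test's name, so each test gets random-looking but stable data across runs. `String#sum` here is just a cheap deterministic hash; any stable mapping from name to integer would do:

```ruby
# Same test name -> same seed -> same "random" data on every run,
# while different tests still get different data.
def rng_for(test_name)
  Random.new(test_name.sum)  # String#sum: deterministic byte sum
end

rng = rng_for("test_most_expensive")
prices = Array.new(3) { rng.rand(1..500) }  # stable across runs
```

This gives you Machinist-style repeatability without tying the whole suite to one global seed.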
One problem with randomly generated test cases is that the expected answer has to be computed by code, and you can't be sure that code doesn't have bugs :)
You might also see this topic: Testing with random inputs best practices.
此类测试的有效性很大程度上取决于您使用的随机数生成器的质量以及将 RNG 的输出转换为测试数据的代码的正确性。
如果 RNG 从未产生导致您的代码进入某些边缘情况的值,那么您将不会涵盖这种情况。 如果将 RNG 的输出转换为您测试的代码的输入的代码有缺陷,则即使使用良好的生成器,您仍然可能无法满足所有边缘情况。
你将如何测试这一点?
Effectiveness of such testing largely depends on quality of random number generator you use and on how correct is the code that translates RNG's output into test data.
If the RNG never produces values causing your code to get into some edge case condition you will not have this case covered. If your code that translates the RNG's output into input of the code you test is defective it may happen that even with a good generator you still don't hit all the edge cases.
How will you test for that?
测试用例中随机性的问题在于输出是随机的。
测试(尤其是回归测试)背后的想法是检查没有任何问题。
如果您发现某些东西损坏了,那么从那时起您每次都需要包含该测试,否则您将不会有一组一致的测试。 另外,如果您运行有效的随机测试,那么您需要包含该测试,因为您可能会破坏代码,从而导致测试失败。
换句话说,如果您有一个使用动态生成的随机数据的测试,我认为这是一个坏主意。 但是,如果您使用一组随机数据,然后存储并重复使用,这可能是一个好主意。 这可以采用随机数生成器的一组种子的形式。
通过存储生成的数据,您可以找到对此数据的“正确”响应。
因此,我建议使用随机数据来探索您的系统,但在测试中使用定义的数据(最初可能是随机生成的数据)
The problem with randomness in test cases is that the output is, well, random.
The idea behind tests (especially regression tests) is to check that nothing is broken.
If you find something that is broken, you need to include that test every time from then on, otherwise you won't have a consistent set of tests. Also, if you run a random test that works, then you need to include that test, because its possible that you may break the code so that the test fails.
In other words, if you have a test which uses random data generated on the fly, I think this is a bad idea. If however, you use a set of random data, WHICH YOU THEN STORE AND REUSE, this may be a good idea. This could take the form of a set of seeds for a random number generator.
This storing of the generated data allows you to find the 'correct' response to this data.
So, I would recommend using random data to explore your system, but use defined data in your tests (which may have originally been randomly generated data)
就像软件工程中的一切一样,这取决于情况。
人们反对它的最大论点是它破坏了测试用例的确定性。 然而,这实际上并不是一个问题,只要您的测试用例可以确定性失败。 问题是当你的测试由于随机数据而变得不稳定时。
在实践中,随机数据有几个很好的例子:
Like everything in software engineering, it depends.
The biggest argument people use against it, is that it breaks the test cases being deterministic. However, that's not a problem really, as long as your test cases can fail deterministically. The problem is when your tests become flaky due to random data.
In practice there's several good cases to random data: