使用加密弱伪随机生成器进行模糊测试是不明智的吗?
在开发大型软件项目时,我经常使用模糊测试作为测试用例的一部分,以帮助排除仅在输入达到特定大小或形状时才可能出现的错误。我最常通过使用与我正在使用的编程语言捆绑在一起的标准随机数工具来完成此操作。
最近,我开始想知道,忽略一般模糊测试的优点或缺点,在进行模糊测试时使用非加密安全伪随机数生成器是否是一个好主意。弱随机数生成器通常会表现出将其与真正的随机序列区分开来的模式,即使这些模式并不明显。使用弱 PRNG 的模糊测试似乎总是无法触发某些仅在某些情况下才会出现的潜在错误,因为伪随机数可能以一种永远不会触发这些情况的方式相互关联。
使用弱 PRNG 进行模糊测试本质上是不明智的吗?如果理论上这样做不合理,那么实践中还合理吗?
When working on a large software project I often use fuzz-testing as part of my test cases to help smoke out bugs that may only show up when the input reaches a certain size or shape. I've done this most commonly by just using the standard random number facilities that are bundled with the programming language I happen to be using.
Recently I've started wondering, ignoring the advantages or disadvantages of fuzz testing in general, whether or not it's a good idea to be using non-cryptographically secure pseudorandom number generators when doing fuzz testing. Weak random number generators often exhibit patterns that distinguish them from true random sequences, even if those patterns are not readily obvious. It seems like a fuzz test using a weak PRNG might always fail to trigger certain latent bugs that only show up in certain circumstances because the pseudorandom numbers could be related to one another in a way that never trigger those circumstances.
Is it inherently unwise to use a weak PRNG for fuzz testing? If it is theoretically unsound to do so, is it still reasonable in practice?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您混淆了两种截然不同的“弱点”等级:
You're confusing two very different grades of "weakness":
我认为这并不重要,但我无法证明这一点。
模糊测试只会尝试一些输入,在大多数情况下只尝试一小部分可能性。无论您使用的 RNG 有多好,它都可能会或可能不会找到破坏您代码的输入之一,具体取决于所有可能输入破坏您代码的比例。除非 PRNG 中的模式非常简单,否则在我看来,它不太可能以任何方式对应于您正在寻找的“坏”输入中的模式,因此它不会或多或少地命中它,而不是真正的随机。
事实上,如果您知道如何选择 RNG 来最大程度地提高找到错误输入的概率,您可能可以利用这些知识来帮助更直接地找到错误...
我认为您不应该使用 糟糕的 PRNG。例如,rand 可以表现出非常简单的模式,例如 LSB 交替。如果您的代码在内部使用 PRNG,您可能希望避免在测试中以类似的方式使用相同的 PRNG,只是为了确保您不会意外地仅测试输入数据与内部生成的数字流匹配的情况!当然,风险很小,因为你希望他们使用不同的种子,但仍然如此。
在给定的语言中找到加密或至少安全的哈希库通常并不难。 SHA-1 无处不在,并且易于使用来生成流,否则 RC4 很容易实现。两者都提供了相当好的 PRNG,尽管不如 Blum Blum Shub 那么安全。我认为主要关心的是速度 - 例如,如果 Mersenne Twister 可以以 10 倍的速度生成模糊测试用例,并且被测试的代码相当快,那么它可能有更好的机会在给定的情况下找到错误的输入时间,无论给定 624 个输出,您都可以推断出 RNG 的完整状态...
I don't think it really matters, but I can't prove it.
Fuzz-testing will only try some inputs, in most cases a minuscule proportion of the possibilities. No matter how good the RNG you use, it may or may not find one of the inputs that breaks your code, depending on what proportion of all possible inputs break your code. Unless the pattern in the PRNG is very simple, it seems to me unlikely that it will correspond in any way to a pattern in the "bad" inputs you're looking for, so it will hit it neither more nor less than true random.
In fact, if you knew how to pick an RNG to maximise the probability of it finding bad inputs, you could probably use that knowledge to help find the bug more directly...
I don't think you should use a really bad PRNG.
rand
for example is permitted to exhibit very simple patterns, such as the LSB alternating. And if your code uses a PRNG internally, you probably want to avoid using the same PRNG in a similar way in the test, just to be sure you don't accidentally only test cases where the input data matches the internally-generated number stream! Small risk, of course, since you'd hope they'll be using different seeds, but still.It's usually not that hard in a given language to find crypto or at least secure hash libraries. SHA-1 is everywhere and easy to use to generate a stream, or failing that RC4 is trivial to implement yourself. Both provide pretty good PRNG, if not quite so secure as Blum Blum Shub. I'd have thought the main concern is speed - if for example a Mersenne Twister can generate fuzz test cases 10 times as fast, and the code under test is reasonably fast, then it might have a better chance of finding bad inputs in a given time regardless of the fact that given 624 outputs you can deduce the complete state of the RNG...
您不需要不可预测的源(这正是加密安全生成器的本质),您只需要具有良好统计特性的源。
因此,使用通用生成器就足够了 - 它速度快且通常可重现(这意味着问题也是可重现的)。
You don't need unpredictable source (that's exactly what a cryptographically secure generator is), you only need a source with good statistical properties.
So using a general purpose generator is enough - it is fast and usually reproduceable (which means problems are also reproduceable).