How do you measure robustness?

Published 2024-08-23 16:13:14


Comments (4)

御弟哥哥 2024-08-30 16:13:14


Well, the short answer is "no." Robust can mean a lot of things, but the best definition I can come up with is "performing correctly in every situation." If you send a bad HTTP header to a robust web server, it shouldn't crash. It should return exactly the right kind of error, and it should log the event somewhere, perhaps in a configurable way. If a robust web server runs for a very long time, its memory footprint should stay the same.
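That "fail cleanly, log the event, and answer with the right kind of error" behaviour can be sketched in a few lines. This is a hypothetical, minimal request-line handler, not code from any real server; the names `handle_request` and the logger setup are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("httpd")

def handle_request(raw: bytes) -> tuple[int, str]:
    """Parse an HTTP request line; never crash on malformed input."""
    try:
        line = raw.split(b"\r\n", 1)[0].decode("ascii")
        method, path, version = line.split(" ")
        if not version.startswith("HTTP/"):
            raise ValueError(f"bad version: {version!r}")
    except (ValueError, UnicodeDecodeError) as exc:
        # Malformed header: log the event and return a proper error,
        # instead of letting the exception propagate and crash the server.
        log.warning("malformed request: %s", exc)
        return 400, "Bad Request"
    return 200, "OK"
```

A robust server treats `handle_request(b"\xff\xfegarbage")` the same way it treats a valid request: with a well-defined, logged response rather than a stack trace.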

A lot of what makes a system robust is its handling of edge cases. Good unit tests are a part of that, but it's quite likely that there will not be unit tests for any of the problems that a system has (if those problems were known, the developers probably would have fixed them and only then added a test).

Unfortunately, it's nearly impossible to measure the robustness of an arbitrary program, because in order to do that you need to know what that program is supposed to do. If you had a specification, you could write a huge number of tests and then run them against any client as a test. For example, look at the Acid2 browser test. It carefully measures how well any given web browser complies with a standard in an easy, repeatable fashion. That's about as close as you can get, and people have pointed out many flaws with such an approach (for instance, is a program that crashes more often but does one extra thing according to spec more robust?).
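The Acid2 idea, turning a specification into a battery of pass/fail cases you can run against any implementation, scales down to ordinary code. A toy sketch, where the "spec" is a handful of cases for a strict decimal parser; `SPEC_CASES` and `conformance` are invented names:

```python
# A toy "specification" for a strict decimal parser, written down as
# (input, expected output) pairs; None means "must be rejected".
SPEC_CASES = [
    ("0", 0),
    ("42", 42),
    ("-7", -7),
    ("007", 7),
    ("", None),       # empty input must be rejected
    ("4 2", None),    # embedded whitespace must be rejected
    ("0x1A", None),   # hex notation is outside this spec
]

def conformance(parse) -> float:
    """Run every spec case against an implementation; return the pass rate."""
    passed = 0
    for text, expected in SPEC_CASES:
        try:
            result = parse(text)
        except ValueError:
            result = None  # rejection is expressed as an exception
        passed += (result == expected)
    return passed / len(SPEC_CASES)
```

Any implementation can be scored the same way, e.g. `conformance(int)`, which is exactly the repeatable, implementation-agnostic property that makes Acid2-style tests useful.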

There are, though, various checks that you could use as a rough, numerical estimate of the health of a system. Unit test coverage is a pretty standard one, as are its siblings: branch coverage, function coverage, statement coverage, etc. Another good choice is "lint" programs like FindBugs. These can indicate the potential for problems. Open source projects are often judged by how frequently and how recently commits are made or releases cut. If a project has a bug tracker, you can measure how many bugs have been fixed, and what percentage. If there's a specific instance of the program you're measuring, especially one with a lot of activity, MTBF (Mean Time Between Failures) is a good measure of robustness (see Philip's answer).

These measurements, though, don't really tell you how robust a program is. They're merely ways to guess at it. If it were easy to figure out if a program was robust, we'd probably just make the compiler check for it.

剑心龙吟 2024-08-30 16:13:14


You could look into mean time between failures as a robustness measure. The problem is that it is a theoretical quantity which is difficult to measure, particularly before you have deployed your product to a real-world situation with real-world loads. Part of the reason for this is that testing often does not cover real-world scalability issues.
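As a back-of-the-envelope sketch, MTBF can be estimated from a failure log as the mean gap between consecutive failure timestamps. Note this simple version ignores repair time, which a strict MTBF calculation would subtract from each gap:

```python
from datetime import datetime

def mtbf_hours(failures: list[datetime]) -> float:
    """Estimate MTBF as the mean gap, in hours, between consecutive
    failure timestamps (assumed sorted). Repair time is ignored."""
    if len(failures) < 2:
        raise ValueError("need at least two failures to estimate MTBF")
    gaps = [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(failures, failures[1:])]
    return sum(gaps) / len(gaps)
```

For failures on Jan 1, Jan 2, and Jan 4, the gaps are 24 h and 48 h, giving an MTBF of 36 hours. The estimate is only as good as the deployment it was measured in, which is exactly the caveat above about real-world loads.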

夕色琉璃 2024-08-30 16:13:14


In our Fuzzing book (by Takanen, DeMott, Miller) we have several chapters dedicated to metrics and coverage in negative testing (robustness, reliability, grammar testing, fuzzing: many names for the same thing). I also tried to summarize the most important aspects in our company whitepaper here:

http://www.codenomicon.com/products/coverage.shtml

Snippet from there:


Coverage can be seen as the sum of two features, precision and accuracy. Precision is concerned with protocol coverage. The precision of testing is determined by how well the tests cover the different protocol messages, message structures, tags and data definitions. Accuracy, on the other hand, measures how accurately the tests can find bugs within different protocol areas. Therefore, accuracy can be regarded as a form of anomaly coverage. However, precision and accuracy are fairly abstract terms, thus, we will need to look at more specific metrics for evaluating coverage.

The first coverage analysis aspect is related to the attack surface. Test requirement analysis always starts off by identifying the interfaces that need testing. The number of different interfaces and the protocols they implement in various layers set the requirements for the fuzzers. Each protocol, file format, or API might require its own type of fuzzer, depending on the security requirements.

The second coverage metric is related to the specification that a fuzzer supports. This type of metric is easy to use with model-based fuzzers, as the basis of the tool is formed by the specifications used to create the fuzzer, and therefore they are easy to list. A model-based fuzzer should cover the entire specification. Mutation-based fuzzers, by contrast, do not necessarily fully cover the specification, as implementing or including one message exchange sample from a specification does not guarantee that the entire specification is covered. Typically, when a mutation-based fuzzer claims specification support, it means it is interoperable with test targets implementing the specification.
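The difference is easy to see in miniature. A mutation-based fuzzer in its simplest form takes one valid sample message and flips random bits, so it can only ever explore the neighbourhood of that sample, never the parts of the specification the sample doesn't touch. A toy sketch, with `mutate` and `fuzz` as invented names:

```python
import random

def mutate(sample: bytes, rng: random.Random, n_flips: int = 4) -> bytes:
    """Mutation-based fuzzing in miniature: start from one valid sample
    message and flip a few random bits."""
    data = bytearray(sample)
    for _ in range(n_flips):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)
    return bytes(data)

def fuzz(target, sample: bytes, iterations: int = 1000, seed: int = 0):
    """Feed mutated samples to `target`; collect the inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(sample, rng)
        try:
            target(case)
        except Exception:
            crashes.append(case)
    return crashes
```

Against an ASCII-only target such as `lambda b: b.decode("ascii")`, bit flips that set a high bit register as crashes, but every test case still has the shape and length of the original sample, which is why sample choice bounds what a mutation-based fuzzer can cover.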

Especially regarding protocol fuzzing, the third and most critical metric is the level of statefulness of the selected fuzzing approach. An entirely random fuzzer will typically only test the first messages in complex stateful protocols. The more state-aware the fuzzing approach you are using is, the deeper the fuzzer can go in complex protocol exchanges. Statefulness is a difficult requirement to define for fuzzing tools, as it is more a metric for the quality of the protocol model used, and can thus only be verified by running the tests.
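The statefulness point can be illustrated with a toy protocol that requires a handshake before the interesting parser is ever reached: a purely random fuzzer's messages die in the first state, while a state-aware fuzzer replays the valid prefix and then fuzzes the deep message. Everything here, including the planted bug, is hypothetical:

```python
class ToyServer:
    """A toy stateful protocol: the DATA parser is only reachable
    after a valid HELLO/AUTH handshake."""
    def __init__(self):
        self.state = "start"

    def recv(self, msg: bytes) -> None:
        if self.state == "start" and msg == b"HELLO":
            self.state = "greeted"
        elif self.state == "greeted" and msg.startswith(b"AUTH "):
            self.state = "authed"
        elif self.state == "authed" and msg.startswith(b"DATA "):
            # Planted bug that only a state-aware fuzzer can reach:
            payload = msg[5:]
            assert len(payload) <= 16, "payload overflow"
        # anything else is silently dropped

def stateful_fuzz(make_case) -> bool:
    """Replay the valid handshake, then send one fuzzed deep message.
    Returns True if the server survived."""
    server = ToyServer()
    for prefix in (b"HELLO", b"AUTH secret"):
        server.recv(prefix)          # drive the protocol into deep state
    try:
        server.recv(make_case())
        return True
    except AssertionError:
        return False
```

A stateless fuzzer sending `b"DATA " + b"A" * 64` to a fresh server gets silently dropped in the `start` state; `stateful_fuzz` with the same payload reaches the bug, which is the "depth" the quoted paragraph is measuring.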


We have also studied other metrics, such as code coverage and other more or less useless data. ;) Metrics is a great topic for a thesis.

箹锭⒈辈孓 2024-08-30 16:13:14


Robustness is very subjective, but you could have a look at FindBugs, Cobertura, and Hudson, which, when correctly combined, could give you a sense of security over time that the software is robust.
