Methods for comparing the objects produced by two programs at run time

Posted on 2024-12-06 16:14:24

I am working through a particular type of code testing that is rather nettlesome and could be automated, yet I'm not sure of the best practices. Before describing the problem, I want to make clear that I'm looking for the appropriate terminology and concepts, so that I can read more about how to implement it. Suggestions on best practices are welcome, certainly, but my goal is specific: what is this kind of approach called?

In the simplest case, I have two programs that take in a bunch of data, produce a variety of intermediate objects, and then return a final result. When tested end-to-end, the final results differ, hence the need to find out where the differences occur. Unfortunately, even intermediate results may differ, but not always in a significant way (i.e. some discrepancies are tolerable). The final wrinkle is that intermediate objects may not necessarily have the same names between the two programs, and the two sets of intermediate objects may not fully overlap (e.g. one program may have more intermediate objects than the other). Thus, I can't assume there is a one-to-one relationship between the objects created in the two programs.

The approach that I'm thinking of taking to automate this comparison of objects is as follows (it's roughly inspired by frequency counts in text corpora):

  1. For each program, A and B: create a list of the objects created throughout execution, which may be indexed in a very simple manner, such as a001, a002, a003, a004, ... and similarly for B (b001, ...).
  2. Let Na = # of unique object names encountered in A, similarly for Nb and # of objects in B.
  3. Create two tables, TableA and TableB, with Na and Nb columns, respectively. Entries will record a value for each object at each trigger (i.e. for each row, defined next).
  4. For each assignment in A, the simplest approach is to capture the hash value of all of the Na items; of course, one can use LOCF (last observation carried forward) for those items that don't change, and any as-yet unobserved objects are simply given a NULL entry. Repeat this for B.
  5. Match entries in TableA and TableB via their hash values. Ideally, objects will arrive into the "vocabulary" in approximately the same order, so that order and hash value will allow one to identify the sequences of values.
  6. Find discrepancies in the objects between A and B based on where their matched sequences of hash values first diverge (a rough sketch of steps 3-6 follows this list).
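
To make the bookkeeping above concrete, here is a minimal Python sketch of steps 3 through 6. It assumes the intermediate objects are picklable, uses a SHA-1 of the pickled bytes as the recorded "value", and pairs each object in A with the object in B whose hash sequence shares the most values; `Trace` and `match_and_diff` are names invented for this illustration, not an existing tool.

```python
import hashlib
import pickle

def fingerprint(obj):
    # Assumes the object can be pickled; any stable serialization would do.
    return hashlib.sha1(pickle.dumps(obj)).hexdigest()

class Trace:
    """Steps 3-4: one table per program, one row per assignment,
    LOCF for untouched objects, None (NULL) for not-yet-seen ones."""
    def __init__(self):
        self.rows = []    # each row is {object name: hash or None}
        self.names = []   # column order = order of first appearance

    def record(self, name, value):
        if name not in self.names:
            self.names.append(name)
        row = dict(self.rows[-1]) if self.rows else {}   # carry last row forward
        row[name] = fingerprint(value)
        for n in self.names:
            row.setdefault(n, None)                      # unseen objects stay NULL
        self.rows.append(row)

    def column(self, name):
        seen = [row.get(name) for row in self.rows]
        seq = [h for h in seen if h is not None]
        # collapse LOCF repeats so only genuine value changes remain
        return [h for i, h in enumerate(seq) if i == 0 or h != seq[i - 1]]

def match_and_diff(trace_a, trace_b):
    """Steps 5-6: pair columns by shared hash values, then report the index
    at which the paired hash sequences first diverge (None = no divergence)."""
    report = {}
    for na in trace_a.names:
        col_a = trace_a.column(na)
        best, overlap = None, 0
        for nb in trace_b.names:
            common = len(set(col_a) & set(trace_b.column(nb)))
            if common > overlap:
                best, overlap = nb, common
        if best is None:
            report[na] = ("<unmatched>", None)
            continue
        col_b = trace_b.column(best)
        div = next((i for i, (x, y) in enumerate(zip(col_a, col_b)) if x != y), None)
        report[na] = (best, div)
    return report
```

One would call something like `trace_a.record("x", x)` after each assignment of interest in program A (inserted by hand or by instrumentation), do the same for B, and then inspect `match_and_diff(trace_a, trace_b)`.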

Now, this is a simple approach and could work wonderfully if the data were simple, atomic, and not susceptible to numerical precision issues. However, I believe that numerical precision may cause hash values to diverge even when the underlying discrepancies are negligible, i.e. roughly at the machine tolerance level.
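
One possible mitigation is to quantize floating-point values to a chosen relative tolerance before hashing, so that values differing only by machine-level noise hash identically (values straddling a rounding boundary can still differ). A sketch, under the assumption that the objects are nested lists, tuples, dicts, and scalars; the rounding scheme and default tolerance are illustrative choices:

```python
import hashlib
import math
import pickle

def quantize(obj, rel_tol=1e-9):
    """Round floats to roughly -log10(rel_tol) significant digits so that
    values differing only by numerical noise serialize identically."""
    if isinstance(obj, float):
        if obj == 0.0 or math.isnan(obj) or math.isinf(obj):
            return obj
        digits = int(-math.log10(rel_tol)) - int(math.floor(math.log10(abs(obj))))
        return round(obj, digits)
    if isinstance(obj, dict):
        return {k: quantize(v, rel_tol) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(quantize(v, rel_tol) for v in obj)
    return obj

def tolerant_fingerprint(obj, rel_tol=1e-9):
    # Hash the quantized object instead of the raw one.
    return hashlib.sha1(pickle.dumps(quantize(obj, rel_tol))).hexdigest()
```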

First: what is the name for this type of testing method and concept? An answer need not describe the exact method above, but it should reflect the class of methods for comparing objects from two (or more) different programs.

Second: what standard methods exist for what I describe in steps 3 and 4? For instance, the "value" need not only be a hash: one might also store the sizes of the objects - after all, two objects cannot be the same if they are massively different in size.
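
As a concrete example of that second point, each table cell could hold a small feature tuple rather than a bare hash, so that cheap checks (type, element count, approximate size) rule out a match before any hash comparison. The function names and the 10% size threshold below are illustrative assumptions, not a standard recipe:

```python
import hashlib
import pickle
import sys

def feature_fingerprint(obj):
    """Record several cheap features alongside the exact content hash."""
    return (
        type(obj).__name__,                               # coarse type check
        len(obj) if hasattr(obj, "__len__") else None,    # element count, if any
        sys.getsizeof(obj),                               # rough in-memory size
        hashlib.sha1(pickle.dumps(obj)).hexdigest(),      # exact content hash
    )

def could_match(fp_a, fp_b, size_slack=0.10):
    """Reject pairs differing in type, length, or size before comparing hashes."""
    type_a, len_a, size_a, _ = fp_a
    type_b, len_b, size_b, _ = fp_b
    if type_a != type_b or len_a != len_b:
        return False
    return abs(size_a - size_b) <= size_slack * max(size_a, size_b)
```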

In practice, I tend to compare a small number of items, but I suspect that when automated this need not involve a lot of input from the user.


Edit 1: This paper is related in terms of comparing execution traces; it mentions "code comparison", which is related to my interest, though I'm concerned more with the data (i.e. objects) than with the actual code that produces the objects. I've just skimmed it, but will review its methodology more carefully. More importantly, this suggests that comparing code traces may be extended to comparing data traces. This paper analyzes some comparisons of code traces, albeit in the wholly unrelated area of security testing.

Perhaps data-tracing and stack-trace methods are related. Checkpointing is slightly related, but its typical use (i.e. saving all of the state) is overkill.

Edit 2: Other related concepts include differential program analysis and monitoring of remote systems (e.g. space probes) where one attempts to reproduce the calculations using a local implementation, usually a clone (think of a HAL-9000 compared to its earth-bound clones). I've looked down the routes of unit testing, reverse engineering, various kinds of forensics, and whatnot. In the development phase, one could ensure agreement with unit tests, but this doesn't seem to be useful for instrumented analyses. For reverse engineering, the goal can be code & data agreement, but methods for assessing fidelity of re-engineered code don't seem particularly easy to find. Forensics on a per-program basis are very easily found, but comparisons between programs don't seem to be that common.

Comments (1)

治碍 2024-12-13 16:14:24

(Making this answer community wiki, because dataflow programming and reactive programming are not my areas of expertise.)

The area of data flow programming appears to be related, and thus debugging of data flow programs may be helpful. This paper from 1981 gives several useful high level ideas. Although it's hard to translate these to immediately applicable code, it does suggest a method I'd overlooked: when approaching a program as a dataflow, one can either statically or dynamically identify where changes in input values cause changes in other values in the intermediate processing or in the output (not just changes in execution, if one were to examine control flow).

Although dataflow programming is often related to parallel or distributed computing, it seems to dovetail with Reactive Programming, which is how the monitoring of objects (e.g. the hashing) can be implemented.
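
As a minimal illustration of that last point, the hashing can be hung off assignments themselves rather than being sprinkled through the program; the `Watched` class below uses Python's `__setattr__` as a crude stand-in for a reactive "on change" subscription (a real reactive framework would offer proper change propagation), and is a hypothetical sketch rather than any particular library's API:

```python
import hashlib
import pickle

class Watched:
    """Logs a (name, hash) event every time an attribute is assigned,
    approximating a reactive 'value changed' subscription."""
    def __init__(self, log):
        object.__setattr__(self, "_log", log)   # bypass __setattr__ during setup

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        digest = hashlib.sha1(pickle.dumps(value)).hexdigest()
        self._log.append((name, digest))

# Usage: route program A's intermediate objects through one Watched instance
# and program B's through another, then compare the two logs of (name, hash).
log_a = []
state = Watched(log_a)
state.x = [1, 2, 3]        # log_a now holds ("x", <hash of [1, 2, 3]>)
state.y = sum(state.x)     # log_a now also holds ("y", <hash of 6>)
```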

This answer is far from adequate, hence the CW tag, as it doesn't really name the debugging method that I described. Perhaps this is a form of debugging for the reactive programming paradigm.

[Also note: although this answer is CW, if anyone has a far better answer in relation to dataflow or reactive programming, please feel free to post a separate answer and I will remove this one.]


Note 1: Henrik Nilsson and Peter Fritzson have a number of papers (http://www.cs.nott.ac.uk/~nhn/papers.html) on debugging for lazy functional languages, which are somewhat related: the debugging goal is to assess values, not the execution of code. This paper seems to have several good ideas, and their work partially inspired this paper on a debugger for a reactive programming language called Lustre. These references don't answer the original question, but they may be of interest to anyone facing the same challenge, albeit in a different programming context.
