调试和二分查找

发布于 2024-07-19 04:50:12 字数 93 浏览 13 评论 0 原文

第 2 栏（“啊哈！算法”）中的“编程珍珠”讨论了二分搜索如何帮助排序、树遍历等各种过程。但它提到二进制搜索可以用于“程序调试”。有人可以解释一下这是如何完成的吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

山田美奈子 2024-07-26 04:50:12

二分搜索是在排序列表中查找项目的有效方法。例如，如果您正在查找一本书中的特定页面（例如，第 147 页），您将打开该书靠近中间的位置，并确定打开的页面是在您要查找的页面之前还是之后。然后你会选择你已经缩小到的部分并重复这个过程：将其分成两半并确定哪一半包含第 147 页。更好的是，你可以猜测第 147 页有多远——如果这本书是非常长，接近一本小书的结尾，并使用该猜测作为第一个划分点。这种二分搜索的变体称为插值搜索。

因此，如果您有错误并且它可能隐藏在排序列表中，插值搜索通常是消除它的方法。其他答案解释了隐藏在一系列行或源代码提交中的错误的常见情况。但该技术可以应用于其他情况：

日志搜索

在一个长期运行的系统上，特别是在处理大量数据的系统上，您必须每天轮换日志，看到几周/几个月/几年前还好的东西今天损坏的情况并不少见。对于复杂的、互锁的系统，无需更改任何代码就可以发现错误。查找硬件、网络、操作系统、配置（尽管应该与代码一起存储）、输入、手动过程等中发生的变化可能很困难，因为其中很多东西都在很长一段时间内发生了变化时间段。日志的全文搜索（无论是在表中还是在文件中）通常是不切实际的。

在这种情况下，几乎没有其他选择，只能打开中间某处的日志，看看问题是否存在。然后剪切您知道隐藏错误的部分并再次查找错误。最终，您应该能够在错误出现的第一刻就发现，这使得找到罪魁祸首变得更加容易。
输入搜索内容

有一天，我注意到用长文本模糊“bug”。追踪有效文本和破坏系统的文本之间确切边界的最快方法是将文本切成两半，直到找到分界线。（事实证明我是个白痴，但我做得更好数香蕉。）
概念流程步骤

大多数人甚至不知道他们大部分时间都在使用二分（或者更好的是插值）搜索；这是解决问题的一种非常自然的方法。当考虑包含潜在错误的一长串步骤时，通常明智的做法是首先检查中间步骤之一的输出，以避免检查整个代码才发现问题出在最后一步。

Binary search is an efficient way to find an item in a sorted list. For instance, if you are looking for a specific page in a book (say, pp. 147) you'd open the book near the middle and determine if the opened page is before or after the page you are looking for. Then you'd pick the section you've narrowed it down to and repeat the process: split it in half and determine which half contains page 147. Even better, you can guess how far in page 147 is—not far if the book is very long and near the end of a short book—and use that guess as the first division point. This variation on binary search is called interpolation search.

So if you have a bug and a sorted list it might be hiding in, interpolation search is usually the way to squash it. Other answers explain the common cases of a bug hidden somewhere in a range of lines or source code commits. But the technique can be applied in other situations:

log search

On a long-running system, especially one that processes so much data you have to rotate your logs daily, it's not uncommon to see something broken today that was fine a few weeks/months/years ago. With a complicated, interlocking system, it's possible for bugs to be uncovered without any code changes. Finding what changed in the hardware, network, OS, configuration (though that should be stored along with the code), input, manual procedures, etc. can be difficult since so many of these things change over long time periods. Full text searches of the logs (whether in a table or in files) is often impractical.

In this case, there's hardly any choice but to open the logs somewhere in the middle and see if the problem exists or not. Then cut the section where you know the bug is hiding and look for the bug again. Eventually, you should be able to discover the first moment your bug showed up, which makes finding the culprit a lot easier.
input search

The other day, I noticed an obscure "bug" with long text. The fastest way to track down the exact boundary between text that worke and text that broke the system was to cut the text in half until I found the dividing line. (Turns out I'm an idiot, but I did better counting bananas.)
conceptual process steps

Most people don't even know that they are using binary (or better, interpolation) search most of the time; it's a really a natural way to break down a problem. When thinking about a long series of steps that includes a potential bug, it's often sensible to check the output of one of the middle steps first to avoid examining the entire code only to find the problem is in the last step.

回复收藏 0 原文

歌入人心 2024-07-26 04:50:12

如果您不知道 100 行程序中的哪一行有错误，您会尝试运行前 50 行并跳过其余的。如果问题出现，您就会知道第一段包含错误。接下来，您将尝试将其拆分并运行前 25 行，看看问题是否存在，依此类推，直到您找出足够短的部分来查看。

二分搜索背后的想法是识别/隔离有问题的小区域。然而，与所有方法一样，这并不适用于所有情况。例如：递归函数对于这样的工具来说会很笨重。当执行路径太多时，对要运行的代码进行分段可能会变得困难。

回复收藏 0 原文

夜光 2024-07-26 04:50:12

另一种可能性是您有一个错误，并且您知道它不存在于您的 2 月份版本中，但它存在于您的 4 月份版本中（或者更确切地说，您的 4 月份版本候选中 - 您实际上永远不会向您的用户发送错误，对吧？）。

您可以通过修订控制历史记录进行手动二分搜索，以缩小引入错误的时间。首先检查两个版本之间的代码，构建它，看看是否存在错误。继续分区，直到找出它是何时引入的。如果您不知道从哪里开始寻找错误，这可能非常有效，特别是当您进行相当小的提交时。

这与 Subversion 配合得很好，因为它具有存储库范围的修订号。如果您 2 月份的版本是 533 版，4 月份的版本是 701 版，那么您可以更新到 617 版，进行测试，然后从那里开始。（实际上，我通常四舍五入到 600，所以我不必在脑子里做那么多数学。）一旦我开始缩小范围，我就开始查看提交评论并做出有根据的猜测（“我真的不知道我认为这个提交会破坏它”），所以我通常不需要执行所有 log₂(n) 签出。

我从未使用过 Git，但他们通过内置的“bisect”命令。你给它一个起点（什么时候知道它可以工作？）和终点（你什么时候注意到它坏了？），它会自动获取二分搜索中中间点的代码。然后，在构建和测试之后，您可以告诉它此版本是通过还是失败；然后它获取下一个中点的代码。您甚至可以告诉它为每个转速运行一个命令，并使用该命令的退出代码来确定该转速是通过还是失败，此时它可以全自动运行。

回复收藏 0 原文

眉目亦如画i 2024-07-26 04:50:12

二分查找可以通过以下方式帮助调试：

假设控制必须达到某一点，而您怀疑它没有达到。将 print 语句放在第一行和最后一行代码中。假设您看到第一个语句的结果，但没有看到第二个语句的结果。中间放一条打印语句，然后重试。通过这种方式，您可以在代码行空间中使用二分搜索来将错误归零。
假设您使用版本控制系统。版本 10 通过了所有测试。即将发布的版本 70 未通过一些测试。查看版本 40 并对其运行测试。如果工作正常，请尝试版本 55。如果版本 40 失败，请尝试版本 25。这样，您可以在程序版本空间上使用二分搜索，以便将错误进入程序的第一个版本归零。

回复收藏 0 原文

ζ澈沫 2024-07-26 04:50:12

假设您有一个错误，但您不知道它在哪里。您可以随机放置断点或单步执行代码，在每个停止点验证数据。然而，更好的策略是在您正在查看的代码块中间选择一个位置。如果那里存在问题，则在开始位置和当前位置之间选择一个位置，然后重试。如果问题不存在，则选择当前位置和末尾之间的中间位置，然后重试。继续这种方式，直到您将代码量缩小到足够大的块，以便比停止/重新启动更有效地单步执行。这基本上是对你的代码进行二分搜索。