调试和二分查找
第 2 栏(“啊哈!算法”)中的“编程珍珠”讨论了二分搜索如何帮助排序、树遍历等各种过程。 但它提到二进制搜索可以用于“程序调试”。 有人可以解释一下这是如何完成的吗?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
第 2 栏(“啊哈!算法”)中的“编程珍珠”讨论了二分搜索如何帮助排序、树遍历等各种过程。 但它提到二进制搜索可以用于“程序调试”。 有人可以解释一下这是如何完成的吗?
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(7)
二分搜索是在排序列表中查找项目的有效方法。 例如,如果您正在查找一本书中的特定页面(例如,第 147 页),您将打开该书靠近中间的位置,并确定打开的页面是在您要查找的页面之前还是之后。 然后你会选择你已经缩小到的部分并重复这个过程:将其分成两半并确定哪一半包含第 147 页。更好的是,你可以猜测第 147 页有多远——如果这本书是非常长,接近一本小书的结尾,并使用该猜测作为第一个划分点。 这种二分搜索的变体称为插值搜索。
因此,如果您有错误并且它可能隐藏在排序列表中,插值搜索通常是消除它的方法。 其他答案解释了隐藏在一系列行或源代码提交中的错误的常见情况。 但该技术可以应用于其他情况:
日志搜索
在一个长期运行的系统上,特别是在处理大量数据的系统上,您必须每天轮换日志,看到几周/几个月/几年前还好的东西今天损坏的情况并不少见。 对于复杂的、互锁的系统,无需更改任何代码就可以发现错误。 查找硬件、网络、操作系统、配置(尽管应该与代码一起存储)、输入、手动过程等中发生的变化可能很困难,因为其中很多东西都在很长一段时间内发生了变化时间段。 日志的全文搜索(无论是在表中还是在文件中)通常是不切实际的。
在这种情况下,几乎没有其他选择,只能打开中间某处的日志,看看问题是否存在。 然后剪切您知道隐藏错误的部分并再次查找错误。 最终,您应该能够在错误出现的第一刻就发现,这使得找到罪魁祸首变得更加容易。
输入搜索内容
有一天,我注意到用长文本模糊“bug”。 追踪有效文本和破坏系统的文本之间确切边界的最快方法是将文本切成两半,直到找到分界线。 (事实证明我是个白痴,但我做得更好数香蕉。)
概念流程步骤
大多数人甚至不知道他们大部分时间都在使用二分(或者更好的是插值)搜索; 这是解决问题的一种非常自然的方法。 当考虑包含潜在错误的一长串步骤时,通常明智的做法是首先检查中间步骤之一的输出,以避免检查整个代码才发现问题出在最后一步。
Binary search is an efficient way to find an item in a sorted list. For instance, if you are looking for a specific page in a book (say, pp. 147) you'd open the book near the middle and determine if the opened page is before or after the page you are looking for. Then you'd pick the section you've narrowed it down to and repeat the process: split it in half and determine which half contains page 147. Even better, you can guess how far in page 147 is—not far if the book is very long and near the end of a short book—and use that guess as the first division point. This variation on binary search is called interpolation search.
So if you have a bug and a sorted list it might be hiding in, interpolation search is usually the way to squash it. Other answers explain the common cases of a bug hidden somewhere in a range of lines or source code commits. But the technique can be applied in other situations:
log search
On a long-running system, especially one that processes so much data you have to rotate your logs daily, it's not uncommon to see something broken today that was fine a few weeks/months/years ago. With a complicated, interlocking system, it's possible for bugs to be uncovered without any code changes. Finding what changed in the hardware, network, OS, configuration (though that should be stored along with the code), input, manual procedures, etc. can be difficult since so many of these things change over long time periods. Full text searches of the logs (whether in a table or in files) is often impractical.
In this case, there's hardly any choice but to open the logs somewhere in the middle and see if the problem exists or not. Then cut the section where you know the bug is hiding and look for the bug again. Eventually, you should be able to discover the first moment your bug showed up, which makes finding the culprit a lot easier.
input search
The other day, I noticed an obscure "bug" with long text. The fastest way to track down the exact boundary between text that worke and text that broke the system was to cut the text in half until I found the dividing line. (Turns out I'm an idiot, but I did better counting bananas.)
conceptual process steps
Most people don't even know that they are using binary (or better, interpolation) search most of the time; it's a really a natural way to break down a problem. When thinking about a long series of steps that includes a potential bug, it's often sensible to check the output of one of the middle steps first to avoid examining the entire code only to find the problem is in the last step.
如果您不知道 100 行程序中的哪一行有错误,您会尝试运行前 50 行并跳过其余的。 如果问题出现,您就会知道第一段包含错误。 接下来,您将尝试将其拆分并运行前 25 行,看看问题是否存在,依此类推,直到您找出足够短的部分来查看。
二分搜索背后的想法是识别/隔离有问题的小区域。 然而,与所有方法一样,这并不适用于所有情况。 例如:递归函数对于这样的工具来说会很笨重。 当执行路径太多时,对要运行的代码进行分段可能会变得困难。
If you don't know which line in a 100-line program is buggy, you'd try to run the first 50 lines and skip the rest. If the problem shows up, you'll know this first segment contains the bug. You'd next try to split this and run the first 25 lines and see if the problem is there and so on until you have figured out a short enough piece to look at.
The idea behind binary search is to identify/isolate a small region that is buggy. However, as with all methods, this is not applicable in every situation. E.g: a recursive function will be unwieldy to such a tool. When there are far too many execution paths, segmenting your code to be run may become difficult.
另一种可能性是您有一个错误,并且您知道它不存在于您的 2 月份版本中,但它存在于您的 4 月份版本中(或者更确切地说,您的 4 月份版本候选中 - 您实际上永远不会向您的用户发送错误,对吧?)。
您可以通过修订控制历史记录进行手动二分搜索,以缩小引入错误的时间。 首先检查两个版本之间的代码,构建它,看看是否存在错误。 继续分区,直到找出它是何时引入的。 如果您不知道从哪里开始寻找错误,这可能非常有效,特别是当您进行相当小的提交时。
这与 Subversion 配合得很好,因为它具有存储库范围的修订号。 如果您 2 月份的版本是 533 版,4 月份的版本是 701 版,那么您可以更新到 617 版,进行测试,然后从那里开始。 (实际上,我通常四舍五入到 600,所以我不必在脑子里做那么多数学。)一旦我开始缩小范围,我就开始查看提交评论并做出有根据的猜测(“我真的不知道我认为这个提交会破坏它”),所以我通常不需要执行所有 log2(n) 签出。
我从未使用过 Git,但他们通过内置的“bisect”命令。 你给它一个起点(什么时候知道它可以工作?)和终点(你什么时候注意到它坏了?),它会自动获取二分搜索中中间点的代码。 然后,在构建和测试之后,您可以告诉它此版本是通过还是失败; 然后它获取下一个中点的代码。 您甚至可以告诉它为每个转速运行一个命令,并使用该命令的退出代码来确定该转速是通过还是失败,此时它可以全自动运行。
Another possibility is that you have a bug, and you know it wasn't there in your February release, but it was in your April release (or rather, your April release candidate -- you would never actually ship a bug to your users, right?).
You can do a manual binary search through your revision-control history to narrow down when the bug was introduced. First check out the code halfway between the two releases, build it, and see if the bug is there. Keep partitioning until you find out when it was introduced. If you don't know where to start looking for the bug, this can be very effective, especially if you do fairly small commits.
This works very well with Subversion because it has repository-wide revision numbers. If your February release was rev 533 and your April release was rev 701, then you update to rev 617, test it, and go from there. (Actually, I usually round to 600 so I don't have to do so much math in my head.) Once I start to narrow it down, I start looking at the commit comments and making educated guesses ("I really don't think this commit would have broken it"), so I usually don't need to do all log2(n) checkouts.
I've never used Git, but they take this one step further with the built-in "bisect" command. You give it a starting point (when was it known to work?) and ending point (when did you notice it was broken?), and it will automatically get the code for the halfway point in the binary search. Then after you've built and tested, you tell it whether this rev passed or failed; then it gets the code for the next halfway point. You can even tell it to run a command for each rev and use the command's exit code to determine whether the rev is a pass or fail, at which point it can run on full automatic.
二分查找可以通过以下方式帮助调试:
Binary search may help debugging in the following ways:
假设您有一个错误,但您不知道它在哪里。 您可以随机放置断点或单步执行代码,在每个停止点验证数据。 然而,更好的策略是在您正在查看的代码块中间选择一个位置。 如果那里存在问题,则在开始位置和当前位置之间选择一个位置,然后重试。 如果问题不存在,则选择当前位置和末尾之间的中间位置,然后重试。 继续这种方式,直到您将代码量缩小到足够大的块,以便比停止/重新启动更有效地单步执行。 这基本上是对你的代码进行二分搜索。
Let's say you have a bug, but you don't know where it is. You could place break points randomly or single step through the code, verifying the data at each stop. A better strategy, however, would be to pick a spot in the middle of the code block that you're looking at. If the problem exists there, then pick a spot mid-way between the start and the current spot and try it again. If the problem doesn't exist, then pick a spot midway between the current spot and the end, and try again. Keep going this way until you've narrowed the amount of code down to a block big enough to single step through more efficiently than stopping/restarting. That's basically doing binary search on your code.
您可以注释掉代码、添加日志注释或简单地设置断点,这
对于没有错误但功能不起作用的代码来说非常有用。 你充满了自我怀疑
首先将断点设置在代码的中间,如果一切正常那么你就知道问题不在那里
然后将它设置在代码点的 75% - 如果问题出现在这里你就知道了它位于 50% 和 50% 之间的代码中 75%
所以接下来你将其设置为 57%
再次,如果问题存在,那么你再次将其分成两半
基本上你可以在几分钟内找到问题,而不是花费大量时间重新分析你的代码
那么它仍然取决于你来修复它。
you can comment out code, add a logging comment or simply set the break point
great for code with no error but a nonfunctioning feature & you are filled with self doubt
First set the break point smack at the middle of the code, if all is well then you know the problem is not there
then set it at the 75% of code point - if the issue arises here then you know it is in the code between 50% & 75%
So next you set it at 57%
Again if the problem is there then you split it in half again
Basically you can find the issue in a few minutes rather than spending intellectually hours reanalyzing your code
Then its still up to you to fix it.
完整的算法称为“Delta 调试”,由 Andreas Zeller 开发,他是一位信息学教授,也是该书的作者 为什么程序失败。
然而,这不仅仅是二分搜索。 二分搜索仅在开始时进行,一旦二分搜索不再最小化输入,就会采取另一种方法。
完整的算法并不难理解,实际上非常简单。 然而,有时很难重现错误并应用问题是否已重现的决定。
除了这本书之外,Udacity 上还有免费的在线课程。 如果您喜欢简短版本,请阅读他的 IEEE 论文< /a>
The full algorithm is called Delta Debugging and was developed by Andreas Zeller, a Professor of informatics and author of the book Why programs fail.
However, this is not a binary search only. The binary search is only done in the beginning and once binary search does not minimize the input any more, another approach is taken.
The complete algorithm is not so hard to understand, actually very simple. However, it's sometimes difficult to reproduce the bug and apply the decision whether or not the issue has been reproduced.
Besides the book, there is a free online course on Udacity. If you prefer the short version, read his IEEE paper