二分查找在实践中用在什么地方?
每个程序员都被告知,二分搜索是搜索有序数据列表的一种又好又快的方法。 有很多使用二分搜索的玩具教科书示例,但是在实际编程中呢:二分搜索在现实程序中实际使用在哪里?
Every programmer is taught that binary search is a good, fast way to search an ordered list of data. There are many toy textbook examples of using binary search, but what about in real programming: where is binary search actually used in real-life programs?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(22)
二分查找随处可见。 从任何语言库(Java、.NET、C++ STL 等)中获取任何已排序的集合,它们都将使用(或可以选择使用)二分搜索来查找值。 虽然您确实很少需要实现它,但您仍然需要了解其背后的原理才能利用它。
Binary search is used everywhere. Take any sorted collection from any language library (Java, .NET, C++ STL and so on) and they all will use (or have the option to use) binary search to find values. While true that you have to implement it rarely, you still have to understand the principles behind it to take advantage of it.
当内存空间紧张时,二分查找可用于快速访问有序数据。 假设您想要将一组 100.000 个 32 位整数存储在可搜索的有序数据结构中,但您不会经常更改该组。 您可以轻松地将整数存储在 400.000 字节的排序数组中,并且可以使用二分搜索来快速访问它。 但是,如果您将它们放入 B 树、RB 树或任何“更动态”的数据结构中,您就会开始产生内存开销。 为了说明这一点,将整数存储在具有左子节点和右子节点指针的任何类型的树中都会使您消耗至少 1.200.000 字节的内存(假设是 32 位内存架构)。 当然,您可以进行一些优化,但这就是它的一般工作原理。
由于更新有序数组(执行插入或删除)非常慢,因此当数组经常更改时,二分查找没有用处。
这里有一些我使用二分搜索的实际例子:
Binary search can be used to access ordered data quickly when memory space is tight. Suppose you want to store a set of 100.000 32-bit integers in a searchable, ordered data structure but you are not going to change the set often. You can trivially store the integers in a sorted array of 400.000 bytes, and you can use binary search to access it fast. But if you put them e.g. into a B-tree, RB-tree or whatever "more dynamic" data structure, you start to incur memory overhead. To illustrate, storing the integers in any kind of tree where you have left child and right child pointers would make you consume at least 1.200.000 bytes of memory (assuming 32-bit memory architecture). Sure, there are optimizations you can do, but that's how it works in general.
Because it is very slow to update an ordered array (doing insertions or deletions), binary search is not useful when the array changes often.
Here some practical examples where I have used binary search:
每个程序员在调试时都需要知道如何使用二分查找。
当您有一个程序,并且您知道错误在特定点可见时
在程序执行过程中,可以使用二分查找来精确定位
实际发生的地方。 这比单步执行要快得多
大部分代码。
Every programmer needs to know how to use binary search when debugging.
When you have a program, and you know that a bug is visible at a particular point
during the execution of the program, you can use binary search to pin-point the
place where it actually happens. This can be much faster than single-stepping through
large parts of the code.
二分查找是一种又好又快的方法!
在 STL 和 .NET 框架等出现之前,您经常会遇到需要推出自己的自定义集合类的情况。 只要排序数组是存储数据的可行位置,二分搜索就是在该数组中查找条目的方法。
我很确定二分搜索如今也得到了广泛使用,尽管为了方便起见,图书馆在“幕后”处理了它。
Binary search is a good and fast way!
Before the arrival of STL and .NET framework, etc, you rather often could bump into situations where you needed to roll your own customized collection classes. Whenever a sorted array would be a feasible place of storing the data, binary search would be the way of locating entries in that array.
I'm quite sure binary search is in widespread use today as well, although it is taken care of "under the hood" by the library for your convenience.
我在 BTree 实现中实现了二分搜索。
BTree 搜索算法用于查找下一个要读取的节点块,但在 4K 块本身(包含基于键大小的多个键)内,二分搜索用于查找记录号(对于叶节点) )或下一个块(对于非叶节点)。
与顺序搜索相比,速度快得令人眼花缭乱,因为就像平衡二叉树一样,每次检查都会删除一半的剩余搜索空间。
I've implemented binary searches in BTree implementations.
The BTree search algorithms were used for finding the next node block to read but, within the 4K block itself (which contained a number of keys based on the key size), binary search was used for find either the record number (for a leaf node) or the next block (for a non-leaf node).
Blindingly fast compared to sequential search since, like balanced binary trees, you remove half the remaining search space with every check.
我曾经实现过它(甚至不知道这确实是二分搜索),用于在图形中显示二维数据的 GUI 控件。 单击鼠标应将数据光标设置到最接近 x 值的点。 当处理大量点时(几个 1000,这要追溯到 x86 刚刚开始获得超过 100 MHz CPU 频率时),这并不是真正可以交互使用的 - 我从一开始就进行线性搜索。 经过一番思考后,我想到可以采用分而治之的方式来解决这个问题。 我花了一些时间让它在所有边缘情况下都能工作。
过了一段时间我才知道这确实是一个基础的CS算法……
I once implemented it (without even knowing that this was indeed binary search) for a GUI control showing two-dimensional data in a graph. Clicking with the mouse should set the data cursor to the point with the closest x value. When dealing with large numbers of points (several 1000, this was way back when x86 was only beginning to get over 100 MHz CPU frequency) this was not really usable interactively - I was doing a linear search from the start. After some thinking it occurred to me that I could approach this in a divide and conquer fashion. Took me some time to get it working under all edge cases.
It was only some time later that I learned that this is indeed a fundamental CS algorithm...
stl 集就是一个例子。 底层数据结构是平衡二叉搜索树,由于二分搜索,它支持 O(log n) 的查找、插入和删除。
另一个例子是在对数时间内运行的整数除法算法。
One example is the stl set. The underlying data structure is a balanced binary search tree which supports look-up, insertion, and deletion in O(log n) due to binary search.
Another example is an integer division algorithm that runs in log time.
我们仍然在代码中大量使用它,每秒搜索数千个 ACLS 数千次。 它很有用,因为一旦 ACL 从文件中传入,它们就是静态的,并且当我们在启动时添加数组时,我们可能会承受增加数组的费用。 一旦运行起来也非常快。
当您最多可以在 7 次比较/跳转中搜索 255 个元素的数组时(8 次比较/跳转 511 次,9 次跳转 1023 次,等等),您可以看到二分搜索的速度大约是您能达到的最快速度。
We still use it heavily in our code to search thousands of ACLS many thousands of times a second. It's useful because the ACLs are static once they come in from file, and we can suffer the expense of growing the array as we add to it at bootup. Blazingly fast once its running too.
When you can search a 255 element array in at most 7 compare/jumps (511 in 8, 1023 in 9, etc) you can see that binary search is about as fast as you can get.
嗯,二分搜索现在用于 99% 的 3D 游戏和应用程序。 空间被划分为树结构,并使用二分搜索来根据 3D 位置和摄像机检索要显示的细分。
它的第一个最伟大的展示之一是《毁灭战士》。 二叉树和相关搜索增强了渲染。
Well, binary search is now used in 99% of 3D games and applications. Space is divided into a tree structure and a binary search is used to retrieve which subdivisions to display according to a 3D position and camera.
One of its first greatest showcase was Doom. Binary trees and associated search enhanced the rendering.
用实际例子回答你的问题。
在R编程语言中,有一个包data.table。 它由C实现、语法短、数据转换的高性能扩展而闻名。 它使用二分搜索。 即使没有二分搜索,它的扩展性也比竞争对手更好。
您可以在项目 wiki 分组 2E9 - 随机顺序数据。
还有很好的基准测试、数据库与大数据benchm-databases。
在最近的 data.table 版本 (1.9.6) 中,二分搜索得到了扩展,现在可以用作任何原子列上的索引。
我刚刚找到了一个很好的总结,我完全同意 - 参见。
所以,是的,二分搜索正在被使用,并且由于它,世界变得更加美好。
Answering your question with hands-on example.
In R programming language there is a package data.table. It is known from C-implemented, short syntax, high performance extension for data transformation. It uses binary search. Even without binary search it scales better than competitors.
You can find benchmark vs python pandas and vs R dplyr in project wiki grouping 2E9 - random order data.
There is also nice benchmark vs databases vs bigdata benchm-databases.
In recent data.table version (1.9.6) binary search was extended and now can be used as index on any atomic column.
I just found a nice summary with which I totally agree - see.
So yes, binary search is being used and world is much better place thanks to it.
二进制搜索可用于使用 Git 进行调试。 它称为 git bisect。
Binary search can be used to debug with Git. It's called git bisect.
除其他地方外,我有一个解释器,其中包含命令名称表和指向解释该命令的函数的指针。 大约有 60 个命令。 使用线性搜索不会非常繁重 - 但我使用二分搜索。
Amongst other places, I have an interpreter with a table of command names and a pointer to the function to interpret that command. There are about 60 commands. It would not be incredibly onerous to use a linear search - but I use a binary search.
用于测量数字定时或模拟电平的半导体测试程序广泛使用二分搜索。 Advantest、Teradyne、Verigy 等公司的自动测试设备 (ATE) 可以被视为真值表爆破器,应用输入逻辑并验证数字部分的输出状态。
考虑一个简单的门,输入逻辑在每个周期的时间 = 0 处发生变化,并在输入逻辑变化后转换 X ns。 如果在 T=X 之前选通输出,则逻辑与预期值不匹配。 选通晚于时间 T=X,并且逻辑确实符合预期值。 二分查找用于查找逻辑不匹配的最新值与最早匹配的部分之间的阈值。(Teradyne FLEX 系统将时序解析为 39pS 分辨率,其他测试仪具有可比性)。 这是测量过渡时间的简单方法。 相同的技术可用于求解建立时间、保持时间、可操作电源电平、电源与延迟等。
任何类型的微处理器、存储器、FPGA、逻辑和许多模拟混合信号电路都在测试和表征中使用二分搜索。
——迈克
Semiconductor test programs used for measuring digital timing or analog levels make extensive use of binary search. Automatic Test Equipment (ATE) from Advantest, Teradyne, Verigy and the like can be thought of as truth table blasters, applying input logic and verifying output states of a digital part.
Think of a simple gate, with the input logic changing at time = 0 of each cycle, and transitioning X ns after the input logic changes. If you strobe the output before T=X,the logic does not match expected value. Strobe later than time T=X, and the logic does match expected value. Binary search is used to find the threshold between the latest value that the logic does not match, and the earliest part where it does.(A Teradyne FLEX system resolves timing to 39pS resolution, other testers are comparable). That's a simple way to measure transition time. Same technique can be used to solve for setup time, hold time, operable power supply levels, power supply vs. delay,etc.
Any kind of microprocessor, memory, FPGA, logic, and many analog mixed signal circuits use binary search in test and characterization.
-- mike
我有一个程序可以迭代集合来执行一些计算。 我认为这效率低下,所以我对集合进行了排序,然后使用单个二分搜索来查找感兴趣的项目。 我退回了该物品及其匹配的邻居。 我实际上已经过滤了该集合。
这样做实际上比迭代整个集合并找出匹配的项目要慢。
我继续向集合中添加项目,因为我知道排序和搜索性能最终会赶上迭代。 需要收集大约 600 个物体才能达到相同的速度。 1000 个对象具有明显的性能优势。
我还会考虑您正在使用的数据类型、重复数据和传播数据。 这将对排序和搜索产生影响。
我的答案是尝试这两种方法并计时。
I had a program that iterated through a collection to perform some calculations. I thought that this was inefficient so I sorted the collection and then used a single binary search to find an item of interest. I returned this item and its matching neighbours. I had in-effect filtered the collection.
Doing this was actually slower than iterating the entire collection and fishing out matching items.
I continued to add items to the collection knowing that the sorting and searching performance would eventually catch up with the iteration. It took a collection of about 600 objects until the speed was identical. 1000 objects had a clear performance benefit.
I would also consider the type of data you are working with, the duplicates and spread. This will have an effect on the sorting and searching.
My answer is to try both methods and time them.
它是 hg bisect 的基础
It's the basis for hg bisect
二进制排序可用于将字体调整为文本框尺寸恒定的文本大小
Binary sort is useful in adjusting fonts to size of text with constant dimension of textbox
找到方程的根可能是您想要使用像二分搜索这样的非常简单的算法做的非常简单的事情之一。
Finding roots of an equation is probably one of those very easy things you want to do with a very easy algorithm like binary search.
Delphi 用户可以在排序的 TStringList 中搜索字符串时享受二分搜索。
Delphi uses can enjoy binary search while searching string in sorted TStringList.
我相信 .NET
SortedDictionary
在幕后使用二叉树(很像 STL 映射)...因此使用二分搜索来访问SortedDictionary
中的元素I believe that the .NET
SortedDictionary
uses a binary tree behind the scenes (much like the STL map)... so a binary search is used to access elements in theSortedDictionary
Python 的
list.sort()
方法使用 Timsort (AFAIK)使用二分查找来定位元素的位置。Python's
list.sort()
method uses Timsort which (AFAIK) uses binary search to locate the positions of elements.二分搜索提供了许多现成的地图/字典实现所没有的功能:查找不完全匹配。
例如,我使用二分搜索来实现基于 GPS 日志的照片地理标记:将所有 GPS 航路点放入按时间戳排序的数组中,并使用二分搜索来识别时间上与每张照片的时间戳最接近的航路点。
Binary search offers a feature that many readymade map/dictionary implementations don't: finding non-exact matches.
For example, I've used binary search to implement geotagging of photos based on GPS logs: put all GPS waypoints in an array sorted by timestamp, and use binary search to identify the waypoint that lies closest in time to each photo's timestamp.
如果要在数组中查找一组元素,您可以线性搜索每个元素或对数组进行排序,然后使用具有相同比较谓词的二分搜索。 后者要快得多。
If you have a set of elements to find in an array you can either search for each of them linearly or sort the array and then use binary search with the same comparison predicate. The latter is much faster.