Most efficient way to search a sorted matrix?
I have an assignment to write an algorithm (not in any particular language, just pseudo-code) that receives a matrix [size: M x N] in which every row is sorted and every column is sorted individually, and finds a certain value within this matrix. I need to write the most time-efficient algorithm I can think of.
The matrix looks something like:
1 3 5
4 6 8
7 9 10
My idea is to start at the first row and last column and simply check the value: if the target is bigger, go down; if it is smaller, go left; and keep doing so until the value is found or until the indexes are out of bounds (in case the value does not exist). This algorithm works at linear complexity O(m+n). I've been told that it's possible to do so with a logarithmic complexity. Is it possible? And if so, how?
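The walk described above can be sketched in Python as follows. This is only a minimal illustration of the idea in the question; the function name `staircase_search` and the choice to return a `(row, col)` tuple are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def staircase_search(matrix: List[List[int]], target: int) -> Optional[Tuple[int, int]]:
    """Return (row, col) of target in a row- and column-sorted matrix, or None."""
    if not matrix or not matrix[0]:
        return None
    row, col = 0, len(matrix[0]) - 1  # start at the first row, last column
    while row < len(matrix) and col >= 0:
        value = matrix[row][col]
        if value == target:
            return (row, col)
        if target > value:
            row += 1  # the rest of this row (to the left) is too small
        else:
            col -= 1  # the rest of this column (below) is too large
    return None  # indexes went out of bounds: the value is absent


matrix = [[1, 3, 5], [4, 6, 8], [7, 9, 10]]
print(staircase_search(matrix, 6))
print(staircase_search(matrix, 2))
```

Each iteration moves either one row down or one column left, so the loop runs at most M + N times.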
Comments (9)
Your matrix looks like this:
and has the following properties:
So the value in the bottom-right corner (e.g. i) is always the biggest in the whole matrix, and this property is recursive if you divide the matrix into 4 equal pieces.
So we could try to use binary search:
Hence the algorithm could look like this:
This looks to me like O(log n), where n is the number of elements in the matrix. It is a kind of binary search, but in two dimensions. I cannot prove it formally, but it resembles a typical binary search.
And that's how the sample input looks? Sorted by diagonals? That's an interesting sort, to be sure.
Since the following row may contain values that are lower than some values on this row, you can't assume anything in particular about a given row of data.
I would (if asked to do this over a large input) read the matrix into a list of tuples, each holding the value together with its (row, column) coordinate, then quicksort the list once and find entries by value.
Alternatively, if the value at each individual location is unique, toss the M x N data into a dictionary keyed on the value, then jump straight to the dictionary entry for the input key (or the hash of the input key).
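A small Python sketch of the dictionary idea, for the case where the matrix is queried many times. The name `build_index` is hypothetical, and storing a list of coordinates per value is an assumption added here so that duplicate values do not overwrite each other; with unique values each list has exactly one entry.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(matrix: List[List[int]]) -> Dict[int, List[Tuple[int, int]]]:
    """Map every value to the list of (row, col) positions where it occurs."""
    index: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            index[value].append((r, c))
    return index


matrix = [[1, 3, 5], [4, 6, 8], [7, 9, 10]]
index = build_index(matrix)   # one pass over all M x N entries
print(index.get(6, []))       # later lookups are a single hash access
print(index.get(2, []))
```

Building the index touches every element once, so it only pays off when the same matrix is searched repeatedly.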
EDIT:
Notice that the answer I give above is valid if you're going to look through the matrix more than once. If you only need to parse it once, then this is as fast as you can do it:
Apparently my comment on the question should go down here too :|
Checking the end of each row of an in-memory array (starting at the end of the middle row) to find a number higher than the one being searched for would be fastest, then doing a binary search on each matching row till you find it.
In O(log M) you can get the range of rows able to contain the target: binary search on the first value of each row, binary search on the last value of each row, and keep only those rows whose first value <= target and last value >= target. Two binary searches are still O(log M).
Then in O(log N) you can explore each of these rows with, again, a binary search!
That makes it O(log M x log N).
Tadaaaa
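Here is a rough Python sketch of that row-filtering approach using the standard `bisect` module. The function name `search_candidate_rows` is made up for the example, and copying the first and last columns into lists is a readability shortcut that itself costs O(M); a real implementation would binary search the columns in place.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple


def search_candidate_rows(matrix: List[List[int]], target: int) -> Optional[Tuple[int, int]]:
    """Filter rows by their first/last values, then binary search each survivor."""
    if not matrix or not matrix[0]:
        return None
    first_col = [row[0] for row in matrix]   # sorted, because columns are sorted
    last_col = [row[-1] for row in matrix]   # sorted for the same reason
    hi = bisect_right(first_col, target)     # rows [0, hi) have first value <= target
    lo = bisect_left(last_col, target)       # rows [lo, M) have last value >= target
    for r in range(lo, hi):                  # only rows satisfying both conditions
        c = bisect_left(matrix[r], target)
        if c < len(matrix[r]) and matrix[r][c] == target:
            return (r, c)
    return None


matrix = [[1, 3, 5], [4, 6, 8], [7, 9, 10]]
print(search_candidate_rows(matrix, 9))
```

Note that the loop runs one binary search per candidate row, so the total cost depends on how many rows survive the filter.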
This is in the vein of Michal's answer (from which I will steal the nice graphic).
Matrix:
Min and max are the smallest and largest values, respectively. "mid" is not necessarily the average/median/whatever value.
We know that the value at mid is >= all values in quadrant II, and <= all values in quadrant IV. We cannot make such claims for quadrants I and III. If we recurse, we can eliminate one quadrant at each level.
Thus, if the target value is less than mid, we must search quadrants I, II, and III. If the target value is greater than mid, we must search quadrants I, III, and IV.
The space reduces to 3/4 its previous at each step:
$n \cdot (3/4)^x = 1$
$n = (4/3)^x$
$x = \log_{4/3}(n)$
Logarithms differ by a constant factor, so this is O(log(n)).
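The quadrant elimination described in this answer can be sketched recursively in Python. This is an illustrative reading, not the answerer's code: the name `quadrant_search`, the half-open index ranges, and the exact placement of the split lines are assumptions. Quadrant II is taken to be the top-left block and quadrant IV the bottom-right block, matching the statement that mid is >= everything in II and <= everything in IV.

```python
from typing import List, Optional, Tuple


def quadrant_search(matrix: List[List[int]], target: int) -> Optional[Tuple[int, int]]:
    """Recursive search that discards one quadrant around the middle element."""

    def search(r0: int, r1: int, c0: int, c1: int) -> Optional[Tuple[int, int]]:
        # Region is rows [r0, r1) and columns [c0, c1).
        if r0 >= r1 or c0 >= c1:
            return None
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        mid = matrix[rm][cm]
        if mid == target:
            return (rm, cm)
        if target < mid:
            # Everything at rows >= rm and cols >= cm is >= mid: drop it.
            regions = [(r0, rm, c0, cm), (r0, rm, cm, c1), (rm, r1, c0, cm)]
        else:
            # Everything at rows <= rm and cols <= cm is <= mid: drop it.
            regions = [(r0, rm + 1, cm + 1, c1), (rm + 1, r1, c0, cm + 1),
                       (rm + 1, r1, cm + 1, c1)]
        for region in regions:
            found = search(*region)
            if found is not None:
                return found
        return None

    if not matrix or not matrix[0]:
        return None
    return search(0, len(matrix), 0, len(matrix[0]))


matrix = [[1, 3, 5], [4, 6, 8], [7, 9, 10]]
print(quadrant_search(matrix, 10))
```

Each call compares against one element and then recurses into up to three of the four blocks, which is the "eliminate one quadrant at each level" step from the text.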
What about extracting the diagonal and binary searching over it? Start at the bottom right and check whether the target is above; if yes, take the diagonal array position as the column it is in; if not, check whether it is below. Each time you have a hit on the diagonal, run a binary search on the column (using the array position on the diagonal as the column index). I think this is what was stated by @user942640.
You could get the running time of the above and, when required (at some point), swap the algorithm to do a binary search on the initial diagonal array. This assumes the matrix has n * n elements, so getting the x or y length is O(1), as x.length = y.length. Even on a million * million matrix, binary search the diagonal: if the target is less, take a half step back up the diagonal; if it is not less, binary search back towards where you were (this is a slight change to the algorithm when doing a binary search along the diagonal). I think the diagonal is better than the binary search down the rows; I'm just too tired at the moment to look at the maths :)
By the way, I believe running time is slightly different from analysis, which you would describe in terms of best/worst/average case, time against memory size, etc. So the question would be better stated as 'what is the best running time in worst-case analysis', because in the best case you could do a brute-force linear scan and the item could be in the first position, and this would be a better 'running time' than binary search...
Here is a lower bound of n. Start with an unsorted array A of length n. Construct a new matrix M according to the following rule: the secondary diagonal contains the array A, everything above it is minus infinity, everything below it is plus infinity. The rows and columns are sorted, and looking for an entry in M is the same as looking for an entry in A.
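To make the construction concrete, here is a short Python sketch that builds such a matrix and checks that its rows and columns are sorted. The helper names `embed_on_antidiagonal` and `is_sorted` are invented for this example, and `math.inf` stands in for the plus and minus infinity entries.

```python
import math
from typing import List, Sequence


def embed_on_antidiagonal(a: Sequence[float]) -> List[List[float]]:
    """Place the unsorted array on the secondary diagonal of an n x n matrix."""
    n = len(a)
    m = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            if r + c < n - 1:
                m[r][c] = -math.inf   # above the secondary diagonal
            elif r + c > n - 1:
                m[r][c] = math.inf    # below the secondary diagonal
            else:
                m[r][c] = a[r]        # on the secondary diagonal
    return m


def is_sorted(values: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(values, values[1:]))


a = [7, 2, 9, 4]                      # deliberately unsorted
m = embed_on_antidiagonal(a)
print(all(is_sorted(row) for row in m))
print(all(is_sorted(col) for col in zip(*m)))
```

Every row and column has the shape "minus infinities, one element of A, plus infinities", so it is sorted regardless of the order of A. Any search of M for a finite value therefore has to locate it among the n unsorted diagonal entries.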
JavaScript solution:
This is the wrong answer.
I am really not sure if any of the answers are optimal. I am going at it.
I think the time complexity is 2 * (log m + log n).
You can reduce the constant, if the input array is square (n * n), by binary searching along the diagonal.