缺失号码面试问题 Redux
确定 1 到 N 范围内的缺失值的常见面试问题已经被重复了一千次。变化包括 2 个缺失值,最多 K 个缺失值。
示例问题:范围 [1,10] (1 2 4 5 7 8 9 10) = {3,6}
以下是各种解决方案的示例:
简单的面试问题变得更难:给定数字 1..100,找到缺失的数字
我的问题是,将一个缺失值的简单情况视为 O(n) 复杂度,并且较大情况的复杂度收敛于大约大于 O(nlogn) 的值:
这不是更容易回答吗通过对范围进行排序(合并排序)并对其进行迭代来观察丢失的元素来解决问题?
此解决方案的时间不应超过 O(nlogn),并且能够解决 1 到 N 以外的范围的问题,例如 10 到 1000 或 -100 到 +100 等。 对于大量缺失值,
是否有理由相信上述 SO 链接中给出的解决方案会比基于排序的解决方案更好?
注意:这个问题似乎有很多常见的解决方案,都假设仅采用数论方法。如果在 S/E 面试中被问到这样的问题,假设该方法与数论解决方案的复杂性相当,那么使用更计算机科学/算法的方法不是明智的选择吗
?相关链接:
The common interview problem of determining the missing value in a range from 1 to N has been done a thousand times over. Variations include 2 missing values up to K missing values.
Example problem: Range [1,10] (1 2 4 5 7 8 9 10) = {3,6}
Here is an example of the various solutions:
Easy interview question got harder: given numbers 1..100, find the missing number(s)
My question is that seeing as the simple case of one missing value is of O(n) complexity and that the complexity of the larger cases converge at roughly something larger than O(nlogn):
Couldn't it just be easier to answer the question by saying sort (mergesort) the range and iterate over it observing the missing elements?
This solution should take no more than O(nlogn) and is capable of solving the problem for ranges other than 1-to-N such as 10-to-1000 or -100 to +100 etc...
Is there any reason to believe that the given solutions in the above SO link will be better than the sorting based solution for larger number of missing values?
Note: It seems a lot of the common solutions to this problem, assume an only number theoretic approach. If one is being asked such a question in an S/E interview wouldn't it be prudent to use a more computer science/algorithmic approach, assuming the approach is on par with the number theoretic solution's complexity...
More related links:
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我已经回答了此处
您还可以创建一个大小为
last_element_in_the_existing_array + 1
的布尔数组。在
for
循环中,标记现有数组中存在的所有元素true
。在另一个 for 循环中打印包含 false 的元素的索引,也称为缺失元素。
时间复杂度:
O(last_element_in_the_existing_array)
空间复杂度:
O(array.length)
I already answered it HERE
You can also create an array of boolean of the size
last_element_in_the_existing_array + 1
.In a
for
loop mark all the elementtrue
that are present in the existing array.In another
for
loop print the index of the elements which containsfalse
AKA The missing ones.Time Complexity:
O(last_element_in_the_existing_array)
Space Complexity:
O(array.length)
如果范围提前给你,在这种情况下范围是[1,10],你可以用你的范围和给你的数字执行异或运算。由于 XOR 是可交换运算。您将剩下 {3,6}
(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) ={3,6}
If the range is given to you well ahead, in this case range is [1,10] you can perform XOR operation with your range and the numbers given to you. Since XOR is commutative operation. You will be left with {3,6}
(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) ={3,6}
您仅指定时间复杂度,但空间复杂度也很重要。
问题的复杂性可以用
N
(范围的长度)和K
(缺失元素的数量)来指定。在您链接的问题中,使用方程的解在空间中是 O(K) (或者可能更多一点?),因为每个未知值需要一个方程。
还有一个保存点:您可以更改已知元素列表吗?在许多情况下,这是不希望的,在这种情况下,任何涉及对元素重新排序或消耗它们的解决方案都必须首先在空间中制作一个副本,O(NK)。
我无法比线性解决方案更快地看到:您需要读取所有已知元素(NK)并输出所有未知元素(K)。因此,你无法及时获得比 O(N) 更好的结果。
让我们分解解
就个人而言,虽然我发现方程系统解决方案很聪明,但我可能会使用其中任何一种排序解决方案。让我们面对现实吧:它们的编码要简单得多,尤其是计数排序!
就时间而言,在实际执行中,我认为“计数排序”将轻松击败所有其他解决方案。
注意:计数排序不要求范围为
[0, X)
,任何范围都可以,因为任何有限范围都可以转置为[ 0, X)
通过简单翻译形成。编辑:
将排序更改为 O(N),需要拥有所有可用于对它们进行排序的元素。
经过一些时间思考这个问题后,我还提出了另一个解决方案。如前所述,当 N (急剧)增长时,所需的空间可能会爆炸。但是,如果 K 很小,那么我们可以使用间隔来更改列表的表示:
{4, 5, 3, 1, 7}
可以表示为
[1,1] U [3,5] U [7,7]
在一般情况下,维护间隔的排序列表比维护元素的排序列表成本要低得多,而且推断缺失的数字也很容易。
时间复杂度很简单:O(N log N),毕竟它基本上是一种插入排序。
当然,真正有趣的是,不需要实际存储列表,因此您可以将其通过流提供给算法。
另一方面,我很难计算出平均空间复杂度。 “最终”占用的空间是 O(K)(最多 K+1 个间隔),但在构造过程中,由于我们没有按特定顺序引入元素,因此会出现更多缺失的间隔。
最坏的情况很简单:N/2 个间隔(考虑奇数与偶数)。但我无法弄清楚平均情况。我的直觉告诉我它应该比 O(N) 更好,但我不太相信。
You are only specifying the time complexity, but the space complexity is also important to consider.
The problem complexity can be specified in term of
N
(the length of the range) andK
(the number of missing elements).In the question you link, the solution of using equations is O(K) in space (or perhaps a bit more ?), as you need one equation per unknown value.
There is also the preservation point: may you alter the list of known elements ? In a number of cases this is undesirable, in which case any solution involving reordering the elements, or consuming them, must first make a copy, O(N-K) in space.
I cannot see faster than a linear solution: you need to read all known elements (N-K) and output all unknown elements (K). Therefore you cannot get better than O(N) in time.
Let us break down the solutions
Personally, though I find the equation system solution clever, I would probably use either of the sorting solutions. Let's face it: they are much simpler to code, especially the counting sort one!
And as far as time goes, in a real execution, I think the "counting sort" would beat all other solutions hands down.
Note: the counting sort does not require the range to be
[0, X)
, any range will do, as any finite range can be transposed to the[0, X)
form by a simple translation.EDIT:
Changed the sort to O(N), one needs to have all the elements available to sort them.
Having had some time to think about the problem, I also have another solution to propose. As noted, when N grows (dramatically) the space required might explode. However, if K is small, then we could change our representation of the list, using intervals:
{4, 5, 3, 1, 7}
can be represented as
[1,1] U [3,5] U [7,7]
In the average case, maintaining a sorted list of intervals is much less costly than maintaining a sorted list of elements, and it's as easy to deduce the missing numbers too.
The time complexity is easy: O(N log N), after all it's basically an insertion sort.
Of course what's really interesting is that there is no need to actually store the list, thus you can feed it with a stream to the algorithm.
On the other hand, I have quite a hard time figuring out the average space complexity. The "final" space occupied is O(K) (at most K+1 intervals), but during the construction there will be much more missing intervals as we introduce the elements in no particular order.
The worst case is easy enough: N/2 intervals (think odd vs even numbers). I cannot however figure out the average case though. My gut feeling is telling me it should be better than O(N), but I am not that trusting.
理论上给定的解决方案是否优于排序方案取决于 N 和 K。虽然您的解决方案的复杂度为
O(N*log(N))
,但给定的解决方案为O(N *K)
。我认为给定的解决方案(与排序解决方案相同)能够通过将范围[A, B]
转换为[A, B]
来解决任何范围[A, B]
代码>[1,N]。Whether the given solution is theoretically better than the sorting one depends on N and K. While your solution has complexity of
O(N*log(N))
, the given solution isO(N*K)
. I think that the given solution is (same as the sorting solution) able to solve any range[A, B]
just by transforming the range[A, B]
to[1, N]
.这又如何呢?
您的集合中剩下的是丢失的数字。
What about this?
What's left in your set are the missing numbers.
2011 年(在您发布此问题之后)
Caf
发布了 一个简单的答案,可以在O(n)
时间和O(k)
空间中解决问题 [其中数组大小为n - k< /代码>]。
重要的是,与其他解决方案不同,Caf 的答案没有隐藏的内存要求(使用位数组、向元素添加数字、将元素乘以
-1
- 这些都需要O(log(n) )
空格)。注意:这里的问题(以及原始问题)没有询问问题的流版本,并且这里的答案不处理这种情况。
关于其他答案:我同意针对这个问题提出的许多“解决方案”都有可疑的复杂性声明,并且如果它们的时间复杂度在某种程度上并不比其中任何一个更好:
O(n)时间和空间)
O(n*log(n))
时间,O(1)
空间)...然后你可以那么就通过排序来解决问题吧。
然而,我们可以获得更好的复杂性(更重要的是,真正更快的解决方案):
In 2011 (after you posted this question)
Caf
posted a simple answer that solves the problem inO(n)
time andO(k)
space [where the array size isn - k
].Importantly, unlike in other solutions, Caf's answer has no hidden memory requirements (using bit array's, adding numbers to elements, multiplying elements by
-1
- these would all requireO(log(n))
space).Note: The question here (and the original question) didn't ask about the streaming version of the problem, and the answer here doesn't handle that case.
Regarding the other answers: I agree that many of the proposed "solutions" to this problem have dubious complexity claims, and if their time complexities aren't better in some way than either:
O(n)
time and space)O(n*log(n))
time,O(1)
space)...then you may as well just solve the problem by sorting.
However, we can get better complexities (and more importantly, genuinely faster solutions):
由于这些数字取自较小的有限范围,因此可以在线性时间内对它们进行“排序”。
我们所做的就是初始化一个包含 100 个布尔值的数组,并为每个输入设置与输入中每个数字对应的布尔值,然后逐步报告未设置的布尔值。
Because the numbers are taken from a small, finite range, they can be 'sorted' in linear time.
All we do is initialize an array of 100 booleans, and for each input, set the boolean corresponding to each number in the input, and then step through reporting the unset booleans.
如果总共有 N 个元素,其中每个数字 x 都满足 1 <= x <= N 那么我们可以用 <时间复杂度为 O(nlogn),空间复杂度为 O(1)。
当 N > 时,时间复杂度为 O(nlogn)+O(n),几乎等于 O(nlogn)。 100.
If there are total N elements where each number x is such that 1 <= x <= N then we can solve this in O(nlogn) time complexity and O(1) space complexity.
Time complexity is O(nlogn)+O(n) almost equal to O(nlogn) when N > 100.