简单的面试问题变得更难:给定数字 1..100,在给定 k 个缺失的情况下找到缺失的数字
不久前我有过一次有趣的求职面试经历。这个问题一开始很简单:
Q1:我们有一个包含数字
1
、2
、3
、...、的包100
。每个数字只出现一次,所以有 100 个数字。现在从袋子里随机选出一个号码。找到丢失的号码。
当然,我以前听说过这个面试问题,所以我很快就回答了以下内容:
A1:嗯,数字之和
1 + 2 + 3 + … + N
是(N+1)(N/2)< /code> (参见维基百科:算术级数之和)。对于
N = 100
,总和为5050
。
因此,如果袋子中包含所有数字,则总和将恰好为
5050
。由于缺少一个数字,因此总和将小于此数字,差值就是该数字。因此我们可以在O(N)
时间和O(1)
空间中找到丢失的数字。
到这里我以为自己已经做得很好了,但是突然问题来了个意想不到的转折:
问题2:这是正确的,但现在如果缺少两个个数字,你会怎么做?
我以前从未见过/听说过/考虑过这种变化,所以我惊慌失措,无法回答这个问题。面试官坚持要了解我的思考过程,所以我提到也许我们可以通过与预期产品进行比较来获得更多信息,或者可能在从第一遍收集了一些信息后进行第二遍等等,但我真的只是在拍摄在黑暗中而不是实际上有一条清晰的解决方案。
面试官确实试图鼓励我,说拥有第二个方程确实是解决问题的一种方法。此时我有点沮丧(因为事先不知道答案),并询问这是否是一种通用的(读:“有用的”)编程技术,或者这是否只是一个技巧/陷阱的答案。
面试官的回答令我惊讶:你可以推广该技术来找到 3 个缺失的数字。事实上,您可以将其概括为查找 k 个缺失的数字。
Qk:如果包中恰好缺少 k 个号码,您将如何有效地找到它?
这是几个月前的事了,我仍然不明白这个技术是什么。显然,存在一个Ω(N)
时间下限,因为我们必须至少扫描一次所有数字,但面试官坚持认为时间和空间 求解技术的复杂性(减去O(N)
时间输入扫描)在k而不是N中定义。
所以这里的问题很简单:
- 你将如何解决Q2?
- 您将如何解决Q3?
- 你会如何解决Qk?
说明
- 通常有从 1..N 开始的 N 个数字,而不仅仅是 1..100。
- 我不是在寻找明显的基于集合的解决方案,例如使用 位集,对通过指定位的值来确定每个数字的存在/不存在,因此在附加空间中使用
O(N)
位。我们无法承担与N成正比的任何额外空间。 - 我也不是在寻找明显的排序优先方法。这种方法和基于集合的方法在面试中值得一提(它们很容易实现,并且根据N,可以非常实用)。我正在寻找圣杯解决方案(它可能会或可能不会实际实施,但仍然具有所需的渐近特性)。
再说一遍,当然您必须在 O(N)
中扫描输入,但您只能捕获少量信息(以 k 定义,而不是 N ),然后必须以某种方式找到 k 个缺失的数字。
I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers
1
,2
,3
, …,100
. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.
I've heard this interview question before, of course, so I very quickly answered along the lines of:
A1: Well, the sum of the numbers
1 + 2 + 3 + … + N
is(N+1)(N/2)
(see Wikipedia: sum of arithmetic series). ForN = 100
, the sum is5050
.Thus, if all numbers are present in the bag, the sum will be exactly
5050
. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number inO(N)
time andO(1)
space.
At this point I thought I had done well, but all of a sudden the question took an unexpected turn:
Q2: That is correct, but now how would you do this if TWO numbers are missing?
I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc, but I really was just shooting in the dark rather than actually having a clear path to the solution.
The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer before hand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.
The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.
Qk: If exactly k numbers are missing from the bag, how would you find it efficiently?
This was a few months ago, and I still couldn't figure out what this technique is. Obviously there's a Ω(N)
time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N)
time input scan) is defined in k not N.
So the question here is simple:
- How would you solve Q2?
- How would you solve Q3?
- How would you solve Qk?
Clarifications
- Generally there are N numbers from 1..N, not just 1..100.
- I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence each number by the value of a designated bit, therefore using
O(N)
bits in additional space. We can't afford any additional space proportional to N. - I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).
So again, of course you must scan the input in O(N)
, but you can only capture small amount of information (defined in terms of k not N), and must then find the k missing numbers somehow.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(30)
以下是 迪米特里斯·安德烈欧的 链接。
记住 i 次方的和,其中 i=1,2,..,k。这将问题简化为求解方程组
a1 + a2 + ... + ak = b1< /sub>
a12 + a22 + ... + ak2 = b2
...
a1k + a2k + ... + ak< super>k = bk
使用 牛顿身份,知道 bi 可以计算
c1 = a1 + a2 + 。 .. ak
c2 = a1a2 + a1a 3 + ... + ak-1ak
...
ck = a1a2 ... ak
如果展开多项式 (xa 1)...(xak) 系数将恰好为 c1, ..., ck > - 参见Viète 公式。由于每个多项式因子都是唯一的(多项式环是一个欧几里得域),这意味着 i 是唯一确定的,直至排列。
这就证明了记住幂就足以恢复数字。对于常数 k,这是一个很好的方法。
然而,当 k 变化时,计算 c1,...,ck 的直接方法非常昂贵,因为例如 ck是所有缺失数字的乘积,大小为 n!/(nk)!。为了解决这个问题,请执行在 Zq 字段中进行计算,其中 q 是一个质数,使得 n <= q < 2n - 它由 Bertrand 假设存在。证明不需要更改,因为公式仍然成立,并且多项式的因式分解仍然是唯一的。您还需要一种对有限域进行因式分解的算法,例如 Berlekamp 或 < a href="https://en.wikipedia.org/wiki/Cantor%E2%80%93Zassenhaus_algorithm" rel="noreferrer">康托-扎森豪斯。
常数 k 的高级伪代码:
对于变化的 k,找到素数 n <= q < 2n 使用例如 Miller-Rabin,并执行所有数字以 q 为模减少的步骤。
编辑:此答案的先前版本指出,可以使用特征 2 (q=2^(log n)) 的有限域来代替 Zq(其中 q 是素数)。事实并非如此,因为牛顿公式要求除以 k 以内的数字。
Here's a summary of Dimitris Andreou's link.
Remember sum of i-th powers, where i=1,2,..,k. This reduces the problem to solving the system of equations
a1 + a2 + ... + ak = b1
a12 + a22 + ... + ak2 = b2
...
a1k + a2k + ... + akk = bk
Using Newton's identities, knowing bi allows to compute
c1 = a1 + a2 + ... ak
c2 = a1a2 + a1a3 + ... + ak-1ak
...
ck = a1a2 ... ak
If you expand the polynomial (x-a1)...(x-ak) the coefficients will be exactly c1, ..., ck - see Viète's formulas. Since every polynomial factors uniquely (ring of polynomials is an Euclidean domain), this means ai are uniquely determined, up to permutation.
This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.
However, when k is varying, the direct approach of computing c1,...,ck is prohibitely expensive, since e.g. ck is the product of all missing numbers, magnitude n!/(n-k)!. To overcome this, perform computations in Zq field, where q is a prime such that n <= q < 2n - it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.
High level pseudocode for constant k:
For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.
EDIT: The previous version of this answer stated that instead of Zq, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.
我们可以通过将数字本身和数字的平方相加来求解 Q2。
然后我们可以将问题简化为
其中
x
和y
是总和低于预期值的程度。替换给我们:
然后我们可以解决它以确定我们丢失的数字。
We can solve Q2 by summing both the numbers themselves, and the squares of the numbers.
We can then reduce the problem to
Where
x
andy
are how far the sums are below the expected values.Substituting gives us:
Which we can then solve to determine our missing numbers.
我请一个四岁的孩子来解决这个问题。他把数字排序,然后一起数。这需要 O(厨房地板)的空间,尽管缺少很多球,但它的工作原理一样简单。
I asked a 4-year-old to solve this problem. He sorted the numbers and then counted along. This has a space requirement of O(kitchen floor), and it works just as easy however many balls are missing.
正如 @j_random_hacker 指出的,这与在 O(n) 时间和 O(1) 空间中查找重复项非常相似,我的答案的改编也适用于此。
假设“袋子”由大小为
N - k
的从 1 开始的数组A[]
表示,我们可以在O(N)< 中求解 Qk /code> 时间和
O(k)
额外空间。首先,我们将数组
A[]
扩展为k
个元素,使其现在的大小为N
。这是O(k)
额外空间。然后,我们运行以下伪代码算法:第一个循环将
k
个额外条目初始化为与数组中的第一个条目相同(这只是一个方便的值,我们知道它已经存在于array - 在此步骤之后,大小为Nk
的初始数组中缺失的任何条目在扩展数组中仍然缺失)。第二个循环对扩展数组进行排列,以便如果元素
x
至少出现一次,则这些条目之一将位于位置A[x]
。请注意,虽然它有一个嵌套循环,但它仍然以
O(N)
时间运行 - 仅当存在i
使得A[i ] != i
,并且每次交换都设置至少一个元素,使得A[i] == i
,而以前情况并非如此。这意味着交换总数(以及while
循环体的执行总数)最多为N-1
。第三个循环打印数组
i
中未被值i
占用的索引 - 这意味着i
一定已丢失。As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too.
Assuming that the "bag" is represented by a 1-based array
A[]
of sizeN - k
, we can solve Qk inO(N)
time andO(k)
additional space.First, we extend our array
A[]
byk
elements, so that it is now of sizeN
. This is theO(k)
additional space. We then run the following pseudo-code algorithm:The first loop initialises the
k
extra entries to the same as the first entry in the array (this is just a convenient value that we know is already present in the array - after this step, any entries that were missing in the initial array of sizeN-k
are still missing in the extended array).The second loop permutes the extended array so that if element
x
is present at least once, then one of those entries will be at positionA[x]
.Note that although it has a nested loop, it still runs in
O(N)
time - a swap only occurs if there is ani
such thatA[i] != i
, and each swap sets at least one element such thatA[i] == i
, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of thewhile
loop body) is at mostN-1
.The third loop prints those indexes of the array
i
that are not occupied by the valuei
- this means thati
must have been missing.不确定这是否是最有效的解决方案,但我会循环所有条目,并使用位集来记住设置了哪些数字,然后测试 0 位。
我喜欢简单的解决方案 - 我什至相信,它可能比计算总和或平方和等更快。
Not sure, if it's the most efficient solution, but I would loop over all entries, and use a bitset to remember, which numbers are set, and then test for 0 bits.
I like simple solutions - and I even believe, that it might be faster than calculating the sum, or the sum of squares etc.
我没有检查数学,但我怀疑在计算
Σ(n)
的同一遍中计算Σ(n^2)
将提供足够的信息来获取两个缺失的数字,如果有三个,也执行Σ(n^3)
,依此类推。I haven't checked the maths, but I suspect that computing
Σ(n^2)
in the same pass as we computeΣ(n)
would provide enough info to get two missing numbers, DoΣ(n^3)
as well if there are three, and so on.基于数字总和的解决方案的问题是,它们没有考虑存储和处理大指数数字的成本......在实践中,为了使其适用于非常大的 n,将使用大数字库。我们可以分析这些算法的空间利用率。
我们可以分析sdcvvc和Dimitris Andreou算法的时间和空间复杂度。
存储:
所以
l_j \in \Theta(j log n)
使用的总存储空间:
\sum_{j=1}^k l_j \in \Theta(k^2 log n)
code>使用的空间:假设计算
a^j
需要ceil(log_2 j)
时间,总时间:使用的总时间:
\Theta(kn log n)
如果这个时空如果满足的话,可以使用简单的递归
算法。设 b!i 为包中的第 i 个条目,n 为之前数字的个数
移除次数,k 为移除次数。在 Haskell 语法中...
使用的存储:
O(k)
对于列表,O(log(n))
对于堆栈:O(k + log(n) ))
该算法更加直观,具有相同的时间复杂度,并且占用的空间更少。
The problem with solutions based on sums of numbers is they don't take into account the cost of storing and working with numbers with large exponents... in practice, for it to work for very large n, a big numbers library would be used. We can analyse the space utilisation for these algorithms.
We can analyse the time and space complexity of sdcvvc and Dimitris Andreou's algorithms.
Storage:
So
l_j \in \Theta(j log n)
Total storage used:
\sum_{j=1}^k l_j \in \Theta(k^2 log n)
Space used: assuming that computing
a^j
takesceil(log_2 j)
time, total time:Total time used:
\Theta(kn log n)
If this time and space is satisfactory, you can use a simple recursive
algorithm. Let b!i be the ith entry in the bag, n the number of numbers before
removals, and k the number of removals. In Haskell syntax...
Storage used:
O(k)
for list,O(log(n))
for stack:O(k + log(n))
This algorithm is more intuitive, has the same time complexity, and uses less space.
Q2 的一个非常简单的解决方案,令我惊讶的是还没有人回答。使用 Q1 中的方法求出两个缺失数字的总和。我们用 S 来表示,那么缺失的数字之一小于 S/2,另一个大于 S/2(废话)。将 1 到 S/2 之间的所有数字相加,并将其与公式的结果进行比较(与 Q1 中的方法类似),以找出缺失数字中较小的一个。从 S 中减去它即可找到更大的缺失数。
A very simple solution to Q2 which I'm surprised nobody answered already. Use the method from Q1 to find the sum of the two missing numbers. Let's denote it by S, then one of the missing numbers is smaller than S/2 and the other is bigger than S/2 (duh). Sum all the numbers from 1 to S/2 and compare it to the formula's result (similarly to the method in Q1) to find the lower between the missing numbers. Subtract it from S to find the bigger missing number.
等一下。正如问题所述,袋子里有 100 个数字。无论 k 有多大,问题都可以在常数时间内解决,因为您可以使用一个集合并在最多 100 - k 次循环迭代中从该集合中删除数字。 100 是常数。剩下的一组数字就是你的答案。
如果我们将解推广到从 1 到 N 的数字,除了 N 不是常数之外没有任何变化,因此我们的时间为 O(N - k) = O(N)。例如,如果我们使用位集,我们会在 O(N) 时间内将这些位设置为 1,迭代这些数字,然后将这些位设置为 0 (O(Nk) = O(N)),然后我们有答案了。
在我看来,面试官是在问你如何在 O(k) 时间而不是 O(N) 时间内打印最后一组的内容。显然,设置了一个位后,您必须迭代所有 N 位才能确定是否应该打印该数字。但是,如果您更改该集合的实现方式,您可以在 k 次迭代中打印出数字。这是通过将数字放入要存储在哈希集和双向链表中的对象中来完成的。当您从哈希集中删除一个对象时,您也会将其从列表中删除。答案将保留在现在长度为 k 的列表中。
Wait a minute. As the question is stated, there are 100 numbers in the bag. No matter how big k is, the problem can be solved in constant time because you can use a set and remove numbers from the set in at most 100 - k iterations of a loop. 100 is constant. The set of remaining numbers is your answer.
If we generalise the solution to the numbers from 1 to N, nothing changes except N is not a constant, so we are in O(N - k) = O(N) time. For instance, if we use a bit set, we set the bits to 1 in O(N) time, iterate through the numbers, setting the bits to 0 as we go (O(N-k) = O(N)) and then we have the answer.
It seems to me that the interviewer was asking you how to print out the contents of the final set in O(k) time rather than O(N) time. Clearly, with a bit set, you have to iterate through all N bits to determine whether you should print the number or not. However, if you change the way the set is implemented you can print out the numbers in k iterations. This is done by putting the numbers into an object to be stored in both a hash set and a doubly linked list. When you remove an object from the hash set, you also remove it from the list. The answers will be left in the list which is now of length k.
要解决 2(和 3)个缺失数字的问题,您可以修改
quickselect
,平均运行时间为O(n)
,并且如果分区就地完成,则使用常量内存。根据随机主元
p
将集合划分为分区l
,其中包含小于主元的数字,以及r
,其中包含大于主元的数字。通过将主元值与每个分区的大小进行比较来确定 2 个缺失数字所在的分区(
p - 1 - count(l) = l 中缺失数字的计数
和n - count(r) - p = r 中缺失数字的计数
)a) 如果每个分区缺失一个数字,则使用总和差法查找每个缺失数字。< /p>
(1 + 2 + ... + (p-1)) - sum(l) = 缺失 #1
和((p+1) + (p+2) ... + n) - sum(r) = 缺失#2
b) 如果一个分区缺少这两个数字并且该分区为空,则缺少的数字为
(p-1,p-2)
或(p+1,p +2)
取决于哪个分区缺少数字。
如果一个分区缺少 2 个数字但不为空,则递归到该分区。
由于只有 2 个缺失数字,该算法总是丢弃至少一个分区,因此它保留了快速选择的
O(n)
平均时间复杂度。类似地,对于 3 个缺失数字,该算法也会在每次传递中丢弃至少一个分区(因为与 2 个缺失数字一样,最多只有 1 个分区包含多个缺失数字)。但是,我不确定添加更多缺失数字时性能会下降多少。这是一个不使用就地分区的实现,因此该示例不满足空间要求,但它确实说明了算法的步骤:
演示
To solve the 2 (and 3) missing numbers question, you can modify
quickselect
, which on average runs inO(n)
and uses constant memory if partitioning is done in-place.Partition the set with respect to a random pivot
p
into partitionsl
, which contain numbers smaller than the pivot, andr
, which contain numbers greater than the pivot.Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (
p - 1 - count(l) = count of missing numbers in l
andn - count(r) - p = count of missing numbers in r
)a) If each partition is missing one number, then use the difference of sums approach to find each missing number.
(1 + 2 + ... + (p-1)) - sum(l) = missing #1
and((p+1) + (p+2) ... + n) - sum(r) = missing #2
b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either
(p-1,p-2)
or(p+1,p+2)
depending on which partition is missing the numbers.
If one partition is missing 2 numbers but is not empty, then recurse onto that partiton.
With only 2 missing numbers, this algorithm always discards at least one partition, so it retains
O(n)
average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because as with 2 missing numbers, at most only 1 partition will contain multiple missing numbers). However, I'm not sure how much the performance decreases when more missing numbers are added.Here's an implementation that does not use in-place partitioning, so this example does not meet the space requirement but it does illustrate the steps of the algorithm:
Demo
对于 Q2,这是一个比其他解决方案效率低一些的解决方案,但仍然具有 O(N) 运行时间并占用 O(k) 空间。
这个想法是运行原始算法两次。在第一个中,您会得到丢失的总数,这给您提供了丢失数字的上限。我们将这个数字称为
N
。您知道缺少的两个数字的总和将为N
,因此第一个数字只能位于区间[1, Floor((N-1)/2)]< /code> 而第二个将位于
[floor(N/2)+1,N-1]
中。因此,您再次循环所有数字,丢弃第一个间隔中未包含的所有数字。那些是,你记录它们的总和。最后,您将知道缺失的两个数字之一,并进而知道第二个数字。
我有一种感觉,这种方法可以推广,并且可能在一次输入输入期间“并行”运行多个搜索,但我还没有弄清楚如何实现。
For Q2 this is a solution that is a bit more inefficient than the others, but still has O(N) runtime and takes O(k) space.
The idea is to run the original algorithm two times. In the first one you get a total number which is missing, which gives you an upper bound of the missing numbers. Let's call this number
N
. You know that the missing two numbers are going to sum up toN
, so the first number can only be in the interval[1, floor((N-1)/2)]
while the second is going to be in[floor(N/2)+1,N-1]
.Thus you loop on all numbers once again, discarding all numbers that are not included in the first interval. The ones that are, you keep track of their sum. Finally, you'll know one of the missing two numbers, and by extension the second.
I have a feeling that this method could be generalized and maybe multiple searches run in "parallel" during a single pass over the input, but I haven't yet figured out how.
这是一个使用 k 位额外存储的解决方案,没有任何巧妙的技巧,而且很简单。执行时间O(n),额外空间O(k)。只是为了证明这个问题可以解决,而无需先阅读解决方案或成为天才:
Here's a solution that uses k bits of extra storage, without any clever tricks and just straightforward. Execution time O (n), extra space O (k). Just to prove that this can be solved without reading up on the solution first or being a genius:
动机
如果你想解决一般情况的问题,并且可以存储和编辑数组,那么 Caf 的解决方案 是迄今为止最有效的。如果您无法存储数组(流版本),则 sdcvvc 的答案是当前建议的唯一解决方案类型。
如果您可以存储数组但无法编辑它,我提出的解决方案是最有效的答案(到目前为止在该线程上),我从 Svalorzen 的解决方案,可解决 1 或 2 个缺失项目。此解决方案需要
θ(k*n)
时间和O(min(k,log(n)))
和Ω(log(k))
代码> 空格。它也适用于并行性。概念
这个想法是,如果您使用比较总和的原始方法:
sum = SumOf(1,n) - SumOf(array)
...然后取缺失数字的平均值:
average = sum/n_missing_numbers
...它提供了一个边界:在缺失的数字中,保证至少有一个数字小于或等于
平均值
、并且至少有一个数字大于平均值
。这意味着我们可以分成子问题,每个子问题扫描数组 [O(n)
] 并且只关心它们各自的子数组。代码
C 风格的解决方案(不要判断我的全局变量,我只是想让代码对非 C 人员可读):
分析
暂时忽略递归,每个函数调用显然都需要
O( n)
时间和O(1)
空间。请注意,sum
可以等于n(n-1)/2
,因此需要存储n-1
所需位数的两倍>。这最多意味着我们实际上需要两个额外元素的空间,无论数组或 k 的大小如何,因此在正常情况下它仍然是 O(1) 空间惯例。对于
k
缺失的元素有多少函数调用并不那么明显,所以我将提供一个视觉效果。您的原始子数组(连接数组)是完整数组,其中包含所有k
缺失元素。我们将按升序想象它们,其中--
表示连接(同一子数组的一部分):m1 -- m2 -- m3 -- m4 -- (...) -- mk-1 -- mk< /sub>
Find 功能的作用是将缺失的元素断开到不同的不重叠的位置子数组。它保证每个子数组中至少有一个缺失元素,这意味着恰好中断一个连接。
这意味着,无论分割如何发生,总是需要调用
k-1
Find
函数来查找只有一个的子数组其中缺少元素。因此时间复杂度为 θ((k-1 + k) * n) = θ(k*n) 。
对于空间复杂度,如果我们每次按比例除法,那么我们会得到
O(log(k))
空间复杂度,但如果我们一次只分离一个,就会得到O(k)
。请参阅此处,了解为什么空间复杂度为
O(log(n))代码>.鉴于上面我们已经证明它也是
O(k)
,那么我们就知道它是O(min(k,log(n)))
。Motivation
If you want to solve the general-case problem, and you can store and edit the array, then Caf's solution is by far the most efficient. If you can't store the array (streaming version), then sdcvvc's answer is the only type of solution currently suggested.
The solution I propose is the most efficient answer (so far on this thread) if you can store the array but can't edit it, and I got the idea from Svalorzen's solution, which solves for 1 or 2 missing items. This solution takes
Θ(k*n)
time andO(min(k,log(n)))
andΩ(log(k))
space. It also works well with parallelism.Concept
The idea is that if you use the original approach of comparing sums:
sum = SumOf(1,n) - SumOf(array)
... then you take the average of the missing numbers:
average = sum/n_missing_numbers
... which provides a boundary: Of the missing numbers, there's guaranteed to be at least one number less-or-equal to
average
, and at least one number greater thanaverage
. This means that we can split into sub problems that each scan the array [O(n)
] and are only concerned with their respective sub-arrays.Code
C-style solution (don't judge me for the global variables, I'm just trying to make the code readable for non-c folks):
Analysis
Ignoring recursion for a moment, each function call clearly takes
O(n)
time andO(1)
space. Note thatsum
can equal as much asn(n-1)/2
, so requires double the amount of bits needed to storen-1
. At most this means than we effectively need two extra elements worth of space, regardless of the size of the array ork
, hence it's stillO(1)
space under the normal conventions.It's not so obvious how many function calls there are for
k
missing elements, so I'll provide a visual. Your original sub-array (connected array) is the full array, which has allk
missing elements in it. We'll imagine them in increasing order, where--
represent connections (part of same sub-array):m1 -- m2 -- m3 -- m4 -- (...) -- mk-1 -- mk
The effect of the
Find
function is to disconnect the missing elements into different non-overlapping sub-arrays. It guarantees that there's at least one missing element in each sub-array, which means breaking exactly one connection.What this means is that regardless of how the splits occur, it will always take
k-1
Find
function calls to do the work of finding the sub-arrays that have only one missing element in it.So the time complexity is
Θ((k-1 + k) * n) = Θ(k*n)
.For the space complexity, if we divide proportionally each time then we get
O(log(k))
space complexity, but if we only separate one at a time it gives usO(k)
.See here for a proof as to why the space complexity is
O(log(n))
. Given that above we've shown that it's alsoO(k)
, then we know that it'sO(min(k,log(n)))
.也许这个算法可以解决问题 1:
甚至更好:
该算法实际上可以针对两个缺失的数字进行扩展。第一步保持不变。当我们调用 GetValue 并添加两个缺失的数字时,结果将是
a1^a2
,即两个缺失的数字。假设val = a1^a2
现在,为了从 val 中筛选出 a1 和 a2,我们采用 val 中的任何设置位。假设第 i 个位在 val 中设置。这意味着 a1 和 a2 在第 i 位位置具有不同的奇偶校验。
现在我们对原始数组进行另一次迭代并保留两个异或值。一个用于设置了第 i 位的数字,另一个用于未设置第 i 位的数字。我们现在有两个数字桶,并且保证
a1和a2
将位于不同的桶中。现在重复我们在每个桶上查找一个缺失元素时所做的相同操作。May be this algorithm can work for question 1:
Or even better:
This algorithm can in fact be expanded for two missing numbers. The first step remains the same. When we call GetValue with two missing numbers the result will be a
a1^a2
are the two missing numbers. Lets sayval = a1^a2
Now to sieve out a1 and a2 from val we take any set bit in val. Lets say the
ith
bit is set in val. That means that a1 and a2 have different parity atith
bit position.Now we do another iteration on the original array and keep two xor values. One for the numbers which have the ith bit set and other which doesn't have the ith bit set. We now have two buckets of numbers, and its guranteed that
a1 and a2
will lie in different buckets. Now repeat the same what we did for finding one missing element on each of the bucket.有一种通用方法可以解决此类流媒体问题。
这个想法是使用一点随机化来希望将
k
元素“分散”到独立的子问题中,我们的原始算法可以为我们解决问题。该技术用于稀疏信号重建等。u = k^2
的数组a
。h : {1,...,n} -> {1,...,u}
。 (如 multiply-shift)i
code>1, ..., n 增加a[h(i)] += i
x
,减少a[h(x)] -= x
。如果所有缺失的数字都已散列到不同的存储桶,则数组的非零元素现在将包含缺失的数字。
根据通用哈希函数的定义,特定对发送到同一存储桶的概率小于
1/u
。由于大约有k^2/2
对,因此错误概率最多为k^2/2/u=1/2
。也就是说,我们成功的概率至少为 50%,如果我们增加u
,我们的机会就会增加。请注意,该算法需要
k^2 logn
位空间(我们需要每个数组存储桶logn
位。)这与 @Dimitris Andreou 的答案所需的空间相匹配(特别是多项式因式分解的空间要求,恰好也是随机的。)该算法每次更新的时间也是恒定的,而不是幂和情况下的时间
k
。事实上,通过使用评论中描述的技巧,我们甚至可以比幂和方法更有效。
There is a general way to solve streaming problems like this.
The idea is to use a bit of randomization to hopefully 'spread' the
k
elements into independent sub problems, where our original algorithm solves the problem for us. This technique is used in sparse signal reconstruction, among other things.a
, of sizeu = k^2
.h : {1,...,n} -> {1,...,u}
. (Like multiply-shift)i
in1, ..., n
increasea[h(i)] += i
x
in the input stream, decrementa[h(x)] -= x
.If all of the missing numbers have been hashed to different buckets, the non-zero elements of the array will now contain the missing numbers.
The probability that a particular pair is sent to the same bucket, is less than
1/u
by definition of a universal hash function. Since there are aboutk^2/2
pairs, we have that the error probability is at mostk^2/2/u=1/2
. That is, we succeed with probability at least 50%, and if we increaseu
we increase our chances.Notice that this algorithm takes
k^2 logn
bits of space (We needlogn
bits per array bucket.) This matches the space required by @Dimitris Andreou's answer (In particular the space requirement of polynomial factorization, which happens to also be randomized.)This algorithm also has constant time per update, rather than time
k
in the case of power-sums.In fact, we can be even more efficient than the power sum method by using the trick described in the comments.
你能检查一下每个数字是否都存在吗?如果是的话你可以尝试这个:
如果缺失数字是
x
和y
那么:因此,您检查从
1
到max(x)
的范围并找到数字Can you check if every number exists? If yes you may try this:
if the missing numbers are
x
andy
then:So you check the range from
1
tomax(x)
and find the number如果您有两个列表的总和以及两个列表的乘积,则可以解决 Q2。
(l1 是原始列表,l2 是修改后的列表)
我们可以对此进行优化,因为算术级数的总和是第一项和最后一项的平均值的 n 倍:
现在我们知道(如果 a 和 b 是删除的数字):
所以我们可以重新排列为:
并相乘:
重新排列,使右侧为零:
然后我们可以用二次公式求解:
示例 Python 3 代码:
我不知道 sqrt、reduce 和 sum 函数的复杂性,所以我不能计算出该解决方案的复杂性(如果有谁知道请在下面评论。)
You can solve Q2 if you have the sum of both lists and the product of both lists.
(l1 is the original, l2 is the modified list)
We can optimise this since the sum of an arithmetic series is n times the average of the first and last terms:
Now we know that (if a and b are the removed numbers):
So we can rearrange to:
And multiply out:
And rearrange so the right side is zero:
Then we can solve with the quadratic formula:
Sample Python 3 code:
I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the complexity of this solution (if anyone does know please comment below.)
这是一个解决方案,它不像 sdcvvc/Dimitris Andreou 的答案那样依赖复杂的数学,不像 caf 和 Colonel Panic 那样改变输入数组,并且不像 Chris Lercher、JeremyP 和其他许多人也这样做了。基本上,我从 Svalorzen/Gilad Deutch 的 Q2 想法开始,将其推广到常见情况 Qk 并用 Java 实现以证明该算法有效。
想法
假设我们有一个任意区间I,我们只知道它至少包含一个缺失的数字。遍历输入数组后,仅查看 I 中的数字,我们可以获得总和 S 以及缺失的数量 Q来自我的数字。为此,我们只需在每次遇到 I 中的数字时减少 I 的长度(用于获取 Q),并减少预先计算的总和I 中每次遇到的数字的所有数字(用于获取S)。
现在我们看看S 和Q。如果Q = 1,则意味着I只包含一个缺失的数字,并且这个数字显然是S。我们将I标记为已完成(在程序中称为“明确”)并将其排除在进一步考虑之外。另一方面,如果Q > 1,我们可以计算I中包含的缺失数的平均值A = S / Q。由于所有数字都是不同的,因此这些数字中至少有一个严格小于 A,并且至少有一个严格大于 A。现在我们将 A 中的 I 分成两个较小的区间,每个区间至少包含一个缺失的数字。请注意,如果 A 是整数,则分配给哪个间隔并不重要。
我们进行下一个数组遍历,分别计算每个间隔的 S 和 Q(但在同一次遍历中),然后用 Q = 1< 标记间隔/em> 并用 Q > 分割间隔1。我们继续这一过程,直到没有新的“模糊”区间,即我们没有什么可分割的,因为每个区间恰好包含一个缺失的数字(并且我们总是知道这个数字,因为我们知道S)。我们从包含所有可能数字的唯一“整个范围”区间开始(例如问题中的 [1..N])。
时间和空间复杂度分析
直到进程停止为止我们需要执行的总次数 p 永远不会大于缺失数字计数 k。不等式p <= k可以严格证明。另一方面,也存在一个经验上限p < 。 log2N + 3 这对于较大的 k 值很有用。我们需要对输入数组的每个数字进行二分查找,以确定它所属的区间。这会将 log k 乘数添加到时间复杂度中。
总的来说,时间复杂度为O(N ᛫ min(k, log N) ᛫ log k)。请注意,对于较大的 k,这明显优于 sdcvvc/Dimitris Andreou 的方法,即 O(N ᛫ k)。
就其工作而言,该算法需要 O(k) 个额外空间来存储最多 k 个间隔,这明显优于 O (N) 在“bitset”解决方案中。
Java 实现
这是一个实现上述算法的 Java 类。它总是返回一个排序缺失数字的数组。除此之外,它不需要缺失的数字计数k,因为它在第一遍中计算它。整个数字范围由
minNumber
和maxNumber
参数给出(例如,问题中的第一个示例为 1 和 100)。为了公平起见,此类接收更适合大型数组测试,因为它避免了原始
NumberBag
对象形式的输入。NumberBag
不允许数组修改和随机访问,并且还会统计数组被请求顺序遍历的次数。它也比 Iterableint
值的装箱,并允许包装大型int[] 的一部分
方便测试准备。如果需要的话,在findint[]
或Iterable
类型替换NumberBag
并不难。 code> 签名,将其中的两个 for 循环更改为 foreach 循环。测试
下面给出了演示这些类的用法的简单示例。
大型阵列测试可以通过以下方式执行:
在 Ideone 上尝试
Here is a solution that doesn't rely on complex math as sdcvvc's/Dimitris Andreou's answers do, doesn't change the input array as caf and Colonel Panic did, and doesn't use the bitset of enormous size as Chris Lercher, JeremyP and many others did. Basically, I began with Svalorzen's/Gilad Deutch's idea for Q2, generalized it to the common case Qk and implemented in Java to prove that the algorithm works.
The idea
Suppose we have an arbitrary interval I of which we only know that it contains at least one of the missing numbers. After one pass through the input array, looking only at the numbers from I, we can obtain both the sum S and the quantity Q of missing numbers from I. We do this by simply decrementing I's length each time we encounter a number from I (for obtaining Q) and by decreasing pre-calculated sum of all numbers in I by that encountered number each time (for obtaining S).
Now we look at S and Q. If Q = 1, it means that then I contains only one of the missing numbers, and this number is clearly S. We mark I as finished (it is called "unambiguous" in the program) and leave it out from further consideration. On the other hand, if Q > 1, we can calculate the average A = S / Q of missing numbers contained in I. As all numbers are distinct, at least one of such numbers is strictly less than A and at least one is strictly greater than A. Now we split I in A into two smaller intervals each of which contains at least one missing number. Note that it doesn't matter to which of the intervals we assign A in case it is an integer.
We make the next array pass calculating S and Q for each of the intervals separately (but in the same pass) and after that mark intervals with Q = 1 and split intervals with Q > 1. We continue this process until there are no new "ambiguous" intervals, i.e. we have nothing to split because each interval contains exactly one missing number (and we always know this number because we know S). We start out from the sole "whole range" interval containing all possible numbers (like [1..N] in the question).
Time and space complexity analysis
The total number of passes p we need to make until the process stops is never greater than the missing numbers count k. The inequality p <= k can be proved rigorously. On the other hand, there is also an empirical upper bound p < log2N + 3 that is useful for large values of k. We need to make a binary search for each number of the input array to determine the interval to which it belongs. This adds the log k multiplier to the time complexity.
In total, the time complexity is O(N ᛫ min(k, log N) ᛫ log k). Note that for large k, this is significantly better than that of sdcvvc/Dimitris Andreou's method, which is O(N ᛫ k).
For its work, the algorithm requires O(k) additional space for storing at most k intervals, that is significantly better than O(N) in "bitset" solutions.
Java implementation
Here's a Java class that implements the above algorithm. It always returns a sorted array of missing numbers. Besides that, it doesn't require the missing numbers count k because it calculates it in the first pass. The whole range of numbers is given by the
minNumber
andmaxNumber
parameters (e.g. 1 and 100 for the first example in the question).For fairness, this class receives input in form of
NumberBag
objects.NumberBag
doesn't allow array modification and random access and also counts how many times the array was requested for sequential traversing. It is also more suitable for large array testing thanIterable<Integer>
because it avoids boxing of primitiveint
values and allows wrapping a part of a largeint[]
for a convenient test preparation. It is not hard to replace, if desired,NumberBag
byint[]
orIterable<Integer>
type in thefind
signature, by changing two for-loops in it into foreach ones.Tests
Simple examples demonstrating the usage of these classes are given below.
Large array testing can be performed this way:
Try them out on Ideone
我认为这不需要任何复杂的数学方程和理论就可以完成。下面是一个就地和 O(2n) 时间复杂度解决方案的建议:
输入形式假设:
袋子中的数字 = n
丢失的数字 = k
袋子中的数字由长度为 n 的数组表示
输入的长度algo 的数组 = n
数组中缺失的条目(从包中取出的数字)将被数组中第一个元素的值替换。
例如。最初的包看起来像 [2,9,3,7,8,6,4,5,1,10]。
如果取出 4,则 4 的值将变为 2(数组的第一个元素)。
因此,取出 4 后,袋子将看起来像 [2,9,3,7,8,6,2,5,1,10]
此解决方案的关键是通过对值取反来标记已访问数字的 INDEX该 INDEX 作为数组被遍历。
I think this can be done without any complex mathematical equations and theories. Below is a proposal for an in place and O(2n) time complexity solution:
Input form assumptions :
# of numbers in bag = n
# of missing numbers = k
The numbers in the bag are represented by an array of length n
Length of input array for the algo = n
Missing entries in the array (numbers taken out of the bag) are replaced by the value of the first element in the array.
Eg. Initially bag looks like [2,9,3,7,8,6,4,5,1,10].
If 4 is taken out, value of 4 will become 2 (the first element of the array).
Therefore after taking 4 out the bag will look like [2,9,3,7,8,6,2,5,1,10]
The key to this solution is to tag the INDEX of a visited number by negating the value at that INDEX as the array is traversed.
感谢这个非常有趣的问题:
因为你提醒了我牛顿的工作确实可以解决这个问题
请参考
作为要查找的变量数量=方程数量(必须保持一致性)
我相信为此我们应该提出收集数字以创建许多不同方程的能力。
我不知道,但是,我相信是否应该有一个函数
f
,我们将为其添加 f( xi )x< sub>1 + x2 + ... + xk = z1
x1 2 + x22 + ... + xk2 = z2
。 …………………………
<
/
x1k support> + x2k + ... + xkk = zk
其余的是一部不确定时间和空间复杂性的数学著作,但牛顿恒等式肯定会发挥重要作用。
.difference_update()
或者这个问题方法中是否有线性代数的机会?Thanks for this very interesting question:
It's because you reminded me Newton's work which really can solve this problem
Please refer Newton's Identities
As number of variables to find = number of equations (must for consistency)
I believe for this we should raise power to bag numbers so as to create number of different equations.
I don't know but, I believe if there should a function say
f
for which we'll add f( xi )x1 + x2 + ... + xk = z1
x12 + x22 + ... + xk2 = z2
............
............
............
x1k + x2k + ... + xkk = zk
rest is a mathematical work not sure about time and space complexity but Newton's Identities will surely play important role.
.difference_update()
or Is there any chance of Linear Algebra in this question method?非常好的问题。我会选择对 Qk 使用设定的差异。许多编程语言甚至都支持它,例如 Ruby:
它可能不是最有效的解决方案,但如果我在这种情况下面临这样的任务(已知边界,低边界),我会在现实生活中使用它。如果数字集非常大,那么我当然会考虑更有效的算法,但在那之前简单的解决方案对我来说就足够了。
Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:
It's probably not the most efficient solution but it's one I would use in real life if I was faced with such a task in this case (known boundaries, low boundaries). If the set of number would be very large then I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.
您可能需要澄清 O(k) 的含义。
这是任意 k 的一个简单解决方案:对于数字集中的每个 v,累加 2^v 的总和。最后,将 i 从 1 循环到 N。如果与 2^i 进行按位与运算的和为零,则 i 丢失。 (或者在数字上,如果总和除以 2^i 的下限是偶数。或者
sum modulo 2^(i+1))
sum modulo 2^(i+1))
sum modulo 2^(i+1) 2^i
。)很简单,对吧? O(N)时间,O(1)存储,并且支持任意k。
除非您要计算大量数字,而在真实计算机上,每个数字都需要 O(N) 空间。事实上,这个解决方案与位向量相同。
因此,您可以聪明地计算总和、平方和以及立方和……直到 v^k 之和,然后进行复杂的数学运算来提取结果。但这些数字也很大,这就引出了一个问题:我们在谈论什么抽象的操作模型? O(1) 空间适合多少,以及需要多长时间才能对所需大小的数字求和?
You'd probably need clarification on what O(k) means.
Here's a trivial solution for arbitrary k: for each v in your set of numbers, accumulate the sum of 2^v. At the end, loop i from 1 to N. If sum bitwise ANDed with 2^i is zero, then i is missing. (Or numerically, if floor of the sum divided by 2^i is even. Or
sum modulo 2^(i+1)) < 2^i
.)Easy, right? O(N) time, O(1) storage, and it supports arbitrary k.
Except that you're computing enormous numbers that on a real computer would each require O(N) space. In fact, this solution is identical to a bit vector.
So you could be clever and compute the sum and the sum of squares and the sum of cubes... up to the sum of v^k, and do the fancy math to extract the result. But those are big numbers too, which begs the question: what abstract model of operation are we talking about? How much fits in O(1) space, and how long does it take to sum up numbers of whatever size you need?
我已经阅读了所有 30 个答案,发现最简单的一个,即使用 100 的位数组是最好的。但正如问题所说,我们不能使用大小为 N 的数组,我将使用 O(1) 空间复杂度和 k 次迭代,即 O(NK) 时间复杂度来解决这个问题。
为了让解释更简单,假设我得到了从 1 到 15 的数字,其中两个缺失,即 9 和 14,但我不知道。让包看起来像这样:
我们知道每个数字在内部都是以位的形式表示的。
对于 16 之前的数字,我们只需要 4 位。对于 10^9 之前的数字,我们需要 32 位。但我们先关注 4 位,然后再对其进行概括。
现在,假设如果我们有从 1 到 15 的所有数字,那么在内部,我们将有这样的数字(如果我们对它们进行排序):
但现在我们缺少两个数字。因此,我们的表示形式将如下所示(为了便于理解而按顺序显示,但可以按任何顺序):
现在让我们创建一个大小为 2 的位数组,用于保存具有相应 2 个最高有效位的数字的计数。即
从左和右扫描袋子并填充上面的数组,使得位数组的每个bin都包含数字的计数。结果如下:
如果所有数字都存在,则结果将如下所示:
因此我们知道缺少两个数字:一个的最高 2 位有效位是 10,一个的最高 2 位有效位是 11现在再次扫描列表并为低 2 位有效数字填写大小为 2 的位数组。这次,只考虑最高 2 位有效数字为 10 的元素。我们的位数组将如下:
如果所有 MSD=10 的数字都存在,则所有 bin 中都会有 1,但现在我们发现缺少一个。因此,我们得到了 MSD=10 和 LSD=01 缺失的数字,即 1001,即 9。
同样,如果我们再次扫描但只考虑 MSD=11 的元素,我们会得到 MSD=11 和 LSD=10 缺失,即 1110,即14.
这样,我们就可以在一定的空间内找到缺失的数字。我们可以将其推广为 100、1000 或 10^9 或任何数字集。
参考文献:http://users.ece.utexas 中的问题 1.6。 edu/~adnan/afi-samples-new.pdf
I have read all thirty answers and found the simplest one i.e to use a bit array of 100 to be the best. But as the question said we can't use an array of size N, I would use O(1) space complexity and k iterations i.e O(NK) time complexity to solve this.
To make the explanation simpler, consider I have been given numbers from 1 to 15 and two of them are missing i.e 9 and 14 but I don't know. Let the bag look like this:
We know that each number is represented internally in the form of bits.
For numbers till 16 we only need 4 bits. For numbers till 10^9, we will need 32 bits. But let's focus on 4 bits and then later we can generalize it.
Now, assume if we had all the numbers from 1 to 15, then internally, we would have numbers like this (if we had them ordered):
But now we have two numbers missing. So our representation will look something like this (shown ordered for understanding but can be in any order):
Now let's make a bit array of size 2 that holds the count of numbers with corresponding 2 most significant digits. i.e
Scan the bag from left and right and fill the above array such that each of bin of bit array contains the count of numbers. The result will be as under:
If all the numbers would have been present, it would have looked like this:
Thus we know that there are two numbers missing: one whose most 2 significant digits are 10 and one whose most 2 significant bits are 11. Now scan the list again and fill out a bit array of size 2 for the lower 2 significant digits. This time, only consider elements whose most 2 significant digits are 10. We will have the bit array as:
If all numbers of MSD=10 were present, we would have 1 in all the bins but now we see that one is missing. Thus we have the number whose MSD=10 and LSD=01 is missing which is 1001 i.e 9.
Similarly, if we scan again but consider only elements whose MSD=11,we get MSD=11 and LSD=10 missing which is 1110 i.e 14.
Thus, we can find the missing numbers in a constant amount of space. We can generalize this for 100, 1000 or 10^9 or any set of numbers.
References: Problem 1.6 in http://users.ece.utexas.edu/~adnan/afi-samples-new.pdf
您可以尝试使用 Bloom Filter。将袋中的每个数字插入到 Bloom 中,然后迭代完整的 1-k 集合,直到报告未找到每个数字。这可能无法在所有情况下找到答案,但可能是一个足够好的解决方案。
You could try using a Bloom Filter. Insert each number in the bag into the bloom, then iterate over the complete 1-k set until reporting each one not found. This may not find the answer in all scenarios, but might be a good enough solution.
我会对这个问题采取不同的方法,并向面试官询问他试图解决的更大问题的更多细节。根据问题及其周围的要求,明显的基于集合的解决方案可能是正确的,而生成列表并随后选择的方法可能不是。
例如,面试官可能要发送
n
条消息,并且需要知道没有得到回复的k
条消息,并且需要知道它:第 nk 个回复到达后,挂钟时间尽可能短。我们还可以说,消息通道的性质是,即使全速运行,也有足够的时间在消息之间进行一些处理,而不会对最后一个回复到达后生成最终结果所需的时间产生任何影响。可以利用这段时间将每条发送的消息的一些标识方面插入到一个集合中,并在每个相应的回复到达时将其删除。一旦最后一个回复到达,唯一要做的就是从集合中删除其标识符,这在典型的实现中需要O(log k+1)
。之后,该集合包含k
个缺失元素的列表,并且无需进行任何其他处理。这当然不是批量处理预先生成的数字包的最快方法,因为整个过程的运行时间为 O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k))。但它确实适用于任何
k
值(即使事先不知道),并且在上面的示例中,它以最小化最关键间隔的方式应用。I'd take a different approach to that question and probe the interviewer for more details about the larger problem he's trying to solve. Depending on the problem and the requirements surrounding it, the obvious set-based solution might be the right thing and the generate-a-list-and-pick-through-it-afterward approach might not.
For example, it might be that the interviewer is going to dispatch
n
messages and needs to know thek
that didn't result in a reply and needs to know it in as little wall clock time as possible after then-k
th reply arrives. Let's also say that the message channel's nature is such that even running at full bore, there's enough time to do some processing between messages without having any impact on how long it takes to produce the end result after the last reply arrives. That time can be put to use inserting some identifying facet of each sent message into a set and deleting it as each corresponding reply arrives. Once the last reply has arrived, the only thing to be done is to remove its identifier from the set, which in typical implementations takesO(log k+1)
. After that, the set contains the list ofk
missing elements and there's no additional processing to be done.This certainly isn't the fastest approach for batch processing pre-generated bags of numbers because the whole thing runs
O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k))
. But it does work for any value ofk
(even if it's not known ahead of time) and in the example above it was applied in a way that minimizes the most critical interval.这可能听起来很愚蠢,但是,在呈现给您的第一个问题中,您必须查看袋子中的所有剩余数字,然后将它们实际相加,以使用该方程找到丢失的数字。
因此,既然您可以看到所有数字,只需查找丢失的数字即可。当两个数字缺失时也是如此。我认为很简单。当你看到袋子里剩余的数字时,使用方程式就没有意义了。
This might sound stupid, but, in the first problem presented to you, you would have to see all the remaining numbers in the bag to actually add them up to find the missing number using that equation.
So, since you get to see all the numbers, just look for the number that's missing. The same goes for when two numbers are missing. Pretty simple I think. No point in using an equation when you get to see the numbers remaining in the bag.
我不知道这是否有效,但我想建议这个解决方案。
4.使用总和公式 diff 的常用方法获取缺失的 No 的总和,假设 diff 为 d。
现在运行一个循环来获取可能的对 (p,q),它们都位于 [1 , 100] 中,并且总和为 d。
当获得一对时,检查(3 的结果)是否 XOR p = q
如果是,我们就完成了。
如果我错了,请纠正我,如果这是正确的,请评论时间复杂度
I don't know whether this is efficient or not but I would like to suggest this solution.
4.Get the sum of the missing Nos with your usual approach of the sum formula diff and lets say the diff is d.
Now run a loop to get the possible pairs (p,q) both of which lies in [1 , 100] and sum to d.
When a pair is obtained check whether (result of 3) XOR p = q
and if yes we are done.
Please correct me if I am wrong and also comment on time complexity if this is correct
您可以通过对称性(数学语言中的群)来思考它来激发解决方案。无论这组数字的顺序如何,答案都应该是相同的。如果您要使用
k
函数来帮助确定缺失的元素,您应该考虑哪些函数具有该属性:对称。函数s_1(x) = x_1 + x_2 + ... + x_n
是对称函数的一个示例,但还有其他更高阶的函数。特别是考虑初等对称函数。 2 次初等对称函数为s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n
,总和两种元素的所有乘积。对于 3 次及以上的初等对称函数也是如此。它们显然是对称的。此外,事实证明它们是所有对称函数的构建块。您可以通过记下
s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1))
来构建初等对称函数。进一步思考应该会让您相信s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1))
等等,因此它们可以是一次性计算。我们如何判断数组中缺少哪些项目?考虑一下多项式
(z-x_1)(z-x_2)...(z-x_n)
。如果您输入任何数字x_i
,其计算结果为0
。展开多项式,可得到z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n
。初等对称函数也出现在这里,这并不奇怪,因为如果我们对根应用任何排列,多项式应该保持不变。因此,我们可以构建多项式并尝试将其分解以找出哪些数字不在集合中,正如其他人提到的那样。
最后,如果我们担心大数溢出内存(第 n 个对称多项式的量级为
100!
),我们可以进行这些计算mod p
,其中p
是大于 100 的素数。在这种情况下,我们计算多项式mod p
并发现当输入是数字时它再次计算为0
在集合中,并且它的计算结果为当输入的数字不在集合中时,为非零值。然而,正如其他人指出的那样,为了及时从多项式中获取取决于k
而不是N
的值,我们必须对多项式mod 进行因式分解p
。You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use
k
functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The functions_1(x) = x_1 + x_2 + ... + x_n
is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions. The elementary symmetric function of degree 2 iss_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n
, the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.You can build the elementary symmetric functions as you go by noting that
s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1))
. Further thought should convince you thats_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1))
and so on, so they can be computed in one pass.How do we tell which items were missing from the array? Think about the polynomial
(z-x_1)(z-x_2)...(z-x_n)
. It evaluates to0
if you put in any of the numbersx_i
. Expanding the polynomial, you getz^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n
. The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.
Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order
100!
), we can do these calculationsmod p
wherep
is a prime bigger than 100. In that case we evaluate the polynomialmod p
and find that it again evaluates to0
when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends onk
, notN
, we have to factor the polynomialmod p
.我相信我有一个
O(k)
时间和O(log(k))
空间算法,假设您有floor(x)
> 和log2(x)
函数可用于任意大整数:您有一个
k
位长整数(因此有log8(k)
空间) 在其中添加x^2
,其中 x 是您在袋子中找到的下一个数字:s=1^2+2^2+...
这需要O( N)
时间(这对面试官来说不是问题)。最后你会得到j=floor(log2(s))
,这是你正在寻找的最大数字。然后s=sj
,您再次执行上述操作:现在,您通常没有用于
2756
位整数的floor和log2函数,而是用于双精度数。所以?简单地说,对于每 2 个字节(或 1、3 或 4),您可以使用这些函数来获取所需的数字,但这会增加O(N)
因子的时间复杂度I believe I have a
O(k)
time andO(log(k))
space algorithm, given that you have thefloor(x)
andlog2(x)
functions for arbitrarily big integers available:You have an
k
-bit long integer (hence thelog8(k)
space) where you add thex^2
, where x is the next number you find in the bag:s=1^2+2^2+...
This takesO(N)
time (which is not a problem for the interviewer). At the end you getj=floor(log2(s))
which is the biggest number you're looking for. Thens=s-j
and you do again the above:Now, you usually don't have floor and log2 functions for
2756
-bit integers but instead for doubles. So? Simply, for each 2 bytes (or 1, or 3, or 4) you can use these functions to get the desired numbers, but this adds anO(N)
factor to time complexity尝试求 1 到 50 之间的数字的乘积:
设乘积,P1 = 1 x 2 x 3 x ........................ 50
当您逐个取出数字时,将它们相乘,以便你得到产品P2。但这里缺少两个数字,因此 P2 < P1。
两个缺失项的乘积 axb = P1 - P2。
您已经知道总和,a + b = S1。
由以上两个方程,通过二次方程求解a和b。 a 和 b 是您缺失的数字。
Try to find the product of numbers from 1 to 50:
Let product, P1 = 1 x 2 x 3 x ............. 50
When you take out numbers one by one, multiply them so that you get the product P2. But two numbers are missing here, hence P2 < P1.
The product of the two mising terms, a x b = P1 - P2.
You already know the sum, a + b = S1.
From the above two equations, solve for a and b through a quadratic equation. a and b are your missing numbers.