简单的面试问题变得更难：给定数字 1..100，在给定 k 个缺失的情况下找到缺失的数字

发布于 2024-12-11 19:44:59 字数 2040 浏览 0 评论 0原文

不久前我有过一次有趣的求职面试经历。这个问题一开始很简单：

Q1：我们有一个包含数字 1、2、3、...、的包100。每个数字只出现一次，所以有 100 个数字。现在从袋子里随机选出一个号码。找到丢失的号码。

当然，我以前听说过这个面试问题，所以我很快就回答了以下内容：

A1：嗯，数字之和1 + 2 + 3 + … + N是(N+1)(N/2)< /code> （参见维基百科：算术级数之和）。对于 N = 100，总和为 5050。
因此，如果袋子中包含所有数字，则总和将恰好为 5050。由于缺少一个数字，因此总和将小于此数字，差值就是该数字。因此我们可以在 O(N) 时间和 O(1) 空间中找到丢失的数字。

到这里我以为自己已经做得很好了，但是突然问题来了个意想不到的转折：

问题2：这是正确的，但现在如果缺少两个个数字，你会怎么做？

我以前从未见过/听说过/考虑过这种变化，所以我惊慌失措，无法回答这个问题。面试官坚持要了解我的思考过程，所以我提到也许我们可以通过与预期产品进行比较来获得更多信息，或者可能在从第一遍收集了一些信息后进行第二遍等等，但我真的只是在拍摄在黑暗中而不是实际上有一条清晰的解决方案。

面试官确实试图鼓励我，说拥有第二个方程确实是解决问题的一种方法。此时我有点沮丧（因为事先不知道答案），并询问这是否是一种通用的（读：“有用的”）编程技术，或者这是否只是一个技巧/陷阱的答案。

面试官的回答令我惊讶：你可以推广该技术来找到 3 个缺失的数字。事实上，您可以将其概括为查找 k 个缺失的数字。

Qk：如果包中恰好缺少 k 个号码，您将如何有效地找到它？

这是几个月前的事了，我仍然不明白这个技术是什么。显然，存在一个Ω(N)时间下限，因为我们必须至少扫描一次所有数字，但面试官坚持认为时间和空间求解技术的复杂性（减去O(N)时间输入扫描）在k而不是N中定义。

所以这里的问题很简单：

你将如何解决Q2？
您将如何解决Q3？
你会如何解决Qk？

说明

通常有从 1..N 开始的 N 个数字，而不仅仅是 1..100。
我不是在寻找明显的基于集合的解决方案，例如使用位集，对通过指定位的值来确定每个数字的存在/不存在，因此在附加空间中使用O(N)位。我们无法承担与N成正比的任何额外空间。
我也不是在寻找明显的排序优先方法。这种方法和基于集合的方法在面试中值得一提（它们很容易实现，并且根据N，可以非常实用）。我正在寻找圣杯解决方案（它可能会或可能不会实际实施，但仍然具有所需的渐近特性）。

再说一遍，当然您必须在 O(N) 中扫描输入，但您只能捕获少量信息（以 k 定义，而不是 N ），然后必须以某种方式找到 k 个缺失的数字。

原文

I had an interesting job interview experience a while back. The question started really easy:

Q1: We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.

I've heard this interview question before, of course, so I very quickly answered along the lines of:

A1: Well, the sum of the numbers 1 + 2 + 3 + … + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.
Thus, if all numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number in O(N) time and O(1) space.

At this point I thought I had done well, but all of a sudden the question took an unexpected turn:

Q2: That is correct, but now how would you do this if TWO numbers are missing?

I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc, but I really was just shooting in the dark rather than actually having a clear path to the solution.

The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer before hand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.

The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.

Qk: If exactly k numbers are missing from the bag, how would you find it efficiently?

This was a few months ago, and I still couldn't figure out what this technique is. Obviously there's a Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N) time input scan) is defined in k not N.

So the question here is simple:

How would you solve Q2?
How would you solve Q3?
How would you solve Qk?

Clarifications

Generally there are N numbers from 1..N, not just 1..100.
I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence each number by the value of a designated bit, therefore using O(N) bits in additional space. We can't afford any additional space proportional to N.
I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).

So again, of course you must scan the input in O(N), but you can only capture small amount of information (defined in terms of k not N), and must then find the k missing numbers somehow.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

霓裳挽歌倾城醉 2024-12-18 19:44:59

以下是迪米特里斯·安德烈欧的链接。

记住 i 次方的和，其中 i=1,2,..,k。这将问题简化为求解方程组

a₁ + a₂ + ... + a_k = b_{1< /sub>}

a₁² + a₂² + ... + a_{k² = b₂}

...

a₁^k + a₂^k + ... + a_k< super>k = b_k

使用牛顿身份，知道 b_i 可以计算

c₁ = a₁ + a₂ + 。 .. a_k

c₂ = a₁a₂ + a₁a ₃ + ... + a_k-1a_k

...

c_k = a₁a₂ ... a_k

如果展开多项式 (xa ₁)...(xa_k) 系数将恰好为 c₁, ..., c_k > - 参见Viète 公式。由于每个多项式因子都是唯一的（多项式环是一个欧几里得域），这意味着_i 是唯一确定的，直至排列。

这就证明了记住幂就足以恢复数字。对于常数 k，这是一个很好的方法。

然而，当 k 变化时，计算 c₁,...,c_k 的直接方法非常昂贵，因为例如 c_k是所有缺失数字的乘积，大小为 n!/(nk)!。为了解决这个问题，请执行在 Z_q 字段中进行计算，其中 q 是一个质数，使得 n <= q < 2n - 它由 Bertrand 假设存在。证明不需要更改，因为公式仍然成立，并且多项式的因式分解仍然是唯一的。您还需要一种对有限域进行因式分解的算法，例如 Berlekamp 或 < a href="https://en.wikipedia.org/wiki/Cantor%E2%80%93Zassenhaus_algorithm" rel="noreferrer">康托-扎森豪斯。

常数 k 的高级伪代码：

计算给定数字的 i 次幂
相减以获得未知数的 i 次幂之和。将总和称为 b_i。
使用牛顿恒等式计算 b_i 的系数；称它们为 c_i。基本上，c₁ = b₁; c₂ = (c₁b₁ - b₂)/2;有关精确公式，请参阅维基百科
因式分解多项式 x^k-c₁x^k-1 + ... + c_k。
多项式的根是所需的数字 a₁, ..., a_k。

对于变化的 k，找到素数 n <= q < 2n 使用例如 Miller-Rabin，并执行所有数字以 q 为模减少的步骤。

编辑：此答案的先前版本指出，可以使用特征 2 (q=2^(log n)) 的有限域来代替 Z_q（其中 q 是素数）。事实并非如此，因为牛顿公式要求除以 k 以内的数字。

Here's a summary of Dimitris Andreou's link.

Remember sum of i-th powers, where i=1,2,..,k. This reduces the problem to solving the system of equations

a₁ + a₂ + ... + a_k = b₁

a₁² + a₂² + ... + a_k² = b₂

...

a₁^k + a₂^k + ... + a_k^k = b_k

Using Newton's identities, knowing b_i allows to compute

c₁ = a₁ + a₂ + ... a_k

c₂ = a₁a₂ + a₁a₃ + ... + a_k-1a_k

...

c_k = a₁a₂ ... a_k

If you expand the polynomial (x-a₁)...(x-a_k) the coefficients will be exactly c₁, ..., c_k - see Viète's formulas. Since every polynomial factors uniquely (ring of polynomials is an Euclidean domain), this means a_i are uniquely determined, up to permutation.

This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct approach of computing c₁,...,c_k is prohibitely expensive, since e.g. c_k is the product of all missing numbers, magnitude n!/(n-k)!. To overcome this, perform computations in Z_q field, where q is a prime such that n <= q < 2n - it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

Compute i-th powers of given numbers
Subtract to get sums of i-th powers of unknown numbers. Call the sums b_i.
Use Newton's identities to compute coefficients from b_i; call them c_i. Basically, c₁ = b₁; c₂ = (c₁b₁ - b₂)/2; see Wikipedia for exact formulas
Factor the polynomial x^k-c₁x^k-1 + ... + c_k.
The roots of the polynomial are the needed numbers a₁, ..., a_k.

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.

EDIT: The previous version of this answer stated that instead of Z_q, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.

回复收藏 0 原文

请你别敷衍 2024-12-18 19:44:59

我们可以通过将数字本身和数字的平方相加来求解 Q2。

然后我们可以将问题简化为

k1 + k2 = x
k1^2 + k2^2 = y

其中 x 和 y 是总和低于预期值的程度。

替换给我们：

(x-k2)^2 + k2^2 = y

然后我们可以解决它以确定我们丢失的数字。

We can solve Q2 by summing both the numbers themselves, and the squares of the numbers.

We can then reduce the problem to

k1 + k2 = x
k1^2 + k2^2 = y

Where x and y are how far the sums are below the expected values.

Substituting gives us:

(x-k2)^2 + k2^2 = y

Which we can then solve to determine our missing numbers.

回复收藏 0 原文

冷情妓 2024-12-18 19:44:59

我请一个四岁的孩子来解决这个问题。他把数字排序，然后一起数。这需要 O（厨房地板）的空间，尽管缺少很多球，但它的工作原理一样简单。

回复收藏 0 原文

似梦非梦 2024-12-18 19:44:59

正如 @j_random_hacker 指出的，这与在 O(n) 时间和 O(1) 空间中查找重复项非常相似，我的答案的改编也适用于此。

假设“袋子”由大小为 N - k 的从 1 开始的数组 A[] 表示，我们可以在 O(N)< 中求解 Qk /code> 时间和 O(k) 额外空间。

首先，我们将数组 A[] 扩展为 k 个元素，使其现在的大小为 N。这是 O(k) 额外空间。然后，我们运行以下伪代码算法：

for i := n - k + 1 to n
    A[i] := A[1]
end for

for i := 1 to n - k
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end while
end for

for i := 1 to n
    if A[i] != i then 
        print i
    end if
end for

第一个循环将 k 个额外条目初始化为与数组中的第一个条目相同（这只是一个方便的值，我们知道它已经存在于array - 在此步骤之后，大小为 Nk 的初始数组中缺失的任何条目在扩展数组中仍然缺失）。

第二个循环对扩展数组进行排列，以便如果元素 x 至少出现一次，则这些条目之一将位于位置 A[x]。

请注意，虽然它有一个嵌套循环，但它仍然以 O(N) 时间运行 - 仅当存在 i 使得 A[i ] != i，并且每次交换都设置至少一个元素，使得 A[i] == i ，而以前情况并非如此。这意味着交换总数（以及 while 循环体的执行总数）最多为 N-1。

第三个循环打印数组 i 中未被值 i 占用的索引 - 这意味着 i 一定已丢失。

As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too.

Assuming that the "bag" is represented by a 1-based array A[] of size N - k, we can solve Qk in O(N) time and O(k) additional space.

First, we extend our array A[] by k elements, so that it is now of size N. This is the O(k) additional space. We then run the following pseudo-code algorithm:

for i := n - k + 1 to n
    A[i] := A[1]
end for

for i := 1 to n - k
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end while
end for

for i := 1 to n
    if A[i] != i then 
        print i
    end if
end for

The first loop initialises the k extra entries to the same as the first entry in the array (this is just a convenient value that we know is already present in the array - after this step, any entries that were missing in the initial array of size N-k are still missing in the extended array).

The second loop permutes the extended array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The third loop prints those indexes of the array i that are not occupied by the value i - this means that i must have been missing.

回复收藏 0 原文

玩世 2024-12-18 19:44:59

不确定这是否是最有效的解决方案，但我会循环所有条目，并使用位集来记住设置了哪些数字，然后测试 0 位。

我喜欢简单的解决方案 - 我什至相信，它可能比计算总和或平方和等更快。

回复收藏 0 原文

断桥再见 2024-12-18 19:44:59

我没有检查数学，但我怀疑在计算 Σ(n) 的同一遍中计算 Σ(n^2) 将提供足够的信息来获取两个缺失的数字，如果有三个，也执行Σ(n^3)，依此类推。

回复收藏 0 原文

有深☉意 2024-12-18 19:44:59

基于数字总和的解决方案的问题是，它们没有考虑存储和处理大指数数字的成本......在实践中，为了使其适用于非常大的 n，将使用大数字库。我们可以分析这些算法的空间利用率。

我们可以分析sdcvvc和Dimitris Andreou算法的时间和空间复杂度。

存储：

l_j = ceil (log_2 (sum_{i=1}^n i^j))
l_j > log_2 n^j  (assuming n >= 0, k >= 0)
l_j > j log_2 n \in \Omega(j log n)

l_j < log_2 ((sum_{i=1}^n i)^j) + 1
l_j < j log_2 (n) + j log_2 (n + 1) - j log_2 (2) + 1
l_j < j log_2 n + j + c \in O(j log n)`

所以 l_j \in \Theta(j log n)

使用的总存储空间：\sum_{j=1}^k l_j \in \Theta(k^2 log n) code>

使用的空间：假设计算a^j需要ceil(log_2 j)时间，总时间：

t = k ceil(\sum_i=1^n log_2 (i)) = k ceil(log_2 (\prod_i=1^n (i)))
t > k log_2 (n^n + O(n^(n-1)))
t > k log_2 (n^n) = kn log_2 (n)  \in \Omega(kn log n)
t < k log_2 (\prod_i=1^n i^i) + 1
t < kn log_2 (n) + 1 \in O(kn log n)

使用的总时间：\Theta(kn log n)

如果这个时空如果满足的话，可以使用简单的递归
算法。设 b!i 为包中的第 i 个条目，n 为之前数字的个数
移除次数，k 为移除次数。在 Haskell 语法中...

let
  -- O(1)
  isInRange low high v = (v >= low) && (v <= high)
  -- O(n - k)
  countInRange low high = sum $ map (fromEnum . isInRange low high . (!)b) [1..(n-k)]
  findMissing l low high krange
    -- O(1) if there is nothing to find.
    | krange=0 = l
    -- O(1) if there is only one possibility.
    | low=high = low:l
    -- Otherwise total of O(knlog(n)) time
    | otherwise =
       let
         mid = (low + high) `div` 2
         klow = countInRange low mid
         khigh = krange - klow
       in
         findMissing (findMissing low mid klow) (mid + 1) high khigh
in
  findMising 1 (n - k) k

使用的存储：O(k) 对于列表，O(log(n)) 对于堆栈：O(k + log(n) ））
该算法更加直观，具有相同的时间复杂度，并且占用的空间更少。

The problem with solutions based on sums of numbers is they don't take into account the cost of storing and working with numbers with large exponents... in practice, for it to work for very large n, a big numbers library would be used. We can analyse the space utilisation for these algorithms.

We can analyse the time and space complexity of sdcvvc and Dimitris Andreou's algorithms.

Storage:

l_j = ceil (log_2 (sum_{i=1}^n i^j))
l_j > log_2 n^j  (assuming n >= 0, k >= 0)
l_j > j log_2 n \in \Omega(j log n)

l_j < log_2 ((sum_{i=1}^n i)^j) + 1
l_j < j log_2 (n) + j log_2 (n + 1) - j log_2 (2) + 1
l_j < j log_2 n + j + c \in O(j log n)`

So l_j \in \Theta(j log n)

Total storage used: \sum_{j=1}^k l_j \in \Theta(k^2 log n)

Space used: assuming that computing a^j takes ceil(log_2 j) time, total time:

t = k ceil(\sum_i=1^n log_2 (i)) = k ceil(log_2 (\prod_i=1^n (i)))
t > k log_2 (n^n + O(n^(n-1)))
t > k log_2 (n^n) = kn log_2 (n)  \in \Omega(kn log n)
t < k log_2 (\prod_i=1^n i^i) + 1
t < kn log_2 (n) + 1 \in O(kn log n)

Total time used: \Theta(kn log n)

If this time and space is satisfactory, you can use a simple recursive
algorithm. Let b!i be the ith entry in the bag, n the number of numbers before
removals, and k the number of removals. In Haskell syntax...

let
  -- O(1)
  isInRange low high v = (v >= low) && (v <= high)
  -- O(n - k)
  countInRange low high = sum $ map (fromEnum . isInRange low high . (!)b) [1..(n-k)]
  findMissing l low high krange
    -- O(1) if there is nothing to find.
    | krange=0 = l
    -- O(1) if there is only one possibility.
    | low=high = low:l
    -- Otherwise total of O(knlog(n)) time
    | otherwise =
       let
         mid = (low + high) `div` 2
         klow = countInRange low mid
         khigh = krange - klow
       in
         findMissing (findMissing low mid klow) (mid + 1) high khigh
in
  findMising 1 (n - k) k

Storage used: O(k) for list, O(log(n)) for stack: O(k + log(n))
This algorithm is more intuitive, has the same time complexity, and uses less space.

回复收藏 0 原文

攀登最高峰 2024-12-18 19:44:59

Q2 的一个非常简单的解决方案，令我惊讶的是还没有人回答。使用 Q1 中的方法求出两个缺失数字的总和。我们用 S 来表示，那么缺失的数字之一小于 S/2，另一个大于 S/2（废话）。将 1 到 S/2 之间的所有数字相加，并将其与公式的结果进行比较（与 Q1 中的方法类似），以找出缺失数字中较小的一个。从 S 中减去它即可找到更大的缺失数。

回复收藏 0 原文

情未る 2024-12-18 19:44:59

等一下。正如问题所述，袋子里有 100 个数字。无论 k 有多大，问题都可以在常数时间内解决，因为您可以使用一个集合并在最多 100 - k 次循环迭代中从该集合中删除数字。 100 是常数。剩下的一组数字就是你的答案。

如果我们将解推广到从 1 到 N 的数字，除了 N 不是常数之外没有任何变化，因此我们的时间为 O(N - k) = O(N)。例如，如果我们使用位集，我们会在 O(N) 时间内将这些位设置为 1，迭代这些数字，然后将这些位设置为 0 (O(Nk) = O(N))，然后我们有答案了。

在我看来，面试官是在问你如何在 O(k) 时间而不是 O(N) 时间内打印最后一组的内容。显然，设置了一个位后，您必须迭代所有 N 位才能确定是否应该打印该数字。但是，如果您更改该集合的实现方式，您可以在 k 次迭代中打印出数字。这是通过将数字放入要存储在哈希集和双向链表中的对象中来完成的。当您从哈希集中删除一个对象时，您也会将其从列表中删除。答案将保留在现在长度为 k 的列表中。

回复收藏 0 原文

·深蓝 2024-12-18 19:44:59

要解决 2（和 3）个缺失数字的问题，您可以修改 quickselect，平均运行时间为 O(n)，并且如果分区就地完成，则使用常量内存。

根据随机主元p将集合划分为分区l，其中包含小于主元的数字，以及r，其中包含大于主元的数字。
通过将主元值与每个分区的大小进行比较来确定 2 个缺失数字所在的分区（p - 1 - count(l) = l 中缺失数字的计数 和
n - count(r) - p = r 中缺失数字的计数)
a) 如果每个分区缺失一个数字，则使用总和差法查找每个缺失数字。< /p>
(1 + 2 + ... + (p-1)) - sum(l) = 缺失 #1 和
((p+1) + (p+2) ... + n) - sum(r) = 缺失#2
b) 如果一个分区缺少这两个数字并且该分区为空，则缺少的数字为 (p-1,p-2) 或 (p+1,p +2)
取决于哪个分区缺少数字。
如果一个分区缺少 2 个数字但不为空，则递归到该分区。

由于只有 2 个缺失数字，该算法总是丢弃至少一个分区，因此它保留了快速选择的 O(n) 平均时间复杂度。类似地，对于 3 个缺失数字，该算法也会在每次传递中丢弃至少一个分区（因为与 2 个缺失数字一样，最多只有 1 个分区包含多个缺失数字）。但是，我不确定添加更多缺失数字时性能会下降多少。

这是一个不使用就地分区的实现，因此该示例不满足空间要求，但它确实说明了算法的步骤：

<?php

  $list = range(1,100);
  unset($list[3]);
  unset($list[31]);

  findMissing($list,1,100);

  function findMissing($list, $min, $max) {
    if(empty($list)) {
      print_r(range($min, $max));
      return;
    }

    $l = $r = [];
    $pivot = array_pop($list);

    foreach($list as $number) {
      if($number < $pivot) {
        $l[] = $number;
      }
      else {
        $r[] = $number;
      }
    }

    if(count($l) == $pivot - $min - 1) {
      // only 1 missing number use difference of sums
      print array_sum(range($min, $pivot-1)) - array_sum($l) . "\n";
    }
    else if(count($l) < $pivot - $min) {
      // more than 1 missing number, recurse
      findMissing($l, $min, $pivot-1);
    }

    if(count($r) == $max - $pivot - 1) {
      // only 1 missing number use difference of sums
      print array_sum(range($pivot + 1, $max)) - array_sum($r) . "\n";
    } else if(count($r) < $max - $pivot) {
      // mroe than 1 missing number recurse
      findMissing($r, $pivot+1, $max);
    }
  }

演示

To solve the 2 (and 3) missing numbers question, you can modify quickselect, which on average runs in O(n) and uses constant memory if partitioning is done in-place.

Partition the set with respect to a random pivot p into partitions l, which contain numbers smaller than the pivot, and r, which contain numbers greater than the pivot.
Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (p - 1 - count(l) = count of missing numbers in l and
n - count(r) - p = count of missing numbers in r)
a) If each partition is missing one number, then use the difference of sums approach to find each missing number.
(1 + 2 + ... + (p-1)) - sum(l) = missing #1 and
((p+1) + (p+2) ... + n) - sum(r) = missing #2
b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either (p-1,p-2) or (p+1,p+2)
depending on which partition is missing the numbers.
If one partition is missing 2 numbers but is not empty, then recurse onto that partiton.

With only 2 missing numbers, this algorithm always discards at least one partition, so it retains O(n) average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because as with 2 missing numbers, at most only 1 partition will contain multiple missing numbers). However, I'm not sure how much the performance decreases when more missing numbers are added.

Here's an implementation that does not use in-place partitioning, so this example does not meet the space requirement but it does illustrate the steps of the algorithm:

<?php

  $list = range(1,100);
  unset($list[3]);
  unset($list[31]);

  findMissing($list,1,100);

  function findMissing($list, $min, $max) {
    if(empty($list)) {
      print_r(range($min, $max));
      return;
    }

    $l = $r = [];
    $pivot = array_pop($list);

    foreach($list as $number) {
      if($number < $pivot) {
        $l[] = $number;
      }
      else {
        $r[] = $number;
      }
    }

    if(count($l) == $pivot - $min - 1) {
      // only 1 missing number use difference of sums
      print array_sum(range($min, $pivot-1)) - array_sum($l) . "\n";
    }
    else if(count($l) < $pivot - $min) {
      // more than 1 missing number, recurse
      findMissing($l, $min, $pivot-1);
    }

    if(count($r) == $max - $pivot - 1) {
      // only 1 missing number use difference of sums
      print array_sum(range($pivot + 1, $max)) - array_sum($r) . "\n";
    } else if(count($r) < $max - $pivot) {
      // mroe than 1 missing number recurse
      findMissing($r, $pivot+1, $max);
    }
  }

Demo

回复收藏 0 原文

北风几吹夏 2024-12-18 19:44:59

对于 Q2，这是一个比其他解决方案效率低一些的解决方案，但仍然具有 O(N) 运行时间并占用 O(k) 空间。

这个想法是运行原始算法两次。在第一个中，您会得到丢失的总数，这给您提供了丢失数字的上限。我们将这个数字称为N。您知道缺少的两个数字的总和将为 N，因此第一个数字只能位于区间 [1, Floor((N-1)/2)]< /code> 而第二个将位于 [floor(N/2)+1,N-1] 中。

因此，您再次循环所有数字，丢弃第一个间隔中未包含的所有数字。那些是，你记录它们的总和。最后，您将知道缺失的两个数字之一，并进而知道第二个数字。

我有一种感觉，这种方法可以推广，并且可能在一次输入输入期间“并行”运行多个搜索，但我还没有弄清楚如何实现。

回复收藏 0 原文

习惯成性 2024-12-18 19:44:59

这是一个使用 k 位额外存储的解决方案，没有任何巧妙的技巧，而且很简单。执行时间O(n)，额外空间O(k)。只是为了证明这个问题可以解决，而无需先阅读解决方案或成为天才：

void puzzle (int* data, int n, bool* extra, int k)
{
    // data contains n distinct numbers from 1 to n + k, extra provides
    // space for k extra bits. 

    // Rearrange the array so there are (even) even numbers at the start
    // and (odd) odd numbers at the end.
    int even = 0, odd = 0;
    while (even + odd < n)
    {
        if (data [even] % 2 == 0) ++even;
        else if (data [n - 1 - odd] % 2 == 1) ++odd;
        else { int tmp = data [even]; data [even] = data [n - 1 - odd]; 
               data [n - 1 - odd] = tmp; ++even; ++odd; }
    }

    // Erase the lowest bits of all numbers and set the extra bits to 0.
    for (int i = even; i < n; ++i) data [i] -= 1;
    for (int i = 0; i < k; ++i) extra [i] = false;

    // Set a bit for every number that is present
    for (int i = 0; i < n; ++i)
    {
        int tmp = data [i];
        tmp -= (tmp % 2);
        if (i >= even) ++tmp;
        if (tmp <= n) data [tmp - 1] += 1; else extra [tmp - n - 1] = true;
    }

    // Print out the missing ones
    for (int i = 1; i <= n; ++i)
        if (data [i - 1] % 2 == 0) printf ("Number %d is missing\n", i);
    for (int i = n + 1; i <= n + k; ++i)
        if (! extra [i - n - 1]) printf ("Number %d is missing\n", i);

    // Restore the lowest bits again.
    for (int i = 0; i < n; ++i) {
        if (i < even) { if (data [i] % 2 != 0) data [i] -= 1; }
        else { if (data [i] % 2 == 0) data [i] += 1; }
    }
}

Here's a solution that uses k bits of extra storage, without any clever tricks and just straightforward. Execution time O (n), extra space O (k). Just to prove that this can be solved without reading up on the solution first or being a genius:

void puzzle (int* data, int n, bool* extra, int k)
{
    // data contains n distinct numbers from 1 to n + k, extra provides
    // space for k extra bits. 

    // Rearrange the array so there are (even) even numbers at the start
    // and (odd) odd numbers at the end.
    int even = 0, odd = 0;
    while (even + odd < n)
    {
        if (data [even] % 2 == 0) ++even;
        else if (data [n - 1 - odd] % 2 == 1) ++odd;
        else { int tmp = data [even]; data [even] = data [n - 1 - odd]; 
               data [n - 1 - odd] = tmp; ++even; ++odd; }
    }

    // Erase the lowest bits of all numbers and set the extra bits to 0.
    for (int i = even; i < n; ++i) data [i] -= 1;
    for (int i = 0; i < k; ++i) extra [i] = false;

    // Set a bit for every number that is present
    for (int i = 0; i < n; ++i)
    {
        int tmp = data [i];
        tmp -= (tmp % 2);
        if (i >= even) ++tmp;
        if (tmp <= n) data [tmp - 1] += 1; else extra [tmp - n - 1] = true;
    }

    // Print out the missing ones
    for (int i = 1; i <= n; ++i)
        if (data [i - 1] % 2 == 0) printf ("Number %d is missing\n", i);
    for (int i = n + 1; i <= n + k; ++i)
        if (! extra [i - n - 1]) printf ("Number %d is missing\n", i);

    // Restore the lowest bits again.
    for (int i = 0; i < n; ++i) {
        if (i < even) { if (data [i] % 2 != 0) data [i] -= 1; }
        else { if (data [i] % 2 == 0) data [i] += 1; }
    }
}

回复收藏 0 原文

自演自醉 2024-12-18 19:44:59

动机

如果你想解决一般情况的问题，并且可以存储和编辑数组，那么 Caf 的解决方案是迄今为止最有效的。如果您无法存储数组（流版本），则 sdcvvc 的答案是当前建议的唯一解决方案类型。

如果您可以存储数组但无法编辑它，我提出的解决方案是最有效的答案（到目前为止在该线程上），我从 Svalorzen 的解决方案，可解决 1 或 2 个缺失项目。此解决方案需要 θ(k*n) 时间和 O(min(k,log(n))) 和 Ω(log(k))代码> 空格。它也适用于并行性。

概念

这个想法是，如果您使用比较总和的原始方法：
sum = SumOf(1,n) - SumOf(array)

...然后取缺失数字的平均值：
average = sum/n_missing_numbers

...它提供了一个边界：在缺失的数字中，保证至少有一个数字小于或等于平均值、并且至少有一个数字大于平均值。这意味着我们可以分成子问题，每个子问题扫描数组 [O(n)] 并且只关心它们各自的子数组。

代码

C 风格的解决方案（不要判断我的全局变量，我只是想让代码对非 C 人员可读）：

#include "stdio.h"

// Example problem:
const int array [] = {0, 7, 3, 1, 5};
const int N = 8; // size of original array
const int array_size = 5;

int SumOneTo (int n)
{
    return n*(n-1)/2; // non-inclusive
}

int MissingItems (const int begin, const int end, int & average)
{
    // We consider only sub-array elements with values, v:
    // begin <= v < end
    
    // Initialise info about missing elements.
    // First assume all are missing:
    int n = end - begin;
    int sum = SumOneTo(end) - SumOneTo(begin);

    // Minus everything that we see (ie not missing):
    for (int i = 0; i < array_size; ++i)
    {
        if ((begin <= array[i]) && (array[i] < end))
        {
            --n;
            sum -= array[i];
        }
    }
    
    // used by caller:
    average = sum/n;
    return n;
}

void Find (const int begin, const int end)
{
    int average;

    if (MissingItems(begin, end, average) == 1)
    {
        printf(" %d", average); // average(n) is same as n
        return;
    }
    
    Find(begin, average + 1); // at least one missing here
    Find(average + 1, end); // at least one here also
}

int main ()
{   
    printf("Missing items:");
    
    Find(0, N);
    
    printf("\n");
}

分析

暂时忽略递归，每个函数调用显然都需要 O( n) 时间和 O(1) 空间。请注意，sum 可以等于 n(n-1)/2，因此需要存储 n-1 所需位数的两倍>。这最多意味着我们实际上需要两个额外元素的空间，无论数组或 k 的大小如何，因此在正常情况下它仍然是 O(1) 空间惯例。

对于 k 缺失的元素有多少函数调用并不那么明显，所以我将提供一个视觉效果。您的原始子数组（连接数组）是完整数组，其中包含所有 k 缺失元素。我们将按升序想象它们，其中 -- 表示连接（同一子数组的一部分）：

m₁ -- m₂ -- m₃ -- m₄ -- (...) -- m_k-1 -- m_{k< /sub>}

Find 功能的作用是将缺失的元素断开到不同的不重叠的位置子数组。它保证每个子数组中至少有一个缺失元素，这意味着恰好中断一个连接。

这意味着，无论分割如何发生，总是需要调用 k-1 Find 函数来查找只有一个的子数组其中缺少元素。

因此时间复杂度为 θ((k-1 + k) * n) = θ(k*n) 。

对于空间复杂度，如果我们每次按比例除法，那么我们会得到O(log(k))空间复杂度，但如果我们一次只分离一个，就会得到O(k)。

请参阅此处，了解为什么空间复杂度为O(log(n))代码>.鉴于上面我们已经证明它也是O(k)，那么我们就知道它是O(min(k,log(n)))。

Motivation

If you want to solve the general-case problem, and you can store and edit the array, then Caf's solution is by far the most efficient. If you can't store the array (streaming version), then sdcvvc's answer is the only type of solution currently suggested.

The solution I propose is the most efficient answer (so far on this thread) if you can store the array but can't edit it, and I got the idea from Svalorzen's solution, which solves for 1 or 2 missing items. This solution takes Θ(k*n) time and O(min(k,log(n))) and Ω(log(k)) space. It also works well with parallelism.

Concept

The idea is that if you use the original approach of comparing sums:
sum = SumOf(1,n) - SumOf(array)

... then you take the average of the missing numbers:
average = sum/n_missing_numbers

... which provides a boundary: Of the missing numbers, there's guaranteed to be at least one number less-or-equal to average, and at least one number greater than average. This means that we can split into sub problems that each scan the array [O(n)] and are only concerned with their respective sub-arrays.

Code

C-style solution (don't judge me for the global variables, I'm just trying to make the code readable for non-c folks):

#include "stdio.h"

// Example problem:
const int array [] = {0, 7, 3, 1, 5};
const int N = 8; // size of original array
const int array_size = 5;

int SumOneTo (int n)
{
    return n*(n-1)/2; // non-inclusive
}

int MissingItems (const int begin, const int end, int & average)
{
    // We consider only sub-array elements with values, v:
    // begin <= v < end
    
    // Initialise info about missing elements.
    // First assume all are missing:
    int n = end - begin;
    int sum = SumOneTo(end) - SumOneTo(begin);

    // Minus everything that we see (ie not missing):
    for (int i = 0; i < array_size; ++i)
    {
        if ((begin <= array[i]) && (array[i] < end))
        {
            --n;
            sum -= array[i];
        }
    }
    
    // used by caller:
    average = sum/n;
    return n;
}

void Find (const int begin, const int end)
{
    int average;

    if (MissingItems(begin, end, average) == 1)
    {
        printf(" %d", average); // average(n) is same as n
        return;
    }
    
    Find(begin, average + 1); // at least one missing here
    Find(average + 1, end); // at least one here also
}

int main ()
{   
    printf("Missing items:");
    
    Find(0, N);
    
    printf("\n");
}

Analysis

Ignoring recursion for a moment, each function call clearly takes O(n) time and O(1) space. Note that sum can equal as much as n(n-1)/2, so requires double the amount of bits needed to store n-1. At most this means than we effectively need two extra elements worth of space, regardless of the size of the array or k, hence it's still O(1) space under the normal conventions.

It's not so obvious how many function calls there are for k missing elements, so I'll provide a visual. Your original sub-array (connected array) is the full array, which has all k missing elements in it. We'll imagine them in increasing order, where -- represent connections (part of same sub-array):

m₁ -- m₂ -- m₃ -- m₄ -- (...) -- m_k-1 -- m_k

The effect of the Find function is to disconnect the missing elements into different non-overlapping sub-arrays. It guarantees that there's at least one missing element in each sub-array, which means breaking exactly one connection.

What this means is that regardless of how the splits occur, it will always take k-1 Find function calls to do the work of finding the sub-arrays that have only one missing element in it.

So the time complexity is Θ((k-1 + k) * n) = Θ(k*n).

For the space complexity, if we divide proportionally each time then we get O(log(k)) space complexity, but if we only separate one at a time it gives us O(k).

See here for a proof as to why the space complexity is O(log(n)). Given that above we've shown that it's also O(k), then we know that it's O(min(k,log(n))).

回复收藏 0 原文

只有一腔孤勇 2024-12-18 19:44:59

也许这个算法可以解决问题 1：

预计算前 100 个整数的异或（val=1^2^3^4..100）
异或元素，因为它们不断来自输入流（val1=val1^next_input）
最终answer=val^val1

甚至更好：

def GetValue(A)
  val=0
  for i=1 to 100
    do
      val=val^i
    done
  for value in A:
    do
      val=val^value 
    done
  return val

该算法实际上可以针对两个缺失的数字进行扩展。第一步保持不变。当我们调用 GetValue 并添加两个缺失的数字时，结果将是 a1^a2 ，即两个缺失的数字。假设

val = a1^a2

现在，为了从 val 中筛选出 a1 和 a2，我们采用 val 中的任何设置位。假设第 i 个位在 val 中设置。这意味着 a1 和 a2 在第 i 位位置具有不同的奇偶校验。
现在我们对原始数组进行另一次迭代并保留两个异或值。一个用于设置了第 i 位的数字，另一个用于未设置第 i 位的数字。我们现在有两个数字桶，并且保证a1和a2将位于不同的桶中。现在重复我们在每个桶上查找一个缺失元素时所做的相同操作。

May be this algorithm can work for question 1:

Precompute xor of first 100 integers(val=1^2^3^4....100)
xor the elements as they keep coming from input stream ( val1=val1^next_input)
final answer=val^val1

Or even better:

def GetValue(A)
  val=0
  for i=1 to 100
    do
      val=val^i
    done
  for value in A:
    do
      val=val^value 
    done
  return val

This algorithm can in fact be expanded for two missing numbers. The first step remains the same. When we call GetValue with two missing numbers the result will be a a1^a2 are the two missing numbers. Lets say

val = a1^a2

Now to sieve out a1 and a2 from val we take any set bit in val. Lets say the ith bit is set in val. That means that a1 and a2 have different parity at ith bit position.
Now we do another iteration on the original array and keep two xor values. One for the numbers which have the ith bit set and other which doesn't have the ith bit set. We now have two buckets of numbers, and its guranteed that a1 and a2 will lie in different buckets. Now repeat the same what we did for finding one missing element on each of the bucket.

回复收藏 0 原文

吹梦到西洲 2024-12-18 19:44:59

有一种通用方法可以解决此类流媒体问题。
这个想法是使用一点随机化来希望将 k 元素“分散”到独立的子问题中，我们的原始算法可以为我们解决问题。该技术用于稀疏信号重建等。

创建一个大小为 u = k^2 的数组 a。
选择任意通用哈希函数，h : {1,...,n} -> {1，...，u}。（如 multiply-shift）
对于 < 中的每个 i code>1, ..., n 增加 a[h(i)] += i
对于输入流中的每个数字 x，减少a[h(x)] -= x。

如果所有缺失的数字都已散列到不同的存储桶，则数组的非零元素现在将包含缺失的数字。

根据通用哈希函数的定义，特定对发送到同一存储桶的概率小于1/u。由于大约有 k^2/2 对，因此错误概率最多为 k^2/2/u=1/2。也就是说，我们成功的概率至少为 50%，如果我们增加 u，我们的机会就会增加。

请注意，该算法需要 k^2 logn 位空间（我们需要每个数组存储桶 logn 位。）这与 @Dimitris Andreou 的答案所需的空间相匹配（特别是多项式因式分解的空间要求，恰好也是随机的。）
该算法每次更新的时间也是恒定的，而不是幂和情况下的时间k。

事实上，通过使用评论中描述的技巧，我们甚至可以比幂和方法更有效。

回复收藏 0 原文

倦话 2024-12-18 19:44:59

你能检查一下每个数字是否都存在吗？如果是的话你可以尝试这个：

S = 包中所有数字的总和 (S < 5050)
Z = 缺失数字之和 5050 - S

如果缺失数字是 x 和 y 那么：

x = Z - y 且
最大值(x) = Z - 1

因此，您检查从 1 到 max(x) 的范围并找到数字

回复收藏 0 原文

不弃不离 2024-12-18 19:44:59

如果您有两个列表的总和以及两个列表的乘积，则可以解决 Q2。

（l1 是原始列表，l2 是修改后的列表）

d = sum(l1) - sum(l2)
m = mul(l1) / mul(l2)

我们可以对此进行优化，因为算术级数的总和是第一项和最后一项的平均值的 n 倍：

n = len(l1)
d = (n/2)*(n+1) - sum(l2)

现在我们知道（如果 a 和 b 是删除的数字）：

a + b = d
a * b = m

所以我们可以重新排列为：

a = s - b
b * (s - b) = m

并相乘：

-b^2 + s*b = m

重新排列，使右侧为零：

-b^2 + s*b - m = 0

然后我们可以用二次公式求解：

b = (-s + sqrt(s^2 - (4*-1*-m)))/-2
a = s - b

示例 Python 3 代码：

from functools import reduce
import operator
import math
x = list(range(1,21))
sx = (len(x)/2)*(len(x)+1)
x.remove(15)
x.remove(5)
mul = lambda l: reduce(operator.mul,l)
s = sx - sum(x)
m = mul(range(1,21)) / mul(x)
b = (-s + math.sqrt(s**2 - (-4*(-m))))/-2
a = s - b
print(a,b) #15,5

我不知道 sqrt、reduce 和 sum 函数的复杂性，所以我不能计算出该解决方案的复杂性（如果有谁知道请在下面评论。）

You can solve Q2 if you have the sum of both lists and the product of both lists.

(l1 is the original, l2 is the modified list)

d = sum(l1) - sum(l2)
m = mul(l1) / mul(l2)

We can optimise this since the sum of an arithmetic series is n times the average of the first and last terms:

n = len(l1)
d = (n/2)*(n+1) - sum(l2)

Now we know that (if a and b are the removed numbers):

a + b = d
a * b = m

So we can rearrange to:

a = s - b
b * (s - b) = m

And multiply out:

-b^2 + s*b = m

And rearrange so the right side is zero:

-b^2 + s*b - m = 0

Then we can solve with the quadratic formula:

b = (-s + sqrt(s^2 - (4*-1*-m)))/-2
a = s - b

Sample Python 3 code:

from functools import reduce
import operator
import math
x = list(range(1,21))
sx = (len(x)/2)*(len(x)+1)
x.remove(15)
x.remove(5)
mul = lambda l: reduce(operator.mul,l)
s = sx - sum(x)
m = mul(range(1,21)) / mul(x)
b = (-s + math.sqrt(s**2 - (-4*(-m))))/-2
a = s - b
print(a,b) #15,5

I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the complexity of this solution (if anyone does know please comment below.)

回复收藏 0 原文

め七分饶幸 2024-12-18 19:44:59

这是一个解决方案，它不像 sdcvvc/Dimitris Andreou 的答案那样依赖复杂的数学，不像 caf 和 Colonel Panic 那样改变输入数组，并且不像 Chris Lercher、JeremyP 和其他许多人也这样做了。基本上，我从 Svalorzen/Gilad Deutch 的 Q2 想法开始，将其推广到常见情况 Qk 并用 Java 实现以证明该算法有效。

想法

假设我们有一个任意区间I，我们只知道它至少包含一个缺失的数字。遍历输入数组后，仅查看 I 中的数字，我们可以获得总和 S 以及缺失的数量 Q来自我的数字。为此，我们只需在每次遇到 I 中的数字时减少 I 的长度（用于获取 Q），并减少预先计算的总和I 中每次遇到的数字的所有数字（用于获取S）。

现在我们看看S 和Q。如果Q = 1，则意味着I只包含一个缺失的数字，并且这个数字显然是S。我们将I标记为已完成（在程序中称为“明确”）并将其排除在进一步考虑之外。另一方面，如果Q > 1，我们可以计算I中包含的缺失数的平均值A = S / Q。由于所有数字都是不同的，因此这些数字中至少有一个严格小于 A，并且至少有一个严格大于 A。现在我们将 A 中的 I 分成两个较小的区间，每个区间至少包含一个缺失的数字。请注意，如果 A 是整数，则分配给哪个间隔并不重要。

我们进行下一个数组遍历，分别计算每个间隔的 S 和 Q（但在同一次遍历中），然后用 Q = 1< 标记间隔/em> 并用 Q > 分割间隔1。我们继续这一过程，直到没有新的“模糊”区间，即我们没有什么可分割的，因为每个区间恰好包含一个缺失的数字（并且我们总是知道这个数字，因为我们知道S）。我们从包含所有可能数字的唯一“整个范围”区间开始（例如问题中的 [1..N]）。

时间和空间复杂度分析

直到进程停止为止我们需要执行的总次数 p 永远不会大于缺失数字计数 k。不等式p <= k可以严格证明。另一方面，也存在一个经验上限p < 。 log₂N + 3 这对于较大的 k 值很有用。我们需要对输入数组的每个数字进行二分查找，以确定它所属的区间。这会将 log k 乘数添加到时间复杂度中。

总的来说，时间复杂度为O(N ᛫ min(k, log N) ᛫ log k)。请注意，对于较大的 k，这明显优于 sdcvvc/Dimitris Andreou 的方法，即 O(N ᛫ k)。

就其工作而言，该算法需要 O(k) 个额外空间来存储最多 k 个间隔，这明显优于 O (N) 在“bitset”解决方案中。

Java 实现

这是一个实现上述算法的 Java 类。它总是返回一个排序缺失数字的数组。除此之外，它不需要缺失的数字计数k，因为它在第一遍中计算它。整个数字范围由 minNumber 和 maxNumber 参数给出（例如，问题中的第一个示例为 1 和 100）。

public class MissingNumbers {
    private static class Interval {
        boolean ambiguous = true;
        final int begin;
        int quantity;
        long sum;

        Interval(int begin, int end) { // begin inclusive, end exclusive
            this.begin = begin;
            quantity = end - begin;
            sum = quantity * ((long)end - 1 + begin) / 2;
        }

        void exclude(int x) {
            quantity--;
            sum -= x;
        }
    }

    public static int[] find(int minNumber, int maxNumber, NumberBag inputBag) {
        Interval full = new Interval(minNumber, ++maxNumber);
        for (inputBag.startOver(); inputBag.hasNext();)
            full.exclude(inputBag.next());
        int missingCount = full.quantity;
        if (missingCount == 0)
            return new int[0];
        Interval[] intervals = new Interval[missingCount];
        intervals[0] = full;
        int[] dividers = new int[missingCount];
        dividers[0] = minNumber;
        int intervalCount = 1;
        while (true) {
            int oldCount = intervalCount;
            for (int i = 0; i < oldCount; i++) {
                Interval itv = intervals[i];
                if (itv.ambiguous)
                    if (itv.quantity == 1) // number inside itv uniquely identified
                        itv.ambiguous = false;
                    else
                        intervalCount++; // itv will be split into two intervals
            }
            if (oldCount == intervalCount)
                break;
            int newIndex = intervalCount - 1;
            int end = maxNumber;
            for (int oldIndex = oldCount - 1; oldIndex >= 0; oldIndex--) {
                // newIndex always >= oldIndex
                Interval itv = intervals[oldIndex];
                int begin = itv.begin;
                if (itv.ambiguous) {
                    // split interval itv
                    // use floorDiv instead of / because input numbers can be negative
                    int mean = (int)Math.floorDiv(itv.sum, itv.quantity) + 1;
                    intervals[newIndex--] = new Interval(mean, end);
                    intervals[newIndex--] = new Interval(begin, mean);
                } else
                    intervals[newIndex--] = itv;
                end = begin;
            }
            for (int i = 0; i < intervalCount; i++)
                dividers[i] = intervals[i].begin;
            for (inputBag.startOver(); inputBag.hasNext();) {
                int x = inputBag.next();
                // find the interval to which x belongs
                int i = java.util.Arrays.binarySearch(dividers, 0, intervalCount, x);
                if (i < 0)
                    i = -i - 2;
                Interval itv = intervals[i];
                if (itv.ambiguous)
                    itv.exclude(x);
            }
        }
        assert intervalCount == missingCount;
        for (int i = 0; i < intervalCount; i++)
            dividers[i] = (int)intervals[i].sum;
        return dividers;
    }
}

为了公平起见，此类接收 NumberBag 对象形式的输入。 NumberBag 不允许数组修改和随机访问，并且还会统计数组被请求顺序遍历的次数。它也比 Iterable更适合大型数组测试，因为它避免了原始 int 值的装箱，并允许包装大型 int[] 的一部分 方便测试准备。如果需要的话，在 findint[] 或 Iterable 类型替换 NumberBag 并不难。 code> 签名，将其中的两个 for 循环更改为 foreach 循环。

import java.util.*;

public abstract class NumberBag {
    private int passCount;

    public void startOver() {
        passCount++;
    }

    public final int getPassCount() {
        return passCount;
    }

    public abstract boolean hasNext();

    public abstract int next();

    // A lightweight version of Iterable<Integer> to avoid boxing of int
    public static NumberBag fromArray(int[] base, int fromIndex, int toIndex) {
        return new NumberBag() {
            int index = toIndex;

            public void startOver() {
                super.startOver();
                index = fromIndex;
            }

            public boolean hasNext() {
                return index < toIndex;
            }

            public int next() {
                if (index >= toIndex)
                    throw new NoSuchElementException();
                return base[index++];
            }
        };
    }

    public static NumberBag fromArray(int[] base) {
        return fromArray(base, 0, base.length);
    }

    public static NumberBag fromIterable(Iterable<Integer> base) {
        return new NumberBag() {
            Iterator<Integer> it;

            public void startOver() {
                super.startOver();
                it = base.iterator();
            }

            public boolean hasNext() {
                return it.hasNext();
            }

            public int next() {
                return it.next();
            }
        };
    }
}

测试

下面给出了演示这些类的用法的简单示例。

import java.util.*;

public class SimpleTest {
    public static void main(String[] args) {
        int[] input = { 7, 1, 4, 9, 6, 2 };
        NumberBag bag = NumberBag.fromArray(input);
        int[] output = MissingNumbers.find(1, 10, bag);
        System.out.format("Input: %s%nMissing numbers: %s%nPass count: %d%n",
                Arrays.toString(input), Arrays.toString(output), bag.getPassCount());

        List<Integer> inputList = new ArrayList<>();
        for (int i = 0; i < 10; i++)
            inputList.add(2 * i);
        Collections.shuffle(inputList);
        bag = NumberBag.fromIterable(inputList);
        output = MissingNumbers.find(0, 19, bag);
        System.out.format("%nInput: %s%nMissing numbers: %s%nPass count: %d%n",
                inputList, Arrays.toString(output), bag.getPassCount());

        // Sieve of Eratosthenes
        final int MAXN = 1_000;
        List<Integer> nonPrimes = new ArrayList<>();
        nonPrimes.add(1);
        int[] primes;
        int lastPrimeIndex = 0;
        while (true) {
            primes = MissingNumbers.find(1, MAXN, NumberBag.fromIterable(nonPrimes));
            int p = primes[lastPrimeIndex]; // guaranteed to be prime
            int q = p;
            for (int i = lastPrimeIndex++; i < primes.length; i++) {
                q = primes[i]; // not necessarily prime
                int pq = p * q;
                if (pq > MAXN)
                    break;
                nonPrimes.add(pq);
            }
            if (q == p)
                break;
        }
        System.out.format("%nSieve of Eratosthenes. %d primes up to %d found:%n",
                primes.length, MAXN);
        for (int i = 0; i < primes.length; i++)
            System.out.format(" %4d%s", primes[i], (i % 10) < 9 ? "" : "\n");
    }
}

大型阵列测试可以通过以下方式执行：

import java.util.*;

public class BatchTest {
    private static final Random rand = new Random();
    public static int MIN_NUMBER = 1;
    private final int minNumber = MIN_NUMBER;
    private final int numberCount;
    private final int[] numbers;
    private int missingCount;
    public long finderTime;

    public BatchTest(int numberCount) {
        this.numberCount = numberCount;
        numbers = new int[numberCount];
        for (int i = 0; i < numberCount; i++)
            numbers[i] = minNumber + i;
    }

    private int passBound() {
        int mBound = missingCount > 0 ? missingCount : 1;
        int nBound = 34 - Integer.numberOfLeadingZeros(numberCount - 1); // ceil(log_2(numberCount)) + 2
        return Math.min(mBound, nBound);
    }

    private void error(String cause) {
        throw new RuntimeException("Error on '" + missingCount + " from " + numberCount + "' test, " + cause);
    }

    // returns the number of times the input array was traversed in this test
    public int makeTest(int missingCount) {
        this.missingCount = missingCount;
        // numbers array is reused when numberCount stays the same,
        // just Fisher–Yates shuffle it for each test
        for (int i = numberCount - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            if (i != j) {
                int t = numbers[i];
                numbers[i] = numbers[j];
                numbers[j] = t;
            }
        }
        final int bagSize = numberCount - missingCount;
        NumberBag inputBag = NumberBag.fromArray(numbers, 0, bagSize);
        finderTime -= System.nanoTime();
        int[] found = MissingNumbers.find(minNumber, minNumber + numberCount - 1, inputBag);
        finderTime += System.nanoTime();
        if (inputBag.getPassCount() > passBound())
            error("too many passes (" + inputBag.getPassCount() + " while only " + passBound() + " allowed)");
        if (found.length != missingCount)
            error("wrong result length");
        int j = bagSize; // "missing" part beginning in numbers
        Arrays.sort(numbers, bagSize, numberCount);
        for (int i = 0; i < missingCount; i++)
            if (found[i] != numbers[j++])
                error("wrong result array, " + i + "-th element differs");
        return inputBag.getPassCount();
    }

    public static void strideCheck(int numberCount, int minMissing, int maxMissing, int step, int repeats) {
        BatchTest t = new BatchTest(numberCount);
        System.out.println("╠═══════════════════════╬═════════════════╬═════════════════╣");
        for (int missingCount = minMissing; missingCount <= maxMissing; missingCount += step) {
            int minPass = Integer.MAX_VALUE;
            int passSum = 0;
            int maxPass = 0;
            t.finderTime = 0;
            for (int j = 1; j <= repeats; j++) {
                int pCount = t.makeTest(missingCount);
                if (pCount < minPass)
                    minPass = pCount;
                passSum += pCount;
                if (pCount > maxPass)
                    maxPass = pCount;
            }
            System.out.format("║ %9d  %9d  ║  %2d  %5.2f  %2d  ║  %11.3f    ║%n", missingCount, numberCount, minPass,
                    (double)passSum / repeats, maxPass, t.finderTime * 1e-6 / repeats);
        }
    }

    public static void main(String[] args) {
        System.out.println("╔═══════════════════════╦═════════════════╦═════════════════╗");
        System.out.println("║      Number count     ║      Passes     ║  Average time   ║");
        System.out.println("║   missimg     total   ║  min  avg   max ║ per search (ms) ║");
        long time = System.nanoTime();
        strideCheck(100, 0, 100, 1, 20_000);
        strideCheck(100_000, 2, 99_998, 1_282, 15);
        MIN_NUMBER = -2_000_000_000;
        strideCheck(300_000_000, 1, 10, 1, 1);
        time = System.nanoTime() - time;
        System.out.println("╚═══════════════════════╩═════════════════╩═════════════════╝");
        System.out.format("%nSuccess. Total time: %.2f s.%n", time * 1e-9);
    }
}

在 Ideone 上尝试

Here is a solution that doesn't rely on complex math as sdcvvc's/Dimitris Andreou's answers do, doesn't change the input array as caf and Colonel Panic did, and doesn't use the bitset of enormous size as Chris Lercher, JeremyP and many others did. Basically, I began with Svalorzen's/Gilad Deutch's idea for Q2, generalized it to the common case Qk and implemented in Java to prove that the algorithm works.

The idea

Suppose we have an arbitrary interval I of which we only know that it contains at least one of the missing numbers. After one pass through the input array, looking only at the numbers from I, we can obtain both the sum S and the quantity Q of missing numbers from I. We do this by simply decrementing I's length each time we encounter a number from I (for obtaining Q) and by decreasing pre-calculated sum of all numbers in I by that encountered number each time (for obtaining S).

Now we look at S and Q. If Q = 1, it means that then I contains only one of the missing numbers, and this number is clearly S. We mark I as finished (it is called "unambiguous" in the program) and leave it out from further consideration. On the other hand, if Q > 1, we can calculate the average A = S / Q of missing numbers contained in I. As all numbers are distinct, at least one of such numbers is strictly less than A and at least one is strictly greater than A. Now we split I in A into two smaller intervals each of which contains at least one missing number. Note that it doesn't matter to which of the intervals we assign A in case it is an integer.

We make the next array pass calculating S and Q for each of the intervals separately (but in the same pass) and after that mark intervals with Q = 1 and split intervals with Q > 1. We continue this process until there are no new "ambiguous" intervals, i.e. we have nothing to split because each interval contains exactly one missing number (and we always know this number because we know S). We start out from the sole "whole range" interval containing all possible numbers (like [1..N] in the question).

Time and space complexity analysis

The total number of passes p we need to make until the process stops is never greater than the missing numbers count k. The inequality p <= k can be proved rigorously. On the other hand, there is also an empirical upper bound p < log₂N + 3 that is useful for large values of k. We need to make a binary search for each number of the input array to determine the interval to which it belongs. This adds the log k multiplier to the time complexity.

In total, the time complexity is O(N ᛫ min(k, log N) ᛫ log k). Note that for large k, this is significantly better than that of sdcvvc/Dimitris Andreou's method, which is O(N ᛫ k).

For its work, the algorithm requires O(k) additional space for storing at most k intervals, that is significantly better than O(N) in "bitset" solutions.

Java implementation

Here's a Java class that implements the above algorithm. It always returns a sorted array of missing numbers. Besides that, it doesn't require the missing numbers count k because it calculates it in the first pass. The whole range of numbers is given by the minNumber and maxNumber parameters (e.g. 1 and 100 for the first example in the question).

public class MissingNumbers {
    private static class Interval {
        boolean ambiguous = true;
        final int begin;
        int quantity;
        long sum;

        Interval(int begin, int end) { // begin inclusive, end exclusive
            this.begin = begin;
            quantity = end - begin;
            sum = quantity * ((long)end - 1 + begin) / 2;
        }

        void exclude(int x) {
            quantity--;
            sum -= x;
        }
    }

    public static int[] find(int minNumber, int maxNumber, NumberBag inputBag) {
        Interval full = new Interval(minNumber, ++maxNumber);
        for (inputBag.startOver(); inputBag.hasNext();)
            full.exclude(inputBag.next());
        int missingCount = full.quantity;
        if (missingCount == 0)
            return new int[0];
        Interval[] intervals = new Interval[missingCount];
        intervals[0] = full;
        int[] dividers = new int[missingCount];
        dividers[0] = minNumber;
        int intervalCount = 1;
        while (true) {
            int oldCount = intervalCount;
            for (int i = 0; i < oldCount; i++) {
                Interval itv = intervals[i];
                if (itv.ambiguous)
                    if (itv.quantity == 1) // number inside itv uniquely identified
                        itv.ambiguous = false;
                    else
                        intervalCount++; // itv will be split into two intervals
            }
            if (oldCount == intervalCount)
                break;
            int newIndex = intervalCount - 1;
            int end = maxNumber;
            for (int oldIndex = oldCount - 1; oldIndex >= 0; oldIndex--) {
                // newIndex always >= oldIndex
                Interval itv = intervals[oldIndex];
                int begin = itv.begin;
                if (itv.ambiguous) {
                    // split interval itv
                    // use floorDiv instead of / because input numbers can be negative
                    int mean = (int)Math.floorDiv(itv.sum, itv.quantity) + 1;
                    intervals[newIndex--] = new Interval(mean, end);
                    intervals[newIndex--] = new Interval(begin, mean);
                } else
                    intervals[newIndex--] = itv;
                end = begin;
            }
            for (int i = 0; i < intervalCount; i++)
                dividers[i] = intervals[i].begin;
            for (inputBag.startOver(); inputBag.hasNext();) {
                int x = inputBag.next();
                // find the interval to which x belongs
                int i = java.util.Arrays.binarySearch(dividers, 0, intervalCount, x);
                if (i < 0)
                    i = -i - 2;
                Interval itv = intervals[i];
                if (itv.ambiguous)
                    itv.exclude(x);
            }
        }
        assert intervalCount == missingCount;
        for (int i = 0; i < intervalCount; i++)
            dividers[i] = (int)intervals[i].sum;
        return dividers;
    }
}

For fairness, this class receives input in form of NumberBag objects. NumberBag doesn't allow array modification and random access and also counts how many times the array was requested for sequential traversing. It is also more suitable for large array testing than Iterable<Integer> because it avoids boxing of primitive int values and allows wrapping a part of a large int[] for a convenient test preparation. It is not hard to replace, if desired, NumberBag by int[] or Iterable<Integer> type in the find signature, by changing two for-loops in it into foreach ones.

import java.util.*;

public abstract class NumberBag {
    private int passCount;

    public void startOver() {
        passCount++;
    }

    public final int getPassCount() {
        return passCount;
    }

    public abstract boolean hasNext();

    public abstract int next();

    // A lightweight version of Iterable<Integer> to avoid boxing of int
    public static NumberBag fromArray(int[] base, int fromIndex, int toIndex) {
        return new NumberBag() {
            int index = toIndex;

            public void startOver() {
                super.startOver();
                index = fromIndex;
            }

            public boolean hasNext() {
                return index < toIndex;
            }

            public int next() {
                if (index >= toIndex)
                    throw new NoSuchElementException();
                return base[index++];
            }
        };
    }

    public static NumberBag fromArray(int[] base) {
        return fromArray(base, 0, base.length);
    }

    public static NumberBag fromIterable(Iterable<Integer> base) {
        return new NumberBag() {
            Iterator<Integer> it;

            public void startOver() {
                super.startOver();
                it = base.iterator();
            }

            public boolean hasNext() {
                return it.hasNext();
            }

            public int next() {
                return it.next();
            }
        };
    }
}

Tests

Simple examples demonstrating the usage of these classes are given below.

import java.util.*;

public class SimpleTest {
    public static void main(String[] args) {
        int[] input = { 7, 1, 4, 9, 6, 2 };
        NumberBag bag = NumberBag.fromArray(input);
        int[] output = MissingNumbers.find(1, 10, bag);
        System.out.format("Input: %s%nMissing numbers: %s%nPass count: %d%n",
                Arrays.toString(input), Arrays.toString(output), bag.getPassCount());

        List<Integer> inputList = new ArrayList<>();
        for (int i = 0; i < 10; i++)
            inputList.add(2 * i);
        Collections.shuffle(inputList);
        bag = NumberBag.fromIterable(inputList);
        output = MissingNumbers.find(0, 19, bag);
        System.out.format("%nInput: %s%nMissing numbers: %s%nPass count: %d%n",
                inputList, Arrays.toString(output), bag.getPassCount());

        // Sieve of Eratosthenes
        final int MAXN = 1_000;
        List<Integer> nonPrimes = new ArrayList<>();
        nonPrimes.add(1);
        int[] primes;
        int lastPrimeIndex = 0;
        while (true) {
            primes = MissingNumbers.find(1, MAXN, NumberBag.fromIterable(nonPrimes));
            int p = primes[lastPrimeIndex]; // guaranteed to be prime
            int q = p;
            for (int i = lastPrimeIndex++; i < primes.length; i++) {
                q = primes[i]; // not necessarily prime
                int pq = p * q;
                if (pq > MAXN)
                    break;
                nonPrimes.add(pq);
            }
            if (q == p)
                break;
        }
        System.out.format("%nSieve of Eratosthenes. %d primes up to %d found:%n",
                primes.length, MAXN);
        for (int i = 0; i < primes.length; i++)
            System.out.format(" %4d%s", primes[i], (i % 10) < 9 ? "" : "\n");
    }
}

Large array testing can be performed this way:

import java.util.*;

public class BatchTest {
    private static final Random rand = new Random();
    public static int MIN_NUMBER = 1;
    private final int minNumber = MIN_NUMBER;
    private final int numberCount;
    private final int[] numbers;
    private int missingCount;
    public long finderTime;

    public BatchTest(int numberCount) {
        this.numberCount = numberCount;
        numbers = new int[numberCount];
        for (int i = 0; i < numberCount; i++)
            numbers[i] = minNumber + i;
    }

    private int passBound() {
        int mBound = missingCount > 0 ? missingCount : 1;
        int nBound = 34 - Integer.numberOfLeadingZeros(numberCount - 1); // ceil(log_2(numberCount)) + 2
        return Math.min(mBound, nBound);
    }

    private void error(String cause) {
        throw new RuntimeException("Error on '" + missingCount + " from " + numberCount + "' test, " + cause);
    }

    // returns the number of times the input array was traversed in this test
    public int makeTest(int missingCount) {
        this.missingCount = missingCount;
        // numbers array is reused when numberCount stays the same,
        // just Fisher–Yates shuffle it for each test
        for (int i = numberCount - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            if (i != j) {
                int t = numbers[i];
                numbers[i] = numbers[j];
                numbers[j] = t;
            }
        }
        final int bagSize = numberCount - missingCount;
        NumberBag inputBag = NumberBag.fromArray(numbers, 0, bagSize);
        finderTime -= System.nanoTime();
        int[] found = MissingNumbers.find(minNumber, minNumber + numberCount - 1, inputBag);
        finderTime += System.nanoTime();
        if (inputBag.getPassCount() > passBound())
            error("too many passes (" + inputBag.getPassCount() + " while only " + passBound() + " allowed)");
        if (found.length != missingCount)
            error("wrong result length");
        int j = bagSize; // "missing" part beginning in numbers
        Arrays.sort(numbers, bagSize, numberCount);
        for (int i = 0; i < missingCount; i++)
            if (found[i] != numbers[j++])
                error("wrong result array, " + i + "-th element differs");
        return inputBag.getPassCount();
    }

    public static void strideCheck(int numberCount, int minMissing, int maxMissing, int step, int repeats) {
        BatchTest t = new BatchTest(numberCount);
        System.out.println("╠═══════════════════════╬═════════════════╬═════════════════╣");
        for (int missingCount = minMissing; missingCount <= maxMissing; missingCount += step) {
            int minPass = Integer.MAX_VALUE;
            int passSum = 0;
            int maxPass = 0;
            t.finderTime = 0;
            for (int j = 1; j <= repeats; j++) {
                int pCount = t.makeTest(missingCount);
                if (pCount < minPass)
                    minPass = pCount;
                passSum += pCount;
                if (pCount > maxPass)
                    maxPass = pCount;
            }
            System.out.format("║ %9d  %9d  ║  %2d  %5.2f  %2d  ║  %11.3f    ║%n", missingCount, numberCount, minPass,
                    (double)passSum / repeats, maxPass, t.finderTime * 1e-6 / repeats);
        }
    }

    public static void main(String[] args) {
        System.out.println("╔═══════════════════════╦═════════════════╦═════════════════╗");
        System.out.println("║      Number count     ║      Passes     ║  Average time   ║");
        System.out.println("║   missimg     total   ║  min  avg   max ║ per search (ms) ║");
        long time = System.nanoTime();
        strideCheck(100, 0, 100, 1, 20_000);
        strideCheck(100_000, 2, 99_998, 1_282, 15);
        MIN_NUMBER = -2_000_000_000;
        strideCheck(300_000_000, 1, 10, 1, 1);
        time = System.nanoTime() - time;
        System.out.println("╚═══════════════════════╩═════════════════╩═════════════════╝");
        System.out.format("%nSuccess. Total time: %.2f s.%n", time * 1e-9);
    }
}

Try them out on Ideone

回复收藏 0 原文

羁绊已千年 2024-12-18 19:44:59

我认为这不需要任何复杂的数学方程和理论就可以完成。下面是一个就地和 O(2n) 时间复杂度解决方案的建议：

输入形式假设：

袋子中的数字 = n

丢失的数字 = k

袋子中的数字由长度为 n 的数组表示

输入的长度algo 的数组 = n

数组中缺失的条目（从包中取出的数字）将被数组中第一个元素的值替换。

例如。最初的包看起来像 [2,9,3,7,8,6,4,5,1,10]。
如果取出 4，则 4 的值将变为 2（数组的第一个元素）。
因此，取出 4 后，袋子将看起来像 [2,9,3,7,8,6,2,5,1,10]

此解决方案的关键是通过对值取反来标记已访问数字的 INDEX该 INDEX 作为数组被遍历。

    IEnumerable<int> GetMissingNumbers(int[] arrayOfNumbers)
    {
        List<int> missingNumbers = new List<int>();
        int arrayLength = arrayOfNumbers.Length;

        //First Pass
        for (int i = 0; i < arrayLength; i++)
        {
            int index = Math.Abs(arrayOfNumbers[i]) - 1;
            if (index > -1)
            {
                arrayOfNumbers[index] = Math.Abs(arrayOfNumbers[index]) * -1; //Marking the visited indexes
            }
        }

        //Second Pass to get missing numbers
        for (int i = 0; i < arrayLength; i++)
        {                
            //If this index is unvisited, means this is a missing number
            if (arrayOfNumbers[i] > 0)
            {
                missingNumbers.Add(i + 1);
            }
        }

        return missingNumbers;
    }

I think this can be done without any complex mathematical equations and theories. Below is a proposal for an in place and O(2n) time complexity solution:

Input form assumptions :

# of numbers in bag = n

# of missing numbers = k

The numbers in the bag are represented by an array of length n

Length of input array for the algo = n

Missing entries in the array (numbers taken out of the bag) are replaced by the value of the first element in the array.

Eg. Initially bag looks like [2,9,3,7,8,6,4,5,1,10].
If 4 is taken out, value of 4 will become 2 (the first element of the array).
Therefore after taking 4 out the bag will look like [2,9,3,7,8,6,2,5,1,10]

The key to this solution is to tag the INDEX of a visited number by negating the value at that INDEX as the array is traversed.

    IEnumerable<int> GetMissingNumbers(int[] arrayOfNumbers)
    {
        List<int> missingNumbers = new List<int>();
        int arrayLength = arrayOfNumbers.Length;

        //First Pass
        for (int i = 0; i < arrayLength; i++)
        {
            int index = Math.Abs(arrayOfNumbers[i]) - 1;
            if (index > -1)
            {
                arrayOfNumbers[index] = Math.Abs(arrayOfNumbers[index]) * -1; //Marking the visited indexes
            }
        }

        //Second Pass to get missing numbers
        for (int i = 0; i < arrayLength; i++)
        {                
            //If this index is unvisited, means this is a missing number
            if (arrayOfNumbers[i] > 0)
            {
                missingNumbers.Add(i + 1);
            }
        }

        return missingNumbers;
    }

回复收藏 0 原文

少年亿悲伤 2024-12-18 19:44:59

感谢这个非常有趣的问题：

因为你提醒了我牛顿的工作确实可以解决这个问题

请参考

作为要查找的变量数量=方程数量（必须保持一致性）

我相信为此我们应该提出收集数字以创建许多不同方程的能力。

我不知道，但是，我相信是否应该有一个函数 f ，我们将为其添加 f( x_i )

x< sub>1 + x₂ + ... + x_k = z₁

x₁ ² + x₂² + ... + x_k² = z₂

。 …………………………

x₁^{k support> + x₂^k + ... + x_k^k = z_k}

其余的是一部不确定时间和空间复杂性的数学著作，但牛顿恒等式肯定会发挥重要作用。

我们不能使用集合论吗
.difference_update() 或者这个问题方法中是否有线性代数的机会？

回复收藏 0 原文

沧桑㈠ 2024-12-18 19:44:59

非常好的问题。我会选择对 Qk 使用设定的差异。许多编程语言甚至都支持它，例如 Ruby：

missing = (1..100).to_a - bag

它可能不是最有效的解决方案，但如果我在这种情况下面临这样的任务（已知边界，低边界），我会在现实生活中使用它。如果数字集非常大，那么我当然会考虑更有效的算法，但在那之前简单的解决方案对我来说就足够了。

Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:

missing = (1..100).to_a - bag

It's probably not the most efficient solution but it's one I would use in real life if I was faced with such a task in this case (known boundaries, low boundaries). If the set of number would be very large then I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.

回复收藏 0 原文

错々过的事 2024-12-18 19:44:59

您可能需要澄清 O(k) 的含义。

这是任意 k 的一个简单解决方案：对于数字集中的每个 v，累加 2^v 的总和。最后，将 i 从 1 循环到 N。如果与 2^i 进行按位与运算的和为零，则 i 丢失。（或者在数字上，如果总和除以 2^i 的下限是偶数。或者 sum modulo 2^(i+1)） sum modulo 2^(i+1)) sum modulo 2^(i+1) 2^i。）

很简单，对吧？ O(N)时间，O(1)存储，并且支持任意k。

除非您要计算大量数字，而在真实计算机上，每个数字都需要 O(N) 空间。事实上，这个解决方案与位向量相同。

因此，您可以聪明地计算总和、平方和以及立方和……直到 v^k 之和，然后进行复杂的数学运算来提取结果。但这些数字也很大，这就引出了一个问题：我们在谈论什么抽象的操作模型？ O(1) 空间适合多少，以及需要多长时间才能对所需大小的数字求和？

回复收藏 0 原文

小女人ら 2024-12-18 19:44:59

我已经阅读了所有 30 个答案，发现最简单的一个，即使用 100 的位数组是最好的。但正如问题所说，我们不能使用大小为 N 的数组，我将使用 O(1) 空间复杂度和 k 次迭代，即 O(NK) 时间复杂度来解决这个问题。

为了让解释更简单，假设我得到了从 1 到 15 的数字，其中两个缺失，即 9 和 14，但我不知道。让包看起来像这样：

[8,1,2,12,4,7,5,10,11,13,15,3,6]。

我们知道每个数字在内部都是以位的形式表示的。
对于 16 之前的数字，我们只需要 4 位。对于 10^9 之前的数字，我们需要 32 位。但我们先关注 4 位，然后再对其进行概括。

现在，假设如果我们有从 1 到 15 的所有数字，那么在内部，我们将有这样的数字（如果我们对它们进行排序）：

但现在我们缺少两个数字。因此，我们的表示形式将如下所示（为了便于理解而按顺序显示，但可以按任何顺序）：

(2MSD|2LSD)
00|01
00|10
00|11
-----
01|00
01|01
01|10
01|11
-----
10|00
missing=(10|01) 
10|10
10|11
-----
11|00
11|01
missing=(11|10)
11|11

现在让我们创建一个大小为 2 的位数组，用于保存具有相应 2 个最高有效位的数字的计数。即

= [__,__,__,__] 
   00,01,10,11

从左和右扫描袋子并填充上面的数组，使得位数组的每个bin都包含数字的计数。结果如下：

= [ 3, 4, 3, 3] 
   00,01,10,11

如果所有数字都存在，则结果将如下所示：

= [ 3, 4, 4, 4] 
   00,01,10,11

因此我们知道缺少两个数字：一个的最高 2 位有效位是 10，一个的最高 2 位有效位是 11现在再次扫描列表并为低 2 位有效数字填写大小为 2 的位数组。这次，只考虑最高 2 位有效数字为 10 的元素。我们的位数组将如下：

= [ 1, 0, 1, 1] 
   00,01,10,11

如果所有 MSD=10 的数字都存在，则所有 bin 中都会有 1，但现在我们发现缺少一个。因此，我们得到了 MSD=10 和 LSD=01 缺失的数字，即 1001，即 9。

同样，如果我们再次扫描但只考虑 MSD=11 的元素，我们会得到 MSD=11 和 LSD=10 缺失，即 1110，即14.

= [ 1, 0, 1, 1] 
   00,01,10,11

这样，我们就可以在一定的空间内找到缺失的数字。我们可以将其推广为 100、1000 或 10^9 或任何数字集。

参考文献：http://users.ece.utexas 中的问题 1.6。 edu/~adnan/afi-samples-new.pdf

I have read all thirty answers and found the simplest one i.e to use a bit array of 100 to be the best. But as the question said we can't use an array of size N, I would use O(1) space complexity and k iterations i.e O(NK) time complexity to solve this.

To make the explanation simpler, consider I have been given numbers from 1 to 15 and two of them are missing i.e 9 and 14 but I don't know. Let the bag look like this:

[8,1,2,12,4,7,5,10,11,13,15,3,6].

We know that each number is represented internally in the form of bits.
For numbers till 16 we only need 4 bits. For numbers till 10^9, we will need 32 bits. But let's focus on 4 bits and then later we can generalize it.

Now, assume if we had all the numbers from 1 to 15, then internally, we would have numbers like this (if we had them ordered):

But now we have two numbers missing. So our representation will look something like this (shown ordered for understanding but can be in any order):

(2MSD|2LSD)
00|01
00|10
00|11
-----
01|00
01|01
01|10
01|11
-----
10|00
missing=(10|01) 
10|10
10|11
-----
11|00
11|01
missing=(11|10)
11|11

Now let's make a bit array of size 2 that holds the count of numbers with corresponding 2 most significant digits. i.e

= [__,__,__,__] 
   00,01,10,11

Scan the bag from left and right and fill the above array such that each of bin of bit array contains the count of numbers. The result will be as under:

= [ 3, 4, 3, 3] 
   00,01,10,11

If all the numbers would have been present, it would have looked like this:

= [ 3, 4, 4, 4] 
   00,01,10,11

Thus we know that there are two numbers missing: one whose most 2 significant digits are 10 and one whose most 2 significant bits are 11. Now scan the list again and fill out a bit array of size 2 for the lower 2 significant digits. This time, only consider elements whose most 2 significant digits are 10. We will have the bit array as:

= [ 1, 0, 1, 1] 
   00,01,10,11

If all numbers of MSD=10 were present, we would have 1 in all the bins but now we see that one is missing. Thus we have the number whose MSD=10 and LSD=01 is missing which is 1001 i.e 9.

Similarly, if we scan again but consider only elements whose MSD=11,we get MSD=11 and LSD=10 missing which is 1110 i.e 14.

= [ 1, 0, 1, 1] 
   00,01,10,11

Thus, we can find the missing numbers in a constant amount of space. We can generalize this for 100, 1000 or 10^9 or any set of numbers.

References: Problem 1.6 in http://users.ece.utexas.edu/~adnan/afi-samples-new.pdf

回复收藏 0 原文

┊风居住的梦幻卍 2024-12-18 19:44:59

您可以尝试使用 Bloom Filter。将袋中的每个数字插入到 Bloom 中，然后迭代完整的 1-k 集合，直到报告未找到每个数字。这可能无法在所有情况下找到答案，但可能是一个足够好的解决方案。

回复收藏 0 原文

高跟鞋的旋律 2024-12-18 19:44:59

我会对这个问题采取不同的方法，并向面试官询问他试图解决的更大问题的更多细节。根据问题及其周围的要求，明显的基于集合的解决方案可能是正确的，而生成列表并随后选择的方法可能不是。

例如，面试官可能要发送 n 条消息，并且需要知道没有得到回复的 k 条消息，并且需要知道它：第 nk 个回复到达后，挂钟时间尽可能短。我们还可以说，消息通道的性质是，即使全速运行，也有足够的时间在消息之间进行一些处理，而不会对最后一个回复到达后生成最终结果所需的时间产生任何影响。可以利用这段时间将每条发送的消息的一些标识方面插入到一个集合中，并在每个相应的回复到达时将其删除。一旦最后一个回复到达，唯一要做的就是从集合中删除其标识符，这在典型的实现中需要O(log k+1)。之后，该集合包含 k 个缺失元素的列表，并且无需进行任何其他处理。

这当然不是批量处理预先生成的数字包的最快方法，因为整个过程的运行时间为 O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k))。但它确实适用于任何 k 值（即使事先不知道），并且在上面的示例中，它以最小化最关键间隔的方式应用。

回复收藏 0 原文

愁杀 2024-12-18 19:44:59

这可能听起来很愚蠢，但是，在呈现给您的第一个问题中，您必须查看袋子中的所有剩余数字，然后将它们实际相加，以使用该方程找到丢失的数字。

因此，既然您可以看到所有数字，只需查找丢失的数字即可。当两个数字缺失时也是如此。我认为很简单。当你看到袋子里剩余的数字时，使用方程式就没有意义了。

回复收藏 0 原文

坚持沉默 2024-12-18 19:44:59

我不知道这是否有效，但我想建议这个解决方案。

计算 100 个元素的异
或计算 98 个元素的异或（删除 2 个元素后）
现在（1 的结果） XOR（2 的结果）给出两个缺失的 no 的异或 i..ea XOR b 如果 a 和b 是缺失的元素
4.使用总和公式 diff 的常用方法获取缺失的 No 的总和，假设 diff 为 d。

现在运行一个循环来获取可能的对 (p,q)，它们都位于 [1 , 100] 中，并且总和为 d。

当获得一对时，检查（3 的结果）是否 XOR p = q
如果是，我们就完成了。

如果我错了，请纠正我，如果这是正确的，请评论时间复杂度

回复收藏 0 原文

失而复得 2024-12-18 19:44:59

您可以通过对称性（数学语言中的群）来思考它来激发解决方案。无论这组数字的顺序如何，答案都应该是相同的。如果您要使用 k 函数来帮助确定缺失的元素，您应该考虑哪些函数具有该属性：对称。函数 s_1(x) = x_1 + x_2 + ... + x_n 是对称函数的一个示例，但还有其他更高阶的函数。特别是考虑初等对称函数。 2 次初等对称函数为 s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n，总和两种元素的所有乘积。对于 3 次及以上的初等对称函数也是如此。它们显然是对称的。此外，事实证明它们是所有对称函数的构建块。

您可以通过记下 s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1)) 来构建初等对称函数。进一步思考应该会让您相信 s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1)) 等等，因此它们可以是一次性计算。

我们如何判断数组中缺少哪些项目？考虑一下多项式(z-x_1)(z-x_2)...(z-x_n)。如果您输入任何数字 x_i，其计算结果为 0。展开多项式，可得到 z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n。初等对称函数也出现在这里，这并不奇怪，因为如果我们对根应用任何排列，多项式应该保持不变。

因此，我们可以构建多项式并尝试将其分解以找出哪些数字不在集合中，正如其他人提到的那样。

最后，如果我们担心大数溢出内存（第 n 个对称多项式的量级为 100！），我们可以进行这些计算 mod p，其中 p 是大于 100 的素数。在这种情况下，我们计算多项式 mod p 并发现当输入是数字时它再次计算为 0在集合中，并且它的计算结果为当输入的数字不在集合中时，为非零值。然而，正如其他人指出的那样，为了及时从多项式中获取取决于 k 而不是 N 的值，我们必须对多项式 mod 进行因式分解p。

You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use k functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions. The elementary symmetric function of degree 2 is s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n, the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.

You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1)). Further thought should convince you that s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1)) and so on, so they can be computed in one pass.

How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-x_2)...(z-x_n). It evaluates to 0 if you put in any of the numbers x_i. Expanding the polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n. The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.

So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.

Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order 100!), we can do these calculations mod p where p is a prime bigger than 100. In that case we evaluate the polynomial mod p and find that it again evaluates to 0 when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends on k, not N, we have to factor the polynomial mod p.

回复收藏 0 原文

私藏温柔 2024-12-18 19:44:59

我相信我有一个 O(k) 时间和 O(log(k)) 空间算法，假设您有 floor(x) > 和 log2(x) 函数可用于任意大整数：

您有一个 k 位长整数（因此有 log8(k) 空间) 在其中添加x^2，其中 x 是您在袋子中找到的下一个数字：s=1^2+2^2+... 这需要 O( N) 时间（这对面试官来说不是问题）。最后你会得到j=floor(log2(s))，这是你正在寻找的最大数字。然后s=sj，您再次执行上述操作：

for (i = 0 ; i < k ; i++)
{
  j = floor(log2(s));
  missing[i] = j;
  s -= j;
}

现在，您通常没有用于2756位整数的floor和log2函数，而是用于双精度数。所以？简单地说，对于每 2 个字节（或 1、3 或 4），您可以使用这些函数来获取所需的数字，但这会增加 O(N) 因子的时间复杂度

I believe I have a O(k) time and O(log(k)) space algorithm, given that you have the floor(x) and log2(x) functions for arbitrarily big integers available:

You have an k-bit long integer (hence the log8(k) space) where you add the x^2, where x is the next number you find in the bag: s=1^2+2^2+... This takes O(N) time (which is not a problem for the interviewer). At the end you get j=floor(log2(s)) which is the biggest number you're looking for. Then s=s-j and you do again the above:

for (i = 0 ; i < k ; i++)
{
  j = floor(log2(s));
  missing[i] = j;
  s -= j;
}

Now, you usually don't have floor and log2 functions for 2756-bit integers but instead for doubles. So? Simply, for each 2 bytes (or 1, or 3, or 4) you can use these functions to get the desired numbers, but this adds an O(N) factor to time complexity

回复收藏 0 原文

捶死心动 2024-12-18 19:44:59

尝试求 1 到 50 之间的数字的乘积：

设乘积，P1 = 1 x 2 x 3 x ........................ 50

当您逐个取出数字时，将它们相乘，以便你得到产品P2。但这里缺少两个数字，因此 P2 < P1。

两个缺失项的乘积 axb = P1 - P2。

您已经知道总和，a + b = S1。

由以上两个方程，通过二次方程求解a和b。 a 和 b 是您缺失的数字。

回复收藏 0 原文

~没有更多了~

关于作者

゛清羽墨安

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

简单的面试问题变得更难：给定数字 1..100，在给定 k 个缺失的情况下找到缺失的数字

说明

Clarifications

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（30）

动机

概念

代码

分析

Motivation

Concept

Code

Analysis

想法

时间和空间复杂度分析

Java 实现

测试

The idea

Time and space complexity analysis

Java implementation

Tests

关于作者

相关话题

热门标签

推荐作者

已经忘了多久

15867725375

LonelySnow

走过海棠暮

轻许诺言

信馬由缰

友情链接

简单的面试问题变得更难：给定数字 1..100，在给定 k 个缺失的情况下找到缺失的数字

说明

Clarifications

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（30）

动机

概念

代码

分析

Motivation

Concept

Code

Analysis

想法

时间和空间复杂度分析

Java 实现

测试

The idea

Time and space complexity analysis

Java implementation

Tests

关于作者

相关话题

热门标签

推荐作者

已经忘了多久

15867725375

LonelySnow

走过海棠暮

轻许诺言

信馬由缰

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。