找出数组中重复的元素

发布于 2024-11-30 06:33:24 字数 480 浏览 1 评论 0原文

有一个大小为 n 的数组，数组中包含的元素在 1 到 n-1 之间，每个元素出现一次，只有一个元素出现多次。我们需要找到这个元素。

尽管这是一个非常常见的常见问题解答，但我仍然没有找到正确的答案。大多数建议是我应该将数组中的所有元素相加，然后从中减去所有索引的总和，但如果元素数量非常大，这将不起作用。它会溢出。还有关于使用异或门dup = dup ^ arr[i] ^ i的建议，我不清楚。

我想出了这个算法，它是加法算法的增强，将在很大程度上减少溢出的机会！

for i=0 to n-1
  begin :
    diff = A[i] - i;
    sum  = sum + diff;
  end

diff 包含重复元素，但使用此方法我无法找到重复元素的索引。为此，我需要再次遍历数组，这是不可取的。谁能想出一个更好的解决方案，不涉及加法或 XOR 方法在 O(n) 中工作？

原文

There is an array of size n and the elements contained in the array are between 1 and n-1 such that each element occurs once and just one element occurs more than once. We need to find this element.

Though this is a very FAQ, I still haven't found a proper answer. Most suggestions are that I should add up all the elements in the array and then subtract from it the sum of all the indices, but this won't work if the number of elements is very large. It will overflow. There have also been suggestions regarding the use of XOR gate dup = dup ^ arr[i] ^ i, which are not clear to me.

I have come up with this algorithm which is an enhancement of the addition algorithm and will reduce the chances of overflow to a great extent!

for i=0 to n-1
  begin :
    diff = A[i] - i;
    sum  = sum + diff;
  end

diff contains the duplicate element, but using this method I am unable to find out the index of the duplicate element. For that I need to traverse the array once more which is not desirable. Can anyone come up with a better solution that does not involve the addition method or the XOR method works in O(n)?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浅蓝的眸勾画不出的柔情 2024-12-07 06:33:24

您可以通过多种方式来思考这个问题，具体取决于问题描述的限制。

如果您知道只有一个元素是重复的这一事实，那么有很多方法可以解决这个问题。一种特别聪明的解决方案是使用按位异或运算符。 XOR 具有以下有趣的属性：

XOR 是结合律，因此 (x ^ y) ^ z = x ^ (y ^ z)
XOR 是可交换的： x ^ y = y ^ x
XOR 是它自己的逆： x ^ y = 0 iff x = y
XOR 以零作为恒等式： x ^ 0 = x

这里的性质 (1) 和 (2) 意味着当对一组值进行 XOR 时，它将 XOR 应用于元素的顺序并不重要。您可以根据需要对元素重新排序或分组。属性 (3) 意味着，如果对相同的值进行多次异或，则会返回零，而属性 (4) 意味着，如果将任何值与 0 进行异或，则会返回原始数字。将所有这些属性放在一起，您会得到一个有趣的结果：如果对一组数字进行异或，则结果是该组中出现奇数次的所有数字的异或。这样做的原因是，当您将出现偶数次的数字异或在一起时，您可以将这些数字的异或分解为一组对。每对通过 (3) 与 0 进行异或，所有这些零的组合异或通过 (4) 返回零。因此，所有偶重数的数字都相互抵消。

要使用它来解决原始问题，请执行以下操作。首先，将列表中的所有数字异或在一起。这给出了所有出现奇数次的数字的异或，最终得到从 1 到 (n-1) 的所有数字，除了重复的数字。现在，将该值与从 1 到 (n-1) 的所有数字进行异或。然后，这会使之前未取消的 1 到 (n-1) 范围内的所有数字取消，只留下重复的值。此外，它的运行时间为 O(n)，并且仅使用 O(1) 空间，因为所有值的 XOR 都适合单个整数。

在您原来的帖子中，您考虑了另一种方法，该方法利用 1 到 n-1 的整数之和为 n(n-1)/2 这一事实。然而，您担心这会导致整数溢出并引发问题。在大多数机器上，这会导致溢出，这是正确的，但是（在大多数机器上）这不是问题，因为算术是使用固定精度整数（通常是 32 位整数）完成的。当发生整数溢出时，得到的数字并非毫无意义。相反，它只是计算实际结果后去掉除最低 32 位以外的所有内容后得到的值。从数学上来说，这称为模运算，计算机中的运算是对 2³² 进行模运算。不过，更一般地说，我们可以说，对于某个固定的 k，整数是以 k 为模存储的。

幸运的是，您从普通算术中了解和喜爱的许多算术定律在模算术中仍然适用。我们只需要更准确地使用我们的术语。如果 x 和 y 除以 k 时留下相同的余数，我们就说 x 与 y 模 k 全等（表示为 x ≡_k y）。这在物理机器上工作时很重要，因为当大多数硬件上发生整数溢出时，结果值与真值模 k 一致，其中 k 取决于字大小。幸运的是，以下定律在模算术中成立：

例如：

如果 x ==_k y 且 w ==_k z，则 x + w ==_k sub> y + z
如果 x ==_k y 且 w ==_k z，则 xw ==_k yz。

这意味着，如果您想通过查找数组元素的总和并减去预期总数来计算重复值，即使存在整数溢出，一切都会正常进行，因为标准算术仍然会产生相同的值（模 k）在硬件中。也就是说，您也可以使用基于 XOR 的方法，它根本不需要考虑溢出。 :-)

如果您不能保证恰好有一个元素是重复的，但您可以修改元素数组，那么有一种漂亮的算法可以查找重复值。这个较早的问题描述了如何完成这。直观地说，这个想法是您可以尝试使用桶排序对序列进行排序，其中数组元素本身也被回收以保留存储桶的空间。

如果不能保证恰好有一个元素是重复的，并且无法修改元素数组，那么问题就会困难得多。这是一个经典（而且很难！）的面试问题，据说 Don Knuth 花了 24 小时才解决。诀窍是通过将数组视为函数，将问题简化为 cycle-finding 的实例从数字 1-n 到 1-(n-1)，然后查找该函数的两个输入。然而，生成的算法，称为Floyd 的周期查找算法，非常美丽又简单。有趣的是，它与用于检测线性时间和恒定空间中链表中的循环的算法相同。我建议您查阅一下，因为它会定期出现在软件采访中。

有关算法的完整描述以及分析、正确性证明和 Python 实现，请查看 这个实现解决了问题。

希望这有帮助！

There are many ways that you can think about this problem, depending on the constraints of your problem description.

If you know for a fact that exactly one element is duplicated, then there are many ways to solve this problem. One particularly clever solution is to use the bitwise XOR operator. XOR has the following interesting properties:

XOR is associative, so (x ^ y) ^ z = x ^ (y ^ z)
XOR is commutative: x ^ y = y ^ x
XOR is its own inverse: x ^ y = 0 iff x = y
XOR has zero as an identity: x ^ 0 = x

Properties (1) and (2) here mean that when taking the XOR of a group of values, it doesn't matter what order you apply the XORs to the elements. You can reorder the elements or group them as you see fit. Property (3) means that if you XOR the same value together multiple times, you get back zero, and property (4) means that if you XOR anything with 0 you get back your original number. Taking all these properties together, you get an interesting result: if you take the XOR of a group of numbers, the result is the XOR of all numbers in the group that appear an odd number of times. The reason for this is that when you XOR together numbers that appear an even number of times, you can break the XOR of those numbers up into a set of pairs. Each pair XORs to 0 by (3), and th combined XOR of all these zeros gives back zero by (4). Consequently, all the numbers of even multiplicity cancel out.

To use this to solve the original problem, do the following. First, XOR together all the numbers in the list. This gives the XOR of all numbers that appear an odd number of times, which ends up being all the numbers from 1 to (n-1) except the duplicate. Now, XOR this value with the XOR of all the numbers from 1 to (n-1). This then makes all numbers in the range 1 to (n-1) that were not previously canceled out cancel out, leaving behind just the duplicated value. Moreover, this runs in O(n) time and only uses O(1) space, since the XOR of all the values fits into a single integer.

In your original post you considered an alternative approach that works by using the fact that the sum of the integers from 1 to n-1 is n(n-1)/2. You were concerned, however, that this would lead to integer overflow and cause a problem. On most machines you are right that this would cause an overflow, but (on most machines) this is not a problem because arithmetic is done using fixed-precision integers, commonly 32-bit integers. When an integer overflow occurs, the resulting number is not meaningless. Rather, it's just the value that you would get if you computed the actual result, then dropped off everything but the lowest 32 bits. Mathematically speaking, this is known as modular arithmetic, and the operations in the computer are done modulo 2³². More generally, though, let's say that integers are stored modulo k for some fixed k.

Fortunately, many of the arithmetical laws you know and love from normal arithmetic still hold in modular arithmetic. We just need to be more precise with our terminology. We say that x is congruent to y modulo k (denoted x ≡_k y) if x and y leave the same remainder when divided by k. This is important when working on a physical machine, because when an integer overflow occurs on most hardware, the resulting value is congruent to the true value modulo k, where k depends on the word size. Fortunately, the following laws hold true in modular arithmetic:

For example:

If x ≡_k y and w ≡_k z, then x + w ≡_k y + z
If x ≡_k y and w ≡_k z, then xw ≡_k yz.

This means that if you want to compute the duplicate value by finding the total sum of the elements of the array and subtracting out the expected total, everything will work out fine even if there is an integer overflow because standard arithmetic will still produce the same values (modulo k) in the hardware. That said, you could also use the XOR-based approach, which doesn't need to consider overflow at all. :-)

If you are not guaranteed that exactly one element is duplicated, but you can modify the array of elements, then there is a beautiful algorithm for finding the duplicated value. This earlier SO question describes how to accomplish this. Intuitively, the idea is that you can try to sort the sequence using a bucket sort, where the array of elements itself is recycled to hold the space for the buckets as well.

If you are not guaranteed that exactly one element is duplicated, and you cannot modify the array of elements, then the problem is much harder. This is a classic (and hard!) interview problem that reportedly took Don Knuth 24 hours to solve. The trick is to reduce the problem to an instance of cycle-finding by treating the array as a function from the numbers 1-n onto 1-(n-1) and then looking for two inputs to that function. However, the resulting algorithm, called Floyd's cycle-finding algorithm, is extremely beautiful and simple. Interestingly, it's the same algorithm you would use to detect a cycle in a linked list in linear time and constant space. I'd recommend looking it up, since it periodically comes up in software interviews.

For a complete description of the algorithm along with an analysis, correctness proof, and Python implementation, check out this implementation that solves the problem.

Hope this helps!

回复收藏 0 原文