In an array of integers, one value appears twice. How do you determine which one?

Posted on 2024-12-05 14:53:02

Assume that the array has integers between 1 and 1,000,000.

I know some popular ways of solving this problem:

  1. If the array contains every number between 1 and 1,000,000, sum the array elements and subtract the expected total n*(n+1)/2 from that sum; the difference is the duplicate
  2. Use a hash map (needs extra memory)
  3. Use a bit map (less memory overhead)
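The sum approach from item 1 can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the function name `find_duplicate_by_sum` is hypothetical, and it assumes the array holds every value from 1 to n exactly once plus one extra copy of a single value, so its length is n + 1. A 64-bit accumulator is used because the sum of 1 to 1,000,000 does not fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. Assumes a[] contains each of 1..n exactly once
 * plus one duplicated value, so len == n + 1. */
int find_duplicate_by_sum(const int *a, size_t len)
{
    assert(len >= 2);

    uint64_t n = (uint64_t)len - 1;
    uint64_t expected = n * (n + 1) / 2;   /* 1 + 2 + ... + n */
    uint64_t actual = 0;

    for (size_t i = 0; i < len; i++)
        actual += (uint64_t)a[i];          /* values assumed positive */

    /* Whatever exceeds the expected total is the duplicated value. */
    return (int)(actual - expected);
}
```

For the array {3, 1, 4, 2, 3} the expected total is 10 and the actual sum is 13, so the function returns 3.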

I recently came across another solution and I need some help in understanding the logic behind it:

Keep a single radix accumulator. You exclusive-or the accumulator with
both the index and the value at that index.

The fact that x ^ C ^ x == C is useful here, since each number will be
xor'd twice, except the one that's in there twice, which will appear 3
times. (x ^ x ^ x == x) And the final index, which will appear once.
So if we seed the accumulator with the final index, the accumulator's
final value will be the number that is in the list twice.

I would appreciate it if someone could help me understand the logic behind this approach (with a small example!).

Comments (4)

花海 2024-12-12 14:53:02

Assume you have an accumulator

int accumulator = 0;

At each step of your loop, you XOR the accumulator with i and v, where i is the index of the loop iteration and v is the value in the ith position of the array.

accumulator ^= (i ^ v)

Normally, i and v will be the same number so you will end up doing

accumulator ^= (i ^ i)

But i ^ i == 0, so this will end up being a no-op and the value of the accumulator will be left untouched. At this point I should say that the order of the numbers in the array does not matter because XOR is commutative and associative, so even if the array is shuffled to begin with, the result at the end should still be 0 (the initial value of the accumulator).

Now what if a number occurs twice in the array? Obviously, this number will appear three times in the XORing (one for the index equal to the number, one for the normal appearance of the number, and one for the extra appearance). Furthermore, one of the other numbers will only appear once (only for its index).

This solution now proceeds to assume that the number that only appears once is equal to the last index of the array, or in other words: that the range of numbers in the array is contiguous and starting from the first index to be processed (edit: thanks to caf for this heads-up comment, this is what I had in mind really but I totally messed it up when writing). With this (N appears only once) as a given, consider that starting with

int accumulator = N;

effectively makes N again appear twice in the XORing. At this point, we are left with numbers that only appear exactly twice, and just the one number that appears three times. Since the twice-appearing numbers will XOR out to 0, the final value of the accumulator will be equal to the number that appears three times (i.e. one extra).
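The whole procedure might look like the sketch below. The function name `find_duplicate_xor` is made up for illustration, and the code assumes the layout described in this answer: the array has N elements holding each value from 1 to N - 1 once, plus one duplicate. Because C arrays are 0-indexed, the loop uses i + 1 as the "index" so that indices run from 1 to N, matching the values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. Assumes a[] has len == N elements containing each
 * of 1..N-1 exactly once plus one duplicated value. */
int find_duplicate_xor(const int *a, size_t len)
{
    assert(len >= 2);

    /* Seed with the final 1-based index N, which no value pairs up with. */
    unsigned accumulator = (unsigned)len;

    for (size_t i = 0; i < len; i++) {
        /* XOR in the 1-based index and the value stored there. */
        accumulator ^= (unsigned)(i + 1) ^ (unsigned)a[i];
    }

    /* Every number now appears an even number of times, except the
     * duplicate, which appears three times and therefore survives. */
    return (int)accumulator;
}
```

With the array {3, 1, 4, 2, 3}, the accumulator starts at 5 and ends at 3, the duplicated value.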

苏辞 2024-12-12 14:53:02

Each number between 1 and 10,001 inclusive appears as an array index. (Aren't C arrays 0-indexed? Well, it doesn't make a difference provided we're consistent about whether the array values and indices both start at 0 or both start at 1. I'll go with the array starting at 1, since that's what the question seems to say.)

Anyway, yes, each number between 1 and 10,001 inclusive appears, precisely once, as an array index. Each number between 1 and 10,000 inclusive also appears as an array value precisely once, with the exception of the duplicated value which occurs twice. So mathematically, the calculation we're doing overall is the following:

1 xor 1 xor 2 xor 2 xor 3 xor 3 xor ... xor 10,000 xor 10,000 xor 10,001 xor D

where D is the duplicated value. Of course, the terms in the calculation probably don't appear in that order, but xor is commutative, so we can rearrange the terms however we like. And n xor n is 0 for each n. So the above simplifies to

10,001 xor D

XOR this with 10,001 (the final index, which is what the accumulator was seeded with) and you get D, the duplicated value.
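As a small worked example of the same cancellation, take a hypothetical five-element array with values 3, 1, 4, 2, 3 at indices 1 to 5 (so the final index is 5 and the duplicate is 3). The derivation below is only an illustration with made-up numbers; it uses amsmath's `align*`, with $\oplus$ denoting XOR.

```latex
\begin{align*}
\underbrace{(1 \oplus 2 \oplus 3 \oplus 4 \oplus 5)}_{\text{indices}}
  \oplus
\underbrace{(3 \oplus 1 \oplus 4 \oplus 2 \oplus 3)}_{\text{values}}
  &= (1 \oplus 1) \oplus (2 \oplus 2) \oplus (3 \oplus 3) \oplus (4 \oplus 4)
     \oplus 5 \oplus 3 \\
  &= 5 \oplus 3 = 6, \\
5 \oplus 6 &= 3 .
\end{align*}
```

In binary, 5 is 101 and 3 is 011, so 5 XOR 3 is 110 (6); XORing that with the seed 5 gives 011, which is the duplicate 3.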

巷雨优美回忆 2024-12-12 14:53:02

The logic is that you only have to store the accumulator value, and only need to go through the array once. That's pretty clever.

Of course, whether this is the best method in practice depends on how much work it is to calculate the exclusive or, and how large your array is. If the values in the array are randomly distributed, it may be quicker to use a different method, even if it uses more memory, because the duplicate value is likely to be found long before you have checked the entire array.

Of course if the array is sorted to begin with, things are considerably easier. So it depends very much on how the values are distributed throughout the array.
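One such alternative that trades memory for an early exit is a bitmap of values already seen. The sketch below is only an illustration: `find_duplicate_bitmap` and its `max_value` parameter are hypothetical names, values are assumed to lie in 1..max_value, and -1 is an arbitrary choice for "no duplicate found or allocation failed".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper. Returns the first value seen twice, or -1 if no
 * duplicate is found or the bitmap cannot be allocated. */
int find_duplicate_bitmap(const int *a, size_t len, int max_value)
{
    assert(max_value >= 1);

    /* One bit per possible value 0..max_value, all initially clear. */
    size_t nbits = (size_t)max_value + 1;
    size_t nbytes = (nbits + CHAR_BIT - 1) / CHAR_BIT;
    unsigned char *seen = calloc(nbytes, 1);
    if (seen == NULL)
        return -1;

    int result = -1;
    for (size_t i = 0; i < len; i++) {
        int v = a[i];
        if (v < 1 || v > max_value)
            continue;                      /* outside the assumed range */

        size_t byte = (size_t)v / CHAR_BIT;
        unsigned char mask = (unsigned char)(1u << ((size_t)v % CHAR_BIT));

        if (seen[byte] & mask) {           /* seen before: stop early */
            result = v;
            break;
        }
        seen[byte] |= mask;
    }

    free(seen);
    return result;
}
```

For values up to 1,000,000 the bitmap takes roughly 125 KB, and the loop stops at the second occurrence of the duplicate instead of always scanning the whole array.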

热血少△年 2024-12-12 14:53:02

The question is: are you interested in knowing how to do clever but purely academic xor tricks with little relevance to the real world, or do you want to know this because in the real world you may write programs that use arrays? This answer addresses the latter case.

The no-nonsense solution is to go through the whole array and sort it as you go. While you sort, make sure there are no duplicate values, i.e. implement the abstract data type "set". This will probably require a second array to be allocated, and the sorting will be time-consuming. Whether it is more or less time-consuming than clever xor tricks, I don't know.

However, what good is an array of n unsorted values to you in the real world? If they are unsorted, we have to assume that their order is important somehow, so the original array might have to be preserved. If you want to search through the original array or analyse it for duplicates, the median value, etc., you really want a sorted version of it. Once you have it sorted, you can binary search it in O(log n) time.
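A simplified variant of this idea, sketched under assumptions, is to sort a copy with the standard `qsort` and then look for two equal neighbours, rather than detecting the duplicate during the sort itself. The names `find_duplicate_sorted_copy` and `cmp_int` are hypothetical, and -1 is an arbitrary "not found or out of memory" result. The original array is left untouched, as the answer suggests it may need to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: ascending order of int. */
static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p;
    int y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Hypothetical helper. Sorts a copy of a[] and returns the smallest value
 * that occurs more than once, or -1 if there is none or malloc fails. */
int find_duplicate_sorted_copy(const int *a, size_t len)
{
    assert(a != NULL);
    if (len < 2)
        return -1;

    int *copy = malloc(len * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, a, len * sizeof *copy);
    qsort(copy, len, sizeof *copy, cmp_int);

    int result = -1;
    for (size_t i = 1; i < len; i++) {
        if (copy[i] == copy[i - 1]) {      /* equal neighbours: duplicate */
            result = copy[i];
            break;
        }
    }

    free(copy);
    return result;
}
```

This costs O(n log n) time and O(n) extra memory, compared with the single pass and single accumulator of the XOR approach.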
