给定一个大小为n+ 1的整数,由元素[1,n]组成。所有元素都是唯一的,除了重复的k时
我一直在尝试解决以下问题:
给您一个n+1
整数的数组,其中所有元素都位于[1,n]
中。您还可以认为,其中一个要素被复制了一定次,而其他元素则是不同的。开发一种算法以找到重复的数字及其重复的次数。
这是我的解决方案,我让k =重复数:
struct LatticePoint{ // to hold duplicate and k
int a;
int b;
LatticePoint(int a_, int b_) : a(a_), b(b_) {}
}
LatticePoint findDuplicateAndK(const std::vector<int>& A){
int n = A.size() - 1;
std::vector<int> Numbers (n);
for(int i = 0; i < n + 1; ++i){
++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-access
}
int i = 0;
while(i < n){
if(Numbers[i] > 1) {
int duplicate = i + 1;
int k = Numbers[i] - 1;
LatticePoint result{duplicate, k};
return LatticePoint;
}
因此,基本思想是:我们沿着数组进行,每次我们看到数字a [i]
时,我们都会增加<的值代码>数字[A [i]] 。由于只有副本出现不止一次,因此值大于1的数字输入的索引必须是重复的数字,该数字具有条目的值重复数-1。o(n)
o(n)<的算法/代码>在时间复杂性中,
o(n)
在太空中。
我想知道某人是否有时间和/或空间更好的解决方案? (或者,如果我的解决方案中有任何错误...)
I have been attempting to solve the following problem:
You are given an array of n+1
integers where all the elements lies in [1,n]
. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.
Here is my solution where I let k = number of duplications:
struct LatticePoint{ // to hold duplicate and k
int a;
int b;
LatticePoint(int a_, int b_) : a(a_), b(b_) {}
}
LatticePoint findDuplicateAndK(const std::vector<int>& A){
int n = A.size() - 1;
std::vector<int> Numbers (n);
for(int i = 0; i < n + 1; ++i){
++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-access
}
int i = 0;
while(i < n){
if(Numbers[i] > 1) {
int duplicate = i + 1;
int k = Numbers[i] - 1;
LatticePoint result{duplicate, k};
return LatticePoint;
}
So, the basic idea is this: we go along the array and each time we see the number A[i]
we increment the value of Numbers[A[i]]
. Since only the duplicate appears more than once, the index of the entry of Numbers with value greater than 1 must be the duplicate number with the value of the entry the number of duplications - 1. This algorithm of O(n)
in time complexity and O(n)
in space.
I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您可以将划痕空间缩小到
n
位,而不是n
int
s,只要您有或愿意编写bitset
带有运行时指定的大小(请参阅boost :: Dynamic_bitset
)。您不需要收集重复计数,直到您知道哪个元素已重复,然后您只需要保持计数即可。因此,您需要跟踪的只是以前是否已经看到该值(因此,
n
位)。找到重复的值后,将计数
设置为2,然后在矢量的其余部分运行,每次击中该值的实例时,请增加count
。 (您初始化计数
至2,因为到达那里时,您将完全看到其中两个。)这仍然是O(n)空间,但是恒定因素要小得多。
You can reduce the scratch space to
n
bits instead ofn
int
s, provided you either have or are willing to write abitset
with run-time specified size (seeboost::dynamic_bitset
).You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence,
n
bits). Once you find the duplicated value, setcount
to 2 and run through the rest of the vector, incrementingcount
each time you hit an instance of the value. (You initialisecount
to 2, since by the time you get there, you will have seen exactly two of them.)That's still O(n) space, but the constant factor is a lot smaller.
代码的想法有效。
但是,由于
n+1
元素,我们可以实现其他时间和空间的权衡。如果我们有一定数量的存储桶,我们将数字划分在之间,将
n+1
数字放置在意味着某些水桶必须比预期的要多。这是众所周知的 pigeOnhole原理。因此,我们使用2个存储桶,一个用于范围
1..floor(n/2)
,一个用于floor(n/2)+1..n..n
。通过阵列,我们知道答案是哪一半。然后,我们将一半的一半分成一半,再过一遍,依此类推。这将导致二进制搜索,该搜索将以o(1)
数据获得答案,而则带有CEIL(log_2(n))
通过,每种时间都花费时间o o (n)
。因此,我们在时间上得到答案o(n log(n))
。现在,我们不需要使用2桶。如果使用3,我们将使用
ceil(log_3(n))
通过。因此,随着我们增加了固定的存储桶数量,我们会占用更多空间并节省时间。还有其他权衡吗?好吧,您展示了如何使用
n
存储桶中的1通过。您需要在2次通过中进行几个水桶?答案至少是sqrt(n)
bucekts。以及三块根可能可以通过3次通过。等等。因此,您将获得一个整个折衷家族的家庭,那里的水桶越多,所需的空间就越多,但是通行证就越少。而您的解决方案仅在极端,花费最多,时间最少。
The idea of your code works.
But, thanks to the
n+1
elements, we can achieve other tradeoffs of time and space.If we have some number of buckets we're dividing numbers between, putting
n+1
numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.So we use 2 buckets, one for the range
1..floor(n/2)
and one forfloor(n/2)+1..n
. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer withO(1)
data, andwith ceil(log_2(n))
passes, each taking timeO(n)
. Therefore we get the answer in timeO(n log(n))
.Now we don't need to use 2 buckets. If we used 3, we'd take
ceil(log_3(n))
passes. So as we increased the fixed number of buckets, we take more space and save time. Are there other tradeoffs?Well you showed how to do it in 1 pass with
n
buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at leastsqrt(n)
bucekts. And 3 passes is possible with the cube root. And so on.So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most spaces and the least time.
这是一种Cheekier算法,它仅需要恒定空间,但可以重新安排输入向量。 (它只能重新定位;所有原始元素仍然存在。)
仍然是O(n)时间,尽管这可能不是完全明显的。
这个想法是尝试重新排列数组,以便[i]是我,直到找到重复。当我们尝试将元素放在正确的索引上时,副本将显示出来,事实证明该索引已经包含该元素。这样,我们找到了副本;我们有一个要移至[J]的价值,但是相同的值已经在[J]处。然后,我们浏览其余的数组,每次找到另一个实例时都会增加计数。
Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)
It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.
具有O(1)复杂性的平行溶液是可能的。
引入一系列原子布尔和两个原子整数,称为重复和计数。首先计数为1。然后在数字的索引位置并行访问数组,并在布尔值上执行测试和集合操作。如果已经设置了布尔值,请将数字分配给重复和增量计数。
该解决方案可能并不总是比建议的顺序替代方案更好。当然,如果所有数字都是重复的,那么不是。尽管如此,理论上它仍然具有恒定的复杂性。或重复数量的线性复杂性。我不太确定。但是,当使用许多内核时,它应该表现良好,尤其是在测试和增量操作不含锁定的情况下。
A parallel solution with O(1) complexity is possible.
Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.
This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.