Given an array of n+1 integers with elements from [1,n], where all elements are unique except one that is duplicated k times

Posted 2025-02-04 09:43:44


I have been attempting to solve the following problem:

You are given an array of n+1 integers where all the elements lie in [1,n]. You are also given that one of the elements is duplicated a certain number of times, whilst the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.

Here is my solution where I let k = number of duplications:

#include <vector>

struct LatticePoint { // to hold duplicate and k
   int a;
   int b;
   LatticePoint(int a_, int b_) : a(a_), b(b_) {}
};

LatticePoint findDuplicateAndK(const std::vector<int>& A) {

   int n = A.size() - 1;
   std::vector<int> Numbers(n); // zero-initialised counts

   for (int i = 0; i < n + 1; ++i) {
      ++Numbers[A[i] - 1]; // A[i] in range [1,n] so no out-of-bounds access
   }

   int i = 0;
   while (i < n) {
      if (Numbers[i] > 1) {
         int duplicate = i + 1;
         int k = Numbers[i] - 1;
         return LatticePoint{duplicate, k};
      }
      ++i;
   }
   return LatticePoint{-1, 0}; // not reached for valid input
}

So, the basic idea is this: we go along the array and each time we see the number A[i] we increment the value of Numbers[A[i] - 1]. Since only the duplicate appears more than once, the index of the entry of Numbers with value greater than 1 must correspond to the duplicate number, and the value of that entry is one more than the number of duplications. This algorithm is O(n) in time complexity and O(n) in space.

I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)


Comments (4)

倾城月光淡如水﹏ 2025-02-11 09:43:44


You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).

You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)

That's still O(n) space, but the constant factor is a lot smaller.
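The two-phase approach described above can be sketched as follows. This is a minimal illustration, not the answerer's code: the helper name is hypothetical, and `std::vector<bool>` stands in for a run-time sized bitset such as boost::dynamic_bitset. As in the answer, the returned count is the total number of copies of the duplicate (starting from 2).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Phase 1: mark seen values in n bits, stopping at the first repeat.
// Phase 2: count the remaining occurrences of the duplicate.
std::pair<int, int> findDuplicateAndK_bits(const std::vector<int>& A) {
    int n = static_cast<int>(A.size()) - 1;
    std::vector<bool> seen(n, false); // n bits of scratch space
    int duplicate = -1;
    std::size_t firstRepeat = 0;

    for (std::size_t i = 0; i < A.size(); ++i) {
        if (seen[A[i] - 1]) { duplicate = A[i]; firstRepeat = i; break; }
        seen[A[i] - 1] = true;
    }

    // Start at 2: exactly two copies have been seen when we break out.
    int count = 2;
    for (std::size_t i = firstRepeat + 1; i < A.size(); ++i)
        if (A[i] == duplicate) ++count;
    return {duplicate, count};
}
```

For example, on the input {1, 2, 3, 2, 4, 2} this would report the duplicate 2 with 3 total copies.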

生生漫 2025-02-11 09:43:44


The idea of your code works.

But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.

If we have some number of buckets we're dividing numbers between, putting n+1 numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.

So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) data, and with ceil(log_2(n)) passes, each taking time O(n). Therefore we get the answer in time O(n log(n)).

Now we don't need to use 2 buckets. If we used 3, we'd take ceil(log_3(n)) passes. So as we increased the fixed number of buckets, we take more space and save time. Are there other tradeoffs?

Well you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at least sqrt(n) buckets. And 3 passes are possible with cube-root many buckets. And so on.

So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most space and the least time.
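The 2-bucket binary search described above can be sketched like this (the helper name is an assumption; the answer gives no code). Each pass counts the elements falling in the lower half of the current value range; by the pigeonhole argument, a half that contains the duplicate holds strictly more elements than it has distinct values, so one comparison per pass tells us which half to keep. This uses O(1) extra space and O(n log n) time.

```cpp
#include <cassert>
#include <vector>

// Binary search over the value range [1, n], not over array positions.
int findDuplicateBinary(const std::vector<int>& A) {
    int lo = 1, hi = static_cast<int>(A.size()) - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int inLower = 0;
        for (int x : A)
            if (lo <= x && x <= mid) ++inLower;
        // A half without the duplicate holds at most (mid - lo + 1) elements.
        if (inLower > mid - lo + 1) hi = mid;
        else lo = mid + 1;
    }
    return lo;
}
```

Once the duplicate value is known, one final O(n) pass (e.g. std::count) gives its multiplicity.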

一花一树开 2025-02-11 09:43:44


Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)

It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.

#include <utility>
#include <vector>
std::pair<int, int> count_dup(std::vector<int> A) {
  /* Try to put each element in its "home" position (that is,
   * where the value is the same as the index). Since the
   * values start at 1, A[0] isn't home to anyone, so we start
   * the loop at 1.
   */
  int n = A.size();
  for (int i = 1; i < n; ++i) {
    while (A[i] != i) {
      int j = A[i];
      if (A[j] == j) {
        /* j is the duplicate. Now we need to count them.
         * We have one at i. There's one at j, too, but we only
         * need to add it if we're not going to run into it in
         * the scan. And there might be one at position 0. After that,
         * we just scan through the rest of the array.
         */
        int count = 1;
        if (A[0] == j) ++count;
        if (j < i) ++count;
        for (++i; i < n; ++i) {
          if (A[i] == j) ++count;
        }
        return std::make_pair(j, count);
      }
      /* This swap can only happen once per element. */
      std::swap(A[i], A[j]);
    }
  }
  /* If we get here, every element from 1 to n is at home.
   * So the duplicate must be A[0], and the duplicate count
   * must be 2.
   */
  return std::make_pair(A[0], 2);
}
做个少女永远怀春 2025-02-11 09:43:44


A parallel solution with O(1) complexity is possible.

Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.

This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.
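One way the test-and-set scheme above might look in standard C++ (the helper name and the slicing of the input across threads are assumptions, since the answer gives no code). Each worker claims a slice of A; `exchange()` on the shared flag array is the test-and-set, and every thread that loses the race for a value increments the shared count, so the final count equals the total number of copies regardless of scheduling.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

std::pair<int, int> findDuplicateParallel(const std::vector<int>& A,
                                          int numThreads) {
    int n = static_cast<int>(A.size()) - 1;
    std::vector<std::atomic<bool>> seen(n);
    for (auto& b : seen) b.store(false);
    std::atomic<int> duplicate{0};
    std::atomic<int> count{1}; // the winning (first) copy counts as 1

    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            if (seen[A[i] - 1].exchange(true)) { // already set: a repeat
                duplicate.store(A[i]);
                count.fetch_add(1);
            }
        }
    };

    std::vector<std::thread> pool;
    std::size_t chunk = (A.size() + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        std::size_t b = static_cast<std::size_t>(t) * chunk;
        std::size_t e = std::min(A.size(), b + chunk);
        if (b < e) pool.emplace_back(worker, b, e);
    }
    for (auto& th : pool) th.join();
    return {duplicate.load(), count.load()};
}
```

Note that the wall-clock win depends on the thread count and on `exchange` being lock-free on the target platform; the total work is still O(n).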
