找出三个数字只出现过一次

发布于 2024-09-05 05:54:38 字数 596 浏览 3 评论 0原文

在长度为n的序列中,其中n=2k+3,即有k个唯一的数字出现了两次 三个数字只出现了一次。

问题是:如何找到只出现过一次的三个唯一数字?

例如,在序列 1 1 2 6 3 6 5 7 7 中,三个唯一数字是 2 3 5。

注意: 3<=n<1e6 ,数字范围从 1 到 2e9

内存限制: 1000KB ,这意味着我们无法存储整个序列。

我尝试过的方法(超出内存限制):

我初始化一棵树,当读入一个数字时,我尝试将其从树中删除,如果 删除返回 false(未找到),我将其添加到树中。最后,树有三个数字。它可以工作,但是超出了内存限制。

我知道如何使用位操作找到一两个这样的数字。所以我想知道

我们是否可以使用相同的方法(或类似的方法)找到三个?

查找一个/两个只出现过一次的数字的方法:

如果有一个数字只出现过一次,我们可以对序列进行异或来查找它。

如果有两个,我们可以先对序列进行异或,然后将序列分成2个 将结果 1 的一位分开,再次对这 2 个部分进行异或,我们就会找到答案。

In a sequence of length n, where n=2k+3, that is there are k unique numbers appeared twice
and three numbers appeared only once.

The question is: how to find the three unique numbers that appeared only once?

for example, in sequence 1 1 2 6 3 6 5 7 7 the three unique numbers are 2 3 5.

Note:
3<=n<1e6 and the number will range from 1 to 2e9

Memory limits: 1000KB , this implies that we can't store the whole sequence.

Method I have tried(Memory limit exceed):

I initialize a tree, and when read in one number I try to remove it from the tree, if the
remove returns false(not found), I add it to the tree. Finally, the tree has the three numbers. It works, but is Memory limit exceed.

I know how to find one or two such number(s) using bit manipulation. So I wonder if

we can find three using the same method(or some method similar)?

Method to find one/two number(s) appeared only once:

If there is one number appeared only once, we can apply XOR to the sequence to find it.

If there are two, we can first apply XOR to the sequence, then separate the sequence into 2
parts by one bit of the result that is 1, and again apply XOR to the 2 parts, and we will find the answer.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

谎言月老 2024-09-12 05:54:38

对于此问题的更一般版本(没有那些愚蠢的限制):

您可以在 O(n) 时间和 O(1) 空间内完成此操作,无需假设任何边界,或迭代 在所有位上,仅使用 O(1) 时间位操作技巧,例如适用于 2 个缺失数字的 XOR 技巧。

下面是仅查找其中一个数字的(伪)代码:

// Given an array arr with 2k+3 numbers, k of which are repeated twice
// and the remaining three are distinct: a,b,c.
// returns one of a,b,c.
int FindUnique(int []arr) {

    int s = 0; // This will ultimately hold a ^ b ^ c (bitwise XOR)

    for (int i = 0; i < arr.Length; i++) {
        s ^= arr[i];
    }

    int d = 0; // this holds diff(a,s) ^ diff(b,s) ^ diff(c,s)

    for (int i = 0; i < arr.Length; i++) {
        d ^= diff(arr[i],s);
    }

    int e = lowestBit(d); // This gives the position where one of a,b,c differs 
                          // from the others.

    int bucket1 = 0;
    int bucket2 = 0;

    for (int i = 0; i < arr.Length; i++) {
        if (arr[i] & e) {
            bucket1 ^= arr[i];
        } else {
            bucket2 ^= arr[i];
        }
    }

    int count1 = 0;
    int count2 = 0;

    for (int i = 0; i < arr.Length; i++) {
        if (arr[i] == bucket1) {
            count1++;
        }

        if (arr[i] == bucket2) {
            count2++;
        }
    }

    if (count1 == 1) return bucket1;

    return bucket2;
}

// return a number with the lowest bit of x ^ s set to 1 and rest 0.
// i.e. the lowest bit position where x and s differ.
int diff(int x, int s) {
    return lowestBit(x ^ s);
}

// Returns a number with only the lowest bit of y set.
int lowestBit(int y) {
    return y & ~(y-1);
}

想法如下:

假设出现一次的数字是 a、b、c。

现在对数组进行异或运算,得到 s = a XOR b XOR c。

由于数字不同,请注意 s 不能是 a、b 或 c(因为其他两个将相等),因此至少有一位(不一定在同一位置),其中 a、b 中的每一个,c 与 s 不同。

在两个数字的情况下,我们可以看到 s 不为零,并选择一个位来区分 a 和 s 。 b 并与之合作。

当我们有三个数字时,我们会遇到困难,但我们仍然可以找到一点来区分其中一个数字。

对于每个数字 x,找到与 s 不同的最低位。考虑一个二进制数,其中只有该位设置为 1,其余位设置为 0。将此数字称为 diff(x)。

现在,如果我们计算每个数字的 diff(x) 并将它们异或在一起,我们得到 d = diff(a) XOR diff(b) XOR diff(c)。

请注意,d 不能为零。

现在找到 d 的最低设置位。该位位置可用于存储 a、b、c 之一,因为并非所有 a、b、c 都可以在该位置具有相同的位:如果有,则 s 的该位是这些位的异或三个必须相同,但我们确保选择的 s 中的该位至少与 a、b、c 中的对应位之一不同。

因此,我们再次进行异或,对该位进行微分,并检查两个结果数字中哪一个在数组中恰好出现一次。一旦我们找到一个数字,我们就知道如何处理两个数字。

要查找差异,只需使用 bithack:x & ~(x-1),这是一个标准的位黑客,可以被认为是 O(1)(而不是 O(位数))。

For a more general version of this problem (without those silly limits):

You can do this in O(n) time and O(1) space without assuming any bounds, or iterating over all the bits, and using only O(1) time bit manipulation tricks like the XOR trick which worked for 2 missing numbers.

Here is (pseudo)code to find just one of the numbers:

// Given an array arr with 2k+3 numbers, k of which are repeated twice
// and the remaining three are distinct: a,b,c.
// returns one of a,b,c.
int FindUnique(int []arr) {

    int s = 0; // This will ultimately hold a ^ b ^ c (bitwise XOR)

    for (int i = 0; i < arr.Length; i++) {
        s ^= arr[i];
    }

    int d = 0; // this holds diff(a,s) ^ diff(b,s) ^ diff(c,s)

    for (int i = 0; i < arr.Length; i++) {
        d ^= diff(arr[i],s);
    }

    int e = lowestBit(d); // This gives the position where one of a,b,c differs 
                          // from the others.

    int bucket1 = 0;
    int bucket2 = 0;

    for (int i = 0; i < arr.Length; i++) {
        if (arr[i] & e) {
            bucket1 ^= arr[i];
        } else {
            bucket2 ^= arr[i];
        }
    }

    int count1 = 0;
    int count2 = 0;

    for (int i = 0; i < arr.Length; i++) {
        if (arr[i] == bucket1) {
            count1++;
        }

        if (arr[i] == bucket2) {
            count2++;
        }
    }

    if (count1 == 1) return bucket1;

    return bucket2;
}

// return a number with the lowest bit of x ^ s set to 1 and rest 0.
// i.e. the lowest bit position where x and s differ.
int diff(int x, int s) {
    return lowestBit(x ^ s);
}

// Returns a number with only the lowest bit of y set.
int lowestBit(int y) {
    return y & ~(y-1);
}

The idea is as follows:

Say the numbers which appear once are a,b,c.

Now run the XOR through the array to get s = a XOR b XOR c.

Since the numbers are distinct, notice that s cannot be either a or b or c (as the other two will be equal then), thus there is at least one bit (not necessarily at the same position), where each of a,b,c differs from s.

In the two number case, we could see that s is non-zero and pick a bit which differentiated a & b and work with that.

We run into difficulties with that when we have three numbers, but we can still find a bit to differentiate one of the numbers out.

For each number x, find the lowest bit which differs from s. Consider the binary number in which only that bit is set to one and the rest are zero. Call this number diff(x).

Now if we compute diff(x) for each number and XOR them together, we get d = diff(a) XOR diff(b) XOR diff(c).

Notice that d cannot be zero.

Now find the lowest set bit of d. This bit position can be used to bucket out one of a,b,c, as not all of a,b,c can have the same bit at that position: if they did, then that bit of s which is the XOR of those three must be the same, but we ensured that we picked that bit of s to differ from at least one of the corresponding bits in a,b,c.

So we XOR again, differentiating on this bit, and check which of the two resulting numbers appears exactly once in the array. Once we find one number, we know how to deal with two numbers.

To find the diff just use the bithack: x & ~(x-1), which is a standard bit-hack and can be considered O(1) (instead of O(number of bits)).

咽泪装欢 2024-09-12 05:54:38

您可以使用与一个和两个不同值的更简单情况类似的方式来执行此操作。

我们需要两个整数来表示数字的每一位(例如 32 位)。对于每个数字,如果该位为零,则将第一个整数与其进行异或。如果不是,则将第二个整数与它进行异或。

另外,记录一下在每个位置找到 1 或 0 的次数(我们只需要检查这是偶数还是奇数,因此保留一个布尔值)。

迭代之后,我们的整数对将是以下之一。这里的第一个数字代表偶数,第二个数字代表奇数。

0, a^b^c
a^b, c
a^c, b
b^c, a

对于每一对,检查偶数计数整数。如果它为零,那么我们就知道另一个整数是 a^b^c,因为我们的结果中没有两个是相等的。否则,我们会在奇数计数整数处找到一个值。

public static int[] find3(int[] list) {
    int[][] xors = new int[32][2];
    boolean[] counts = new boolean[32];
    for (int curr : list) {
        for (int i = 0; i < 32; i++) {
            xors[i][(curr & (1 << i)) >> i] ^= curr;
            counts[i] ^= ((curr & (1 << i)) == (1 << i));
        }
    }

    // this really shouldn't take so many lines
    int[] ret = new int[3];
    int found = 0;
    for (int i = 0; i < 32; i++) {
        int oddCount = xors[i][counts[i] ? 1 : 0];
        int evenCount = xors[i][counts[i] ? 0 : 1];
        if (evenCount != 0) { // avoid the 0, a^b^c case.
            if (found == 0) {
                ret[0] = oddCount;// a
                ret[2] = evenCount;// b^c for now
                found++;
            } else if (found == 1 && ret[0] != oddCount) {
                ret[1] = oddCount;// b
                ret[2] ^= oddCount;// (b^c)^b == c
                break;
            }
        }
    }
    return ret;
}

You can do this in a similar way to the simpler cases of one and two different values.

We need two integers for each bit of the numbers (e.g. 32 bits). For each number, if that bit is zero, XOR the first integer with it. If it isn't, XOR the second integer with it.

Also, keep count of how many times you find a 1 or 0 in each position (we only need to check if this is even or odd, so keep a boolean).

After iterating through, our pairs of integers will be one of the following. The first number here represents an even count, the second an odd.

0, a^b^c
a^b, c
a^c, b
b^c, a

For each pair, check the even count integer. If it is zero, then we know the other integer is a^b^c, since no two of our results will be equal. Otherwise, we've found a value at the odd count integer.

public static int[] find3(int[] list) {
    int[][] xors = new int[32][2];
    boolean[] counts = new boolean[32];
    for (int curr : list) {
        for (int i = 0; i < 32; i++) {
            xors[i][(curr & (1 << i)) >> i] ^= curr;
            counts[i] ^= ((curr & (1 << i)) == (1 << i));
        }
    }

    // this really shouldn't take so many lines
    int[] ret = new int[3];
    int found = 0;
    for (int i = 0; i < 32; i++) {
        int oddCount = xors[i][counts[i] ? 1 : 0];
        int evenCount = xors[i][counts[i] ? 0 : 1];
        if (evenCount != 0) { // avoid the 0, a^b^c case.
            if (found == 0) {
                ret[0] = oddCount;// a
                ret[2] = evenCount;// b^c for now
                found++;
            } else if (found == 1 && ret[0] != oddCount) {
                ret[1] = oddCount;// b
                ret[2] ^= oddCount;// (b^c)^b == c
                break;
            }
        }
    }
    return ret;
}
落花随流水 2024-09-12 05:54:38

这是一个经典的问题——实际上几周前我才被问到。为了解决这个问题,您需要计算可能出现的不同数字的数量,并分配那么多位。

例如,如果列表中的数字必须介于 1-20 之间,则分配 20 位 - 每个数字一个,并将每位初始化为 0。

然后遍历列表。每次看到一个数字时,翻转相应的位。

例如:对于 2 6 3 6 5 7 7 的示例列表,我们可以分配 7 位(对于 1 2 3 4 5 6 7)。然后,当我们遍历列表时,我们将执行以下操作:

  • 翻转第 2 位
  • 翻转第 6 位
  • 翻转第 3 位
  • 翻转第 6 位
  • 等等

然后,遍历列表后,您可以读取这些位以找到三个唯一的数字。它们都将由“1”位表示,其他数字将由 0 表示。

你将列表读了两遍,需要 2n 时间,即 O(n)。


编辑:有可能不会给出界限。那么,一种解决方案是首先简单地通读列表来确定自己的边界 - 那么它仍然是 O(n)。

然而,可能出现的一个问题是列表可能非常小,但一些数字非常大 - 实际上使范围太大。例如:

1, 99999999999999999, 1, 99999999999999999, 2, 3, 4

由于列表中存在大量数字,解决该问题将需要大量内存,因为即使数字很少,范围也很大,并且我们根据范围分配位。

然后可以使用哈希表调整解决方案以给出新的解决方案,如下所示(尽管我不确定鉴于问题的“仅位操作”规定是否允许这样做):

  1. L 表示原始列表,C 表示它的副本。
  2. C 中删除所有重复项(有多种方法可以有效地做到这一点)。
  3. 创建一个哈希表 H,并为 C 中的每个元素插入一个键/值对 <number,pos>>到 H 中,其中 numberC 中的当前元素,pos 是它在 C< 中的位置/代码>。因此,给定出现在 L 中的数字,我们现在可以使用 H 查找该数字在 C 中的位置。
  4. 分配一些等于C大小的位,并将这些位初始化为0。
  5. 遍历L。每次我们运行一个数字时,从 H 获取它的值,并翻转位列表中的该位。
  6. 遍历位列表 - 对于每个“1”位,从 C 获取该位置的数字 - 这是唯一的数字之一。

This is a classic question - I was actually just asked it a few weeks ago. To solve it, you take the number of possible distinct numbers that could appear, and allocate that many bits.

For example, if numbers in the list must be between 1-20, you allocate 20 bits - one for each number, and you initialize each bit as 0.

Then you traverse the list. Every time you see a number, flip the corresponding bit.

For example: With your example list of 2 6 3 6 5 7 7, we could allocate 7 bits (for 1 2 3 4 5 6 7). Then as we traverse the list, we would do the following:

  • flip 2nd bit
  • flip 6th bit
  • flip 3rd bit
  • flip 6th bit
  • etc

Then once you're done traversing the list, you can read through the bits to find the three unique numbers. They will all be represented by '1' bits, and the other numbers will be represented by 0s.

You read through the list twice, which takes 2n time, which is O(n).


Edit: It is possible that the bounds will not be given. One solution, then, is to simply read through the list first to determine the bounds yourself - then it's still O(n).

However one problem which could occur is that the list could be very small, but some very large numbers - effectively making the range too big. For example:

1, 99999999999999999, 1, 99999999999999999, 2, 3, 4

Solving that problem would require a lot of memory due to the large number present in the list, because even though there are very few numbers, the range is very large and we are allocating bits according to the range.

The solution could then be adjusted to give a new solution as follows using a hashtable (although I'm not sure if this is permitted given the problem's "bit manipulation only" stipulation):

  1. Let L denote the original list, and C denote a copy of it.
  2. Remove all duplicates from C (there are numerous ways of doing this efficiently).
  3. Create a hashtable H, and for each element in C, insert a key/value pair <number,pos> into H where number is the current element in C, and pos is its position in C. So, given a number that appears in L, we can now use H to find that number's position in C.
  4. Allocate a number of bits equal to the size of C, and initialize those bits to 0.
  5. Traverse L. Each time we run accross a number, get its value from H, and flip that bit in our bit list.
  6. Traverse the bit list - for each '1' bit, get the number from C which is at that position - that is one of the unique numbers.
究竟谁懂我的在乎 2024-09-12 05:54:38

如果概率解决方案就足够了,那么您可以使用 Bloom Filter

创建两个布隆过滤器。第一个 (A) 包含已找到至少一个的数字,第二个 (B) 包含已找到两次的数字。

伪代码:

A = empty
B = empty

foreach x in the list
  if x in A
    add x to B
  else
    add x to A

foreach x in the list
  if x in A
    if !(x in B)
      print x

如果您使用完整的 1000KB,那么出错的概率会低得离谱。

If a probabilistic solution will suffice then you could use a Bloom Filter.

Create two Bloom filters. The first (A) contains numbers that have been found at least one, and the second (B) contains numbers that have been found twice.

Pseudocode:

A = empty
B = empty

foreach x in the list
  if x in A
    add x to B
  else
    add x to A

foreach x in the list
  if x in A
    if !(x in B)
      print x

If you use the full 1000KB then the probability of error would be ridiculously low.

飞烟轻若梦 2024-09-12 05:54:38

当您添加更多唯一值时,问题会变得越来越困难,主要是因为您可以选择 A、B、C,使得 A xor B xor C = 0。检测值的子集是否具有相同的校验和会变得越来越困难因为它包含所有唯一值,或者因为它省略了与 0 异或的值。

您可以在常量空间和 O(n*k) 时间内执行 3 个值,其中 k 是最大整数中的位数。 (因此,对于典型情况而言,时间为 O(n):32 位整数。)

随着唯一值数量的增加,并且您继续需要恒定空间,了解时间界限是否在 N 中变得非线性,这将是很有趣的。

//Special check for 0, because otherwise we don't know A xor B xor C != A xor B
if items unique-contains 0 then
    return 0 ++ SubProblem2Unique(items - 0)
//Compute A xor B xor C
val x = fold xor items
//Try to find a split which separates A and B from C.
for i in 0..WORD_SIZE
    //see if the checksum splits
    val x1 = fold xor [e in items where e & (1<<i) == 0]
    val x2 = x xor x1
    if x1 == x or x2 == x then continue //ith bit was the same for A and B and C
    //C is either x1 or x2
    val C = if items unique-contains x1 then x1 else x2
    return C ++ SubProblem2Unique(items - C)

throw InvalidInput

The problem gets harder and harder as you add more unique values, mainly because you can choose A,B,C such that A xor B xor C = 0. It gets harder and harder to detect if a subset of the values has the same checksum because it contains all the unique values, or because it omitted values which happened to xor to 0.

You can do 3 values in constant space and O(n*k) time, where k is the number of bits in the largest integer. (So O(n) time for your typical case: 32-bit integers.)

It would be interesting to find out if the time bound becomes non-linear in N as the number of unique values increases and you continue to require constant space.

//Special check for 0, because otherwise we don't know A xor B xor C != A xor B
if items unique-contains 0 then
    return 0 ++ SubProblem2Unique(items - 0)
//Compute A xor B xor C
val x = fold xor items
//Try to find a split which separates A and B from C.
for i in 0..WORD_SIZE
    //see if the checksum splits
    val x1 = fold xor [e in items where e & (1<<i) == 0]
    val x2 = x xor x1
    if x1 == x or x2 == x then continue //ith bit was the same for A and B and C
    //C is either x1 or x2
    val C = if items unique-contains x1 then x1 else x2
    return C ++ SubProblem2Unique(items - C)

throw InvalidInput
日久见人心 2024-09-12 05:54:38

为什么不使用 aa 哈希集?
- 如果数字已经存在,则从哈希集中删除
- 如果数字不存在,则放入哈希集
最终的哈希集仅包含唯一的数字。
时间:O(n)
内存:o(k),其中 k 是不同元素的数量。

通过哈希集方法,该解决方案具有可扩展性,可用于确定任何给定序列中的任意数量的唯一元素。

Why not use a a hashset?
- If a number already exists, delete from hashset
- if a number does not exist, put into hashset
The final hashset contains only unique numbers.
Time: O(n)
Memory:o(k) where k is number of distinct elements.

With hashset approach, the solution is scalable and can be used to determine any number of unique elements in any given sequence.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文