判断n个整数数组中是否存在A+B=C

发布于 2024-10-14 17:27:23 字数 712 浏览 8 评论 0原文

这是我的一个朋友收到的作业（算法和数据结构课）的问题。他问我这件事。然而，我无法解决这个问题，这几天我一直在思考这个问题。

[0, 2³¹-1] 范围内有 n 个随机整数（可能有重复。判断这些数字中是否有 3 个数字满足 A + B = C。

我首先想到了一个简单的算法，即 O(n²log < em>n）。然后我想出了一个 O(n²) 的算法。这是伪代码：

sort(a); // non-descending
for (i = 0; i < n; i++) {
  j = i; k = i + 1;
  while (j < n && k < n) {
    if (a[i] + a[j] == a[k])
      return true;
    else if (a[i] + a[k] < a[j])
      k++;
    else
      j++;
  }
}
return false;

但是，问题表明 1 << n <= 10⁶。我认为 O(n²) 太慢了。我的解决方案没有利用随机性。但是，我不确定这是否是问题的重要部分。

原文

This is a problem a friend of mine received as his homework (in algorithm and data structure class). He asked me about this. However, I can't solve it and have been thinking about it for some time during the last several days.

There are n random integers in the range [0, 2³¹-1] (there may be duplicates. Determine if 3 numbers of these numbers satisfy A + B = C.

I first came up with a naive algorithm that's O(n²log n).
I then came up with an algorithm that's O(n²). Here is the pseudo code:

sort(a); // non-descending
for (i = 0; i < n; i++) {
  j = i; k = i + 1;
  while (j < n && k < n) {
    if (a[i] + a[j] == a[k])
      return true;
    else if (a[i] + a[k] < a[j])
      k++;
    else
      j++;
  }
}
return false;

However, the problem states that 1 < n <= 10⁶. I believe O(n²) is too slow. My solution doesn't make use of the randomness. However, I'm not sure if this is an important part of the problem.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风月客 2024-10-21 17:27:23

一般问题是3SUM-Hard以及是否存在的问题比二次算法更好的算法已经开放。

因此，如果您需要更快的算法，您可能需要利用它们是 32 位的事实。

回复收藏 0 原文

无人问我粥可暖 2024-10-21 17:27:23

如果数字是随机的，任何最坏情况的 O(n^2) 算法（包括您的算法）都会运行得非常快。事实上，实际复杂度将为O(n*logn)（排序的复杂度）。
这很像快速排序，平均 O(n*logn) 且达到 O(n^2) 的机会很小。

10^6 随机数为我们提供 ~ 10^6*10^6 在 ~ 0..10^9 范围内的“几乎随机”总和。这些 10^12 随机和之一等于整数范围内的给定随机值的可能性有多大？相当不错。
现在，这些 10^12 随机总和之一等于10^6给定随机值之一的机会有多大？ 100%，诗意地说。

我已经实现了您提出的解决方案，对于 n = 10^6 ，它在最内层循环中平均执行 5000-10000 操作。 O(n^2) 就这么多了。排序是其中成本最高的操作。

附言。如果您更新解决方案以使用散列而不是排序，则可以进一步降低复杂性，甚至使其O(1)。

PS 2.java测试程序，供参考。运行它并亲自看看。

    int n = 1000000;
    int[] a = new int[n];

    // generate random array
    Random r = new Random();
    for (int i = 0; i < n; ++i) {
        do {
            a[i] = r.nextInt();
        } while (a[i] < 0);
    }

    Arrays.sort(a);

    // number of operations inside main loop
    int ops = 0;

    // main logic, pretty much as OP described it
    boolean found = false;
    for (int i = 0; i < n && !found; ++i) {
        int j = i;
        int k = i + 1;
        while (k < n) {
            ++ops;

            if (a[i] > a[k] - a[j]) {
                ++k;
            } else if (a[i] < a[k] - a[j]) {
                ++j;
            } else {
                System.out.println(a[i] + " + " + a[j] + " = " + a[k]);
                found = true;
                break;
            }
        }
    }

    System.out.println(ops);

If numbers are random, any worst-case O(n^2) algorithm (including yours) will run very fast. In fact, the practical complexity will be O(n*logn) (the complexity of sorting).
It's much like quicksort, where we have O(n*logn) average and a tiny chance of hitting O(n^2).

10^6 random numbers give us ~ 10^6*10^6 'nearly random' sums in range ~ 0..10^9. What's the chance that one of those 10^12 random sums will be equal to a given random value in integer range? Pretty good.
Now, what's the chance that one of those 10^12 random sums will be equal to a one of 10^6 given random values? 100%, speaking poetically.

I've implemented your proposed solution, for n = 10^6 it performs on average 5000-10000 operations in the innermost loop. So much for O(n^2). Sorting is the costliest operation in there.

PS. You may reduce complexity farther and make it even O(1), if you update the solution to use hash instead of sorting.

PS 2. The test program in java, for the reference. Run it and see for yourself.

    int n = 1000000;
    int[] a = new int[n];

    // generate random array
    Random r = new Random();
    for (int i = 0; i < n; ++i) {
        do {
            a[i] = r.nextInt();
        } while (a[i] < 0);
    }

    Arrays.sort(a);

    // number of operations inside main loop
    int ops = 0;

    // main logic, pretty much as OP described it
    boolean found = false;
    for (int i = 0; i < n && !found; ++i) {
        int j = i;
        int k = i + 1;
        while (k < n) {
            ++ops;

            if (a[i] > a[k] - a[j]) {
                ++k;
            } else if (a[i] < a[k] - a[j]) {
                ++j;
            } else {
                System.out.println(a[i] + " + " + a[j] + " = " + a[k]);
                found = true;
                break;
            }
        }
    }

    System.out.println(ops);

回复收藏 0 原文

留一抹残留的笑 2024-10-21 17:27:23

使用哈希的算法在 Python 中需要 10-900 微秒（平均值：200 中位数：60）：

#!/usr/bin/env python
import random

L = frozenset(random.sample(xrange(2**31), 10**6))
print next(((a,b,a+b) for a in L for b in L if (a + b) in L), None)

它是 O(N**2) 但看起来它足够快。

为了进行比较，创建 frozenset 的分摊 O(N) 操作需要 270 毫秒（比这慢 1000 倍）搜索）并创建随机列表需要 0.9 秒。

注意：如果输入序列包含唯一元素，random.sample 不会返回重复元素，因此 frozenset 不会丢弃上例中的任何元素。为了解决允许重复元素的随机序列的问题，我们应该使用两种数据结构：

#!/usr/bin/env python
import random

L = [random.randrange(2**31) for _ in xrange(10**6)]
S = frozenset(L)
print len(L), len(S)
print next(((a, b, a+b) for a in L for b in L if (a + b) in S), None)

输出

1000000 999762
(2055933464, 83277289, 2139210753)

Algorithm that uses hashing takes 10-900 microseconds in Python (average: 200 median: 60):

#!/usr/bin/env python
import random

L = frozenset(random.sample(xrange(2**31), 10**6))
print next(((a,b,a+b) for a in L for b in L if (a + b) in L), None)

It is O(N**2) but it seems it is fast enough.

For comparison, the amortized O(N) operation of creating the frozenset takes 270 milliseconds (1000 times slower then the search) and to create random list it takes 0.9 seconds.

Note: random.sample doesn't return repeated elements if an input sequence contains unique elements therefore frozenset doesn't discard any elements in the above example. To solve the problem for a random sequence that allows repeated elements we should use two data structures:

#!/usr/bin/env python
import random

L = [random.randrange(2**31) for _ in xrange(10**6)]
S = frozenset(L)
print len(L), len(S)
print next(((a, b, a+b) for a in L for b in L if (a + b) in S), None)

Output

1000000 999762
(2055933464, 83277289, 2139210753)

回复收藏 0 原文

乄_柒ぐ汐 2024-10-21 17:27:23

在对排序列表进行测量时，我得到了 O(n log n)：

from bisect import bisect_right
import cProfile as prof
import random

def find3sum(T):
    if len(T) < 3:
        return None
    n = len(T)
    top = T[-1]
    for i in range(len(T)-1):
        b = top - T[i]
        if b < T[i]:
            return None
        k = bisect_right(T, b, i, n-1)
        while k > i:
            c = T[i] + T[k]
            j = bisect_right(T, c, k, n-1)
            if j <= k:
                break
            elif T[j] == c:
               return (i, k, j)
            else:
               k -= 1

def test_one(a):
    a = sorted(a)
    r = find3sum(a)
    i, k , j = r
    assert a[i] + a[k] == a[j]

def test():
    n = 100000
    max = 200000
    random.seed(0)
    for _ in range(100):
        a = [random.randint(0,max) for _x in xrange(n)]
        test_one(a)
        a = range(n)
        test_one(a)

prof.run('test()')

这些是结果（大约对每个元素的 bisect 进行一次调用）：

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002  183.764  183.764 <string>:1(<module>)
      200    0.005    0.000   89.996    0.450 find2sum.py:25(test_one)
        1   17.269   17.269  183.762  183.762 find2sum.py:31(test)
      200   35.096    0.175   79.601    0.398 find2sum.py:5(find3sum)
 10000000   44.958    0.000   52.398    0.000 random.py:160(randrange)
 10000000   23.891    0.000   76.289    0.000 random.py:224(randint)
        1    0.000    0.000    0.000    0.000 random.py:99(seed)
 19599982   44.077    0.000   44.077    0.000 {_bisect.bisect_right}
        1    0.000    0.000    0.000    0.000 {function seed at 0x9a1972c}
      600    0.001    0.000    0.001    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 10000000    7.440    0.000    7.440    0.000 {method 'random' of '_random.Random' objects}
      301    0.635    0.002    0.635    0.002 {range}
      200   10.390    0.052   10.390    0.052 {sorted}

有几种优化可以大大减少运行时间（例如跳过等于数字的运行）到已经测试过的那个）。

I'm getting O(n log n) when measuring this over sorted lists:

from bisect import bisect_right
import cProfile as prof
import random

def find3sum(T):
    if len(T) < 3:
        return None
    n = len(T)
    top = T[-1]
    for i in range(len(T)-1):
        b = top - T[i]
        if b < T[i]:
            return None
        k = bisect_right(T, b, i, n-1)
        while k > i:
            c = T[i] + T[k]
            j = bisect_right(T, c, k, n-1)
            if j <= k:
                break
            elif T[j] == c:
               return (i, k, j)
            else:
               k -= 1

def test_one(a):
    a = sorted(a)
    r = find3sum(a)
    i, k , j = r
    assert a[i] + a[k] == a[j]

def test():
    n = 100000
    max = 200000
    random.seed(0)
    for _ in range(100):
        a = [random.randint(0,max) for _x in xrange(n)]
        test_one(a)
        a = range(n)
        test_one(a)

prof.run('test()')

These are the results (about one call to bisect per element):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002  183.764  183.764 <string>:1(<module>)
      200    0.005    0.000   89.996    0.450 find2sum.py:25(test_one)
        1   17.269   17.269  183.762  183.762 find2sum.py:31(test)
      200   35.096    0.175   79.601    0.398 find2sum.py:5(find3sum)
 10000000   44.958    0.000   52.398    0.000 random.py:160(randrange)
 10000000   23.891    0.000   76.289    0.000 random.py:224(randint)
        1    0.000    0.000    0.000    0.000 random.py:99(seed)
 19599982   44.077    0.000   44.077    0.000 {_bisect.bisect_right}
        1    0.000    0.000    0.000    0.000 {function seed at 0x9a1972c}
      600    0.001    0.000    0.001    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 10000000    7.440    0.000    7.440    0.000 {method 'random' of '_random.Random' objects}
      301    0.635    0.002    0.635    0.002 {range}
      200   10.390    0.052   10.390    0.052 {sorted}

There are several optimizations that can considerably reduce the running time (like skipping over runs of numbers equal to the one already tested).

回复收藏 0 原文

恰似旧人归 2024-10-21 17:27:23

A+B=C，因此
B=CA 或 A=CB

上述问题可以通过使用哈希表以 O(n) 复杂度完成。

var C; // the sum you are looking for
for(each element)
    X = C - element
    boolean exists = lookup for X in hash table
    if (exists) combination A+B=C exists in the given input
    else hashtable.put(element)

希望有帮助。

A+B=C, hence
B=C-A or A=C-B

The above problem can be done in O(n) complexity by using hash table.

var C; // the sum you are looking for
for(each element)
    X = C - element
    boolean exists = lookup for X in hash table
    if (exists) combination A+B=C exists in the given input
    else hashtable.put(element)

Hope that helps.

回复收藏 0 原文

~没有更多了~