Generating the top k values

Posted 2024-10-31 02:11:07

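I have a problem, and I want to make sure I'm solving it as efficiently as possible. I have an array A of N floating-point values, all between 0 and 1.

I need to find the top k values that can be obtained as the product of at most three numbers from A. So the top-k list can contain single numbers from A, products of two numbers from A, or products of three numbers from A.

Here is what I'm doing now. I can get the top k numbers in descending order in O(N log k) time. Then I create a max-heap and initialize it with the best value of each size up to 3: if I call the sorted (descending) array of the k values B and refer to numbers by their indices in that array, I insert the numbers at indices (0), (0,1) and (0,1,2). Next, I extract from the heap, and whenever I extract a value of size z (a product of z numbers), I replace it with the next possible sets of size z; for example, if (2,4) is extracted, I replace it with (3,4) and (2,5). I extract k times to get the result.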
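As a reference point, here is a minimal Python sketch of the heap approach described in the question. It rests on my own assumptions: `top_k_products_distinct` is a hypothetical name, the factors must be distinct positions in the array, and the successor rule bumps any one index by 1 as long as the indices stay strictly increasing (the question only spells out the pair case, (2,4) → (3,4) and (2,5)). Because a set such as (3,5) can be reached both from (3,4) and from (2,5), the sketch keeps a `seen` set so that no product is reported twice.

```python
import heapq
import math


def top_k_products_distinct(values, k):
    """Top-k products of 1, 2 or 3 distinct elements, following the question's heap idea."""
    b = heapq.nlargest(k, values)  # the k largest values, in descending order
    m = len(b)
    heap = []      # entries are (-product, index tuple); heapq is a min-heap
    seen = set()   # index tuples already pushed, so no product is reported twice

    def push(idx):
        if idx not in seen:
            seen.add(idx)
            heapq.heappush(heap, (-math.prod(b[i] for i in idx), idx))

    # Seed with the best set of each size: (0), (0,1), (0,1,2).
    for seed in [(0,), (0, 1), (0, 1, 2)]:
        if len(seed) <= m:
            push(seed)

    result = []
    while heap and len(result) < k:
        neg_product, idx = heapq.heappop(heap)
        result.append(-neg_product)
        # Successors: bump one index by 1 while keeping indices strictly increasing
        # and below m, e.g. (2, 4) -> (3, 4) and (2, 5).
        for p in range(len(idx)):
            limit = idx[p + 1] if p + 1 < len(idx) else m
            if idx[p] + 1 < limit:
                push(idx[:p] + (idx[p] + 1,) + idx[p + 1:])
    return result
```

For example, `top_k_products_distinct([0.9, 0.5, 0.8, 0.1, 0.7], 3)` returns approximately `[0.9, 0.8, 0.72]`. Each successor has a larger index in one position, so its product is never larger than its parent's, which is what lets the heap emit products in descending order.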

If anyone has a better idea, I'd appreciate it. Thanks, everyone.

流星番茄 2024-11-07 02:11:07

If I understand you correctly, you need to find the k highest numbers that can be produced by multiplying together 1, 2 or 3 elements from your list, where all the values are floating-point numbers between 0 and 1.

It is clear that you only need to consider the k highest numbers from the list: any product that uses an element outside them is no larger than that element alone, and each of the k highest numbers is already at least as large. The rest can be discarded straight away. You can use your O(n log k) algorithm to get them, again in sorted order (I assume your list isn't presorted). To simplify the problem, you can now take their logarithms and maximize sums instead of the original products, since the logarithm is monotonic and log(xy) = log x + log y. This might speed things up a little.
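A minimal sketch of this preprocessing step, assuming every value is strictly positive (log(0) is undefined); `k_largest_logs` is a hypothetical name. `heapq.nlargest` keeps the k largest values and returns them in descending order, and because the logarithm is monotonic the logs stay in the same order.

```python
import heapq
import math


def k_largest_logs(values, k):
    """Keep the k largest values and return their natural logs, in descending order."""
    top = heapq.nlargest(k, values)      # k largest values, descending
    return [math.log(x) for x in top]    # monotonic, so the order is preserved; needs x > 0
```

With the logs in hand, a product such as 0.9 * 0.7 becomes the sum `math.log(0.9) + math.log(0.7)`, so maximizing products of up to three factors turns into maximizing sums of up to three (non-positive) terms.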

In the logarithmic representation all your numbers are negative (zero for a value of exactly 1), so adding more of them together can only make the sum smaller.

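Let's call the k highest numbers A1...Ak, in descending order. We can reduce the problem further by assuming that there also exists a number A0 whose value is 0 in the log representation and 1 in the original representation. The problem is then to enumerate the first k 3-tuples (x, y, z with x, y, z in {A0, ..., Ak}) under the constraint x ≥ y ≥ z and z < A0: padding with A0 turns singletons and pairs into triples, and z < A0 makes sure at least one real element is used. Let's denote a 3-tuple by its indices [i,j,n] (so i ≤ j ≤ n) and the sum of its elements by S[i,j,n]. Note that indices may repeat, so this formulation also counts products that use the same element more than once, such as [1,1,1] with S[1,1,1] = 3·A1. The first element to be reported is obviously [0,0,1], i.e. S[0,0,1] = A1, which corresponds in the original problem formulation to the singleton #1 value on the list.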
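The padding trick can be checked on small cases with a short Python sketch: it lists the index triples [i,j,n] allowed by the constraints, strips the A0 padding, and confirms that what remains is exactly the collection of multisets of 1 to 3 indices from 1..k, each appearing once. `padded_triples` and `factors` are hypothetical helper names.

```python
from itertools import combinations_with_replacement


def padded_triples(k):
    """Index triples [i, j, n] with 0 <= i <= j <= n <= k and n >= 1 (i.e. z < A0).

    Index 0 stands for the padding element A0 (log value 0, i.e. the number 1).
    """
    return [t for t in combinations_with_replacement(range(k + 1), 3) if t[2] >= 1]


def factors(t):
    """Strip the A0 padding: the remaining indices are the 1-3 factors of the product."""
    return tuple(x for x in t if x != 0)
```

For instance, `factors((0, 0, 1))` is `(1,)`, the singleton A1, and `factors((0, 2, 3))` is `(2, 3)`, the pair A2·A3 in the original representation.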

We use a max-heap as in your approach; we push the triples onto the heap, using their sums (S[...]) as the ordering key. The algorithm starts by pushing [0,0,0] onto the heap. Then:

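import heapq

def top_k_sums(a, k):
    # a = [A0, A1, ..., Ak]: log values sorted descending, with A0 = 0.0
    def s(t):
        return a[t[0]] + a[t[1]] + a[t[2]]
    heap = [(-0.0, (0, 0, 0))]  # heapq is a min-heap, so store negated sums
    answer = []
    for _ in range(k + 1):
        neg_sum, (i, j, n) = heapq.heappop(heap)  # explode the tuple
        answer.append(-neg_sum)
        succ = []
        if n < k:  # k is the largest valid index
            succ.append((i, j, n + 1))
        if j == n and j < k:
            succ.append((i, j + 1, j + 1))
            if i == j:
                succ.append((i + 1, i + 1, i + 1))
        for t in succ:
            heapq.heappush(heap, (-s(t), t))
    return answer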

At the end, answer contains k + 1 sums; the first one is the sum of [0,0,0] (i.e. 0, the empty product) and must be discarded.

For example, let the k = 4 largest numbers, in log form, be A1...A4 = -1, -3, -8, -9. Each row below shows the triple popped in that iteration (as values, with its sum S) and the heap after its successors have been pushed:

Popped            Heap after pushing successors (largest sum first)

[ 0, 0, 0] S= 0 | [ 0, 0,-1] [ 0,-1,-1] [-1,-1,-1]
[ 0, 0,-1] S=-1 | [ 0,-1,-1] [ 0, 0,-3] [-1,-1,-1]
[ 0,-1,-1] S=-2 | [ 0, 0,-3] [-1,-1,-1] [ 0,-1,-3] [ 0,-3,-3]
[ 0, 0,-3] S=-3 | [-1,-1,-1] [ 0,-1,-3] [ 0,-3,-3] [ 0, 0,-8]
[-1,-1,-1] S=-3 | [ 0,-1,-3] [-1,-1,-3] [ 0,-3,-3] [-1,-3,-3] [ 0, 0,-8] [-3,-3,-3]

After the k + 1 = 5 pops, the reported sums are -1, -2, -3, -3 (the first row is discarded). [0,0,-3] and [-1,-1,-1] tie at S = -3, so they may be popped in either order.

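The nice thing about this algorithm is that it never enumerates duplicates: every triple [i,j,n] other than [0,0,0] is pushed by exactly one predecessor ([i,j,n-1] if j < n, [i,j-1,j-1] if i < j = n, and [i-1,i-1,i-1] if i = j = n). Because a pushed triple never has a larger sum than the one that pushed it, triples leave the heap in non-increasing order of sum. The heap size is also O(k): each iteration pops one element and pushes at most three, so after m iterations the heap holds at most 2m + 1 elements.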
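To check the duplicate-free claim on small cases, here is a short Python sketch that walks everything reachable from [0,0,0] under the answer's successor rule and counts how often each triple is generated. `successors` and `visit_counts` are hypothetical helper names, and the bounds checks treat k as the largest valid index (my reading of the pseudocode's intent).

```python
from collections import Counter
from itertools import combinations_with_replacement


def successors(t, k):
    """Triples pushed after popping t = (i, j, n), with indices kept within 0..k."""
    i, j, n = t
    out = []
    if n < k:
        out.append((i, j, n + 1))
    if j == n and j < k:
        out.append((i, j + 1, j + 1))
        if i == j:
            out.append((i + 1, i + 1, i + 1))
    return out


def visit_counts(k):
    """Walk every triple reachable from (0, 0, 0) and count how often each is generated."""
    counts = Counter()
    stack = [(0, 0, 0)]
    while stack:
        t = stack.pop()
        counts[t] += 1
        stack.extend(successors(t, k))
    return counts
```

If some triple had two predecessors, it would show up with a count of 2 (and its whole subtree would be visited twice); instead every triple i ≤ j ≤ n ≤ k is reached exactly once.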

This then gives a running time of O(n log k + k log k) = O((n + k) log k).

失去的东西太少 2024-11-07 02:11:07

I certainly see an optimization you could make.

Let M be the highest number from A.
Let M2 be M * M.
Let setMM2 consist of all x from A such that M2 < x < M
If size(setMM2) >= k,
    then your top-k consists of the k highest elements.
Else
    all x in setMM2 (and M itself) are in your top-k, and your search becomes smaller

You can repeat this method with max(secondHighestNumber^2,M^3) and generalize the algorithm.
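A minimal Python sketch of the first step of this pruning idea, assuming products of two or three values are allowed (with or without repetition); `guaranteed_singletons` is a hypothetical helper name. Every value is at most M, so any product of two or three values is at most M * M, and every x > M * M (which includes M itself when 0 < M < 1) outranks all products.

```python
def guaranteed_singletons(values, k):
    """Singletons that beat every product of two or three values.

    Returns (prefix, complete): prefix holds the largest such singletons
    (at most k, descending), and complete is True when they already fill
    the whole top-k, so no products need to be considered at all.
    """
    m = max(values)
    threshold = m * m  # upper bound on any product of two or more values in [0, 1]
    above = sorted((x for x in values if x > threshold), reverse=True)
    return above[:k], len(above) >= k
```

For example, with values `[0.9, 0.85, 0.5, 0.3]` the threshold is 0.81, so 0.9 and 0.85 are certainly the top two; for k = 3 the third entry must still be searched for among the remaining candidates (here it is 0.9 * 0.9 if repetition is allowed).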
