Generating top k values
I have a problem and I want to check whether I am solving it in the most efficient way. I have an array A of N float values, all between 0 and 1.
I have to find the top k values, where each value can be a product of at most three numbers from A. So the top-k list can contain individual numbers from A, products of two numbers from A, or products of three numbers from A.
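To pin down what is being asked, here is a minimal brute-force reference in Python. It is only a sketch for checking faster approaches on small inputs. The function name `top_k_products_bruteforce` is made up for illustration, and it assumes each product uses distinct positions of A (no element is multiplied by itself), which the question does not state explicitly.

```python
import heapq
from itertools import combinations
from math import prod


def top_k_products_bruteforce(a, k):
    """Return the k largest products of 1, 2 or 3 elements of a, largest first.

    Assumption: every product uses distinct positions of a.
    Cost is O(N^3) candidates, so this is only useful as a reference.
    """
    candidates = (
        prod(combo)
        for size in (1, 2, 3)
        for combo in combinations(a, size)
    )
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    print(top_k_products_bruteforce([0.9, 0.5, 0.8], 4))
```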
This is how I am doing it now. I can get the top k numbers in descending order in O(N log k) time. I then create a max-heap and initialize it with the best value for each size up to 3: if B is the sorted (descending) array of those k values and I refer to numbers by their index in B, I insert the products at indices (0), (0,1) and (0,1,2). Next, I extract from the heap, and whenever I extract a size-z value (a product of z numbers), I replace it with the next possible size-z index sets. For example, if (2,4) is extracted, I replace it with (3,4) and (2,5). I extract k times to get the result.
If you have better ideas, I would like to hear them.
Thanks all.
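The approach in the question could be sketched in Python roughly as follows. This is one possible reading, not the asker's actual code: the function name `top_k_products_heap` is hypothetical, and the `seen` set is an addition of this sketch, because the replacement rule as described can reach the same index set along two different paths (for example, (3,5) from both (2,5) and (3,4)). Python's `heapq` is a min-heap, so products are stored negated.

```python
import heapq
from math import prod


def top_k_products_heap(a, k):
    """Best-first search over index sets of size 1, 2 and 3 into B."""
    b = heapq.nlargest(k, a)      # B: the k largest values, descending
    m = len(b)
    heap, seen = [], set()

    def push(idx):
        # Skip index sets that were already generated via another path.
        if idx not in seen:
            seen.add(idx)
            heapq.heappush(heap, (-prod(b[i] for i in idx), idx))

    # Best candidate of each size: (0), (0,1), (0,1,2).
    for size in (1, 2, 3):
        if size <= m:
            push(tuple(range(size)))

    result = []
    while heap and len(result) < k:
        neg_product, idx = heapq.heappop(heap)
        result.append(-neg_product)
        # Successors: advance one index by one, keeping indices strictly increasing.
        for p in range(len(idx)):
            nxt = idx[p] + 1
            upper = idx[p + 1] if p + 1 < len(idx) else m
            if nxt < upper:
                push(idx[:p] + (nxt,) + idx[p + 1:])
    return result


if __name__ == "__main__":
    print(top_k_products_heap([0.9, 0.5, 0.8, 0.1], 6))
```

Because B is sorted in descending order, every successor has a product no larger than its parent, which is what makes extracting the heap maximum k times safe.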
Comments (3)
If I understand you correctly, you need to find the k highest numbers that can be produced by multiplying together 1, 2 or 3 elements from your list, and all the values are floating-point numbers between 0 and 1.
It is clear that you only need to consider the k highest numbers from the list; the rest can be discarded straight away. You can use your O(n log k) algorithm to get them, again in sorted order (I assume your list isn't presorted). To simplify the problem, you can now take their logarithms and maximize sums of the numbers instead of maximizing products as in the original problem. This might speed things up a little.
Now (considering the logarithmic representation), all your numbers are negative, so adding more of them together will just create more and more negative numbers.
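A small Python check of the log transformation, as an illustration only: the sample values are made up, and the snippet assumes every value is strictly greater than 0, since `math.log(0)` raises an error (a value of exactly 1 would give a log of 0 rather than a negative number).

```python
import math

values = [0.9, 0.8, 0.5]                  # illustrative values in (0, 1)
logs = [math.log(v) for v in values]      # all strictly negative here

pairs = [(0, 1), (0, 2), (1, 2)]

# Ranking pairs by product ...
by_product = sorted(
    pairs, key=lambda p: values[p[0]] * values[p[1]], reverse=True
)
# ... gives the same order as ranking them by the sum of logs.
by_log_sum = sorted(
    pairs, key=lambda p: logs[p[0]] + logs[p[1]], reverse=True
)

print(by_product == by_log_sum)
```

The logarithm is monotonic, so log(x·y) = log(x) + log(y) preserves the ranking while replacing multiplications with additions.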
Let's call the k highest numbers A1...Ak. We can now reduce the problem further by assuming that there is also a number A0, which has the value 0 in the log representation and 1 in the original representation. The problem is then to enumerate the first k 3-tuples (x,y,z in {A0,...,Ak}) with the constraints that x ≥ y ≥ z and that z < A0. Let's denote a 3-tuple by its indices [i,j,n] and the sum of the elements in this tuple by S[i,j,n]. The first element to be reported is obviously [0,0,1], which corresponds in the original problem formulation to the singleton #1 value on the list.
We use a max-heap as in the original formulation; we push the triples to the heap, using their sums (S[...]) as the ordering key. The algorithm starts by pushing [0,0,0] to the heap. Then:
At the end, answer contains k + 1 elements; the first of them is [0,0,0], which must be discarded.
Let the values (in the log representation) be given as -1, -3, -8, -9. Then the algorithm proceeds like this:
The nice thing about this algorithm is that it doesn't enumerate duplicates and the heap size is O(k). To see why, observe that on every iteration the algorithm adds only a small bounded number of elements to the heap (often fewer), so after k iterations there cannot be more than 2k elements in the heap.
This gives a running time of O(n log k + k log k) = O((n + k) log k).
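One way the enumeration with the extra element A0 might look in Python is sketched below. Several things here are assumptions of the sketch rather than part of the answer: the function name `top_k_log_sums`, the successor rule (advance one index of the tuple by one), the `seen` set used to avoid pushing a tuple twice, and the requirement that non-zero indices in a tuple are distinct, so no real element is used twice. The input is taken directly in log representation.

```python
import heapq


def top_k_log_sums(log_values, k):
    """Enumerate the k best tuples [i, j, n] by S[i, j, n], largest sum first.

    Index 0 is the artificial A0 = 0 (the log of 1); indices 1..m are the
    largest log values in descending order. Assumption: non-zero indices
    in a tuple must be strictly increasing.
    """
    logs = [0.0] + heapq.nlargest(k, log_values)
    m = len(logs) - 1

    def valid(t):
        i, j, n = t
        return (
            n <= m
            and (i < j or (i == 0 and j == 0))
            and (j < n or (j == 0 and n == 0))
        )

    start = (0, 0, 0)
    heap = [(0.0, start)]            # keys are negated sums (heapq is a min-heap)
    seen = {start}
    answer = []
    while heap and len(answer) < k + 1:
        neg_sum, t = heapq.heappop(heap)
        answer.append((t, -neg_sum))
        for pos in range(3):         # assumed successor rule: bump one index
            nxt = t[:pos] + (t[pos] + 1,) + t[pos + 1:]
            if valid(nxt) and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (-sum(logs[x] for x in nxt), nxt))
    return answer[1:]                # discard the leading [0,0,0]


if __name__ == "__main__":
    for tup, s in top_k_log_sums([-1, -3, -8, -9], 4):
        print(tup, s)
```

With the log values -1, -3, -8, -9 and k = 4, this sketch reports the tuples [0,0,1], [0,0,2], [0,1,2] and [0,0,3], with sums -1, -3, -4 and -8. Each iteration pops one tuple and pushes at most three, so the heap stays O(k).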
I certainly see an optimization you could make.
You can repeat this method with max(secondHighestNumber^2,M^3) and generalize the algorithm.
Since the numbers are between 0 and 1, the more numbers you multiply, the smaller the product gets. The problem is with big k, for instance k = N^2.
First try with single numbers and push them into a heap: O(N log k).
Then use the numbers from that heap to build another heap B with products of 2 numbers: O(k log k) at worst, but you can do some speedups if you sort the numbers in case k > N.
Then you have heap B with pairs of numbers and their products; try building a third heap C from heap B the same way as you built B, but from a much bigger heap.
I think this will give O(k log k).