Pairwise priority queue

Posted on 2024-07-13 21:47:01

I have a set of A's and a set of B's, each with an associated numerical priority, where each A may match some or all B's and vice versa, and my main loop basically consists of:

Take the best A and B in priority order, and do stuff with A and B.

The most obvious way to do this is with a single priority queue of (A,B) pairs, but if there are 100,000 A's and 100,000 B's then the set of O(N^2) pairs won't fit in memory (and disk is too slow).

Another possibility is, for each A, to loop through every B. However, this means that the global priority ordering is by A only, and I really need to take the priority of both components into account.

(The application is theorem proving, where the above options are called the pair algorithm and the given clause algorithm respectively; the shortcomings of each are known, but I haven't found any reference to a good solution.)

Some kind of two layer priority queue would seem indicated, but it's not clear how to do this without using either O(N^2) memory or O(N^2) time in the worst case.

Is there a known method of doing this?

Clarification: each A must be processed with all corresponding B's, not just one.

Comments (5)

少跟Wǒ拽 2024-07-20 21:47:01

Maybe there's something I'm not understanding but,

Why not keep the A's and B's in separate heaps, get_Max on each of the heaps, do your work, remove each max from its associated heap and continue?
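The two-heap suggestion above could look roughly like this. It is only a sketch under assumptions: the function name `pop_best_pairs` and the `(priority, name)` tuple layout are made up for illustration, and priorities are negated because Python's `heapq` is a min-heap.

```python
import heapq

def pop_best_pairs(a_items, b_items):
    """Yield (a, b) by repeatedly popping the max of each heap.

    a_items and b_items are lists of (priority, name) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Negate priorities so the min-heap behaves like a max-heap.
    a_heap = [(-p, name) for p, name in a_items]
    b_heap = [(-p, name) for p, name in b_items]
    heapq.heapify(a_heap)
    heapq.heapify(b_heap)
    while a_heap and b_heap:
        _, a = heapq.heappop(a_heap)
        _, b = heapq.heappop(b_heap)
        yield a, b
```

Note that each item is consumed exactly once here, so this yields at most min(|A|, |B|) pairs; it does not cover the clarified requirement that each A be processed with all corresponding B's.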

哆啦不做梦 2024-07-20 21:47:01

You could handle the best pairs first, and if nothing good comes up, mop up the rest with the given clause algorithm for completeness' sake. This may lead to some double work, but I'd bet that this is insignificant.

Have you considered ordered paramodulation or superposition?

半步萧音过轻尘 2024-07-20 21:47:01

It appears that the items in A have an individual priority, the items in B have an individual priority, and the (A,B) pairs have a combined priority. Only the combined priority matters, but hopefully we can use the individual priorities along the way. However, there is also a matching relation between items in A and items in B that is independent of priority.

I assume that, for all a in A, b1 and b2 in B, such that Match(a,b1) and Match(a,b2), then Priority(b1) >= Priority(b2) implies CombinedPriority(a,b1) >= CombinedPriority(a,b2).

Now, begin by sorting B in decreasing order of priority. Let B(j) indicate the jth element in this sorted order. Also, let A(i) indicate the ith element of A (which may or may not be in sorted order).

Let nextb(i,j) be a function that finds the smallest j' >= j such that Match(A(i),B(j')). If no such j' exists, the function returns null (or some other suitable error value). Searching for j' may just involve looping upward from j, or we may be able to do something faster if we know more about the structure of the Match relation.

Create a priority queue Q containing (i,nextb(i,0)) for all indices i in A such that nextb(i,0) != null. The pairs (i,j) in Q are ordered by CombinedPriority(A(i),B(j)).

Now just loop until Q is empty. Pull out the highest-priority pair (i,j) and process (A(i),B(j)) appropriately. Then re-insert (i,nextb(i,j+1)) into Q (unless nextb(i,j+1) is null).
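The procedure described above can be sketched as follows. This is a minimal Python sketch, not a definitive implementation: the function name `pairs_by_priority`, the `(priority, value)` tuple layout, and the caller-supplied `match` and `combined` callables are assumptions for illustration. It uses 0-based indices and the standard `heapq` module, negating priorities because `heapq` is a min-heap.

```python
import heapq

def pairs_by_priority(a_items, b_items, match, combined):
    """Yield matching (a, b) pairs in decreasing combined priority.

    a_items, b_items: lists of (priority, value) tuples.
    match(a, b) -> bool and combined(pa, pb) -> number are supplied
    by the caller; combined must be non-decreasing in pb, as assumed
    in the text, for the output order to be correct.
    """
    # Sort B by decreasing individual priority.
    b_sorted = sorted(b_items, key=lambda t: t[0], reverse=True)

    def nextb(i, j):
        # Linear scan upward from j for the first B that matches A(i).
        a = a_items[i][1]
        while j < len(b_sorted):
            if match(a, b_sorted[j][1]):
                return j
            j += 1
        return None

    def entry(i, j):
        # Negated combined priority, so the best pair is popped first.
        return (-combined(a_items[i][0], b_sorted[j][0]), i, j)

    # Seed Q with the best matching B for every A.
    heap = []
    for i in range(len(a_items)):
        j = nextb(i, 0)
        if j is not None:
            heap.append(entry(i, j))
    heapq.heapify(heap)

    while heap:
        _, i, j = heapq.heappop(heap)
        yield a_items[i][1], b_sorted[j][1]
        # Re-insert A(i) with its next matching B, if there is one.
        j = nextb(i, j + 1)
        if j is not None:
            heapq.heappush(heap, entry(i, j))
```

Because the queue holds at most one entry per element of A at any time, the extra memory in this sketch is linear in the size of A plus the sorted copy of B, rather than quadratic.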

Altogether, this takes O(N^2 log N) time in the worst case, where all pairs match. In general, it takes O(N^2 + M log N), where M is the number of matches. The N^2 component can be reduced if there is a faster way of calculating nextb(i,j) than just looping upward, but that depends on knowledge of the Match relation.

(In the above analysis, I assumed both A and B were of size N. The formulas could easily be modified if they are different sizes.)

You seemed to want something better than O(N^2) time in the worst case, but if you need to process every match, then you have a lower bound of M, which can be N^2 itself. I don't think you're going to be able to do better than O(N^2 log N) time unless there is some special structure to the combined priority that lets you use a better-than-log-N priority queue.

怪我闹别瞎闹 2024-07-20 21:47:01

So you have a set of A's and a set of B's, and you need to pick an (A, B) pair such that some f(a, b) is the highest among all (A, B) pairs.

This means you can either store all possible (A, B) pairs and order them, and just pick the highest each time through the loop (O(1) per iteration but O(N*M) memory).

Or you could loop through all possible pairs and keep track of the current maximum and use that (O(N*M) per iteration, but only O(N+M) memory).
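The low-memory option just described could be sketched like this. The function name `best_pair` is hypothetical, and `f` stands for whatever combined-priority function applies; nothing beyond the current best pair is stored.

```python
def best_pair(a_items, b_items, f):
    """Scan every (a, b) pair and return the one maximising f(a, b).

    Returns None if either collection is empty.
    """
    best = None
    best_score = None
    for a in a_items:
        for b in b_items:
            score = f(a, b)
            # Keep only the best pair seen so far.
            if best_score is None or score > best_score:
                best, best_score = (a, b), score
    return best
```

Each call inspects every pair once, which is where the O(N*M) cost per iteration comes from.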

If I am understanding you correctly this is what you are asking.

I think it very much depends on f() to determine if there is a better way to do it.

If f(a, b) = a + b, then it is obviously very simple: the highest A and the highest B are what you want.

梦里人 2024-07-20 21:47:01

I think your original idea will work; you just need to keep your As and Bs in separate collections and stick references to them in your priority queue. If each reference takes 16 bytes (just to pick a number), then 10,000,000 A/B references will only take ~300M. Assuming your As and Bs themselves aren't too big, it should be workable.
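The estimate above can be checked with a little arithmetic. This is only a sketch: the function name `reference_bytes` and the assumption that each queue entry holds two references (one to an A, one to a B) are mine; the 16-byte figure is the number picked above, and the 100,000 sizes come from the question.

```python
def reference_bytes(num_pairs, refs_per_pair=2, bytes_per_ref=16):
    """Bytes needed to store references for num_pairs queue entries.

    refs_per_pair=2 is an assumption: one reference to A, one to B.
    """
    return num_pairs * refs_per_pair * bytes_per_ref

# The figure used in the answer: 10,000,000 pairs.
print(reference_bytes(10_000_000) / 1e6, "MB")

# The sizes from the question: 100,000 A's times 100,000 B's.
print(reference_bytes(100_000 * 100_000) / 1e9, "GB")
```

Under these assumptions, 10,000,000 pairs come to 320 MB, in line with the ~300M figure, while the full 100,000 x 100,000 cross product from the question is 10^10 pairs, or about 320 GB.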
