查找两组之间的邻近匹配对

发布于 2024-12-22 14:17:24 字数 226 浏览 1 评论 0原文

给定两组关键字，其中每个关键字都有开始和结束偏移量（例如关键字“abc”从偏移量 23 开始并在偏移量 25 结束），我想有效地找到这些组之间的匹配对。匹配对是 set1 中的一个关键字和 set2 中的一个关键字，其中一个关键字在另一个关键字结束之后开始，但一个关键字的结尾到另一个关键字的开头之间不超过 MAX_PROXIMITY 个字符。此外，每个关键字只能属于一对（匹配的关键字不能重复用于其他匹配）。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

是你 2024-12-29 14:17:24

您可以将其表示为二分图中的最大匹配。将您拥有的两个集合视为两组顶点，并在从第一个集合的所有顶点到第二个集合中满足您的规则的所有顶点之间生成边，即“其中一个关键字在另一个关键字结束之后开始，但不超过 MAX_PROXIMITY一个的末尾到另一个的开头之间的字符”

一旦你有了图，就可以在二分图中运行最大匹配算法。
http://en.wikipedia.org/wiki/Matching_(graph_theory)#Maximum_matchings_in_bipartite_graphs

回复收藏 0 原文

阳光下的泡沫是彩色的 2024-12-29 14:17:24

您可以使用动态规划来解决这个问题。

假设您有每组关键字，按它们开始的偏移量排序。
让我们定义

set1_keyword[i] # i-th keyword in the first set (ordered by the start offset)
set2_keyword[j] # same for the second set
max_pairs[i][j] # number of pairs in optimal assignment between keywords 1..i from the first set and keywords 1..j from the second set

当然 max_pairs[n1][n2] 是您问题的答案（其中 n1 和 n2 是第一个和分别设置第二个关键字）

要计算 max_pairs 表，请使用以下公式

if (set1_keyword[i] matches set2_keyword[j])
    max_pairs[i][j] = max_pairs[i-1][j-1] + 1
else
    max_pairs[i][j] = max(max_pairs[i-1][j], max_pairs[i][j-1])

这基本上是说，如果您可以匹配关键字，则可以这样做，因为对于关键字 1..i 和 1..j 的问题，它是尽力而为做。在另一种情况下（第 i 个和第 j 个关键字不匹配），您无法找到将第 i 个和第 j 个关键字与某些不同关键字配对的解决方案。因此，在最佳解决方案中，第 i 个关键字或第 j 个关键字应该是不配对的。
这基本上告诉我们查看问题的（已计算的）解决方案 max_pairs[i-1][j] （没有第 i 个关键字）或 max_pairs[i][j-1 ] （没有第 j 个关键字）并选择两者中较好的一个。

如果您以正确的顺序计算此表，即

for (int i = 0; i < n1; ++i)
    for (int j = 0; j < n2; ++j)
        # compute max_pairs[i][j] here

算法将具有 O(n1*n2) 复杂度，这比二部图中的分配问题（运行时间为 O(n^3)）要好

有关动态规划的更多信息，请参阅动态规划

You can use dynamic programming to solve this problem.

Suppose you have each set of keywords ordered by the offset at which they start.
Let's define

set1_keyword[i] # i-th keyword in the first set (ordered by the start offset)
set2_keyword[j] # same for the second set
max_pairs[i][j] # number of pairs in optimal assignment between keywords 1..i from the first set and keywords 1..j from the second set

Of course max_pairs[n1][n2] is the answer to your problem (wheren1 and n2 are the sizes of the first and second keyword set respectively)

To compute max_pairs table use the following formula

if (set1_keyword[i] matches set2_keyword[j])
    max_pairs[i][j] = max_pairs[i-1][j-1] + 1
else
    max_pairs[i][j] = max(max_pairs[i-1][j], max_pairs[i][j-1])

This basically says that if you can match the keywords you do it, because for the problem with keywords 1..i and 1..j it's the best you can do. In the other case (i-th and j-th keywords don't match) you cannot have a solution in which both i-th and j-th keywords are paired to some different keywords. So in the optimal solution either i-th keyword or j-th keyword should be unpaired.
That basically tells us to look at the (already computed) solutions for problems max_pairs[i-1][j] (without i-th keyword) or max_pairs[i][j-1] (without j-th keyword) and choose the better of the two.

If you compute this table in the correct order i.e

for (int i = 0; i < n1; ++i)
    for (int j = 0; j < n2; ++j)
        # compute max_pairs[i][j] here

the algorithm will have O(n1*n2) complexity, which is better than the assignment problem in bipartite graph (which runs in O(n^3))

For more on dynamic programming please refer to dynamic programming

回复收藏 0 原文

~没有更多了~

关于作者

攀登最高峰

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

查找两组之间的邻近匹配对

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

Promise

qq_lbRlsh

待＂谢繁草

yy2010hell

漫无边际

傲娇萝莉攻

友情链接

查找两组之间的邻近匹配对

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

Promise

qq_lbRlsh

待＂谢繁草

yy2010hell

漫无边际

傲娇萝莉攻

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。