用于在具有多个可能匹配的两个列表之间找到最佳匹配的最佳数据结构/算法

发布于 2025-01-11 16:45:49 字数 1041 浏览 4 评论 0原文

我有两个不同长度的元素列表（N！= M），

A = [a_0, a_1, ..., a_N]
B = [b_0, b_1, ..., b_M]

列表中的元素是数字间隔，我必须根据给定的度量找到它们之间的最佳匹配（基本上，这些间隔的交集）。

据此，我使用了一个简单的嵌套 for 循环来比较两个列表的所有元素：

for a in A:
    for b in B:
        found_match_score(a, b)

这样，我找到了 B 中可能与 A 中的元素匹配的元素，并将结果排列在一个字典中。对于 A 中的每个元素，我有一个字典，显示 B 中可能的匹配项以及相应的匹配分数：

{
a_0: {},
a_1: {b_0: 0.25},
a_2: {b_0: 0.5},
a_3: {},
a_4: {b_1: 0.29},
a_5: {b_1: 0.33},
a_6: {b_2: 0.25},
a_7: {b_2: 0.57},
a_8: {b_4: 0.10,
      b_5: 0.25},
a_9: {b_5: 0.44},
a_10: {b_6: 0.17,
       b_7: 0.25}
a_11: {b_7: 0.3,
       b_8: 0.43}
}

现在，我需要检查此结构并找到最佳匹配项，以便理想情况下我最终得到此结果（我希望每个匹配都是唯一的）：

{
a_0: {},
a_1: {},
a_2: {b_0: 0.5},
a_3: {},
a_4: {},
a_5: {b_1: 0.33},
a_6: {},
a_7: {b_2: 0.57},
a_8: {b_4: 0.10},
a_9: {b_5: 0.44},
a_10: {b_7: 0.25}
a_11: {b_8: 0.43}
}

我可以通过迭代来做到这一点在字典中，检查某些 B 元素是否与 A 中的多个项目匹配，将它们存储到临时对象中，并只保留最匹配的一个，直到得到结果。
但这似乎有点令人费解，我想知道是否有更有效的方法来做到这一点。也许我完全使用了错误的数据结构？

原文

I have two lists of elements of different lenghts (N != M),

A = [a_0, a_1, ..., a_N]
B = [b_0, b_1, ..., b_M]

The elements in the lists are numeric intervals, and I have to find the best matches between them based on a given metric (basically, the Intersection over Union of these intervals).

According to this, I used a simple nested for loop to compare all the elements of the two lists between each other:

for a in A:
    for b in B:
        found_match_score(a, b)

This way, I found the possible elements from B matching the elements from A, and I arranged the results in a dict...for each element from A, I have a dictionary showing the possible matches from B, and the corresponding matching score:

{
a_0: {},
a_1: {b_0: 0.25},
a_2: {b_0: 0.5},
a_3: {},
a_4: {b_1: 0.29},
a_5: {b_1: 0.33},
a_6: {b_2: 0.25},
a_7: {b_2: 0.57},
a_8: {b_4: 0.10,
      b_5: 0.25},
a_9: {b_5: 0.44},
a_10: {b_6: 0.17,
       b_7: 0.25}
a_11: {b_7: 0.3,
       b_8: 0.43}
}

Now, I need to go over this structure and find the best matches, so that ideally I end up with this result (I want each matching to be unique):

{
a_0: {},
a_1: {},
a_2: {b_0: 0.5},
a_3: {},
a_4: {},
a_5: {b_1: 0.33},
a_6: {},
a_7: {b_2: 0.57},
a_8: {b_4: 0.10},
a_9: {b_5: 0.44},
a_10: {b_7: 0.25}
a_11: {b_8: 0.43}
}

I could do this by iterating over the dictionary, checking if some B elements match multiple items from A, store them to a temp object, and only keep the best matching one, until I get the results.
But this seems kinda convoluted, and I was wondering if there's a more efficient way to do it. Maybe I'm using the wrong data structure altogether?

分享到QQ

分享到微博