How to find the nearest vector in {0,1,2}^12, over and over again

Posted 2024-10-03 10:38:06


I'm searching a space of vectors of length 12, with entries 0, 1, 2. For example, one such vector is
001122001122. I have about a thousand good vectors, and about a thousand bad vectors. For each bad vector I need to locate the closest good vector. Distance between two vectors is just the number of coordinates which don't match. The good vectors aren't particularly nicely arranged, and the reason they're "good" doesn't seem to be helpful here. My main priority is that the algorithm be fast.

If I do a simple exhaustive search, I have to calculate about 1000*1000 distances. That seems pretty thick-headed.

If I apply Dijkstra's algorithm first using the good vectors, I can calculate the closest vector and minimal distance for every vector in the space, so that each bad vector requires a simple lookup. But the space has 3^12 = 531,441 vectors in it, so the precomputation is half a million distance computations. Not much savings.

Can you help me think of a better way?

Edit: Since people asked earnestly what makes them "good": Each vector represents a description of a hexagonal picture of six equilateral triangles, which is the 2D image of a 3D arrangement of cubes (think generalized Q-bert). The equilateral triangles are halves of faces of cubes (45-45-90), tilted into perspective. Six of the coordinates describe the nature of the triangle (perceived floor, left wall, right wall), and six coordinates describe the nature of the edges (perceived continuity, two kinds of perceived discontinuity). The 1000 good vectors are those that represent hexagons that can be witnessed when seeing cubes-in-perspective. The reason for the search is to apply local corrections to a hex map full of triangles...


5 Answers

只为一人 2024-10-10 10:38:07


My computational geometry is VERY rough, but it seems that you should be able to:

  1. Calculate the Voronoi diagram for your set of good vectors.
  2. Calculate the BSP tree for the cells of the diagram.

The Voronoi diagram will give you a 12-dimensional convex cell for each good vector, containing all the points that are closest to that vector.

The BSP tree will give you a fast way to determine which cell a vector lies within and, therefore, which good vector it is closest to.

EDIT: I just noticed that you are using Hamming distance instead of Euclidean distance. I'm not sure how this could be adapted to fit that constraint. Sorry.

梦回梦里 2024-10-10 10:38:07


Assuming a packed representation for the vectors, one distance computation (comparing one good vector and one bad vector to yield the distance) can be completed in roughly 20 clock cycles or less. Hence a million such distance calculations can be done in 20 million cycles, or (assuming a 2 GHz CPU) 0.01 seconds. Do these numbers help?

PS: 20 cycles is a conservative overestimate.
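
For concreteness, here is a minimal C sketch of one such packed representation. The 2-bits-per-coordinate layout, the helper names, and the popcount trick are my assumptions, not anything this answer specifies; it only illustrates how one distance can come down to a few instructions:

#include <stdint.h>
#include <stdio.h>

/* Pack a length-12 vector with entries in {0,1,2} into 24 bits,
   2 bits per coordinate. */
static uint32_t pack(const int v[12]) {
    uint32_t p = 0;
    for (int i = 0; i < 12; i++)
        p |= (uint32_t)v[i] << (2 * i);
    return p;
}

/* Number of mismatching coordinates: after XOR, a coordinate's 2-bit
   field is nonzero iff the entries differ; fold each field's two bits
   into one flag bit and count the flags (GCC/Clang builtin). */
static int distance(uint32_t a, uint32_t b) {
    uint32_t x = a ^ b;
    uint32_t mismatch = (x | (x >> 1)) & 0x555555u;
    return __builtin_popcount(mismatch);
}

int main(void) {
    int u[12] = {0,0,1,1,2,2,0,0,1,1,2,2};
    int w[12] = {0,1,1,1,2,0,0,0,1,2,2,2};
    printf("%d\n", distance(pack(u), pack(w))); /* prints 3 */
    return 0;
}

The XOR-fold-popcount sequence is a handful of instructions, which is roughly where an estimate like 20 cycles per comparison comes from.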

情丝乱 2024-10-10 10:38:06


Just to keep things in perspective, and to be sure you are not optimizing unnecessarily: the brute-force approach without any optimization takes 12 seconds on my machine.

Code in Mathematica:

(* random test data; entries lie in {0,1,2}, as in the problem *)
bad = Table[RandomInteger[2, 12], {1000}];
good = Table[RandomInteger[2, 12], {1000}];

(* number of coordinates at which a and b disagree *)
distance[a_, b_] := Total[Sign@Abs[a - b]];

(* for each bad vector, the index of the closest good vector *)
bestMatch = Table[
    First@Ordering[
      Table[distance[good[[j]], bad[[i]]], {j, Length@good}], 1],
    {i, Length@bad}] // Timing

As you might expect, the time follows an O(n^2) law:

[plot: measured running time versus the number of vectors, following the quadratic trend]

爱的那么颓废 2024-10-10 10:38:06


This sounds a lot like what spellcheckers have to do. The trick is generally to abuse tries.

The most basic thing you can do is build a trie over the good vectors, then do a flood-fill prioritizing branches with few mismatches. This will be very fast when there is a nearby vector, and degenerate to brute force when the closest vector is very far away. Not bad.
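
As a rough illustration of this single-trie variant, here is a C sketch. It substitutes depth-first branch-and-bound (prune a branch once its mismatch count reaches the best distance found so far) for the priority-queue flood-fill, and the node layout and every name in it are my own assumptions:

#include <stdio.h>
#include <stdlib.h>

/* Ternary trie over length-12 vectors with entries in {0,1,2}. */
typedef struct Node {
    struct Node *child[3];   /* one branch per symbol */
} Node;

static Node *node_new(void) { return calloc(1, sizeof(Node)); }

static void trie_insert(Node *root, const int v[12]) {
    for (int i = 0; i < 12; i++) {
        if (!root->child[v[i]]) root->child[v[i]] = node_new();
        root = root->child[v[i]];
    }
}

/* Depth-first search; any branch whose running mismatch count already
   reaches the best distance found so far is pruned. */
static void search(const Node *n, const int q[12], int depth,
                   int err, int *best) {
    if (err >= *best) return;            /* bound: cannot improve */
    if (depth == 12) { *best = err; return; }
    /* Try the matching symbol first so *best shrinks quickly. */
    for (int d = 0; d < 3; d++) {
        int c = (q[depth] + d) % 3;
        if (n->child[c])
            search(n->child[c], q, depth + 1, err + (c != q[depth]), best);
    }
}

static int nearest_distance(const Node *root, const int q[12]) {
    int best = 13;                       /* larger than any real distance */
    search(root, q, 0, 0, &best);
    return best;
}

int main(void) {
    int good1[12] = {0,0,1,1,2,2,0,0,1,1,2,2};
    int good2[12] = {2,2,2,2,2,2,2,2,2,2,2,2};
    int bad[12]   = {0,0,1,1,2,2,0,0,1,1,2,0};
    Node *root = node_new();
    trie_insert(root, good1);
    trie_insert(root, good2);
    printf("%d\n", nearest_distance(root, bad)); /* prints 1 */
    return 0;
}

Recovering which good vector achieves the minimum only requires recording the path whenever the best distance improves; it is left out to keep the sketch short.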

But I think you can do better. Bad vectors which share the same prefix will do the same initial branching work, so we can try to share that as well. So we also build a trie over the bad vectors and sort of do them all at once.

No guarantees this is correct, since both the algorithm and code are off the top of my head:

var goodTrie = new Trie(goodVectors)
var badTrie = new Trie(badVectors)
var result = new Map<Vector, Vector>()
var pq = new PriorityQueue(x => x.error)
pq.Add(new {good: goodTrie, bad: badTrie, error: 0})
while pq.Count > 0
  var {g, b, e} = pq.Dequeue()
  if b.Count == 0:
      //all leaves of this path have already been removed
      continue
  if b.IsLeaf:
      //the first time a bad leaf is dequeued, its error is minimal
      result[b.Item] = g.Item
      badTrie.Remove(b) //prevent redundant results
  else:
      //We are zipping down both tries in lockstep. Branch to all symbol pairs.
      pq.EnqueueAll(from i in {0,1,2}
                    from j in {0,1,2}
                    where g[i] != null && b[j] != null
                    select new {good: g[i], bad: b[j], error: e + (i == j ? 0 : 1)})

return result

A final optimization might be to re-order the vectors so positions with high agreement among the bad vectors come first and share more work.

酷炫老祖宗 2024-10-10 10:38:06


3^12 isn't a very large search space. If speed is essential and generality of the algorithm is not, you could just map each vector to an int in the range 0..531440 and use it as an index into a precomputed table of "nearest good vectors".

If you gave each entry in that table a 32-bit word (which is more than enough), you'd be looking at about 2 MB for the table, in exchange for pretty much instantaneous "calculation".

Edit: This is not much different from the precomputation the question suggests, but my point is just that depending on the application, there's not necessarily any problem with doing it that way, especially if you do all the precalculation before the application even runs.
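
A hedged C sketch of how such a table might be built and consulted; the base-3 encoding, all names, and the naive fill loop are mine (the answer only calls for some precomputed mapping):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { N = 12, SPACE = 531441 };   /* 3^12 cells in the whole space */

/* Base-3 encoding: vector -> index in 0..531440. */
static uint32_t encode(const int v[N]) {
    uint32_t idx = 0;
    for (int i = 0; i < N; i++)
        idx = idx * 3 + v[i];
    return idx;
}

static int distance(const int a[N], const int b[N]) {
    int d = 0;
    for (int i = 0; i < N; i++)
        d += (a[i] != b[i]);
    return d;
}

/* nearest[encode(v)] = index into good[] of the closest good vector.
   The naive fill below costs 3^12 * ngood distance checks; a
   multi-source BFS over the Hamming graph would be cheaper, but either
   way it runs once, before the application starts. */
static uint32_t *build_table(int good[][N], int ngood) {
    uint32_t *nearest = malloc(SPACE * sizeof *nearest);
    int v[N] = {0};                  /* walks the space in index order */
    for (uint32_t idx = 0; idx < SPACE; idx++) {
        int best = N + 1, arg = 0;
        for (int g = 0; g < ngood; g++) {
            int d = distance(v, good[g]);
            if (d < best) { best = d; arg = g; }
        }
        nearest[idx] = arg;
        /* increment v as a base-3 counter, least significant digit last */
        for (int i = N - 1; i >= 0 && ++v[i] == 3; i--) v[i] = 0;
    }
    return nearest;
}

int main(void) {
    static int good[1000][N];
    srand(1);                        /* placeholder good vectors */
    for (int g = 0; g < 1000; g++)
        for (int i = 0; i < N; i++)
            good[g][i] = rand() % 3;
    uint32_t *nearest = build_table(good, 1000);
    int bad[N] = {0,0,1,1,2,2,0,0,1,1,2,2};
    printf("nearest good index: %u\n", nearest[encode(bad)]);
    free(nearest);
    return 0;
}

Once the table exists, each bad vector costs a single array read: nearest[encode(bad)].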
