Mapping 2 vectors - help to vectorize

Posted on 2024-08-19 10:23:54

Working in Matlab, I have 2 vectors of x coordinates with different lengths. For example:

xm = [15 20 24 25 26 35 81 84 93];
xn = [14 22 26 51 55 59 70 75 89 96];

I need to map xm to xn, or in other words, to find for each coordinate in xm the index of the closest coordinate in xn. So if I have values associated with those coordinates, I can use this map as an index and correlate those values.

Both vectors are sorted and there are no duplicates in each vector.

I wrote a simple function with for-loop:

function xmap = vectors_map(xm,xn)
xmap = zeros(size(xm));
for k=1:numel(xm)
    [~, ind] = min(abs(xm(k)-xn));
    xmap(k) = ind(1);
end

For the above example it returns

xmap =
    1     2     2     3     3     3     8     9    10

It works ok, but takes a while with long vectors (over 100,000 points).

Any ideas how to vectorize this code?


Comments (6)

冬天旳寂寞 2024-08-26 10:23:54

Oh! One other option: since you're looking for close correspondences between two sorted lists, you could go through them both simultaneously, using a merge-like algorithm. This should be O(max(length(xm), length(xn)))-ish.


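match_for_xn = zeros(length(xn), 1);
last_M = 1;
for N = 1:length(xn)
  % search through M until we find a match.
  for M = last_M:length(xm)
    dist_to_curr = abs(xm(M) - xn(N));
    if M < length(xm)
      dist_to_next = abs(xm(M+1) - xn(N));
    else
      dist_to_next = Inf;  % guard: xm(M+1) does not exist for the last element
    end

    % stop as soon as the next xm value is farther from xn(N) than the current one
    if dist_to_next > dist_to_curr
      match_for_xn(N) = M;
      last_M = M;
      break
    end

  end % M
end % N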

EDIT:
See @yuk's comment, the above code is not totally correct!
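
For reference, a merge-style walk in the direction the question asks for (one index into xn per element of xm) might look like the sketch below. It is only a sketch: it assumes both vectors are sorted in ascending order and xn is non-empty, and the counter name j is made up for illustration. Ties go to the lower index, as min does in the original loop.

```matlab
xm = [15 20 24 25 26 35 81 84 93];
xn = [14 22 26 51 55 59 70 75 89 96];

xmap = zeros(size(xm));
j = 1;                       % current candidate index into xn (hypothetical name)
for k = 1:numel(xm)
    % Advance j only while the next xn value is strictly closer to xm(k).
    % Because xm is sorted, j never has to move backwards.
    while j < numel(xn) && abs(xn(j+1) - xm(k)) < abs(xn(j) - xm(k))
        j = j + 1;
    end
    xmap(k) = j;
end
disp(xmap)   % 1 2 2 3 3 3 8 9 10 for the example vectors
```

Each element of xm and xn is visited at most once, so the work grows with numel(xm) + numel(xn) rather than with their product.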

深居我梦 2024-08-26 10:23:54

Consider this vectorized solution:

[~, xmap] = min( abs(bsxfun(@minus, xm, xn')) )
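
On MATLAB R2016b or later, implicit expansion should allow the same idea without bsxfun. The sketch below assumes xm and xn are row vectors and passes the dimension to min explicitly, so the result does not change shape if xn happens to hold a single element. The variable D is introduced here only for illustration.

```matlab
xm = [15 20 24 25 26 35 81 84 93];
xn = [14 22 26 51 55 59 70 75 89 96];

% Row xm minus column xn.' expands to a numel(xn)-by-numel(xm) matrix.
D = abs(xm - xn.');

% Minimise down each column: one index into xn per element of xm.
[~, xmap] = min(D, [], 1);
disp(xmap)   % 1 2 2 3 3 3 8 9 10 for the example vectors
```

Either way the full distance matrix is built in memory. With 100,000 points in each vector that is 10^10 doubles, about 80 GB, so this form is only practical for much shorter vectors.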

她如夕阳 2024-08-26 10:23:54

The fastest implementation I'm aware of that solves this problem is this one (C code that can be compiled as a .mex file; for me it's about 20 times faster than rescdsk's code in the accepted answer). It's surprising that such a common operation is not a MATLAB built-in function.

苍白女子 2024-08-26 10:23:54

It looks like your input vectors are sorted. Use a binary search to find the closest match. This will give you an O(n log n) run time, compared with O(n^2) for the brute-force loop.
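
A hand-written binary search along these lines could look like the following sketch. It assumes xn is sorted in ascending order and non-empty; lo, hi and mid are illustrative names. It is still a loop over xm, but each lookup costs about log2(numel(xn)) comparisons instead of a full scan.

```matlab
xm = [15 20 24 25 26 35 81 84 93];
xn = [14 22 26 51 55 59 70 75 89 96];

xmap = zeros(size(xm));
for k = 1:numel(xm)
    lo = 1;
    hi = numel(xn);
    % Shrink [lo, hi] until it holds at most two neighbouring candidates.
    while hi - lo > 1
        mid = floor((lo + hi) / 2);
        if xn(mid) <= xm(k)
            lo = mid;
        else
            hi = mid;
        end
    end
    % Pick the closer of the two candidates; ties go to the lower index.
    if abs(xn(hi) - xm(k)) < abs(xn(lo) - xm(k))
        xmap(k) = hi;
    else
        xmap(k) = lo;
    end
end
disp(xmap)   % 1 2 2 3 3 3 8 9 10 for the example vectors
```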

泛滥成性 2024-08-26 10:23:54

Your xm and xn are sorted. If this is generally the case, then you can do much better than stepping over the entire array.

For each value in xn, there is an interval such that any xm value falling inside it is closer to that xn value than to any other. Compute these intervals beforehand and you can then step through both arrays sequentially.
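
One way to sketch the interval idea without an explicit loop is to take the midpoints between neighbouring xn values as interval boundaries and let discretize do the lookup. This assumes xn is a sorted row vector and a MATLAB release that provides discretize (R2015a or later); the variable edges is introduced here for illustration.

```matlab
xm = [15 20 24 25 26 35 81 84 93];
xn = [14 22 26 51 55 59 70 75 89 96];

% Boundaries halfway between neighbouring xn values. The -Inf and Inf
% edges catch xm values that lie outside the range of xn.
edges = [-Inf, (xn(1:end-1) + xn(2:end)) / 2, Inf];

% With 'IncludedEdge','right' each bin includes its right edge, so an xm
% value exactly halfway between two xn values maps to the lower index,
% like the first minimum returned by min in the original loop.
xmap = discretize(xm, edges, 'IncludedEdge', 'right');
disp(xmap)
```

For the example vectors this should reproduce the xmap from the question. Memory use stays proportional to the vector lengths, since no pairwise distance matrix is formed.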

魂牵梦绕锁你心扉 2024-08-26 10:23:54

Taking advantage of being sorted, as David says, will be faster since you have so many points, but for reference one way to vectorize this would be to use meshgrid:

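[X, Y] = meshgrid(xn, xm);      % X repeats xn along rows, Y repeats xm along columns
diffs = abs(X - Y);             % diffs(k, j) = |xn(j) - xm(k)|
[~, xmap] = min(diffs, [], 2);  % index of the closest xn for each xm (column vector)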

Note that this will create two 100,000 x 100,000 arrays in memory, so it's probably only feasible for smaller data sets.
