优化从 MATLAB 矩阵中提取数据?
给定一个 n 维值矩阵:通过任意索引(即坐标)检索值的最有效方法是什么?
例如,在随机 5x5 矩阵中,如果我想要 (1,1) (2,3) 和 (4,5) 处的值,返回这些坐标处的值的最有效方法是什么?
例如,如果我在单独的矩阵中提供这些坐标,是否有一行 MATLAB 可以做这份工作吗?比如:
x=rand(5,5);
y=[[1,1];[2,3];[4,5]];
z=x(y);
除了那不起作用。
然而,需要注意的是,由于各种原因,我无法使用线性索引 - 必须使用原始索引返回结果。这些矩阵的大小可能非常大,所以我也不想使用循环。
Given an n-dimensional matrix of values: what is the most efficient way of retrieving values by arbitrary indices (i.e. coordinates)?
E.g. in a random 5x5 matrix, if I want the values at (1,1) (2,3) and (4,5) what is the most efficient way of returning just the values at these coordinates?
If I provide these coordinates in a separate matrix for example is there a one line of MATLAB which can do the job? Something like:
x=rand(5,5);
y=[[1,1];[2,3];[4,5]];
z=x(y);
Except that doesn't work.
One caveat however, for various reasons I am unable to use linear indexing - the results must be returned using the original indices. And the size of these matrices is potentially very large so I don't want to use loops either.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果您反对使用线性索引和循环,那么唯一的其他选择,据我所知,是逻辑索引。但是,如果 y 始终采用您建议的形式,则您需要根据 y 中指定的索引创建一个逻辑矩阵。
您能解释一下为什么不允许线性索引吗?
无论如何,如果你想要一个非常愚蠢的答案(这是我所能提供的这么多信息):
z = diag(x(y(:,1),y(:,2)))
当然,这将不必要地创建一个巨大的矩阵并从中提取对角线元素(您需要的元素) - 但它可以在一行中完成,等等。
编辑:如果限制使用线性在原始数据上建立索引,然后您可以使用线性索引创建一个逻辑矩阵并用它索引
x
。例如,类似地,对于 3 维矩阵:
此外,逻辑索引应该比线性索引更快,因此除了构建掩码之外,您还处于良好状态。
If you're against using linear indexing and loops, the only other alternative, AFAIK, is logical indexing. But if
y
always comes in the form you've suggested, you'll need to create a logical matrix from the indices specified iny
.Could you explain why linear indexing is not allowed?
Anyway, if you want a really stupid answer (which is all I can provide with this much information):
z = diag(x(y(:,1),y(:,2)))
Of course, this will needlessly create a huge matrix and extract the diagonal elements (the ones you need) from it - but it gets it done in one line, etc.
EDIT: If the restriction is using linear indexing on the original data, then you can use linear indexing to create a logical matrix and index
x
with that. E.g.Similarly, for a 3-dimensional matrix:
Also, logical indexing is supposed to be faster than linear indexing, so apart from building the mask, you're in good shape.
为什么 sub2ind 不适合解决这个问题?我不认为需要逻辑掩码;例如应该
也可以。
why isn't sub2ind alone suitable for this problem? I don't see the need for the logical mask; e.g.
should work as well.
在音乐停止很久之后才来参加聚会,但我无法控制自己...
如果由于工具箱中的错误而需要“完整”索引,并且工具箱一次仅加载矩阵的一部分,您可能会考虑遵循工具箱的行为。通过两件事可以获得大矩阵的巨大效率增益:
1)不要复制不需要复制的东西;例如,这包括创建一个原始矩阵大小的逻辑数组(尽管它名义上“高效”,但每个元素占用一个字节。如果您的矩阵太大而无法一次放入内存,即使是一个矩阵是 1/8,大小可能很重要)
2)保持内存一致性:访问“同一区域”的内存,或者发现自己因大量磁盘交换而减慢速度;即使所有内容都适合内存,保持“缓存一致性”也可以显着提高性能。如果您可以按照矩阵元素的存储顺序访问它们,那么速度会大大加快。
为了解决第一点,您需要寻找一种不需要创建完整副本的方法(因此雅各布的答案将会消失)。为了解决第二个问题,您需要在访问矩阵之前对索引进行排序 - 这样,可以“从同一内存块”访问的任何元素都将被排序。
下面将这两种技术结合起来。我假设 numel(y) << numel(x) - 换句话说,您只对查看
x
中相对较少的元素感兴趣。如果情况并非如此,对 y 向量进行排序实际上会减慢速度:我使用 10000x10000 矩阵 x 和 5000x2 随机元素查找向量 y 对此进行了基准测试。与 Jacob 的代码相比,我得到
将查找向量的大小增加到 50000x2,这些值是
换句话说 - 哪种方法效果更好取决于您想要访问的元素数量。另请注意,使用逻辑 L 矩阵会隐式导致对大 x 矩阵的顺序访问 - 但在创建该矩阵期间,您会随机访问内存。 ..
顺便请注意,您遇到的一个问题是“是否有单衬” - 答案是“是”。如果您定义了数组
x
和y
,那么确实只是一行,并且不使用线性索引...
Coming to the party long after the music has stopped, but I couldn't help myself...
If you need "full" indexing because of a bug in the toolbox, and the toolbox is loading only a part of the matrix at one time, you might consider following along with the behavior of the toolbox. A big efficiency gain with large matrices is gained with two things
1) don't make copies of things that don't need to be copied; this includes, for example, creating a logical array of the size of the original matrix (although it's nominally "efficient", it takes one byte per element. If your matrix is too large to fit in memory all at once, even a matrix that is 1/8 the size is probably significant)
2) preserve memory coherence: access memory "in the same region", or find yourself slowed down by lots of disk swapping; even when everything fits in memory, preserving "cache coherence" can result in significant performance improvements. If you can access matrix elements in the order in which they are stored, things speed up considerably.
To address the first point, you need to look for a method that doesn't require creating a complete copy (so Jacob's answer would be out). To address the second, you need to sort your indices before accessing the matrix - in that way, any elements that can be accessed "from the same block of memory", will be.
The two techniques are combined in the following. I am assuming that
numel(y) << numel(x)
- in other words you are only interested in looking at a relatively small number of elements ofx
. If that's not the case, sorting the y vector would actually slow you down a lot:I benchmarked this using a 10000x10000 matrix
x
and a 5000x2 random element lookup vectory
. Comparing against Jacob's code, I obtainedIncreasing the size of the lookup vector to 50000x2, the values are
In other words - which method will work better depends on the number of elements you want to access. Note also that the use of the logical
L
matrix implicitly results in sequential access of the largex
matrix - but that during the creation of that matrix you are randomly accessing the memory...Note by the way that one question you had was "is there a one liner" - and the answer is "yes". If you have your arrays
x
andy
as defined, thenis indeed just one line, and doesn't use linear indexing...