R: Apply a function to all row pairs of a matrix without a for loop
I want to run all pairwise comparisons between the rows of a matrix. A double for loop obviously works, but it is extremely expensive for a large dataset.
I looked into implicit loops such as apply(), but I have no clue how to avoid the inner loop.
How can this be achieved?
Comments (4)
I'm assuming you're trying to do some type of comparison across all row pairs of a matrix.
You could use outer() to run through all pairs of row indices and apply a vectorized comparison function to each row pair. For example, you could calculate the squared Euclidean distance between all row pairs as follows:
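A minimal sketch of this approach, assuming a small numeric matrix named x and a helper named sq_dist (both names are illustrative). Because outer() hands whole index vectors to its function, the scalar helper is wrapped in Vectorize():

```r
# Illustrative data: 4 rows, 3 columns (the name x is an assumption).
set.seed(1)
x <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Squared Euclidean distance between rows i and j of x (hypothetical helper).
sq_dist <- function(i, j) sum((x[i, ] - x[j, ])^2)

# outer() passes vectors of indices to FUN, so Vectorize() is used to call
# sq_dist once per (i, j) pair and collect the results.
n <- nrow(x)
d2 <- outer(seq_len(n), seq_len(n), Vectorize(sq_dist))
d2
```

The result is an n-by-n symmetric matrix with zeros on the diagonal, so every pair is computed twice.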
outer() works fine if you are willing to do self-comparisons such as 1-1 and 2-2 (the diagonal values in the matrix). outer() also performs both the 1-2 and the 2-1 comparison. Most of the time, pairwise comparisons only need the triangular part, without the self-comparisons and the mirror comparisons. To get only the triangular comparisons, use combn(). Here is a sample output showing the difference between outer() and combn():
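As an illustration of the outer() side, here is a small sketch that labels each pair by pasting the two elements together. The vector x and the name pairs_all are assumptions chosen for the example:

```r
# Illustrative input vector (an assumption for this example).
x <- 1:3

# outer() evaluates every ordered pair, including self pairs and mirrored pairs.
pairs_all <- outer(x, x, paste, sep = "-")
pairs_all
# Result (3 x 3 character matrix):
#      [,1]  [,2]  [,3]
# [1,] "1-1" "1-2" "1-3"
# [2,] "2-1" "2-2" "2-3"
# [3,] "3-1" "3-2" "3-3"
```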
Note the "1-1" self-comparisons above, and the "1-2" and "2-1" mirror comparisons. Contrast that with the following:
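For the combn() side, a comparable sketch with the same illustrative vector (the names x and pairs_tri are assumptions):

```r
# Illustrative input vector (an assumption for this example).
x <- 1:3

# combn() visits each unordered pair exactly once: no self pairs, no mirrors.
pairs_tri <- combn(x, 2, FUN = function(p) paste(p[1], p[2], sep = "-"))
pairs_tri
# [1] "1-2" "1-3" "2-3"
```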
You can see the "upper triangular" part of the matrix in the above.
outer() is more apt when you have two different vectors to combine pairwise. For pairwise operations within a single vector, combn() is usually enough.
For example, if you are doing outer(x, x, ...), then you are perhaps doing it wrong; you should consider combn(length(x), 2).
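Applied to the rows of a matrix, the same idea might look like the sketch below. The matrix m, the variable names, and the choice of squared Euclidean distance as the comparison are all assumptions made for illustration:

```r
# Illustrative data: 4 rows, 3 columns (the name m is an assumption).
set.seed(1)
m <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Each column of idx holds one unordered pair of row indices (i < j).
idx <- combn(nrow(m), 2)

# Apply the comparison once per pair; here, squared Euclidean distance.
d2 <- apply(idx, 2, function(p) sum((m[p[1], ] - m[p[2], ])^2))

# Keep the indices next to the results so each value can be traced to its pair.
data.frame(i = idx[1, ], j = idx[2, ], d2 = d2)
```

For n rows this evaluates n(n-1)/2 pairs instead of the n^2 that outer() would visit.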
Maybe not as universal a solution as @Prasad's, but much faster in this special case of a sum of squares:
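One built-in route for this special case is dist() from the stats package. The sketch below is an illustration under that assumption and is not necessarily the code this answer used; the matrix x is illustrative as well:

```r
# Illustrative data: 4 rows, 3 columns (the name x is an assumption).
set.seed(1)
x <- matrix(rnorm(12), nrow = 4, ncol = 3)

# dist() computes Euclidean distances between all row pairs in compiled code.
# Squaring gives the sum of squared differences; as.matrix() expands the
# compact lower-triangle result into a full symmetric matrix.
d2 <- as.matrix(dist(x))^2
d2
```

The trade-off is generality: this only covers distance measures that dist() supports, whereas outer() with Vectorize() accepts any comparison function.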
@Gopalkrishna Palem
I like your solution! However, I think you should use combn(v, 2) instead of combn(length(v), 2). combn(length(v), 2) only iterates over the indices of v, so the final result is only correct with combn(v, 2).
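A short sketch of the difference, assuming an illustrative vector v whose values differ from their positions:

```r
# Illustrative vector (an assumption for this example).
v <- c(10, 20, 30)

# Columns are pairs of positions: (1,2), (1,3), (2,3).
idx <- combn(length(v), 2)
idx

# Columns are pairs of values: (10,20), (10,30), (20,30).
vals <- combn(v, 2)
vals

# Indexing v with the position pairs recovers the value pairs.
all.equal(matrix(v[idx], nrow = 2), vals)
```

The index form is the one that carries over to rows of a matrix or data frame, where the "elements" being paired are rows rather than single values.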
Then, if we have a data frame, we can use the indices to apply a function to pairs of rows:
However, a_ply discards the result, so how can I store the output in a vector for further analysis? I don't want to just print the result.
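One possible way to keep the results is to use base R's vapply() over the columns of the index matrix, which returns a plain vector. This is a sketch under assumptions: the data frame df, its numeric columns, and the squared-difference comparison are all hypothetical stand-ins.

```r
# Hypothetical data frame with numeric columns.
df <- data.frame(a = c(1, 2, 4), b = c(0, 3, 5))

# One column per unordered pair of row indices.
idx <- combn(nrow(df), 2)

# vapply() collects one numeric value per pair into a vector.
res <- vapply(seq_len(ncol(idx)), function(k) {
  r1 <- unlist(df[idx[1, k], ])  # first row of the pair as a numeric vector
  r2 <- unlist(df[idx[2, k], ])  # second row of the pair
  sum((r1 - r2)^2)               # example comparison: squared differences
}, numeric(1))

res
# [1] 10 34  8
```

The k-th element of res corresponds to the row pair in column k of idx, so the two can be combined later for further analysis.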