我需要加快为每个条目找到最佳距离的过程。我正在使用 gower.dist
来自 statmatch
和 solve_lsap
来自 clue> clue> clue
package> solve> solve_lsap 。 Gower距离根本不需要时间,但是,LSAP求解器需要太长时间,而我需要运行它的次数。
有没有一种方法可以使用并行计算更快地进行此运行,或者只是使其一部分并行运行 [链接到线索github] adagio 和 rcpphungarian
(两个都较慢)。
示例数据:
> dim(gowerdist)
[1] 4309 10366
solve_LSAP(gowerdist, maximum = FALSE)
I need to speed up a process of finding most optimal distance for each entry. I am using gower.dist
from StatMatch
and solve_LSAP
from the clue
package. The gower distance takes no time at all, however the LSAP solver takes too long with the number of times I need to run it.
Is there a way to make this run faster using parallel computing or just making part of it run in parallel [link to clue github] [link to scientific journal discussing this] or another solver that I may be unaware of that is faster? The other two libraries I am aware of are adagio
and RcppHungarian
(both are slower).
Example data:
Gower Distance Data (google drive link to folder with data)
> dim(gowerdist)
[1] 4309 10366
solve_LSAP(gowerdist, maximum = FALSE)
发布评论
评论(2)
线索:: SOLVE_LSAP
使用匈牙利算法。 Jonker and Volgenant(1987)算法,在 ,更有效。我测试了矩阵的简化版本,以在几秒钟而不是分钟内获得结果。运行时间的差异可能会随着较大的矩阵而增加。
graphAlignment :: linearassignment()
还使用LAPJV算法,但只能应用于方形矩阵。 (解决方法是在具有极高值的矩阵中添加额外的行。)另一种选择是 /a>;这没有指定使用哪种算法,但要慢得多
clue::solve_LSAP
uses the Hungarian algorithm. The Jonker and Volgenant (1987) algorithm, implemented inTreeDist::LAPJV()
, is more efficient.I tested a reduced version of the matrix to get results in seconds rather than minutes; differences in run time are likely to increase with larger matrices.
GraphAlignment::LinearAssignment()
on Bioconductor also uses the LAPJV algorithm, but can only be applied to square matrices. (A workaround is to add extra rows to the matrix with extremely high values.)Another alternative is lpSolve; this does not specify which algorithm it uses, but is much slower
如果您可以将问题转换为在整数上运行而不是双打,则应该更快。不确定在所有用例中都有多大的实用性,但这可能是一种选择。仍然不是快速燃烧,但绝对是一个改进。
这需要548秒。
圆形作为整数需要116秒。
If you can transform your problem to run on integers instead of doubles, it should be much faster. Not sure how practical it is for all use cases, but that may be an option. Still not blazing fast, but definitely an improvement.
This takes 548 seconds.
Rounded as an integer takes 116 seconds.