线性总和分配/R中的匈牙利方法性能

发布于 2025-02-11 17:40:09 字数 762 浏览 2 评论 0 原文

我需要加快为每个条目找到最佳距离的过程。我正在使用 gower.dist 来自 statmatch solve_lsap 来自 clue> clue> clue package> solve> solve_lsap 。 Gower距离根本不需要时间,但是,LSAP求解器需要太长时间,而我需要运行它的次数。

有没有一种方法可以使用并行计算更快地进行此运行,或者只是使其一部分并行运行 [链接到线索github] adagio 和 rcpphungarian (两个都较慢)。

示例数据:

> dim(gowerdist)
[1]  4309 10366
solve_LSAP(gowerdist, maximum = FALSE)

I need to speed up a process of finding most optimal distance for each entry. I am using gower.dist from StatMatch and solve_LSAP from the clue package. The gower distance takes no time at all, however the LSAP solver takes too long with the number of times I need to run it.

Is there a way to make this run faster using parallel computing or just making part of it run in parallel [link to clue github] [link to scientific journal discussing this] or another solver that I may be unaware of that is faster? The other two libraries I am aware of are adagio and RcppHungarian (both are slower).

Example data:
Gower Distance Data (google drive link to folder with data)

> dim(gowerdist)
[1]  4309 10366
solve_LSAP(gowerdist, maximum = FALSE)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

雪化雨蝶 2025-02-18 17:40:09

线索:: SOLVE_LSAP 使用匈牙利算法。 Jonker and Volgenant(1987)算法,在 ,更有效。

我测试了矩阵的简化版本,以在几秒钟而不是分钟内获得结果。运行时间的差异可能会随着较大的矩阵而增加。

gower <- read.csv("GowerDistance.csv")
dim(gower)
gowerMat <- as.matrix(gower)
gow1 <- gowerMat[1:2400, 1:4800]
tictoc::tic()
clue <- clue::solve_LSAP(gow1, maximum = FALSE)
tictoc::toc()
# 67.95 sec elapsed
tictoc::tic()
td <- TreeDist::LAPJV(gow1)
tictoc::toc()
# 20.24 sec elapsed

graphAlignment :: linearassignment() 还使用LAPJV算法,但只能应用于方形矩阵。 (解决方法是在具有极高值的矩阵中添加额外的行。)

另一种选择是 /a>;这没有指定使用哪种算法,但要慢得多

tictoc::tic()
lps <- lpSolve::lp.assign(gowerMat[1:800, 1:1600]) # much smaller matrix
tictoc::toc()
# 364.67 sec elapsed

clue::solve_LSAP uses the Hungarian algorithm. The Jonker and Volgenant (1987) algorithm, implemented in TreeDist::LAPJV(), is more efficient.

I tested a reduced version of the matrix to get results in seconds rather than minutes; differences in run time are likely to increase with larger matrices.

gower <- read.csv("GowerDistance.csv")
dim(gower)
gowerMat <- as.matrix(gower)
gow1 <- gowerMat[1:2400, 1:4800]
tictoc::tic()
clue <- clue::solve_LSAP(gow1, maximum = FALSE)
tictoc::toc()
# 67.95 sec elapsed
tictoc::tic()
td <- TreeDist::LAPJV(gow1)
tictoc::toc()
# 20.24 sec elapsed

GraphAlignment::LinearAssignment() on Bioconductor also uses the LAPJV algorithm, but can only be applied to square matrices. (A workaround is to add extra rows to the matrix with extremely high values.)

Another alternative is lpSolve; this does not specify which algorithm it uses, but is much slower

tictoc::tic()
lps <- lpSolve::lp.assign(gowerMat[1:800, 1:1600]) # much smaller matrix
tictoc::toc()
# 364.67 sec elapsed
娇俏 2025-02-18 17:40:09

如果您可以将问题转换为在整数上运行而不是双打,则应该更快。不确定在所有用例中都有多大的实用性,但这可能是一种选择。仍然不是快速燃烧,但绝对是一个改进。

gowerdist_all <- readr::read_csv("GowerDistance.csv")

gowerdist1 <- as.matrix(gowerdist_all[,-1])

gowerdist2 <- as.matrix(gowerdist_all[,-1]) * 10000
mode(gowerdist2) <- "integer"

这需要548秒。

tictoc::tic()
z <- clue::solve_LSAP(gowerdist1, maximum = FALSE)
tictoc::toc()
# 547.92 sec elapsed

圆形作为整数需要116秒。

tictoc::tic()
z <- clue::solve_LSAP(gowerdist2, maximum = FALSE)
tictoc::toc()
# 115.64 sec elapsed

If you can transform your problem to run on integers instead of doubles, it should be much faster. Not sure how practical it is for all use cases, but that may be an option. Still not blazing fast, but definitely an improvement.

gowerdist_all <- readr::read_csv("GowerDistance.csv")

gowerdist1 <- as.matrix(gowerdist_all[,-1])

gowerdist2 <- as.matrix(gowerdist_all[,-1]) * 10000
mode(gowerdist2) <- "integer"

This takes 548 seconds.

tictoc::tic()
z <- clue::solve_LSAP(gowerdist1, maximum = FALSE)
tictoc::toc()
# 547.92 sec elapsed

Rounded as an integer takes 116 seconds.

tictoc::tic()
z <- clue::solve_LSAP(gowerdist2, maximum = FALSE)
tictoc::toc()
# 115.64 sec elapsed
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文