How to find a specific index of each vector in a list of vectors, where the indices are given in a vector? (without a for loop)

Posted 2025-01-18 04:36:50 · 258 characters · 4 views · 0 comments

I would like to find an efficient way to do the following lookup in a list:

L = list(10:15,11:20)
a = c(3,7)
b = numeric()
for(i in 1:length(a)) b[i] = L[[i]][a[i]]

I think for loops are inefficient and I imagine this can be done faster using, for example, sapply. My main goal is to do this efficiently when L is long.

Comments (4)

放飞的风筝 2025-01-25 04:36:50

Another apply method would be sapply().

sapply(1:length(a), function(x) L[[x]][a[x]])
[1] 12 17
ˉ厌 2025-01-25 04:36:50

We could use

library(dplyr)
stack(setNames(L, a)) %>%
   group_by(ind) %>% 
   summarise(out = values[[as.numeric(as.character(first(ind)))]]) %>%
   pull(out)
[1] 12 17

Or, in base R, use vapply, which would be faster:

vapply(seq_along(L), \(i) L[[i]][a[i]], numeric(1))
[1] 12 17

or use imap as a compact option

library(purrr)
imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])
 3  7 
12 17 
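
Beyond speed, one reason to prefer the vapply variant above is that its result type is fixed by the template argument, even for empty input. A minimal sketch, assuming the same L and a as in the question; the helper name lookup_vapply is hypothetical and not part of any package:

```r
# Hypothetical helper wrapping the vapply lookup shown above.
lookup_vapply <- function(L, a) {
  # seq_along() yields integer(0) for empty input, unlike 1:length(a)
  vapply(seq_along(a), function(i) L[[i]][a[i]], numeric(1))
}

L <- list(10:15, 11:20)
a <- c(3, 7)

lookup_vapply(L, a)               # 12 17, as a double vector
lookup_vapply(list(), numeric())  # numeric(0), not an empty list
```

With sapply, the empty case would return list() instead, which can surprise downstream code that expects a numeric vector.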
花桑 2025-01-25 04:36:50

UPDATE:

Your aversion to a for loop may be unfounded. I've found that loop performance can be very machine-dependent. On my current machine, with b properly initialized, a base R for loop is slower only than an Rcpp solution, and that just barely. See the updated benchmark below, where loop1 is the properly initialized version and loop2 grows b from length zero. However, I've tried this on other machines, and on some of them the for loops are indeed slower than the apply solutions.


A base R vectorized solution using unlist, cumsum, and lengths:

b <- unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))]
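
To see why the one-liner above works, it may help to walk through the small example from the question: each list element's start offset in the flattened vector is the cumulative length of the elements before it. The sketch below swaps in head(..., -1) for the 1:(length(L) - 1L) subscript; that substitution is my own assumption about a safer form when L has a single element (where 1:0 would count down), and it is not part of the original answer.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

flat    <- unlist(L)                           # 10..15 followed by 11..20
offsets <- c(0, head(cumsum(lengths(L)), -1))  # start offset of each element: 0, 6
idx     <- a + offsets                         # positions in flat: 3, 13
b       <- flat[idx]
b                                              # 12 17

# Single-element list: head(..., -1) leaves no offsets beyond the leading 0
L1 <- list(10:15)
b1 <- unlist(L1)[3 + c(0, head(cumsum(lengths(L1)), -1))]
b1                                             # 12
```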

Benchmarking (tossing in an Rcpp solution)*

library(purrr)

L <- lapply(sample(4:10, 1e5, TRUE), seq)
a <- sapply(lengths(L), function(x) sample(x, 1))

Rcpp::cppFunction("IntegerVector ListIndex(const List& L, const IntegerVector& a) {
const int n = a.size();
IntegerVector b (n);
for (int i = 0; i < n; i++) b(i) = as<IntegerVector>(L[i])(a(i) - 1);
return b;
}")
    
microbenchmark::microbenchmark(
  sapply = sapply(1:length(a), function(x) L[[x]][a[x]]),
  vapply = vapply(seq_along(L), function(i) L[[i]][a[i]], integer(1)),
  purr = as.integer(imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])),
  unlist = unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))],
  rcpp = ListIndex(L, a),
  loop1 = {b <- integer(length(a)); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
  loop2 = {b <- integer(); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
  check = "identical"
)

#> Unit: milliseconds
#>    expr      min       lq      mean    median       uq      max neval
#> sapply 102.4199 113.72450 125.21764 119.72455 130.41480 291.5465   100
#> vapply  97.8447 107.33390 116.41775 112.33445 119.01680 189.9191   100
#>   purr 226.9039 241.02305 258.34032 246.81175 257.87370 502.3446   100
#> unlist  29.4186  29.97935  32.05529  30.86130  33.02160  44.6751   100
#>   rcpp  22.3468  22.78460  25.47667  23.48495  26.63935  37.2362   100
#>  loop1  25.5240  27.34865  28.94650  28.02920  29.32110  42.9779   100
#>  loop2  41.4726  46.04130  52.58843  51.00240  56.54375  88.3444   100

*I couldn't get akrun's dplyr solution to work with the larger vector.

情未る 2025-01-25 04:36:50

You could use Map or mapply. Since mapply can automatically simplify to a vector, we can use it here to get b in one go:

b <- mapply(function(list_members, indices) list_members[indices],
       list_members = L, indices = a, SIMPLIFY = TRUE)

b
#> [1] 12 17
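
The same mapply call can be written more compactly by passing the extraction operator itself as the function. This is only a sketch on the question's example data, not a claim about relative speed; Map is shown alongside for comparison because it always returns a list.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# `[` is an ordinary function in R, so mapply can call it on each (vector, index) pair
b <- mapply(`[`, L, a)
b                       # 12 17

# Map() does the same pairing but never simplifies, so the result is a list
b_list <- Map(`[`, L, a)
unlist(b_list)          # 12 17
```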