How to look up specific indices in a list of vectors, where the indices are given in a vector? (Without a for loop)

Posted on 2025-01-18 04:36:50

I would like to find an efficient way to do the following lookup in a list:

L = list(10:15,11:20)
a = c(3,7)
b = numeric()
for(i in 1:length(a)) b[i] = L[[i]][a[i]]

I think for loops are inefficient and I imagine this can be done faster using, for example, sapply. My main goal is to do this efficiently when L is long.

放飞的风筝 2025-01-25 04:36:50

Another apply method would be sapply().

sapply(1:length(a), function(x) L[[x]][a[x]])
[1] 12 17
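
As a small aside, and only a sketch: 1:length(a) evaluates to c(1, 0) when a is empty, whereas seq_along(a) gives an empty index. The helper name lookup_sapply below is hypothetical, introduced just for illustration.

```r
# Hypothetical helper: same lookup as above, iterating with seq_along().
lookup_sapply <- function(L, a) {
  sapply(seq_along(a), function(i) L[[i]][a[i]])
}

L <- list(10:15, 11:20)
a <- c(3, 7)
lookup_sapply(L, a)
#> [1] 12 17

# With empty input, 1:length(a) counts down from 1 to 0,
# while seq_along(a) is empty.
1:length(integer(0))
#> [1] 1 0
seq_along(integer(0))
#> integer(0)
```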

ˉ厌 2025-01-25 04:36:50

We could use dplyr:

library(dplyr)
stack(setNames(L, a)) %>%
   group_by(ind) %>% 
   summarise(out = values[[as.numeric(as.character(first(ind)))]]) %>%
   pull(out)
[1] 12 17

Or, in base R, use vapply, which would be faster:

vapply(seq_along(L), \(i) L[[i]][a[i]], numeric(1))
[1] 12 17

Or use imap as a compact option:

library(purrr)
imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])
 3  7 
12 17

花桑 2025-01-25 04:36:50

UPDATE:

Your aversion to a for loop may be unfounded. I've found that it can be very machine dependent. On my current machine, with b properly initialized, a base R for loop is slower only than an Rcpp solution, and that just barely. See the updated benchmark below. The loop1 solution is properly initialized. However, I've tried this on other machines, and on some the for loops are indeed slower than the apply solutions.

A base R vectorized solution using unlist, cumsum, and lengths:

b <- unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))]
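
To make the indexing arithmetic concrete, here is a worked sketch on the small example from the question. The intermediate names flat and offsets are introduced here for illustration, and head(lengths(L), -1L) is a swapped-in way of dropping the last length: with a one-element list, 1:(length(L) - 1L) evaluates to c(1, 0), which would add a spurious offset.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# Flatten the list: 10..15 followed by 11..20 (16 values in total).
flat <- unlist(L)

# Number of elements that precede each list element inside `flat`.
# head(x, -1L) drops the last length and stays correct when length(L) == 1.
offsets <- c(0L, cumsum(head(lengths(L), -1L)))
offsets
#> [1] 0 6

# Element 3 of L[[1]] is position 3 in `flat`;
# element 7 of L[[2]] is position 7 + 6 = 13.
b <- flat[a + offsets]
b
#> [1] 12 17
```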

Benchmarking (tossing in an Rcpp solution)*

library(purrr)

L <- lapply(sample(4:10, 1e5, TRUE), seq)
a <- sapply(lengths(L), function(x) sample(x, 1))

Rcpp::cppFunction("IntegerVector ListIndex(const List& L, const IntegerVector& a) {
const int n = a.size();
IntegerVector b (n);
for (int i = 0; i < n; i++) b(i) = as<IntegerVector>(L[i])(a(i) - 1);
return b;
}")
    
microbenchmark::microbenchmark(sapply = sapply(1:length(a), function(x) L[[x]][a[x]]),
                           vapply = vapply(seq_along(L), function(i) L[[i]][a[i]], integer(1)),
                           purr = as.integer(imap_dbl(setNames(L, a), ~ .x[as.numeric(.y)])),
                           unlist = unlist(L)[a + c(0, cumsum(lengths(L)[1:(length(L) - 1L)]))],
                           rcpp = ListIndex(L, a),
                           loop1 = {b <- integer(length(a)); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
                           loop2 = {b <- integer(); for(i in seq_along(a)) b[i] <- L[[i]][a[i]]; b},
                           check = "identical")

#> Unit: milliseconds
#>    expr      min       lq      mean    median       uq      max neval
#> sapply 102.4199 113.72450 125.21764 119.72455 130.41480 291.5465   100
#> vapply  97.8447 107.33390 116.41775 112.33445 119.01680 189.9191   100
#>   purr 226.9039 241.02305 258.34032 246.81175 257.87370 502.3446   100
#> unlist  29.4186  29.97935  32.05529  30.86130  33.02160  44.6751   100
#>   rcpp  22.3468  22.78460  25.47667  23.48495  26.63935  37.2362   100
#>  loop1  25.5240  27.34865  28.94650  28.02920  29.32110  42.9779   100
#>  loop2  41.4726  46.04130  52.58843  51.00240  56.54375  88.3444   100

*I couldn't get akrun's dplyr solution to work with the larger vector.

情未る 2025-01-25 04:36:50

You could use Map or mapply. Since mapply can automatically simplify its result to a vector, we can use it here to get b in one go:

b <- mapply(function(list_members, indices) list_members[indices],
       list_members = L, indices = a, SIMPLIFY = TRUE)

b
#> [1] 12 17
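
For comparison, a more compact sketch of the same idea: since `[` is itself a function in R, it can be passed to mapply or Map directly. The names b_mapply and b_map are illustrative only.

```r
L <- list(10:15, 11:20)
a <- c(3, 7)

# `[` is an ordinary function, so mapply() can call it on each pair
# (L[[i]], a[i]) and simplify the results to a vector.
b_mapply <- mapply(`[`, L, a)
b_mapply
#> [1] 12 17

# Map() always returns a list; unlist() flattens it to a vector.
b_map <- unlist(Map(`[`, L, a))
b_map
#> [1] 12 17
```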

Author: 离鸿
