How to calculate, for each element of a vector, the fraction of elements of another vector that are smaller?

Posted on 2024-11-08 15:20:59

n<-100000   
aa<-rnorm(n)
bb<-rnorm(n)
system.time(lapply(aa, function(z){mean(bb<pnorm(z))}))

This small piece of code takes too long to run. Simply put, I have two vectors, aa and bb. For each element of aa, say aa[i], I want the proportion of elements of bb that are less than aa[i].

I found this post and tried to use it to speed things up, but it did not work:
Speed comparison of sapply with a composite function

Any help will be appreciated!

Comments (3)

夏日落 2024-11-15 15:20:59

You may be able to use the findInterval function:

n <- 25000
aa <- rnorm(n)
bb <- rnorm(n)
system.time(q1 <- lapply(aa, function(z){mean(bb<pnorm(z))}))
#   user  system elapsed
# 20.057   2.544  22.807
system.time(q2 <- findInterval(pnorm(aa), sort(bb))/n)
#   user  system elapsed
#  0.020   0.000   0.021
all.equal(as.vector(q1, "numeric"), q2)
# [1] TRUE

Note that findInterval returns indices, so I've divided the result by n. If you can sort pnorm(aa) before giving it to findInterval, it will be even faster.
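One detail worth checking is how ties are handled: by default findInterval counts the elements of the sorted vector that are less than or equal to each query value, whereas mean(bb < z) counts strictly smaller ones. With continuous random draws the two agree, as the all.equal check above shows, but they can differ when values repeat. The sketch below uses small made-up vectors (not from the original post) and assumes an R version in which findInterval has the left.open argument:

```r
# Small hand-made vectors with ties, chosen only for illustration
bb <- c(0.2, 0.5, 0.5, 0.9)
x  <- c(0.1, 0.5, 1.0)   # stands in for pnorm(aa)

# Direct definition: proportion of bb strictly below each x
strict_direct <- vapply(x, function(z) mean(bb < z), numeric(1))

# Default findInterval counts elements of sort(bb) that are <= x
le_fast <- findInterval(x, sort(bb)) / length(bb)

# left.open = TRUE counts elements strictly below x instead
strict_fast <- findInterval(x, sort(bb), left.open = TRUE) / length(bb)

print(rbind(strict_direct, le_fast, strict_fast))
```

For the tied value 0.5, the default call gives 0.75 while the strict versions give 0.25, so left.open = TRUE is the closer match to the "<" in the question if ties can occur.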

两个我 2024-11-15 15:20:59

I don't mean to be facetious, but these are the types of problems that R is designed to solve without having to do every single calculation, i.e., use statistics!

Assuming that the distributions are normal...

aa.new <- sample(aa, 1000)
bb.new <- sample(bb, 1000)

x <- lapply(aa.new, function(z){mean(bb.new<pnorm(z))})
x <- unlist(x)

mean(x)

You can be 99% certain that the proportion of bb < aa[i] falls within +/- 4% of mean(x).

For simple random sampling, 99% margin of error = 1.29/sqrt(n)
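The 1.29/sqrt(n) figure appears to correspond to the usual worst-case normal-approximation margin of error for a single estimated proportion, z * sqrt(p * (1 - p) / n) with p = 0.5. That reading is an assumption, since the answer does not spell out the derivation. A quick check of the arithmetic for the sample size of 1000 used above (variable names are illustrative):

```r
n_sample <- 1000            # sample size used in the answer above
z99 <- qnorm(0.995)         # two-sided 99% normal quantile, about 2.576
p_worst <- 0.5              # proportion that maximises p * (1 - p)

# Worst-case margin of error under the normal approximation
moe <- z99 * sqrt(p_worst * (1 - p_worst) / n_sample)
print(round(moe, 4))
```

This comes out at roughly 0.04, which matches the +/- 4% quoted above. Note that z99 * 0.5 is about 1.29, which is where the constant comes from under this reading.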

挽手叙旧 2024-11-15 15:20:59

If you only want the proportion ' < aa[i]', then you should just determine the number of elements of bb less than each value of aa and then divide by the length of bb:

bbs <- sort(bb)
zz <- findInterval(aa, bbs)
zz <- zz/length(bb)  # proportion of bb, so divide by length(bb) (same as length(aa) here)

It does what you say you want, while your code, I fear, does not: it compares bb with pnorm(z) rather than with z itself.
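To make the distinction concrete, here is a minimal sketch comparing the quantity described in the question with the one its code computes. The seed and the smaller n are arbitrary choices for illustration, not values from the original post:

```r
set.seed(1)                 # seed chosen arbitrarily for reproducibility
n <- 1000                   # kept small so the slow reference runs quickly
aa <- rnorm(n)
bb <- rnorm(n)
bbs <- sort(bb)

# What the question describes: proportion of bb below each aa[i]
stated <- findInterval(aa, bbs) / length(bb)
reference <- vapply(aa, function(z) mean(bb < z), numeric(1))

# What the question's code computes: proportion of bb below pnorm(aa[i])
as_coded <- findInterval(pnorm(aa), bbs) / length(bb)

print(all.equal(stated, reference))   # fast and slow versions of the stated goal
print(max(abs(stated - as_coded)))    # how far the two definitions diverge
```

The first check confirms that the findInterval version reproduces the slow element-by-element definition on this data, while the second shows that wrapping aa in pnorm gives a noticeably different answer.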
