How to compute, for each element of a vector, the fraction of elements in another vector that are smaller?
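n <- 100000
aa <- rnorm(n)
bb <- rnorm(n)
# For every z in aa this scans all of bb, comparing it with pnorm(z) (not with z itself)
system.time(lapply(aa, function(z) { mean(bb < pnorm(z)) }))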
This small piece of code takes too long to run. Put simply, I have two vectors, aa and bb. For each element of aa, say aa[i], I want the proportion of bb < aa[i].
I found this post and tried to use it to speed things up, but it did not work:
Speed comparison of sapply with a composite function
Any help would be appreciated!
Comments (3)
You may be able to use the findInterval function:
Note that findInterval returns indices, so I've divided the result by n. If you can sort pnorm(aa) before giving it to findInterval, it will be even faster.
I don't mean to be facetious, but these are exactly the kinds of problems R is designed to solve without doing every single calculation, i.e., use statistics!
Assuming that the distributions are normal...
You can be 99% certain that the proportion of bb < aa[i] falls within +/- 4% of mean(x).
For simple random sampling, the 99% margin of error is 1.29/sqrt(n) (the worst case p = 0.5: 2.576 * 0.5 / sqrt(n) ≈ 1.29/sqrt(n)).
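A sketch of the sampling idea, under assumptions the answer does not spell out: each proportion is estimated from a random subsample of bb of hypothetical size m = 1000, and the margin uses the worst case p = 0.5, where qnorm(0.995) * 0.5 / sqrt(m) ≈ 1.29/sqrt(m). With m = 1000 that margin is about 0.041, which lines up with the +/- 4% figure above; with all n = 100000 values it would be about 0.004.

```r
set.seed(42)
n  <- 100000
aa <- rnorm(n)
bb <- rnorm(n)

m   <- 1000                    # hypothetical subsample size (an assumption, not from the answer)
sub <- sort(sample(bb, m))     # sort once so findInterval() can do the counting

# Estimated proportion of bb strictly below each aa[i], using the subsample only
est <- findInterval(aa, sub, left.open = TRUE) / m

# Worst-case (p = 0.5) 99% margin of error for a proportion from m draws
moe <- qnorm(0.995) * 0.5 / sqrt(m)

# Exact proportions for comparison (sorting all of bb once is cheap)
exact <- findInterval(aa, sort(bb), left.open = TRUE) / n

# Share of the estimates that land within the margin of the exact value
coverage <- mean(abs(est - exact) <= moe)
print(c(margin = moe, coverage = coverage))
```

Each individual estimate should fall within the margin roughly 99% of the time. Because every estimate reuses the same subsample, their errors are correlated, so coverage is a rough sanity check rather than a formal test.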
If you only want the proportion '< aa[i]', then you should just determine the number of bb less than each value of aa and then divide by the length:
It does what you say you want, whereas I fear your code does not.
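A sketch of what this answer describes: count the bb values strictly below each aa[i] and divide by length(bb). The direct version mirrors the wording; the findInterval() version computes the same counts from one sorted copy of bb. The last lines show that the question's pnorm()-based code produces a different quantity. The small n and the seed are illustrative choices.

```r
set.seed(7)
n  <- 2000
aa <- rnorm(n)
bb <- rnorm(n)

# Direct translation of the definition: count bb < aa[i], divide by length(bb)
prop_direct <- vapply(aa, function(x) sum(bb < x), numeric(1)) / length(bb)

# Same quantity with one vectorised call on sorted bb
# (left.open = TRUE counts elements strictly below each value)
prop_fast <- findInterval(aa, sort(bb), left.open = TRUE) / length(bb)
stopifnot(isTRUE(all.equal(prop_direct, prop_fast)))

# The question's code compares bb with pnorm(aa[i]) instead of aa[i]
prop_question <- vapply(aa, function(z) mean(bb < pnorm(z)), numeric(1))
stopifnot(!isTRUE(all.equal(prop_direct, prop_question)))
```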