Fast Perl t-test function
I'm using Perl and R to analyze a large dataset of samples. For each pair of samples, I calculate the t-test p-value. Currently, I'm using the Statistics::R module to export the values from Perl to R and then calling the t.test function. However, this process is extremely slow. Does anyone know of a Perl function that does the same thing more efficiently?
Thanks!
The volume of data, the number of dataset pairs, and perhaps the code you have written would help us identify why it is slow. For instance, sending many small datasets to R one at a time is slow, and can probably be sped up simply by sending all the data at once.
For a pure Perl solution, you first need to compute the test statistic (that is easy, and already done in Statistics::TTest, for instance), and then convert it to a p-value (you need something like R's pt function, the CDF of the t distribution, but I am not sure it is readily available in Perl; you could send the t-values to R in one block at the end to convert them to p-values).
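As a rough illustration of the "compute the test statistic" step, here is a minimal pure-Perl sketch. It assumes Welch's unequal-variance form, which is what R's t.test uses by default. The helper names mean_var and welch_t are made up for this example and do not come from any module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample mean, unbiased sample variance (n - 1 denominator), and size.
# Hypothetical helper; expects an array reference with at least 2 values.
sub mean_var {
    my ($x) = @_;
    my $n    = scalar @$x;
    my $mean = sum(@$x) / $n;
    my $ss   = sum(map { ($_ - $mean) ** 2 } @$x);
    return ($mean, $ss / ($n - 1), $n);
}

# Welch's t statistic and Welch-Satterthwaite degrees of freedom
# for two samples passed as array references.
sub welch_t {
    my ($x, $y) = @_;
    my ($m1, $v1, $n1) = mean_var($x);
    my ($m2, $v2, $n2) = mean_var($y);
    my ($s1, $s2) = ($v1 / $n1, $v2 / $n2);   # squared standard errors
    my $t  = ($m1 - $m2) / sqrt($s1 + $s2);
    my $df = ($s1 + $s2) ** 2
           / ($s1 ** 2 / ($n1 - 1) + $s2 ** 2 / ($n2 - 1));
    return ($t, $df);
}

my ($t, $df) = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
printf "t = %.4f, df = %.4f\n", $t, $df;   # t = -1.8974, df = 5.8824
```

Computing all the statistics this way in Perl and converting them to p-values in a single step at the end avoids one round trip to R per pair.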
You can also try PDL, in particular PDL::Stats.
The Statistics::TTest module gives you a p-value. Playing around a bit, I find that the p-values that this gives you are slightly lower than what you get from R. R is apparently doing something that reduces the degrees of freedom, but my knowledge of statistics is insufficient to explain what it's doing or why. (In the above example, the difference is about 1%. If you use samples of 320 floats instead of 32, then the difference is 50% or even more, but it's a difference between 1e-12 and 1.5e-12.) If you need precise p-values, you will want to take care.
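One plausible explanation for the gap: R's t.test defaults to Welch's test (var.equal = FALSE), whose degrees of freedom are fractional and never larger than the pooled n1 + n2 - 2. Fewer degrees of freedom means a larger p-value for the same t. Whether Statistics::TTest used the pooled value in that run is an assumption I have not checked. The sketch below computes a two-sided p-value for any t and (possibly fractional) df in pure Perl, through the regularized incomplete beta function. The routines log_gamma, beta_cf, reg_inc_beta and t_pvalue are written for this example, following the usual textbook continued-fraction approach; they are not taken from any module.

```perl
use strict;
use warnings;

# Lanczos approximation to ln(Gamma(x)) for x > 0.
sub log_gamma {
    my ($x) = @_;
    my @cof = (76.18009172947146, -86.50532032941677, 24.01409824083091,
               -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5);
    my $y   = $x;
    my $tmp = $x + 5.5;
    $tmp -= ($x + 0.5) * log($tmp);
    my $ser = 1.000000000190015;
    for my $c (@cof) { $y += 1; $ser += $c / $y; }
    return -$tmp + log(2.5066282746310005 * $ser / $x);
}

# Continued fraction for the incomplete beta function (modified Lentz method).
sub beta_cf {
    my ($p, $q, $x) = @_;
    my ($max_iter, $eps, $tiny) = (200, 3e-14, 1e-300);
    my ($qab, $qap, $qam) = ($p + $q, $p + 1, $p - 1);
    my $c = 1;
    my $d = 1 - $qab * $x / $qap;
    $d = $tiny if abs($d) < $tiny;
    $d = 1 / $d;
    my $h = $d;
    for my $m (1 .. $max_iter) {
        my $m2 = 2 * $m;
        # Even step of the recurrence.
        my $aa = $m * ($q - $m) * $x / (($qam + $m2) * ($p + $m2));
        $d = 1 + $aa * $d;  $d = $tiny if abs($d) < $tiny;
        $c = 1 + $aa / $c;  $c = $tiny if abs($c) < $tiny;
        $d = 1 / $d;
        $h *= $d * $c;
        # Odd step of the recurrence.
        $aa = -($p + $m) * ($qab + $m) * $x / (($p + $m2) * ($qap + $m2));
        $d = 1 + $aa * $d;  $d = $tiny if abs($d) < $tiny;
        $c = 1 + $aa / $c;  $c = $tiny if abs($c) < $tiny;
        $d = 1 / $d;
        my $del = $d * $c;
        $h *= $del;
        last if abs($del - 1) < $eps;
    }
    return $h;
}

# Regularized incomplete beta function I_x(p, q).
sub reg_inc_beta {
    my ($p, $q, $x) = @_;
    return 0 if $x <= 0;
    return 1 if $x >= 1;
    my $bt = exp(log_gamma($p + $q) - log_gamma($p) - log_gamma($q)
               + $p * log($x) + $q * log(1 - $x));
    return $bt * beta_cf($p, $q, $x) / $p if $x < ($p + 1) / ($p + $q + 2);
    return 1 - $bt * beta_cf($q, $p, 1 - $x) / $q;
}

# Two-sided p-value of Student's t with $df degrees of freedom.
sub t_pvalue {
    my ($t, $df) = @_;
    return reg_inc_beta($df / 2, 0.5, $df / ($df + $t * $t));
}

printf "%.4f\n", t_pvalue(2.228139, 10);   # prints 0.0500
```

Because df can be fractional here, the same function works with either the pooled or the Welch degrees of freedom, which makes it easy to compare the two against R's output.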