当前位置：文江博客话题详情

Haskell 中 Ruby 的 pnormaldist 统计函数相当于什么？

发布于 2024-11-09 08:54:37 字数 1036 浏览 2 评论 0 原文

如下所示： http://www.evamiller.org /how-not-to-sort-by-average- rating.html

这是 Ruby 代码本身，在 Statistics2 库：

# inverse of normal distribution ([2])
# Pr( (-\infty, x] ) = qn -> x
def pnormaldist(qn)
  b = [1.570796288, 0.03706987906, -0.8364353589e-3,
       -0.2250947176e-3, 0.6841218299e-5, 0.5824238515e-5,
       -0.104527497e-5, 0.8360937017e-7, -0.3231081277e-8,
       0.3657763036e-10, 0.6936233982e-12]

  if(qn < 0.0 || 1.0 < qn)
    $stderr.printf("Error : qn <= 0 or qn >= 1  in pnorm()!\n")
    return 0.0;
  end
  qn == 0.5 and return 0.0

  w1 = qn
  qn > 0.5 and w1 = 1.0 - w1
  w3 = -Math.log(4.0 * w1 * (1.0 - w1))
  w1 = b[0]
  1.upto 10 do |i|
    w1 += b[i] * w3**i;
  end
  qn > 0.5 and return Math.sqrt(w1 * w3)
  -Math.sqrt(w1 * w3)
end

原文

As seen here: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

Here's the Ruby code itself, implemented in the Statistics2 library:

# inverse of normal distribution ([2])
# Pr( (-\infty, x] ) = qn -> x
def pnormaldist(qn)
  b = [1.570796288, 0.03706987906, -0.8364353589e-3,
       -0.2250947176e-3, 0.6841218299e-5, 0.5824238515e-5,
       -0.104527497e-5, 0.8360937017e-7, -0.3231081277e-8,
       0.3657763036e-10, 0.6936233982e-12]

  if(qn < 0.0 || 1.0 < qn)
    $stderr.printf("Error : qn <= 0 or qn >= 1  in pnorm()!\n")
    return 0.0;
  end
  qn == 0.5 and return 0.0

  w1 = qn
  qn > 0.5 and w1 = 1.0 - w1
  w3 = -Math.log(4.0 * w1 * (1.0 - w1))
  w1 = b[0]
  1.upto 10 do |i|
    w1 += b[i] * w3**i;
  end
  qn > 0.5 and return Math.sqrt(w1 * w3)
  -Math.sqrt(w1 * w3)
end

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

御弟哥哥 2024-11-16 08:54:37

这翻译起来非常简单：

module PNormalDist where

pnormaldist :: (Ord a, Floating a) => a -> Either String a
pnormaldist qn
  | qn < 0 || 1 < qn = Left "Error: qn must be in [0,1]"
  | qn == 0.5        = Right 0.0
  | otherwise        = Right $
      let w3 = negate . log $ 4 * qn * (1 - qn)
          b = [ 1.570796288, 0.03706987906, -0.8364353589e-3, 
                -0.2250947176e-3, 0.6841218299e-5, 0.5824238515e-5, 
                -0.104527497e-5, 0.8360937017e-7, -0.3231081277e-8, 
                0.3657763036e-10, 0.6936233982e-12]
          w1 = sum . zipWith (*) b $ iterate (*w3) 1
      in (signum $ qn - 0.5) * sqrt (w1 * w3)

首先，让我们看看 ruby - 它返回一个值，但有时它会打印一条错误消息（当给出不正确的参数时）。这不是很哈斯克尔，所以
让我们的返回值是 Either String a - 如果给出不正确的参数，我们将返回一个带有错误消息的 Left String，以及一个 Right a< /代码> 否则。

现在我们检查顶部的两种情况：

qn < 0 || 1 < qn = Left“错误：qn 必须在 [0,1]” - 当 qn 超出范围时，这是错误条件。
qn == 0.5 = Right 0.0 - 这是 ruby 检查 qn == 0.5 并返回 * 0.0

接下来，我们在红宝石代码。但我们稍后重新定义了它，这不是很Ruby。我们第一次存储在 w1 中的值
在 w3 的定义中立即使用，那么为什么我们不跳过将其存储在 w1 中呢？我们甚至不需要执行 qn > 0.5 和 w1 = 1.0 - w1 步骤，因为
我们在 w3 的定义中使用乘积 w1 * (1.0 - w1)。

所以我们跳过所有这些，直接进入定义 w3 = negate 。记录 $ 4 * qn * (1 - qn) 。

接下来是 b 的定义，它直接来自 ruby 代码（ruby 的数组文字语法是 haskell 的列表语法）。

这是最棘手的一点 - 定义 w3 的最终值。 ruby 代码的作用

w1 = b[0]
1.upto 10 do |i|
  w1 += b[i] * w3**i;
end

就是所谓的折叠 - 将一组值（存储在 ruby 数组中）减少为单个值。我们可以使用 Array#reduce 更实用地重述这一点（但仍在 ruby 中）：

w1 = b.zip(0..10).reduce(0) do |accum, (bval,i)|
  accum + bval * w3^i
end

请注意我如何使用身份 b 将 b[0] 推入循环中[0] == b[0] * w3^0。

现在我们可以将其直接移植到 haskell，但这有点难看

w1 = foldl 0 (\accum (bval,i) -> accum + bval * w3**i) $ zip b [0..10]

相反，我将其分为几个步骤 - 首先，我们并不真正需要 i，我们只需要 的功能>w3 （从 w3^0 == 1 开始），所以
让我们用iterate (*w3) 1来计算它们。

然后，我们最终只需要它们的乘积，而不是将它们与 b 的元素配对，这样我们就可以将它们压缩成
使用 zipWith (*) b 计算每对的乘积。

现在我们的折叠功能非常简单 - 我们只需要对乘积求和，这可以使用 sum 来完成。

最后，我们根据 qn 是否大于或小于 0.5 来决定是否返回正负 sqrt (w1 * w3)（我们
已经知道它不相等）。因此，不像 ruby 代码那样在两个单独的位置计算平方根，
我算了一次，根据qn - 0.5的符号（signum）乘以+1或-1 code> 仅返回值的符号）。

This is pretty straightforward to translate:

module PNormalDist where

pnormaldist :: (Ord a, Floating a) => a -> Either String a
pnormaldist qn
  | qn < 0 || 1 < qn = Left "Error: qn must be in [0,1]"
  | qn == 0.5        = Right 0.0
  | otherwise        = Right $
      let w3 = negate . log $ 4 * qn * (1 - qn)
          b = [ 1.570796288, 0.03706987906, -0.8364353589e-3, 
                -0.2250947176e-3, 0.6841218299e-5, 0.5824238515e-5, 
                -0.104527497e-5, 0.8360937017e-7, -0.3231081277e-8, 
                0.3657763036e-10, 0.6936233982e-12]
          w1 = sum . zipWith (*) b $ iterate (*w3) 1
      in (signum $ qn - 0.5) * sqrt (w1 * w3)

First off, let's look at the ruby - it returns a value, but sometimes it prints an error message (when given an improper argument). This isn't very haskellish, so
let's have our return value be Either String a - where we'll return a Left String with an error message if given an improper argument, and a Right a otherwise.

Now we check the two cases at the top:

qn < 0 || 1 < qn = Left "Error: qn must be in [0,1]" - this is the error condition, when qn is out of range.
qn == 0.5 = Right 0.0 - this is the ruby check qn == 0.5 and return * 0.0

Next up, we define w1 in the ruby code. But we redefine it a few lines later, which isn't very rubyish. The value that we store in w1 the first time
is used immediately in the definition of w3, so why don't we skip storing it in w1? We don't even need to do the qn > 0.5 and w1 = 1.0 - w1 step, because
we use the product w1 * (1.0 - w1) in the definition of w3.

So we skip all that, and move straight to the definition w3 = negate . log $ 4 * qn * (1 - qn).

Next up is the definition of b, which is a straight lift from the ruby code (ruby's syntax for an array literal is haskell's syntax for a list).

Here's the most tricky bit - defining the ultimate value of w3. What the ruby code does in

w1 = b[0]
1.upto 10 do |i|
  w1 += b[i] * w3**i;
end

Is what's called a fold - reducing a set of values (stored in a ruby array) into a single value. We can restate this more functionally (but still in ruby) using Array#reduce:

w1 = b.zip(0..10).reduce(0) do |accum, (bval,i)|
  accum + bval * w3^i
end

Note how I pushed b[0] into the loop, using the identity b[0] == b[0] * w3^0.

Now we could port this directly to haskell, but it's a bit ugly

w1 = foldl 0 (\accum (bval,i) -> accum + bval * w3**i) $ zip b [0..10]

Instead, I broke it up into several steps - first off, we don't really need i, we just need the powers of w3 (starting at w3^0 == 1), so
let's calculate those with iterate (*w3) 1.

Then, rather than zipping those into pairs with the elements of b, we ultimately just need their products, so we can zip them into
the products of each pair using zipWith (*) b.

Now our folding function is really easy - we just need to sum up the products, which we can do using sum.

Lastly, we decide whether to return plus or minus sqrt (w1 * w3), according to whether qn is greater or less than 0.5 (we
already know it's not equal). So rather than calculating the square root in two separate locations as in the ruby code,
I calculated it once, and multiplied it by +1 or -1 according to the sign of qn - 0.5 (signum just returns the sign of a value).

回复收藏 0 原文

温折酒 2024-11-16 08:54:37

在 Hackage 上深入研究，有许多用于统计的库：

hmatrix-gsl-stats -- 与 GSL 的纯粹绑定
hstatistics -- 与 GSL 的更高级别接口
hstats -- 常用统计方法
statistics -- 比较常见的统计方法
statistics-linreg——两个样本之间的线性回归，基于其他统计包。

您需要一个 pnormaldist 版本，它“返回 normaldist(x) 的 P 值”。

也许那里有你需要的东西？

回复收藏 0 原文

纵山崖 2024-11-16 08:54:37

您想要的功能现在可以在 hackage 上的 erf 包中找到。它称为invnormcdf。

回复收藏 0 原文

反话 2024-11-16 08:54:37

这是我在 node.js 中伯努利参数的威尔逊得分置信区间

wilson.normaldist = function(qn) {
    var b = [1.570796288, 0.03706987906, -0.0008364353589, -0.0002250947176, 0.000006841218299, 0.000005824238515, -0.00000104527497, 0.00000008360937017, -0.000000003231081277,
        0.00000000003657763036, 0.0000000000006936233982
    ];
    if (qn < 0.0 || 1.0 < qn) return 0;
    if (qn == 0.5) return 0;
    var w1 = qn;
    if (qn > 0.5) w1 = 1.0 - w1;
    var w3 = -Math.log(4.0 * w1 * (1.0 - w1));
    w1 = b[0];

    function loop(i) {
        w1 += b[i] * Math.pow(w3, i);
        if (i < b.length - 1) loop(++i);
    };
    loop(1);
    if (qn > 0.5) return Math.sqrt(w1 * w3);
    else return -Math.sqrt(w1 * w3);
}

wilson.rank = function(up_votes, down_votes) {
    var confidence = 0.95;
    var pos = up_votes;
    var n = up_votes + down_votes;
    if (n == 0) return 0;
    var z = this.normaldist(1 - (1 - confidence) / 2);
    var phat = 1.0 * pos / n;
    return ((phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)) * 10000;
}

here is my Wilson's score confidence interval for a Bernoulli parameter in node.js

wilson.normaldist = function(qn) {
    var b = [1.570796288, 0.03706987906, -0.0008364353589, -0.0002250947176, 0.000006841218299, 0.000005824238515, -0.00000104527497, 0.00000008360937017, -0.000000003231081277,
        0.00000000003657763036, 0.0000000000006936233982
    ];
    if (qn < 0.0 || 1.0 < qn) return 0;
    if (qn == 0.5) return 0;
    var w1 = qn;
    if (qn > 0.5) w1 = 1.0 - w1;
    var w3 = -Math.log(4.0 * w1 * (1.0 - w1));
    w1 = b[0];

    function loop(i) {
        w1 += b[i] * Math.pow(w3, i);
        if (i < b.length - 1) loop(++i);
    };
    loop(1);
    if (qn > 0.5) return Math.sqrt(w1 * w3);
    else return -Math.sqrt(w1 * w3);
}

wilson.rank = function(up_votes, down_votes) {
    var confidence = 0.95;
    var pos = up_votes;
    var n = up_votes + down_votes;
    if (n == 0) return 0;
    var z = this.normaldist(1 - (1 - confidence) / 2);
    var phat = 1.0 * pos / n;
    return ((phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)) * 10000;
}

回复收藏 0 原文

心凉怎暖 2024-11-16 08:54:37

Ruby 代码没有文档记录；没有说明这个函数应该做什么。谁知道它是否正确地执行了预期的操作？

我不会盲目地将这个算术从一个实现复制并粘贴到另一个实现（就像 Ruby 包的作者所做的那样）。

注释中的引用以 ([2]) 形式给出，但这是悬空的。我们在 _statistics2.c 文件中原生 C 代码的注释块中找到它。

/*
   statistics2.c
   distributions of statistics2
   by Shin-ichiro HARA
   2003.09.25
   Ref:
   [1] http://www.matsusaka-u.ac.jp/~okumura/algo/
   [2] http://www5.airnet.ne.jp/tomy/cpro/sslib11.htm
*/

只引用抄袭系数的 C 源代码，而不是公式的原始来源，这是非常草率的工作。

[1] 链接不再有效；找不到服务器。幸运的是，我们想要的是[2]。这是一个日语页面，其中包含一些用于各种功能的 C 代码。给出了参考文献。我们想要的是pnorm。在表中，该算法属于戸田の近似式，意思是“Toda's Approximation”。

户田是日本的一个常见姓氏；需要更多的侦探工作才能找出这个人是谁。

经过一番努力，我们开始了：论文（日语）：标准正态分布百分比点的极小极大近似 (1993)，作者：Hideo Toda 和 Harumi Ono。

该算法归属于 Toda（我假设是该论文的合著者），日期为 1967 年，第 19 页

。在 Ruby 包中使用它的可能原因是它是在国内源代码中引用国内学者的名字发现的。

The Ruby code is undocumented; there is no specification of what this function is supposed to do. How does anyone know whether it does correctly whatever is intended?

I wouldn't just blindly copy and paste this arithmetic from one implementation to another (like the author of the Ruby package did).

A citation is given as ([2]) in a comment, but this is dangling. We find it in the comment block of the native C code in the _statistics2.c file.

/*
   statistics2.c
   distributions of statistics2
   by Shin-ichiro HARA
   2003.09.25
   Ref:
   [1] http://www.matsusaka-u.ac.jp/~okumura/algo/
   [2] http://www5.airnet.ne.jp/tomy/cpro/sslib11.htm
*/

Very sloppy work to only cite the C source code from where the coefficients were cribbed, rather than the original source of the formula.

The [1] link doesn't work any more; server not found. Luckily, the one we want is [2]. This is a page in Japanese with some C code for various functions. References are given. The one we want is pnorm. In the table, the algorithm is attributed to 戸田の近似式 which means "Toda's Approximation".

Toda is a common surname in Japan; more detective work is required to find out who this is.

After much effort, here we go: paper (Japanese): The Minimax Approximation for Percentage Points of the Standard Normal Distribution (1993) by Hideo Toda and Harumi Ono.

The algorithm is attributed to Toda (I'm assuming the same one that is the paper's co-author), dated to 1967 on P. 19.

It seems fairly obscure; the likely rationale for using it in the Ruby package is that it was found in source code of domestic origin citing the name of a domestic academic.

回复收藏 0 原文