如何在 Ruby 中计算标准差?

发布于 2024-12-09 17:31:49 字数 44 浏览 5 评论 0原文

我有几条具有给定属性的记录,我想找到标准差。

我该怎么做?

I have several records with a given attribute, and I want to find the standard deviation.

How do I do that?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

梦屿孤独相伴 2024-12-16 17:31:49
module Enumerable

    def sum
      self.inject(0){|accum, i| accum + i }
    end

    def mean
      self.sum/self.length.to_f
    end

    def sample_variance
      m = self.mean
      sum = self.inject(0){|accum, i| accum +(i-m)**2 }
      sum/(self.length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(self.sample_variance)
    end

end 

测试它:

a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation  
# => 4.594682917363407

01/17/2012:

由于 Dave Sag 修复了“sample_variance”

module Enumerable

    def sum
      self.inject(0){|accum, i| accum + i }
    end

    def mean
      self.sum/self.length.to_f
    end

    def sample_variance
      m = self.mean
      sum = self.inject(0){|accum, i| accum +(i-m)**2 }
      sum/(self.length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(self.sample_variance)
    end

end 

Testing it:

a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation  
# => 4.594682917363407

01/17/2012:

fixing "sample_variance" thanks to Dave Sag

如日中天 2024-12-16 17:31:49

看来安吉拉可能一直想要一个现有的图书馆。在玩过 statsample、array-statisics 和其他一些之后,如果您试图避免重新发明轮子。

gem install descriptive_statistics
$ irb
1.9.2 :001 > require 'descriptive_statistics'
 => true 
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
 => [1, 2, 2.2, 2.3, 4, 5] 
1.9.2p290 :003 > samples.sum
 => 16.5 
1.9.2 :004 > samples.mean
 => 2.75 
1.9.2 :005 > samples.variance
 => 1.7924999999999998 
1.9.2 :006 > samples.standard_deviation
 => 1.3388427838995882 

我无法谈论它的统计正确性,或者你对猴子修补枚举的舒适度;但它易于使用且易于贡献。

It appears that Angela may have been wanting an existing library. After playing with statsample, array-statisics, and a few others, I'd recommend the descriptive_statistics gem if you're trying to avoid reinventing the wheel.

gem install descriptive_statistics
$ irb
1.9.2 :001 > require 'descriptive_statistics'
 => true 
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
 => [1, 2, 2.2, 2.3, 4, 5] 
1.9.2p290 :003 > samples.sum
 => 16.5 
1.9.2 :004 > samples.mean
 => 2.75 
1.9.2 :005 > samples.variance
 => 1.7924999999999998 
1.9.2 :006 > samples.standard_deviation
 => 1.3388427838995882 

I can't speak to its statistical correctness, or your comfort with monkey-patching Enumerable; but it's easy to use and easy to contribute to.

薄暮涼年 2024-12-16 17:31:49

上面给出的答案很优雅,但有一个小错误。我自己并不是统计专家,我坐下来详细阅读了许多网站,发现这个网站对如何导出标准差给出了最容易理解的解释。 http://sonia.hubpages.com/hub/stddev

上面答案中的错误位于sample_variance 方法。

这是我的更正版本,以及一个显示其工作原理的简单单元测试。

./lib/enumerable/standard_deviation.rb

#!usr/bin/ruby

module Enumerable

  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end

end

./test 中使用从简单电子表格派生的数字。

带有示例数据的 Numbers 电子表格的屏幕快照

#!usr/bin/ruby

require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase

  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert result.round(10) == expected, "expected #{expected} but got #{result}"
  end

end

The answer given above is elegant but has a slight error in it. Not being a stats head myself I sat up and read in detail a number of websites and found this one gave the most comprehensible explanation of how to derive a standard deviation. http://sonia.hubpages.com/hub/stddev

The error in the answer above is in the sample_variance method.

Here is my corrected version, along with a simple unit test that shows it works.

in ./lib/enumerable/standard_deviation.rb

#!usr/bin/ruby

module Enumerable

  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end

end

in ./test using numbers derived from a simple spreadsheet.

Screen Snapshot of a Numbers spreadsheet with example data

#!usr/bin/ruby

require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase

  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert result.round(10) == expected, "expected #{expected} but got #{result}"
  end

end
薔薇婲 2024-12-16 17:31:49

我不太喜欢向 Enumerable 添加方法,因为可能会产生不需要的副作用。它还为任何继承自 Enumerable 的类提供了真正特定于数字数组的方法,这在大多数情况下没有意义。

虽然这对于测试、脚本或小型应用程序来说很好,但对于大型应用程序来说却存在风险,因此这里有一个基于 @tolitius 答案的替代方案,该答案已经很完美了。这比其他任何东西都更适合参考:

module MyApp::Maths
  def self.sum(a)
    a.inject(0){ |accum, i| accum + i }
  end

  def self.mean(a)
    sum(a) / a.length.to_f
  end

  def self.sample_variance(a)
    m = mean(a)
    sum = a.inject(0){ |accum, i| accum + (i - m) ** 2 }
    sum / (a.length - 1).to_f
  end

  def self.standard_deviation(a)
    Math.sqrt(sample_variance(a))
  end
end

然后您可以这样使用它:

2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898

2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
 => [20, 23, 23, 24, 25, 22, 12, 21, 29]

2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
 => 4.594682917363407

2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
 => 1.466628787389638

行为是相同的,但它避免了向 Enumerable 添加方法的开销和风险。

I'm not a big fan of adding methods to Enumerable since there could be unwanted side effects. It also gives methods really specific to an array of numbers to any class inheriting from Enumerable, which doesn't make sense in most cases.

While this is fine for tests, scripts or small apps, it's risky for larger applications, so here's an alternative based on @tolitius' answer which was already perfect. This is more for reference than anything else:

module MyApp::Maths
  def self.sum(a)
    a.inject(0){ |accum, i| accum + i }
  end

  def self.mean(a)
    sum(a) / a.length.to_f
  end

  def self.sample_variance(a)
    m = mean(a)
    sum = a.inject(0){ |accum, i| accum + (i - m) ** 2 }
    sum / (a.length - 1).to_f
  end

  def self.standard_deviation(a)
    Math.sqrt(sample_variance(a))
  end
end

And then you use it as such:

2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898

2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
 => [20, 23, 23, 24, 25, 22, 12, 21, 29]

2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
 => 4.594682917363407

2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
 => 1.466628787389638

The behavior is the same, but it avoids the overheads and risks of adding methods to Enumerable.

秋叶绚丽 2024-12-16 17:31:49

所提供的计算效率不是很高,因为它们需要多次(至少两次,但通常是三次,因为除了 std-dev 之外,您通常还想显示平均值)传递数组。

我知道 Ruby 不是追求效率的地方,但这里是我的实现,它通过一次遍历列表值来计算平均值和标准差:

module Enumerable

  def avg_stddev
    return nil unless count > 0
    return [ first, 0 ] if count == 1
    sx = sx2 = 0
    each do |x|
      sx2 += x**2
      sx += x
    end
    [ 
      sx.to_f  / count,
      Math.sqrt( # http://wijmo.com/docs/spreadjs/STDEV.html
        (sx2 - sx**2.0/count)
        / 
        (count - 1)
      )
    ]
  end

end

The presented computation are not very efficient because they require several (at least two, but often three because you usually want to present average in addition to std-dev) passes through the array.

I know Ruby is not the place to look for efficiency, but here is my implementation that computes average and standard deviation with a single pass over the list values:

module Enumerable

  def avg_stddev
    return nil unless count > 0
    return [ first, 0 ] if count == 1
    sx = sx2 = 0
    each do |x|
      sx2 += x**2
      sx += x
    end
    [ 
      sx.to_f  / count,
      Math.sqrt( # http://wijmo.com/docs/spreadjs/STDEV.html
        (sx2 - sx**2.0/count)
        / 
        (count - 1)
      )
    ]
  end

end
甜味拾荒者 2024-12-16 17:31:49

作为一个简单的函数,给定一个数字列表:

def standard_deviation(list)
  mean = list.inject(:+) / list.length.to_f
  var_sum = list.map{|n| (n-mean)**2}.inject(:+).to_f
  sample_variance = var_sum / (list.length - 1)
  Math.sqrt(sample_variance)
end

As a simple function, given a list of numbers:

def standard_deviation(list)
  mean = list.inject(:+) / list.length.to_f
  var_sum = list.map{|n| (n-mean)**2}.inject(:+).to_f
  sample_variance = var_sum / (list.length - 1)
  Math.sqrt(sample_variance)
end
一个人的旅程 2024-12-16 17:31:49

如果手头的记录是 IntegerRational 类型,您可能需要使用 Rational 而不是 Float 来计算方差code> 以避免舍入引入的错误。

例如:(

def variance(list)
  mean = list.reduce(:+)/list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences/list.length
end

谨慎的做法是为空列表和其他边缘情况添加特殊情况处理。)

那么平方根可以定义为:

def std_dev(list)
  Math.sqrt(variance(list))
end

If the records at hand are of type Integer or Rational, you may want to compute the variance using Rational instead of Float to avoid errors introduced by rounding.

For example:

def variance(list)
  mean = list.reduce(:+)/list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences/list.length
end

(It would be prudent to add special-case handling for empty lists and other edge cases.)

Then the square root can be defined as:

def std_dev(list)
  Math.sqrt(variance(list))
end
萌能量女王 2024-12-16 17:31:49

如果人们使用 postgres ...它为 stddev_pop 和 stddev_samp 提供聚合函数 - postgresql聚合函数

stddev(相当于stddev_samp)至少从postgres 7.1开始可用,从8.2开始提供samp和pop。

In case people are using postgres ... it provides aggregate functions for stddev_pop and stddev_samp - postgresql aggregate functions

stddev (equiv of stddev_samp) available since at least postgres 7.1, since 8.2 both samp and pop are provided.

烂柯人 2024-12-16 17:31:49

或者怎么样:

class Stats
    def initialize( a )
        @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
        @stdev = a.count > 0 ? ( a.reduce(0){ |sum, v| sum + (@avg - v) ** 2 } / a.count ) ** 0.5 : 0.0
    end
end

Or how about:

class Stats
    def initialize( a )
        @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
        @stdev = a.count > 0 ? ( a.reduce(0){ |sum, v| sum + (@avg - v) ** 2 } / a.count ) ** 0.5 : 0.0
    end
end
一萌ing 2024-12-16 17:31:49

您可以将其作为辅助方法并在任何地方对其进行评估。

def calc_standard_deviation(arr)
    mean = arr.sum(0.0) / arr.size
    sum = arr.sum(0.0) { |element| (element - mean) ** 2 }
    variance = sum / (arr.size - 1)
    standard_deviation = Math.sqrt(variance)
end

You can place this as helper method and assess it everywhere.

def calc_standard_deviation(arr)
    mean = arr.sum(0.0) / arr.size
    sum = arr.sum(0.0) { |element| (element - mean) ** 2 }
    variance = sum / (arr.size - 1)
    standard_deviation = Math.sqrt(variance)
end
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文