Python SciPy stats percentileofscore

Posted 2024-12-15 09:06:25

Consider the following Python code:

In [1]: import numpy as np
In [2]: import scipy.stats as stats
In [3]: ar = np.array([0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963])
In [4]: x = ar[-1]
In [5]: stats.percentileofscore(ar, x, kind='strict')
Out[5]: 30.0
In [6]: stats.percentileofscore(ar, x, kind='rank')
Out[6]: 40.0
In [7]: stats.percentileofscore(ar, x, kind='weak')
Out[7]: 40.0
In [8]: stats.percentileofscore(ar, x, kind='mean')
Out[8]: 35.0

The kind argument determines how the resulting score is interpreted.
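The four outputs can be reproduced by counting. This is a minimal sketch, assuming the definitions given in the scipy.stats.percentileofscore documentation: strict is the percentage of values below the score, weak the percentage at or below it, mean the average of those two, and rank the average percentage rank of the entries equal to the score. It uses a plain list and no SciPy call.

```python
# The question's data as a plain list, so the sketch needs only the standard library
ar = [0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963]
x = ar[-1]  # 0.3963
n = len(ar)

below = sum(v < x for v in ar)         # values strictly less than x
at_or_below = sum(v <= x for v in ar)  # values less than or equal to x (x itself counts)

strict = 100.0 * below / n       # kind='strict'
weak = 100.0 * at_or_below / n   # kind='weak'
mean = (strict + weak) / 2       # kind='mean'
# kind='rank': average 1-based position of the entries equal to x, as a percentage
rank = 100.0 * (below + 1 + at_or_below) / 2 / n

print(below, at_or_below, strict, weak, mean, rank)  # 3 4 30.0 40.0 35.0 40.0
```

Because 0.3963 occurs only once, its rank is simply 4 of 10, which is why rank and weak agree here.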

Now when I use Excel's PERCENTRANK function with the same data, I get 0.3333. This appears to be correct as there are 3 values less than x=0.3963.
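The gap between 30% and 0.3333 comes down to the denominator. This is a minimal sketch, assuming that Excel's inclusive PERCENTRANK, for a value that occurs in the data, is the number of strictly smaller values divided by n - 1 rather than by n. The function name excel_percentrank_inc is hypothetical and only covers values present in the data; Excel interpolates for other values, and that case is not modelled.

```python
def excel_percentrank_inc(data, x):
    """Hypothetical re-implementation of Excel's inclusive PERCENTRANK for x in data.

    Assumed formula: (number of values strictly below x) / (n - 1).
    Values of x that are not in data (Excel interpolates) are not handled.
    """
    if x not in data:
        raise ValueError("this sketch only handles values that occur in data")
    below = sum(v < x for v in data)
    return below / (len(data) - 1)


ar = [0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963]
print(round(excel_percentrank_inc(ar, ar[-1]), 4))  # 3 / 9 -> 0.3333
```

SciPy's kind='strict' uses the same count of 3 but divides by n = 10, giving 30.0.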

Can someone explain why I'm getting inconsistent results?

人事已非 2024-12-22 09:06:25

When I rewrote this function in scipy.stats, I found many different definitions, some of which are included.

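The basic example is ranking students by their scores. In this case the scores include all students, and the percentile of a score gives its rank among all students. The main difference between the definitions is how ties are handled.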
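Tie handling is where the kinds diverge. This is a minimal pure-Python sketch, assuming the four definitions in the scipy.stats.percentileofscore documentation; percentile_of_score is a hypothetical helper, not SciPy's source, and the scores are made up for illustration.

```python
def percentile_of_score(data, score, kind="rank"):
    """Hypothetical helper re-deriving the four tie conventions; not SciPy's code."""
    n = len(data)
    left = sum(v < score for v in data)    # entries strictly below the score
    right = sum(v <= score for v in data)  # entries at or below the score
    if kind == "strict":
        return 100.0 * left / n
    if kind == "weak":
        return 100.0 * right / n
    if kind == "mean":
        return 50.0 * (left + right) / n
    if kind == "rank":
        if right == left:  # score not present: no tied entries to average over
            return 100.0 * left / n
        # tied entries occupy 1-based ranks left+1 .. right; average their percentages
        return 50.0 * (left + 1 + right) / n
    raise ValueError(f"unknown kind: {kind!r}")


scores = [55, 70, 70, 70, 90]  # illustrative data: three students tie at 70
for kind in ("strict", "weak", "mean", "rank"):
    print(kind, percentile_of_score(scores, 70, kind))
```

With three students tied at 70, strict places them at 20, weak at 80, mean at 50, and rank at 60, the average of the tied positions 40, 60 and 80.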

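Excel seems to rank a student relative to an existing scale, for example: what is the rank of a score on the historical GRE scale? I don't know whether Excel drops one entry if the score is not in the existing list.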

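A similar problem in statistics is the choice of "plotting positions" for quantiles. I couldn't find a good reference on the internet. Here is a general formula: http://amsglossary.allenpress.com/glossary/search?id=plotting-position1
Wikipedia has only a short paragraph: http://en.wikipedia.org/wiki/Q-Q_plot#Plotting_positions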

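The literature has a large number of cases with different choices of b (or even of a second parameter a), which correspond to different approximations for different distributions. Several of them are implemented in scipy.stats.mstats.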
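To make the connection concrete, here is a minimal sketch of a two-parameter plotting-position family, (k - alpha) / (n + 1 - alpha - beta), which is the form documented for scipy.stats.mstats.plotting_positions. Whether alpha and beta line up exactly with the a and b of the formula linked above is an assumption. x = 0.3963 is the 4th smallest of the question's 10 values, and each (alpha, beta) choice reproduces one of the numbers in the question. The "excel" row assumes PERCENTRANK divides the count of smaller values by n - 1.

```python
def plotting_position(k, n, alpha, beta):
    """Plotting position (k - alpha) / (n + 1 - alpha - beta) of the k-th smallest of n values."""
    return (k - alpha) / (n + 1 - alpha - beta)


k, n = 4, 10  # x = 0.3963 is the 4th smallest of the question's 10 values
conventions = {
    "strict": (1.0, 0.0),   # (k - 1) / n        -> scipy kind='strict'
    "weak": (0.0, 1.0),     # k / n              -> scipy kind='weak'
    "hazen": (0.5, 0.5),    # (k - 0.5) / n      -> scipy kind='mean' when x has no ties
    "weibull": (0.0, 0.0),  # k / (n + 1)
    "excel": (1.0, 1.0),    # (k - 1) / (n - 1)  -> assumed Excel PERCENTRANK
}
positions = {name: plotting_position(k, n, a, b) for name, (a, b) in conventions.items()}
for name, p in positions.items():
    print(f"{name:8s} {p:.4f}")
```

All of these are the same rank statistic with different offsets, so none is "the" percentile; the choice depends on what the number is used for.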

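I don't think this is a question of which one is correct. The question is what you want to use it for: what is the common definition for your problem or your field?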

时常饿 2024-12-22 09:06:25

This is a weird one. As near as I can tell, they are doing different calculations; SciPy reproduces the Excel result if called this way:

In [1]: import numpy as np
In [2]: import scipy.stats as stats
In [3]: ar = np.array([0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963])
In [4]: x = ar[-1]
In [5]: stats.percentileofscore(ar[:-1], x, kind='mean')
Out[5]: 33.333333333333336

Using any of the kind values, I get the same answer. This leaves out the value in the data that is exactly equal to the query. Have a look at this PercentRank algorithm in VBA, as it might give a bit of insight.
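Why dropping the value works: once x is removed, n - 1 values remain and none of them equals x, so all four kinds collapse to (count below x) / (n - 1), the same ratio Excel is assumed to use. Below is a minimal check over every element of the question's array. It re-derives the strict convention in plain Python instead of calling SciPy; strict_pct is a hypothetical helper, and the Excel formula is an assumption.

```python
ar = [0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963]
n = len(ar)


def strict_pct(data, score):
    """Percentage of data strictly below score (scipy kind='strict' convention)."""
    return 100.0 * sum(v < score for v in data) / len(data)


mismatches = []
for i, x in enumerate(ar):
    rest = ar[:i] + ar[i + 1:]                      # leave the queried value out, like ar[:-1]
    scipy_style = strict_pct(rest, x) / 100         # fraction instead of percent
    excel_style = sum(v < x for v in ar) / (n - 1)  # assumed Excel PERCENTRANK formula
    if abs(scipy_style - excel_style) > 1e-12:
        mismatches.append(x)

print("mismatches:", mismatches)  # mismatches: []
```

For the strict convention the equality also holds when the data contain ties, because removing one copy of x does not change how many values are strictly below it; weak, mean and rank would then differ from Excel.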

So which is right, Excel or SciPy?
