Python SciPy stats percentileofscore

Posted on 2024-12-15 09:06:25

Consider the following Python code:

In [1]: import numpy as np
In [2]: import scipy.stats as stats
In [3]: ar = np.array([0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963])
In [4]: x = ar[-1]
In [5]: stats.percentileofscore(ar, x, kind='strict')
Out[5]: 30.0
In [6]: stats.percentileofscore(ar, x, kind='rank')
Out[6]: 40.0
In [7]: stats.percentileofscore(ar, x, kind='weak')
Out[7]: 40.0
In [8]: stats.percentileofscore(ar, x, kind='mean')
Out[8]: 35.0

The kind argument determines how the resulting score is interpreted, in particular how values equal to x are counted.

Now when I use Excel's PERCENTRANK function with the same data, I get 0.3333. This appears to be correct as there are 3 values less than x=0.3963.

Can someone explain why I'm getting inconsistent results?

Comments (2)

人事已非 2024-12-22 09:06:25

When I rewrote this function in scipy.stats, I found many different definitions, some of which are included in the function.

The basic example is ranking students by score. In this case the list of scores includes all students, and percentileofscore gives the rank among all of them. The main distinction is then just how to handle ties.
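To make the tie handling concrete, here is a minimal pure-Python sketch of the four kind conventions, applied to the data from the question. The function name percentile_of_score is made up for illustration, and the 'rank' branch follows my understanding of the SciPy behaviour rather than the library source, so treat it as an approximation.

```python
def percentile_of_score(scores, x, kind="rank"):
    """Illustrative re-implementation of the four `kind` conventions."""
    n = len(scores)
    below = sum(1 for s in scores if s < x)         # strictly smaller than x
    at_or_below = sum(1 for s in scores if s <= x)  # smaller than or tied with x
    if kind == "strict":
        return 100.0 * below / n
    if kind == "weak":
        return 100.0 * at_or_below / n
    if kind == "mean":
        # average of the strict and weak results
        return 100.0 * (below + at_or_below) / (2 * n)
    if kind == "rank":
        # average rank of the tied values (assumed convention):
        # adds one extra count only when x occurs in the data
        extra = 1 if at_or_below > below else 0
        return 100.0 * (below + at_or_below + extra) / (2 * n)
    raise ValueError("kind must be 'rank', 'strict', 'weak' or 'mean'")


ar = [0.8389, 0.5176, 0.1867, 0.1953, 0.4153,
      0.6036, 0.2497, 0.5188, 0.4723, 0.3963]
x = ar[-1]

for kind in ("strict", "rank", "weak", "mean"):
    print(kind, percentile_of_score(ar, x, kind))
```

Three values lie strictly below x and four are less than or equal to it, which is where 30, 40 and 35 come from.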

Excel seems to rank a score relative to an existing scale, for example the rank of a score on the historical GRE scale. I have no idea whether Excel drops one entry if the score is not in the existing list.
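As a rough sketch of the Excel-style calculation, assume that for a value present in the list PERCENTRANK returns (number of smaller values) / (n - 1). That rule is my assumption based on how the function is commonly described, not something checked against Excel itself. The function name excel_style_percentrank is hypothetical, and the sketch deliberately does not cover scores missing from the list.

```python
def excel_style_percentrank(values, x):
    """Assumed rule: (count of values below x) / (n - 1), for x in values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    if x not in values:
        # Scores missing from the list are not handled by this sketch.
        raise ValueError("x must be one of the values")
    below = sum(1 for v in values if v < x)
    return below / (n - 1)


ar = [0.8389, 0.5176, 0.1867, 0.1953, 0.4153,
      0.6036, 0.2497, 0.5188, 0.4723, 0.3963]
x = ar[-1]

print(round(excel_style_percentrank(ar, x), 4))
```

With 3 values below x and n = 10, this gives 3/9, which agrees with the 0.3333 reported in the question.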

A similar problem in statistics is the choice of "plotting positions" for quantiles. I couldn't find a good reference on the internet. Here is one general formula: http://amsglossary.allenpress.com/glossary/search?id=plotting-position1
Wikipedia has only a short paragraph: http://en.wikipedia.org/wiki/Q-Q_plot#Plotting_positions

The literature contains many different choices of b (or even of a second parameter a), which correspond to different approximations for different distributions. Several are implemented in scipy.stats.mstats.
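To illustrate how much the choice of parameters matters, here is a small pure-Python sketch of the two-parameter plotting-position family (i - alpha) / (n + 1 - alpha - beta), where i is the 1-based rank. The names alpha and beta and the helper plotting_positions are my own choices for this example, and the sketch assumes the data has no ties.

```python
def plotting_positions(data, alpha, beta):
    """Return (rank - alpha) / (n + 1 - alpha - beta) for each element of data."""
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])  # indices in ascending order
    positions = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        positions[idx] = (rank - alpha) / (n + 1 - alpha - beta)
    return positions


ar = [0.8389, 0.5176, 0.1867, 0.1953, 0.4153,
      0.6036, 0.2497, 0.5188, 0.4723, 0.3963]

# Position of the last element (x = 0.3963, rank 4 of 10) under a few choices.
for alpha, beta in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (1.0, 1.0), (0.0, 0.0)]:
    print(alpha, beta, round(plotting_positions(ar, alpha, beta)[-1], 4))
```

For this data, (1, 0) gives 0.3 and (0, 1) gives 0.4, matching kind='strict' and kind='weak' divided by 100. (0.5, 0.5) gives 0.35, matching kind='mean', and (1, 1) gives 3/9, the value reported for Excel. So the "inconsistent" results are different members of the same family.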

I don't think it's a question of which is right. The question is: what do you want to use it for, and what is the common definition for your problem or your field?

时常饿 2024-12-22 09:06:25

This is a weird one. As near as I can tell, they are doing different calculations; SciPy will reproduce the Excel result if called this way:

In [1]: import numpy as np
In [2]: import scipy.stats as stats
In [3]: ar = np.array([0.8389, 0.5176, 0.1867, 0.1953, 0.4153, 0.6036, 0.2497, 0.5188, 0.4723, 0.3963])
In [4]: x = ar[-1]
In [5]: stats.percentileofscore(ar[:-1], x, kind='mean')
Out[5]: 33.333333333333336

Using any of the kind keywords, I get the same answer. This call leaves out the value in the data that is exactly equal to the query. Have a look at this PercentRank algorithm in VBA, as it might offer some insight.

So which is right? Excel or Scipy?
