How do I calculate these statistics?

Posted on 2024-07-04 00:44:49

I'm writing an app to help facilitate some research, and part of this involves doing some statistical calculations. Right now, the researchers are using a program called SPSS. Part of the output that they care about looks like this:

[Part of the SPSS output]

They're really only concerned about the F and Sig. values. My problem is that I have no background in statistics, and I can't figure out what these tests are called or how to calculate them.

I thought the F value might be the result of the F-test, but after following the steps given on Wikipedia, I got a result that differed from what SPSS gives.

Comments (6)

云裳 2024-07-11 00:44:49

Here's an explanation of MANOVA output, from a very good site on statistics and SPSS:

Output with explanation:
http://faculty.chass.ncsu.edu/garson/PA765/manospss.htm

How and why to do MANOVA or multivariate GLM:
(same path as above, but ending in '/manova.htm')

Writing software from scratch to calculate these outputs would be both lengthy and difficult;
there are many numerical pitfalls and matrix inversions to handle.

As Henry said, use Python scripts or R. If you go the scripting route, I'd suggest working with somebody who knows SPSS.
In addition, SPSS itself can export its output tables to files using a facility called OMS.
A script within SPSS can do this.

Find out who in your research group knows SPSS and work with them.

东京女 2024-07-11 00:44:49

Can you explain more about why SPSS itself isn't a fine solution to the problem? Is it that it generates pivot tables as output that are hard to manipulate? Is it the cost of the program?

F-statistics can arise from any number of particular tests. The F is just a distribution (loosely: a description of the "frequencies" of groups of values), like the Normal (Gaussian) or the Uniform. In general, F-statistics arise from ratios of variances. Opinion: many statisticians (myself included) find F-based tests to be unstable (jargon: non-robust).

The particular output statistics (Pillai's trace, etc.) suggest that the original analysis is a MANOVA, which, as other posters describe, is a complicated procedure that is hard to get right.
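To give a rough feel for what sits behind a name like Pillai's trace, here is a minimal pure-Python sketch. It assumes exactly two response variables and a single grouping factor; the function names and the toy data are made up for illustration. It builds the between-group matrix H and the within-group matrix E, then computes Pillai's trace as trace(H (H + E)^-1) and Wilks' lambda as det(E) / det(H + E). It stops at the raw statistics and does not reproduce the approximate F and Sig. columns that SPSS derives from them, so treat it as a teaching aid rather than a replacement for a statistics package.

```python
def mean_vector(rows):
    """Column means of a list of (y1, y2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def add_outer(matrix, d, weight=1.0):
    """Add weight * (d d^T) to a 2x2 matrix in place."""
    for i in range(2):
        for j in range(2):
            matrix[i][j] += weight * d[i] * d[j]


def sscp_matrices(groups):
    """Return (H, E): between-group and within-group SSCP matrices."""
    grand = mean_vector([r for g in groups for r in g])
    h = [[0.0, 0.0], [0.0, 0.0]]
    e = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = mean_vector(g)
        add_outer(h, [m[0] - grand[0], m[1] - grand[1]], weight=len(g))
        for r in g:
            add_outer(e, [r[0] - m[0], r[1] - m[1]])
    return h, e


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def pillai_and_wilks(groups):
    """Pillai's trace and Wilks' lambda for two responses, one factor."""
    h, e = sscp_matrices(groups)
    t = [[h[i][j] + e[i][j] for j in range(2)] for i in range(2)]
    det_t = det2(t)
    # Explicit inverse of the 2x2 total matrix T = H + E.
    t_inv = [[t[1][1] / det_t, -t[0][1] / det_t],
             [-t[1][0] / det_t, t[0][0] / det_t]]
    # trace(H T^-1) = sum over i, j of H[i][j] * T^-1[j][i]
    pillai = sum(h[i][j] * t_inv[j][i] for i in range(2) for j in range(2))
    wilks = det2(e) / det_t
    return pillai, wilks


# Toy data (hypothetical): two groups that differ only in the first response.
group_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
group_b = [(2.0, 0.0), (4.0, 0.0), (2.0, 2.0), (4.0, 2.0)]
pillai, wilks = pillai_and_wilks([group_a, group_b])
print(pillai, wilks)
```

Even this stripped-down case needs a matrix inverse; with more response variables you would want a proper linear algebra library, which is part of why people recommend not writing this by hand.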

I'm also guessing, based on the MANOVA and the use of SPSS, that this is a psychology or sociology project... if not, please enlighten me. It might be that other, simpler models would be easier to understand and more repeatable. Consult your local university statistical consulting group, if you have one.

Good luck!

梦明 2024-07-11 00:44:49

In short: don't do this by hand; link to or use existing software. And sain_grocen's answer is incorrect. :(

These are all tests for the significance of parameter estimates, typically used in multivariate-response multiple regressions. They would not be simple to implement outside of a statistical programming environment. I would suggest either getting the output from a pre-existing statistical program, or using one that you can link to and call from your code.

I'm afraid that the first answer (sain_grocen's) will lead you down the wrong path. His explanation likely covers a special case of what you are actually dealing with. The ANOVA explained in his links is for a univariate response in a balanced design. Those aren't the F statistics you are seeing. The names in your output (Pillai's Trace, Hotelling's Trace, ...) are some of the available multivariate versions. They have F distributions under certain assumptions. I can't explain a textbook's worth of material here; I would advise you to start by looking at
"Applied Multivariate Statistical Analysis" by Johnson and Wichern

老娘不死你永远是小三 2024-07-11 00:44:49

Statistics is hard :-). After a year of reading and re-reading books and papers, I can only say with confidence that I understand the very basics of it.

You might wish to investigate ready-made libraries for whichever programming language you are using, because there are many gotchas in math in general and statistics in particular (rounding errors being an obvious example).

As an example, you could take a look at the R project, which is both an interactive environment and a library that you can use from your C++ code. It is distributed under the GPL (i.e., if you are using it only internally and publishing only the results, you don't need to open your code).

贪恋 2024-07-11 00:44:49

I assume from your question that your research colleagues want to automate the process by which certain statistical analyses are performed (i.e., they want to batch process data sets). You have two options:

1) SPSS is now scriptable through Python (as of version 15): go to spss.com and search for Python. You can write Python scripts to automate data analyses and extract key values from pivot tables, and then process the answers any way you like. This has the virtue of allowing an exact comparison between the results from your Python script and the results your collaborators produce by hand in SPSS. Thus you won't really have to know any statistics to do this work (which is a key advantage).

2) You could do this in R, a free statistics environment, which can also be scripted. This has the disadvantage that you will have to learn statistics to ensure that you are doing it correctly.

尐籹人 2024-07-11 00:44:49

This website might help you out a bit more. Also this one.

I'm working from a fairly rusty memory of a statistics course, but here goes nothing:

When you're doing analysis of variance (ANOVA), you calculate the F statistic as the ratio of the mean-square variance "between the groups" to the mean-square variance "within the groups". The second link above seems pretty good for this calculation.
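The ratio described above can be sketched in a few lines of plain Python. This is only the one-way (single factor, single response) case, so it will not reproduce the multivariate F values in a MANOVA table; the function name and the toy data are made up for illustration.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group sum of squares: how far each group mean sits from the
    # grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of observations around their own
    # group mean.
    ss_within = 0.0
    for g in groups:
        m = fmean(g)
        ss_within += sum((x - m) ** 2 for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Toy data (hypothetical): three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f_stat, df1, df2)  # F is 13 (up to rounding) with 2 and 6 degrees of freedom
```

The two degrees of freedom returned here are the ones you need later to turn F into a significance value.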

This makes the F statistic a measure of how much explanatory power your model has relative to noise, because the "between the groups" variance is explanatory power and the "within the groups" variance is random error. A high F implies a highly significant model.

As in many statistical operations, you back-determine Sig. from the F statistic. Here's where your Wikipedia information comes in slightly handy. What you want to do is, using the degrees of freedom given to you by SPSS, find the P value at which an F table gives you the F statistic you calculated. The P value where this happens [F(table) = F(calculated)] is the significance. Equivalently, it is the probability that an F-distributed value with those degrees of freedom is at least as large as the F you calculated.
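Rather than reading an F table backwards, you can compute that tail probability directly. The sketch below uses only the standard library and relies on the identity P(F >= f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f), where I_x is the regularized incomplete beta function, evaluated here with the usual continued-fraction method. The function names are my own; if SciPy is available, scipy.stats.f.sf(f, df1, df2) is the more sensible choice.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_significance(f_value, df1, df2):
    """Upper-tail probability of the F distribution: P(F >= f_value)."""
    if f_value <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


sig = f_significance(13.0, 2, 6)
print(sig)  # about 0.0066
```

This only covers the last step, F plus degrees of freedom to Sig.; which F and which degrees of freedom to feed in depends on the test SPSS actually ran.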

Conceptually, a lower significance value means stronger evidence for rejecting the null hypothesis (which for these purposes means concluding that your model has explanatory power).

Sorry to any math folks if any of this is wrong. I'll be checking back to make edits!!!

Good luck to you. Stats is fun, just maybe not this part. =)
