Difference between R and SPSS
I will shortly be analysing a vast amount of network-traffic data, and will pre-process it before analysis. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. Therefore, I was wondering what the basic differences between these two packages are.
I am not asking which one is better; I just want to know how the two differ in terms of workflow (besides the fact that SPSS has a GUI). I will mostly be working with scripts in either case, so I want to know about the other differences.
Comments (12)
Here is something that I posted to the R-help mailing list a while back; I think it gives a good high-level overview of the general difference between R and SPSS:
There are GUIs for R that make it a bit easier to use, but they also limit the functionality that can be used that easily. SPSS does have scripting, which takes it beyond being a mere bus, but the general philosophy of SPSS steers people towards the GUI rather than the scripts.
I work at a company that uses SPSS for the majority of our data analysis, and for a variety of reasons I have started trying to use R for more and more of my own analysis. Some of the biggest differences I have run into include:
LaTeX, or using odfWeave, or LyX, or something of that nature.
Others have pointed out some of the big differences in terms of cost and functionality of the programs. If you have to collaborate with others, their comfort level with SPSS or R should be a factor, as you don't want to be the only one in your group who can later work on or edit a script that you wrote.
If you are going to be learning R, this post on the Stats Stack Exchange site has a bunch of great resources: https://stats.stackexchange.com/questions/138/resources-for-learning-r
The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.
R has a single language for 'scripting', but don't think of it like that: R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax' and 'Scripts', and is also scriptable in Python.
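To illustrate the R half of that point, here is a minimal sketch that does all three things (manipulation, a model, a plot) in one short script using only base R. The data are simulated and the column names `hour` and `bytes` are made up for the example; real traffic data would need its own pre-processing.

```r
# Hypothetical example data: 24 hours x 5 simulated byte counts per hour.
set.seed(42)
hour <- rep(0:23, each = 5)
traffic <- data.frame(
  hour  = hour,
  bytes = 1000 + 50 * hour + rnorm(length(hour), sd = 100)
)

# Data manipulation: mean bytes per hour.
hourly <- aggregate(bytes ~ hour, data = traffic, FUN = mean)

# Statistics: a simple linear trend over the day.
fit <- lm(bytes ~ hour, data = hourly)

# Graphics: write the plot to a PDF in the session's temp directory.
plot_file <- file.path(tempdir(), "hourly_traffic.pdf")
pdf(plot_file)
plot(hourly$hour, hourly$bytes, type = "b",
     xlab = "Hour of day", ylab = "Mean bytes")
abline(fit, lty = 2)  # overlay the fitted trend line
dev.off()
```

Because everything is one script, changing the input data and re-running regenerates the table, the model, and the chart together.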
Another biggie is that SPSS squeezes its data into a spreadsheety table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network graph type data in SPSS, but there's a package to do it for R.
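As a rough illustration of non-tabular data in R, the sketch below turns a small edge list into an adjacency list using only base R. The answer above does not name the network package it has in mind, so none is used here; the IP addresses are invented for the example.

```r
# Hypothetical edge list: who sent traffic to whom.
edges <- data.frame(
  src = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"),
  dst = c("10.0.0.2", "10.0.0.3", "10.0.0.3", "10.0.0.1"),
  stringsAsFactors = FALSE
)

# Adjacency list: a named list with one character vector of
# destinations per source host. This is not a rectangular table.
adjacency <- split(edges$dst, edges$src)

# Out-degree per host is just the length of each list element.
# Hosts that never appear as a source are not listed.
out_degree <- lengths(adjacency)
```

A dedicated graph package would add layout and graph algorithms on top, but the point is that ragged structures like `adjacency` are ordinary objects in R.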
Also, with R you can integrate your workflow with your reporting by using Sweave: you write a document with embedded bits of R code that generate plots or tables, run the file through the system, and out comes the report as a PDF. Great for when you want to do a weekly report, or you do a body of work and then the boss gives you an updated data set. Re-run, read it over, it's done.
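A minimal sketch of that Sweave round trip, driven from R itself: it writes a tiny `.Rnw` document to a temporary directory and weaves it into a `.tex` file with `utils::Sweave()`. The file name `weekly_report` and the packet sizes are invented for the example, and the final PDF step is left as a comment because it assumes a LaTeX installation.

```r
# A tiny Sweave document: LaTeX text with one embedded R chunk.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Mean packet size this week:",
  "<<echo=FALSE>>=",
  "sizes <- c(60, 1500, 576, 1500)  # hypothetical data",
  "mean(sizes)",
  "@",
  "\\end{document}"
)
rnw_file <- file.path(tempdir(), "weekly_report.Rnw")
writeLines(rnw, rnw_file)

# Sweave writes its output into the working directory,
# so switch to the temp directory for the duration of the call.
old_wd <- setwd(tempdir())
utils::Sweave(rnw_file)
setwd(old_wd)

tex_file <- file.path(tempdir(), "weekly_report.tex")

# With LaTeX installed, the PDF could then be built with, e.g.:
# tools::texi2pdf(tex_file)
```

When the data change, only the chunk's input changes; re-running the same two steps regenerates the report.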
But you know, your call...
Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.
There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?
There's an interesting (and reasonably fair) comparison between a number of stats tools here:
http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
I work with both in a company and can say the following:
That said, I find R better in almost every other sense:
It is often overlooked, but R also has plenty of features for collaboration between teams (GitHub integration with RStudio, and easy package building with devtools).
Actually, if everyone in your organization knows R, all you need is to maintain a basic package on GitHub to share everything. This of course is not the norm, which is why I think SPSS, although a worse product, still has a market.
I have no data for it, but from my experience I can tell you one thing:
SPSS is a lot slower than R. (And by a lot, I really mean a lot.)
The magnitude of the difference is probably as big as the one between C++ and R.
For example, I never have to wait longer than a couple of seconds in R. Using SPSS and similar data, I had calculations that took longer than 10 minutes.
As an unrelated side note: In my eyes, in the recent discussion on the speed of R, this point was somehow overlooked (i.e., the comparison with SPSS). Furthermore, I am astonished how this discussion popped up for a while and silently disappeared again.
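Since the speed claim above comes without data, one way to get at least the R side of such a comparison is to time the task directly with `system.time()`. The sketch below uses simulated data and an invented per-host aggregation as the task; it says nothing about SPSS, which would have to be timed separately on the same data.

```r
# Hypothetical workload: one million simulated records across ten hosts.
set.seed(1)
n <- 1e6
traffic <- data.frame(
  host  = sample(letters[1:10], n, replace = TRUE),
  bytes = rpois(n, lambda = 500)
)

# Time a per-host mean; the assignment inside system.time()
# leaves per_host available afterwards.
timing <- system.time(
  per_host <- tapply(traffic$bytes, traffic$host, mean)
)

# Wall-clock seconds for the aggregation on this machine.
timing[["elapsed"]]
```

The number depends heavily on hardware and on how the task is written, so it is only meaningful next to a measurement of the equivalent SPSS job.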
There are some great responses above, but I will try to provide my 2 cents. My department completely relies on SPSS for our work, but in recent months, I have been making a conscious effort to learn R; in part, for some of the reasons itemized above (speed, vast data structures, available packages, etc.)
That said, here are a few things I have picked up along the way:
Unless you have some experience programming, I think creating summary tables in CTABLES destroys any available option in R. To date, I am unaware of a package that can replicate what can be created using Custom Tables.
SPSS does appear to be slower when scripting, and yes, SPSS syntax is terrible. That said, I have found that scripts in SPSS can always be improved by using the EXECUTE command sparingly.
SPSS and R can interface with each other, although it appears that it's one way (only when using R inside of SPSS, not the other way around). That said, I have found this to be of little use other than if I want to use ggplot2 or for some other advanced data management techniques. (I despise SPSS macros).
I have long felt that "reporting" work created in SPSS is far inferior to other solutions. As mentioned above, if you can leverage LaTeX and Sweave, you will be very happy with your efficient workflows.
I have been able to do some advanced analysis by leveraging OMS in SPSS. Almost everything can be routed to a new dataset, but I have found that most SPSS users don't use this functionality. Also, when looking at examples in R, it just feels "easier" than using OMS.
In short, I find myself using SPSS when I can't figure it out quickly in R, but I sincerely have every intention of getting away from SPSS and using R entirely at some point in the near future.
SPSS provides a GUI to easily integrate existing R programs or develop new ones. For more info, see the SPSS Community on IBM developerWorks.
@Henrik, I did the same task you mentioned (C++ and R) in SPSS, and it turned out that SPSS is faster than R on this one. In my case SPSS is approx. 7 times faster, which surprised me.
Here is the code I used in SPSS.
Check out this video on why it is good to combine SPSS and R:
http://bluemixanalytics.wordpress.com/2014/08/29/7-good-reasons-to-combine-ibm-spss-analytics-and-r/
If you have a compatible copy of R installed, you can connect to it from IBM SPSS Modeler and carry out model building and model scoring using custom R algorithms that can be deployed in IBM SPSS Modeler. You must also have a copy of IBM SPSS Modeler - Essentials for R installed. IBM SPSS Modeler - Essentials for R provides you with tools you need to start developing custom R applications for use with IBM SPSS Modeler.
The truth is: both packages are useful if you do data analysis professionally. Sure, R / RStudio has more statistical methods implemented than SPSS. But SPSS is much easier to use and gives more information per button click. Therefore, whenever a particular analysis is implemented in both R and SPSS, it is faster to run it in SPSS.
In the modern age, neither CPU nor memory is the most valuable resource. The researcher's time is. Also, tables in SPSS are more visually pleasing, in my opinion.
In summary, R and SPSS complement each other well.