Which programming languages are good for statistics?

Posted on 2024-08-20 10:18:44 · 190 words · 3 views · 0 comments

I'm doing a bit more statistical analysis on some things lately, and I'm curious if there are any programming languages that are particularly good for this purpose. I know about R, but I'd kind of prefer something a bit more general-purpose (or is R pretty general-purpose?).

What suggestions do you guys have? Are there any languages out there whose syntax/semantics are particularly oriented towards this? Or are there any languages that have exceptionally good libraries?

Comments (16)

木落 2024-08-27 10:18:44

No contest -- R as the main implementation of S (and one that happens to be proper Open Source and a GNU project as well).

Not only was the S language designed precisely for this purpose (see the books by John Chambers), but the rich support of domain-specific packages at CRAN is second to none: over 2000 packages with proper quality control, often authored by experts in the field.

The ACM saw it the same way when it gave the ACM Software System Award to John Chambers in 1998, with the following citation:

John M. Chambers

For The S system, which has forever
altered how people analyze, visualize,
and manipulate data.

For reference, other winners of this award include TeX, Smalltalk, PostScript, RPC, 'the web', Mosaic, Tcl/Tk, Java, Make, ... Not bad company to be in.

Now, if you 'only' want to collect and summarize some data, just about any procedural or functional language will do. But if you want something that was designed for programming with data, then R, as the main S implementation, it is.
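To illustrate the "collect and summarize" case in a general-purpose language, here is a minimal sketch in Python that uses only the standard-library `statistics` module. The `summarize` helper and the sample values are made up for illustration; they do not come from the answer above.

```python
import statistics


def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical measurements, purely illustrative.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Anything beyond this kind of summary (model fitting, specialized tests, plotting) is where a dedicated environment such as R starts to pay off.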

画中仙 2024-08-27 10:18:44

No question that R is the best language for statistics, as Dirk says. I just want to add a few points to this:

First, I think that the primary reason that you should use R is because of the community. It is used so heavily by experts in academia and industry at this stage, that no other language even comes close to rivalling the wealth on CRAN.

Second, it should be acknowledged that R the language is a joy to work with. It is my primary language, and having tried alternatives, I have no intention of abandoning it any time soon. But it doesn't have a monopoly on its strength, programming with data, and this claim can be taken too far. All the Lisps and functional languages are strong at data programming. Lisp, after all, takes its name from "list processing", and it is Lisp's influence on R that makes the language what it is.

There are members of the R community (e.g. Ross Ihaka) who actually view Lisp as the statistical language of the future (see the "back to the future" paper for a reference) because of some deep design problems in the R language (e.g. no multithreading).

So while R is undoubtedly the best language for statistical computing, I see some value in being familiar with another language like OCaml, Haskell, or (possibly) Clojure/Incanter.

鹿港巷口少年归 2024-08-27 10:18:44

Have a look at Incanter, which is based on Clojure. "Incanter is a Clojure-based, R-like platform for statistical computing and graphics." Clojure is a Lisp-based language implemented on top of the JVM, and it has easy access to Java libraries. You can't get more general-purpose than that.

不乱于心 2024-08-27 10:18:44

From my experience, R is an exceptionally powerful language in these areas:

  1. Manipulation and transformation of data.

  2. Statistical analysis.

  3. Graphics.

But R is by no means a three-trick pony. I have also applied the language to tasks that do not fit entirely into the above categories. Some examples are:

  • A script to assist in the creation of OSX universal binaries by identifying and matching static and dynamic libraries of different architectures and then running the resulting groups through lipo.

  • Scripts to scrape information from web pages.

  • A set of scripts to create georeferenced imagery, cut the images into tilesets using GDAL, form a JSON manifest that describes the output and upload the result to a website for immediate display by OpenLayers.

My favorite part of using R is how often I get to say:

WHOA! There's a package that does THAT?!

吾家有女初长成 2024-08-27 10:18:44

You can have a look at Sage, which is built on top of Python and lets you call different statistics environments (R, MATLAB, Octave, etc.) using Python syntax.

One of the major issues while writing programs to do statistics is that you may end up with having many different small scripts, each one doing a separate task, and you can end up with having messy folders and confusion in your results.

So, apart from choosing a programming language (I think other people have already answered your question), you also need a way to define pipelines of scripts: you can do it with GNU Make (e.g. read this), with Sage, or with other solutions.
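The pipeline idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `run_pipeline` helper and the three step names are hypothetical, and the steps are in-memory functions rather than separate script files. Unlike Make, which compares file timestamps, this sketch simply remembers results within one run.

```python
def run_pipeline(steps, deps, target):
    """Run `target` after everything it depends on, each step at most once.

    steps: maps a step name to a function that takes the results of its dependencies
    deps:  maps a step name to the list of step names it depends on
    """
    results = {}

    def build(name):
        if name in results:  # already computed during this run, so reuse it
            return results[name]
        inputs = [build(dep) for dep in deps.get(name, [])]
        results[name] = steps[name](*inputs)
        return results[name]

    build(target)
    return results


# Hypothetical three-stage analysis: load -> clean -> summarize.
steps = {
    "load": lambda: [3.0, None, 5.0, 4.0, None],
    "clean": lambda rows: [x for x in rows if x is not None],
    "summarize": lambda rows: sum(rows) / len(rows),
}
deps = {"clean": ["load"], "summarize": ["clean"]}

results = run_pipeline(steps, deps, "summarize")
print(results["summarize"])
```

The sketch does not detect circular dependencies and does not persist anything between runs; a real setup would more likely keep each stage in its own script and let a tool such as Make decide what needs rebuilding.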

只有影子陪我不离不弃 2024-08-27 10:18:44

I would say R, as most of the statistics courses at my university use R and most of my friends who have taken such courses are quite content with its range and reach.

I have even tried MATLAB and found it pretty handy.

cheers

挽容 2024-08-27 10:18:44

R is great if all you're doing is statistics. It's got a nice interactive interface and visualization tools. However, it's pretty hard to use as a general purpose language because its syntax and semantics are very highly optimized for doing statistics. If you want a more general-purpose language, Python with SciPy would be a decent choice, though I've used it and found the statistical routines in it to be somewhat immature. They often are inefficient or fail in corner cases.
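As a generic illustration of what a statistical "corner case" can look like, the sketch below compares the textbook one-pass variance formula with Welford's online algorithm on values that have a large common offset. It says nothing about how SciPy or any other library is implemented; the function names and the data are chosen here purely for illustration.

```python
def naive_variance(xs):
    """Textbook one-pass formula; prone to catastrophic cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Welford's online algorithm for the sample variance."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Illustrative data: small spread around a very large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

On well-scaled data the two functions agree closely. With a large offset, the naive formula subtracts two nearly equal, very large numbers and can lose most of its precision, which is the kind of failure a mature statistics library is expected to guard against.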

If you're doing data mining on large datasets, making performance important, and/or you don't mind using alpha-ish tools, the D programming language and the dstats library can be pretty good. D is about as general-purpose a language as you get, but IMHO dstats is very easy to use because template metaprogramming makes it easy to design a nice API even in a statically compiled, close-to-the-metal language. (Full disclosure: I wrote most of dstats, so of course I think it's good.)

贪恋 2024-08-27 10:18:44

The pystats library (for Python) is well-suited for statistical analysis.

你如我软肋 2024-08-27 10:18:44

Have you considered using something like MATLAB? It has many advanced capabilities for data analysis, and you can do some programming in the environment.

甜嗑 2024-08-27 10:18:44

What about Stata? I have a friend who is a PhD Economics student and he raves about Stata all the time. And I have a personal affinity for Mathematica.

眼泪都笑了 2024-08-27 10:18:44

Matlab is good at statistics too. It's not exactly free, though.

Octave is a free clone that might also do what you need.

烟酉 2024-08-27 10:18:44

A friend of mine who focuses on market statistics uses SAS. I don't know much about it. It doesn't seem like a "real" language, but it might be worth checking out.

I'm all for Python with R bindings.

清风不识月 2024-08-27 10:18:44

Have a look into the RooFit package for ROOT. It is used by e.g. particle physicists for data analysis.

ROOT is a C++ framework that also comes with Python and Ruby bindings. It also includes a limited interactive C++ interpreter.

此刻的回忆 2024-08-27 10:18:44

I'd also like to +1 for R. It might not be as easy to handle as Stata or even SPSS, in particular for non-programmers, though I guess the average Stack Overflow user is far more of a programmer than I am.

That being said, I'd like to give a short overview, because I have seen a couple of statistical packages from a user's (an economist's) point of view.

Stata is still the choice of the majority of economists, and indeed it has some pluses. Stata's GUI helps you stay in charge of a load of options and statistical functions. Besides, Stata appears to be the only package with a mailing list that comes at least somewhat near the benchmark: the one-of-a-kind R mailing list. You can still write sophisticated .do files or download some from the web.
Stata might not be as close to a programming language as R, but it still offers a nice programming language for statistical purposes. Depending on the size of your datasets, you should check which license you need.

You could also use SPSS, which is even more of a GUI tool than Stata and is a little less comprehensive, for example for econometric work such as Tobit models or panel regressions, and particularly discrete choice models.

There's also EViews. Unfortunately I have forgotten most of it and only used it for a couple of easy regressions during my studies, so I just name it here. The same goes for GAUSS, which appears more mathematical than the rest of the pack. Recently I have heard about Octave, which is also more mathematical.

For my personal usage, R is head and shoulders above anything else. Occasionally I pair it with Python or connect it to MySQL or PostgreSQL databases, which also works well. R really helps you learn statistics, because you need to understand more in order to get something done than you would by clicking your way through the likes of SPSS. If you need a GUI, though, you could try RKWard, or consider installing Komodo / SciViews-R or Tinn-R on Windows. The latter aren't GUIs but editors that more or less support code highlighting and code suggestions, which also helps to get things done.
Farnsworth's "Econometrics in R" is a good read. Ah, and I can't forget to mention plotting: the ggplot2 package from Hadley Wickham is just off the hook. It is the best way to create graphics as long as you do not need them to be interactive. At the end of the day, R is really the most flexible package: you can even install it on a web server and build a nice web interface; the sky is the limit.

森罗 2024-08-27 10:18:44

APL is apparently one of the best languages around for statistics work. It is not general-purpose, though...

It does require a special keyboard and font as it does not use ascii.

See Conway's Game of Life in one line of APL for a bit of an overview of what can be done with it.

牵你手 2024-08-27 10:18:44

As a student of statistics, I have found R to be the best of all the statistics-related software I have used. It seems able to do just about anything with statistics.
