
How do Rpy2, pyrserve and PypeR compare?

Posted on 2024-11-01 04:33:09

I would like to access R from within a Python program. I am aware of Rpy2, pyrserve and PypeR.

What are the advantages and disadvantages of these three choices?


北渚 2024-11-08 04:33:09

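I know one of the three better than the others, but in the order given in the question:

rpy2:

C-level interface between Python and R (R runs as an embedded process)
R objects are exposed to Python without the need to copy the data over
Conversely, Python's numpy arrays can be exposed to R without making a copy
Low-level interface (close to the R C-API) and high-level interface (for convenience)
In-place modification of vectors and arrays is possible
R callback functions can be implemented in Python
Possible to have anonymous R objects with a Python label
Python pickling is possible
Full customization of R's behavior through its console (so a full R GUI can be implemented)
Limited support on MS Windows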
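
To make a few of the rpy2 points above concrete (embedded R, in-place modification, calling R functions from Python), here is a minimal sketch. It assumes rpy2 and a working R installation are available; the helper name `sum_after_update` is made up for illustration, and the import is guarded so the snippet still loads when rpy2 is missing.

```python
# Assumption: rpy2 (third-party) and R are installed; otherwise `ro` stays None.
try:
    import rpy2.robjects as ro
except Exception:  # rpy2 missing, or R could not be located/initialised
    ro = None


def sum_after_update(values, index, new_value):
    """Build an R numeric vector, change one element in place, and sum it in R."""
    if ro is None:
        raise RuntimeError("rpy2 with a working R installation is required")
    vec = ro.FloatVector(values)   # R vector referenced by a Python proxy object
    vec[index] = new_value         # item assignment acts on the R vector itself
    r_sum = ro.r["sum"]            # look up R's sum() in the embedded R
    return r_sum(vec)[0]           # R returns a length-1 vector; take its element


if ro is not None:
    print(sum_after_update([1.0, 2.0, 3.0], 0, 10.0))
```

Everything here runs inside the Python process itself, which is what makes the interaction fast and also what ties rpy2 to a specific R installation.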

pyrserve:

Native Python code (will/should/may work with CPython, Jython, IronPython)
Uses R's Rserve
Advantages and inconveniences linked to remote computation and to Rserve
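
A rough sketch of the client/server style described above, assuming the pyRserve package is installed and an Rserve daemon is already listening (6311 is Rserve's default port). The helper name `eval_on_rserve` is hypothetical, and the import is done lazily so the definition loads even without pyRserve.

```python
def eval_on_rserve(expression, host="localhost", port=6311):
    """Evaluate an R expression on a running Rserve instance and return the result."""
    import pyRserve  # third-party; imported lazily so this module loads without it

    conn = pyRserve.connect(host=host, port=port)
    try:
        return conn.eval(expression)   # the expression is evaluated on the server
    finally:
        conn.close()                   # always release the connection
```

With a server running (for example one started with `R CMD Rserve`), a call such as `eval_on_rserve("mean(c(1, 2, 3))")` would send the expression over the network and bring the value back, so R need not live in the same process or even on the same machine.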

pyper:

Native Python code (will/should/may work with CPython, Jython, IronPython)
Uses pipes to have Python communicate with R (with the advantages and inconveniences linked to that)
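
The pipe approach can be illustrated with the standard library alone. This is not PypeR's API, only a sketch of the underlying idea: start a separate interpreter process, write a script to its standard input, and read the result from its standard output. The function name `run_through_pipes` is made up, and a Python child process stands in for R so the snippet runs anywhere.

```python
import subprocess
import sys


def run_through_pipes(command, script, timeout=60):
    """Start `command`, write `script` to its stdin, and return its stdout text."""
    completed = subprocess.run(
        command,
        input=script,          # sent to the child over a pipe
        capture_output=True,   # stdout/stderr come back over pipes too
        text=True,
        timeout=timeout,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


# Stand-in child so the sketch runs without R: a Python process echoing its input.
echo_child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
print(run_through_pipes(echo_child, "hello from the parent\n"), end="")
```

Assuming R is installed and on the PATH, the command could instead be something like `["R", "--vanilla", "--slave"]` with an R script as the input text. Nothing has to be compiled against R, but every value crosses the process boundary as text.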

Edit: Windows support for rpy2


情痴 2024-11-08 04:33:09

From the paper in the Journal of Statistical Software on PypeR:

RPy presents a simple and efficient way of accessing R from Python. It is robust and very convenient for frequent interaction operations between Python and R. This package allows Python programs to pass Python objects of basic data types to R functions and return the results in Python objects. Such features make it an attractive solution for the cases in which Python and R interact frequently. However, there are still limitations of this package as listed below.
Performance:
RPy may not behave very well for large-size data sets or for computation-intensive duties. A lot of time and memory are inevitably consumed in producing the Python copy of the R data because in every round of a conversation RPy converts the returned value of an R expression into a Python object of basic types or NumPy array. RPy2, a recently developed branch of RPy, uses Python objects to refer to R objects instead of copying them back into Python objects. This strategy avoids frequent data conversions and improves speed. However, memory consumption remains a problem. [...]
When we were implementing WebArray (Xia et al. 2005), an online platform for microarray data analysis, a job consumed roughly one quarter more computational time if running R through RPy instead of through R's command-line user interface. Therefore, we decided to run R in Python through pipes in subsequent developments, e.g., WebArrayDB (Xia et al. 2009), which retained the same performance as achieved when running R independently. We do not know the exact reason for such a difference in performance, but we noticed that RPy directly uses the shared library of R to run R scripts. In contrast, running R through pipes means running the R interpreter directly.
Memory:
R has been denounced for its uneconomical use of memory. The memory used by large-size R objects is rarely released after these objects are deleted. Sometimes the only way to release memory from R is to quit R. RPy module wraps R in a Python object. However, the R library will stay in memory even if the Python object is deleted. In other words, memory used by R cannot be released until the host Python script is terminated.
Portability:
As a module with extensions written in C, the RPy source package has to be compiled with a specific R version on POSIX (Portable Operating System Interface for Unix) systems, and the R must be compiled with the shared library enabled. Also, the binary distributions for Windows are bound to specific combinations of different versions of Python/R, so it is quite frequent that a user has difficulty in finding a distribution that fits the user's software environment.


雨轻弹 2024-11-08 04:33:09

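From a developer's perspective, we used to use rpy/rpy2 to provide statistical and plotting functions to our Python-based application. It caused tremendous problems in delivering our application, because rpy/rpy2 needs to be compiled for specific combinations of Python and R, which made it infeasible for us to provide binary distributions that work out of the box unless we bundled R as well. Because rpy/rpy2 are not particularly easy to install, we ended up replacing the relevant parts with native Python modules such as matplotlib. If we had to use R, we would have switched to pyrserve, because we could start an R server locally and connect to it without worrying about the version of R.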
