How do Rpy2, pyrserve and PypeR compare?

Posted 2024-11-01 04:33:09

I would like to access R from within a Python program. I am aware of Rpy2, pyrserve and PypeR.

What are the advantages or disadvantages of these three options?

Comments (4)

北渚 2024-11-08 04:33:09

I know one of the three better than the others, but in the order given in the question:

rpy2:

  • C-level interface between Python and R (R running as an embedded process)
  • R objects exposed to Python without the need to copy the data over
  • Conversely, Python's numpy arrays can be exposed to R without making a copy
  • Low-level interface (close to the R C-API) and high-level interface (for convenience)
  • In-place modification for vectors and arrays possible
  • R callback functions can be implemented in Python
  • Possible to have anonymous R objects with a Python label
  • Python pickling possible
  • Full customization of R's behavior with its console (so possible to implement a full R GUI)
  • Limited support on MS Windows
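
The high-level interface mentioned in the list can be sketched as follows. This is a minimal illustration, assuming rpy2 and a matching R installation are available; the helper name `r_summary` is hypothetical and is not part of rpy2.

```python
def r_summary(values):
    """Return (mean, sd) of `values`, computed by the R embedded in this process.

    Hypothetical helper for illustration only. rpy2 is imported lazily so that
    this module can still be loaded on a machine without rpy2 or R.
    """
    import rpy2.robjects as ro  # third-party; starts the embedded R on first import

    vec = ro.FloatVector(values)   # build an R numeric vector from a Python sequence
    r_mean = ro.r["mean"]          # look up R functions by name
    r_sd = ro.r["sd"]
    # R returns length-1 vectors; index 0 extracts the Python float
    return r_mean(vec)[0], r_sd(vec)[0]
```

Calling `r_summary([1.0, 2.0, 3.0])` runs `mean` and `sd` inside the embedded R and hands back plain Python floats, without starting a separate R process.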

pyrserve:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • uses R's Rserve
  • advantages and inconveniences linked to remote computation and to Rserve
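
A rough sketch of the client/server model, assuming the pyRserve package (the usual spelling of the project name) is installed and that an Rserve instance is already listening. The helper name `remote_r_eval` is hypothetical, and the host and port defaults are assumptions (6311 is the port Rserve normally listens on).

```python
def remote_r_eval(expression, host="localhost", port=6311):
    """Evaluate an R expression on a running Rserve instance and return the result.

    Hypothetical helper for illustration only. pyRserve is imported lazily so
    that this module loads even where the package is not installed.
    """
    import pyRserve  # third-party, pure-Python Rserve client

    conn = pyRserve.connect(host=host, port=port)  # open a TCP connection to Rserve
    try:
        return conn.eval(expression)               # run the expression in the remote R
    finally:
        conn.close()                               # always release the connection
```

Because R runs in a separate server process, possibly on another machine, the Python side needs no compiled extension, but every value crosses a network connection.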

pyper:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • use of pipes to have Python communicate with R (with the advantages and inconveniences linked to it)
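
The pipe mechanism itself can be illustrated with the standard library alone. This is not PypeR's API: PypeR keeps one R process alive and talks to it repeatedly, whereas this simplified sketch starts the interpreter once per script. The function name `run_script_via_pipe` is hypothetical.

```python
import subprocess


def run_script_via_pipe(command, script, timeout=60):
    """Start `command`, write `script` to its stdin, and return what it printed.

    Illustrative one-shot helper. For R, `command` could be something like
    ["R", "--vanilla", "--no-echo"] (assumed flags; older R versions spell the
    last one --slave).
    """
    completed = subprocess.run(
        command,
        input=script,          # sent to the child through a pipe on stdin
        capture_output=True,   # stdout and stderr come back through pipes
        text=True,             # exchange str instead of bytes
        timeout=timeout,
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Everything exchanged this way is text, so data must be serialized on one side and parsed on the other. In return, no compiled extension is needed and the R process is fully independent of the Python one.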

edit: Windows support for rpy2

情痴 2024-11-08 04:33:09

From the paper in the Journal of Statistical Software on PypeR:

RPy presents a simple and efficient way of accessing R from Python. It is robust and very convenient for frequent interaction operations between Python and R. This package allows Python programs to pass Python objects of basic data types to R functions and return the results in Python objects. Such features make it an attractive solution for the cases in which Python and R interact frequently. However, there are still limitations of this package as listed below.

Performance:
RPy may not behave very well for large-size data sets or for computation-intensive duties. A lot of time and memory are inevitably consumed in producing the Python copy of the R data because in every round of a conversation RPy converts the returned value of an R expression into a Python object of basic types or NumPy array. RPy2, a recently developed branch of RPy, uses Python objects to refer to R objects instead of copying them back into Python objects. This strategy avoids frequent data conversions and improves speed. However, memory consumption remains a problem. [...]
When we were implementing WebArray (Xia et al. 2005), an online platform for microarray data analysis, a job consumed roughly one quarter more computational time if running R through RPy instead of through R's command-line user interface. Therefore, we decided to run R in Python through pipes in subsequent developments, e.g., WebArrayDB (Xia et al. 2009), which retained the same performance as achieved when running R independently. We do not know the exact reason for such a difference in performance, but we noticed that RPy directly uses the shared library of R to run R scripts. In contrast, running R through pipes means running the R interpreter directly.

Memory:
R has been denounced for its uneconomical use of memory. The memory used by large-size R objects is rarely released after these objects are deleted. Sometimes the only way to release memory from R is to quit R. RPy module wraps R in a Python object. However, the R library will stay in memory even if the Python object is deleted. In other words, memory used by R cannot be released until the host Python script is terminated.

Portability:
As a module with extensions written in C, the RPy source package has to be compiled with a specific R version on POSIX (Portable Operating System Interface for Unix) systems, and the R must be compiled with the shared library enabled. Also, the binary distributions for Windows are bound to specific combinations of different versions of Python/R, so it is quite frequent that a user has difficulty in finding a distribution that fits the user's software environment.

雨轻弹 2024-11-08 04:33:09

From a developer's perspective, we used to use rpy/rpy2 to provide statistical and plotting functions in our Python-based application. This caused huge problems when shipping the application, because rpy/rpy2 needs to be compiled for specific combinations of Python and R, which made it infeasible for us to provide binary distributions that work out of the box unless we bundled R as well. Because rpy/rpy2 is not particularly easy to install, we ended up replacing the relevant parts with native Python modules such as matplotlib. If we had needed to keep using R, we would have switched to pyrserve, because we could start an R server locally and connect to it without worrying about the version of R.

剩余の解释 2024-11-08 04:33:09

In PypeR, I can't pass a large matrix from Python to the R instance with assign(). However, I don't have this issue with rpy2.
This is just my experience.
