Packaging supporting R code in a Python module?
I am trying to package some of my Python code that calls R code using rpy2. That R code currently sits in a separate file which I source from the Python script. For example, if the Python script is myscript.py, then the R code is stored in myscript_support.R, and I have something like the following in myscript.py:
import os
from rpy2.robjects import r
# Load the R code from the file that sits next to this script
r.source(os.path.join(os.path.dirname(__file__), "myscript_support.R"))
# Look up the R function by name and call it
r["myscript_R_function"]()
I now want to package this Python script using setuptools, and I have a few questions:
1) How should I package the R support code, and once I have done so, how do I find the path to the R file so I can source it?
2) The R code depends on several R packages. How can I ensure that these are installed? Should I just raise an informative error if these R packages cannot be loaded?
This question might be dated, but I ran into the same issue today and wanted to provide more detail for the question 1 solution suggested by @ivan_pozdeev and a new solution for question 2.
1) Edit your setup.py file to:
2) Conda is quickly becoming a good option for handling package dependencies across both Python and R. You can create an environment (http://conda.pydata.org/docs/using/envs), install all the R and Python packages you need, and then generate an environment.yml file so that anyone can replicate your environment. See this blog post for more information: https://www.continuum.io/content/conda-data-science
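The setup.py snippet that point 1) refers to did not survive on this page. As a rough sketch of what such a file might contain, assuming the code lives in a package directory named myscript with myscript_support.R inside it (the project name, version and layout are placeholders, not the answerer's original code):

```python
# setup.py (sketch): the project name, version and layout are assumptions.

# The "*.R" mask is resolved relative to the myscript/ package directory,
# so myscript/myscript_support.R is copied into the installed package.
SETUP_KWARGS = {
    "name": "myscript",
    "version": "0.1.0",
    "packages": ["myscript"],
    "package_data": {"myscript": ["*.R"]},
    "install_requires": ["rpy2"],
    # Install as a plain directory rather than a zipped egg, so the
    # .R file has a real filesystem path that R can source.
    "zip_safe": False,
}


def run_setup():
    """Hand the keyword arguments above to setuptools."""
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

A real setup.py would end with a call to run_setup(); it is left out here so that the snippet can be imported without triggering a build.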
Well, imagine yourself as the setuptools packager and think of what you would expect the programmer to do.
For the first problem, you have two choices:
The first option is implementable by passing include_package_data = True to setup() and providing masks of files to include in package_data (setuptools docs, "Including Data Files" section). Paths relative to packages' directories can be used. The files will be accessible at run time at the same relative paths through the "Resource Management API" ("Accessing Data Files at Runtime" section).
The second option would require you to add your code to setuptools before invoking setup(). For example, you may add a file finder to add relevant .R files to the results of find_packages(). Or just generate the list of files for the previous paragraph by arbitrary means.
For the second problem, the easiest way is to force setuptools to install the package as a directory rather than an .egg by specifying zip_safe = False. You might use the eager_resources option instead, which extracts a group of resources on demand ("Automatic Resource Extraction" section).
As for installing third-party R packages, an automatable technique is described at R Installation and Administration - Installing packages.
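The run-time lookup can be sketched as follows. The answer points at the setuptools "Resource Management API"; this sketch instead uses the standard-library importlib.resources module, which covers the same use case on Python 3.9 and later. The function name r_support_path and the default file name are illustrative assumptions.

```python
from importlib import resources


def r_support_path(package, filename="myscript_support.R"):
    """Return a context manager that yields a real path to a packaged file.

    resources.as_file() extracts the file to a temporary location if the
    package is imported from a zip archive, so R always gets a path it
    can read.
    """
    resource = resources.files(package).joinpath(filename)
    return resources.as_file(resource)
```

Inside the package this might be used as "with r_support_path(__package__) as path: r.source(str(path))", keeping the call to R inside the with block so that any temporary copy still exists while R reads it.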
For the source files to be installed, you need to specify them in some way in package_data. You can find their path in the exact same way as you do now.
Either make setup.py check if they exist (kind of "configtools approach") or just raise some kind of exception once you cannot load them. Or maybe do both of them, and then if for some reason the files you depend on disappear, at least you will know it.
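A minimal sketch of the "raise an exception when loading fails" approach, extended to the R package dependencies from the question. RSupportError, find_r_script and require_r_packages are hypothetical names. The default checker assumes that rpy2.robjects.packages provides an isinstalled() function, which is worth confirming against the rpy2 version in use.

```python
import os


class RSupportError(RuntimeError):
    """Raised when the R side of the package is not usable."""


def find_r_script(module_file, script_name="myscript_support.R"):
    """Return the path of script_name next to module_file, or raise."""
    here = os.path.dirname(os.path.abspath(module_file))
    path = os.path.join(here, script_name)
    if not os.path.isfile(path):
        raise RSupportError(
            f"R support file not found at {path}; "
            "was it listed in package_data?"
        )
    return path


def _rpy2_isinstalled(name):
    # Imported lazily so this module can be imported without R present.
    from rpy2.robjects.packages import isinstalled

    return isinstalled(name)


def require_r_packages(names, is_installed=_rpy2_isinstalled):
    """Raise RSupportError naming every R package that is not installed."""
    missing = [name for name in names if not is_installed(name)]
    if missing:
        quoted = ", ".join(f'"{name}"' for name in missing)
        raise RSupportError(
            "Missing R packages: " + ", ".join(missing)
            + f". In R, try: install.packages(c({quoted}))"
        )
```

Calling find_r_script(__file__) and require_r_packages([...]) once at import time gives the user one clear message instead of an R traceback from deep inside a later call. The is_installed parameter exists only so the check can be exercised without an R installation.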