Incorporating bash scripts into an R package?
Background
I am writing an R package to support reproducible research. At this point, the workflow is mostly held together by bash scripts, and I can run an analysis by sending a single command like ./runscript.sh. I use bash for the following:
- file manipulation: tar, rsync, 'rename'
- running bash files locally and via ssh
- running R scripts using R --vanilla that in turn call R functions
- finding and replacing text within files using sed
- submitting jobs via qsub
It seems to me that it would be much more efficient (cleaner and easier) to execute the entire workflow from an R function or R script. I am partial to R since I am more familiar with it and mostly work within emacs ESS.
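Base R already has close equivalents for the file-manipulation and sed items in the list above. readLines(), gsub() and writeLines() can do an in-place find-and-replace, and the vectorised file.rename() can do regex-based renaming. The sketch below uses two hypothetical helper names, replace_in_files() and rename_with_regex(), which do not come from any existing package. gsub() uses extended regular expressions by default (PCRE with perl = TRUE), and backreferences are written as "\\1" inside R strings. Patterns copied from sed's basic regex syntax may therefore need adjusting.

```r
# Hypothetical stand-in for `sed -i 's/pattern/replacement/g' file1 file2 ...`
replace_in_files <- function(files, pattern, replacement, fixed = FALSE) {
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    new_lines <- gsub(pattern, replacement, lines, fixed = fixed)
    # Rewrite only the files whose contents actually changed
    if (!identical(lines, new_lines)) writeLines(new_lines, f)
  }
  invisible(files)
}

# Hypothetical stand-in for a Perl-style `rename 's/\.txt$/.csv/' *.txt`
rename_with_regex <- function(files, pattern, replacement) {
  new_names <- sub(pattern, replacement, files)
  ok <- file.rename(files, new_names)  # vectorised over from/to
  if (!all(ok)) warning("some files could not be renamed")
  invisible(new_names)
}

# Example usage (directory and file names are placeholders):
# replace_in_files(list.files("config", pattern = "\\.txt$", full.names = TRUE),
#                  "n_iter = [0-9]+", "n_iter = 5000")
# rename_with_regex(list.files(pattern = "\\.txt$"), "\\.txt$", ".csv")
```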
Questions
- Would it be worthwhile to encapsulate all of these uses of bash within R using the system and files functions?
- Are there other R packages that I have not yet found that would be helpful for doing this?
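Base R cannot replace the external tools (rsync, ssh, qsub), but it can call them through system2(). A thin wrapper that captures their output and stops on a non-zero exit status behaves roughly like set -e in a bash script. run_cmd() is a hypothetical name, and the host, paths and job script in the commented calls are placeholders. system2() does not quote its args, so any path that may contain spaces should go through shQuote().

```r
# Hypothetical helper: run an external program (rsync, ssh, qsub, ...) from R
# and stop with an error if it exits with a non-zero status.
run_cmd <- function(command, args = character()) {
  # stdout and stderr are captured together as a character vector; a non-zero
  # exit status is reported by system2() in the "status" attribute.
  out <- suppressWarnings(
    system2(command, args, stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")
  if (!is.null(status) && status != 0L) {
    stop(sprintf("`%s %s` exited with status %s:\n%s",
                 command, paste(args, collapse = " "), status,
                 paste(out, collapse = "\n")),
         call. = FALSE)
  }
  out
}

# Example calls (host, paths and job script are placeholders):
# run_cmd("rsync", c("-av", shQuote("results/"), "user@cluster:/scratch/project/"))
# run_cmd("ssh", c("user@cluster", shQuote("bash ~/project/runscript.sh")))
# job_id <- run_cmd("qsub", shQuote("job.pbs"))
```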
Notes
Following Al3xa's answer, I realize that it is important to note that the speed penalty of using the R versions of tar and gsub instead of their bash counterparts on 1000-2000 files would likely be small relative to the current rate-limiting steps in the workflow: computations by JAGS (~10-20 min) and FORTRAN (>4 hrs).
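The speed penalty can be measured directly. This sketch times utils::tar() on 1000 generated files twice: once with R's internal implementation (tar = "internal") and once with the system tar binary. The same system.time() pattern works for comparing gsub() with sed. The file count, the file contents and the helpers make_demo_files() and time_tar() are illustrative assumptions, and the external run assumes a tar executable is on the PATH.

```r
# Hypothetical helper: create n small text files in `dir` and return their paths
make_demo_files <- function(dir, n = 1000) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(dir, sprintf("file%04d.txt", seq_len(n)))
  for (p in paths) writeLines(c("alpha", "beta", "gamma"), p)
  paths
}

# Hypothetical helper: archive `files` with the given tar method and return
# the elapsed wall-clock time in seconds.
time_tar <- function(files, tarfile, method) {
  old <- setwd(dirname(files[1]))   # archive using relative paths
  on.exit(setwd(old))
  system.time(
    utils::tar(tarfile, files = basename(files), tar = method)
  )[["elapsed"]]
}

files <- make_demo_files(file.path(tempdir(), "tar_demo"))
internal_tar <- file.path(tempdir(), "internal.tar")
external_tar <- file.path(tempdir(), "external.tar")

timings <- c(
  internal = time_tar(files, internal_tar, "internal"),  # R implementation
  external = time_tar(files, external_tar, "tar")        # system tar on PATH
)
print(timings)
```

The numbers will depend on the file system and the tar build, so it is worth running this on the real project files rather than on generated ones.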
Answers
I'm a big fan of using R as your "integrated" environment vs. bash scripts. I'm in the process of moving all of my bash and ruby scripts to Rscript as I need to make changes to them.
Only a couple of reasons not to move everything into R come to mind. I'm referring mainly to using Rscript to accomplish this.
1) Speed: from my testing the impact is moderate in every situation I've come across, and it would be trivial relative to the times you mentioned.
2) Portability: paths to Rscript, etc. may differ across systems. I've had no problems writing things on OS X and moving them to a Linux server, but things might break on Windows.
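On the portability point, you can avoid hard-coding the path to Rscript when launching a sub-script (the R --vanilla step in the question) by building the path from R.home("bin"). Executable scripts can instead start with a #!/usr/bin/env Rscript shebang so that the shell finds Rscript on the PATH. run_r_script() is a hypothetical helper, and the sketch assumes a Unix-like system.

```r
# Hypothetical helper: run `script` in a fresh session, like
# `Rscript --vanilla script.R arg1 arg2`, and return its output lines.
run_r_script <- function(script, args = character()) {
  # Locate Rscript from the running R installation instead of hard-coding
  # something like /usr/bin/Rscript (assumes a Unix-like R.home("bin") layout).
  rscript <- file.path(R.home("bin"), "Rscript")
  out <- system2(rscript, c("--vanilla", shQuote(script), shQuote(args)),
                 stdout = TRUE, stderr = TRUE)
  status <- attr(out, "status")
  if (!is.null(status) && status != 0L) {
    stop("R script ", script, " exited with status ", status, call. = FALSE)
  }
  out
}

# Example (script name and argument are placeholders):
# run_r_script("analysis/fit_model.R", "config.yml")
```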
The advantages in my book are:
1) Much easier for me to write. I don't have to switch back and forth between the slight idiosyncrasies of things like conditional statements and for loops.
2) More forgiving. I can't describe how much time I've spent trying to get bash scripts to work because I accidentally had a space where I shouldn't have. R is much nicer in that regard (yes, of course, we should all follow R's conventions perfectly, but I'd rather it not hold me up for hours when I don't).
3) I do better work. For tarring a file it doesn't matter, but I find I do better text manipulation in R than in awk/sed, for example.
As for helpful packages, I'd love a version of make that's based on R, but to my knowledge one doesn't exist. make's syntax is one of the most inflexible out there (tabs vs. spaces? really?), and I'd love to write an R-based alternative. Some day, I will...
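The awk/sed point can be illustrated with a typical field-extraction job done in R with strsplit() and sub(). The log format below is invented for the example. The awk and sed one-liners in the comments are the rough shell equivalents being replaced.

```r
# Invented log lines, standing in for output from a long-running model
log_lines <- c(
  "iter 1 deviance 1523.7",
  "iter 2 deviance 1498.2",
  "warning: slow convergence",
  "iter 3 deviance 1490.9"
)

# awk '$1 == "iter" { print $2, $4 }' equivalent: split each line on
# whitespace and keep the rows whose first field is "iter".
fields <- strsplit(log_lines, "[[:space:]]+")
keep <- vapply(fields, function(f) f[1] == "iter", logical(1))
iters <- data.frame(
  iter     = as.integer(vapply(fields[keep], `[`, character(1), 2)),
  deviance = as.numeric(vapply(fields[keep], `[`, character(1), 4))
)

# grep '^iter ' | sed -E 's/^iter ([0-9]+) .*/\1/' equivalent,
# using a capture group and a "\\1" backreference.
iter_ids <- sub("^iter ([0-9]+) .*", "\\1",
                grep("^iter ", log_lines, value = TRUE))

print(iters)
```

The result lands in a data frame, so it can go straight into further R analysis without another parsing step.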
Well, there are functions like tar, gsub, etc. Anyway, I assume you want to create a cross-platform solution. You should prefer bash for the sake of speed, and use R only for R-specific functions. I don't find it useful to wrap all system-based commands within system and/or file.*; it would be much slower. If you're using Linux, I suggest littler over the Rscript interface.