Which core packages should a professional R developer have, and why?
What are the specific utilities that can help R developers code and debug more efficiently?
I'm looking to set up an R development environment, and would like an overview of the tools that would be useful to me in crafting a unit testing infrastructure with code coverage, debugging, generation of package files and help files and maybe UML modeling.
Note: Please justify your answers with reasons and examples based on your experience with the tools you recommend. Don't just link.
I have written way too many packages, so to keep things manageable I've invested a lot of time in infrastructure packages: packages that help me make my code more robust and help make it easier for others to use. These include:

roxygen2 (with Manuel Eugster and Peter Danenberg), which allows you to keep documentation next to the function it documents, which makes it much more likely that I'll keep it up to date. roxygen2 also has a number of new features designed to minimise documentation duplication: templates (@template), parameter inheritance (@inheritParams), and function families (@family), to name a few.

testthat automates the testing of my code. This is becoming more and more important as I have less and less time to code: automated tests remember how the function should work, even when I don't.

devtools automates many common development tasks (as Andrie mentioned). The eventual goal for devtools is for it to act like an R CMD check that runs continuously in the background and notifies you the instant that something goes wrong.

profr, particularly the unreleased interactive explorer, makes it easy for me to find bottlenecks in my code.

helpr (with Barret Schloerke), which will soon power http://had.co.nz/ggplot2, provides an elegant HTML interface to R documentation.

Useful R functions:

apropos: I'm always forgetting the names of useful functions, and apropos helps me find them, even if I only remember a fragment.

Outside of R:

I use TextMate to edit R (and other) files, but I don't think it's really that important. Pick one and learn all its nooks and crannies.

Spend some time to learn the command line. Anything you can do to automate any part of your workflow will pay off in the long run. Running R from the command line leads to a natural process where each project has its own instance of R; I often have 2-5 instances of R running at a time.

Use version control. I like git and GitHub. Again, it doesn't matter exactly which system you use, but master it!

Things I wish R had:
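To make the roxygen2 features from the package list above concrete, here is a minimal sketch of roxygen comments. The functions add() and subtract() and the family name "arithmetic helpers" are hypothetical. When roxygen2 processes the file (for example via roxygen2::roxygenise() in a package directory), @inheritParams copies the x and y descriptions from add() into subtract()'s help page, and @family adds cross-links between the two pages. Outside a package the #' lines are ordinary comments, so the code still runs as-is.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The element-wise sum of `x` and `y`.
#' @family arithmetic helpers
#' @export
#' @examples
#' add(1, 2)
add <- function(x, y) {
  x + y
}

#' Subtract one number from another
#'
#' @inheritParams add
#' @return The element-wise difference `x - y`.
#' @family arithmetic helpers
#' @export
#' @examples
#' subtract(5, 2)
subtract <- function(x, y) {
  x - y
}
```

The point of @inheritParams is that the parameter text lives in exactly one place, so editing add()'s @param lines updates both help pages.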
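The testthat workflow from the same list can be sketched like this, assuming testthat is installed; add() is a hypothetical function under test. In a package these test_that() blocks would normally live in files under tests/testthat/ and be run by devtools::test() or R CMD check; run at the console, each block simply reports whether its expectations passed.

```r
library(testthat)

# Hypothetical function under test.
add <- function(x, y) x + y

test_that("add() sums element-wise", {
  expect_equal(add(1, 2), 3)
  expect_equal(add(1:3, 1), c(2, 3, 4))
})

test_that("add() fails on non-numeric input", {
  expect_error(add("a", 1))
})
```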
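profr itself is not shown here. As a sketch of the underlying idea, base R's Rprof() and summaryRprof() record where time is spent, and profr works from the same Rprof sampling data. The busy_wait() function is a hypothetical stand-in for slow code; it deliberately burns CPU so the profiler has something to sample.

```r
# Hypothetical slow function: keeps the CPU busy for roughly `seconds` seconds.
busy_wait <- function(seconds) {
  end <- Sys.time() + seconds
  while (Sys.time() < end) {
    x <- sqrt(runif(1e4))
  }
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")

# Sample the call stack every 10 ms while the code runs, then stop sampling.
Rprof(prof_file, interval = 0.01)
busy_wait(0.5)
Rprof(NULL)

# by.self: time spent in each function's own code, largest first.
prof <- summaryRprof(prof_file)
head(prof$by.self)

unlink(prof_file)
```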
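For the apropos() tip in the same answer, a few typical calls. The pattern is a regular expression matched against names on the search path, and matching ignores case by default, so a remembered fragment is usually enough.

```r
# Everything on the search path whose name contains "mean".
apropos("mean")

# Anchored patterns narrow the search, e.g. names starting with "read.".
apropos("^read\\.")

# Restrict the results to functions only.
apropos("sum", mode = "function")
```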
As I recall this has been asked before and my answer remains the same: Emacs.
Emacs, together with ESS, can run a shell via M-x shell and/or M-x eshell, has nice directory access functionality with dired mode, has ssh mode for remote access, and <tongueInCheek>is not Eclipse and does not require Java</tongueInCheek>.

You can of course combine it with whichever CRAN packages you like: RUnit or testthat, the different profiling support packages, the debug package, ...
Additional tools that are useful:

R CMD check really is your friend, as this is what CRAN uses to decide whether you are "in or out"; use it and trust it.

The tests/ directory can offer a simplified version of unit tests by saving output to be compared against (from a prior R CMD check run). This is useful, but proper unit tests are better.

r -lfoo -e'bar(1, "ab")' (using r from the littler package) starts an R session, loads the foo package and evaluates the given expression (here a function bar() with two arguments). This, combined with R CMD INSTALL, provides a full test cycle.
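As a sketch of the tests/ mechanism described in that answer: R CMD check runs each .R file in a package's tests/ directory, fails the check if a script signals an error, and, when a matching .Rout.save file exists, compares the new output against it and reports any differences. The file name tests/add-tests.R and the add() function are hypothetical; in a real package the script would start with library(foo) rather than defining the function inline.

```r
# tests/add-tests.R (hypothetical file in a package's tests/ directory).
# A real package would use library(foo) here; the function is defined inline
# so the script runs on its own.
add <- function(x, y) x + y

# A failing stopifnot() raises an error, which makes R CMD check fail.
stopifnot(identical(add(1, 2), 3))
stopifnot(identical(add(1:3, 1L), 2:4))

# Printed output goes to add-tests.Rout; if add-tests.Rout.save exists,
# R CMD check compares the two and reports any differences.
print(add(1:3, 1))
```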
Knowledge of, and ability to use, the basic R debugging tools is an essential first step in learning to quickly debug R code. If you know how to use the basic tools, you can debug code anywhere without needing all the extra tools provided in add-on packages.

traceback() allows you to see the call stack leading to an error.
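A minimal sketch of that workflow, assuming two hypothetical functions: bar() insists on a matrix, and foo() passes its argument on to bar(). traceback() only has something to show after an uncaught error at the top level of an interactive session, so those steps appear as comments; the runnable part catches the same error to show its message.

```r
# Hypothetical functions: bar() requires a matrix, foo() calls bar().
foo <- function(x) {
  d <- bar(x)
  x[1]
}
bar <- function(x) {
  stopifnot(is.matrix(x))
  dim(x)
}

# Interactively:
#   foo(1:10)    # fails with: is.matrix(x) is not TRUE
#   traceback()  # lists the calls innermost first: the stopifnot()/stop()
#                # frames, then bar(x), then foo(1:10)

# The same error, caught so its message can be inspected in a script.
msg <- tryCatch(foo(1:10), error = conditionMessage)
msg
```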
So we can clearly see that the error happened in function bar(); we've narrowed down the scope of the bug hunt. But what if the code generates warnings, not errors? That can be handled by turning warnings into errors via the warn option: options(warn = 2) will turn warnings into errors. You can then use traceback() to track them down.

Linked to this is getting R to recover from an error in the code so you can debug what went wrong. options(error = recover) will drop us into a debugger frame whenever an error is raised. From there we can drop into each frame on the call stack and see how the functions were called, what the arguments are, etc. In the above example, we see that bar() was passed a vector, not a matrix, hence the error. options(error = NULL) resets this behaviour to normal.

Another key function is trace(), which allows you to insert debugging calls into an existing function. The benefit of this is that you can tell R to debug from a particular line in the source. This allows you to insert the debugging calls at the right point in the code without having to step through the preceding function calls.
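For the warn option described above, a small sketch: coercing "a" to integer normally only warns and returns NA, but with options(warn = 2) the warning is converted into an error, which stops execution and can then be traced like any other error. The parse_id() function is hypothetical; saving and restoring the old options keeps the change local.

```r
# Hypothetical function that produces a warning rather than an error.
parse_id <- function(x) as.integer(x)

# Default behaviour (warn = 0): a warning, and NA is returned.
res_default <- suppressWarnings(parse_id("a"))

# warn = 2 turns warnings into errors; keep the old setting to restore it.
old <- options(warn = 2)
res_strict <- tryCatch(parse_id("a"), error = function(e) e)
options(old)

res_default                    # NA
conditionMessage(res_strict)   # mentions "NAs introduced by coercion"
```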
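A sketch of the options(error = recover) workflow, assuming a hypothetical foo()/bar() pair in which bar() requires a matrix. The frame menu only appears in an interactive session, so the failing call is guarded with interactive(). Saving the old options and restoring them at the end has the same effect as options(error = NULL) when no handler was set beforehand.

```r
# Hypothetical functions: bar() requires a matrix, foo() calls bar().
foo <- function(x) {
  d <- bar(x)
  x[1]
}
bar <- function(x) {
  stopifnot(is.matrix(x))
  dim(x)
}

# From now on, any uncaught error calls recover().
old <- options(error = recover)

# Interactively this offers a numbered list of the active frames
# (foo(1:10), bar(x), ...). Selecting the bar(x) frame opens a browser
# there; typing x shows that 1:10, a vector, arrived instead of a matrix.
# Enter 0 at the menu to leave recover().
if (interactive()) foo(1:10)

# Restore the previous error handling.
options(old)
```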
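A sketch of trace() with its at argument. Steps are counted over as.list(body(f)), where element 1 is the opening `{`, so the first statement is step 2. Interactively one would usually pass tracer = browser to stop at that step; here the tracer just copies a local variable out of the function so the block runs unattended. The function scale_shift() and the variable seen_y are hypothetical.

```r
# Hypothetical function to be traced.
scale_shift <- function(x) {
  y <- x * 2
  z <- y + 1
  z
}

# Show the numbered steps: [[1]] is `{`, [[2]] is y <- x * 2, [[3]] is z <- y + 1.
as.list(body(scale_shift))

# Insert a call just before step 3 that copies y out of the function's frame.
# Interactively, tracer = browser would open the debugger at this point instead.
seen_y <- NULL
trace(scale_shift, tracer = quote(seen_y <<- y), at = 3, print = FALSE)
result <- scale_shift(5)

# Remove the tracing code again; the source of scale_shift() was never edited.
untrace(scale_shift)
```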
If you want to step through a function as it is executing, then debug(foo) will turn on the debugger for function foo(), whilst undebug(foo) will turn it off.

A key point about these options is that I haven't needed to modify or edit any source code to insert debugging calls etc. I can try things out and see what the problem is directly from the session where the error has occurred.
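A sketch of debug() and undebug() on a hypothetical function; isdebugged() reports whether the debug flag is currently set. Stepping only makes sense interactively, so that call is guarded. At the browser prompt, n runs the next line, c continues and Q quits.

```r
# Hypothetical function to step through.
centre <- function(x) {
  m <- mean(x)
  x - m
}

debug(centre)
isdebugged(centre)   # TRUE

# Interactively, this call now stops at the top of centre() so it can be
# stepped through line by line.
if (interactive()) centre(c(1, 2, 6))

undebug(centre)
isdebugged(centre)   # FALSE
```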
For a different take on debugging in R, see Mark Bravington's debug package on CRAN.