What core packages should a professional R developer have, and why?

Posted 2024-11-26 10:39:51

What are the specific utilities that can help R developers code and debug more efficiently?

I'm looking to set up an R development environment, and would like an overview of the tools that would be useful to me in crafting a unit testing infrastructure with code coverage, debugging, generation of package files and help files and maybe UML modeling.

Note: Please justify your answers with reasons and examples based on your experience with the tools you recommend. Don't just link.

Comments (3)

玉环 2024-12-03 10:39:51

I have written way too many packages, so to keep things manageable I've invested a lot of time in infrastructure packages: packages that help me make my code more robust and help make it easier for others to use. These include:

  • roxygen2 (with Manuel Eugster and Peter Danenberg), which allows you to keep documentation next to the function it documents, which makes it much more likely that I'll keep it up to date. roxygen2 also has a number of new features designed to minimise documentation duplication: templates (@template), parameter inheritance (@inheritParams), and function families (@family), to name a few.

  • testthat automates the testing of my code. This is becoming more and more important as I have less and less time to code: automated tests remember how the function should work, even when I don't.

  • devtools automates many common development tasks (as Andrie mentioned). The eventual goal for devtools is for it to act like an R CMD check that runs continuously in the background and notifies you the instant that something goes wrong.

  • profr, particularly the unreleased interactive explorer, makes it easy for me to find bottlenecks in my code.

  • helpr (with Barret Schloerke), which will soon power http://had.co.nz/ggplot2, provides an elegant html interface to R documentation.
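To make the roxygen2 and testthat points concrete, here is a minimal sketch. The functions rescale01() and rescale_pct() are hypothetical and exist only for illustration; the tags shown (@param, @return, @inheritParams, @family) are the ones mentioned above, and the sketch assumes the testthat package is installed.

```r
library(testthat)

#' Scale a numeric vector to the unit interval
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as `x`, with values in [0, 1].
#' @family rescaling helpers
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

#' Scale a numeric vector to a percentage
#'
#' @inheritParams rescale01
#' @family rescaling helpers
rescale_pct <- function(x) {
  # The documentation for `x` is inherited from rescale01()
  100 * rescale01(x)
}

# The test records how the functions are supposed to behave
test_that("rescaling maps the observed range onto fixed endpoints", {
  expect_equal(rescale01(c(2, 4, 6)), c(0, 0.5, 1))
  expect_equal(range(rescale_pct(c(2, 4, 6))), c(0, 100))
})
```

In a real package the roxygen comments would live in R/ and the test in tests/testthat/; they are shown in one file here only to keep the sketch self-contained.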

Useful R functions:

  • apropos: I'm always forgetting the names of useful functions, and apropos helps me find them, even if I only remember a fragment
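A short sketch of how apropos() can be used; the search patterns are arbitrary examples chosen for illustration.

```r
# Find every visible object whose name contains "read.csv"
hits <- apropos("read\\.csv")
print(hits)

# Restrict the search to functions whose names start with "seq"
seq_funs <- apropos("^seq", mode = "function")
print(seq_funs)
```

The first argument is treated as a regular expression, so a remembered fragment of the name is usually enough.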

Outside of R:

  • I use textmate to edit R (and other) files, but I don't think it's really that important. Pick one and learn all its nooks and crannies.

  • Spend some time learning the command line. Anything you can do to automate any part of your workflow will pay off in the long run. Running R from the command line leads to a natural process where each project has its own instance of R; I often have 2-5 instances of R running at a time.

  • Use version control. I like git and github. Again, it doesn't matter exactly which system you use, but master it!

Things I wish R had:

  • code coverage tools
  • a dependency management framework like rake or jake
  • better memory profiling tools
  • a metadata standard for describing data frames (and other data sources)
  • better tools for describing and rendering tables in a variety of output formats
  • a package for markdown rendering
以歌曲疗慰 2024-12-03 10:39:51

As I recall this has been asked before and my answer remains the same: Emacs.

Emacs

  • can do just about anything you want to do with R thanks to ESS, including
    • code execution of various snippets (line, region, function, buffer, ...)
    • inspection of workspaces,
    • display of variables,
    • multiple R sessions and easy switching between them
    • transcript mode for re-running (parts of) previous sessions
    • access to the help system
    • and much more
  • handles LaTeX with similar ease via the AUCTeX mode, which helps with Sweave for R
  • has modes for whichever other programming languages you combine with R, be it C/C++, Python, shell, SQL, ... covering automatic indentation and colour highlighting
  • can access databases with sql-* mode
  • can work remotely with tramp mode: access remote files as if they were local (uses ssh/scp)
  • can be run as a daemon, which makes it stateful, so you can reconnect to the same Emacs session, be it on the workstation under X11 (or equivalent) or remotely via ssh (with or without X11) or screen
  • has org-mode, which together with babel provides a powerful Sweave alternative, as discussed in this paper on workflow apps for (social) scientists
  • can run a shell via M-x shell and/or M-x eshell, has nice directory access functionality with dired mode, has ssh mode for remote access
  • interfaces with all source code repositories with ease via specific modes (e.g. psvn for svn)
  • is cross-platform just like R so you have similar user-interface experiences on all relevant operating systems
  • is widely used, widely available and under active development for both code and extensions, see the emacswiki.org site for the latter
  • <tongueInCheek>is not Eclipse and does not require Java</tongueInCheek>

You can of course combine it with whichever CRAN packages you like: RUnit or testthat, the different profiling support packages, the debug package, ...

Additional tools that are useful:

  • R CMD check really is your friend as this is what CRAN uses to decide whether you are "in or out"; use it and trust it
  • the tests/ directory can offer a simplified version of unit tests by saving reference output (from a prior R CMD check run) for later runs to be compared against; this is useful, but proper unit tests are better
  • particularly for packages with object code, I prefer to launch fresh R sessions and littler makes that easy: r -lfoo -e'bar(1, "ab")' starts an R session, loads the foo package and evaluates the given expression (here a function bar() with two arguments). This, combined with R CMD INSTALL, provides a full test cycle.
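As a rough illustration of the tests/ directory approach, here is a sketch of a plain test script. The file name tests/basic.R is hypothetical, and bar() is given a stand-in definition (an assumption, not the real function) so that the sketch runs on its own; in a real package called foo the script would start with library(foo) instead.

```r
# Hypothetical file: tests/basic.R in a package called "foo".
# Stand-in definition so the sketch is self-contained; a real test
# script would load the package with library(foo) instead.
bar <- function(n, s) {
  paste(rep(s, n), collapse = "")
}

# Any error stops the script, which R CMD check reports as a failure
stopifnot(
  identical(bar(1, "ab"), "ab"),
  identical(bar(3, "ab"), "ababab")
)

# Printed output can be compared against a saved tests/basic.Rout.save file
print(bar(2, "ab"))
```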
も让我眼熟你 2024-12-03 10:39:51

Knowledge of, and ability to use, the basic R debugging tools is an essential first step in learning to quickly debug R code. If you know how to use the basic tools, you can debug code anywhere without needing all the extra tools provided in add-on packages.

traceback() allows you to see the call stack leading to an error:

foo <- function(x) {
    d <- bar(x)
    x[1]
}
bar <- function(x) {
    stopifnot(is.matrix(x))
    dim(x)
}
foo(1:10)
traceback()

yields:

> foo(1:10)
Error: is.matrix(x) is not TRUE
> traceback()
4: stop(paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE", 
       sep = ""), call. = FALSE)
3: stopifnot(is.matrix(x))
2: bar(x)
1: foo(1:10)

So we can clearly see that the error happened in function bar(); we've narrowed down the scope of the bug hunt. But what if the code generates warnings, not errors? That can be handled by turning warnings into errors via the warn option:

options(warn = 2)

will turn warnings into errors. You can then use traceback() to track them down.
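A small non-interactive sketch of the effect, assuming a hypothetical function noisy() that triggers a coercion warning. With warn = 2 the warning is raised as an error, so an error handler (or traceback() in an interactive session) can see it.

```r
# Hypothetical function: coercing "a" to integer normally only warns
noisy <- function(x) as.integer(x)

old <- options(warn = 2)   # promote warnings to errors, remembering the old setting

# The warning now surfaces as an error condition that tryCatch() can capture
res <- tryCatch(noisy("a"), error = function(e) e)

options(old)               # restore the previous warn level

print(class(res))
print(conditionMessage(res))
```

Saving the value returned by options() and restoring it afterwards keeps the stricter setting from leaking into the rest of the session.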

Linked to this is getting R to recover from an error in the code so you can debug what went wrong. options(error = recover) will drop us into a debugger frame whenever an error is raised:

> options(error = recover)
> foo(1:10)
Error: is.matrix(x) is not TRUE

Enter a frame number, or 0 to exit   

1: foo(1:10)
2: bar(x)
3: stopifnot(is.matrix(x))

Selection: 2
Called from: bar(x)
Browse[1]> x
 [1]  1  2  3  4  5  6  7  8  9 10
Browse[1]> is.matrix(x)
[1] FALSE

You see we can drop into each frame on the call stack and see how the functions were called, what the arguments are, etc. In the above example, we see that bar() was passed a vector, not a matrix, hence the error. options(error = NULL) resets this behaviour to normal.

Another key function is trace(), which allows you to insert debugging calls into an existing function. The benefit of this is that you can tell R to start debugging from a particular step in the body of the function:

> x <- 1:10; y <- rnorm(10)
> trace(lm, tracer = browser, at = 10) ## debug from step 10 of the function body
Tracing function "lm" in package "stats"
[1] "lm"
> lm(y ~ x)
Tracing lm(y ~ x) step 10 
Called from: eval(expr, envir, enclos)
Browse[1]> n ## must press n <return> to get the next line step
debug: mf <- eval(mf, parent.frame())
Browse[2]> 
debug: if (method == "model.frame") return(mf) else if (method != "qr") warning(gettextf("method = '%s' is not supported. Using 'qr'", 
    method), domain = NA)
Browse[2]> 
debug: if (method != "qr") warning(gettextf("method = '%s' is not supported. Using 'qr'", 
    method), domain = NA)
Browse[2]> 
debug: NULL
Browse[2]> Q
> untrace(lm)
Untracing function "lm" in package "stats"

This allows you to insert the debugging calls at the right point in the code without having to step through the preceding function calls.

If you want to step through a function as it is executing, then debug(foo) will turn on the debugger for function foo(), whilst undebug(foo) will turn off the debugger.
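A minimal sketch of switching the debugger flag on and off, using a cut-down version of foo() as a stand-in. isdebugged() is used here only to confirm the flag's state without entering the browser, so the snippet also runs non-interactively.

```r
# Cut-down stand-in for the foo() used earlier
foo <- function(x) {
  x[1]
}

debug(foo)                   # every later call to foo() enters the browser
flag_on <- isdebugged(foo)

undebug(foo)                 # back to normal evaluation
flag_off <- isdebugged(foo)

print(c(on = flag_on, off = flag_off))
```

In an interactive session, debugonce(foo) is a convenient variant: it debugs only the next call and then clears the flag by itself.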

A key point about these options is that I haven't needed to modify/edit any source code to insert debugging calls etc. I can try things out and see what the problem is directly from the session where the error has occurred.

For a different take on debugging in R, see Mark Bravington's debug package on CRAN.
