Is it possible to use R package data in testthat tests or run_examples()?
I'm working on developing an R package, using devtools, testthat, and roxygen2. I have a couple of data sets in the data folder (foo.txt and bar.csv).
My file structure looks like this:
/ mypackage
    / data
        * foo.txt, bar.csv
    / inst
        / tests
            * run-all.R, test_1.R
    / man
    / R
I'm pretty sure 'foo' and 'bar' are documented correctly:
#' Foo data
#'
#' Sample foo data
#'
#' @name foo
#' @docType data
NULL
#' Bar data
#'
#' Sample bar data
#'
#' @name bar
#' @docType data
NULL
I would like to use the data in 'foo' and 'bar' in my documentation examples and unit tests.
For example, I would like to use these data sets in my testthat tests by calling:
data(foo)
data(bar)
expect_that(foo$col[1], equals(bar$col[1]))
And, I would like the examples in the documentation to look like this:
#' @examples
#' data(foo)
#' functionThatUsesFoo(foo)
If I try to call data(foo) while developing the package, I get the error "data set 'foo' not found". However, if I build the package, install it, and load it - then I can make the tests and examples work.
My current work-arounds are to not run the example:
#' @examples
#' \dontrun{data(foo)}
#' \dontrun{functionThatUsesFoo(foo)}
And in the tests, pre-load the data using a path specific to my local computer:
foo <- read.delim(pathToFoo, sep="\t", fill = TRUE, comment.char="#")
bar <- read.delim(pathToBar, sep=";", fill = TRUE, comment.char="#")
expect_that(foo$col[1], equals(bar$col[1]))
This does not seem ideal - especially since I'm collaborating with others - requiring all the collaborators to have the same full paths to 'foo' and 'bar'. Plus, the examples in the documentation look like they can't be run, even though once the package is installed, they can.
Any suggestions? Thanks much.
Importing non-RData files within examples/tests
I found a solution to this problem by peering at the JSONIO package, which obviously needed to provide some examples of reading files other than those of the .RData variety.
I got this to work in function-level examples, and it satisfies both R CMD check mypackage and testthat::test_package().
(1) Re-organize your package structure so that the example data directory is within inst. At some point R CMD check mypackage told me to move non-RData data files to inst/extdata, so in this new structure the directory is also renamed.
(2) (Optional) Add a top-level tests directory so that your new testthat tests are now also run during R CMD check mypackage.
The run-testthat-mypackage.R script in that directory needs only two lines at minimum: one that loads testthat and one that runs the package's tests. Note that this is the part that allows testthat to be called during R CMD check mypackage; it is not necessary otherwise. You should also add testthat as a "Suggests:" dependency in your DESCRIPTION file.
(3) Finally, the secret sauce for specifying your within-package path: build it with system.file() rather than hard-coding it.
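A minimal sketch of step (3), assuming the files from the question have been moved to inst/extdata and the package is named mypackage. The helpers extdata_path(), read_foo() and read_bar() are hypothetical names introduced here for illustration; only system.file() and read.delim() are base R calls.

```r
# Hypothetical helper: resolve a file shipped under inst/extdata of a package.
# "mypackage", "foo.txt" and "bar.csv" are the names used in the question.
extdata_path <- function(file, package = "mypackage") {
  # system.file() returns "" when the package or the file cannot be found
  path <- system.file("extdata", file, package = package)
  if (!nzchar(path)) {
    stop("'", file, "' not found in extdata of package '", package, "'",
         call. = FALSE)
  }
  path
}

# Hypothetical readers that reuse the read.delim() settings from the question
read_foo <- function() {
  read.delim(extdata_path("foo.txt"), sep = "\t", fill = TRUE, comment.char = "#")
}

read_bar <- function() {
  read.delim(extdata_path("bar.csv"), sep = ";", fill = TRUE, comment.char = "#")
}
```

In a test or an example, foo <- read_foo() would then take the place of the machine-specific pathToFoo, so collaborators no longer need identical directory layouts.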
If you look at the output of the system.file() command, you will see that it returns the full system path to your installed package; on Mac OS X, for example, this is a location within the R framework.
The reason this seems okay to me is that you don't hard-code any path features other than those within your package, so this approach should be robust across other R installations on other systems.
data() approach
As for the data() semantics, as far as I can tell this is specific to R binary (.RData) files in the top-level data directory. So you can circumvent my example above by pre-importing the data files and saving them with the save() command into your data directory. However, this assumes you only need to show an example in which the data is already loaded into R, as opposed to also reproducibly demonstrating the upstream process of importing the files.
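The save() conversion described above might look roughly like the following sketch. The data frame is a stand-in, since the real contents of foo.txt are not shown in the question, and a temporary directory is used in place of the package's data directory so that the snippet runs anywhere; both are assumptions made for illustration.

```r
# Stand-in for the parsed contents of foo.txt (real columns are not shown
# in the question); in practice this would come from read.delim().
foo <- data.frame(col = c(1, 2, 3))

# In the real package the target would be file.path("data", "foo.RData"),
# run from the package root. A temporary directory is used here instead.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
foo_file <- file.path(data_dir, "foo.RData")

# save() writes the object in R's binary format under its own name
save(foo, file = foo_file)

# load() restores the object under the name it was saved with
rm(foo)
load(foo_file)
```

The conversion is a one-off step for the package maintainer, so a script like this would normally be kept outside the R directory.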
Per @hadley's comment, the .RData conversion will work well.
As for the broader question of team collaboration with different environments across team members, a common pattern is to agree on a single environment variable, e.g., FOO_PROJECT_ROOT, that everyone on the team will set up appropriately in their environment. From that point on you can use relative paths, including across projects.
An R-specific approach would be to agree on some data/functions that every team member will set up in their .Rprofile files. That is, for example, how devtools finds packages in non-standard locations.
Last but not least, though it is not optimal, you can actually put developer-specific code in your repository. If @hadley does it, it's not such a bad thing. See, for example, how he activates certain behaviors in testthat in his own environment.
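The environment-variable pattern mentioned above could be sketched as follows. FOO_PROJECT_ROOT is the example name from this answer; project_path() is a hypothetical helper, not part of devtools or testthat.

```r
# Hypothetical helper: build a path relative to the project root that each
# team member has set in their own environment.
project_path <- function(..., var = "FOO_PROJECT_ROOT") {
  root <- Sys.getenv(var, unset = NA)
  if (is.na(root) || !nzchar(root)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  # file.path() joins the components with "/" on every platform
  file.path(root, ...)
}

# Usage, assuming each collaborator has set the variable, for instance with
# Sys.setenv(FOO_PROJECT_ROOT = "/path/to/project") in their .Rprofile:
# foo <- read.delim(project_path("data", "foo.txt"), sep = "\t")
```

Failing loudly when the variable is missing keeps a misconfigured machine from silently reading files relative to the wrong directory.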