Identifying dependencies of R functions and scripts
I am sifting through a package and scripts that utilize the package, and would like to identify external dependencies. The goal is to modify scripts to specify library(pkgName) and to modify functions in the package to use require(pkgName), so that these dependencies will be more obvious later.
I am revising the code to account for each externally dependent package. As an example, though it is by no means definitive, I am now finding it difficult to identify code that depends on data.table. I could replace data.table with Matrix, ggplot2, bigmemory, plyr, or many other packages, so feel free to answer with examples based on other packages.
This search isn't particularly easy. The approaches I have tried so far include:
- Search the code for library and require statements.
- Search for mentions of data.table (e.g. library(data.table)).
- Try running codetools::checkUsage to determine where there may be some issues. For the scripts, my program inserts the script into a local function and applies checkUsage to that function. Otherwise, I use checkUsagePackage for the package.
- Look for statements that are somewhat unique to data.table, such as :=.
- Look for where objects' classes may be identified via Hungarian notation, such as DT.
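The checkUsage approach in the list above can be sketched as follows. This is a minimal illustration, not the asker's program: checkScriptUsage is a hypothetical helper name, and it assumes codetools (a recommended package shipped with R) is installed. The script's parsed expressions become the body of a throwaway function whose environment is the global environment, so a function that is not visible on the current search path (for instance data.table's := when data.table is not attached) should be reported as having no visible global function definition.

```r
# Sketch only: checkScriptUsage is a hypothetical helper, not part of
# codetools. It wraps a script's top-level code in a function so that
# codetools::checkUsage() can inspect it.
checkScriptUsage <- function(file) {
  exprs <- parse(file = file, keep.source = FALSE)

  # Build function() { <expr 1>; <expr 2>; ... } from the parsed script.
  scriptFun <- function() NULL
  body(scriptFun) <- as.call(c(as.name("{"), as.list(exprs)))

  # Resolve free names against the global environment and search path
  # (not this helper's local variables), so functions from packages that
  # are not attached are reported as undefined.
  environment(scriptFun) <- globalenv()

  # Collect checkUsage's messages instead of printing them.
  msgs <- character(0)
  codetools::checkUsage(scriptFun, name = basename(file), all = TRUE,
                        report = function(s) msgs <<- c(msgs, s))
  trimws(msgs)
}
```

Running it in a clean session, without attaching data.table, turns "which data.table functions does this script use?" into reading the undefined-function messages.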
The essence of my searching is to find:
- loading of data.table,
- objects with names that indicate they are data.table objects,
- methods that appear to be data.table-specific.
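The three searches above can be approximated on R's parse data instead of raw text. The following is a minimal sketch under stated assumptions. findPackageUsage is a hypothetical helper. It counts any call or operator token whose text matches one of the package's exports as a "use", so it cannot distinguish a data.table function from a same-named function in another package. namePattern (e.g. "^DT") is an assumed naming convention, not something R knows about classes. getNamespaceExports() loads the package's namespace, so the package must be installed.

```r
# Sketch: static scan of one script for signs of a given package.
# Returns one row per hit: kind ("load", "namespace", "call", "name"),
# source line, and token text.
findPackageUsage <- function(file, pkg, namePattern = NULL) {
  exprs <- parse(file = file, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  term <- pd[pd$terminal, ]
  term <- term[order(term$line1, term$col1), ]
  unquote <- function(x) gsub("^[\"']|[\"']$", "", x)

  hits <- data.frame(kind = character(0), line = integer(0),
                     text = character(0), stringsAsFactors = FALSE)
  add <- function(kind, rows) {
    if (nrow(rows) > 0)
      hits <<- rbind(hits, data.frame(kind = kind, line = rows$line1,
                                      text = rows$text,
                                      stringsAsFactors = FALSE))
  }

  # 1. library(pkg), require("pkg") and similar: the argument is assumed
  #    to be the terminal token two places after the function name.
  loaders <- c("library", "require", "requireNamespace", "loadNamespace")
  idx <- which(term$token == "SYMBOL_FUNCTION_CALL" & term$text %in% loaders)
  idx <- idx[idx + 2 <= nrow(term)]
  add("load", term[idx[unquote(term$text[idx + 2]) == pkg], ])

  # 2. Explicit pkg::fun or pkg:::fun references.
  add("namespace", term[term$token == "SYMBOL_PACKAGE" & term$text == pkg, ])

  # 3. Calls and operators matching the package's exports. R's parser
  #    tokenises := as LEFT_ASSIGN and %op% operators as SPECIAL.
  exports <- getNamespaceExports(pkg)
  callTokens <- c("SYMBOL_FUNCTION_CALL", "SPECIAL", "LEFT_ASSIGN")
  add("call", term[term$token %in% callTokens & term$text %in% exports, ])

  # 4. Optional naming-convention check on plain symbols.
  if (!is.null(namePattern))
    add("name", term[term$token == "SYMBOL" & grepl(namePattern, term$text), ])

  hits <- hits[order(hits$line), ]
  rownames(hits) <- NULL
  hits
}
```

For the data.table case this would be called as findPackageUsage("someScript.R", "data.table", namePattern = "^DT"), where someScript.R is a placeholder file name.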
The only easy part of this seems to be finding where the package is loaded. Unfortunately, some functions may not explicitly load or require the external package; they may assume it has already been loaded. This is a bad practice, and I am trying to fix it. However, searching for objects and methods seems to be challenging.
This (data.table) is just one package, and one with what seems to be limited and somewhat unique usage. Suppose I wanted to look for uses of ggplot functions, where the options are more extensive and the text of the syntax is not as idiosyncratic (i.e. frequent usage of + is not idiosyncratic, while := seems to be).
I don't think that static analysis will give a perfect answer, e.g. one could pass an argument to a function, which specifies a package to be loaded. Nonetheless: are there any core tools or packages that can improve on this brute force approach, either via static or dynamic analysis?
For what it's worth, tools::pkgDepends only addresses dependencies at the package level, not the function or script level, which is the level I'm working at.
Update 1: An example of a dynamic analysis tool that should work is one that reports which packages are loaded during code execution. I don't know if such a capability exists in R, though; it would be like Rprof reporting the output of search() instead of the code stack.
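A crude version of the dynamic tool described in Update 1 can be sketched in base R by comparing search() and loadedNamespaces() before and after running a script. packagesLoadedBy is a hypothetical name. Unlike Rprof, it does not attribute loading to particular lines. A package that is already attached or loaded before the call will not show up, so it is best run in a fresh session, and a script that stops with an error reports nothing.

```r
# Sketch: which packages does running a script attach or load?
packagesLoadedBy <- function(file) {
  attachedBefore <- search()
  loadedBefore <- loadedNamespaces()
  # Run the script in a throwaway environment so its objects don't
  # clutter the global workspace; library() calls still attach globally.
  sys.source(file, envir = new.env(parent = globalenv()))
  list(attached = setdiff(search(), attachedBefore),
       namespaces = setdiff(loadedNamespaces(), loadedBefore))
}
```

The attached element corresponds to new entries in search(); the namespaces element also catches packages used only via pkg::fun, which load a namespace without attaching it.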
First, thanks to @mathematical.coffee for putting me on the path of using Mark Bravington's mvbutils package. The foodweb function is more than satisfactory.

To recap, I wanted to know about checking one package, say myPackage, versus another, say externalPackage, and about checking scripts against externalPackage. I'll demonstrate how to do each. In this case, the external package is data.table.

1: For myPackage versus data.table, the following commands suffice:

This produces an excellent graph showing which functions depend on functions in data.table. Although the graph includes dependencies within data.table, it's not overly burdensome: I can easily see which of my functions depend on data.table, and which functions they use, such as as.data.table, data.table, :=, key, and so on. At this point, one could say the package dependency problem is solved, but foodweb offers so much more, so let's look at that. The cool part is the dependency matrix.

This is cool: it now shows dependencies of functions in my package (where I'm using verbose names, e.g. myPackage.cleanData) on functions not in my package, namely functions in data.table, and it eliminates rows and columns where there are no dependencies. This is concise and lets me survey dependencies quickly, and I can also find the complementary set for my functions quite easily by processing rownames(depMat).
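A minimal sketch of how a depMat-style matrix might be built with foodweb; this is not the author's own code. It assumes mvbutils is installed, that foodweb()'s where argument accepts a list of environments (as its help page describes), and that the returned funmat component marks with 1 each (row) function that calls a (column) function. crossDependencies is a hypothetical helper name.

```r
# Sketch: dependency matrix of "my" functions (rows) against "their"
# functions (columns), built from mvbutils::foodweb().
crossDependencies <- function(mine, theirs) {
  fw <- mvbutils::foodweb(where = list(mine, theirs), plotting = FALSE)
  # funmat[i, j] is nonzero when function i calls function j.
  fm <- fw$funmat
  myFuns <- intersect(ls(mine), rownames(fm))
  theirFuns <- intersect(ls(theirs), colnames(fm))
  depMat <- fm[myFuns, theirFuns, drop = FALSE]
  # Keep only rows and columns that contain at least one dependency.
  depMat[rowSums(depMat) > 0, colSums(depMat) > 0, drop = FALSE]
}
```

With both packages attached, a call such as crossDependencies(as.environment("package:myPackage"), as.environment("package:data.table")) would give the matrix, and setdiff(ls("package:myPackage"), rownames(depMat)) the complementary set. Passing asNamespace("myPackage") as the first argument should also cover non-exported functions.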
NB: plotting = FALSE doesn't seem to prevent a plotting device from being created, at least the first time that foodweb is called in a sequence of calls. That is annoying, but not terrible. Maybe I'm doing something wrong.

2: For scripts versus data.table, this gets a little more interesting. For each script, I need to create a temporary function, and then check for dependencies. I have a little function below that does precisely that.

Now, I just need to look at listDeps, and I have the same kind of wonderful little insights that I have from the depMat above. I modified checkScriptDependencies from other code that I wrote that sends scripts to be analyzed by codetools::checkUsage; it's good to have a little function like this around for analyzing standalone code. Kudos to @Spacedman and @Tommy for insights that improved the call to foodweb, using environment().

(True hungaRians will notice that I was inconsistent with the order of name and type; tooBad. :) There's a longer reason for this, but this isn't precisely the code I'm using, anyway.)
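A sketch in the spirit of checkScriptDependencies, not the author's function. scriptDependencies is a hypothetical name. It wraps a script's top-level expressions in a throwaway function placed in its own environment, then reads that function's row of foodweb's funmat. It assumes mvbutils is installed, that the packages named in pkgs are attached, and that foodweb()'s where argument accepts a list of environments.

```r
# Sketch: which functions from the given attached packages does a
# script call? Returns a list with one character vector per package.
scriptDependencies <- function(file, pkgs) {
  exprs <- parse(file = file, keep.source = FALSE)

  # The script's expressions become the body of a temporary function.
  scriptFun <- function() NULL
  body(scriptFun) <- as.call(c(as.name("{"), as.list(exprs)))
  tmpEnv <- new.env()
  assign("scriptFun", scriptFun, envir = tmpEnv)

  # Analyse the temporary function together with the packages' functions.
  pkgEnvs <- lapply(paste0("package:", pkgs), as.environment)
  fw <- mvbutils::foodweb(where = c(list(tmpEnv), pkgEnvs), plotting = FALSE)
  calls <- fw$funmat["scriptFun", , drop = FALSE]
  called <- colnames(calls)[calls[1, ] > 0]

  # Group the called functions by the package that supplies them.
  deps <- lapply(pkgEnvs, function(env) intersect(called, ls(env)))
  names(deps) <- pkgs
  deps
}
```

Applied over several scripts, e.g. listDeps <- lapply(scriptFiles, scriptDependencies, pkgs = "data.table") with scriptFiles a hypothetical vector of paths, this gives a per-script summary comparable to depMat.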
Although I've not posted pictures of the graphs produced by foodweb for my code, you can see some nice examples at http://web.archive.org/web/20120413190726/http://www.sigmafield.org/2010/09/21/r-function-of-the-day-foodweb. In my case, its output definitely captures my code's use of data.table's := and J, along with standard named functions like key and as.data.table. It seems to obviate my text searches and is an improvement in several ways (e.g. finding functions that I'd overlooked).

All in all, foodweb is an excellent tool, and I encourage others to explore the mvbutils package and some of Mark Bravington's other nice packages, such as debug. If you do install mvbutils, just check out ?changed.funs if you think that only you struggle with managing evolving R code. :)