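Elegant way to check for missing packages and install them?
I seem to be sharing a lot of code with coauthors these days. Many of them are novice/intermediate R users and don't realize that they have to install packages they do not already have.
Is there an elegant way to call installed.packages(), compare that to the packages I am loading, and install any that are missing?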
Using the lapply family and an anonymous function, you can try to load each package, install only the missing ones (relying on the lazy evaluation of ||), and print each package's final load status (TRUE/FALSE).
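The code for the lapply-family answer is not shown, so the following is only a minimal sketch of the idea it describes. The helper name load_or_install and the CRAN mirror URL are assumptions made for illustration, and vapply stands in for whichever member of the lapply family the answer used.

```r
# Hypothetical helper: attach each package, installing it first only if needed.
load_or_install <- function(pkgs, repos = "https://cloud.r-project.org") {
  status <- vapply(pkgs, function(p) {
    # `||` short-circuits: the braces run only when the first require() is FALSE.
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE)) || {
      install.packages(p, repos = repos)
      require(p, character.only = TRUE, quietly = TRUE)
    }
  }, logical(1))
  print(status)      # named logical vector: final load status per package
  invisible(status)
}

# Base packages are used here so that nothing gets installed when trying it out.
load_or_install(c("stats", "utils"))
```

Because the install step sits on the right-hand side of ||, packages that already load are never reinstalled.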
I realize this is an older question, but R's tooling has improved greatly in recent years to address this exact issue. The generally accepted modern approach for sharing code whose package dependencies may or may not be installed on a collaborator's computer or the target computer is the renv package ("renv" is short for "Reproducible Environment"), which was first made available as renv 0.8.0 in late October 2019. Version 1.0.0 was officially released on July 7, 2023, and, as of this post, the current CRAN release is version 1.0.2, which was made available on August 15, 2023.
This package greatly aids R collaborators in sharing code and packages (their exact versions, their dependencies, and the dependencies' versions), and it even tracks the exact version of R that was used to build and execute code in an R project.
To use it, the initial programmer typically creates an R project in RStudio (Posit) and then initializes renv within the project using the command renv::init(). When this is executed, the following happens:
1. A project-specific library is set up (see .libPaths()). As part of this step, renv also creates an renv directory containing settings and files that most users don't need to bother with, as renv takes care of them for you.
2. A .Rprofile file is created. This is a project-specific R profile that configures R to use the project-specific library created in step 1 when you open the project.
3. A lockfile named renv.lock is created in the project home directory. It logs details about all the packages and versions used by the project, and this is what makes renv the most attractive option for sharing code with package and package/version dependencies.
After initializing the project environment with the init() command, the initial programmer develops and writes code as usual, installing packages in the usual manner (e.g., by simply entering install.packages("PackageName") into the console) along the way as they are needed. These packages are stored in the project library, however, rather than in any global library. When the initial programmer feels the code is in a satisfactory state to share the entire project, they issue the renv::snapshot() command, which updates the lockfile created in step 3 above with the names, version numbers, etc. of all the packages used in the project.
At this point the initial programmer can share the project with other collaborators (ideally through GitHub, Bitbucket, Azure DevOps, etc.). When new collaborators open the R project, they simply enter the command renv::restore() at the console. This installs the exact same version of every package used in the project (into the project library rather than the user's global package library).
This is an excellent workflow because it ensures that all users are working with the same versions of the packages the initial developer used. Without renv, a collaborator may have a totally different version of an R package installed in their global library, and it may behave differently from the initial developer's version. This could result in the code breaking or, worse yet, running but producing different output, possibly without the end users noticing.
As described in the Introduction to renv vignette, renv benefits users by ensuring package isolation, reproducibility across users and environments, and portability. renv also offers other benefits, such as caching of R packages so that they don't have to be downloaded multiple times when used across projects. It also works with Python and Python packages. If you also want to ensure that every detail of an environment is in place, including the operating system and other dependencies external to R and RStudio, you can combine renv with a Docker image to fully containerize a solution.
One final note: if renv seems similar to packrat, that's because it is, somewhat. In fact, the developers of renv first attempted to build a reproducible environment package using packrat but gave up and "rolled their own" after struggling with its limitations. I'd argue it is vastly better, too. The good news is that if you're already using packrat and want to upgrade to this modern tool, you can easily do so with the renv::migrate() command.
Good luck and happy collaborating!
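The three renv commands named in this answer can be summarized as a sketch. The wrapper function names below are hypothetical and exist only to label who runs which command; the sketch assumes renv is installed and that the working directory is the project root.

```r
# Hypothetical wrappers around the renv commands named in the answer.

# Initial programmer, once per project: creates the project library,
# the project .Rprofile and the renv.lock lockfile.
start_project <- function() {
  renv::init()
}

# Initial programmer, before sharing: records the packages and versions
# currently used by the project in renv.lock.
record_packages <- function() {
  renv::snapshot()
}

# Collaborator, after opening the shared project: installs the package
# versions recorded in renv.lock into the project library.
join_project <- function() {
  renv::restore()
}
```

In practice these would normally be typed directly at the console rather than wrapped; the functions are defined but not called here so that nothing is changed by pasting the block.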
I use the following, which checks whether a package is installed and whether its dependencies are up to date, and then loads the package.
Here's my code for it:
This works with unquoted package names and is fairly elegant (cf. GeoObserver's answer).
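The answer's code is not shown, so here is one possible sketch of a loader that accepts unquoted package names. The function name use and the mirror URL are assumptions for illustration; the technique is base R's substitute(), which captures the argument as written instead of evaluating it.

```r
# Hypothetical helper: accepts a bare (unquoted) or quoted package name.
use <- function(pkg, repos = "https://cloud.r-project.org") {
  name <- as.character(substitute(pkg))  # capture the argument without evaluating it
  if (!requireNamespace(name, quietly = TRUE)) {
    install.packages(name, repos = repos)
  }
  library(name, character.only = TRUE)
  invisible(name)
}

use(stats)     # bare name
use("utils")   # quoted name also works
```

One limitation of this sketch: passing a variable that holds a package name (p <- "stats"; use(p)) would look for a package literally called p.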
In my case, I wanted a one-liner that I could run from the command line (actually via a Makefile), for example to install "VGAM" and "feather" if they are not already installed. The same line can be run from within R.
There is nothing here beyond the previous solutions, except that the repos parameter is set explicitly (to avoid any popups asking about the mirror to use).
Also note the important character.only=TRUE (without it, require would try to load a package literally named p).
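A minimal sketch of such a one-liner, not the answer's exact code. The mirror URL is an assumption, and base packages stand in for the answer's "VGAM" and "feather" so that running it installs nothing.

```r
# From a shell or Makefile, the `for` line can be passed as a string to `Rscript -e`.
pkgs <- c("stats", "utils")  # swap in e.g. c("VGAM", "feather")
for (p in pkgs) if (!require(p, character.only = TRUE)) install.packages(p, repos = "https://cloud.r-project.org")
```

Setting repos avoids the interactive mirror prompt, and character.only = TRUE makes require use the value of p rather than the symbol p.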
Let me share a bit of madness:
Yes. If you have your list of packages, compare it to the output from installed.packages()[,"Package"] and install the missing packages.
Otherwise, if you put your code in a package and make them dependencies, they will automatically be installed when you install your package.
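A short sketch of the comparison described in this answer. The helper names missing_packages and install_missing and the mirror URL are assumptions for illustration; the comparison itself uses installed.packages()[, "Package"] as the answer suggests.

```r
# Hypothetical helper: which of `wanted` are absent from `installed`?
missing_packages <- function(wanted,
                             installed = installed.packages()[, "Package"]) {
  wanted[!(wanted %in% installed)]
}

# Hypothetical helper: install only what is missing, in a single call.
install_missing <- function(wanted, repos = "https://cloud.r-project.org") {
  new_pkgs <- missing_packages(wanted)
  if (length(new_pkgs) > 0) install.packages(new_pkgs, repos = repos)
  invisible(new_pkgs)
}

missing_packages(c("stats", "utils"))  # both ship with R, so nothing is missing
```

Keeping the comparison separate from the install step makes it easy to check what would be installed before anything is downloaded.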
Dason K. and I have the pacman package, which can do this nicely. The function p_load in the package does this. The first line is just to ensure that pacman is installed.
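A sketch of how the pacman approach might look, wrapped in a hypothetical function load_with_pacman so that nothing is installed when the block is pasted. The mirror URL is an assumption; p_load and its char argument are part of pacman.

```r
# Hypothetical wrapper around pacman::p_load().
load_with_pacman <- function(pkgs, repos = "https://cloud.r-project.org") {
  # First make sure pacman itself is available.
  if (!requireNamespace("pacman", quietly = TRUE)) {
    install.packages("pacman", repos = repos)
  }
  # p_load() installs any package that is missing and then loads it;
  # `char` takes the package names as a character vector.
  pacman::p_load(char = pkgs)
}
```

Interactively, pacman::p_load(ggplot2, Rcpp) with bare names is the more common form.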
You can just use the return value of require. I use library after the install because it will throw an exception if the install wasn't successful or the package can't be loaded for some other reason. You can make this more robust and reusable. The downside to this method is that you have to pass the package name in quotes, which you don't do for the real require.
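A minimal sketch of the pattern this answer describes, using the return value of require. The function name require_or_install and the mirror URL are assumptions, not taken from the answer.

```r
# Hypothetical wrapper: load a package, installing it first if require() fails.
require_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, dependencies = TRUE, repos = repos)
    # library() raises an error if the install failed or the package cannot load.
    library(pkg, character.only = TRUE)
  }
  invisible(TRUE)
}

require_or_install("stats")  # the name must be quoted, unlike the real require
```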
A lot of the answers above (and on duplicates of this question) rely on installed.packages, which is bad form. Its documentation warns that it can be slow when thousands of packages are installed, and advises against using it to find out whether a named package is installed (use system.file or find.package instead) or whether a package is usable (call require and check the return value).
So a better approach is to attempt to load the package using require and install it if loading fails (require will return FALSE if the package isn't found). The implementation I prefer loads all the packages, then goes back and installs all the missing ones (which, if you want, is a handy place to insert a prompt asking whether the user wants to install packages). Instead of calling install.packages separately for each package, it passes the whole vector of uninstalled packages just once.
The same function can also be written with a Windows dialog that asks whether the user wants to install the missing packages.
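The answer's implementation is not shown; the following sketch follows its description (try everything with require, then install all failures in one call). The function name using and the mirror URL are assumptions.

```r
# Hypothetical helper: load what is available, then install the rest in one batch.
using <- function(pkgs, repos = "https://cloud.r-project.org") {
  try_load <- function(p) {
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
  }
  loaded <- vapply(pkgs, try_load, logical(1))
  need <- pkgs[!loaded]
  if (length(need) > 0) {
    # A prompt asking the user for confirmation could be inserted here.
    install.packages(need, repos = repos)  # one call for the whole vector
    loaded[need] <- vapply(need, try_load, logical(1))
  }
  invisible(loaded)
}

using(c("stats", "utils"))
```

Passing the whole vector to install.packages once lets R resolve shared dependencies in a single pass.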
"ggplot2" is the package. The code checks whether the package is installed and, if it is not, installs it. It then loads the package regardless of which branch it took.
TL;DR: you can use find.package() for this.
Almost all the answers here rely on either (1) require() or (2) installed.packages() to check whether a given package is already installed.
I'm adding an answer because these are unsatisfactory for a lightweight approach to answering this question:
require has the side effect of loading the package's namespace, which may not always be desirable.
installed.packages is a bazooka to light a candle: it checks the universe of installed packages first, and only then do we check whether our one (or few) package(s) are "in stock" at this library. No need to build a haystack just to find a needle.
This answer was also inspired by @ArtemKlevtsov's great answer in a similar spirit on a duplicated version of this question. He noted that system.file(package=x) can have the desired effect of returning '' if the package isn't installed, and something with nchar > 1 otherwise.
If we look under the hood of how system.file accomplishes this, we can see it uses a different base function, find.package, which we could use directly.
We can also look under the hood at find.package to see how it works, but this is mainly an instructive exercise: the only ways I see to slim down the function would be to skip some robustness checks. The basic idea is to look in .libPaths(). Any installed package pkg will have a DESCRIPTION file at file.path(.libPaths(), pkg), so a quick-and-dirty check is file.exists(file.path(.libPaths(), pkg, 'DESCRIPTION')).
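The three checks discussed in the find.package answer can be put side by side in a short sketch. The helper name pkg_installed is an assumption; find.package, system.file and .libPaths are base R.

```r
# Hypothetical helper: TRUE if `pkg` is installed, without loading its namespace.
pkg_installed <- function(pkg) {
  # With quiet = TRUE, find.package() returns character(0) instead of an error.
  length(find.package(pkg, quiet = TRUE)) > 0
}

pkg_installed("stats")

# system.file() variant: an empty string means "not installed".
nzchar(system.file(package = "stats"))

# Quick-and-dirty variant: look for a DESCRIPTION file in every library path.
any(file.exists(file.path(.libPaths(), "stats", "DESCRIPTION")))
```

The any() wrapper is needed in the last check because .libPaths() usually returns more than one directory.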
This solution takes a character vector of package names and attempts to load them, or installs them if loading fails. It relies on the return behaviour of require to do this, because require returns (invisibly) a logical indicating whether the required package is available. Therefore, given a character vector of the packages you wish to load, we can simply see whether we were able to load each required package and, if not, install it with dependencies.
Although Shane's answer is really good, for one of my projects I needed to suppress the output messages and warnings and install packages automatically. I finally managed to put together a script for this.
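The script itself is not shown, so this is only a sketch of how messages and warnings might be silenced while loading or installing. The function name quiet_load and the mirror URL are assumptions; suppressWarnings, suppressPackageStartupMessages and the quiet argument of install.packages are base R.

```r
# Hypothetical helper: load packages quietly, installing any that are missing.
quiet_load <- function(pkgs, repos = "https://cloud.r-project.org") {
  for (p in pkgs) {
    ok <- suppressWarnings(suppressPackageStartupMessages(
      require(p, character.only = TRUE, quietly = TRUE)
    ))
    if (!ok) {
      install.packages(p, repos = repos, quiet = TRUE)
      suppressPackageStartupMessages(library(p, character.only = TRUE))
    }
  }
  invisible(pkgs)
}

quiet_load(c("stats", "utils"))
```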
'Packrat' has now been replaced in RStudio by 'renv'. See the comment below. So my answer is no longer relevant.
------------------------------------
Use packrat so that the shared libraries are exactly the same and nobody else's environment is changed.
In terms of elegance and best practice, I think you're fundamentally going about it the wrong way. The packrat package was designed for these issues. It is developed by RStudio. Instead of collaborators having to install dependencies and possibly mess up their system environment, packrat uses its own directory, installs all the dependencies for your programs there, and doesn't touch anyone's environment.
https://rstudio.github.io/packrat/
This is the purpose of the rbundler package: to provide a way to control the packages that are installed for a specific project. Right now the package works with the devtools functionality to install packages to your project's directory. The functionality is similar to Ruby's bundler.
If your project is a package (recommended), then all you have to do is load rbundler and bundle the packages. The bundle function will look at your package's DESCRIPTION file to determine which packages to bundle. The packages will then be installed in the .Rbundle directory.
If your project isn't a package, then you can fake it by creating a DESCRIPTION file in your project's root directory with a Depends field that lists the packages that you want installed (with optional version information).
The project's GitHub repo, rbundler, is available if you're interested in contributing.
You can simply use the setdiff function to get the packages that aren't installed and then install them, for example checking whether the ggplot2 and Rcpp packages are installed before installing them. The whole thing can also be written in one line.
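A sketch of the setdiff approach. The mirror URL is an assumption, and base packages replace the answer's ggplot2 and Rcpp so that the example installs nothing.

```r
wanted <- c("stats", "utils")  # the answer's example checks c("ggplot2", "Rcpp")

# setdiff() keeps the wanted names that are not among the installed packages.
to_install <- setdiff(wanted, rownames(installed.packages()))

# Only call install.packages() when something is actually missing.
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```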
The current version of RStudio (>=1.2) includes a feature that detects missing packages in library() and require() calls and prompts the user to install them. This seems to address the OP's original concern particularly well.
Sure.
You need to compare 'installed packages' with 'desired packages'. That's very close to what I do with CRANberries, as I need to compare 'stored known packages' with 'currently known packages' to determine new and/or updated packages.
So get all known packages, make a similar call for the currently installed packages, and compare that to a given set of target packages.
Today, I stumbled on two handy functions provided by the rlang package, namely is_installed() and check_installed().
Created on 2022-03-25 by the reprex package (v2.0.1)
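A brief sketch of how the two rlang functions might be called. It is guarded so that it does nothing when rlang is not installed; the variable name have_stats and the reason text are made up for illustration.

```r
if (requireNamespace("rlang", quietly = TRUE)) {
  # is_installed() only reports TRUE/FALSE; it never installs anything.
  have_stats <- rlang::is_installed("stats")

  # check_installed() signals an error if the package is missing; in an
  # interactive session it first offers to install it. `reason` is appended
  # to the message shown to the user.
  rlang::check_installed("stats", reason = "to run this example")
}
```

is_installed() suits optional features, while check_installed() fits the top of a function that cannot proceed without the package.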
The following simple function works like a charm (not mine; I found it on the web some time back and have been using it ever since. I'm not sure of the original source):
I use the following function to install a package if require("<package>") exits with a package-not-found error. It queries both the CRAN and Bioconductor repositories for the missing package.
Adapted from the original work by Joshua Wiley,
http://r.789695.n4.nabble.com/Install-package-automatically-if-not-there-td2267532.html
PS: update.packages(ask = FALSE) and biocLite(character(), ask=FALSE) will update all installed packages on the system. This can take a long time; consider it a full R upgrade, which may not always be warranted!
Thought I'd contribute the one I use:
I have implemented a function to install and load required R packages silently. Hope it helps. Here is the code:
Quite a basic one.
Regarding your main objective, "to install libraries they don't already have", and regardless of using installed.packages(): the following function masks the original require function. It tries to load the named package x; if the package is not installed, it installs it directly, including dependencies, and finally loads it normally. You can rename the function from 'require' to 'library' to maintain integrity. The only limitation is that package names must be quoted.
So you can load and install packages the old-fashioned R way:
require("ggplot2")
require("Rcpp")