How do I provide my identity/email when connecting to NCBI through rentrez?
My project head is telling me that it's unacceptable to connect to NCBI to retrieve sequence entries without sending along identifying information, such as our institution's email. They claim this means NCBI won't instantly block our connection if we violate their user guidelines; they'll email us first. We are using RStudio with the rentrez package to retrieve protein sequences from NCBI GenBank.
But I'm not certain that's necessary, or whether rentrez even has a way to do that. For reference, this is the general format of our code:
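library(rentrez)
sequence <- entrez_fetch(db = "nuccore", id = accession_number, rettype = "fasta")  # accession_number holds a GenBank accession; protein records are in db = "protein"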
The rentrez documentation says: "The NCBI will ban IPs that don't use EUtils within their user guidelines. In particular:
1. Don't send more than three requests per second (rentrez enforces this limit)
2. If you plan on sending a sequence of more than ~100 requests, do so outside of peak times for the US
3. For large requests use the web history method (see examples for entrez_search or use entrez_post to upload IDs)"
Both entrez_search and entrez_post include an argument documented as "web_history: A web_history object for use in subsequent calls to NCBI", but I'm not sure this is what I'm looking for.
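As far as I can tell, web_history addresses the third guideline (batching large requests) rather than identification. A minimal sketch of that method, assuming rentrez is installed: the IDs are uploaded once with entrez_post(), and entrez_fetch() then pages through the stored set using retmax and retstart (which E-utilities counts from 0). The helper name fetch_fasta_via_history and the chunk size of 200 are hypothetical choices, not part of rentrez.

```r
# Hypothetical helper sketching the web history method with rentrez.
# Defining the function does not contact NCBI; calling it does.
fetch_fasta_via_history <- function(ids, db = "protein", chunk_size = 200) {
  stopifnot(length(ids) > 0)
  # entrez_post() stores the IDs on NCBI's history server and returns a
  # web_history object that refers to them
  upload <- rentrez::entrez_post(db = db, id = ids)
  # retstart is 0-based, so the chunks start at 0, chunk_size, 2 * chunk_size, ...
  starts <- seq(0, length(ids) - 1, by = chunk_size)
  chunks <- vapply(starts, function(s) {
    # each call fetches one page of the stored set instead of re-sending every ID
    rentrez::entrez_fetch(db = db, web_history = upload, rettype = "fasta",
                          retmax = chunk_size, retstart = s)
  }, character(1))
  # concatenate the FASTA text returned for each page
  paste(chunks, collapse = "")
}
```

Calling fetch_fasta_via_history(my_ids) would make one upload request plus one fetch per chunk, which stays well inside the per-second limit that rentrez enforces.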
I can't find any arguments or functions that let the user send identifying information to NCBI when connecting.
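For context, the E-utilities themselves identify a caller through the tool and email query parameters, plus an optional api_key. The sketch below only builds such an efetch URL in base R to show where that information travels; it sends no request. The helper build_efetch_url, the tool name, the email address and the accession placeholder are all hypothetical, and the sketch does not establish whether rentrez lets you override these parameters.

```r
# Hypothetical helper: builds an NCBI E-utilities efetch URL carrying
# identifying information as query parameters. No network call is made.
build_efetch_url <- function(db, id, rettype, tool, email, api_key = NULL) {
  params <- list(db = db, id = id, rettype = rettype, tool = tool, email = email)
  # api_key is optional; with a key NCBI allows more requests per second
  if (!is.null(api_key)) params$api_key <- api_key
  # percent-encode each value (e.g. "@" becomes "%40") and join as key=value pairs
  values <- vapply(params, function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?", query)
}

url <- build_efetch_url(
  db = "protein", id = "ACCESSION_PLACEHOLDER", rettype = "fasta",
  tool = "my_lab_pipeline", email = "someone@example.edu"
)
print(url)
```

The resulting query string ends in tool=my_lab_pipeline&email=someone%40example.edu, which is the kind of identifying information the project head is asking about.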
It seems like you need an API key. You can get one interactively from your NCBI account, and it needs to be specified in your .bash_profile (at least on a Mac using bash; I'm not sure about your OS/terminal of choice). For command-line usage it just needs to be set as a variable, with the following line added to your profile:
export ENTREZ_KEY=your_api_key_here
Then, as long as R loads that profile when it starts up, you should be fine.
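An R session started from a desktop launcher may not read .bash_profile, so it can help to set and check the key from inside R. A minimal sketch, assuming (as I understand rentrez to work) that it reads the ENTREZ_KEY environment variable when building each request; "your_api_key_here" is a placeholder. The rentrez vignette also shows set_entrez_key() for the same purpose, and a line ENTREZ_KEY=your_api_key_here in ~/.Renviron makes the setting persist, since R reads that file at startup regardless of the shell.

```r
# Set the API key for the current R session only.
# "your_api_key_here" is a placeholder for the key from your NCBI account.
Sys.setenv(ENTREZ_KEY = "your_api_key_here")

# Check that the variable is visible to this session (assumed to be where
# rentrez looks for the key)
key <- Sys.getenv("ENTREZ_KEY")
if (nzchar(key)) {
  message("ENTREZ_KEY is set (", nchar(key), " characters)")
} else {
  message("ENTREZ_KEY is not set; requests will go out without an API key")
}
```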
EDIT:
A bit of a tangential note: you can grab files from the FTP site with utilities like curl and wget, or even Biostrings functions like readDNAStringSet(), without an API key. If you're going to access things through the E-utilities, though, you need one once you go over the per-second query limit (three requests per second without a key; a key raises it to ten). If you stay under that threshold, I don't think they care that much.