Retrieving GWAS information using R
I am trying to get specific disease-related information from the GWAS catalog. This can be done directly from the website via a spreadsheet download. But I was wondering if I could possibly do it programmatically in R. Any suggestions will be greatly appreciated.
Thanks.
Avoks
Comments (2)
Check out the function download.file() and the package RCurl (http://cran.r-project.org/web/packages/RCurl/index.html). This should do what you are looking for.
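A minimal sketch of the download.file() route. The helper name fetch_gwas and the download URL are assumptions for illustration and are not taken from the answer; check the GWAS Catalog download page for the current link before relying on it.

```r
# Hypothetical helper: download a file to disk and return the local path.
fetch_gwas <- function(url, destfile = tempfile(fileext = ".tsv")) {
  # mode = "wb" avoids line-ending translation on Windows
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  # download.file() returns 0 on success
  if (status != 0) stop("download failed with status ", status)
  destfile
}

# Assumed URL of the full catalog export; verify it before use.
gwas_url <- "https://www.ebi.ac.uk/gwas/api/search/downloads/full"
# tsv_path <- fetch_gwas(gwas_url)
```

The call itself is left commented out because the full export is large; uncomment it once the URL has been confirmed.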
You will have to download the .tsv file(s) first and edit them manually.
This is because GWAS Catalog files contain HTML character entities, such as &#231; in "Beh&#231;et's disease" (encoding that special fourth letter, ç). With the default settings of read.table(), R treats the # in these entities as the start of a comment and ignores the rest of the line, so the row is cut short and you will get an error message.
So download the file first, open it in a plain text editor, replace every # with nothing (an empty string), and only then load it into R.
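The clean-and-load step can be sketched in R itself. Both helper names below are hypothetical. strip_hashes mirrors the manual edit described above. read_gwas is an alternative that is not part of the answer: it leaves the file untouched and passes comment.char = "" so that # is read as an ordinary character, and quote = "" so that apostrophes such as the one in "Behçet's" are not taken as quote marks.

```r
# Hypothetical helper: remove every '#' from a file, mirroring the manual edit.
strip_hashes <- function(infile, outfile = tempfile(fileext = ".tsv")) {
  lines <- readLines(infile, warn = FALSE)
  writeLines(gsub("#", "", lines, fixed = TRUE), outfile)
  outfile
}

# Hypothetical helper: read the tab-separated file as-is,
# with comment and quote handling switched off.
read_gwas <- function(path) {
  read.table(path, header = TRUE, sep = "\t", quote = "",
             comment.char = "", fill = TRUE,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Usage, assuming tsv_path points at a downloaded catalog file:
# gwas <- read_gwas(tsv_path)
# or, following the answer's approach:
# gwas <- read.table(strip_hashes(tsv_path), header = TRUE, sep = "\t",
#                    quote = "", fill = TRUE)
```

Note that read.delim() already defaults to comment.char = "", so it may work on the unedited file as well; the explicit read.table() call is used here only to make the relevant settings visible.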