Reading a CSV file over SSL with R

Posted on 2024-09-30 19:32:15

Now that the whole world is clamoring to use SSL all the time (a decision that makes a lot of sense), some of us who have used GitHub and related services to store CSV files have a bit of a challenge. The read.csv() function does not support SSL when reading from a URL. To get around this I'm doing a little dance I like to call the SSL kabuki dance: I grab the text file with RCurl, write it to a temp file, then read it with read.csv(). Is there a smoother way of doing this? Better workarounds?

Here's a simple example of the SSL kabuki:

library(RCurl)
# Fetch the raw CSV text over HTTPS
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
# Write the text to a temporary file
temporaryFile <- tempfile()
con <- file(temporaryFile, open = "w")
cat(myCsv, file = con)
close(con)

# Read the CSV back from the temporary file
read.csv(temporaryFile)

Comments (6)

萤火眠眠 2024-10-07 19:32:15

No need to write it to a file - just use textConnection()

require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
WhatJDwants <- read.csv(textConnection(myCsv))
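
A small offline sketch of the same idea: the string below stands in for what getURL() would return (the sample data is an assumption, made up to resemble test.csv). It parses the string twice, once through textConnection() as above and once through the text= argument of read.csv(), which sets up the text connection itself.

```r
# Stand-in for the string getURL() would return (assumed sample data)
csv_text <- "a,b\n1,2\n2,3\n3,4\n4,5\n"

# Option 1: wrap the string in a text connection, then close it explicitly
con <- textConnection(csv_text)
via_connection <- read.csv(con)
close(con)

# Option 2: let read.csv() build the connection from the text= argument
via_text_arg <- read.csv(text = csv_text)
```

Both calls should give the same data frame; the text= form simply avoids managing the connection by hand.
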
单身情人 2024-10-07 19:32:15

Using Dirk's advice to explore method="" resulted in this slightly more concise approach which does not depend on the external RCurl package.

# Download with the external curl program, then read the local copy
temporaryFile <- tempfile()
download.file("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv", destfile = temporaryFile, method = "curl")
read.csv(temporaryFile)

But it appears that I can't just set options("download.file.method"="curl")
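
If this pattern is needed repeatedly, it could be wrapped in a small helper. The sketch below is only an illustration: read_csv_via_download() is a hypothetical name, not a base R function, and it assumes the chosen download method is available on the machine (method = "curl" needs the curl program on the PATH).

```r
# Hypothetical helper: download a CSV to a temp file, read it, then clean up.
read_csv_via_download <- function(url, method = "curl", ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temp file when the function exits
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }
  read.csv(tmp, ...)  # extra arguments are passed on to read.csv()
}

# Usage (needs network access and curl):
# read_csv_via_download("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
```
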

皓月长歌 2024-10-07 19:32:15

Yes -- see help(download.file) which is pointed to by read.csv() and all its cousins. The method= argument there has:

method Method to be used for downloading files. Currently download methods "internal", "wget", "curl" and "lynx" are available, and there is a value "auto": see ‘Details’. The method can also be set through the option "download.file.method": see options().

and you can then set this option through options():

download.file.method:
Method to be used for download.file. Currently download methods "internal", "wget" and "lynx" are available. There is no default for this option, when method = "auto" is chosen: see download.file.

to turn to the external program curl, rather than the RCurl package.

Edit: Looks like I was half-right and half-wrong. read.csv() et al do not use the selected method, one needs to manually employ download.file() (which then uses curl or other selected methods). Other functions that do use download.file() (such as package installation or updates) will profit from setting the option, but for JD's initial query concerning csv files over https, an explicit download.file() is needed before read.csv() of the downloaded file.
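
To make the scope of the option concrete, here is a minimal sketch of setting download.file.method temporarily. The helper name with_download_method() is hypothetical. As the edit above notes, the option only affects calls that go through download.file() without an explicit method=, so read.csv() on a URL is unaffected.

```r
# Hypothetical helper: evaluate an expression with download.file.method set,
# then restore whatever value the option had before.
with_download_method <- function(method, expr) {
  old <- options(download.file.method = method)
  on.exit(options(old), add = TRUE)
  force(expr)  # expr is evaluated lazily, i.e. after the option is set
}

# Usage (needs network access and curl; url and tmp are placeholders):
# with_download_method("curl", download.file(url, destfile = tmp))
```
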

安穩 2024-10-07 19:32:15

R core should open up the R connections as a C API. I've proposed this in the past:

https://stat.ethz.ch/pipermail/r-devel/2006-October/043056.html

with no response.

对风讲故事 2024-10-07 19:32:15

Given that this question comes up a lot, I've been working on a package to seamlessly handle HTTPS/SSL data. The package is called rio. A version of it is on CRAN but the newest version that now supports this is only available on GitHub. Once you've installed the package, you can read in data in one line:

# install and load rio
library("devtools")
install_github("leeper/rio")
library("rio")

# import
import("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
##   a b
## 1 1 2
## 2 2 3
## 3 3 4
## 4 4 5

Basically, import handles the manual download (using curl) and then infers the file format from the file extension, thus creating a dataframe without needing to know what function to use or how to download it.
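
The extension-based dispatch can be illustrated in base R. This is not rio's actual implementation, only a simplified sketch of the idea: import_sketch() is a hypothetical name, it handles local files only, and it covers just a few formats.

```r
# Hypothetical, simplified illustration of choosing a reader by file extension.
import_sketch <- function(path) {
  ext <- tolower(tools::file_ext(path))
  reader <- switch(ext,
    csv = read.csv,    # comma-separated values
    tsv = read.delim,  # tab-separated values
    rds = readRDS,     # serialized R object
    stop("unsupported file extension: ", ext)
  )
  reader(path)
}
```

A downloaded file would first be saved with its original extension (for example via download.file()) and then passed to the function.
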

几度春秋 2024-10-07 19:32:15

I found that since Dropbox changed the way that they present links with https:// none of the above solutions work any more. Fortunately, I wasn't the first to make this discovery, and a solution was posted by Christopher Gandrud on r-bloggers:

http://www.r-bloggers.com/dropbox-r-data/

That approach works for me, after installing the repmis package and its dependencies.
