Downloading a file from the web via the R console

Posted on 2024-12-21 06:24:11

I'd like to download a log file with R via a download link, but I only get the unevaluated HTML.

This is what I tried, without success:

library(RCurl)

url = "http://statcounter.com/p7447608/csv/download_log_file?ufrom=1323783441&uto=1323860282"

# Path to the CA bundle shipped with RCurl (SSL certificates):
CAINFO = system.file("CurlSSL", "ca-bundle.crt", package = "RCurl")

curlH = getCurlHandle(
    header = FALSE,
    verbose = TRUE,
    netrc = TRUE,
    maxredirs = as.integer(20),
    followlocation = TRUE,
    userpwd = "me:mypassw",
    ssl.verifypeer = TRUE)

setwd(tempdir())
destfile = "log.csv"
x = getBinaryURL(url, curl = curlH,
                 cainfo = CAINFO)
# Write the raw response to disk so it can be inspected
writeBin(x, destfile)

# Windows only: open the file(s) in the working directory
shell.exec(dir())


Comments (2)

恏ㄋ傷疤忘ㄋ疼 2024-12-28 06:24:11

Here are two ways of downloading the file.

Renaming the downloaded file to log.html and opening it shows that the login is invalid, which is why you get the HTML structure. You need to add the login credentials to the URL.

You can get the name-value pairs from the HTML source code:

<label for="username2">Username:</label>
<input type="text" id="username2" name="form_user" value="" size="12" maxlength="64" class="large">
<span class="label-overlay">
<label for="password2">Password:</label>
<input type="password" name="form_pass" id="password2" value="" size="12" maxlength="64" class="large"> 

As you can see, the name-value pair for the username is form_user=USERNAME and the one for the password is form_pass=PASSWORD.

This is why the curl userpwd setting doesn't work: it supplies credentials for HTTP authentication and knows nothing about the ids or names of these form fields.
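If you would rather not read the field names off the page by eye, they can be pulled out of the HTML with a couple of base R calls. The following is only a rough sketch: input_names is a hypothetical helper (not part of RCurl or any package), and the regular expression assumes the attributes are written as name="..." with double quotes, as in the fragment above. A proper HTML parser would be more robust.

```r
# HTML fragment copied from the login form shown above.
html <- c(
  '<label for="username2">Username:</label>',
  '<input type="text" id="username2" name="form_user" value="" size="12" maxlength="64" class="large">',
  '<span class="label-overlay">',
  '<label for="password2">Password:</label>',
  '<input type="password" name="form_pass" id="password2" value="" size="12" maxlength="64" class="large">'
)

# Hypothetical helper: return the name attribute of every <input> line.
input_names <- function(lines) {
  # Keep only the lines that contain an <input> tag.
  inputs <- grep("<input", lines, fixed = TRUE, value = TRUE)
  # Extract the first ' name="..."' attribute from each of those lines.
  m <- regmatches(inputs, regexpr(' name="[^"]*"', inputs))
  # Strip the attribute syntax, leaving only the value.
  sub('^ name="([^"]*)"$', "\\1", m)
}

input_names(html)
# "form_user" "form_pass"
```

These are the names that have to appear as keys in the query string, paired with your own username and password.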

## URL for downloading, without login credentials.
url <- "http://statcounter.com/p7447608/csv/download_log_file?ufrom=1323783441&uto=1323860282"

USERNAME <- 'your username'
PASSWORD <- 'your password'

## URL for downloading, with login credentials appended. Use this one!
## URLencode() escapes characters such as spaces or '&' in the credentials.
url <- paste0(url,
              '&form_user=', URLencode(USERNAME, reserved = TRUE),
              '&form_pass=', URLencode(PASSWORD, reserved = TRUE))

## Method 1: using download.file
download.file(url, destfile = "log.csv")

csv.data <- read.csv("log.csv")
head(csv.data)

## Method 2: using RCurl
library(RCurl)

CAINFO <- system.file("CurlSSL", "ca-bundle.crt", package = "RCurl")

cookie <- 'cookiefile.txt'
curlH <- getCurlHandle(
    cookiefile = cookie,
    useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6",
    header = FALSE,
    verbose = TRUE,
    netrc = TRUE,
    maxredirs = as.integer(20),
    followlocation = TRUE,
    ssl.verifypeer = TRUE)

destfile <- "log2.csv"
content <- getBinaryURL(url, curl = curlH, cainfo = CAINFO)
## Write the raw bytes to file
writeBin(content, destfile)
## Read the CSV directly from the raw vector
csv.data2 <- read.csv(text = rawToChar(content))
head(csv.data2)
## Check that both methods returned the same data
all.equal(csv.data, csv.data2)
沧笙踏歌 2024-12-28 06:24:11

You don't seem to need SSL certificates etc., since the URL is http:, not https:. So maybe download.file(url, "log.csv") would work fine in this case?

I'd first make sure the URL and its response are correct outside of R.

...I used Chrome to access the URL and got a downloaded file, "StatCounter-Log-7447608.csv". It contains a CSV header followed by HTML?!

"Date and Time","IP Address","IP Address Label","Browser","Version","OS","Resolution","Country","Region","City","Postal Code","ISP","Returning Count","Page URL","Page Title","Came From","SE Name","SE Host","SE Term"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Author" content="StatCounter">
...
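
A response like the one above can be caught in R before it reaches read.csv. This is a minimal sketch using only base R: looks_like_html is a hypothetical helper name, the 50-line limit is an arbitrary choice, and the two temporary files are made-up stand-ins for a failed and a successful download.

```r
# Hypothetical helper: TRUE if any of the first n lines of the file
# looks like the start of an HTML page.
looks_like_html <- function(path, n = 50) {
  lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", lines, ignore.case = TRUE))
}

# Stand-in for the failed download: CSV header followed by an HTML page.
bad <- tempfile(fileext = ".csv")
writeLines(c('"Date and Time","IP Address"',
             '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
             '<html>'), bad)

# Stand-in for a successful download (the data row is invented for illustration).
good <- tempfile(fileext = ".csv")
writeLines(c('"Date and Time","IP Address"',
             '"2011-12-13 14:00:00","192.0.2.1"'), good)

looks_like_html(bad)   # TRUE
looks_like_html(good)  # FALSE
```

Running such a check on log.csv right after the download makes it obvious when the server has returned its login page instead of the log data.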