R cannot download a file from the web

Posted 01-17 10:47 · 779 words · 4 views · 0 comments

I can download a file from this website in my browser:
https://www.cmegroup.com/ftp/pub/settle/comex_future.csv

However, when I try the following:

url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"

dest <- "C:\\COMEXfut.csv"

download.file(url, dest)

I get the following error message:

Error in download.file(url, dest) : 
  cannot open URL 'https://www.cmegroup.com/ftp/pub/settle/comex_future.csv'
In addition: Warning message:
In download.file(url, dest) :
  InternetOpenUrl failed: 'The operation timed out'

This happens even if I raise the timeout first:

options(timeout = max(600, getOption("timeout")))

Any idea why this is happening? Thanks!

Comments (1)

香草可樂 2025-01-24 10:47:08

The problem here is that the site you are downloading from needs a couple of additional request headers, which your browser sends automatically but download.file() does not. The easiest way to supply them is with the httr package:

library(httr)

url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
UA <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
            'Gecko/20100101 Firefox/98.0')

res <- GET(url, add_headers(`User-Agent` = UA, Connection = 'keep-alive'))

This should download in less than a second.
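If you would rather stay in base R, the same idea can be sketched with download.file() itself. This is only a sketch under a few assumptions: it needs R 3.6.0 or later (where download.file() gained a `headers` argument), it has not been tested against this server, and `fetch_with_headers` is a hypothetical helper name, not part of any package. The User-Agent is set through the `HTTPUserAgent` option because R sends that option as the User-Agent header on its own.

```r
# Browser-like User-Agent string, copied from the httr example above.
UA <- paste("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)",
            "Gecko/20100101 Firefox/98.0")

# Hypothetical helper: download `url` to `dest` while presenting
# browser-like headers. Requires R >= 3.6.0 for the `headers` argument.
fetch_with_headers <- function(url, dest, user_agent = UA) {
  # R takes the User-Agent header from this option; restore it on exit.
  old <- options(HTTPUserAgent = user_agent)
  on.exit(options(old), add = TRUE)

  # mode = "wb" prevents line-ending translation on Windows.
  download.file(url, dest, mode = "wb",
                headers = c(Connection = "keep-alive"))
  invisible(dest)
}

# Example call (needs network access, so it is left commented out):
# fetch_with_headers(
#   "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv",
#   "COMEXfut.csv"
# )
```

If this still times out, the server may be checking something beyond these two headers, in which case the httr route above is the one reported to work.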

If you want to save the file, you can do:

writeBin(res$content, 'myfile.csv')
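One thing writeBin() will not tell you is whether the request actually succeeded: an error page would be written to disk just as happily as the CSV. A minimal sketch of a more defensive save is below, using httr's stop_for_status() and content(); `save_response` is a hypothetical helper name introduced here for illustration.

```r
library(httr)

# Hypothetical helper: write the body of an httr response to `path`,
# but only after confirming the request succeeded.
save_response <- function(res, path) {
  # Raises an R error for HTTP status codes of 400 and above.
  stop_for_status(res)

  # as = "raw" returns the body as a raw vector, the same bytes
  # that are stored in res$content.
  writeBin(content(res, as = "raw"), path)
  invisible(path)
}

# Usage, assuming `res` comes from the GET() call above:
# save_response(res, "myfile.csv")
```

Alternatively, httr can stream straight to disk if you pass write_disk("myfile.csv", overwrite = TRUE) as an extra argument to GET(), which avoids holding the whole body in memory.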

Or if you just want to read the data straight into R without even saving it, you can do:

content(res)
#> Rows: 527 Columns: 20
#> -- Column specification ----------------------------------------------------------------
#> Delimiter: ","
#> chr (10): PRODUCT SYMBOL, CONTRACT MONTH, CONTRACT DAY, CONTRACT, PRODUCT DESCRIPTIO...
#> dbl (10): CONTRACT YEAR, OPEN, HIGH, LOW, LAST, SETTLE, EST. VOL, PRIOR SETTLE, PRIO...
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 527 x 20
#>    `PRODUCT SYMBOL` `CONTRACT MONTH` `CONTRACT YEAR` `CONTRACT DAY` CONTRACT
#>    <chr>            <chr>                      <dbl> <chr>          <chr>   
#>  1 0GC              07                          2022 NA             0GCN22  
#>  2 4GC              03                          2022 NA             4GCH22  
#>  3 4GC              05                          2022 NA             4GCK22  
#>  4 4GC              06                          2022 NA             4GCM22  
#>  5 4GC              08                          2022 NA             4GCQ22  
#>  6 4GC              10                          2022 NA             4GCV22  
#>  7 4GC              12                          2022 NA             4GCZ22  
#>  8 4GC              02                          2023 NA             4GCG23  
#>  9 4GC              04                          2023 NA             4GCJ23  
#> 10 4GC              06                          2023 NA             4GCM23  
#> # ... with 517 more rows, and 15 more variables: PRODUCT DESCRIPTION <chr>, OPEN <dbl>,
#> #   HIGH <dbl>, HIGH AB INDICATOR <chr>, LOW <dbl>, LOW AB INDICATOR <chr>, LAST <dbl>,
#> #   LAST AB INDICATOR <chr>, SETTLE <dbl>, PT CHG <chr>, EST. VOL <dbl>,
#> #   PRIOR SETTLE <dbl>, PRIOR VOL <dbl>, PRIOR INT <dbl>, TRADEDATE <chr>
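
content(res) returns a tibble here because httr hands text/csv bodies to readr when it is installed. If you prefer a plain data frame, or want control over column types (note that `CONTRACT MONTH` is kept as the string "07" above, which a default read.csv() would turn into the number 7), a base R sketch could look like this. `parse_settle_csv` is a hypothetical helper, and the sample line at the end uses made-up values purely to show the call.

```r
# Hypothetical helper: parse CSV text into a data frame, reading every
# column as character first and converting only the named columns.
parse_settle_csv <- function(txt,
                             numeric_cols = c("CONTRACT YEAR", "SETTLE")) {
  # check.names = FALSE keeps headers such as "PRODUCT SYMBOL" intact;
  # colClasses = "character" preserves leading zeros like "07".
  df <- read.csv(text = txt, check.names = FALSE,
                 colClasses = "character")

  # Convert only the requested columns that are actually present.
  for (col in intersect(numeric_cols, names(df))) {
    df[[col]] <- as.numeric(df[[col]])
  }
  df
}

# With the response from above (needs network access):
# txt <- httr::content(res, as = "text", encoding = "UTF-8")
# settle <- parse_settle_csv(txt)

# Offline illustration with made-up values, not real settlement data:
sample_txt <- paste("PRODUCT SYMBOL,CONTRACT MONTH,CONTRACT YEAR,SETTLE",
                    "0GC,07,2022,1800.5", sep = "\n")
sample_df <- parse_settle_csv(sample_txt)
```

Which columns to convert is a judgment call; the defaults here are only an example and can be overridden through `numeric_cols`.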