Text files from zipped files, accessed via URL, get downloaded to the working directory instead of into the global environment in R

Posted on 2025-02-03 12:36:46


I am trying to retrieve multiple data txt files, matching a certain pattern, from multiple zipped files that I access through URLs. I wrote a script that downloads the desired data files from the URL, saves them in a list, and then rbinds all the data frames together. I then sapply the function over a list of URLs.

My desired end result is to have all the downloaded data from all URLs in a single dataframe in the global environment in R.

Currently, however, the individual files get downloaded into my working directory, which I don't want, and are not combined into a single dataframe. I'm wondering whether this problem stems from download.file, but I have been unable to find a solution or posts describing similar issues.

# list of urls
url_df = data.frame(model = c("rcp26", "rcp45", "rcp85"),  
                    url = c("https://b2share.eudat.eu/api/files/d4850267-3ce2-44f4-b5e3-8391a4f3dc27/LTER_site_data_from_EURO-CORDEX-RCMs_rel1.see_disclaimer.77c127c4-2ebe-453b-b5af-61858ff02e31.huss_historical_rcp26_day_txt.zip",
"https://b2share.eudat.eu/api/files/d4850267-3ce2-44f4-b5e3-8391a4f3dc27/LTER_site_data_from_EURO-CORDEX-RCMs_rel1.see_disclaimer.77c127c4-2ebe-453b-b5af-61858ff02e31.huss_historical_rcp45_day_txt.zip",
"https://b2share.eudat.eu/api/files/d4850267-3ce2-44f4-b5e3-8391a4f3dc27/LTER_site_data_from_EURO-CORDEX-RCMs_rel1.see_disclaimer.77c127c4-2ebe-453b-b5af-61858ff02e31.huss_historical_rcp85_day_txt.zip"))

# create empty dataframe where data will be saved
downloaded_data = data.frame()

# create function to retrieve desired files from a single url
get_data = function(url) {
  temp <- tempfile() # create temp file
  download.file(url,temp) # download file contained in the url
  
  # get a list of the desired files
  file.list <- grep("KNMI-RACMO22E.*txt|MPI-CSC-REMO.*txt|SMHI-RCA4.*txt", unzip(temp, list=TRUE)$Name, ignore.case=TRUE, value=TRUE)
  
  data.list = lapply(unzip(temp, files=file.list), read.table, header=FALSE,  comment.char = "", check.names = FALSE)
  
  # bind the dataframes in the list into one single dataframe
  bound_data = dplyr::bind_rows(data.list)
  
  downloaded_data = rbind(downloaded_data, bound_data )
  
  return(downloaded_data)
  
  unlink(temp)
}

# apply function over the list of urls
sapply(url_df$url, get_data)

Any help would be greatly appreciated!


Comments (1)

ぽ尐不点ル 2025-02-10 12:36:47


You can't refer to downloaded_data within the function -- the function is applied to each URL separately, and you then bind the results together to create downloaded_data. There were also some changes to the unzipping and reading in of the data to make sure the files are actually read in: unzip() extracts into the current working directory unless you pass exdir, which is why the stray files were appearing there.
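
As a quick illustration of the scoping point (a minimal sketch, not part of the original answer): a plain <- assignment inside an R function creates a local copy, so the global object is never updated.

x <- data.frame()                  # empty global data frame
f <- function() {
  x <- rbind(x, data.frame(a = 1)) # creates a *local* x; the global x is untouched
  nrow(x)                          # 1 inside the function
}
f()
#> [1] 1
nrow(x)                            # the global x is still empty
#> [1] 0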

# list of urls
url_df = data.frame(model = c("rcp26", "rcp45", "rcp85"),  
                    url = c("https://b2share.eudat.eu/api/files/d4850267-3ce2-44f4-b5e3-8391a4f3dc27/LTER_site_data_from_EURO-CORDEX-RCMs_rel1.see_disclaimer.77c127c4-2ebe-453b-b5af-61858ff02e31.huss_historical_rcp26_day_txt.zip",
                            "https://b2share.eudat.eu/api/files/d4850267-3ce2-44f4-b5e3-8391a4f3dc27/LTER_site_data_from_EURO-CORDEX-RCMs_rel1.see_disclaimer.77c127c4-2ebe-453b-b5af-61858ff02e31.huss_historical_rcp45_day_txt.zip",
                            "https://b2share.eudat.eu/api/files/d4850267-3ce2-44f4-b5e3-8391a4f3dc27/LTER_site_data_from_EURO-CORDEX-RCMs_rel1.see_disclaimer.77c127c4-2ebe-453b-b5af-61858ff02e31.huss_historical_rcp85_day_txt.zip"))

# create function to retrieve desired files from a single url
get_data = function(url) {
  temp <- tempdir() # use the session's temporary directory
  zipfile <- file.path(temp, "downloaded.zip")
  download.file(url, zipfile) # download the zip archive from the url
  downloaded_files <- unzip(zipfile, exdir = temp) # extract into the temp dir, not the working directory
  # keep only the txt files from the models of interest
  keep_files <- downloaded_files[grep("KNMI-RACMO22E.*txt|MPI-CSC-REMO.*txt|SMHI-RCA4.*txt", 
                                      downloaded_files)]
  data.list <- lapply(keep_files, read.table, header = FALSE, comment.char = "", check.names = FALSE)
  # bind the dataframes in the list into one single dataframe
  bound_data = dplyr::bind_rows(data.list)
  unlink(zipfile) # clean up the zip; in the original this came after return() and never ran
  return(bound_data)
}

# apply function over the list of urls
downloaded_data <- dplyr::bind_rows(lapply(url_df$url, get_data))
dim(downloaded_data)
#> [1] 912962      7
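
As a possible follow-up (an untested sketch building on get_data above): since url_df already carries a model column, dplyr::bind_rows() can label each row with the scenario it came from via its .id argument.

# tag each row with the RCP scenario it was downloaded from
data_list <- lapply(url_df$url, get_data)
names(data_list) <- url_df$model
downloaded_data <- dplyr::bind_rows(data_list, .id = "model")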