Question about a loop that combines different data types in one column

Posted on 2025-01-11 04:04:49

I have more than 1000 csv files that I would like to combine into a single file after some processing. To do this, I used a loop as follows:

> setwd("C:/....")
> # Get the names of all the csv files in the current directory.
> files <- dir(".", pattern = ".csv$")
>
> for (i in 1:length(files)) {
>   obj_name <- files %>% str_sub(end = -5)
>   assign(obj_name[i], read_csv(files[i]))
> }

Up to this point, it works well.

I then tried to collect the imported files into a list so that I could manipulate them all at once, as follows:

> command <- paste0("RawList <- list(", paste(obj_name, collapse = ","), ")")
> eval(parse(text = command))
>
> rm(i, obj_name, command, list = ls(pattern = "^g20"))
> Ref_com_list = list()

Up to this point, it is still okay. But ...

> for (i in 1:length(RawList)) {
>   df <- RawList[[i]] %>%
>     pivot_longer(cols = -A, names_to = "B", values_to = "C") %>%
>     mutate(time_sec = paste(YMD[i], B) %>% ymd_hms()) %>%
>     mutate(minute = format(as.POSIXct(B, format = "%H:%M:%S"), "%M"))
>
>   ...(some calculation)
>   Ref_com_list[[i]] <- file_all
> }
>
> Ref_com_all <- do.call(rbind, Ref_com_list)

At that point, I got the following error:

> Error: Can't combine `A` <double> and `B` <datetime<UTC>>. Run
> `rlang::last_error()` to see where the error occurred.

If I run the code on a single file, it works well. But when I run it in the for loop, the error shows up.
Could anyone tell me what the problem is?

Thanks a lot in advance.

梦里梦着梦中梦 2025-01-18 04:04:49

There is substantial scope for improvement in your code. Broadly speaking, if you are working in the tidyverse, you can pass multiple files to read_csv directly (this requires readr 2.0.0 or later). Example:

# Generate some sample files
tmp_dir <- fs::path_temp("some_csv_files")
fs::dir_create(tmp_dir)
for (i in 1:100) {
    readr::write_csv(mtcars, fs::file_temp(pattern = "cars",
                     tmp_dir = tmp_dir, ext = ".csv"))
}

# Actual file reading
dta_cars <- readr::read_csv(
    file = fs::dir_ls(path = tmp_dir, glob = "*.csv"),
    id = "file_path"
)

If you want to keep track of which file each row came from, passing id = "file_path" to read_csv stores the source path in a column. This is arguably more efficient and less error-prone than:

for (i in 1:length(files)) {
    obj_name <- files %>% str_sub(end = -5)
    assign(obj_name[i], read_csv(files[i]))
}

Reading all the files in a single read_csv call is much cleaner and will be faster than growing an object in a loop. Afterwards, you can proceed with your transformations:

dta_cars %>% ...
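Neither answer pins down the error itself. One plausible cause (an assumption, not confirmed by the asker) is that readr guesses different types for the same column in different files. Functions such as pivot_longer() and bind_rows() refuse to stack columns of different types and report a "Can't combine" message of this form. The sketch below uses two made-up files and made-up column names. It reads every column as character so that all files agree on types, then converts after pivoting:

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical demo data: in file2 one value column contains text, so readr
# would guess double for it in file1 and character in file2.
demo_dir <- file.path(tempdir(), "mixed_type_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(
  tibble(A = c(1, 2), `00:00:01` = c(1.5, 2.5), `00:00:02` = c(3.5, 4.5)),
  file.path(demo_dir, "file1.csv")
)
write_csv(
  tibble(A = c(1, 2), `00:00:01` = c(1.5, 2.5), `00:00:02` = c("bad", "4.5")),
  file.path(demo_dir, "file2.csv")
)

demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every column as character so that all files share the same types
raw_all <- read_csv(
  demo_files,
  id = "file_path",
  col_types = cols(.default = col_character())
)

# Pivot first, then convert the values; unparseable entries become NA
long_all <- raw_all %>%
  pivot_longer(cols = -c(file_path, A), names_to = "B", values_to = "C") %>%
  mutate(C = suppressWarnings(as.numeric(C)))
```

In the asker's per-file loop, the same idea would be to pass col_types to each read_csv call, or to harmonise the value columns inside pivot_longer with values_transform = list(C = as.character). Checking which file has the odd column type, for example with sapply(RawList[[i]], class), would confirm or rule out this explanation.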


噩梦成真你也成魔 2025-01-18 04:04:49

Try:

library(data.table)

files <- list.files(path = '.', full.names = TRUE, pattern = '\\.csv$')

# Replace ... with fread arguments such as sep, dec, etc.
files_open <- lapply(files, function(x) fread(x, ...))

big_file <- rbindlist(files_open)

# Replace ... with fwrite arguments such as sep, dec, the path to save the data, etc.
fwrite(big_file, ...)
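As a runnable variant of the outline above, here is a minimal sketch that also records which file each row came from. The directory, the file names (g2001, g2002) and the column names are made up for illustration. It relies on lapply keeping the names of its input and on the idcol argument of rbindlist:

```r
library(data.table)

# Hypothetical demo files standing in for the asker's csv files
demo_dir <- file.path(tempdir(), "dt_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(A = 1:2, x = c(1.5, 2.5)), file.path(demo_dir, "g2001.csv"))
fwrite(data.table(A = 3:4, x = c(3.5, 4.5)), file.path(demo_dir, "g2002.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path after its file (without extension); lapply keeps these names,
# so the list replaces the assign() / eval(parse()) steps from the question
names(files) <- tools::file_path_sans_ext(basename(files))
files_open <- lapply(files, fread)

# idcol stores the list names in a column; fill = TRUE pads missing columns with NA
big_file <- rbindlist(files_open, idcol = "source", use.names = TRUE, fill = TRUE)
```

Keeping the data in a named list from the start means a single file can still be inspected by name, for example files_open[["g2001"]], without creating one global object per file.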
