Question about a loop that combines different data types in one column

Posted on 2025-01-11 04:04:49

I have more than 1000 csv files that I would like to combine into a single file after some processing. To do that, I used a loop as follows:

> setwd("C:/....")
> files <- dir(".", pattern = ".csv$") # Get the names of all the csv files in the current directory.
>
> for (i in 1:length(files)) {
>   obj_name <- files %>% str_sub(end = -5)
>   assign(obj_name[i], read_csv(files[i]))
> }

Up to here, it works well.

I then tried to collect the imported files into a list so that I could manipulate them all at once, as follows:

> command <- paste0("RawList <- list(", paste(obj_name, collapse = ","), ")")
> eval(parse(text = command))
>
> rm(i, obj_name, command, list = ls(pattern = "^g20"))
> Ref_com_list = list()

Up to here, it is still okay. But ...

> for (i in 1:length(RawList)) {
>   df <- RawList[[i]] %>%
>     pivot_longer(cols = -A, names_to = "B", values_to = "C") %>%
>     mutate(time_sec = paste(YMD[i], B) %>% ymd_hms()) %>%
>     mutate(minute = format(as.POSIXct(B, format = "%H:%M:%S"), "%M"))
>
>   ...(some calculation)
>   Ref_com_list[[i]] <- file_all
> }
>
> Ref_com_all <- do.call(rbind, Ref_com_list)

At that point, I got the following error:

> Error: Can't combine `A` <double> and `B` <datetime<UTC>>. Run
> `rlang::last_error()` to see where the error occurred.

If I run the code on an individual file, it works well. But when I run it in the for loop, the error shows up.
Could anyone tell me what the problem is?

Thanks a lot in advance.

Comments (3)

梦里梦着梦中梦 2025-01-18 04:04:49

There is substantial scope for improvement in your code. Broadly speaking, if you are working in the tidyverse, you can pass multiple files to read_csv directly. Example:

# Generate some sample files
tmp_dir <- fs::path_temp("some_csv_files")
fs::dir_create(tmp_dir)
for (i in 1:100) {
    readr::write_csv(mtcars, fs::file_temp(pattern = "cars",
                     tmp_dir = tmp_dir, ext = ".csv"))
}

# Actual file reading
dta_cars <- readr::read_csv(
    file = fs::dir_ls(path = tmp_dir, glob = "*.csv"),
    id = "file_path"
)

If you want to keep track of which file each row came from, id = "file_path" in read_csv stores the source path in a column. This is arguably more efficient and less error-prone than:

for (i in 1:length(files)) {
    obj_name <- files %>% str_sub(end = -5)
    assign(obj_name[i], read_csv(files[i]))
}

It is also much cleaner, and it will be faster than growing an object in a loop. Afterwards you can carry on with your transformations:

dta_cars %>% ...
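
If passing several files to read_csv is not an option, the same idea can be sketched in base R: read every file into one named list with lapply instead of creating one object per file with assign(). This is only a minimal sketch that assumes all files share the same columns; the temporary directory, the sample file names and the name raw_list are made up for illustration.

```r
# Hypothetical sample data: three small CSV files in a fresh temporary directory.
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
for (nm in c("g20_a", "g20_b", "g20_c")) {
  write.csv(head(mtcars, 5), file.path(tmp_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# List the files; "\\.csv$" matches only names that end in ".csv".
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into one list element, named after the file (extension dropped).
raw_list <- lapply(files, read.csv)
names(raw_list) <- tools::file_path_sans_ext(basename(files))

# Bind once at the end instead of growing an object inside a loop.
combined <- do.call(rbind, raw_list)
```

A named list like this can be indexed with raw_list[["g20_a"]], so the eval(parse(...)) step from the question is no longer needed.
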
噩梦成真你也成魔 2025-01-18 04:04:49

Try:

library(data.table)

files <- list.files(path = '.', full.names = TRUE, pattern = '\\.csv$') # match only names ending in .csv

files_open <- lapply(files, function(x) fread(x, ...)) # ... for arguments like sep, dec, etc...

big_file <- rbindlist(files_open)

fwrite(big_file, ...) # ... for arguments like sep, dec, path to save data, etc...
掩耳倾听 2025-01-18 04:04:49

I have now found out why it happened. There was another file with a different kind of file name but the same file type, so the code read all the files and raised the error.
I am sorry for confusing you all.
Thank you so much!
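
For anyone who hits the same error, one way to catch a stray file before binding is to compare the column names and classes of every file, and then tighten the file pattern. The sketch below uses base R only. The "g20" prefix is just a guess taken from the "^g20" pattern in the question, and the sample files and the col_signature helper are hypothetical.

```r
# Hypothetical setup: two data files plus one unrelated CSV in the same folder.
tmp_dir <- tempfile("stray_demo")
dir.create(tmp_dir)
good <- data.frame(A = c(1.5, 2.5), B = c("00:00:01", "00:00:02"))
write.csv(good, file.path(tmp_dir, "g20_01.csv"), row.names = FALSE)
write.csv(good, file.path(tmp_dir, "g20_02.csv"), row.names = FALSE)
stray <- data.frame(A = c("x", "y"), note = c("other", "data"))
write.csv(stray, file.path(tmp_dir, "summary.csv"), row.names = FALSE)

# Read everything that ends in ".csv", as the loop in the question did.
all_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(all_files, read.csv)
names(tables) <- basename(all_files)

# Hypothetical helper: one string per file describing column names and classes.
col_signature <- function(df) {
  paste(names(df), vapply(df, function(col) class(col)[1], character(1)),
        sep = ":", collapse = ", ")
}
signatures <- vapply(tables, col_signature, character(1))

# Files whose signature differs from the most common one are suspects.
most_common <- names(which.max(table(signatures)))
suspects <- names(signatures)[signatures != most_common]
print(suspects)

# A stricter pattern keeps only the intended files (assumed prefix "g20").
wanted <- list.files(tmp_dir, pattern = "^g20.*\\.csv$", full.names = TRUE)
```

Checking the signatures first points at the odd file directly, which is easier than reading the type-mismatch error raised when the rows are bound.
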
