How to check in R whether the name of a list element contains "this text" and move on to the next element in a for loop?

Posted on 2025-01-13 03:31:25

I'm new to R and have a large list of 30 elements, each of which is a data frame with a few hundred rows and around 20 columns (this varies by data frame). Each data frame is named after its original .csv filename (for example "experiment data XYZ QWERTY 01"). How can I go through the whole list, filter only those data frames whose filename does not include a specific text, AND also add a unique id column to those filtered data frames (the id value would be the first three characters of that filename)? For example, all the elements/data frames/files in the list that include "XYZ QWERTY" as part of their name would not be filtered and do not need a unique id. I had this pseudo-style code:

for(i in 1:length(list_of_dataframes)){
  if 
  list_of_dataframes[[i]] contains "this text" then don't filter
  else
  list_of_dataframes[[i]] <- filter(list_of_dataframes[[i]], rule) AND add unique.id.of.first.three.char.of.list_of_dataframes[[i]]
}

Sorry if the terminology used here is a bit awkward; I am just starting out with programming and this is my first post here, so there is still a lot to learn. As a bonus, if you know any good resources/websites for learning to automate and do similar things with R, I would be more than glad to get some recommendations! :-)

EDIT:

The code I tried for the filtering part was:

for(i in 1:length(tbl)){
  if (!(str_detect (tbl[[i]], "OLD"))){
    tbl[[i]] <- filter(tbl[[i]], age < 50)
  }
}

However, there was an error message stating "argument is not an atomic vector; coercing" and "the condition has length > 1 and only the first element will be used". Is there any way to get this code working?

堇色安年 2025-01-20 03:31:25

Let there be a directory called files containing these csv files:

'experiment 1.csv'  'experiment 2.csv'  'experiment 3.csv'
'OLDexperiment 1.csv'  'OLDexperiment 2.csv'

The following gives you a list of data frames restricted by a filter condition (here: the file name does not contain the substring OLD). Remove the ! to include only the old experiments instead. A new column id containing the file path is added to each data frame:

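library(tidyverse)

# Show the files that are available
list.files("files")

# Full paths of all csv files in the directory "files"
paths <- list.files("files", full.names = TRUE)
# Name each element after its path so the names survive map()
names(paths) <- paths
list_of_dataframes <- paths %>% map(read_csv)

list_of_dataframes %>%
  # One row per data frame: name = file path, value = the data frame
  enframe() %>%
  # Keep only the files whose path does not contain "OLD"
  filter(!str_detect(name, "OLD")) %>%
  # Add the file path as an id column to each remaining data frame
  mutate(value = map2(name, value, ~ mutate(.y, id = .x))) %>%
  # Return the list of data frames
  pull(value)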
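
For comparison, the loop from the question can also be repaired directly. The messages quoted there most likely arise because str_detect() was given the data frame tbl[[i]] itself rather than its name, so the pattern was matched against every column and the if condition received more than one value. Below is a minimal sketch in base R, assuming the list elements are named after their files; the toy list tbl, its age column, and the helper names file_name and kept are made up for illustration:

```r
# Toy stand-in for the real list: the names play the role of the csv file names
tbl <- list(
  "ABC experiment 01" = data.frame(age = c(25, 61, 40)),
  "OLD experiment 02" = data.frame(age = c(30, 75)),
  "DEF experiment 03" = data.frame(age = c(55, 18))
)

for (i in seq_along(tbl)) {
  file_name <- names(tbl)[i]
  # Test the element's name (a single string), not the data frame itself
  if (!grepl("OLD", file_name, fixed = TRUE)) {
    # Keep only the rows that satisfy the rule
    kept <- tbl[[i]][tbl[[i]]$age < 50, , drop = FALSE]
    # id = first three characters of the file name, repeated for every kept row
    kept$id <- rep(substr(file_name, 1, 3), nrow(kept))
    tbl[[i]] <- kept
  }
}
```

With dplyr and stringr loaded, the condition could equally be written as !str_detect(names(tbl)[i], "OLD") and the row filter as filter(tbl[[i]], age < 50).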

A good resource to start with is the free book R for Data Science.

A much simpler approach skips the list entirely and produces one big combined table from the files matching the same condition:

list.files("files", full.names = TRUE) %>%
  tibble(id = .) %>%
  # discard old experiments
  filter(! id %>% str_detect("OLD")) %>%
  # read the csv table for every matching file
  mutate(data = id %>% map(read_csv)) %>%
  # combine the tables into one big one
  unnest(data)
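
Note that the id column here holds the full file path, whereas the question asked for the first three characters of the file name. One way to derive such a short id is sketched below in base R; the example paths and the names kept and short_id are made up for illustration:

```r
# Hypothetical paths, shaped like the output of list.files("files", full.names = TRUE)
paths <- c(
  "files/ABC experiment 1.csv",
  "files/OLDexperiment 2.csv",
  "files/XYZ experiment 3.csv"
)

# Discard the old experiments, matching on the file name only
kept <- paths[!grepl("OLD", basename(paths), fixed = TRUE)]

# basename() strips the directory; substr() keeps characters 1 to 3
short_id <- substr(basename(kept), 1, 3)
```

Inside the pipeline, the same expression could be applied with mutate(id = substr(basename(id), 1, 3)), placed after the step that reads the files, since read_csv still needs the full path. The short id is only unique if the file names differ in their first three characters.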
