Apply the same function to multiple data frames - R

Posted on 2025-01-15 01:25:38

I'm currently working with 8 data frames that share the same structure, and I would like to know how to apply the same steps and modifications to all of them at once.

I know this is possible by putting the data frames in a list and using lapply, but I can't work out how to write it.

The steps I need to perform are as follows:

# Lower-case the e-mail addresses (stringr)
df1$EMAIL <- str_to_lower(df1$EMAIL)
df2$EMAIL <- str_to_lower(df2$EMAIL)
dfn$EMAIL <- str_to_lower(dfn$EMAIL)
df8$EMAIL <- str_to_lower(df8$EMAIL)

# Strip accents from the e-mail addresses (stringi)
df1$EMAIL <- stri_trans_general(df1$EMAIL, "Latin-ASCII")
df2$EMAIL <- stri_trans_general(df2$EMAIL, "Latin-ASCII")
dfn$EMAIL <- stri_trans_general(dfn$EMAIL, "Latin-ASCII")
df8$EMAIL <- stri_trans_general(df8$EMAIL, "Latin-ASCII")

# Title-case the category (stringr)
df1$CATEGORY <- str_to_title(df1$CATEGORY)
df2$CATEGORY <- str_to_title(df2$CATEGORY)
dfn$CATEGORY <- str_to_title(dfn$CATEGORY)
df8$CATEGORY <- str_to_title(df8$CATEGORY)

# Keep only the columns of interest (dplyr)
df1_e <- select(df1, EMAIL, CATEGORY, COMPANY)
df2_e <- select(df2, EMAIL, CATEGORY, COMPANY)
dfn_e <- select(dfn, EMAIL, CATEGORY, COMPANY)
df8_e <- select(df8, EMAIL, CATEGORY, COMPANY)

# Stack everything and keep one row per e-mail address
EMAILS <- bind_rows(df1_e, df2_e, dfn_e, df8_e) %>% distinct(EMAIL, .keep_all = TRUE)

These steps are simple and quick to run one by one, but I would like to learn how to make the script shorter and more efficient.

Thanks in advance

Comments (1)

北渚 2025-01-22 01:25:38

As you have already identified, the general solution is to put the data frames in a list and apply lapply/map to each one.

Here's a solution using map_df from purrr. If the data frames are named df1, df2, ... df8, you can use mget to collect them into a named list. I have also created an id variable that records, for each row, the name of the data frame it came from.

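library(dplyr)    # transmute(), distinct(), %>%
library(purrr)    # map_df()
library(stringr)  # str_to_lower(), str_to_title()
library(stringi)  # stri_trans_general()

# mget() collects df1 ... df8 into a named list; .id stores each name in an "id" column
EMAILS <- map_df(mget(paste0('df', 1:8)), function(x) {
  x %>%
    transmute(EMAIL = str_to_lower(EMAIL) %>% stri_trans_general("Latin-ASCII"),
              CATEGORY = str_to_title(CATEGORY),
              COMPANY)
}, .id = 'id') %>% distinct(EMAIL, .keep_all = TRUE)  # keep the first row per e-mail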
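
Since the question asked about lapply specifically, here is a minimal sketch of the same idea with lapply followed by bind_rows. The two toy data frames, their values, the extra OTHER column, and the helper name clean_emails are assumptions made up for illustration; they are not from the original post. With the real data, the list would be built from df1 ... df8 instead.

```r
library(dplyr)    # select(), bind_rows(), distinct(), %>%
library(stringr)  # str_to_lower(), str_to_title()
library(stringi)  # stri_trans_general()

# Hypothetical toy data standing in for df1 ... df8
df1 <- data.frame(EMAIL = c("Jos\u00e9@Example.com", "ANA@example.com"),
                  CATEGORY = c("retail store", "WHOLESALE"),
                  COMPANY = c("Acme", "Globex"),
                  OTHER = 1:2,
                  stringsAsFactors = FALSE)
df2 <- data.frame(EMAIL = c("ana@example.com", "bob@example.com"),
                  CATEGORY = c("wholesale", "services"),
                  COMPANY = c("Globex", "Initech"),
                  OTHER = 3:4,
                  stringsAsFactors = FALSE)

# Hypothetical helper: the per-data-frame steps from the question
clean_emails <- function(df) {
  df$EMAIL <- stri_trans_general(str_to_lower(df$EMAIL), "Latin-ASCII")
  df$CATEGORY <- str_to_title(df$CATEGORY)
  select(df, EMAIL, CATEGORY, COMPANY)
}

# Put the data frames in a named list, then clean each element.
# With the real objects this could be mget(paste0("df", 1:8)).
dfs <- list(df1 = df1, df2 = df2)
cleaned <- lapply(dfs, clean_emails)

# Stack the cleaned pieces, record the source name in "id",
# and keep the first row seen for each e-mail address
EMAILS <- bind_rows(cleaned, .id = "id") %>%
  distinct(EMAIL, .keep_all = TRUE)
```

Keeping the cleaning steps in one function means a change to the steps is made in one place, and the intermediate list (cleaned) stays available if the individual cleaned data frames are needed before they are combined.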