Applying the same function to multiple data frames - R
I'm currently working with 8 data frames that share the same structure, and I would like to know how to apply the same steps and modifications to all of them at once.
I know this can be done by putting the data frames in a list and using lapply, but I can't work out how to write it.
The steps I need to perform are as follows:
library(stringr)  # str_to_lower(), str_to_title()
library(stringi)  # stri_trans_general()
library(dplyr)    # select(), bind_rows(), distinct()
df1$EMAIL <- str_to_lower(df1$EMAIL)
df2$EMAIL <- str_to_lower(df2$EMAIL)
dfn$EMAIL <- str_to_lower(dfn$EMAIL)
df8$EMAIL <- str_to_lower(df8$EMAIL)
df1$EMAIL <- stri_trans_general(df1$EMAIL, "Latin-ASCII")
df2$EMAIL <- stri_trans_general(df2$EMAIL, "Latin-ASCII")
dfn$EMAIL <- stri_trans_general(dfn$EMAIL, "Latin-ASCII")
df8$EMAIL <- stri_trans_general(df8$EMAIL, "Latin-ASCII")
df1$CATEGORY <- str_to_title(df1$CATEGORY)
df2$CATEGORY <- str_to_title(df2$CATEGORY)
dfn$CATEGORY <- str_to_title(dfn$CATEGORY)
df8$CATEGORY <- str_to_title(df8$CATEGORY)
df1_e <- select(df1, EMAIL, CATEGORY, COMPANY)
df2_e <- select(df2, EMAIL, CATEGORY, COMPANY)
dfn_e <- select(dfn, EMAIL, CATEGORY, COMPANY)
df8_e <- select(df8, EMAIL, CATEGORY, COMPANY)
EMAILS <- bind_rows(df1_e, df2_e, dfn_e, df8_e) %>% distinct(EMAIL, .keep_all = TRUE)
These are simple steps that don't take long to run one by one, but I would like to learn how to make the script shorter and more efficient.
Thanks in advance.
Comments (1)
As you have already identified, the general solution is to put the data frames in a list and apply lapply/map to each one. Here's a solution using map_df from purrr. If the data frames are named df1, df2, ... df8, you can use mget to create a list of data frames. I have also created an id variable that gives the source data frame's name for each row.
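The approach described above might be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the answer's original code: two small made-up data frames stand in for df1 ... df8, the OTHER column is invented to show that select drops extra columns, and clean_df is a hypothetical helper name.

```r
library(dplyr)
library(purrr)
library(stringr)
library(stringi)

# Made-up sample data standing in for df1 ... df8 (assumption for illustration)
df1 <- data.frame(
  EMAIL    = c("Jos\u00e9@Mail.com", "ANA@mail.com"),
  CATEGORY = c("retail store", "WHOLESALE"),
  COMPANY  = c("A", "B"),
  OTHER    = 1:2,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  EMAIL    = c("ana@mail.com", "Bob@Mail.com"),
  CATEGORY = c("wholesale", "online"),
  COMPANY  = c("B", "C"),
  OTHER    = 3:4,
  stringsAsFactors = FALSE
)

# Collect the data frames by name into a named list;
# with the real data this would be paste0("df", 1:8)
df_list <- mget(paste0("df", 1:2))

# Hypothetical helper: the cleaning steps from the question, applied to one data frame
clean_df <- function(df) {
  df %>%
    mutate(
      EMAIL    = stri_trans_general(str_to_lower(EMAIL), "Latin-ASCII"),
      CATEGORY = str_to_title(CATEGORY)
    ) %>%
    select(EMAIL, CATEGORY, COMPANY)
}

# Apply the helper to every data frame, row-bind the results, and record
# the source data frame's name in an "id" column
EMAILS <- map_df(df_list, clean_df, .id = "id") %>%
  distinct(EMAIL, .keep_all = TRUE)

# The same result with lapply instead of purrr
EMAILS_lapply <- bind_rows(lapply(df_list, clean_df), .id = "id") %>%
  distinct(EMAIL, .keep_all = TRUE)
```

Keeping the per-data-frame steps in one function means a change to the cleaning logic is made in a single place. In purrr 1.0 and later, map_df is marked as superseded; map followed by list_rbind is the suggested replacement, though map_df still works.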