Dealing with repetitive tasks in R
I often find myself having to perform repetitive tasks in R. It gets extremely frustrating to run the same function on one or more data structures over and over again.
For example, let's say I have three separate data frames in R, and I want to delete the rows in each data frame that contain a missing value. With three data frames, it's not all that difficult to run na.omit() on each of them, but it becomes extremely inefficient when one has a hundred similar data structures that require the same action.
df1 <- data.frame(Region=c("Asia","Africa","Europe","N.America","S.America",NA),
                  variable=c(2004,2004,2004,2004,2004,2004), value=c(35,20,20,50,30,NA))
df2 <- data.frame(Region=c("Asia","Africa","Europe","N.America","S.America",NA),
                  variable=c(2005,2005,2005,2005,2005,2005), value=c(55,350,40,90,99,NA))
df3 <- data.frame(Region=c("Asia","Africa","Europe","N.America","S.America",NA),
                  variable=c(2006,2006,2006,2006,2006,2006), value=c(300,200,200,500,300,NA))
tot04 <- na.omit(df1)
tot05 <- na.omit(df2)
tot06 <- na.omit(df3)
What are some general guidelines for dealing with repetitive tasks in R?
Yes, I recognise that the answer to this question is specific to the task that one faces, but I'm just asking about general things that a user should consider when they have a repetitive task.
Comments (3)
As a general guideline, if you have several objects that you want to apply the same operations to, you should collect them into one data structure. Then you can use loops, [sl]apply, etc. to do the operations in one go. In this case, instead of having separate data frames df1, df2, etc., you could put them into a list of data frames and then run na.omit on all of them:
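The code that accompanied this answer did not survive on this page. The list-based approach it describes (collect df1, df2, ... into a list, then run na.omit on every element) might look roughly like the sketch below. The small data frames and the list names tot04, tot05, tot06 are stand-ins chosen to mirror the question, not the answerer's exact code.

```r
# Small stand-ins for the question's data frames (one NA row each).
df1 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2004, value = c(35, 20, NA))
df2 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2005, value = c(55, 350, NA))
df3 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2006, value = c(300, 200, NA))

# Collect the data frames into one named list ...
dflist <- list(tot04 = df1, tot05 = df2, tot06 = df3)

# ... and apply na.omit() to every element in a single call.
dflist <- lapply(dflist, na.omit)

# Individual results are then available by name, e.g. dflist$tot04.
```

With a hundred data frames the lapply line stays the same; only the construction of the list changes.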
Besides @Hong Ooi's answer, I suggest looking into the packages plyr and reshape. In your case, the following might be useful:
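The snippet from this answer is missing here, so what follows is only a guess at the kind of thing plyr makes easy, not the answerer's code. plyr's ldply applies a function to each element of a list and row-binds the results into one data frame. The sketch falls back to a base R equivalent when plyr is not installed, and the data frames are small stand-ins for those in the question.

```r
# Small stand-ins for the question's data frames (one NA row each).
df1 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2004, value = c(35, 20, NA))
df2 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2005, value = c(55, 350, NA))
df3 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2006, value = c(300, 200, NA))

dfs <- list(df1, df2, df3)

if (requireNamespace("plyr", quietly = TRUE)) {
  # ldply: apply na.omit to each list element, then combine into one data frame.
  combined <- plyr::ldply(dfs, na.omit)
} else {
  # Base R fallback (an assumption for portability, not part of the answer).
  combined <- do.call(rbind, lapply(dfs, na.omit))
}
```

Since the variable column already identifies the year, a single combined data frame is often easier to work with than many separate ones.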
If the names are similar, you could iterate over them using the pattern argument to ls:
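The original snippet is not preserved on this page. A minimal sketch of iterating over similarly named objects with ls(pattern = ...) could look like this, assuming the objects are named df followed by digits and live in the current environment. The regular expression and the choice to overwrite the objects in place are assumptions.

```r
# Small stand-ins for the question's data frames (one NA row each).
df1 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2004, value = c(35, 20, NA))
df2 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2005, value = c(55, 350, NA))
df3 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2006, value = c(300, 200, NA))

# Work in the current environment (the global environment at top level).
env <- environment()

# Names of all objects matching "df" followed by digits.
df_names <- ls(envir = env, pattern = "^df[0-9]+$")

# Overwrite each matching object with its NA-free version.
for (nm in df_names) {
  assign(nm, na.omit(get(nm, envir = env)), envir = env)
}
```

This depends on a disciplined naming scheme: any other object whose name matches the pattern gets processed too.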
However, a more "R" way of doing it seems to be to use a separate environment and eapply:
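The answer's code is missing here as well. As a rough sketch of the idea, the data frames can be kept in their own environment so that eapply touches nothing else. The environment name df_env and the small data frames are hypothetical.

```r
# A dedicated environment that holds only the data frames to be cleaned.
df_env <- new.env()
df_env$df1 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2004, value = c(35, 20, NA))
df_env$df2 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2005, value = c(55, 350, NA))
df_env$df3 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2006, value = c(300, 200, NA))

# eapply() calls na.omit() on every object in the environment and returns
# a named list; the order of the elements is not guaranteed.
cleaned <- eapply(df_env, na.omit)
```

Compared with ls(pattern = ...), this does not rely on naming conventions: whatever is in the environment gets processed.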
This yields a named list with one cleaned data frame per object in the environment.
Unfortunately, this is one huge list, so getting the results out as separate objects is a little tricky. Something built around substitute should work in principle, but substitute is not picking out the list element names properly.
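The substitute-based attempt itself is not preserved on this page. One way to sidestep the problem, offered as a suggestion rather than as the answerer's solution, is to rely on the list's own names: base R's list2env copies each named element of a list into an environment as a separate object. The names tot04, tot05, tot06 mirror the question, and the data frames are small stand-ins.

```r
# Small stand-ins for the question's data frames (one NA row each).
df1 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2004, value = c(35, 20, NA))
df2 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2005, value = c(55, 350, NA))
df3 <- data.frame(Region = c("Asia", "Africa", NA), variable = 2006, value = c(300, 200, NA))

# A named list of cleaned data frames, as lapply() or eapply() would return.
cleaned <- lapply(list(tot04 = df1, tot05 = df2, tot06 = df3), na.omit)

# Create one object per list element in the current environment.
# Equivalent loop: for (nm in names(cleaned)) assign(nm, cleaned[[nm]])
list2env(cleaned, envir = environment())
```

In many cases it is simpler to leave the results in the list and index them by name, which avoids cluttering the workspace with a hundred objects.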