Call by reference in R (using a function to modify an object)
I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object; at least, that seems to be the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function, but the table itself is not changed. (I'm coming mainly from Ruby.)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object:
myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian)?
Use some structure other than a function? An object method?
The reason you're having trouble is that you are passing the object into the local namespace of the function (its environment, in R terms). This is one of the great/terrible things about R: it allows implicit variable declarations, and names in deeper namespaces supersede (shadow) the same names in outer ones.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk', the new function-local namespace gets its own object with the same name and contents.
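A minimal sketch of that situation. The question's table isn't shown, so this assumes, purely for illustration, that 'myTable' is a small data frame and that the "label" to star is its first column name; the function body is hypothetical too.

```r
# Hypothetical stand-in for the question's table: a data frame whose
# first column name plays the role of the label we want to star.
myTable <- data.frame(score = c(90, 85), grade = c("A", "B"))

# 'myTable' inside the function is a separate, function-local binding.
title.asterisk <- function(myTable) {
  names(myTable)[1] <- paste0(names(myTable)[1], "*")
  print(names(myTable))  # the local copy has the asterisk
  invisible(NULL)        # nothing useful is returned
}

title.asterisk(myTable)  # prints the starred local names
names(myTable)           # the global object is unchanged: "score" "grade"
```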
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supersedes the global one. If we call the function as title.asterisk(myTable), the function makes changes only to the local variable. There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object with the result, as in the myTable = title.asterisk(myTable) line from your question.
This is okay, but it still makes your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of argument names, e.g. naming the parameter '.myTable' instead of 'myTable'.
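A sketch of Option 1 with the dotted argument name, under the same illustrative assumption that 'myTable' is a data frame whose first column name gets the asterisk (the body of 'title.asterisk' is hypothetical):

```r
myTable <- data.frame(score = c(90, 85), grade = c("A", "B"))

# The leading '.' marks the parameter as the function's own copy,
# visually distinct from the global 'myTable'.
title.asterisk <- function(.myTable) {
  names(.myTable)[1] <- paste0(names(.myTable)[1], "*")
  .myTable  # return the modified copy
}

# Overwrite the global object with the returned value.
myTable <- title.asterisk(myTable)
names(myTable)  # "score*" "grade"
```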
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supersedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, don't pass it into the function or declare it there; refer to it by name and assign to it with the superassignment operator '<<-'. A plain '<-' inside the function would just create a new local copy.
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to that object.
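A minimal sketch of Option 2, again assuming a hypothetical data-frame 'myTable' whose first column name is the label. The key detail is '<<-': it assigns to the first existing 'myTable' found in the enclosing environments (here, the global one), and only creates a new global variable if none exists.

```r
myTable <- data.frame(score = c(90, 85), grade = c("A", "B"))

# No argument: 'myTable' is resolved by searching the enclosing
# environments. A plain '<-' here would create a local copy instead;
# '<<-' assigns to the existing outer 'myTable'.
title.asterisk <- function() {
  names(myTable)[1] <<- paste0(names(myTable)[1], "*")
  invisible(NULL)
}

title.asterisk()
names(myTable)  # "score*" "grade"
```

The trade-off is the one the answer hints at: the function now silently depends on a variable with that exact name existing outside it, which is precisely the kind of invisible coupling that makes debugging harder.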
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions: functions whose name ends in '<-', so that a call such as f(x) <- value updates x. This is the behavior of, for instance, '$<-', which updates in place the element accessed by '$'. Here is a related question. One could think of replacement functions as syntactic sugar for calling a function and reassigning its result (x <- 'f<-'(x, value = value)), but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It enables convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
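A sketch of the replacement-function approach. The name 'title.asterisk<-' and its body are hypothetical, and 'myTable' is again assumed to be a data frame whose first column name is the label. R requires the incoming value to arrive in an argument named 'value'.

```r
myTable <- data.frame(score = c(90, 85), grade = c("A", "B"))

# Replacement function: the name ends in '<-' and the new value
# arrives in an argument that must be called 'value'.
`title.asterisk<-` <- function(x, value) {
  names(x)[1] <- paste0(names(x)[1], value)
  x  # replacement functions return the whole modified object
}

# R evaluates this roughly as
#   myTable <- `title.asterisk<-`(myTable, value = "*")
title.asterisk(myTable) <- "*"
names(myTable)  # "score*" "grade"
```

Because the call site reads like names(x) <- ... or x$col <- ..., other R users will recognise it as an update of 'myTable' without needing to know anything about scoping.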
For anybody in the future looking for a simple way (I don't know if it is the most appropriate one) to get this solved:
Inside the function, create an object to temporarily hold the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that was passed to the function argument, and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() so that your object is defined in the environment outside the function. Wrapping an assignment in parentheses, e.g. (x <- 5), assigns and prints in one step, which is a little more efficient (and, to me, better looking) than defining and then printing.
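A minimal sketch of that recipe, assuming once more a hypothetical data-frame 'myTable' whose first column name is the label; 'tmp', 'name' and 'starred' are illustrative names only.

```r
myTable <- data.frame(score = c(90, 85), grade = c("A", "B"))

title.asterisk <- function(x) {
  # Name of the variable the caller passed in, e.g. "myTable"
  name <- deparse(substitute(x))
  # Temporary object holding the modified version
  tmp <- x
  names(tmp)[1] <- paste0(names(tmp)[1], "*")
  # Overwrite the caller's variable in the calling environment
  assign(name, tmp, envir = parent.frame())
}

title.asterisk(myTable)
(starred <- names(myTable))  # assign and print in one step
```

This only works when the caller passes a bare variable name. With something like title.asterisk(tables[[1]]), deparse(substitute(x)) yields the string "tables[[1]]", and assign() would create a new variable with that literal name instead of updating the list element.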