Strange behavior when dropping columns from a data.frame in R
I've encountered strange behavior when dropping columns from a data.frame. Initially I have:
> a <- data.frame("a" = c(1,2,3), "abc" = c(3,2,1)); print(a)
  a abc
1 1   3
2 2   2
3 3   1
Now, I remove a$a from the data.frame:
> a$a <- NULL; print(a)
  abc
1   3
2   2
3   1
As expected, only the abc column is left in my data.frame. But the strange part begins when I try to reference the deleted column a:
> print(a$a)
[1] 3 2 1
> print(is.null(a$a))
[1] FALSE
It looks like R returns the value of a$abc instead of NULL.
This happens when the name of a remaining column begins with the exact name of the deleted column.
Is this a bug, or am I missing something here?
Comments (4)
See the help page, ?"$".
This is normal behaviour: the name is partially matched. See ?pmatch for more about partial matching.
Cheers
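The rule is easy to check at the console. Here is a minimal sketch that reuses the data frame from the question; the second data frame b and the warnPartialMatchDollar option are my additions for illustration, not something the answer above mentions:

```r
a <- data.frame(a = c(1, 2, 3), abc = c(3, 2, 1))
a$a <- NULL                       # only "abc" is left

# No column is named exactly "a", so `$` falls back to the single
# column whose name starts with "a".
same_column <- identical(a$a, a$abc)          # TRUE

# pmatch() applies the same unique-prefix rule to character vectors.
pos <- pmatch("a", names(a))                  # 1

# An ambiguous prefix matches nothing, so `$` returns NULL.
b <- data.frame(abc = 1:3, abd = 4:6)         # hypothetical example data
ambiguous <- is.null(b$ab)                    # TRUE

# Optional safety net: ask R to warn whenever `$` partial-matches.
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ a$a; FALSE }, warning = function(w) TRUE)
options(old)                                  # restore the previous setting
```

Turning the warning on will not change what a$a returns, but it should at least make the fallback visible.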
Perhaps it's worth pointing out (since it didn't come up on the previous related question) that this partial matching behavior is a potential reason to avoid '$' except as a convenient shorthand when using R interactively (at the very least, it's a reason to be careful with it).
Selecting a column via dat[,'ind'] if you know its name but not its position, or via dat[,3] if you know the position, is often safer, since you won't run afoul of partial matching.
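A small sketch of the difference, using a hypothetical one-column data frame dat (the column name and values are made up for illustration):

```r
dat <- data.frame(abc = c(3, 2, 1))

# `$` silently partial-matches "a" to "abc".
via_dollar <- dat$a                           # 3 2 1

# `[` requires an exact name, so the same request is an error
# instead of a silent match.
via_bracket <- tryCatch(dat[, "a"], error = function(e) "error")

# An exact name or a position both work as expected.
by_name <- dat[, "abc"]                       # 3 2 1
by_pos  <- dat[, 1]                           # 3 2 1
```

A loud error on a stale column name is usually easier to track down in a script than a wrong column quietly flowing through the rest of the analysis.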
While your exact question has already been answered in the comments, an alternative way to avoid this behaviour is to convert your data.frame to a tibble, which is a stripped-down version of a data.frame, without column name munging, among other things.
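The code that originally accompanied this answer is not reproduced here, so the following is only a sketch of the idea. It assumes the tibble package is installed, and the variable names are mine:

```r
# Assumption: the tibble package is available,
# e.g. via install.packages("tibble").
library(tibble)

df <- data.frame(a = c(1, 2, 3), abc = c(3, 2, 1))
df$a <- NULL                      # drop column "a"; only "abc" remains

tb <- as_tibble(df)

# `$` on a tibble does not partial-match: the missing column comes
# back as NULL (tibble also warns about the unknown column, which is
# suppressed here to keep the output clean).
tibble_gives_null <- is.null(suppressWarnings(tb$a))    # TRUE

# The base data.frame still falls back to "abc".
base_gives_null <- is.null(df$a)                        # FALSE
```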
From the R Language Definition [section 3.4.1 pg.16-17] --
https://cran.r-project.org/doc/manuals/r-release/R-lang.pdf
• Character: The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
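The quoted rules can be tried out on the manual's own aa/aabb example. This is a rough sketch; the list x and its values are made up. It passes exact explicitly on every call, because the current ?Extract help page lists exact = TRUE as the default for [[, which differs from the NA default described in the passage above:

```r
x <- list(aabb = 1:3)

# exact = TRUE: no partial matching, so "aa" finds nothing.
strict <- x[["aa", exact = TRUE]]             # NULL

# exact = FALSE: partial matching, no warning.
loose <- x[["aa", exact = FALSE]]             # 1 2 3

# exact = NA: partial matching is allowed but raises a warning.
warned <- tryCatch({ x[["aa", exact = NA]]; FALSE },
                   warning = function(w) TRUE)

# Replacement never partial-matches: this adds a new element "aa"
# and leaves "aabb" untouched.
x$aa <- 0
names(x)                                      # "aabb" "aa"
```

The last step is the mirror image of the question: a$a <- NULL only ever touches a column named exactly a, while reading a$a afterwards is free to fall back to abc.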