Which 1-2 letter object names conflict with existing R objects?

Posted on 2024-11-28 19:32:47

To make my code more readable, I like to avoid names of objects that already exist when creating new objects. Because of the package-based nature of R, and because functions are first-class objects, it is easy to overwrite common functions that are not in base R: a popular package might define a short function name, and unless you know which package to load there is no way to check for it. Objects such as the built-in logicals T and F also cause trouble.
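To illustrate the kind of trouble meant here, a minimal sketch: T and F are ordinary variables bound to TRUE and FALSE, so an assignment can mask them, whereas TRUE and FALSE are reserved words. The variable names masked_value, restored_value and still_callable are made up for the illustration.

```r
# T is just a variable in base that holds TRUE, so it can be masked.
T <- 0                       # masks base::T in the current environment
masked_value <- isTRUE(T)    # T is now 0, so this is FALSE
rm(T)                        # remove the mask again
restored_value <- isTRUE(T)  # lookup reaches base::T again

# A non-function object named `c` hides base::c when `c` is used as a value,
# but a call such as c(1, 2) still works, because R skips non-function
# objects when it looks up the name of a function being called.
c <- "not a function"
still_callable <- c(1, 2)
rm(c)
```

Masking a function with another function (rather than with data) is the case that actually changes what a call does.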

Some examples that come to mind are:

One letter

  • c
  • t
  • T/F
  • J

Two letters

  • df

A better solution might be to avoid using short names altogether in favor of more descriptive ones, and I generally try to do that as a matter of habit. Yet "df" for a function which manipulates a generic data.frame is plenty descriptive and a longer name adds little, so short names have their uses. In addition, for SO questions where the larger context isn't necessarily known, coming up with descriptive names is well-nigh impossible.

What other one- and two-letter variable names conflict with existing R objects? Which among those are sufficiently common that they should be avoided? If they are not in base, please list the package as well. The best answers will involve at least some code; please provide it if used.

Note that I am not asking whether overwriting functions that already exist is advisable. That question is addressed on SO already:

In R, what exactly is the problem with having variables with the same name as base R functions?

For visualizations of some answers here, see this question on Cross Validated (CV):

https://stats.stackexchange.com/questions/13999/visualizing-2-letter-combinations

似最初 2024-12-05 19:32:47

apropos is ideal for this:

apropos("^[[:alpha:]]{1,2}$")

With no packages loaded, this returns:

 [1] "ar" "as" "by" "c"  "C"  "cm" "D"  "de" "df" "dt" "el" "F"  "gc" "gl"
[15] "I"  "if" "Im" "is" "lh" "lm" "ls" "pf" "pi" "pt" "q"  "qf" "qr" "qt"
[29] "Re" "rf" "rm" "rt" "sd" "t"  "T"  "ts" "vi"

The exact contents will depend upon the search list. Try loading a few packages and re-running it if you care about conflicts with packages that you commonly use.
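As a rough sketch of that before/after comparison, assuming the splines package (it ships with R but is not attached by default) stands in for "a package you commonly use"; the names pattern, before, after and new_names are just for illustration:

```r
pattern <- "^[[:alpha:]]{1,2}$"

# Short names visible before attaching the package
before <- apropos(pattern)

# Attach a package and look again
library(splines)
after <- apropos(pattern)

# Short names that only became visible because of the newly attached package
new_names <- setdiff(after, before)
print(new_names)

# Restore the search list
detach("package:splines", character.only = TRUE)
```

In a default session this should show the short names that splines exports, but the result depends on what was already attached.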


I loaded all the (>200) packages installed on my machine with this:

lapply(rownames(installed.packages()), require, character.only = TRUE)

And reran the call to apropos, wrapping it in unique, since there were a few duplicates.

one_or_two <- unique(apropos("^[[:alpha:]]{1,2}$"))

This returned:

  [1] "Ad" "am" "ar" "as" "bc" "bd" "bp" "br" "BR" "bs" "by" "c"  "C" 
 [14] "cc" "cd" "ch" "ci" "CJ" "ck" "Cl" "cm" "cn" "cq" "cs" "Cs" "cv"
 [27] "d"  "D"  "dc" "dd" "de" "df" "dg" "dn" "do" "ds" "dt" "e"  "E" 
 [40] "el" "ES" "F"  "FF" "fn" "gc" "gl" "go" "H"  "Hi" "hm" "I"  "ic"
 [53] "id" "ID" "if" "IJ" "Im" "In" "ip" "is" "J"  "lh" "ll" "lm" "lo"
 [66] "Lo" "ls" "lu" "m"  "MH" "mn" "ms" "N"  "nc" "nd" "nn" "ns" "on"
 [79] "Op" "P"  "pa" "pf" "pi" "Pi" "pm" "pp" "ps" "pt" "q"  "qf" "qq"
 [92] "qr" "qt" "r"  "Re" "rf" "rk" "rl" "rm" "rt" "s"  "sc" "sd" "SJ"
[105] "sn" "sp" "ss" "t"  "T"  "te" "tr" "ts" "tt" "tz" "ug" "UG" "UN"
[118] "V"  "VA" "Vd" "vi" "Vo" "w"  "W"  "y"

You can see where they came from with

lapply(one_or_two, find)
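
The list returned by lapply is a bit unwieldy, so here is one possible way to tabulate it. This is only a sketch; the names locations, origin and by_package are made up for the illustration.

```r
one_or_two <- unique(apropos("^[[:alpha:]]{1,2}$"))

# find() returns every search-path entry that defines the name
locations <- lapply(one_or_two, find)

# One row per name; several locations are collapsed into one string
origin <- data.frame(
  name  = one_or_two,
  where = vapply(locations, paste, character(1), collapse = ", "),
  stringsAsFactors = FALSE
)

# Group the short names by where they live
by_package <- split(origin$name, origin$where)
print(by_package)
```

A name that shows up under more than one location is exactly the kind of conflict the question asks about.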

め可乐爱微笑 2024-12-05 19:32:47

Been thinking about this more. Here's a list of one-letter object names in base R:

> var.names <- c(letters,LETTERS)
> var.names[sapply(var.names,exists)]
[1] "c" "q" "t" "C" "D" "F" "I" "T" "X"

And one- and two-letter object names in base R:

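# All 52 one-letter candidates
one.letter.names <- c(letters, LETTERS)
N <- length(one.letter.names)

# Every ordered pair of letters: 52 * 52 two-letter candidates
first <- rep(one.letter.names, N)
second <- rep(one.letter.names, each = N)
two.letter.names <- paste(first, second, sep = "")

# exists() is then checked for each candidate along the search path
var.names <- c(one.letter.names, two.letter.names)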

> var.names[sapply(var.names,exists)]
[1] "c"  "d"  "q"  "t"  "C"  "D"  "F"  "I"  "J"  "N"  "T"  "X"  "bc" "gc"
[15] "id" "sd" "de" "Re" "df" "if" "pf" "qf" "rf" "lh" "pi" "vi" "el" "gl"
[29] "ll" "cm" "lm" "rm" "Im" "sp" "qq" "ar" "qr" "tr" "as" "bs" "is" "ls"
[43] "ns" "ps" "ts" "dt" "pt" "qt" "rt" "tt" "by" "VA" "UN"

That's a much bigger list than I initially suspected, although I would never think of naming a variable "if", so to a certain degree it makes sense.
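
One caveat: with its default arguments exists() looks along the whole search path, so the lists above can also pick up objects from the workspace and from any attached package, not only from base. A minimal sketch of restricting the check to the base package itself, assuming that is what "in base" should mean (candidates and in_base are made-up names):

```r
one_letter <- c(letters, LETTERS)
two_letter <- as.vector(outer(one_letter, one_letter, paste0))
candidates <- c(one_letter, two_letter)

# Look only in the base package environment, without walking up to parents
in_base <- candidates[vapply(candidates, exists, logical(1),
                             envir = baseenv(), inherits = FALSE)]
print(in_base)
```

Names such as df and sd drop out here because they live in stats, which is attached by default but is not the base package.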

Still doesn't capture object names not in base, or give any sense of which functions are best avoided. I think a better answer would either use expert opinion to figure out which functions are important (e.g. using c is probably worse than using qf) or use a data mining approach on a bunch of R code to see what short-named functions get used the most.
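
A very rough sketch of what the data mining approach could look like, assuming the R code is available as text: parse it and count the function calls whose names are at most two characters long. The function count_short_calls and the sample code are hypothetical, not taken from any real corpus.

```r
# Count calls to functions with short names in a piece of R source code
count_short_calls <- function(code, max_chars = 2) {
  exprs  <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  # SYMBOL_FUNCTION_CALL marks a name that is being called as a function
  calls  <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
  short  <- calls[nchar(calls) <= max_chars]
  sort(table(short), decreasing = TRUE)
}

# Made-up sample input
sample_code <- "
x <- c(1, 2, 3)
y <- c(x, 4)
m <- t(matrix(x))
s <- sd(y)
fit <- lm(y ~ seq_along(y))
"

counts <- count_short_calls(sample_code)
print(counts)
```

Run over a large collection of scripts, the resulting frequencies would give a data-driven ranking of which short names are most worth avoiding.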
