Why is the terminology of labels and levels in factors so odd?

Posted 2024-11-30 18:49:37

An example of a non-settable function would be labels. You can only set factor labels when they are created with the factor() function. There is no labels<- function. Not that 'labels' and 'levels' in factors make any sense....

> fac <- factor(1:3, labels=c("one", "two", "three"))
> fac
[1] one   two   three
Levels: one two three
> labels(fac)
[1] "1" "2" "3"

OK, I asked for labels, which one might assume were as set by the factor call, but I get something quite ... what's the word, unintuitive?

> levels(fac)
[1] "one"   "two"   "three"

So it appears that setting labels is really setting levels.

> fac <- factor(1:3, levels=c("one", "two", "three"))
> levels(fac)
[1] "one"   "two"   "three"

OK that is as expected. So what are labels when one sets levels?

> fac <- factor(1:3, levels=c("one", "two", "three"), labels=c("x","y", "z"))
> labels(fac)
[1] "1" "2" "3"
> levels(fac)
[1] "x" "y" "z"

It would seem that the 'labels' argument to factor() trumps any 'levels' argument when it comes to specifying the levels. Why should this be? And why does labels() return what I would have imagined to be retrieved with as.character(as.numeric(fac))?

(This was a tangential comment [labelled as such] in an earlier answer about assignment functions, which I was asked to move to a question of its own. So here's your opportunity to enlighten me.)

Comments (2)

爱要勇敢去追 2024-12-07 18:49:37

I think the way to think about the difference between labels and levels (ignoring the labels() function that Tommy describes in his answer) is this: levels tells R which values to look for in the input (x) and what order to use for the levels of the resulting factor object, while labels changes the values of the levels after the input has been coded as a factor. As Tommy's answer suggests, no part of the factor object returned by factor() is called labels; there are just the levels, which have been adjusted by the labels argument (clear as mud).

For example:

> f <- factor(x=c("a","b","c"),levels=c("c","d","e"))
> f
[1] <NA> <NA> c  
Levels: c d e
> str(f)
Factor w/ 3 levels "c","d","e": NA NA 1

Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.
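
A minimal sketch of the same point, looking at the integer codes underneath the factor. The variable names codes, counts and dropped are only for illustration; droplevels() is used here to show that the unused levels really are stored in the object until they are removed.

```r
# levels decides which input values are recognised and in what order.
x <- c("a", "b", "c")
f <- factor(x, levels = c("c", "d", "e"))

codes   <- as.integer(f)   # position of each element in levels(f); NA if absent
counts  <- table(f)        # unused levels "d" and "e" are still tabulated
dropped <- droplevels(f)   # discards the levels that never occur in f

print(codes)
print(counts)
print(levels(dropped))
```

The codes index into levels(f), which is why an input value missing from levels has nothing to point at and becomes NA.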

Now with labels:

> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("C","D","E"))
> f
[1] <NA> <NA> C   
Levels: C D E

After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:

> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("a","b","c"))
> f
[1] <NA> <NA> a   
Levels: a b c

Another way to think about it is that factor(x,levels=L1,labels=L2) is equivalent to

f <- factor(x,levels=L1)
levels(f) <- L2
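
As a rough check of that equivalence, the sketch below builds the factor both ways and compares the results. The values of x, L1 and L2 are made up for illustration, and the comparison is only claimed for this simple case where L2 has the same length as L1 and contains no duplicates.

```r
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")
L2 <- c("C", "D", "E")

# One step: match against L1, then rename the levels to L2.
one_step <- factor(x, levels = L1, labels = L2)

# Two steps: match against L1 first, rename afterwards.
two_step <- factor(x, levels = L1)
levels(two_step) <- L2

same <- isTRUE(all.equal(one_step, two_step))
print(same)
```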

I think an appropriately phrased version of this example might be nice for Pat Burns's The R Inferno -- there are plenty of factor puzzles in section 8.2, but not this particular one ...

迷你仙 2024-12-07 18:49:37

The labels function sounds like the perfect fit for getting the labels of a factor.

...but the labels function has nothing to do with factors! It is used as a generic way of getting something to "label" an object. For atomic vectors, this would be the names. But if there are no names, the labels function returns the element indices coerced to strings - something like as.character(seq_along(x)).
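
A small sketch of that behaviour, assuming only the default method for labels() is in play. The vectors named and unnamed are invented for illustration; fac is the factor from the question.

```r
named   <- c(a = 1, b = 2, c = 3)
unnamed <- c(10, 20, 30)
fac     <- factor(1:3, labels = c("one", "two", "three"))

named_labels   <- labels(named)     # the names of the vector
unnamed_labels <- labels(unnamed)   # no names, so the indices as strings
fac_labels     <- labels(fac)       # a factor has no names either

print(named_labels)
print(unnamed_labels)
print(fac_labels)
```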

...So that's what you're seeing when you try labels on a factor. The factor is an integer vector without any names, but with a levels attribute.

A factor has no labels. It only has levels. The labels argument to factor is just a way to be able to give a set of strings but produce another set of strings as the levels...
But to confuse things further, the dput function prints the levels attribute as .Label! I think that is a legacy thing...

# Translate lower case letters to upper case.
f <- factor(letters[2:4], letters[1:3], LETTERS[1:3])
dput(f)
#structure(c(2L, 3L, NA), .Label = c("A", "B", "C"), class = "factor")
attributes(f)
#$levels
#[1] "A" "B" "C"
#
#$class
#[1] "factor"

However, since labels is a generic function, it would probably be a good idea to define labels.factor as follows (currently there is none). Perhaps something for R core to consider?

labels.factor <- function(x, ...) as.character(x)
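
To see what such a method would change, here is a hedged sketch that defines it for the current session only and then removes it again. This is not something base R provides; the argument is named object here simply to match the labels() generic, and with_method / without_method are illustrative names.

```r
# Hypothetical S3 method, defined only in this session.
labels.factor <- function(object, ...) as.character(object)

fac <- factor(1:3, labels = c("one", "two", "three"))
with_method <- labels(fac)       # dispatches to labels.factor

rm(labels.factor)                # take the method away again
without_method <- labels(fac)    # falls back to the default method

print(with_method)
print(without_method)
```

With the method in place, labels(fac) returns the level of each element rather than its index, which is closer to what the question expected.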