Why is the terminology of labels and levels in factors so weird?
An example of a non-settable function would be labels. You can only set factor labels when they are created with the factor() function. There is no labels<- function. Not that 'labels' and 'levels' in factors make any sense....
> fac <- factor(1:3, labels=c("one", "two", "three"))
> fac
[1] one two three
Levels: one two three
> labels(fac)
[1] "1" "2" "3"
OK, I asked for labels, which one might assume were as set by the factor call, but I get something quite ... what's the word, unintuitive?
> levels(fac)
[1] "one" "two" "three"
So it appears that setting labels is really setting levels.
> fac <- factor(1:3, levels=c("one", "two", "three"))
> levels(fac)
[1] "one" "two" "three"
OK that is as expected. So what are labels when one sets levels?
> fac <- factor(1:3, levels=c("one", "two", "three"), labels=c("x","y", "z") )
> labels(fac)
[1] "1" "2" "3"
> levels(fac)
[1] "x" "y" "z"
It would seem that 'labels' arguments for factor() trump any 'levels' arguments for the specification of levels. Why should this be? And why does labels() return what I would have imagined to be retrieved with as.character(as.numeric(fac))?
(This was a tangential comment [labelled as such] in an earlier answer about assignment functions, which I was asked to move to a separate question. So here's your opportunity to enlighten me.)
Answers (2)
I think the way to think about the difference between labels and levels (ignoring the labels() function that Tommy describes in his answer) is that levels is intended to tell R which values to look for in the input (x) and what order to use in the levels of the resulting factor object, and labels is to change the values of the levels after the input has been coded as a factor ... as suggested by Tommy's answer, there is no part of the factor object returned by factor() that is called labels ... just the levels, which have been adjusted by the labels argument ... (clear as mud).

For example:
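A minimal sketch of this behaviour. The input vector and the set of levels below are assumptions chosen to match the description that follows; they are not code taken from the answer itself:

```r
# Hypothetical input: only "c" appears among the requested levels.
x <- c("a", "b", "c")

# levels tells factor() which values to look for in x, and in what order.
f <- factor(x, levels = c("c", "d", "e"))

print(f)          # values of x not found in levels become NA
print(levels(f))  # "d" and "e" stay as levels although absent from x
```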
Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.

Now with labels:

After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:
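Both uses of labels mentioned above can be sketched as follows. The input, levels and label sets are hypothetical values picked for illustration, not the answer's own code:

```r
# Hypothetical input and levels, chosen for illustration.
x <- c("a", "b", "c")
lev <- c("c", "d", "e")

# labels renames the levels after x has been matched against lev.
f1 <- factor(x, levels = lev, labels = c("C", "D", "E"))
print(f1)
print(levels(f1))

# A confusing variant: reuse the input values as labels.
# The element that was "c" in x is now displayed as "a".
f2 <- factor(x, levels = lev, labels = c("a", "b", "c"))
print(f2)
print(as.character(f2))
```

In the second factor the printed values look like they came from x, but they are only the renamed levels, which is what makes this usage hard to read.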
Another way to think about levels is that factor(x,levels=L1,labels=L2) is equivalent to

I think an appropriately phrased version of this example might be nice for Pat Burns's R inferno -- there are plenty of factor puzzles in section 8.2, but not this particular one ...
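The equivalence stated above can be checked with a short sketch. The two-step form used here (build the factor from the levels, then assign new level names with levels<-) is an assumption about what the equivalent code looks like, and x, L1 and L2 are hypothetical values:

```r
# Hypothetical values for illustration.
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")
L2 <- c("C", "D", "E")

# One step: match x against L1, then rename the levels to L2.
f_one <- factor(x, levels = L1, labels = L2)

# Two steps: build the factor from L1, then replace its levels.
f_two <- factor(x, levels = L1)
levels(f_two) <- L2

print(identical(f_one, f_two))
```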
The labels function sounds like the perfect fit for getting the labels of a factor.

...but the labels function has nothing to do with factors! It is used as a generic way of getting something to "label" an object. For atomic vectors, this would be the names. But if there are no names, the labels function returns the element indices coerced to strings - something like as.character(seq_along(x)).

...So that's what you're seeing when you try labels on a factor. The factor is an integer vector without any names, but with a levels attribute.

A factor has no labels. It only has levels. The labels argument to factor is just a way to be able to give a set of strings but produce another set of strings as the levels...

But to confuse things further, the dput function prints the levels attribute as .Label! I think that is a legacy thing...

However, since labels is a generic function, it would probably be a good idea to define labels.factor as follows (currently there is none). Perhaps something for R core to consider?
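The points above can be illustrated with a short sketch. The labels.factor definition at the end is an assumption: one plausible way to write such a method, returning each element's level as a string. It is not taken from the answer or from base R, and the example vectors are hypothetical:

```r
# labels() is a generic way to get something that "labels" an object.
v_named   <- c(a = 10, b = 20)
v_unnamed <- c(10, 20, 30)
print(labels(v_named))    # uses the names
print(labels(v_unnamed))  # no names: element indices as strings

# A factor is an integer vector without names, so indices come back.
fac <- factor(1:3, labels = c("one", "two", "three"))
before <- labels(fac)
print(before)
print(attributes(fac))    # only levels and class are stored

# structure() accepts the legacy name .Label for the levels attribute.
fac2 <- structure(1:3, .Label = c("one", "two", "three"), class = "factor")
print(levels(fac2))

# Hypothetical S3 method, an assumption for illustration only.
labels.factor <- function(object, ...) as.character(object)
after <- labels(fac)
print(after)
```

With the method defined, labels(fac) dispatches to it and returns the level strings instead of the indices.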