Reorder levels of a factor without changing the order of values

Posted on 2024-08-23 07:42:16

I have a data frame with some numeric variables and some categorical factor variables. The order of the levels for those factors is not the way I want it to be.

numbers <- 1:4
letters <- factor(c("a", "b", "c", "d"))
df <- data.frame(numbers, letters)
df
#   numbers letters
# 1       1       a
# 2       2       b
# 3       3       c
# 4       4       d

If I change the order of the levels, the letters are no longer with their corresponding numbers (my data is total nonsense from this point on).

levels(df$letters) <- c("d", "c", "b", "a")
df
#   numbers letters
# 1       1       d
# 2       2       c
# 3       3       b
# 4       4       a

I simply want to change the level order, so when plotting, the bars are shown in the desired order - which may differ from default alphabetical order.
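
A quick way to check the level order before plotting is `table()`: base `barplot()` draws bars in the order of the table's names, which follows the factor levels. This is only a minimal sketch, assuming the desired order is d, c, b, a; the names `dat`, `id`, and `grp` are made up here so that `df` and `letters` are not masked.

```r
dat <- data.frame(id = 1:4, grp = factor(c("a", "b", "c", "d")))

# Rebuild the factor with an explicit level order; the values stay put.
dat$grp <- factor(dat$grp, levels = c("d", "c", "b", "a"))

counts <- table(dat$grp)
names(counts)            # follows the level order, not the alphabet
as.character(dat$grp)    # row values are unchanged

# barplot(counts) would draw the bars in the order of names(counts)
```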

Comments (9)

開玄 2024-08-30 07:42:16

Use the levels argument of factor:

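# g is created as a factor explicitly: since R 4.0.0, data.frame() no longer
# converts strings to factors by default, so levels(df$g) would be NULL
df <- data.frame(f = 1:4, g = factor(letters[1:4]))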
df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

levels(df$g)
# [1] "a" "b" "c" "d"

df$g <- factor(df$g, levels = letters[4:1])
levels(df$g)
# [1] "d" "c" "b" "a"

df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

花开半夏魅人心 2024-08-30 07:42:16

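Some more, just for the record:

library(gdata)

## reorder() is a generic from base R (stats); the new.order argument
## is provided by gdata's method for factors, so gdata must be loaded
df$letters <- reorder(df$letters, new.order = letters[4:1])

## or call the gdata method directly
df$letters <- reorder.factor(df$letters, new.order = letters[4:1])

You may also find Relevel and combine_factor useful.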
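
For comparison, here is a minimal base-R sketch (not part of the original answer) using `relevel()` and `reorder()` from the stats package, with no extra packages. The factor `f` and the `score` vector are made-up illustration data.

```r
f <- factor(c("a", "b", "c", "d"))

# relevel() moves a single level to the front and keeps the rest in order.
f_ref <- relevel(f, ref = "c")
levels(f_ref)

# reorder() sorts the levels by a summary (default: mean) of a second vector.
score <- c(3, 1, 4, 2)   # made-up values, one per element of f
f_ord <- reorder(f, score)
levels(f_ord)

# In both cases the values themselves are untouched.
as.character(f_ref)
as.character(f_ord)
```

Base `reorder()` is mostly useful when the order should come from the data (for example, bars sorted by height) rather than from a hand-written list.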

新人笑 2024-08-30 07:42:16

Since this question was last active, Hadley has released his new forcats package for manipulating factors, and I'm finding it outrageously useful. Examples from the OP's data frame:

levels(df$letters)
# [1] "a" "b" "c" "d"

To reverse levels:

library(forcats)
fct_rev(df$letters) %>% levels
# [1] "d" "c" "b" "a"

To add more levels:

fct_expand(df$letters, "e") %>% levels
# [1] "a" "b" "c" "d" "e"

And many more useful fct_xxx() functions.
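
For an arbitrary order rather than a plain reversal, `fct_relevel()` from the same package is probably the closest fit. This is a minimal sketch, assuming forcats is installed; the factor `f` is made up for illustration.

```r
library(forcats)

f <- factor(c("a", "b", "c", "d"))

# List every level in the order wanted...
f_all <- fct_relevel(f, "d", "c", "b", "a")
levels(f_all)

# ...or name only some; they move to the front, the rest keep their order.
f_some <- fct_relevel(f, "c")
levels(f_some)

# The values are not touched either way.
as.character(f_all)
```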

七堇年 2024-08-30 07:42:16

So what you want, in R terms, is to change only the labels of a given factor variable (i.e., leave the data, as well as the factor levels, unchanged).

df$letters = factor(df$letters, labels=c("d", "c", "b", "a"))

Given that you want to change only the datapoint-to-label mapping and not the data or the factor schema (how the data points are binned into individual bins or factor values), it might help to know how the mapping is originally set when you first create the factor.

The rules are simple:

  • labels are mapped to levels by index (i.e., the value
    at levels[2] is given the label labels[2]);
  • factor levels can be set explicitly by passing them in via the
    levels argument; or
  • if no value is supplied for the levels argument, the default
    is used, which is the sorted result of calling unique on the data
    vector passed in (the x argument);
  • labels can be set explicitly via the labels argument; or
  • if no value is supplied for the labels argument, the default is
    used, which is just the levels vector.
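
To make the rules above concrete, here is a minimal sketch (not part of the original answer) comparing the `levels` and `labels` arguments on a toy factor `f`, a name assumed for illustration. Passing `labels` renames the values, whereas passing `levels` only changes their order, so it is worth checking which of the two you actually need.

```r
f <- factor(c("a", "b", "c", "d"))

# levels = ...: same values, new level order, so the integer codes change.
by_levels <- factor(f, levels = c("d", "c", "b", "a"))
as.character(by_levels)
as.integer(by_levels)

# labels = ...: same integer codes, but every value is renamed.
by_labels <- factor(f, labels = c("d", "c", "b", "a"))
as.character(by_labels)
as.integer(by_labels)
```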

海之角 2024-08-30 07:42:16

Dealing with factors in R is quite a peculiar job, I must admit... When you reorder the factor levels, you're not reordering the underlying numerical values. Here's a little demonstration:

> numbers = 1:4
> letters = factor(letters[1:4])
> dtf <- data.frame(numbers, letters)
> dtf
  numbers letters
1       1       a
2       2       b
3       3       c
4       4       d
> sapply(dtf, class)
  numbers   letters 
"integer"  "factor" 

Now, if you convert this factor to numeric, you'll get:

# return underlying numerical values
> with(dtf, as.numeric(letters))
[1] 1 2 3 4
# change levels
> levels(dtf$letters) <- letters[4:1]
> dtf
  numbers letters
1       1       d
2       2       c
3       3       b
4       4       a
# return numerical values once again
> with(dtf, as.numeric(letters))
[1] 1 2 3 4

As you can see, by changing the levels you change only the levels (who would have guessed, eh?), not the numerical values! But when you use the factor function, as @Jonathan Chang suggested, something different happens: you change the numerical values themselves.

You're getting the error once again because you assign to levels and then try to relevel with factor. Don't do it!!! Do not assign to levels or you'll mess things up (unless you know exactly what you're doing).


One little suggestion: avoid giving your objects the same names as existing R objects (df is the density function of the F distribution, letters holds the lowercase alphabet). In this particular case your code would not be faulty, but sometimes it can be, and it creates confusion, and we don't want that, do we?!? =)

Instead, use something like this (I'll go from the beginning once again):

> dtf <- data.frame(f = 1:4, g = factor(letters[1:4]))
> dtf
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> with(dtf, as.numeric(g))
[1] 1 2 3 4
> dtf$g <- factor(dtf$g, levels = letters[4:1])
> dtf
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> with(dtf, as.numeric(g))
[1] 4 3 2 1

Note that you can also name your data.frame df and the column letters instead of g, and the result will be OK. Actually, this code is identical to the one you posted; only the names are changed. This part, factor(dtf$letters, levels = letters[4:1]), wouldn't throw an error, but it can be confusing!

Read the ?factor manual thoroughly! What's the difference between factor(g, levels = letters[4:1]) and factor(g, labels = letters[4:1])? What's similar in levels(g) <- letters[4:1] and g <- factor(g, labels = letters[4:1])?

You can post your ggplot syntax, so we can help you more with this one!

Cheers!!!

Edit:

Does ggplot2 actually require changing both levels and values? Hm... I'll dig into this one...

鸢与 2024-08-30 07:42:16

I'd like to add another case, where the levels are strings that contain numbers along with some special characters, as in the example below:

df <- data.frame(x = factor(c("15-25", "0-4", "5-10", "11-14", "100+")))

The default levels of x are:

df$x
# [1] 15-25 0-4   5-10  11-14 100+ 
# Levels: 0-4 100+ 11-14 15-25 5-10

Here, if we want to reorder the factor levels according to their numeric value without explicitly writing out the levels, we can do this:

library(gtools)
# sort the unique levels, so repeated values in x cannot produce duplicate levels
df$x <- factor(df$x, levels = mixedsort(levels(df$x)))

df$x
# [1] 15-25 0-4   5-10  11-14 100+ 
# Levels: 0-4 5-10 11-14 15-25 100+
as.numeric(df$x)
# [1] 4 1 2 3 5

I hope this can be considered as useful information for future readers.
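
If installing gtools is not an option, roughly the same ordering can be sketched in base R by sorting on the leading number of each string. This is an illustration only: it assumes every value starts with a run of digits, and the names `x`, `lower`, and `f` are made up.

```r
x <- c("15-25", "0-4", "5-10", "11-14", "100+")

# Pull out the leading run of digits of each string as a number.
lower <- as.numeric(sub("^([0-9]+).*$", "\\1", x))

# Order the unique strings by that number and use them as levels.
f <- factor(x, levels = unique(x[order(lower)]))

levels(f)
as.integer(f)
```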

你的他你的她 2024-08-30 07:42:16

Here's my function to reorder factors of a given dataframe:

reorderFactors <- function(df, column = "my_column_name", 
                           desired_level_order = c("fac1", "fac2", "fac3")) {

  x = df[[column]]
  lvls_src = levels(x) 

  idxs_target <- vector(mode="numeric", length=0)
  for (target in desired_level_order) {
    idxs_target <- c(idxs_target, which(lvls_src == target))
  }

  x_new <- factor(x,levels(x)[idxs_target])

  df[[column]] <- x_new

  return (df)
}

Usage: reorderFactors(df, "my_col", desired_level_order = c("how","I","want"))
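
A shorter variant of the same idea is sketched below. It passes the desired order straight to `factor()` and checks first that it matches the existing levels, so a typo cannot silently turn values into `NA`. The helper name `reorder_levels` and the sample data are hypothetical, not from any package.

```r
# Hypothetical helper; `reorder_levels` is not from any package.
reorder_levels <- function(data, column, desired) {
  x <- data[[column]]
  stopifnot(
    is.factor(x),
    !anyDuplicated(desired),
    setequal(levels(x), desired)   # same set of levels, any order
  )
  data[[column]] <- factor(x, levels = desired)
  data
}

dat <- data.frame(id = 1:3, size = factor(c("low", "mid", "high")))
dat <- reorder_levels(dat, "size", c("low", "mid", "high"))
levels(dat$size)
```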

破晓 2024-08-30 07:42:16

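I would simply use the levels argument of factor:

# pass the reversed levels to factor(); assigning them with levels(x) <- ...
# would relabel the values instead of reordering the levels
df$letters <- factor(df$letters, levels = levels(df$letters)[4:1])

长途伴 2024-08-30 07:42:16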

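To add yet another approach, which is quite useful as it frees us from remembering functions from different packages: the levels of a factor are just attributes, so one can do the following. Note that, as the output shows, this relabels the values as well (a becomes d), so it renames the levels rather than reordering them.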

numbers <- 1:4
letters <- factor(c("a", "b", "c", "d"))
df <- data.frame(numbers, letters)

# Original attributes
> attributes(df$letters)
$levels
[1] "a" "b" "c" "d"

$class
[1] "factor"

# Modify attributes
attr(df$letters,"levels") <- c("d", "c", "b", "a")

> df$letters
[1] d c b a
Levels: d c b a

# New attributes
> attributes(df$letters)
$levels
[1] "d" "c" "b" "a"

$class
[1] "factor"