Replace NA in numeric columns with numeric and character values in R

Posted on 2025-01-09 12:00:12

I have a data frame df with multiple columns. Two of them, AGE and SALARY, are of type double. I want to replace the missing values in AGE with 0 and the missing values in SALARY with "not found".

What is the most efficient way to do this? I tried:

replace_na(df, list(AGE=0, SALARY="not found"))

I get this error:

Error in `stop_vctrs()`:
! Can't convert `replace$SALARY` <character> to match type of `data$SALARY` <double>.
Backtrace:
 1. tidyr::replace_na(df, list(AGE= 0, SALARY= "not found"))
 2. tidyr:::replace_na.data.frame(df, list(AGE= 0, SALARY= "not found"))
 3. vctrs::vec_assign(...)
 4. vctrs `<fn>`()
 5. vctrs::vec_default_cast(...)
 6. vctrs::stop_incompatible_cast(...)
 7. vctrs::stop_incompatible_type(...)
 8. vctrs:::stop_incompatible(...)
 9. vctrs:::stop_vctrs(...)
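The linked dataset is not reproduced here. The following minimal sketch uses a small made-up data frame with hypothetical values to trigger the same error. It assumes tidyr 1.2.0 or later, where replace_na() casts each replacement value to the type of its column.

```r
library(tidyr)

# Hypothetical stand-in for the linked dataset: AGE and SALARY are doubles with NAs
df <- data.frame(AGE = c(25, NA, 40), SALARY = c(1200, NA, 3400))

# Replacing AGE alone succeeds: 0 is a double, the same type as the column
age_only <- replace_na(df, list(AGE = 0))
print(age_only)

# Adding SALARY = "not found" fails: a character value cannot be cast to double
salary_attempt <- tryCatch(
  replace_na(df, list(AGE = 0, SALARY = "not found")),
  error = function(e) conditionMessage(e)
)
print(salary_attempt)
```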

Edit: this is the source of my dataset: https://drive.google.com/file/d/1cKxzNrnIMq4RxdMcBz3nlr7YtYaPhn5_/view?usp=sharing

Answers (2)

桃扇骨 2025-01-16 12:00:12

I ran into the same problem after updating tidyr to version 1.2.0.

From the changelog for tidyr:

replace_na() no longer allows the type of data to change when the replacement is applied. replace will now always be cast to the type of data before the replacement is made. For example, this means that using a replacement value of 1.5 on an integer column is no longer allowed. Similarly, replacing missing values in a list-column must now be done with list("foo") rather than just "foo".
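Here is a small sketch of both cases the changelog mentions. It assumes tidyr 1.2.0 or later, and the vectors are made up for illustration. The first case applies a lossy replacement to an integer vector. The second case fills list data, where the replacement has to be wrapped in list().

```r
library(tidyr)

ages <- c(30L, NA, 45L)  # integer vector with one missing value

# 0 can be cast to integer without loss, so this works and the result stays integer
filled <- replace_na(ages, 0)

# 1.5 would lose precision as an integer, so the cast (and the call) fails
lossy <- tryCatch(replace_na(ages, 1.5), error = function(e) "error")

# For list data, NULL elements count as missing and the replacement must be a list
items <- list(1:3, NULL)
filled_list <- replace_na(items, list(0L))

print(filled)
print(lossy)
str(filled_list)
```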

You are trying to convert two columns, AGE and SALARY.

Converting AGE by itself should work. It is probably of type double, and you are replacing the NAs with 0, which is also a double.

df %>% mutate(AGE = replace_na(AGE, 0)) # This should work

But when you use replace_na() to fill SALARY with the string "not found", you first have to convert the column to character. replace_na() used to do this automatically, but it no longer does. You can fix this by adding a call to as.character():

df %>% mutate(SALARY = replace_na(SALARY, "not found")) # used to work before tidyr 1.2.0

New method:

df %>% mutate(SALARY = replace_na(as.character(SALARY), "not found")) # New method
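Here both fixes are combined in a minimal sketch on a made-up data frame. It assumes dplyr and tidyr 1.2.0 or later are installed. SALARY is converted to character first, and then both columns are filled in a single replace_na() call:

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: both columns are double and contain NAs
df <- data.frame(AGE = c(25, NA, 40), SALARY = c(1200, 3400, NA))

result <- df %>%
  # Cast SALARY to character so that "not found" matches the column type
  mutate(SALARY = as.character(SALARY)) %>%
  # AGE stays double, so a double replacement (0) is accepted
  replace_na(list(AGE = 0, SALARY = "not found"))

print(result)
```

Keep one trade-off in mind: after this step SALARY is a character column, so numeric operations such as mean() no longer work on it directly. An alternative is to keep NA in the numeric column and substitute "not found" only when displaying the data.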

另类 2025-01-16 12:00:12

Judging by the linked data, you seem to have blank cells rather than NA. If that's correct, then this should work:

df %>%
  mutate(
    AGE = ifelse(AGE == "", 0, AGE),
    SALARY = ifelse(SALARY == "", "not found", SALARY)
  )
  AGE    SALARY
1   0         4
2   2         3
3   3 not found
4   5         7
5   7         5

Data:

df <- data.frame(AGE = c("", 2, 3, 5, 7), 
                 SALARY = c(4, 3, "", 7, 5))
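Another option uses dplyr's na_if(), which is a different technique from the ifelse() above. This sketch assumes dplyr and tidyr are installed. It turns the blank strings into real NAs, converts AGE back to a numeric type, and then uses replace_na() as originally intended:

```r
library(dplyr)
library(tidyr)

df <- data.frame(AGE = c("", 2, 3, 5, 7),
                 SALARY = c(4, 3, "", 7, 5))

result <- df %>%
  # Treat empty strings as missing values
  mutate(across(c(AGE, SALARY), ~ na_if(.x, ""))) %>%
  # AGE should be numeric, so convert it back to double
  mutate(AGE = as.numeric(AGE)) %>%
  # SALARY is still character, so a character replacement is allowed
  replace_na(list(AGE = 0, SALARY = "not found"))

print(result)
```

Unlike the ifelse() version, where AGE remains a character column, this approach keeps AGE numeric.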