Replace some values in one column with values from another column only when a specific condition is met in R (dplyr)
I have a large data frame with 30+ columns and 10,000+ rows. Today I want to focus on two columns, `languages` and `languages2`:
| languages | languages2 |
|---|---|
| Spanish | NA |
| Spanish | NA |
| Other (specify) | French |
| Other (specify) | German |
| Other (specify) | Russian |
| English | NA |
| Other (specify) | Portuguese |
| English | NA |
| (...) | (...) |
This is what I need:
| languages |
|---|
| Spanish |
| Spanish |
| French |
| German |
| Russian |
| English |
| Portuguese |
| English |
| (...) |
I am looking for an answer that uses the `mutate` function from `dplyr`.
Answers (4):
For bigger data you may want to control for additional scenarios, such as preventing the value from being replaced with `NA` and leaving the `Other` value in place when there is nothing to replace it with. It may also be worth handling cases where the first column contains variants such as `Other()` or `Other, lang`. You may want to consider using a regular expression or pre-processing the first column.
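A minimal sketch of that more defensive variant, assuming every placeholder starts with the word `Other` (so the regular expression `^Other` catches `Other (specify)`, `Other()` and `Other, lang` alike). Rows whose `languages2` is `NA` keep their original value instead of being overwritten with `NA`. The data frame `df` and the helper column `is_other` are hypothetical.

```r
library(dplyr)

# Hypothetical sample including the messy placeholder variants mentioned above
df <- data.frame(
  languages  = c("Spanish", "Other (specify)", "Other()", "Other, lang",
                 "Other (specify)", "English"),
  languages2 = c(NA, "French", "German", "Russian", NA, NA),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    # TRUE for any value that starts with "Other" (assumed placeholder prefix)
    is_other  = grepl("^Other", languages),
    # Overwrite only when there is a placeholder AND a replacement exists,
    # so a missing languages2 never turns a row into NA
    languages = if_else(is_other & !is.na(languages2), languages2, languages)
  ) %>%
  select(-is_other)

df$languages
```

The fifth row stays `Other (specify)` because it has no replacement, which makes such rows easy to find and review afterwards.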
Using `dplyr`, we could replace `Other (specify)` with `NA`, then use `coalesce`:
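A minimal sketch of the `na_if()` plus `coalesce()` idea, using a hypothetical `df` rebuilt from the rows shown in the question. `na_if()` turns the placeholder into `NA`, and `coalesce()` then takes the first non-missing value in each row.

```r
library(dplyr)

# Hypothetical sample reproducing the rows shown in the question
df <- data.frame(
  languages  = c("Spanish", "Spanish", "Other (specify)", "Other (specify)",
                 "Other (specify)", "English", "Other (specify)", "English"),
  languages2 = c(NA, NA, "French", "German", "Russian", NA, "Portuguese", NA),
  stringsAsFactors = FALSE
)

df <- df %>%
  # Placeholder -> NA, then fall back to languages2 wherever languages is NA
  mutate(languages = coalesce(na_if(languages, "Other (specify)"), languages2)) %>%
  # Drop the helper column to match the desired output in the question
  select(-languages2)

df$languages
```

Note that a row holding `Other (specify)` with a missing `languages2` ends up as `NA` here, which is exactly the case the previous answer suggests guarding against.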
A `tidyverse` option is to use `str_replace_all` to replace `Other (specify)` with the value from `languages2`.
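A sketch of the `str_replace_all()` route, again on a hypothetical `df` built from the question's rows. Two choices here are assumptions rather than part of the answer: `fixed()` makes the pattern literal (the parentheses in `Other (specify)` would otherwise be regex metacharacters), and `coalesce(languages2, languages)` is used as the replacement so that rows without a value in `languages2` are never handed an `NA` replacement.

```r
library(dplyr)
library(stringr)

# Hypothetical sample reproducing the rows shown in the question
df <- data.frame(
  languages  = c("Spanish", "Spanish", "Other (specify)", "Other (specify)",
                 "Other (specify)", "English", "Other (specify)", "English"),
  languages2 = c(NA, NA, "French", "German", "Russian", NA, "Portuguese", NA),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(languages = str_replace_all(
    languages,
    # Match the placeholder literally instead of as a regular expression
    fixed("Other (specify)"),
    # Row-wise replacement; falls back to the row's own value when languages2 is NA
    coalesce(languages2, languages)
  ))

df$languages
```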
However, if you have a lot of data and need something faster, you might consider base R, which could be faster than `dplyr` or `data.table` for a simple replacement like this one.
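A minimal base R sketch (no packages), assuming the same hypothetical `df` as above. It builds an index of the placeholder rows once and then does a single vectorised assignment; `which()` drops any `NA` comparisons so the subscripted assignment stays safe.

```r
# Hypothetical sample reproducing the rows shown in the question
df <- data.frame(
  languages  = c("Spanish", "Spanish", "Other (specify)", "Other (specify)",
                 "Other (specify)", "English", "Other (specify)", "English"),
  languages2 = c(NA, NA, "French", "German", "Russian", NA, "Portuguese", NA),
  stringsAsFactors = FALSE
)

# Row numbers holding the placeholder AND having a replacement available
idx <- which(df$languages == "Other (specify)" & !is.na(df$languages2))

# Copy languages2 into languages for those rows only
df$languages[idx] <- df$languages2[idx]

df$languages
```

To check the speed claim on your own data, timing each approach with `system.time()` on the full 10,000+ row data frame is a simple starting point.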
Another possibility (if you prefer the check to be on the second column instead of the first):
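A sketch of checking the second column instead, assuming `languages2` is only filled in on rows where `languages` holds the placeholder (true for the sample in the question). Whenever `languages2` has a value it wins; otherwise `languages` is kept. The data frame `df` is hypothetical.

```r
library(dplyr)

# Hypothetical sample reproducing the rows shown in the question
df <- data.frame(
  languages  = c("Spanish", "Spanish", "Other (specify)", "Other (specify)",
                 "Other (specify)", "English", "Other (specify)", "English"),
  languages2 = c(NA, NA, "French", "German", "Russian", NA, "Portuguese", NA),
  stringsAsFactors = FALSE
)

df <- df %>%
  # First non-missing value per row: languages2 if present, else languages
  mutate(languages = coalesce(languages2, languages))

df$languages
```

This never produces `NA` from a missing `languages2`, but it would also overwrite a real language in `languages` if `languages2` happened to be filled on such a row.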