How to change the contents of a column in a data.frame
I’m using data from the World Development Indicators (WDI) and want to merge it with some other data. My problem is that country names are spelled differently in the two datasets. How do I change the values of the country variable?
```r
library(WDI)
df <- WDI(country = "all",
          indicator = c("NY.GDP.MKTP.CD", "EN.ATM.CO2E.KD.GD", "SE.TER.ENRR"),
          start = 1998, end = 2011, extra = FALSE)
head(df)
```

```
      country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  ArabWorld    1A 1998   575369488074          1.365953          NA
100 ArabWorld    1A 1999   627550544566          1.355583    19.54259
101 ArabWorld    1A 2000   723111925659          1.476619          NA
102 ArabWorld    1A 2001   703688747656          1.412750          NA
103 ArabWorld    1A 2002   713021728054          1.413733          NA
104 ArabWorld    1A 2003   803017236111          1.469197          NA
```
How do I change ArabWorld to Arab World?
There are a lot of names I need to change, so doing this by row number will not give me enough flexibility. I want something similar to the replace function in Stata.
Answers (3)
This would work for character or factor columns.
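A minimal base-R sketch of this kind of value-based replacement, run on a small made-up data frame rather than the real WDI download (all values are illustrative). A character column can be overwritten through a logical index; for a factor column, one option is to rename the matching level, because assigning a string that is not an existing level yields NA with a warning:

```r
# Toy stand-in for the WDI data (made-up values, not real WDI output)
df <- data.frame(
  country = c("ArabWorld", "ArabWorld", "Korea, Rep."),
  year    = c(1998, 1999, 1998),
  stringsAsFactors = FALSE  # keep country as character (the default since R 4.0)
)

# Character column: overwrite every element equal to the old spelling
df$country[df$country == "ArabWorld"] <- "Arab World"

# Factor column: rename the level rather than the elements, because
# assigning a string that is not an existing level produces NA
df_f <- data.frame(
  country = factor(c("ArabWorld", "ArabWorld", "Korea, Rep.")),
  year    = c(1998, 1999, 1998)
)
levels(df_f$country)[levels(df_f$country) == "ArabWorld"] <- "Arab World"

print(df$country)
print(df_f$country)
```

Renaming a level changes every row that uses it at once, which is also why this is cheap on large data.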
This is equivalent:
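As a sketch of one equivalent formulation (not necessarily the code this answer originally showed), base R's replace(x, list, values) is a close analogue of the Stata replace mentioned in the question. It returns a modified copy, so the result has to be assigned back; the toy data below are made up:

```r
# Toy data (hypothetical values standing in for the WDI download)
df <- data.frame(
  country = c("ArabWorld", "Korea, Rep.", "ArabWorld"),
  year    = c(1998, 1998, 1999),
  stringsAsFactors = FALSE
)

# base::replace(x, list, values) returns a copy of x with the selected
# elements swapped, so the result is assigned back to the column
df$country <- replace(df$country, df$country == "ArabWorld", "Arab World")

# ifelse() expresses the same element-wise rule on the original values
orig <- c("ArabWorld", "Korea, Rep.", "ArabWorld")
via_ifelse <- ifelse(orig == "ArabWorld", "Arab World", orig)

identical(df$country, via_ifelse)  # TRUE
```

Both forms assume a character column; on a factor column, ifelse() can return the factor's integer codes instead of its labels, so converting with as.character() first is safer.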
If you create a data frame with the desired changes, you can loop through it to apply them. Note that I have updated this to show how to enter parentheses in that column so that they are passed correctly to sub().
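A minimal sketch of such a loop, assuming a hypothetical correspondence data frame changes whose from column holds regular-expression patterns and whose to column holds the replacement names. Because ( ) and . are regex metacharacters, they are escaped with a double backslash in the pattern, and the ^ and $ anchors restrict each pattern to whole names. The country names are made up for illustration:

```r
# Toy data; the names are hypothetical stand-ins
df <- data.frame(
  country = c("ArabWorld", "Korea (Rep.)", "Chad"),
  stringsAsFactors = FALSE
)

# Hypothetical correspondence table: regex pattern -> replacement name.
# "\\(" in an R string is the regex \( , i.e. a literal parenthesis.
changes <- data.frame(
  from = c("^ArabWorld$", "^Korea \\(Rep\\.\\)$"),
  to   = c("Arab World", "South Korea"),
  stringsAsFactors = FALSE
)

# Apply each row of the table to the whole column in turn
for (i in seq_len(nrow(changes))) {
  df$country <- sub(changes$from[i], changes$to[i], df$country)
}

df$country
```

Passing fixed = TRUE to sub() would treat each pattern literally and make the escaping unnecessary, but the ^ and $ anchors would then be literal characters too, so a fixed pattern can also match inside a longer name.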
The easiest approach, especially if you have many names to change, is probably to put your correspondence table in a data.frame and join it with the data using the merge() function. For instance, if you wanted to change the names of the Koreas:
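A sketch of that merge-based approach on toy data. The correspondence table corr and its new_country column are hypothetical, and the Korea spellings are only examples of how two sources might differ. all.x = TRUE keeps countries that have no entry in the table, and ifelse() falls back to the original name for them:

```r
# Toy data standing in for the WDI download (values are made up)
df <- data.frame(
  country = c("Korea, Rep.", "Korea, Dem. Rep.", "Chad"),
  year    = c(1998, 1998, 1998),
  stringsAsFactors = FALSE
)

# Hypothetical correspondence table: old spelling -> spelling in the other dataset
corr <- data.frame(
  country     = c("Korea, Rep.", "Korea, Dem. Rep."),
  new_country = c("South Korea", "North Korea"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps rows whose country is not in the table
df <- merge(df, corr, by = "country", all.x = TRUE)

# Use the new name where one exists, otherwise keep the original
df$country <- ifelse(is.na(df$new_country), df$country, df$new_country)
df$new_country <- NULL
```

merge() sorts the result by the key column by default (sort = TRUE), so the original row order is not preserved.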
However, it may be safer to join your two datasets on the country ISO code than on the country name, because ISO codes are more standardized than names.
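A sketch of a join on the ISO code, assuming (hypothetically) that the other dataset also carries an iso2c column. 1A is the code shown for ArabWorld in the question's output; the other rows and all numeric values are made up:

```r
# WDI-like data with the question's spelling problem (made-up values)
wdi <- data.frame(
  iso2c   = c("1A", "KR", "TD"),
  country = c("ArabWorld", "Korea, Rep.", "Chad"),
  gdp     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical second dataset: different spellings, same ISO codes
other <- data.frame(
  iso2c = c("KR", "TD"),
  name  = c("South Korea", "Chad"),
  score = c(10, 20),
  stringsAsFactors = FALSE
)

# Join on the code; the two name columns never need to agree
joined <- merge(wdi, other, by = "iso2c")
joined
```

Because merge() defaults to an inner join, countries missing from either dataset (here 1A) are dropped; all.x = TRUE or all = TRUE keeps them with NA in the missing columns.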
Using subsetting:
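A minimal sketch of subsetting assignment on a made-up data frame: the rows are selected by a condition on country and the column by name. %in% lets several spellings map to one name in a single step (the alternative spelling "Republic of Korea" is hypothetical):

```r
# Toy data (made-up values)
df <- data.frame(
  country = c("ArabWorld", "Korea, Rep.", "ArabWorld"),
  year    = c(1998, 1998, 1999),
  stringsAsFactors = FALSE
)

# df[rows, "column"] <- value, with rows chosen by a logical condition
df[df$country == "ArabWorld", "country"] <- "Arab World"

# Several old spellings mapped to one new name at once
df[df$country %in% c("Korea, Rep.", "Republic of Korea"), "country"] <- "South Korea"

df$country
```

If the column can contain NA, data-frame assignment with a logical row index that contains NA raises an error; wrapping the condition in which() avoids that, while %in% never returns NA.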