How to change the contents of a column in a data.frame

Posted 2024-12-26 11:24:00

I’m using data from World Development Indicators (WDI) and want to merge this data with some other data. My problem is that the spelling of country names in the two datasets is different. How do I change the country variable?

library('WDI')
df <- WDI(country="all", indicator= c("NY.GDP.MKTP.CD", "EN.ATM.CO2E.KD.GD", 'SE.TER.ENRR'), start=1998, end=2011, extra=FALSE)

head(df)
      country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  ArabWorld    1A 1998   575369488074          1.365953          NA
100 ArabWorld    1A 1999   627550544566          1.355583    19.54259
101 ArabWorld    1A 2000   723111925659          1.476619          NA
102 ArabWorld    1A 2001   703688747656          1.412750          NA
103 ArabWorld    1A 2002   713021728054          1.413733          NA
104 ArabWorld    1A 2003   803017236111          1.469197          NA

How do I change ArabWorld to Arab World?

I need to change many names, so doing this by row number would not be flexible enough. I want something similar to the replace command in Stata.

天冷不及心凉 2025-01-02 11:24:00

This works whether the column is character or factor (a factor column becomes character after the assignment).

df$country <- sub("ArabWorld", "Arab World", df$country)

This is equivalent:

> df[,1] <- sub("ArabWorld", "Arab World", df[,1] )
> head(df)
       country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD
99  Arab World    1A 1998   575369488074          1.365953
100 Arab World    1A 1999   627550544566          1.355583
101 Arab World    1A 2000   723111925659          1.476619
102 Arab World    1A 2001   703688747656          1.412750

If you create a data frame with the desired changes, you can loop through it to apply them. Note that the parentheses in the orig column are escaped with double backslashes so that sub treats them as literal characters rather than as regex grouping:

name.cng <- data.frame(orig = c("AntiguaandBarbuda", "AmericanSamoa",
                                "EastAsia&Pacific\\(developingonly\\)",
                                "Europe&CentralAsia\\(developingonly\\)",
                                "UnitedArabEmirates"),
                       spaced = c("Antigua and Barbuda", "American Samoa",
                                  "East Asia & Pacific (developing only)",
                                  "Europe & Central Asia (developing only)",
                                  "United Arab Emirates"),
                       stringsAsFactors = FALSE)
# Apply each replacement in turn; orig is used as a regular expression.
for (i in seq_len(nrow(name.cng))) {
  df$country <- sub(name.cng[i, 1], name.cng[i, 2], df$country)
}
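The loop above passes each original name to sub as a regular expression, which is why the parentheses have to be escaped. As a possible alternative, here is a minimal sketch that uses the fixed = TRUE argument of sub so the names are matched literally and need no backslashes. The toy data frame and its values are made up for illustration and only stand in for the WDI data.

```r
# Toy stand-in for the WDI data (values invented for illustration).
df <- data.frame(
  country = c("ArabWorld", "EastAsia&Pacific(developingonly)", "Norway"),
  year = c(1998, 1998, 1998),
  stringsAsFactors = FALSE
)

# Lookup table: the spelling found in the data and the spelling wanted.
name.cng <- data.frame(
  orig = c("ArabWorld", "EastAsia&Pacific(developingonly)"),
  spaced = c("Arab World", "East Asia & Pacific (developing only)"),
  stringsAsFactors = FALSE
)

# fixed = TRUE makes sub() treat the pattern as a literal string,
# so the parentheses do not need to be escaped.
for (i in seq_len(nrow(name.cng))) {
  df$country <- sub(name.cng$orig[i], name.cng$spaced[i], df$country,
                    fixed = TRUE)
}

print(df)
```

Note that sub still replaces a matching substring, so a short name that occurs inside a longer one would also be rewritten; exact matching on the whole value avoids that.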

稚然 2025-01-02 11:24:00

The easiest approach, especially if you have many names to change, is probably to put your correspondence table in a data.frame and join it with the data using the merge command.
For instance, if you wanted to change the names of the Koreas:

# Correspondence table
countries <- data.frame(
  iso2c = c("KR", "KP"),
  country = c("South Korea", "North Korea")
)

# Join the data.frames
d <- merge( df, countries, by="iso2c", all.x=TRUE )
# Compute the new country name
d$country <- ifelse(is.na(d$country.y), as.character(d$country.x), as.character(d$country.y))
# Remove the columns we no longer need
d <- d[, setdiff(names(d), c("country.x", "country.y"))]

# Check that the result looks correct
head(d)
head(d[ d$iso2c %in% c("KR", "KP"), ])

However, it may be safer to join your two datasets on the country ISO code, which is more standard, than on the country name.
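To illustrate joining on the ISO code instead of the name, here is a small sketch. Both data frames, their column names other than iso2c, country and year, and all values are invented for the example; the second dataset is assumed to carry an iso2c column as well, which may not be true of your data.

```r
# Two toy datasets that spell the country name differently
# but share the same ISO code (all values invented).
wdi <- data.frame(
  country = c("KoreaRep", "Norway"),
  iso2c = c("KR", "NO"),
  year = c(2000, 2000),
  gdp = c(1, 2),
  stringsAsFactors = FALSE
)
other <- data.frame(
  country = c("South Korea", "Norway"),
  iso2c = c("KR", "NO"),
  year = c(2000, 2000),
  score = c(10, 20),
  stringsAsFactors = FALSE
)

# Join on code and year; drop the second name column beforehand
# so the differing spellings never enter the comparison.
merged <- merge(wdi, other[, c("iso2c", "year", "score")],
                by = c("iso2c", "year"))

print(merged)
```

With this approach the country names never need to agree, so no renaming is required before the merge.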

简单爱 2025-01-02 11:24:00

Using subsetting:

df[df[, "country"] == "ArabWorld", "country"] <- "Arab World"

head(df)
       country iso2c year NY.GDP.MKTP.CD EN.ATM.CO2E.KD.GD SE.TER.ENRR
99  Arab World    1A 1998   575369488074          1.365953          NA
100 Arab World    1A 1999   627550544566          1.355583    19.54259
101 Arab World    1A 2000   723111925659          1.476619          NA
102 Arab World    1A 2001   703688747656          1.412750          NA
103 Arab World    1A 2002   713021728054          1.413733          NA
104 Arab World    1A 2003   803017236111          1.469197          NA
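The subsetting approach changes one name at a time. For many names, one way to generalize it is a named lookup vector combined with match, sketched below. The lookup vector and the toy data are hypothetical, and the sketch assumes the country column is character rather than factor.

```r
# Toy data (values invented); country is a character column.
df <- data.frame(
  country = c("ArabWorld", "Norway", "AmericanSamoa", "ArabWorld"),
  stringsAsFactors = FALSE
)

# Named vector: names are the old spellings, values the new ones.
lookup <- c(ArabWorld = "Arab World", AmericanSamoa = "American Samoa")

# Position of each country in the lookup, NA where there is no entry.
idx <- match(df$country, names(lookup))
hit <- !is.na(idx)

# Replace only the rows that have an entry; other rows stay unchanged.
df$country[hit] <- unname(lookup[idx[hit]])

print(df)
```

Because match compares whole values, only exact matches are replaced, which is close to what a conditional replace in Stata does.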
