用R中的匹配ID替换单元格

发布于 2025-01-23 18:45:39 字数 1026 浏览 1 评论 0原文

i dataFrame(XLSX文件)的形式:(此DataFrame具有152069行)。

目标
dortmundantwerp
孟买Spijkenisse
xioalanbeilun beilun
ettringenbremerhaven
hilterbremerhaven

和我还有另一个带有IDS和名称的数据框:(此DataFrame具有10200行,每个名称唯一的ID)。

ID名称
2678Dortmund
6049孟买
9873Xioalan
3014Ettringen
4055Hilter
338Antwerp
8323Spijkenisse
824Beilun
1272Bremerhaven,

我想用适当的ID替换第一个数据框架的数据(在第二个数据Frame中)。您对如何执行此操作有任何建议吗?先感谢您。

I a dataframe (xlsx file) in the form: (this dataframe has 152069 rows).

SourceTarget
DORTMUNDANTWERP
MUMBAISPIJKENISSE
XIOALANBEILUN
ETTRINGENBREMERHAVEN
HILTERBREMERHAVEN

and I also have another dataframe with the Ids and Names: (this dataframe has 10200 rows with unique id's for each name).

IDName
2678DORTMUND
6049MUMBAI
9873XIOALAN
3014ETTRINGEN
4055HILTER
338ANTWERP
8323SPIJKENISSE
824BEILUN
1272BREMERHAVEN

I would like to replace the data of the first dataframe with their appropriate id (in the second dataframe). Do you have any suggestions on how to do this? Thank you in advance.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

云柯 2025-01-30 18:45:39

使用匹配第一个DFRM数据的第二次DFRM name创建索引,以应用于第二个数据框架的ID列。显然,这应该在您具有足够备份的R对象上完成。

txt1 <-"| Source   | Target         |
+ | DORTMUND | ANTWERP        |
+ | MUMBAI   | SPIJKENISSE    |
+ | XIOALAN  | BEILUN         |
+ |ETTRINGEN |BREMERHAVEN     |
+ |HILTER    |BREMERHAVEN     |"

 txt2 <- "| ID       | Name           |
+ | 2678     | DORTMUND       |
+ | 6049     | MUMBAI         |
+ | 9873     | XIOALAN        |
+ | 3014     | ETTRINGEN      |
+ | 4055     | HILTER         |
+ | 338      | ANTWERP        |
+ | 8323     | SPIJKENISSE    |
+ | 824      | BEILUN         |
+ | 1272     | BREMERHAVEN    |"
  inp1 <-read.delim(text=txt1, sep="|")[,2:3]
  inp2 <-read.delim(text=txt2, sep="|")[,2:3]

> inp1[] <- lapply(inp1,trimws) 
> inp2[] <- lapply(inp2,trimws)

> inp1[] <- lapply(inp1, function(col){inp2$ID[match(col,inp2$Name)]})
> inp1
  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272

Use match of first dfrm data agains second dfrm Name to create an index to apply to the ID column of the second dataframe. Obviously this should be done on R object for which you have adequate backups.

txt1 <-"| Source   | Target         |
+ | DORTMUND | ANTWERP        |
+ | MUMBAI   | SPIJKENISSE    |
+ | XIOALAN  | BEILUN         |
+ |ETTRINGEN |BREMERHAVEN     |
+ |HILTER    |BREMERHAVEN     |"

 txt2 <- "| ID       | Name           |
+ | 2678     | DORTMUND       |
+ | 6049     | MUMBAI         |
+ | 9873     | XIOALAN        |
+ | 3014     | ETTRINGEN      |
+ | 4055     | HILTER         |
+ | 338      | ANTWERP        |
+ | 8323     | SPIJKENISSE    |
+ | 824      | BEILUN         |
+ | 1272     | BREMERHAVEN    |"
  inp1 <-read.delim(text=txt1, sep="|")[,2:3]
  inp2 <-read.delim(text=txt2, sep="|")[,2:3]

> inp1[] <- lapply(inp1,trimws) 
> inp2[] <- lapply(inp2,trimws)

> inp1[] <- lapply(inp1, function(col){inp2$ID[match(col,inp2$Name)]})
> inp1
  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272
我纯我任性 2025-01-30 18:45:39

使用基本R,您可以使用匹配将名称配对在一起,然后我们可以获得id供每列替换的:

df1$Source <- df2$ID[match(df1$Source, df2$Name)]
df1$Target <- df2$ID[match(df1$Target, df2$Name)]

output

  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272

又选项将再次突变并使用匹配

library(tidyverse)

df1 %>% 
  mutate(across(everything(), ~ df2$ID[match(.x, df2$Name)]))

另一个选项是旋转到长形式,然后将数据连接在一起,然后重新启动宽(但不是很高效)。

df1 %>% 
  pivot_longer(everything()) %>% 
  left_join(., df2, by = c("value" = "Name")) %>% 
  select(-value) %>% 
  group_by(grp = ceiling(row_number()/2)) %>% 
  pivot_wider(names_from = "name", values_from = "ID") %>% 
  select(-grp)

数据

df1 <- structure(list(Source = c("DORTMUND", "MUMBAI", "XIOALAN", "ETTRINGEN", 
"HILTER"), Target = c("ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN", 
"BREMERHAVEN")), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(ID = c(2678L, 6049L, 9873L, 3014L, 4055L, 338L, 
8323L, 824L, 1272L), Name = c("DORTMUND", "MUMBAI", "XIOALAN", 
"ETTRINGEN", "HILTER", "ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN"
)), class = "data.frame", row.names = c(NA, -9L))

With base R, you can use match to pair the names together then we can get the ID for those to replace for each column:

df1$Source <- df2$ID[match(df1$Source, df2$Name)]
df1$Target <- df2$ID[match(df1$Target, df2$Name)]

Output

  Source Target
1   2678    338
2   6049   8323
3   9873    824
4   3014   1272
5   4055   1272

Another option would be to mutate across and use match again:

library(tidyverse)

df1 %>% 
  mutate(across(everything(), ~ df2$ID[match(.x, df2$Name)]))

Another option would be to pivot to long form, then join the data together then pivot back wide (but not very efficient).

df1 %>% 
  pivot_longer(everything()) %>% 
  left_join(., df2, by = c("value" = "Name")) %>% 
  select(-value) %>% 
  group_by(grp = ceiling(row_number()/2)) %>% 
  pivot_wider(names_from = "name", values_from = "ID") %>% 
  select(-grp)

Data

df1 <- structure(list(Source = c("DORTMUND", "MUMBAI", "XIOALAN", "ETTRINGEN", 
"HILTER"), Target = c("ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN", 
"BREMERHAVEN")), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(ID = c(2678L, 6049L, 9873L, 3014L, 4055L, 338L, 
8323L, 824L, 1272L), Name = c("DORTMUND", "MUMBAI", "XIOALAN", 
"ETTRINGEN", "HILTER", "ANTWERP", "SPIJKENISSE", "BEILUN", "BREMERHAVEN"
)), class = "data.frame", row.names = c(NA, -9L))
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文