Adding new variables to a data frame based on values from another data frame in R

Posted on 2025-01-23 11:47:38


I have a dataset "df" with many observations and multiple variables, including postal codes (some repeated several times), and a second dataset "df2" with the coordinates of these postal codes. I want to add two new variables to "df" holding the coordinates of each row's postal code, but given the huge amount of data I have, a loop takes too long. I would like to know whether I can vectorize this while keeping the data frame structure, i.e. without converting to a matrix. Below is a simplified version of what I want to achieve.

# This dataset has my variables (removed the rest for simplicity)
df <- data.frame(pc = c("00001", "00002", "00003",
                        "00001", "00002", "00003",
                        "00001", "00002", "00003"))
df
#>      pc
#> 1 00001
#> 2 00002
#> 3 00003
#> 4 00001
#> 5 00002
#> 6 00003
#> 7 00001
#> 8 00002
#> 9 00003

# This dataset holds the coordinates
df2 <- data.frame(pc = c("00001", "00002", "00003"),
                  lat = c(1, 2, 3),
                  long = c(4, 5, 6))
df2
#>      pc lat long
#> 1 00001   1    4
#> 2 00002   2    5
#> 3 00003   3    6

# This is the dataset I need
good.df <- data.frame(pc = c("00001", "00002", "00003",
                             "00001", "00002", "00003",
                             "00001", "00002", "00003"),
                      lat = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                      long = c(4, 5, 6, 4, 5, 6, 4, 5, 6))
good.df
#>      pc lat long
#> 1 00001   1    4
#> 2 00002   2    5
#> 3 00003   3    6
#> 4 00001   1    4
#> 5 00002   2    5
#> 6 00003   3    6
#> 7 00001   1    4
#> 8 00002   2    5
#> 9 00003   3    6
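The question mentions a loop that is too slow but does not show it. Purely as a hypothetical illustration, a row-by-row version might look like the sketch below (the function name `loop_lookup` is made up for this example). Each iteration scans the whole of df2 with `which()`, so the total work grows roughly with nrow(df) × nrow(df2), which is what makes this approach slow on large data.

```r
# Hypothetical reconstruction: the original loop was not posted in the question.
df <- data.frame(pc = rep(c("00001", "00002", "00003"), times = 3))
df2 <- data.frame(pc = c("00001", "00002", "00003"),
                  lat = c(1, 2, 3),
                  long = c(4, 5, 6))

loop_lookup <- function(df, df2) {
  # Pre-create the new columns, filled with missing values
  df$lat <- NA_real_
  df$long <- NA_real_
  for (i in seq_len(nrow(df))) {
    # Scan the entire coordinate table for this row's postal code;
    # j is NA when the code is not present in df2
    j <- which(df2$pc == df$pc[i])[1]
    df$lat[i] <- df2$lat[j]
    df$long[i] <- df2$long[j]
  }
  df
}

res <- loop_lookup(df, df2)
```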

I have searched for a solution for quite a long time, but since I don't know how to phrase the question properly, I have had no success so far. I would really appreciate some guidance here.

Thank you


Answer by 土豪我们做朋友吧 (2025-01-30 11:47:38)

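We could use left_join() from the dplyr package, joining on pc. It keeps every row of df in its original order and adds the lat and long columns from the matching row of df2: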

library(dplyr)
left_join(df, df2, by = "pc")
#>      pc lat long
#> 1 00001   1    4
#> 2 00002   2    5
#> 3 00003   3    6
#> 4 00001   1    4
#> 5 00002   2    5
#> 6 00003   3    6
#> 7 00001   1    4
#> 8 00002   2    5
#> 9 00003   3    6
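If adding a dplyr dependency is not desirable, the same loop-free lookup can be sketched in base R with `match()` (this alternative is not part of the original answer). `match()` finds, in one vectorized call, the row of df2 for every postal code in df, and indexing the coordinate columns with that result fills both new variables at once while df remains a data frame.

```r
df <- data.frame(pc = rep(c("00001", "00002", "00003"), times = 3))
df2 <- data.frame(pc = c("00001", "00002", "00003"),
                  lat = c(1, 2, 3),
                  long = c(4, 5, 6))

# For each postal code in df, the row index of its first occurrence in df2
# (NA if the code is missing from df2)
idx <- match(df$pc, df2$pc)

# Index the coordinate columns once; df keeps its class and row order
df$lat <- df2$lat[idx]
df$long <- df2$long[idx]
```

Unlike `merge()`, which sorts the result by the key columns by default, this keeps the rows of df in their original order. One behavioural difference from `left_join()`: if a postal code appears more than once in df2, `match()` silently uses the first occurrence, whereas `left_join()` returns one row per match.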
