Replace a column in a DataFrame with values from another DataFrame in Julia
Suppose I have two DataFrames, df1 and df2, as follows:
using DataFrames
df1 = DataFrame(id=["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d"],
                var=[1, 32, 3, 22, 5, 4, 6, 7, 8, 4, 3])
11×2 DataFrame
 Row │ id      var
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ a          32
   3 │ a           3
   4 │ b          22
   5 │ b           5
   6 │ b           4
   7 │ c           6
   8 │ c           7
   9 │ c           8
  10 │ d           4
  11 │ d           3
df2 = DataFrame(id=["a", "a", "b", "b", "b", "c", "c", "c"],
                var=[1, 1, 2, 2, 2, 6, 6, 6])
8×2 DataFrame
 Row │ id      var
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ a           1
   3 │ b           2
   4 │ b           2
   5 │ b           2
   6 │ c           6
   7 │ c           6
   8 │ c           6
The objective is to replace the var column in df1 for each id with the value of var from df2 for the corresponding id, only for those ids that exist in both df2 and df1.
So the desired outcome will look like:
DataFrame(id=["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d"],
          var=[1, 1, 1, 2, 2, 2, 6, 6, 6, 4, 3])
11×2 DataFrame
 Row │ id      var
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ a           1
   3 │ a           1
   4 │ b           2
   5 │ b           2
   6 │ b           2
   7 │ c           6
   8 │ c           6
   9 │ c           6
  10 │ d           4
  11 │ d           3
I tried the following, but neither attempt works:
for d1 in groupby(df1, :id)
    replace!(d1.var .= [d2.var for d1 in groupby(df2, :id)])
end
# or
[[d1.var = d2.var for d2 in groupby(df2, :id)] for d1 in groupby(df1, :id)]
I would appreciate any help. Thanks!
Comments (2)
I would do it like this:
Note that the problem with your data is that df2 has multiple rows for the same unique :id, so I run unique on it before joining (also note that the number of values per group in df1 and df2 is not the same).
There are other ways to do it (please comment if you would like to see them), but they would involve more code (iteration through groups etc.). The solution I propose relies on functions that are part of the DataFrames.jl API.
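A minimal sketch of the approach described above (deduplicate df2 on :id, then join), which may differ from the code the answer originally showed. It assumes DataFrames.jl 1.3 or later for leftjoin!, and the column name var_new is chosen here purely for illustration:

```julia
using DataFrames

df1 = DataFrame(id=["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d"],
                var=[1, 32, 3, 22, 5, 4, 6, 7, 8, 4, 3])
df2 = DataFrame(id=["a", "a", "b", "b", "b", "c", "c", "c"],
                var=[1, 1, 2, 2, 2, 6, 6, 6])

# Keep the first row for each id and rename the value column so it does not
# clash with df1.var (var_new is an illustrative name, not from the answer).
lookup = select(unique(df2, :id), :id, :var => :var_new)

# Work on a copy so df1 itself stays untouched.
out = copy(df1)

# In-place left join: keeps the rows of `out` in their order and adds var_new,
# which is missing for ids that do not occur in lookup (here: "d").
leftjoin!(out, lookup; on=:id)

# Take the joined value where there is one, otherwise keep the old value.
out.var = coalesce.(out.var_new, out.var)
select!(out, Not(:var_new))
```

Because lookup holds exactly one row per id, the join cannot add rows to out, which is the reason for calling unique first.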
EDIT
Performance comparison (first run includes compilation time):
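A sketch of how such a comparison could be set up with Base's @elapsed. The function join_replace is a hypothetical wrapper written for this illustration (assuming DataFrames.jl 1.3 or later), and no timings are quoted because they depend on the machine and package versions:

```julia
using DataFrames

# Hypothetical wrapper around a join-based replacement so it can be timed.
function join_replace(df1, df2)
    out = copy(df1)
    lookup = select(unique(df2, :id), :id, :var => :var_new)
    leftjoin!(out, lookup; on=:id)
    out.var = coalesce.(out.var_new, out.var)
    return select!(out, Not(:var_new))
end

df1 = DataFrame(id=["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d"],
                var=[1, 32, 3, 22, 5, 4, 6, 7, 8, 4, 3])
df2 = DataFrame(id=["a", "a", "b", "b", "b", "c", "c", "c"],
                var=[1, 1, 2, 2, 2, 6, 6, 6])

t_first = @elapsed join_replace(df1, df2)    # first call also compiles the method
t_second = @elapsed join_replace(df1, df2)   # later calls reuse the compiled code
println("first run: ", t_first, " s, second run: ", t_second, " s")
```

For measurements that matter, a dedicated benchmarking package and larger inputs would give more stable numbers than two @elapsed calls.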
I believe @BogumiłKamiński's answer is the way to go if you decide to use what the DataFrames.jl API has to offer. If you want to try something longer but much faster, however, you might want to try the following:
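One plausible shape for such a longer, loop-based alternative is a dictionary lookup over the raw column vectors. This is a sketch under that assumption and may differ from the answer's original code; replace_by_id! is a hypothetical helper name, and whether it is actually faster on a given dataset should be measured:

```julia
# Hypothetical helper (not from the original answer): overwrite vals1[i] with
# the first value recorded for ids1[i] in (ids2, vals2), when there is one.
function replace_by_id!(vals1::AbstractVector, ids1::AbstractVector,
                        ids2::AbstractVector, vals2::AbstractVector)
    # Remember the first value seen for each id in the second table.
    lookup = Dict{eltype(ids2), eltype(vals2)}()
    for (id, v) in zip(ids2, vals2)
        get!(lookup, id, v)   # leaves an existing entry unchanged
    end
    # Replace values only for ids that have an entry in the lookup.
    for i in eachindex(vals1, ids1)
        v = get(lookup, ids1[i], nothing)
        v === nothing || (vals1[i] = v)
    end
    return vals1
end

ids1  = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d"]
vals1 = [1, 32, 3, 22, 5, 4, 6, 7, 8, 4, 3]
ids2  = ["a", "a", "b", "b", "b", "c", "c", "c"]
vals2 = [1, 1, 2, 2, 2, 6, 6, 6]

replace_by_id!(vals1, ids1, ids2, vals2)
```

With the DataFrames from the question, the call would be replace_by_id!(df1.var, df1.id, df2.id, df2.var), which modifies df1.var in place. Passing the columns to a function keeps the loops type-stable, which is usually what makes this style fast in Julia.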