Create and populate a DataFrame column to simulate the (Excel) VLOOKUP function
I am trying to create a new column in a DataFrame and populate it with values from a column of another DataFrame, matched on a column that the two DataFrames have in common.
DF1        DF2
A  B       W  B
----       ----
Y  2       X  2
N  4       F  4
Y  5       T  5
I thought the following would do the trick:
df2['new_col'] = df1['A'] if df1['B'] == df2['B'] else "Not found"
So the result should be:
DF2
W  B  new_col
X  2  Y        -> because DF1['B'] == 2 and the value in the same row is Y
F  4  N
T  5  Y
But I get the error below. I believe that is because the DataFrames are different sizes?
raise ValueError("Can only compare identically-labeled Series objects")
Can you help me understand what I am doing wrong and what the best way is to achieve what I am after?
Thank you in advance.
UPDATE 1
Trying Corralien's solution, I still get the error below:
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
This is the code I wrote:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]]),
                   columns=['One', 'b', 'Three'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2.reset_index().merge(df1.reset_index(), on=['b'], how='left') \
   .drop(columns='index').rename(columns={'One': 'new_col'})
UPDATE 2
Here is the second option. It runs, but the columns it adds to df2 contain only NaN.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]]),
                   columns=['One', 'b', 'Three'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2 = df2.set_index('b', append=True).join(df1.set_index('b', append=True)) \
         .reset_index('b').rename(columns={'One': 'new_col'})
print(df2)
   b  a  c new_col Three
0  2  1  3     NaN   NaN
1  5  4  6     NaN   NaN
2  8  7  9     NaN   NaN
Why is the code above not working?
Comments (1)
Your question is not clear: why is F associated with N and T with Y? Why not F with Y and T with N?
Using merge:
How do you decide on the right value? By row index?
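When every B value in DF1 is unique, the question of which row to pick does not arise, and a plain left merge behaves much like VLOOKUP. The answer's own merge code did not survive in this copy, so the following is only a minimal sketch of that idea. The frames are rebuilt from the tables in the question, and the "Not found" fallback is taken from the asker's attempted one-liner.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (B is unique in df1).
df1 = pd.DataFrame({"A": ["Y", "N", "Y"], "B": [2, 4, 5]})
df2 = pd.DataFrame({"W": ["X", "F", "T"], "B": [2, 4, 5]})

# A left merge keeps every row of df2 and pulls in A wherever B matches.
result = df2.merge(df1, on="B", how="left").rename(columns={"A": "new_col"})

# Keys that are missing from df1 come back as NaN; label them explicitly.
result["new_col"] = result["new_col"].fillna("Not found")

print(result)
```

If B can repeat in DF1, the same merge returns one output row per matching pair, which is why the choice of the "right" value has to be settled first.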
Update
So you need to use the index position:
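One possible reading of "use the index position" is to expose the row index as a column and merge on it together with B. The sketch below assumes hypothetical data in which B repeats (the value 4 appears twice), since that is the situation where position matters; the data is an illustration and is not taken from the question.

```python
import pandas as pd

# Hypothetical frames with a repeated key (4 appears twice in each).
df1 = pd.DataFrame({"A": ["Y", "N", "Y"], "B": [2, 4, 4]})
df2 = pd.DataFrame({"W": ["X", "F", "T"], "B": [2, 4, 4]})

# Merging on B alone pairs every 4 in df2 with every 4 in df1.
plain = df2.merge(df1, on="B", how="left")

# reset_index() turns the row position into an 'index' column, so the
# merge key becomes (position, B) and each row matches at most once.
result = (
    df2.reset_index()
       .merge(df1.reset_index(), on=["index", "B"], how="left")
       .drop(columns="index")
       .rename(columns={"A": "new_col"})
)

print(len(plain), len(result))
print(result)
```

Note that the key list contains both 'index' and 'B'. Merging on 'B' alone after reset_index() would leave two suffixed index columns instead of one.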
Using join:
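The join variant quoted in UPDATE 2 of the question follows the same idea: B is appended to the existing row index on both sides, so the join key is the pair (position, B). A minimal sketch, again on hypothetical frames with a repeated key and integer B columns on both sides:

```python
import pandas as pd

# Hypothetical frames; B is an integer column in both.
df1 = pd.DataFrame({"A": ["Y", "N", "Y"], "B": [2, 4, 4]})
df2 = pd.DataFrame({"W": ["X", "F", "T"], "B": [2, 4, 4]})

# append=True keeps the original row index and adds B as a second level.
left = df2.set_index("B", append=True)
right = df1.set_index("B", append=True)

# join() aligns on the (position, B) index and defaults to a left join.
result = (
    left.join(right)
        .reset_index("B")               # move B back into the columns
        .rename(columns={"A": "new_col"})
)

print(result)
```

Because the match is on index values, both B columns need the same dtype. A string '4' on one side and an integer 4 on the other never align, and the joined columns come back as NaN.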
Setup:
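The answer's setup code is missing from this copy. The sketch below instead rebuilds the frames from UPDATE 1 and UPDATE 2 of the question and shows what is very likely behind both symptoms: np.array stores a mixed row such as ['x', 2, 3] entirely as strings, so df1['b'] is not an integer column while df2['b'] is. Casting the key to the dtype of df2['b'] is one possible fix, offered here as an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd

# Mixed str/int rows force NumPy to store every element as a string.
df1 = pd.DataFrame(np.array([["x", 2, 3], ["y", 5, 6], ["z", 8, 9]]),
                   columns=["One", "b", "Three"])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=["a", "b", "c"])

# Before the cast, df1['b'] holds '2', '5', '8' rather than integers.
raw_key_is_int = pd.api.types.is_integer_dtype(df1["b"])

# Give the key the same dtype on both sides.
df1["b"] = df1["b"].astype(df2["b"].dtype)

result = (
    df2.reset_index()
       .merge(df1.reset_index(), on=["index", "b"], how="left")
       .drop(columns="index")
       .rename(columns={"One": "new_col"})
)

print(raw_key_is_int)
print(result)
```

Building the frames from a dict or a list of rows passed straight to pd.DataFrame, rather than through np.array, lets pandas infer a dtype per column and sidesteps the cast.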