Create and populate a DataFrame column emulating the (Excel) VLOOKUP function

Posted on 2025-01-10 22:44:53

I am trying to create a new column in a DataFrame and populate it with values from a column of another DataFrame, matching rows on a column that both DataFrames have in common.

DF1   DF2
A B   W B
———   ——— 
Y 2   X 2
N 4   F 4   
Y 5   T 5

I thought the following would do the trick:

df2['new_col'] = df1['A'] if df1['B'] == df2['B'] else "Not found"

So the result should be:

DF2
W B new_col
X 2 Y        -> Because DF1['B'] == 2 and the value in the same row is Y
F 4 N
T 5 Y

but I get the error below, which I believe is because the DataFrames have different sizes:

raise ValueError("Can only compare identically-labeled Series objects")
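A minimal sketch of why that line fails, using small hypothetical key Series rather than the real frames: `==` between two Series first checks that their index labels are identical, and even when they are, a Python `if` cannot reduce a whole boolean Series to a single True/False.

```python
import pandas as pd

# Hypothetical key columns of different lengths (not the question's data).
b1 = pd.Series([2, 4, 5])
b2 = pd.Series([2, 4, 5, 7])

errors = []
try:
    b1 == b2  # element-wise == requires identical index labels on both sides
except ValueError as exc:
    errors.append(str(exc))  # "Can only compare identically-labeled Series objects"

# With identical labels the comparison works, but it returns a boolean Series;
# a Python conditional expression needs one True/False, so it still raises.
same = pd.Series([2, 4, 5])
try:
    'match' if b1 == same else 'Not found'
except ValueError as exc:
    errors.append(str(exc))  # "The truth value of a Series is ambiguous. ..."

for message in errors:
    print(message)
```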

Can you help me understand what I am doing wrong, and what the best way to achieve this is?
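One common way to express a VLOOKUP in pandas is `Series.map` with a lookup Series built from DF1. The sketch below rebuilds DF1 and DF2 from the tables above; it assumes the values in DF1's `B` column are unique (`map` raises if the lookup index has duplicates) and uses `fillna` to reproduce the "Not found" default. The key `9` in the last step is hypothetical, added only to show an unmatched lookup.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['Y', 'N', 'Y'], 'B': [2, 4, 5]})
df2 = pd.DataFrame({'W': ['X', 'F', 'T'], 'B': [2, 4, 5]})

# Lookup Series: index = key column B, values = the column to copy (A).
lookup = df1.set_index('B')['A']

# For each B in df2, fetch the matching A; keys absent from df1 become NaN,
# which fillna() replaces with "Not found".
df2['new_col'] = df2['B'].map(lookup).fillna('Not found')
print(df2)

# Hypothetical unmatched key (9) to show the default value.
missing = pd.Series([2, 9]).map(lookup).fillna('Not found').tolist()
print(missing)
```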

Thank you in advance.

UPDATE 1
Trying Corralien's solution, I still get the following:

ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat

This is the code I wrote:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]]),
                   columns=['One', 'b', 'Three'])
                   
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

df2.reset_index().merge(df1.reset_index(), on=['b'], how='left') \
       .drop(columns='index').rename(columns={'One': 'new_col'})
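A likely cause, sketched under the assumption that the frames are built exactly as in the code above: `np.array` stores a single dtype for the whole array, so mixing the string `'x'` with numbers turns every value of `df1`, including the `b` keys, into strings, while `df2['b']` stays integer. The sketch reproduces the error with a plain merge on `b`, then casts the key back to integers with `astype` (one possible fix, not the only one) before merging again.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]]),
                   columns=['One', 'b', 'Three'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

# np.array() picked a string dtype because of 'x', so 2, 5, 8 became '2', '5', '8'.
keys_before = df1['b'].tolist()
print(keys_before)        # ['2', '5', '8']
print(df2['b'].tolist())  # [2, 5, 8]

try:
    df2.merge(df1, on='b', how='left')
    merge_error = None
except ValueError as exc:
    merge_error = str(exc)  # "You are trying to merge on ... columns ..."

# Convert the string keys back to integers, then merge only the needed column.
df1['b'] = df1['b'].astype('int64')
result = (df2.merge(df1[['b', 'One']], on='b', how='left')
             .rename(columns={'One': 'new_col'}))
print(result)
```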

UPDATE 2
Here is the second option, but it does not seem to fill in the new columns in df2:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]]),
                   columns=['One', 'b', 'Three'])

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

df2 = df2.set_index('b', append=True).join(df1.set_index('b', append=True)) \
       .reset_index('b').rename(columns={'One': 'new_col'})

print(df2)


   b  a  c new_col Three
0  2  1  3     NaN   NaN
1  5  4  6     NaN   NaN
2  8  7  9     NaN   NaN

Why is the code above not working?
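The same dtype mismatch would explain the all-NaN output: `join` matches index labels by equality and, as the output above shows, does not raise; it simply finds no `(row label, b)` pair where `'2'` equals `2`. A sketch reusing the construction above; `add_new_col` is a hypothetical helper wrapping the join chain from UPDATE 2, and the fix shown builds `df1` from plain lists instead of `np.array` so that `b` stays numeric.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]]),
                   columns=['One', 'b', 'Three'])
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

# df1['b'] holds strings, df2['b'] holds integers: no key value is shared.
shared = set(df1['b']) & set(df2['b'])
print(shared)  # set()

def add_new_col(left, right):
    """Hypothetical helper: join right onto left using (row label, b) as the key."""
    return (left.set_index('b', append=True)
                .join(right.set_index('b', append=True))
                .reset_index('b')
                .rename(columns={'One': 'new_col'}))

broken = add_new_col(df2, df1)  # every new_col value is NaN

# Building df1 from plain lists keeps 'b' as integers, so the keys now match.
df1_fixed = pd.DataFrame([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]],
                         columns=['One', 'b', 'Three'])
fixed = add_new_col(df2, df1_fixed)
print(fixed)
```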

平安喜乐 2025-01-17 22:44:53

Your question is unclear: why is F associated with N and T with Y? Why not F with Y and T with N?

Using merge:

>>> df2.merge(df1, on='B', how='left')
   W  B  A
0  X  2  Y
1  F  4  N  # What you want
2  F  4  Y  # Another solution
3  T  4  N  # What you want
4  T  4  Y  # Another solution

How do you decide which value is the right one? By row index?
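The five-row output above is what a left merge produces when a key is repeated: every `df2` row is paired with every `df1` row sharing its `B` value. The sketch uses hypothetical data with `B = 2, 4, 4` in both frames, which is what that output implies, and shows that `validate='many_to_one'` makes pandas raise a `MergeError` when the lookup side has duplicate keys.

```python
import pandas as pd

# Hypothetical data with a repeated key (B == 4 twice in each frame).
df1 = pd.DataFrame({'A': ['Y', 'N', 'Y'], 'B': [2, 4, 4]})
df2 = pd.DataFrame({'W': ['X', 'F', 'T'], 'B': [2, 4, 4]})

# Each B == 4 row of df2 matches both B == 4 rows of df1: 1 + 2 + 2 = 5 rows.
merged = df2.merge(df1, on='B', how='left')
print(merged)

# validate='many_to_one' requires unique keys in df1 and raises otherwise,
# which catches this ambiguity instead of silently duplicating rows.
try:
    df2.merge(df1, on='B', how='left', validate='many_to_one')
    validation_error = None
except pd.errors.MergeError as exc:
    validation_error = str(exc)
print(validation_error)
```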

Update

So you need to use the index position:

>>> df2.reset_index().merge(df1.reset_index(), on=['index', 'B'], how='left') \
       .drop(columns='index').rename(columns={'A': 'new_col'})

   W  B new_col
0  X  2       Y
1  F  4       N
2  T  4       Y
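A runnable version of the positional merge, again on hypothetical data with `B = 2, 4, 4` in both frames. `reset_index()` turns the row labels into a column named `index`, so a row only matches when both its position label and `B` agree; this is only meaningful if the two frames list their rows in the same order. `validate='one_to_one'` is an optional extra check that each `(index, B)` pair is unique on both sides.

```python
import pandas as pd

# Hypothetical data matching the outputs above.
df1 = pd.DataFrame({'A': ['Y', 'N', 'Y'], 'B': [2, 4, 4]})
df2 = pd.DataFrame({'W': ['X', 'F', 'T'], 'B': [2, 4, 4]})

result = (df2.reset_index()                      # row labels -> column 'index'
             .merge(df1.reset_index(),
                    on=['index', 'B'], how='left',
                    validate='one_to_one')        # each (index, B) pair is unique
             .drop(columns='index')
             .rename(columns={'A': 'new_col'}))
print(result)
```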

In fact, you can treat column B as an additional index level of each DataFrame.

Using join:

>>> df2.set_index('B', append=True).join(df1.set_index('B', append=True)) \
       .reset_index('B').rename(columns={'A': 'new_col'})

   B  W new_col
0  2  X       Y
1  4  F       N
2  4  T       Y
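A variant of the `join` approach, sketched on the same hypothetical data: joining only the `A` column (as a Series renamed to `new_col`) keeps unrelated `df1` columns, such as `Three` in the question's update, out of the result, and assigning the joined values back writes the new column directly into `df2`. This relies on a left join preserving `df2`'s row order.

```python
import pandas as pd

# Hypothetical data matching the outputs above (B == 4 appears twice).
df1 = pd.DataFrame({'A': ['Y', 'N', 'Y'], 'B': [2, 4, 4]})
df2 = pd.DataFrame({'W': ['X', 'F', 'T'], 'B': [2, 4, 4]})

# Lookup keyed on (row label, B), holding only the value to copy over.
lookup = df1.set_index('B', append=True)['A'].rename('new_col')

# A left join keeps df2's rows in their original order, so the joined
# column can be assigned back positionally.
joined = df2.set_index('B', append=True).join(lookup)
df2['new_col'] = joined['new_col'].to_numpy()
print(df2)
```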

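Setup (the data from the question's updates, built from plain lists instead of np.array so that column b keeps an integer dtype in both frames):

import pandas as pd

df1 = pd.DataFrame([['x', 2, 3], ['y', 5, 6], ['z', 8, 9]],
                   columns=['One', 'b', 'Three'])

df2 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   columns=['a', 'b', 'c'])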
