pandas: append rows to another DataFrame under similar rows based on a column condition

Posted on 2025-01-14 06:59:06


I have two dataframes as follows,

import pandas as pd
d1 ={'col1': ['I ate dinner','I ate dinner', 'the play was inetresting','the play was inetresting'],
 'col2': ['min', 'max', 'mid','min'],
 'col3': ['min', 'max', 'max','max']}

d2 ={'col1': ['I ate dinner',' the glass is shattered', 'the play was inetresting'],
     'col2': ['min', 'max', 'max'],
     'col3': ['max', 'min', 'mid']}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

I have created a column in df2 called 'exist' and added values (true, false) based on whether the sentences in df2.col1 exist in df1.col1:

common = df1.merge(df2,on=['col1'])
index_list = df2[(~df2.col1.isin(common.col1))].index.to_list()
df2['exist'] = ' '
df2.loc[index_list, 'exist'] = 'false'
df2.loc[df2["exist"] == " ",'exist'] = 'true'
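The same `exist` flag can be built in one step with `Series.isin`. This is also where `np.where` fits: it picks values element by element (here turning the boolean mask into the strings 'true'/'false'), but it cannot insert rows, so it does not help with the append itself. A minimal sketch that rebuilds the example frames from the question:

```python
import numpy as np
import pandas as pd

d1 = {'col1': ['I ate dinner', 'I ate dinner', 'the play was inetresting', 'the play was inetresting'],
      'col2': ['min', 'max', 'mid', 'min'],
      'col3': ['min', 'max', 'max', 'max']}
d2 = {'col1': ['I ate dinner', ' the glass is shattered', 'the play was inetresting'],
      'col2': ['min', 'max', 'max'],
      'col3': ['max', 'min', 'mid']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# Boolean mask: True where the sentence in df2.col1 also occurs in df1.col1
mask = df2['col1'].isin(df1['col1'])

# Element-wise choice between two strings, matching the 'true'/'false' values above
df2['exist'] = np.where(mask, 'true', 'false')
print(df2['exist'].tolist())  # ['true', 'false', 'true']
```

If a real boolean column is acceptable, `df2['exist'] = mask` is enough and can be used directly as a row filter (`df2[df2['exist']]`).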

What I would like to do now is: if the value in the exist column is true, add that row under the matching rows in df1. So the desired output should be:

output:
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid

I guess I have to use np.where, but I am not sure how to formulate the append to get the desired output.


一人独醉 2025-01-21 06:59:06


The first idea is to filter the df2 rows by df1.col1, append them to df1 with concat, and then sort with DataFrame.sort_values:

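# kind='stable' guarantees df1's rows stay ahead of the appended df2 rows within each sentence
df = pd.concat([df1, df2[df2.col1.isin(df1.col1)]]).sort_values('col1', kind='stable', ignore_index=True)
print(df)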
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid

If you need only the values common to both DataFrames, you can filter by numpy.intersect1d:

import numpy as np

common = np.intersect1d(df1['col1'], df2['col1'])

df = (pd.concat([df1[df1.col1.isin(common)],
                 df2[df2.col1.isin(common)]])
        .sort_values('col1', kind='stable', ignore_index=True))
print(df)
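The two snippets above only give different results when df1 contains a sentence that df2 lacks: the first keeps such df1 rows, the intersect1d version drops them. A small check, assuming a hypothetical extra df1 row 'a sentence only in df1' that is not in the question's data, and using a stable sort in both versions:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': ['I ate dinner', 'I ate dinner', 'the play was inetresting',
                             'the play was inetresting', 'a sentence only in df1'],  # last row is hypothetical
                    'col2': ['min', 'max', 'mid', 'min', 'min'],
                    'col3': ['min', 'max', 'max', 'max', 'min']})
df2 = pd.DataFrame({'col1': ['I ate dinner', ' the glass is shattered', 'the play was inetresting'],
                    'col2': ['min', 'max', 'max'],
                    'col3': ['max', 'min', 'mid']})

# Version 1: keep every df1 row, append only df2 rows whose sentence occurs in df1
keep_all = (pd.concat([df1, df2[df2.col1.isin(df1.col1)]])
              .sort_values('col1', kind='stable', ignore_index=True))

# Version 2: keep only sentences present in both frames
common = np.intersect1d(df1['col1'], df2['col1'])
only_common = (pd.concat([df1[df1.col1.isin(common)], df2[df2.col1.isin(common)]])
                 .sort_values('col1', kind='stable', ignore_index=True))

print(len(keep_all), len(only_common))  # 7 6
```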


长安忆 2025-01-21 06:59:06


IIUC, you want to add the matching row(s) and not necessarily rely on sorting.

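df2b = df2.set_index('col1')

(df1
 # sort=False keeps the sentences in df1's order instead of sorting them
 .groupby('col1', as_index=False, group_keys=False, sort=False)
 # append the matching df2 row(s) after each group; the boolean mask
 # (instead of .loc[[d.name]]) avoids a KeyError when a sentence has no match
 .apply(lambda d: pd.concat([d, df2b[df2b.index == d.name].reset_index()]))
 .reset_index(drop=True)
)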

output:

                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid
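If you would rather avoid groupby.apply (recent pandas versions warn when apply operates on the grouping column), a similar layout can be produced with a temporary sort key: number each sentence by its first appearance in df1, concatenate, and sort stably on that key. A sketch under those assumptions; the column name `_grp` is a hypothetical helper that is dropped at the end:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['I ate dinner', 'I ate dinner', 'the play was inetresting', 'the play was inetresting'],
                    'col2': ['min', 'max', 'mid', 'min'],
                    'col3': ['min', 'max', 'max', 'max']})
df2 = pd.DataFrame({'col1': ['I ate dinner', ' the glass is shattered', 'the play was inetresting'],
                    'col2': ['min', 'max', 'max'],
                    'col3': ['max', 'min', 'mid']})

# Rank each sentence by where it first appears in df1 (not alphabetically)
order = {s: i for i, s in enumerate(df1['col1'].unique())}

# Append only the df2 rows whose sentence exists in df1
out = pd.concat([df1, df2[df2['col1'].isin(df1['col1'])]], ignore_index=True)

# Stable sort on the rank: df1 rows first, then the appended df2 row(s), per sentence
out = (out.assign(_grp=out['col1'].map(order))
          .sort_values('_grp', kind='stable')
          .drop(columns='_grp')
          .reset_index(drop=True))
print(out)
```

Because the sort key is the position in df1 rather than the sentence text, the groups keep df1's order even when df1 is not alphabetically sorted.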
