Filter records in one DataFrame based on column values in a second DataFrame in Python

Posted 2025-01-09 08:53:07

I have two DataFrames, df1 and df2.

df1 is the original dataset, and df2 is a dataset derived from df1 after some manipulation.

df1 has a column 'log', and df2 has two columns, 'log1' and 'log2'.

The values in columns 'log1' and 'log2' are contained in column 'log' of df1.

Sample of df2:

date  id     log1   log2
1     uu1q   (2,4)  (3,5)
1     uu1q   (2,4)  (7,6)
1     uu1q   (3,5)  (7,6)
5     u25a   (4,7)  (3,9)
5     uu25a  (1,9)  (3,9)
6     ua3b7  (1,1)  (2,2)
6     ua3b7  (1,1)  (3,3)
6     ua3b7  (2,2)  (3,3)

Sample of df1 (columns and data):

date  id    log    name  col1  col2
1     uu1q  (2,4)  xyz   1123  qqq
1     uu1q  (3,5)  aas   2132  wew
1     uu1q  (7,6)  wqas  2567  uuo
5     u25a  (4,7)  enj   666   ttt
5     fff   (0,0)  ddd   0     lll

Now I want to fetch/filter all the records from df1 that match a row in df2, i.e. take 'date', 'id', and 'log1' or 'log2' from each row of df2 and compare them with the columns 'date', 'id', 'log' in df1.

NOTE: the values of columns 'log1' and 'log2' are both contained in the single column 'log'.

凉城凉梦凉人心 2025-01-16 08:53:07

IIUC, you're looking for a chained isin:

out = df1[
    df1['date'].isin(df2['date'])
    & df1['id'].isin(df2['id'])
    & (df1['log'].isin(df2['log1']) | df1['log'].isin(df2['log2']))
]

Output:

   date    id    log  name  col1 col2
0     1  uu1q  (2,4)   xyz  1123  qqq
1     1  uu1q  (3,5)   aas  2132  wew
2     1  uu1q  (7,6)  wqas  2567  uuo
3     5  u25a  (4,7)   enj   666  ttt
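
One caveat worth checking: each isin call tests its column independently, so a df1 row can pass even when its date, id and log come from different df2 rows. The sketch below uses a small hypothetical pair of frames (not the data from the question) to show the difference, and compares whole (date, id, log) tuples as one possible row-wise alternative. The names colwise, wanted and rowwise are made up for illustration.

```python
import pandas as pd

# Hypothetical data, chosen only to expose the difference.
df2 = pd.DataFrame({
    "date": [1, 5],
    "id": ["uu1q", "u25a"],
    "log1": ["(2,4)", "(4,7)"],
    "log2": ["(3,5)", "(3,9)"],
})
# The second df1 row combines a date from one df2 row with the id and
# log of another, so no single df2 row matches it.
df1 = pd.DataFrame({
    "date": [1, 5],
    "id": ["uu1q", "uu1q"],
    "log": ["(2,4)", "(2,4)"],
})

# Column-wise check (the chained isin): every column is tested on its own.
colwise = df1[
    df1["date"].isin(df2["date"])
    & df1["id"].isin(df2["id"])
    & (df1["log"].isin(df2["log1"]) | df1["log"].isin(df2["log2"]))
]

# Row-wise check: collect the (date, id, log) tuples that occur in df2 ...
wanted = set(zip(df2["date"], df2["id"], df2["log1"])) | set(
    zip(df2["date"], df2["id"], df2["log2"])
)
# ... and keep only the df1 rows whose own tuple is among them.
mask = pd.Series(
    [key in wanted for key in zip(df1["date"], df1["id"], df1["log"])],
    index=df1.index,
)
rowwise = df1[mask]

print(len(colwise), len(rowwise))
```

On the sample data from the question both checks select the same four rows, so the distinction only matters when dates, ids and logs can recombine across rows.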

白芷 2025-01-16 08:53:07

Use DataFrame.melt to stack the log1, log2, ... columns into a single log column, then filter with the default inner join of DataFrame.merge:

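# Stack log1/log2 into one 'log' column, drop the helper column and
# duplicate keys, then inner-join with df1 on the shared columns.
df = (df2.melt(['date','id'], value_name='log')
         .drop('variable', axis=1)
         .drop_duplicates()
         .merge(df1))
print(df)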
   date    id    log  name  col1 col2
0     1  uu1q  (2,4)   xyz  1123  qqq
1     1  uu1q  (3,5)   aas  2132  wew
2     5  u25a  (4,7)   enj   666  ttt
3     1  uu1q  (7,6)  wqas  2567  uuo
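
In the output above the rows follow the order of the melted df2 keys, not the order of df1. If df1's row order matters, one possible variant is to build the melted key table first and merge it into df1 instead of the other way round. This is only a sketch: it assumes the sample data from the question, typed in as integers and strings, and the name keys is introduced here for illustration.

```python
import pandas as pd

# Sample data from the question, re-typed by hand (dtypes are an assumption).
df1 = pd.DataFrame({
    "date": [1, 1, 1, 5, 5],
    "id": ["uu1q", "uu1q", "uu1q", "u25a", "fff"],
    "log": ["(2,4)", "(3,5)", "(7,6)", "(4,7)", "(0,0)"],
    "name": ["xyz", "aas", "wqas", "enj", "ddd"],
    "col1": [1123, 2132, 2567, 666, 0],
    "col2": ["qqq", "wew", "uuo", "ttt", "lll"],
})
df2 = pd.DataFrame({
    "date": [1, 1, 1, 5, 5, 6, 6, 6],
    "id": ["uu1q", "uu1q", "uu1q", "u25a", "uu25a", "ua3b7", "ua3b7", "ua3b7"],
    "log1": ["(2,4)", "(2,4)", "(3,5)", "(4,7)", "(1,9)", "(1,1)", "(1,1)", "(2,2)"],
    "log2": ["(3,5)", "(7,6)", "(7,6)", "(3,9)", "(3,9)", "(2,2)", "(3,3)", "(3,3)"],
})

# One row per distinct (date, id, log) combination found in df2.
keys = (
    df2.melt(id_vars=["date", "id"], value_name="log")
       .drop(columns="variable")
       .drop_duplicates()
)

# df1 is the left frame here; pandas documents that an inner merge
# preserves the order of the left keys.
out = df1.merge(keys, on=["date", "id", "log"], how="inner")
print(out)
```

Because keys holds each combination only once, the merge cannot duplicate df1 rows. The result still gets a fresh 0..n-1 index, as with any merge.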
