Merge dataframes and extract only the rows of one dataframe that do not exist in the other

Posted on 2025-02-09 06:58:01 · 327 characters · 1 view · 0 comments

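I am trying to merge two dataframes and create a new dataframe that contains only the rows of the first dataframe that do not exist in the second one. For example:

The dataframes I have as input: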

[image: enter image description here]

The dataframe I would like to have:

<img src="https://i.sstatic.net/mfxxm.png" alt="enter image description here">

Do you know if there is a way to do that? If you could help me, I would be more than thankful! Thanks, Eleni

寻梦旅人 2025-02-16 06:58:01

Creating some data, we have two dataframes:

import pandas as pd
import numpy as np

rng = np.random.default_rng(seed=5)
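# Name the columns "a" and "b" so they match the output shown below
df1 = pd.DataFrame(data=rng.integers(0, 5, size=(5, 2)), columns=["a", "b"])
df2 = pd.DataFrame(data=rng.integers(0, 5, size=(5, 2)), columns=["a", "b"])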

We can use pandas.merge to combine equal rows, and its indicator=True option to mark which rows come only from the left (and, where applicable, only from the right). Since we only need the rows that are unique to the left, we can merge using how="left" to be more efficient.

dfm = pd.merge(df1, df2, on=list(df1.columns), how="left", indicator=True)

# dfm

    a   b   _merge
0   3   4   left_only
1   0   4   both
2   2   2   both
3   3   1   left_only
4   4   0   left_only
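To see what the indicator column holds when right-only rows are kept as well, here is a minimal sketch using how="outer". The two small frames (left, right) are made-up illustration data, not the data from the question.

```python
import pandas as pd

# Hypothetical toy data: one row only in left, one shared, one only in right
left = pd.DataFrame({"a": [1, 2], "b": [1, 2]})
right = pd.DataFrame({"a": [2, 3], "b": [2, 3]})

# An outer merge keeps rows from both sides; _merge records where each came from
outer = pd.merge(left, right, on=["a", "b"], how="outer", indicator=True)

# Count how many rows fall into each indicator category
counts = outer["_merge"].value_counts().to_dict()
print(counts)
```

With how="left" the right_only category simply stays empty, which is why it is enough for this task.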

Great, so the final result is the merged frame, keeping only the rows whose indicator is left_only:

(dfm.loc[dfm._merge == 'left_only']
    .drop(columns=['_merge']))
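Put together, the steps above could be wrapped in a small helper. This is only a sketch: the function name rows_only_in_left and the hand-written frames are assumptions for illustration, and the drop_duplicates call is an optional addition that keeps the intermediate merge small (matched rows are filtered out either way).

```python
import pandas as pd

def rows_only_in_left(left, right):
    """Return the rows of `left` that have no exact match in `right`.

    Hypothetical helper; assumes both frames share the same column names.
    """
    cols = list(left.columns)
    merged = left.merge(
        right[cols].drop_duplicates(),  # optional: avoids inflating the merge
        on=cols,
        how="left",
        indicator=True,
    )
    # Keep the unmatched rows and drop the helper column again
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Made-up example data
left = pd.DataFrame({"a": [3, 0, 2, 3, 4], "b": [4, 4, 2, 1, 0]})
right = pd.DataFrame({"a": [0, 2, 2, 1], "b": [4, 2, 2, 1]})

result = rows_only_in_left(left, right)
print(result)
```

Note that the result keeps the row labels produced by the merge; call reset_index(drop=True) on it if a fresh 0..n-1 index is wanted.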

If you want to deduplicate by only a subset of the columns, that is possible too. In that case I would do the merge like this, selecting just the subset from the right side so that the other columns do not show up in duplicate versions from the left and right.

pd.merge(df1, df2[subset], on=subset, how="left", indicator=True)
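As a rough illustration of the subset variant, the sketch below compares rows on columns a and b only and carries an extra column through untouched. The column name note and all values are invented for the example.

```python
import pandas as pd

# Made-up frames with an extra "note" column that should not take part in matching
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "note": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [1, 3, 5], "b": [10, 99, 50], "note": ["p", "q", "r"]})
subset = ["a", "b"]

# Only the subset columns are taken from df2, so no note_x / note_y pair appears
dfm = pd.merge(df1, df2[subset], on=subset, how="left", indicator=True)

# Rows of df1 whose (a, b) combination does not occur in df2
only_left = dfm.loc[dfm["_merge"] == "left_only"].drop(columns=["_merge"])
print(only_left)
```

Passing df2 whole instead of df2[subset] would still work, but the non-key columns shared by both frames would come back with _x and _y suffixes.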
