Merge dataframes and extract only the rows of one dataframe that do not exist in the other

Posted 2025-02-09 06:58:01

I am trying to merge two dataframes and create a new dataframe containing only the rows from the first dataframe that do not exist in the second one. For example:

The dataframes that I have as input:

[Image: input dataframes (not preserved)]

The dataframe that I want to have as output:

[Image: desired output dataframe (not preserved)]

Do you know if there is a way to do that? If you could help me, I would be more than thankful!! Thanks, Eleni

Comment by 寻梦旅人, 2025-02-16 06:58:01

Creating some data, we have two dataframes:

import pandas as pd
import numpy as np

rng = np.random.default_rng(seed=5)
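# Name the columns so they match the "a" and "b" headers in the output below
df1 = pd.DataFrame(data=rng.integers(0, 5, size=(5, 2)), columns=["a", "b"])
df2 = pd.DataFrame(data=rng.integers(0, 5, size=(5, 2)), columns=["a", "b"])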
# df1
   a  b
0  3  4
1  0  4
2  2  2
3  3  1
4  4  0

# df2
   a  b
0  1  1
1  2  2
2  0  0
3  0  0
4  0  4

We can use pandas.merge to match equal rows, and its indicator=True option to add a _merge column that marks each row as left_only, right_only, or both. Since we only need the rows that are unique to the left dataframe, we can merge using how="left", which never produces right_only rows and so does less work.

dfm = pd.merge(df1, df2, on=list(df1.columns), how="left", indicator=True)
# dfm

    a   b   _merge
0   3   4   left_only
1   0   4   both
2   2   2   both
3   3   1   left_only
4   4   0   left_only

The final result is then the merged dataframe, keeping only the rows whose indicator is left_only:

(dfm.loc[dfm._merge == 'left_only']
    .drop(columns=['_merge']))
    a   b
0   3   4
3   3   1
4   4   0
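
For reuse, the two steps above could be wrapped in a small function. This is only a sketch: the name rows_only_in_left is made up for illustration, the data is typed in by hand to mirror the tables shown above, and it assumes both dataframes have the same columns. The drop_duplicates() call on the right side is my addition, so that a left row matching several identical right rows is not repeated in the merge.

```python
import pandas as pd


def rows_only_in_left(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` that have no equal row in `right`.

    Hypothetical helper; assumes `right` has the same columns as `left`.
    """
    merged = left.merge(
        right.drop_duplicates(),  # one copy of each right row is enough for matching
        on=list(left.columns),    # compare on every column
        how="left",               # keep all left rows, never add right-only rows
        indicator=True,           # adds the _merge column
    )
    # Keep the rows that found no partner in `right`, then drop the helper column.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


# Hand-written data mirroring the df1 / df2 tables shown above
df1 = pd.DataFrame({"a": [3, 0, 2, 3, 4], "b": [4, 4, 2, 1, 0]})
df2 = pd.DataFrame({"a": [1, 2, 0, 0, 0], "b": [1, 2, 0, 0, 4]})

result = rows_only_in_left(df1, df2)
print(result)
```

Note that the merge builds a fresh 0..n-1 index, so the index labels in the result only line up with df1 when df1 itself uses the default index.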

If you want to compare on only a subset of the columns, that is possible too. In that case I would do the merge like this, selecting the same subset from df2 so that the remaining columns do not show up twice (once from the left and once from the right side).

pd.merge(df1, df2[subset], on=subset, how="left", indicator=True)
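
To make the subset variant concrete, here is a minimal sketch with invented data. The note column and the choice of subset = ["a", "b"] are assumptions for illustration only, and the drop_duplicates() on the right side is again my addition to keep matching left rows from being repeated.

```python
import pandas as pd

# Invented example data: "note" differs between the frames and should be ignored
df1 = pd.DataFrame({"a": [3, 0, 2], "b": [4, 4, 2], "note": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [0, 2, 2], "b": [4, 2, 2], "note": ["p", "q", "r"]})

subset = ["a", "b"]  # hypothetical key columns to compare on

merged = pd.merge(
    df1,
    df2[subset].drop_duplicates(),  # only the key columns, one copy of each key
    on=subset,
    how="left",
    indicator=True,
)

# Rows of df1 whose (a, b) pair does not occur in df2; df1's other columns are kept
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Because df2 is reduced to the key columns before merging, the result keeps a single note column (the one from df1) instead of note_x and note_y.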
