当前位置：文江博客话题详情

两个 DataFrame 之间的不匹配行

发布于 01-11 13:24 字数 237 浏览 5 评论 0原文

我是 Python 新手，想寻求您的帮助。我想找出 2 个数据帧（即 df1 和 df2）之间有数千行的不匹配行。它们都包含相同数量且具有相同名称的列。

df2 有 10 个比 df1 少的条目，我正在尝试找出它们是什么。我已尝试 pd.concat([df1,df2]).drop_duplicates(keep=False) 但它返回零结果。

可能是什么原因？任何帮助/建议将不胜感激。多谢。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

我的奇迹2025-01-18 13:24:45

以下代码将删除 df1 中存在于 df2 中的行

df1[~df1.isin(df2)]

The following code will remove rows in df1 that are present in df2

df1[~df1.isin(df2)]

回复收藏 0 原文

攒眉千度2025-01-18 13:24:45

concat 组合两个框架。您正在尝试找出两个框架之间的差异。这可以通过比较来完成。正如文档示例所示，给定这两个框架：

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
    },
    columns=["col1", "col2", "col3"],
)

df2 = df.copy()
df2.loc[0, 'col1'] = 'c'
df2.loc[2, 'col3'] = 4.0

您可以使用 compare 找到不同的行：

df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

在这种情况下，compare 为每一行有差异的行返回一行，并且仅实际上不同的列。

compare 也可以返回相等的值或所有原始值：

df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

或

df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0

concat combines two frames. You're trying to find the difference between two frames. This can be done with compare. As the doc example shows, given these two frames:

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
    },
    columns=["col1", "col2", "col3"],
)

df2 = df.copy()
df2.loc[0, 'col1'] = 'c'
df2.loc[2, 'col3'] = 4.0

You can find the different rows with compare :

df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

In this case compare returns one row for every row that has a difference and only the columns that are actually different.

compare can return the equal values or all original values as well:

df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0

回复收藏 0 原文

娇柔作态2025-01-18 13:24:45

仅仅因为您的条目少了 10 个，并不意味着您会找到重复项。您可能在第一个数据框中已经有重复项。

演示：

# Length: 7
df1 = pd.DataFrame({'col1': list('AAABCDE'),
                    'col2': list('FFFGHIJ'),
                    'col3': list('1112345')})

# Length: 5
df2 = pd.DataFrame({'col1': list('ABCDE'),
                    'col2': list('FGHIJ'),
                    'col3': list('12345')})

您的代码：

>>> pd.concat([df1,df2]).drop_duplicates(keep=False)
Empty DataFrame
Columns: [col1, col2, col3]
Index: []

尝试：

>>> len(df1.drop_duplicates())
5

>>> len(df2.drop_duplicates())
5

Just because you have 10 less entries doesn't mean you're going to find duplicates. You probably already have duplicates inside the first dataframe.

Demo:

# Length: 7
df1 = pd.DataFrame({'col1': list('AAABCDE'),
                    'col2': list('FFFGHIJ'),
                    'col3': list('1112345')})

# Length: 5
df2 = pd.DataFrame({'col1': list('ABCDE'),
                    'col2': list('FGHIJ'),
                    'col3': list('12345')})

Your code:

>>> pd.concat([df1,df2]).drop_duplicates(keep=False)
Empty DataFrame
Columns: [col1, col2, col3]
Index: []

Try:

>>> len(df1.drop_duplicates())
5

>>> len(df2.drop_duplicates())
5

回复收藏 0 原文

旧时模样2025-01-18 13:24:45

假设 df1 和 df2 都是 Pandas Dataframe，以下代码对于匹配的行返回 True，对于另一行返回 false：

print((df1 == df2).any(1))

如果需要检查所有行中的每一列，请尝试以下操作：

print((df1 == df2).stack())

Assuming both df1 and df2 are Pandas Dataframe, the following code returns True for matching rows and false for the other:

print((df1 == df2).any(1))

If needed to check each and every columns in all the rows, try this:

print((df1 == df2).stack())

回复收藏 0 原文

~没有更多了~

关于作者

转身以后

暂无简介

文章

27 人气

关注发私信

友情链接

文江博客

两个 DataFrame 之间的不匹配行

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

身边

qq_oxT0yE

卷着的草席

￡冰雨忧蓝°

我还不会笑

Unbroken

友情链接

两个 DataFrame 之间的不匹配行

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

身边

qq_oxT0yE

卷着的草席

￡冰雨忧蓝°

我还不会笑

Unbroken

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。