ValueError after saving a pandas DataFrame to csv and loading it back

Posted 2025-01-10 08:37:10

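I am trying to find whether a row exists in a DataFrame based on the values of all columns. I believe I found a solution, but I run into problems after saving the DataFrame to a .csv file and loading it back.

In the following example, I iterate over each row of the DataFrame and find the index corresponding to that row, that is, the row where all columns are identical to the row being queried.

NB: In my real code, I iterate over a smaller DataFrame and search for its rows in a larger DataFrame. The issue happens in both cases.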

import numpy as np
import pandas as pd

my_filename = "tmp.csv"                     # Placeholder path (assumed; not given in the original)

df = pd.DataFrame([[1, 2], [3, 4]])         # Create data frame
df.to_csv(my_filename, index=False)         # Save to csv
df1 = pd.read_csv(my_filename)              # Load from csv

# Find original data in loaded data
for row_idx, this_row in df.iterrows():
    print(np.where((df == this_row).all(axis=1)))     # This returns the correct index

for row_idx, this_row in df.iterrows():
    print(np.where((df1 == this_row).all(axis=1)))    # This returns an empty index, and a FutureWarning

The output is:

(array([0]),)
(array([1]),)
(array([], dtype=int64),)
(array([], dtype=int64),)
tmp.py:25: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version.  Do `left, right = left.align(right, axis=1, copy=False)` before e.g. `left == right`

After some debugging, I found that the DataFrame loaded from csv is not identical to the original DataFrame:

# The DataFrames look identical, but comparing gives me a ValueError:
df
df1
df == df1

The output is:

   0  1
0  1  2
1  3  4

   0  1
0  1  2
1  3  4

Traceback (most recent call last):

  File "tmp.py", line 30, in <module>
    df == df1

  File "python3.9/site-packages/pandas/core/ops/common.py", line 69, in new_method
    return method(self, other)

  File "python3.9/site-packages/pandas/core/arraylike.py", line 32, in __eq__
    return self._cmp_method(other, operator.eq)

  File "python3.9/site-packages/pandas/core/frame.py", line 6851, in _cmp_method
    self, other = ops.align_method_FRAME(self, other, axis, flex=False, level=None)

  File "python3.9/site-packages/pandas/core/ops/__init__.py", line 288, in align_method_FRAME
    raise ValueError(

ValueError: Can only compare identically-labeled DataFrame objects
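One likely cause can be checked directly: `to_csv` writes the integer column labels `0` and `1` as a header row, and `read_csv` reads that header back as text, so the reloaded frame has string labels. The sketch below is a minimal reproduction under that assumption. It uses an in-memory `io.StringIO` buffer in place of the file on disk, and casting the labels back with `astype("int64")` is only one possible fix, not necessarily what the original poster ended up using.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# Round-trip through CSV text in memory (stands in for the file on disk).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df1 = pd.read_csv(buffer)

# The header row "0,1" is parsed as text, so the labels are str, not int.
loaded_labels = list(df1.columns)
print(list(df.columns))   # [0, 1]
print(loaded_labels)      # ['0', '1']

# Restore integer labels so both frames are identically labeled.
df1.columns = df1.columns.astype("int64")

labels_match = df.columns.equals(df1.columns)
frames_equal = bool((df == df1).all().all())

# Row lookup from the question, now run against the reloaded frame.
matches = [
    np.where((df1 == this_row).all(axis=1))[0].tolist()
    for _, this_row in df.iterrows()
]
print(matches)
```

With matching labels, `df == df1` no longer raises, and each row of `df` is found at its own position in `df1`. Giving the frame explicit string column names before saving would avoid the mismatch in the first place.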

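Note: This appears to be related to a similar question, but the proposed solution, namely specifying the index labels, did not solve my problem.

Thanks in advance.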

窝囊感情。 2025-01-17 08:37:10

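If you are iterating through a data frame, I would recommend converting your df into a list of row dictionaries.

df_dict = df.to_dict('records')

It is much faster, as this great article details.

Now you can enumerate df_dict and match each row against your desired data.

target_values = {'col1': 'foo', 'col2': 'bar', ...}
for i, row in enumerate(df_dict):
    if row == target_values:
        match_index = i

It may also be a good idea to start by matching only one column and, if it matches, check whether everything else is identical too.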
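The two suggestions above can be put together as a small runnable sketch. The data, the column names `col1` and `col2`, and the helper names `find_matching_rows` and `find_matching_rows_prefilter` are all hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data; the column names and values are placeholders.
df = pd.DataFrame({"col1": ["foo", "baz", "foo"],
                   "col2": ["bar", "qux", "bar"]})


def find_matching_rows(frame, target_values):
    """Return positions of rows whose values all equal target_values."""
    records = frame.to_dict("records")  # one {column: value} dict per row
    return [i for i, row in enumerate(records) if row == target_values]


def find_matching_rows_prefilter(frame, target_values, first_column):
    """Filter on one column first, then compare the remaining rows in full.

    Returns index labels of the matching rows.
    """
    candidates = frame.loc[frame[first_column] == target_values[first_column]]
    records = candidates.to_dict("records")
    return [label for label, row in zip(candidates.index, records)
            if row == target_values]


target_values = {"col1": "foo", "col2": "bar"}
match_positions = find_matching_rows(df, target_values)
match_labels = find_matching_rows_prefilter(df, target_values, "col1")
print(match_positions)  # [0, 2]
print(match_labels)     # [0, 2]
```

The dictionary keys must equal the column labels exactly, including their type: a frame reloaded from csv with string labels such as `'0'` will not match a target keyed by the integer `0`.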

About the author

咋地
