ValueError after saving a pandas DataFrame to csv and loading it back

Posted on 2025-01-10 08:37:10

I am trying to find whether a row exists in a DataFrame based on the values of all columns. I believe I found a solution, but I'm having problems after saving and loading the DataFrame into/from a .csv file.

In the following example, I iterate over each row of the DataFrame and find the index corresponding to each row, i.e. the row where all columns are identical to the row being queried.

NB: In my real code, I iterate over a smaller DataFrame and search for rows in a larger DataFrame. But the issue happens in both cases.

import numpy as np
import pandas as pd

my_filename = "tmp.csv"                     # Placeholder path (not shown in the original post)

df = pd.DataFrame([[1, 2], [3, 4]])         # Create data frame
df.to_csv(my_filename, index=False)         # Save to csv
df1 = pd.read_csv(my_filename)              # Load from csv

# Find original data in loaded data
for row_idx, this_row in df.iterrows():
    print(np.where((df  == this_row).all(axis=1)))    # This returns the correct index

for row_idx, this_row in df.iterrows():
    print(np.where((df1 == this_row).all(axis=1)))    # This returns an empty index, and a FutureWarning

The output is:

(array([0]),)
(array([1]),)
(array([], dtype=int64),)
(array([], dtype=int64),)
tmp.py:25: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version.  Do `left, right = left.align(right, axis=1, copy=False)` before e.g. `left == right`

After some debugging, I found that the DataFrame loaded from csv is not identical to the original DataFrame:

# The DataFrames look identical, but comparing gives me a ValueError:
df
df1
df == df1

The output is:

   0  1
0  1  2
1  3  4

   0  1
0  1  2
1  3  4

Traceback (most recent call last):

  File "tmp.py", line 30, in <module>
    df == df1

  File "python3.9/site-packages/pandas/core/ops/common.py", line 69, in new_method
    return method(self, other)

  File "python3.9/site-packages/pandas/core/arraylike.py", line 32, in __eq__
    return self._cmp_method(other, operator.eq)

  File "python3.9/site-packages/pandas/core/frame.py", line 6851, in _cmp_method
    self, other = ops.align_method_FRAME(self, other, axis, flex=False, level=None)

  File "python3.9/site-packages/pandas/core/ops/__init__.py", line 288, in align_method_FRAME
    raise ValueError(

ValueError: Can only compare identically-labeled DataFrame objects
  • Note: This appears to be related to a similar question, but the proposed solution, namely specifying the index labels, did not solve my problem.
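
One thing worth checking here is the column labels rather than the values. `to_csv` writes the integer labels `0` and `1` as a header line, and `read_csv` reads header names back as strings, so the two frames print identically but are labeled differently. The sketch below is a minimal reproduction under two assumptions that are not in the original post: an in-memory `io.StringIO` buffer stands in for `my_filename`, and the labels are meant to be integers again after loading.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# Round-trip through CSV text; the buffer stands in for my_filename (assumption)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df1 = pd.read_csv(buffer)

# Compare the labels, not the values
original_labels = list(df.columns)   # integer labels
loaded_labels = list(df1.columns)    # string labels parsed from the header line
print(original_labels, loaded_labels)

# Assumption: the loaded labels should be integers, as in the original frame
df1.columns = df1.columns.astype("int64")

# The row lookup from the question, now against identically labeled frames
matches = [
    np.where((df1 == this_row).all(axis=1))[0].tolist()
    for row_idx, this_row in df.iterrows()
]
print(matches)
```

If the labels are genuinely meant to be strings, converting the original frame's labels instead (for example with `df.columns.astype(str)`) would be the mirror-image choice. Either way, both sides of the comparison need the same labels.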

Thanks in advance.

Comments (1)

窝囊感情。 2025-01-17 08:37:10

If you are iterating through a data frame, I would recommend converting your df into a list of dictionaries, one per row.

df_dict = df.to_dict('records')

It is much faster, as this great article details.

Now you can enumerate through df_dict and match it to your desired data.

    target_values = {'col1': 'foo', 'col2': 'bar'}  # ... plus any further columns
    match_index = None
    for i, row in enumerate(df_dict):
        if row == target_values:
            match_index = i  # position of the matching row
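
A self-contained version of this idea might look like the sketch below. The frame contents and the names `col1` and `col2` are made up for illustration. Note that the dictionary keys are the column labels, so a frame with integer labels and one with string labels (as after a CSV round trip) would still not match each other.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"col1": ["foo", "baz"], "col2": ["bar", "qux"]})
target_values = {"col1": "foo", "col2": "bar"}

# One dict per row, keyed by column label
df_dict = df.to_dict(orient="records")

# Collect the positions of every row equal to the target
match_indices = [i for i, row in enumerate(df_dict) if row == target_values]
print(match_indices)
```

The positions collected here are row positions (as used with `df.iloc`), not index labels, which matters if the frame does not have a default index.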

It might also be a good idea to match on one column first and, only if that matches, check whether the remaining columns are identical too.
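
One way to sketch that two-step check is to filter on a single column first and then compare only the surviving rows against the full target. The data and the choice of `col1` as the first column are hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame(
    {"col1": ["foo", "foo", "baz"], "col2": ["bar", "qux", "bar"]}
)
target_values = {"col1": "foo", "col2": "bar"}

# Step 1: cheap pre-filter on a single column
first_col = "col1"
candidates = df[df[first_col] == target_values[first_col]]

# Step 2: full comparison, restricted to the remaining candidates
target = pd.Series(target_values)
is_full_match = (candidates[list(target_values)] == target).all(axis=1)
match_labels = candidates[is_full_match].index.tolist()
print(match_labels)
```

Whether the pre-filter pays off depends on how selective the first column is. If most rows share the same value there, the second step still does nearly all the work.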
