pandas to_excel() outputs a float incorrectly

Posted on 2025-01-25 16:21:44

I have a weird situation with the output of the pandas to_excel function.

I read an xlsx Excel file containing the number "21631706.9893399", but when I write it to a new xlsx Excel file the output is "21631706.98934". I tested with openpyxl and xlsxwriter and got the same result. However, if I write an xls Excel file with the xlwt engine, I get the right answer. Is there a way to read and write the same float number with these engines? I suspect there is some float limitation in the xlsx format, but in the xlsx input the number is fine.

My code:

import pandas as pd

df = pd.read_excel(r'input\sample.xlsx')

df.to_excel(r'output\excel xlsx - xlsxwriter.xlsx', engine='xlsxwriter', index=False)
df.to_excel(r'output\excel xlsx - openpyxl.xlsx', engine='openpyxl', index=False)
df.to_excel(r'output\excel xls - xlwt.xls', engine='xlwt', index=False)

Github repository with the samples

Dependencies:

python = "3.7.4"
pandas = "1.3.5"
XlsxWriter = "3.0.3"
openpyxl = "3.0.9"
xlwt = "1.3.0"


ぺ禁宫浮华殁 2025-02-01 16:21:44

Set the number format of the float as per your requirement:

format = workbook.add_format({'num_format': '0.0000000'})

For more details:
https://xlsxwriter.readthedocs.io/example_pandas_column_formats.html

裸钻 2025-02-01 16:21:44

The issue that you are seeing is just a symptom of how floating point numbers behave, or more specifically how IEEE754 floating point numbers behave.

Excel and Python (without higher precision libraries) both use IEEE754 "double" floating point numbers, which have a general precision of 15 significant decimal digits. The number in your test case has more than 15 digits (not counting the decimal point), and as a result it gets rounded to a display or storage precision when it is read or written.

For example the number stored in the input file is actually 21631706.989339948:

$ unzip input/sample.xlsx -d input_file
...

$ xmllint --format input_file/xl/worksheets/sheet1.xml | grep -C 2 216
    <row r="2" spans="1:1" x14ac:dyDescent="0.25">
      <c r="A2">
        <v>21631706.989339948</v>
      </c>
    </row>

This number has 17 significant digits, so it cannot be represented in an IEEE754 double without loss of precision. Excel reads it and displays it as 21631706.9893399 (as you say above).
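
As a rough illustration of that point, the Python sketch below parses the text stored in the input file into a double and prints it at 15 and 17 significant digits. The variable names are only illustrative, and it assumes that Excel's display corresponds to 15 significant digits, as described above.

```python
# The value exactly as it appears in the input file's XML.
stored_text = "21631706.989339948"

# Parsing picks the nearest IEEE754 double to that decimal string.
value = float(stored_text)

# 15 significant digits: the precision assumed here for Excel's display.
shown_15 = f"{value:.15g}"
print(shown_15)

# 17 significant digits are always enough to identify a double uniquely,
# so this text converts back to exactly the same double.
shown_17 = f"{value:.17g}"
print(shown_17)
print(float(shown_17) == value)
```

The 15-digit form matches the number quoted in the question, while the digits beyond that are where the differences between files show up.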

Writing this number back out as text is also subject to a loss of precision beyond the 15th digit, which is what happens here:

$ unzip output/excel\ xlsx\ -\ xlsxwriter.xlsx -d xlsxwriter_output
...

$ xmllint --format xlsxwriter_output/xl/worksheets/sheet1.xml | grep -C 2 216
    <row r="2" spans="1:1">
      <c r="A2">
        <v>21631706.98933995</v>
      </c>
    </row>
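
The text round trip can be imitated in a few lines of Python. This is only a sketch: it assumes the writer formats numbers with 16 significant digits (`%.16G`), which is consistent with the 16-digit value in the XML above but is not taken from the question. The variable names are illustrative.

```python
# Start from the double that pandas would hold after reading the input file.
original = float("21631706.989339948")

# Assumption: the xlsx writer emits 16 significant digits as text.
written_text = "%.16G" % original
print(written_text)

# Reading the text back gives the nearest double to the shortened string.
# Whether it is bit-for-bit the same double is checked rather than assumed.
reread = float(written_text)
print(reread == original)
print(abs(reread - original))
```

Any difference printed on the last line is far below the 15th significant digit, so the change is one of text representation rather than a meaningful change in value.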

The xls file behaves differently because it is a binary format: the 64 bits that Python holds in memory for the IEEE754 double are written to and read from the file unchanged, with no conversion to decimal text. This means that it appears more consistent, but it doesn't mean that it is more precise, since the underlying representation of the double is the same.
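
A minimal sketch of why a binary format round-trips cleanly, using the standard `struct` module. It illustrates the idea of storing the raw 8 bytes of a double and is not how xlwt itself is implemented.

```python
import struct

value = float("21631706.989339948")

# Pack the double into its raw 8-byte little-endian IEEE754 form.
raw_bytes = struct.pack("<d", value)
print(raw_bytes.hex())

# Unpacking the same bytes restores exactly the same double,
# because no decimal text conversion happens in between.
restored = struct.unpack("<d", raw_bytes)[0]
print(restored == value)
```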

So in summary, this behaviour is a consequence of handling floating point numbers that have more digits than an IEEE754 double can hold precisely, in particular when reading and writing them from/to text files.
