在熊猫中使用read_csv时丢失精度

发布于 2025-02-03 06:39:15 字数 608 浏览 1 评论 0 原文

我在文本文件中具有以下格式的文件，我试图将其读取为pandas dataframe。

895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|

如您所见，在输入文件中的浮点之后有 10 整数。

df = pd.read_csv('mockup.txt',header=None,delimiter='|')

当我尝试将其读取到数据框中时，我没有得到最后4个整数，

df[5].head()

0    0.467798
1    0.258165
2    0.860384
3    0.803388
4    0.249820
Name: 5, dtype: float64

如何获得输入文件中存在的完整精度？我有一些需要执行的矩阵操作，因此我无法将其施加为字符串。

我发现我必须做一些关于 dtype 的事情，但我不确定应该在哪里使用它。

原文

I have files of the below format in a text file which I am trying to read into a pandas dataframe.

895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|

As you can see there are 10 integers after the floating point in the input file.

df = pd.read_csv('mockup.txt',header=None,delimiter='|')

When I try to read it into dataframe, I am not getting the last 4 integers

df[5].head()

0    0.467798
1    0.258165
2    0.860384
3    0.803388
4    0.249820
Name: 5, dtype: float64

How can I get the complete precision as present in the input file? I have some matrix operations that needs to be performed so i cannot cast it as string.

I figured out that I have to do something about dtype but I am not sure where I should use it.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

苏璃陌 2025-02-10 06:39:15

它仅是显示问题，请参见 docs ：

#temporaly set display precision
with pd.option_context('display.precision', 10):
    print df

     0          1   2      3   4             5            6             7   \
0  895  2015-4-23  19  10000  LA  0.4677978806  0.477346934  0.4089938425   

             8             9            10            11  12  
0  0.8224291972  0.8652525793  0.682994286  0.5139162227 NaN

edit ：（谢谢

pandas使用专用的十进制到二进制转换器，为速度而牺牲完美的精度。传递 float_precision ='round_trip' read_csv修复了此问题。请参阅

It is only display problem, see docs:

#temporaly set display precision
with pd.option_context('display.precision', 10):
    print df

     0          1   2      3   4             5            6             7   \
0  895  2015-4-23  19  10000  LA  0.4677978806  0.477346934  0.4089938425   

             8             9            10            11  12  
0  0.8224291972  0.8652525793  0.682994286  0.5139162227 NaN

EDIT: (Thank you Mark Dickinson):

Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.