Numpy: converting an array from float to strings
I have an array of floats that I have normalised to one (i.e. the largest number in the array is 1), and I want to use it as colour indices for a graph. Grayscale colours in matplotlib have to be given as strings holding a number between 0 and 1, so I wanted to convert the array of floats to an array of strings. I attempted to do this with "astype('str')", but this appears to create some values that are not the same as (or even close to) the originals.
I noticed this because matplotlib complains about finding the number 8 in the array, which is odd as it was normalised to one!
In short, I have an array phis, of float64, such that:
numpy.where(phis.astype('str').astype('float64') != phis)
is non-empty. This is puzzling as (hopefully naively) it appears to be a bug in numpy. Is there anything that I could have done wrong to cause this?
Edit: after investigation this appears to be due to the way the string function handles high-precision floats. Using a vectorized toString function (as in robbles' answer), this is also the case. However, if the lambda function is:
lambda x: "%.2f" % x
then the graphing works. Curiouser and curiouser. (Obviously the arrays are no longer equal, however!)
Comments (5)
You seem a bit confused as to how numpy arrays work behind the scenes. Each item in an array must be the same size.
The string representation of a float doesn't work this way. For example, repr(1.3) yields '1.3', but repr(1.33) yields '1.3300000000000001'. An accurate string representation of a floating point number produces a variable-length string.
Because numpy arrays consist of elements that are all the same size, numpy requires you to specify the length of the strings within the array when you're using string arrays.
If you use x.astype('str'), it will always convert things to an array of strings of length 1. For example, using x = np.array(1.344566), x.astype('str') yields '1'!
You need to be more explicit and use the '|Sx' dtype syntax, where x is the length of the string for each element of the array. For example, use x.astype('|S10') to convert the array to strings of length 10.
Even better, just avoid using numpy arrays of strings altogether. It's usually a bad idea, and there's no reason I can see from your description of your problem to use them in the first place...
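As a rough illustration of the fixed-width point, here is a minimal sketch. The sample values are made up, and it uses the unicode 'U' dtype code rather than '|S' on the assumption that you are on Python 3, where '|S' produces bytes rather than str. Note also that the behaviour of a bare astype('str') depends on the NumPy version: recent releases choose a width large enough for the full representation, so the truncation to length 1 described above is what older releases did.

```python
import numpy as np

# Hypothetical sample data, already normalised to 1.
x = np.array([1.344566, 0.5, 1.0])

# Every element of a string array has the same fixed width.
wide = x.astype('U10')    # room for up to 10 characters per element
narrow = x.astype('U3')   # too narrow: longer representations get cut off

print(wide, wide.dtype)
print(narrow, narrow.dtype)

# Converting the wide strings back recovers these particular values.
print(np.array_equal(wide.astype('float64'), x))
```

If the width is too small, the digits are dropped silently, which is one way to end up with strings that no longer match the original floats.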
If you have an array of numbers and you want an array of strings, you can format each number with string interpolation. If your numbers are floats, the result is an array with the same numbers as strings with two decimals.
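The snippet that originally accompanied this answer is not reproduced here, so the following is only a sketch of the idea, assuming "%.2f" interpolation inside a list comprehension. The names numbers and strings and the sample values are illustrative.

```python
# Hypothetical normalised values.
numbers = [0.0, 0.3333333, 0.5, 1.0]

# Format every number as a string with two decimals.
strings = ["%.2f" % number for number in numbers]

print(strings)  # ['0.00', '0.33', '0.50', '1.00']
```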
Notice that it also works with numpy arrays. A similar methodology can be used if you have a multi-dimensional array:
Example:
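One way the multi-dimensional case could look, as a sketch under assumptions: build an empty string array of the same shape and fill it element by element. The array names and the 'U4' width are choices made for this illustration ("%.2f" of a value between 0 and 1 is always four characters long), not taken from the original answer.

```python
import numpy as np

# Hypothetical 2-D data, normalised to 1.
values = np.array([[0.0, 0.25],
                   [0.5, 1.0]])

# Empty string array of the same shape; 4 characters fit '0.00' .. '1.00'.
strings = np.empty(values.shape, dtype='U4')

# Fill it with the interpolated strings, one element at a time.
for index, value in np.ndenumerate(values):
    strings[index] = "%.2f" % value

print(strings)
```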
If you check the Matplotlib example for the function you are using, you will notice they use a similar methodology: they build an empty matrix and fill it with strings built with the interpolation method. The relevant part of the referenced code is:
I ran into this problem when my pandas dataframes started having float precision issues that were bleeding into their string representations when doing df.round(2).astype(str).
I ended up going with np.char.mod("%.2f", phys), which uses broadcasting to run "%.2f".__mod__(el) on each element of the dataframe instead of iterating in Python, which can make a pretty sizeable difference if your dataframes are large enough. Using limited-length strings (like the accepted answer suggests) was a non-starter for me, because keeping the decimals mattered more in my case than an exact number of significant digits.
I would have tried numpy.format_float_positional, which is the one used for formatting and is supposedly much faster than the sprintf-equivalent used by Python, but that one doesn't work element-wise (or at all) on ndarrays, and manual iteration was the part I was looking to avoid.
There's no ufunc for formatting, so as far as I can tell that's likely to be the most efficient way of doing it.
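A small sketch of the np.char.mod approach on a plain array. The array name phis is borrowed from the question, and the values are made up for illustration.

```python
import numpy as np

# Hypothetical normalised data.
phis = np.array([0.0, 0.123456, 0.5, 1.0])

# Apply "%.2f" % element to every element without an explicit Python loop.
labels = np.char.mod("%.2f", phis)

print(labels)        # ['0.00' '0.12' '0.50' '1.00']
print(labels.dtype)  # a fixed-width unicode dtype
```

Because the format fixes the number of decimals, every resulting string has the same length here, so nothing is lost to truncation.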
This is probably slower than what you want, but you can do:
It looks like it rounds off the values when it converts to str from float64, but this way you can customize the conversion however you like.
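The code for this answer is missing here. Going by the question's mention of a vectorized toString function built from a lambda, it may have looked roughly like the sketch below. The use of np.vectorize, the name to_string, and the sample data are assumptions.

```python
import numpy as np

# Hypothetical normalised data.
phis = np.array([0.0, 0.123456, 0.5, 1.0])

# Wrap an ordinary Python formatting function so it can be called on arrays.
# np.vectorize is a convenience loop, not a speed-up.
to_string = np.vectorize(lambda x: "%.2f" % x, otypes=[str])

labels = to_string(phis)
print(labels)
```

Swapping the lambda body for str(x), repr(x), or any other format string is how the conversion can be customized.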
If the main problem is the loss of precision when converting from a float to a string, one possible way to go is to convert the floats to decimal objects: http://docs.python.org/library/decimal.html.
In Python 2.7 and higher you can directly convert a float to a decimal object.
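A brief sketch of what that conversion does, using only the standard-library decimal module. The variable names are illustrative.

```python
from decimal import Decimal

x = 0.1

# Converting the float directly keeps the exact binary value it stores.
exact = Decimal(x)

# Converting via repr() keeps the short decimal text instead.
short = Decimal(repr(x))

print(exact)
print(short)
print(float(exact) == x)  # the exact Decimal round-trips to the same float
```

The direct conversion exposes every digit of the stored value, which is more precision than is useful for a plot label, so formatting with a fixed number of decimals is still needed for display.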