Is it better to use a tuple or a numpy array for storing coordinates?
I'm porting a C++ scientific application to Python, and as I'm new to Python, a few questions come to mind:
1) I'm defining a class that will contain the coordinates (x, y). These values will be accessed many times, but they will only be read after the class is instantiated. Is it better to use a tuple or a numpy array, in terms of both memory and access time?
2) In some cases, these coordinates will be used to build a complex number, which is passed to a complex function, and only the real part of the result will be used. Assuming there is no way to separate the real and imaginary parts of this function, and the real part has to be extracted at the end, might it be better to store (x, y) directly as a complex number? How bad is the overhead of converting from complex to real in Python? The C++ code does a lot of these conversions, and they are a big slowdown there.
3) Some coordinate transformations will also have to be performed. For these, the x and y values will be accessed separately, the transformation applied, and the result returned. The coordinate transformations are defined in the complex plane, so is it still faster to use the components x and y directly than to rely on complex variables?
Thank you
Comments (2)
In terms of memory consumption, numpy arrays are more compact than Python tuples.
A numpy array uses a single contiguous block of memory. All elements of the numpy array must be of a declared type (e.g. 32-bit or 64-bit floats). A Python tuple does not necessarily use a contiguous block of memory, and the elements of the tuple can be arbitrary Python objects, which generally consume more memory than numpy numeric types.
So this issue is a hands-down win for numpy (assuming the elements of the array can be stored as a numpy numeric type).
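As a rough illustration of the memory argument above, the sketch below stores the same points as a list of tuples and as one numpy array, then compares their sizes. The point count and values are made up for the example, and the tuple figure relies on `sys.getsizeof`, whose results are specific to the Python implementation, so treat the numbers as indicative only.

```python
import sys
import numpy as np

n = 100_000  # hypothetical number of (x, y) points

# The same coordinates stored two ways.
points_as_tuples = [(float(i), float(i) + 0.5) for i in range(n)]
points_as_array = np.array(points_as_tuples, dtype=np.float64)  # shape (n, 2)

# Numeric payload of the array: n points * 2 floats * 8 bytes.
array_bytes = points_as_array.nbytes

# For the tuples, count the list's pointer table, every tuple object,
# and the two separate float objects each tuple refers to.
tuple_bytes = sys.getsizeof(points_as_tuples) + sum(
    sys.getsizeof(t) + sys.getsizeof(t[0]) + sys.getsizeof(t[1])
    for t in points_as_tuples
)

print(f"numpy array:    {array_bytes} bytes")
print(f"list of tuples: {tuple_bytes} bytes")
```

The saving comes from storing many points in one array. For a single (x, y) pair held inside one object, the fixed overhead of the array object itself may cancel most of the benefit.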
On the issue of speed, I think the choice boils down to the question, "Can you vectorize your code?"
That is, can you express your calculations as operations done on entire arrays element-wise?
If the code can be vectorized, then numpy will most likely be faster than Python tuples. (The only case I can imagine where it might not be is if you had many very small tuples. In that case the overhead of forming the numpy arrays and the one-time cost of importing numpy might drown out the benefit of vectorization.)
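To make "vectorized" concrete, here is a minimal sketch that evaluates a complex function at several points and keeps the real part, once with a Python loop over tuples and once on whole arrays. The exponential is only a stand-in for the asker's function, which is not given, and the function names are invented for this example.

```python
import cmath
import numpy as np

def real_part_loop(points):
    """Pure-Python version: one complex evaluation per (x, y) tuple."""
    return [cmath.exp(complex(x, y)).real for x, y in points]

def real_part_vectorized(xy):
    """Vectorized version: xy is an (n, 2) float array."""
    z = xy[:, 0] + 1j * xy[:, 1]   # build all complex numbers at once
    return np.exp(z).real          # evaluate element-wise, keep the real parts

points = [(0.0, 0.0), (1.0, 0.5), (-0.3, 2.0)]  # made-up sample points
xy = np.array(points)

loop_result = real_part_loop(points)
vec_result = real_part_vectorized(xy)

# Both approaches should agree to floating-point precision.
print(np.allclose(loop_result, vec_result))
```

With only three points the loop is likely to be just as quick. The array version is expected to pay off as the number of points grows, which is worth confirming with `timeit` on realistic data.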
An example of code that could not be vectorized would be if your calculation involved looking at, say, the first complex number in an array z, doing a calculation which produces an integer index idx, then retrieving z[idx], doing a calculation on that number, which produces the next index idx2, then retrieving z[idx2], etc. This type of calculation might not be vectorizable. In this case, you might as well use Python tuples, since you won't be able to leverage numpy's strength.

I wouldn't worry about the speed of accessing the real/imaginary parts of a complex number. My guess is the issue of vectorization will most likely determine which method is faster. (Though, by the way, numpy can transform an array of complex numbers to their real parts simply by striding over the complex array, skipping every other float, and viewing the result as floats. Moreover, the syntax is dead simple: if z is a complex numpy array, then z.real is the real parts as a float numpy array. This should be far faster than the pure Python approach of using a list comprehension of attribute lookups: [z.real for z in zlist].)

Just out of curiosity, what is your reason for porting the C++ code to Python?
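The two remarks above, on index-chasing calculations and on z.real, can be sketched as follows. The rule used to derive the next index is hypothetical and chosen only to make each step depend on the previous one. The second half checks that z.real is a strided view onto the complex array's memory rather than a copy.

```python
import numpy as np

z = np.array([2 + 1j, 0 + 3j, 3 + 0j, 1 + 2j])  # made-up values

def chase(z, start, steps):
    """Follow indices computed from the values themselves (hypothetical rule)."""
    idx = start
    visited = [idx]
    for _ in range(steps):
        # The next index depends on the value just read, so each step
        # has to wait for the previous one; there is no whole-array form.
        idx = int(z[idx].real) % len(z)
        visited.append(idx)
    return visited

path = chase(z, 0, 3)
print(path)

# z.real reinterprets the same buffer as floats, stepping over the
# imaginary parts, so no data is copied.
re = z.real
print(re.dtype, np.shares_memory(re, z), re.strides)
```

Because `re` is a view, writing to it would also change `z`. Take a `.copy()` if an independent array of real parts is needed.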
A numpy array with an extra dimension is tighter in memory use than a numpy array of tuples, and at least as fast. Complex numbers are at least as good or even better, including for your third question.

BTW, you may have noticed that, while questions asked later than yours were getting answers aplenty, yours was lying fallow. Part of the reason is no doubt that asking three questions within a question turns responders off. Why not just ask one question per question? It's not as if you get charged for questions or anything, you know...!-)
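For the third question, a small sketch may help compare the two representations. It stores the same points as an (n, 2) float array and as a complex array, then applies a rotation both ways. The rotation is only an assumed example of a transformation defined in the complex plane, and the sample points are made up.

```python
import numpy as np

xy = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # shape (n, 2)

# The same coordinates as one complex number per point.
z = xy[:, 0] + 1j * xy[:, 1]

# An (n, 2) float64 array and an n-element complex128 array hold the same
# bytes, so the float array can also be reinterpreted without copying.
z_view = xy.view(np.complex128)[:, 0]

theta = np.pi / 2  # hypothetical transformation: rotate by 90 degrees

# Complex form: a single multiplication per point.
z_rot = z * np.exp(1j * theta)

# Component form: the same rotation written out in x and y.
x, y = xy[:, 0], xy[:, 1]
x_rot = x * np.cos(theta) - y * np.sin(theta)
y_rot = x * np.sin(theta) + y * np.cos(theta)

print(xy.nbytes == z.nbytes)
print(np.allclose(z_rot.real, x_rot), np.allclose(z_rot.imag, y_rot))
```

Both forms give the same result, so the choice is mostly about readability. Which one is faster for a given transformation is best settled by timing both on realistic data.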